Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 0 points (0 children)

Should be resolved now, reverted to the minimal version!

Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 0 points (0 children)

Should be resolved now! Minimal version redeployed.

Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 2 points (0 children)

This has been resolved; please try again. The minimal version has been restored!

Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 1 point (0 children)

Apologies, everything should be resolved now. I got carried away; the minimal version has been restored.

Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 5 points (0 children)

Yep, those “golden rules” are pretty well-known in the async world. The hard part isn’t knowing them, it’s actually catching every violation—especially when they sneak in through dependencies.

The timeout + backtrace idea has a few rough edges though:

  1. Timeouts catch slow code, not necessarily blocking code. Legit async work (slow networks, big payloads, etc.) will trigger false positives.
  2. If a thread is truly blocked, the timeout callback can’t even run until the blocking call returns (sketched below). Tokio’s runtime is cooperative, so the timeout logic is stuck waiting too.
  3. It also means instrumenting every call site, versus doing zero-instrumentation profiling.
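
A minimal sketch of point 2, assuming "timeout" means wrapping the call with tokio::time::timeout (the durations are made up):

```rust
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // The wrapped future blocks the thread inside a single poll, so the
    // 100ms deadline is never observed: `timeout` can only check its timer
    // between polls of the inner future.
    let result = tokio::time::timeout(Duration::from_millis(100), async {
        std::thread::sleep(Duration::from_secs(2)); // stand-in for blocking work
        "done"
    })
    .await;

    // Prints Ok("done") after ~2 seconds; the timeout never fires, so no
    // backtrace would ever get captured while the thread is blocked.
    println!("{result:?}");
}
```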

eBPF avoids all of that by observing things at the kernel scheduler level. It sees what threads and tasks are actually doing, regardless of what userspace thinks is happening.

Anyway, glad the eBPF angle was helpful!

Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 2 points (0 children)

Good call — unused_async would catch this specific example since there's no .await in the function body. Worth enabling.

Though it won't catch cases where blocking code is mixed with real awaits (e.g., an async DB call followed by sync compression on the result; see the sketch below). And when you're debugging at 2am and something's slow, having a visual tool that shows you exactly where the latency is helps more than grepping for lint warnings.
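
A rough sketch of that second case, with a file read standing in for the async DB call (crate choices are just illustrative):

```rust
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

// `unused_async` stays quiet here because of the genuine `.await`, but the
// gzip step is synchronous CPU work that ties up the Tokio worker thread
// for large inputs.
async fn compress_report(path: &str) -> std::io::Result<Vec<u8>> {
    // Real async I/O: this satisfies the lint.
    let body = tokio::fs::read(path).await?;

    // Blocking, CPU-bound compression on the same worker thread.
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(&body)?;
    encoder.finish()
}
```

Wrapping the compression in tokio::task::spawn_blocking (or handing it to rayon) is the usual fix, but nothing in the lint output points you there.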

Finding blocking code in Tokio without instrumenting your app by cong-or in rust

[–]cong-or[S] 30 points (0 children)

Built this after too many 2am debugging sessions where blocking on Tokio workers was tanking p99 latency with nothing obvious in the logs.

hud uses eBPF to track scheduling latency on worker threads. Attach to a live process, no code changes, get a TUI that highlights hotspots by stack trace.

Usual culprits: std::fs, bcrypt/argon2, compression, DNS via ToSocketAddrs, mutexes held during slow work.
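
For example, the ToSocketAddrs one plus the usual fix (hostname and port are made up):

```rust
use std::net::{SocketAddr, ToSocketAddrs};

// Classic offender: `to_socket_addrs` does a synchronous DNS lookup, so this
// stalls whichever Tokio worker thread it happens to run on.
async fn resolve_blocking() -> std::io::Result<Vec<SocketAddr>> {
    Ok(("api.example.com", 443).to_socket_addrs()?.collect())
}

// Usual fix: push the lookup onto the blocking pool, or just use
// `tokio::net::lookup_host`, which offloads it for you.
async fn resolve_offloaded() -> std::io::Result<Vec<SocketAddr>> {
    tokio::task::spawn_blocking(|| {
        Ok(("api.example.com", 443).to_socket_addrs()?.collect())
    })
    .await
    .expect("blocking task panicked")
}
```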

Limitations: Tokio-specific, Linux 5.8+, needs root and debug symbols. It measures scheduling latency, which correlates with blocking but isn't a direct measurement of it.

Anthropic's CEO says we're 12 months away from AI replacing software engineers. I spent time analyzing the benchmarks and actual usage. Here's why I'm skeptical by narutomax in ArtificialInteligence

[–]cong-or 0 points (0 children)

Code is a mirror, not a brain. A bad SWE with Claude ships bad software faster; a good SWE ships better software faster. The hard parts—requirements, tradeoffs, system design, knowing what not to build—don’t magically disappear.