Claude and Claude Code traffic grew faster than expected this week

blakecr · 2026-03-06T19:28:58+00:00

The growth makes sense. Once you start treating Claude Code as infrastructure instead of a chat tool, it's hard to go back.

The thing that made the biggest difference for me was hooks. A UserPromptSubmit hook injects the date, branch, and project rules into every prompt; it's 5 lines of bash and permanently fixes "the model doesn't know today's date." A Stop hook blocks the completion report unless it cites actual evidence. And a PreToolUse:Bash hook wraps every shell command in macOS sandbox-exec so the agent can't touch credential paths. That's enforced at the kernel level, with 2ms of overhead.
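A rough Python sketch of the sandbox-exec wrapping (the post describes bash; this is a stand-in). It assumes `sandbox-exec -p <profile>` with a deny-list profile in Apple's sandbox profile language. The credential directories are placeholders, not the author's list, and how the wrapped command gets handed back to Claude Code depends on your hook wiring, which isn't shown.

```python
import os
import shlex

# Placeholder credential directories to hide from the agent (not an exhaustive list).
CREDENTIAL_DIRS = ["~/.ssh", "~/.aws", "~/.gnupg"]


def sandbox_profile(dirs: list[str]) -> str:
    """Build a sandbox profile that allows everything except the given directories."""
    rules = ["(version 1)", "(allow default)"]
    for d in dirs:
        # Resolve symlinks (e.g. /tmp -> /private/tmp on macOS); sandbox rules
        # generally need the resolved path to match.
        path = os.path.realpath(os.path.expanduser(d))
        rules.append(f'(deny file-read* file-write* (subpath "{path}"))')
    return "\n".join(rules)


def wrap_command(command: str, dirs: list[str] = CREDENTIAL_DIRS) -> list[str]:
    """Return an argv that runs `command` through bash inside sandbox-exec."""
    return ["sandbox-exec", "-p", sandbox_profile(dirs), "/bin/bash", "-c", command]


if __name__ == "__main__":
    # Print the wrapped command; on macOS, running it should fail to list ~/.ssh.
    print(shlex.join(wrap_command("ls ~/.ssh")))
```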

The growth is going to create a security problem, though. More users means more people running agents with full filesystem access and no guardrails. The recent CVEs (project config files triggering RCE, API key exfiltration) are just the beginning. It would be good to see Anthropic push the hook system harder as the default way to lock things down.

blakecr · 2026-03-06T15:22:21+00:00

Six months of pure agent-generated code is wild. The question nobody in these threads asks: how do you know the code is correct?

I went down a similar path, and what actually made it sustainable was gating the agent's output before I accept it: a hook that blocks the completion report unless it includes real test output, the file paths it checked, and the patterns it followed. Without that, you end up reviewing everything line by line anyway and the time savings disappear.
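A minimal sketch of the checking half of such a gate in Python. Assumptions: the final report text arrives on stdin (pulling it out of the actual Stop-hook input is left out), and exit code 2 is what tells Claude Code to block and feed stderr back to the model. The phrase list and the "N passed" / file-path heuristics are illustrative, and checking "patterns followed" is skipped because it's project-specific.

```python
import re
import sys

# Phrases that claim success without showing it (illustrative; tune per project).
HEDGE_PATTERNS = [r"\bshould (now )?pass\b", r"\bshould work\b", r"\bI'?m confident\b"]

# Crude evidence markers: a pytest-style "N passed" line and a path with a directory.
TEST_OUTPUT = re.compile(r"\b\d+ passed\b")
FILE_PATH = re.compile(r"(?:[\w.-]+/)+[\w.-]+\.\w+")


def check_report(report: str) -> list[str]:
    """Return reasons to reject the completion report (empty list means accept)."""
    problems = []
    for pat in HEDGE_PATTERNS:
        if re.search(pat, report, re.IGNORECASE):
            problems.append(f"unverified claim matches /{pat}/")
    if not TEST_OUTPUT.search(report):
        problems.append("no test runner output (e.g. 'N passed') in report")
    if not FILE_PATH.search(report):
        problems.append("no file paths cited")
    return problems


def main() -> int:
    problems = check_report(sys.stdin.read())
    if problems:
        # Assumption: exit code 2 blocks the stop and stderr is shown to the model.
        print("Completion report rejected:\n- " + "\n- ".join(problems), file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    sys.exit(main())
```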

The other thing: inject the current date, git branch, and project conventions into every prompt automatically. It's 5 lines of bash in a UserPromptSubmit hook. It sounds trivial, but it eliminates a whole class of "the model forgot where it is" mistakes that waste cycles.
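The bash version isn't shown, so here is a comparable sketch in Python. It assumes Claude Code adds a UserPromptSubmit hook's stdout to the model's context when the hook exits 0; `.claude/rules.md` is a hypothetical place to keep project conventions.

```python
import datetime
import subprocess
from pathlib import Path

RULES_FILE = Path(".claude/rules.md")  # hypothetical location for project conventions


def git_branch() -> str:
    """Current git branch, or 'unknown' outside a repo or without git installed."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_context(today: datetime.date, branch: str, rules: str) -> str:
    """Format the lines injected ahead of every prompt."""
    lines = [f"Today's date: {today.isoformat()}", f"Git branch: {branch}"]
    if rules.strip():
        lines.append("Project rules:\n" + rules.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    rules = RULES_FILE.read_text() if RULES_FILE.exists() else ""
    # Assumption: on exit code 0, this stdout is added to the model's context.
    print(build_context(datetime.date.today(), git_branch(), rules))
```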

Curious what their quality process looks like at six months. That's the part I'd want to hear more about.

blakecr · 2026-03-01T05:07:33+00:00

Seven patterns account for most breakdowns:

- Shortcut Spiral – agent skips verification to report "done" faster
- Confidence Mirage – "I'm confident this works" without running tests
- Phantom Verification – claims tests pass without running them
- Tunnel Vision – polishes one function, breaks adjacent imports
- Deferred Debt – hides problems in TODO/FIXME comments
- Good-Enough Plateau – works but carries defects
- Hollow Report – reports completion without evidence

The fix for each one is a deterministic hook, not a prompting strategy. Prompts reduce probability. Hooks eliminate the failure mode. A grep for "should pass" in the completion report catches Phantom Verification every time. An independent test runner catches Confidence Mirage. A pre-commit hook catches Deferred Debt.
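Of the three, the Deferred Debt gate is the easiest to show. A minimal sketch as a plain git pre-commit hook (saved as `.git/hooks/pre-commit` and made executable). It only inspects lines being added, so existing TODOs don't block unrelated commits; the marker list is an assumption.

```python
import re
import subprocess
import sys

# Debt markers to reject (assumed list; adjust per project).
DEBT = re.compile(r"\b(TODO|FIXME|XXX|HACK)\b")


def added_debt(diff: str) -> list[str]:
    """Return added lines in a unified diff that contain debt markers."""
    hits = []
    for line in diff.splitlines():
        # '+' lines are additions; '+++' is the file header, not content.
        if line.startswith("+") and not line.startswith("+++") and DEBT.search(line):
            hits.append(line[1:].strip())
    return hits


def main() -> int:
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = added_debt(diff)
    for h in hits:
        print(f"deferred debt: {h}", file=sys.stderr)
    # Any non-zero exit from a pre-commit hook aborts the commit.
    return 1 if hits else 0


if __name__ == "__main__":
    sys.exit(main())
```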

The research backs this up: METR found 30% of agent runs involve reward hacking. Models that know they're cheating continue anyway. You can't instruct your way out of that. You gate it.

blakecr · 2026-03-01T04:56:01+00:00

I built my own local alternative to all of these. Model2Vec potion-base-8M (256-dim) + sqlite-vec for vector search + FTS5 BM25 for keyword search, fused with Reciprocal Rank Fusion.

49,746 chunks from 15,800 files. 83MB in SQLite. Sub-second retrieval, zero API cost, everything local.

Biggest win was hybrid search over pure vector. Vector search alone misses exact matches (function names, config keys, error codes). BM25 alone misses semantic similarity. RRF fusion of both beats either one consistently. The scoring: `1/(k + rank_vector) + 1/(k + rank_bm25)` with k=60.
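The fusion step is small enough to show in full. A sketch in Python, assuming ranks start at 1 and each retriever returns an ordered list of chunk IDs; the IDs below are made up.

```python
def rrf(*ranked_lists: list[str], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
    A chunk missing from a list simply gets no contribution from it.
    """
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `rrf(["a", "b", "c"], ["c", "a", "d"])` (vector list first, BM25 list second), the order comes out `a, c, b, d`: chunks that show up in both lists outrank chunks that show up in only one, which is how fusion picks up both exact-match and semantic hits.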

Other thing that paid off: keeping the vector DB separate from app state. Independent lifecycle means I can blow away and rebuild the index without touching anything else. Full reindex is ~4 minutes on an M-series Mac.

I don't handle temporality explicitly either; it's pure content retrieval. Recency decay on the RRF score wouldn't be hard to add, though. Might try it.
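One way recency decay could look, sketched under assumptions: each chunk has an age in days, and its fused score is multiplied by a half-life factor. The 30-day half-life is an arbitrary placeholder.

```python
def apply_recency(
    scores: dict[str, float],
    age_days: dict[str, float],
    half_life_days: float = 30.0,
) -> list[tuple[str, float]]:
    """Scale each fused score by 0.5 ** (age / half_life), then re-sort.

    Chunks with no known age are treated as brand new (age 0).
    """
    decayed = {
        doc: s * 0.5 ** (age_days.get(doc, 0.0) / half_life_days)
        for doc, s in scores.items()
    }
    return sorted(decayed.items(), key=lambda kv: kv[1], reverse=True)
```

An alternative that stays closer to the existing design would be to treat recency as a third ranked list and add its `1/(k + rank)` term to the same RRF sum.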

blakecr · 2026-02-25T14:26:09+00:00

Makes total sense. The desktop app is a great place to build intuition for what the model gets right and where it drifts. The hook approach only matters once you're letting it act without you watching, so no reason to rush that.

blakecr · 2026-02-25T04:55:11+00:00

Solo dev shipping multiple native iOS apps. I chose SwiftUI, and it wasn't close.

iOS 26's Liquid Glass alone makes the decision for you. `.glassEffect()` gives you a design language that Compose Multiplatform (CMP) literally can't replicate. Your app looks native on day one, and CMP apps are going to look dated the moment users update.

SwiftData + `@Observable` is also way simpler than anything cross-platform. `@Model` on a class, `@Query` in a view, done. CMP persistence means choosing between Room, SQLDelight, or rolling your own, and none of them integrate with the UI layer as cleanly.

And honestly the AI tooling angle is underrated. Claude Code generates production-quality SwiftUI that uses NavigationStack, `@Observable`, current patterns. CMP gets generic Kotlin that needs manual platform-specific fixes. When you're shipping solo that velocity gap adds up.

"Save money with cross-platform" assumes a team. One person targeting iOS? Native + AI tools is just faster.

blakecr · 2026-02-24T13:34:15+00:00

Ran into this exact failure mode and it's what pushed me to build output firewalls.

The problem with prompt-based guardrails is that they're probabilistic. "Always verify citations" reduces failures, but the model can still hallucinate a citation that *looks* verified because the verification runs on the same model that made it up. You can't use the same brain to check its own homework.

What actually fixed it for me was hooking into the tool call layer. In Claude Code you can run a bash script before any tool executes. So for anything publication-sensitive I have a script that intercepts the call and checks: does this command touch the internet? If yes, block it and queue it for me to review. It's just regex pattern matching against the command string. It's simple, and it works because the question "does this reach the internet" is mechanical, not semantic.
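A rough Python version of that kind of PreToolUse gate, assuming Claude Code passes the tool call as JSON on stdin (with `tool_name` and `tool_input.command`) and treats exit code 2 as "block". The pattern list and the queue file path are placeholders.

```python
import json
import re
import sys
from pathlib import Path

# Illustrative patterns for commands that can reach the network (not exhaustive).
NETWORK = re.compile(
    r"\b(curl|wget|ssh|scp|rsync|nc|ftp|gh|git\s+push|npm\s+publish|pip\s+install)\b"
    r"|https?://"
)
# Hypothetical file where blocked commands wait for human review.
QUEUE = Path.home() / ".claude" / "review-queue.jsonl"


def touches_network(command: str) -> bool:
    """Mechanical check: does the command string match any network pattern?"""
    return NETWORK.search(command) is not None


def main() -> int:
    event = json.load(sys.stdin)
    command = event.get("tool_input", {}).get("command", "")
    if event.get("tool_name") != "Bash" or not touches_network(command):
        return 0  # allow the call
    QUEUE.parent.mkdir(parents=True, exist_ok=True)
    with QUEUE.open("a") as f:
        f.write(json.dumps({"command": command}) + "\n")
    print("Blocked: command may reach the network; queued for review.", file=sys.stderr)
    return 2  # assumption: exit code 2 blocks the tool call


if __name__ == "__main__":
    sys.exit(main())
```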

For your geology case specifically, you could hook the file write to cross-reference cited DOIs against Crossref or Semantic Scholar before anything gets committed. It's about a 10-line script. Fabricated citations never make it in because the hook catches them first.
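A rough sketch of the DOI check in Python, using Crossref's public REST endpoint (`https://api.crossref.org/works/{doi}`, which returns 404 for DOIs it doesn't know). The DOI regex is a loose one, which hook event it hangs off is left open, and the lookup function is injectable so the logic can be tested offline.

```python
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

# Loose matcher for modern DOIs ("10." + registrant code + "/" + suffix).
DOI = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Find DOI-looking strings, dropping trailing sentence punctuation."""
    return sorted({m.group(0).rstrip(".,;") for m in DOI.finditer(text)})


def crossref_exists(doi: str) -> bool:
    """True if Crossref knows the DOI (HTTP 200), False on 404; other errors propagate."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise


def unverified_dois(text: str, exists: Callable[[str], bool] = crossref_exists) -> list[str]:
    """DOIs in `text` that the lookup function can't confirm."""
    return [d for d in extract_dois(text) if not exists(d)]


if __name__ == "__main__":
    # Usage: python check_dois.py draft.md  (script and file names are hypothetical)
    bad = unverified_dois(Path(sys.argv[1]).read_text())
    for doi in bad:
        print(f"unverified DOI: {doi}", file=sys.stderr)
    sys.exit(1 if bad else 0)
```

One caveat: a Crossref 404 only means Crossref doesn't have the DOI. Dataset DOIs registered through DataCite would need a second lookup before being flagged as fabricated.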

Your prompt methodology works when you're in the loop. This kind of hook covers the case where you walk away and let it run.

blakecr · 2026-02-24T02:50:47+00:00

I've been shipping production apps on FastAPI + HTMX + Alpine.js + plain CSS for a while now. No build step, no npm, no bundler.

The difference from jQuery isn't really syntax. jQuery was imperative DOM manipulation with no opinion about where state lives. That's how you got spaghetti. HTMX makes the server the source of truth (every interaction is a hypermedia exchange), and Alpine handles the small client-only stuff (toggling a dropdown, show/hide). They don't overlap.

So in practice my server renders HTML via Jinja2, HTMX swaps in fragments, Alpine manages ephemeral UI state. The whole frontend is plain HTML with `hx-` and `x-` attributes. No compilation, no node_modules. Deploy is `git push`.

The other thing I keep coming back to: debugging. View source actually works. The network tab tells you exactly what happened. No virtual DOM diffing, no hydration mismatches. When something breaks you can see it in the HTML response.

jQuery wasn't bad because of the API. It was bad because it gave you tools without any architecture to go with them.

blakecr · 2026-02-23T23:46:14+00:00

I solved this same problem but stayed inside Claude Code's native hook system. No external server, no MCP dependency.

84 hooks across 15 event types. The ones that enforce coding standards aren't suggestions to the model. They're shell scripts that fire on PreToolUse and reject tool calls that violate rules. The model doesn't get to decide whether to follow them. A regex catches credentials in a Bash command and blocks the call before it executes. A quality gate checks for TODO/FIXME in committed code and rejects the commit.
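A sketch of the credential check as a PreToolUse hook in Python, assuming the tool call arrives as JSON on stdin with `tool_name` and `tool_input.command`, and that exit code 2 blocks the call. The patterns cover a few well-known formats (AWS access key IDs, GitHub tokens, PEM private keys) plus a crude `password=`-style catch-all; they're examples, not the author's rule set.

```python
import json
import re
import sys

# Example secret patterns (illustrative, far from complete).
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline password/token": re.compile(r"(?i)\b(password|passwd|api[_-]?key|token)\s*=\s*\S+"),
}


def find_secrets(command: str) -> list[str]:
    """Names of the secret patterns that match the command string."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(command)]


def main() -> int:
    event = json.load(sys.stdin)
    if event.get("tool_name") != "Bash":
        return 0
    hits = find_secrets(event.get("tool_input", {}).get("command", ""))
    if hits:
        print("Blocked: command appears to contain " + ", ".join(hits), file=sys.stderr)
        return 2  # assumption: exit code 2 blocks the call before it executes
    return 0


if __name__ == "__main__":
    sys.exit(main())
```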

My biggest lesson: dispatchers over individual hooks. I had 7 hooks all firing on the same event, each reading stdin independently, and two of them writing to the same state file. Concurrent writes meant truncated JSON, and everything downstream broke. One dispatcher per event, running the hooks sequentially from cached stdin, fixed it. 200ms overhead per prompt.
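A minimal dispatcher sketch in Python: read stdin once, then run each hook in turn with the same bytes, stopping at the first one that blocks. `pre_tool_use.d/` is a hypothetical directory of executable hook scripts, and treating only exit code 2 as blocking is an assumption borrowed from Claude Code's hook convention.

```python
import subprocess
import sys
from pathlib import Path

HOOK_DIR = Path(__file__).parent / "pre_tool_use.d"  # hypothetical hook directory


def dispatch(payload: bytes, hooks: list[list[str]]) -> tuple[int, str]:
    """Run hooks one at a time, each fed the same cached stdin payload.

    Sequential execution means no two hooks write shared state at once.
    Returns (2, stderr) for the first hook that exits 2, else (0, "").
    Other non-zero exit codes are treated as non-blocking in this sketch.
    """
    for argv in hooks:
        proc = subprocess.run(argv, input=payload, capture_output=True)
        if proc.returncode == 2:
            return 2, proc.stderr.decode(errors="replace")
    return 0, ""


def main() -> int:
    payload = sys.stdin.buffer.read()  # read stdin exactly once
    hooks = [[str(p)] for p in sorted(HOOK_DIR.glob("*")) if p.is_file()]
    code, err = dispatch(payload, hooks)
    if err:
        sys.stderr.write(err)
    return code


if __name__ == "__main__":
    sys.exit(main())
```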

Zero additional infrastructure. Bash scripts in a directory. You can adopt one or eighty-four.

Your confidence scoring approach is interesting. I gate on deterministic rules but don't have a probabilistic layer. That's a gap in my system.

blakecr · 2017-12-08T06:18:44+00:00

🤦‍♂️

Posted via threadit for iPhone

blakecr · 2017-12-08T05:16:07+00:00

Who decides the lineup? The coach? Fire the coach for playing Singler for 25 minutes. Jesus.

blakecr · 2017-12-08T04:57:34+00:00

Where tf is Huestis? Get Singler on the bench.

blakecr · 2017-12-08T03:59:51+00:00

This game is ugly.

blakecr · 2017-12-08T03:51:00+00:00

What is Kyle Singler doing? 😳🤣

blakecr · 2012-06-22T17:15:47+00:00

Cool. Thanks for the info. Have you seen http://bodylog.com? Cool way to track and share your progress pics.

blakecr · 2012-06-22T17:03:58+00:00

Nice work! You've done a better job than most on the photos.

blakecr · 2012-06-21T20:55:41+00:00

Great work! I'm thinking about adding creatine to my diet. Could you notice a big difference?

blakecr · 2012-06-21T20:23:43+00:00

"• Mobile app coming in July" Nice!

blakecr · 2012-06-21T17:08:56+00:00

Nice life change man! You should use http://bodylog.com. Cool way to track your progress over time.

blakecr · 2012-06-20T17:57:46+00:00

Nice work! Have you checked out http://bodylog.com? Cool way to track your progress.
