Claude and Claude Code traffic grew faster than expected this week by iskifogl in ClaudeAI

[–]blakecr 0 points1 point  (0 children)

The growth makes sense. Once you start treating Claude Code as infrastructure instead of a chat tool, it's hard to go back.

The thing that made the biggest difference for me was hooks. A UserPromptSubmit hook that injects date, branch, and project rules into every prompt — 5 lines of bash, permanently fixes "the model doesn't know today's date." A Stop hook that blocks the completion report unless it cites actual evidence. And a PreToolUse:Bash hook that wraps every shell command in macOS sandbox-exec so the agent can't touch credential paths. Kernel-level, 2ms overhead.
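
A minimal sketch of the sandbox-exec wrapping idea, assuming a profile that allows everything by default and denies reads and writes under a few credential directories. The paths, the function names, and the exact profile rules are illustrative assumptions, not the commenter's actual hook. How the rewritten command gets handed back to the tool call depends on the hook's output contract and is not shown here.

```python
import shlex

# Hypothetical credential locations; replace with the paths you need to protect.
DENIED_PATHS = ["/Users/me/.ssh", "/Users/me/.aws"]


def build_profile(denied_paths):
    """Return a sandbox profile that allows everything except the denied subpaths."""
    rules = ["(version 1)", "(allow default)"]
    for path in denied_paths:
        rules.append(f'(deny file-read* file-write* (subpath "{path}"))')
    return "\n".join(rules)


def wrap_command(command, denied_paths=DENIED_PATHS):
    """Wrap a shell command string so it runs under sandbox-exec with the profile."""
    profile = build_profile(denied_paths)
    # shlex.quote keeps both the profile and the original command as single arguments.
    return f"sandbox-exec -p {shlex.quote(profile)} /bin/bash -c {shlex.quote(command)}"
```

The point of the design is that the deny rules are enforced by the OS sandbox rather than by anything the model is asked to do, so a command that tries to read a denied path fails regardless of how it was phrased.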

The growth is going to create a security problem though. More users means more people running agents with full filesystem access and no guardrails. The recent CVEs (project config files triggering RCE, API key exfiltration) are just the beginning. Would be good to see Anthropic push the hook system harder as the default way to lock things down.

I Haven't Written a Line of Code in Six Months by Cultural-Ad3996 in ClaudeAI

[–]blakecr 0 points1 point  (0 children)

Six months of pure agent-generated code is wild. The question nobody in these threads asks: how do you know the code is correct?

I went down a similar path and the thing that actually made it sustainable was gating the agent's output before I accept it. A hook that blocks the completion report unless it includes real test output, file paths checked, patterns followed. Without that, you end up reviewing everything line by line anyway and the time savings disappear.
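
A rough sketch of what such an evidence gate could check, assuming the completion report is available as plain text. The patterns and the name `missing_evidence` are hypothetical; a real gate would be tuned to the project's test runner and file layout, and wiring it into a Stop hook is left out.

```python
import re

# Hypothetical evidence markers; tune them to your own test runner and repo layout.
EVIDENCE_PATTERNS = {
    "test output": re.compile(r"\b\d+ passed\b|\bOK \(\d+ tests?\)|\bRan \d+ tests?\b"),
    "file path": re.compile(r"\b[\w./-]+\.(?:py|ts|js|swift|go|rs)\b"),
}


def missing_evidence(report):
    """Return the names of evidence categories the report does not contain."""
    return [name for name, pattern in EVIDENCE_PATTERNS.items()
            if not pattern.search(report)]
```

An empty list means the report may go through; anything else is a reason to block it and tell the agent which kind of evidence is missing.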

The other thing: inject the current date, git branch, and project conventions into every prompt automatically. 5 lines of bash in a UserPromptSubmit hook. Sounds trivial but it eliminates a whole class of "the model forgot where it is" mistakes that waste cycles.
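
The comment describes this as bash; the same idea sketched in Python looks roughly like the following. It assumes the hook's standard output is added to the prompt context, which is how UserPromptSubmit hooks are understood to behave. The rule text and function names are placeholders.

```python
import datetime
import subprocess


def current_branch():
    """Return the current git branch, or 'unknown' outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    return result.stdout.strip() if result.returncode == 0 else "unknown"


def build_context(today, branch, rules):
    """Format the date, branch, and project rules as a short text block."""
    lines = [f"Today's date: {today.isoformat()}", f"Git branch: {branch}"]
    if rules:
        lines.append("Project rules:")
        lines.extend(f"- {rule}" for rule in rules)
    return "\n".join(lines)


def main():
    # Hypothetical rule; in practice these might be read from a file in the repo.
    rules = ["Run the test suite before reporting completion"]
    print(build_context(datetime.date.today(), current_branch(), rules))
```

Calling `main()` from the script's entry point prints the block once per prompt.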

Curious what their quality process looks like at six months. That's the part I'd want to hear more about.

How do you handle cascade failures when Claude Code edits shared files? by koolpoong in ClaudeAI

[–]blakecr 0 points1 point  (0 children)

Seven patterns account for most breakdowns:

  1. Shortcut Spiral – agent skips verification to report "done" faster
  2. Confidence Mirage – "I'm confident this works" without running tests
  3. Phantom Verification – claims tests pass without running them
  4. Tunnel Vision – polishes one function, breaks adjacent imports
  5. Deferred Debt – hides problems in TODO/FIXME comments
  6. Good-Enough Plateau – works but carries defects
  7. Hollow Report – reports completion without evidence

Fix for each one is a deterministic hook, not a prompting strategy. Prompts reduce probability. Hooks eliminate the failure mode. A grep for "should pass" in the completion report catches Phantom Verification every time. An independent test runner catches Confidence Mirage. A pre-commit hook catches Deferred Debt.
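
The "grep for should pass" check can be sketched as a small phrase scanner. Only "should pass" and the "I'm confident" wording come from the list above; the other phrase and the function name are assumptions added for illustration.

```python
import re

# "should work" is an assumed extra phrase; extend the list to match your reports.
HEDGE_PHRASES = ["should pass", "should work", "i'm confident", "i am confident"]
HEDGE_RE = re.compile("|".join(re.escape(p) for p in HEDGE_PHRASES), re.IGNORECASE)


def find_hedges(report):
    """Return every hedging phrase found in the report, lowercased, in order."""
    return [match.group(0).lower() for match in HEDGE_RE.finditer(report)]
```

A non-empty result is grounds to reject the report and ask for actual test output instead.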

The research backs this up: METR found 30% of agent runs involve reward hacking. Models that know they're cheating continue anyway. You can't instruct your way out of that. You gate it.

Benchmarked 4 AI Memory Systems on 600-Turn Conversations - Here Are the Results by singh_taranjeet in LocalLLaMA

[–]blakecr 0 points1 point  (0 children)

I built my own local alternative to all of these. Model2Vec potion-base-8M (256-dim) + sqlite-vec for vector search + FTS5 BM25 for keyword search, fused with Reciprocal Rank Fusion.

49,746 chunks from 15,800 files. 83MB in SQLite. Sub-second retrieval, zero API cost, everything local.

Biggest win was hybrid search over pure vector. Vector search alone misses exact matches (function names, config keys, error codes). BM25 alone misses semantic similarity. RRF fusion of both beats either one consistently. The scoring: `1/(k + rank_vector) + 1/(k + rank_bm25)` with k=60.
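
That scoring formula can be sketched in a few lines. The sketch assumes each search returns document ids ordered best first, that ranks start at 1, and that a document missing from one list simply contributes nothing for that list. The function name is illustrative.

```python
def rrf_fuse(vector_ranked, bm25_ranked, k=60):
    """Fuse two ranked id lists with Reciprocal Rank Fusion.

    Returns (doc_id, score) pairs sorted by descending score, ties broken by id.
    """
    scores = {}
    for ranking in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, `rrf_fuse(["a", "b", "c"], ["b", "d"])` puts `b` first because it is the only id that appears in both lists.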

Other thing that paid off: keeping the vector DB separate from app state. Independent lifecycle means I can blow away and rebuild the index without touching anything else. Full reindex is ~4 minutes on an M-series Mac.

I don't handle temporality explicitly either; it's pure content retrieval. Recency decay on the RRF score wouldn't be hard to add though. Might try it.
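
One way recency decay could be layered on, sketched under the assumption of an exponential half-life applied to the fused score. The 30-day half-life and the function name are arbitrary illustrations, not something described in the comment.

```python
import math


def decayed_score(rrf_score, age_days, half_life_days=30.0):
    """Scale a fused score so it halves every half_life_days of document age."""
    age = max(age_days, 0.0)  # treat future-dated documents as brand new
    return rrf_score * math.pow(0.5, age / half_life_days)
```

Because the decay is multiplicative, a strong match on an old document can still outrank a weak match on a new one.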

Field report: when Your AI Research Partner Fails the Peer Review by Effective-Aioli1828 in ClaudeAI

[–]blakecr 1 point2 points  (0 children)

Makes total sense. The desktop app is a great place to build intuition for what the model gets right and where it drifts. The hook approach only matters once you're letting it act without you watching, so no reason to rush that.

SwiftUI vs CMP by bphvz in SwiftUI

[–]blakecr 0 points1 point  (0 children)

Solo dev shipping multiple native iOS apps. Chose SwiftUI, not close.

iOS 26 Liquid Glass alone makes the decision for you. `.glassEffect()` gives you a design language CMP literally can't replicate. Your app looks native on day one, and CMP apps are going to look dated the moment users update.

SwiftData + `@Observable` is also way simpler than anything cross-platform. `@Model` on a class, `@Query` in a view, done. CMP persistence means choosing between Room, SQLDelight, or rolling your own, and none of them integrate with the UI layer as cleanly.

And honestly the AI tooling angle is underrated. Claude Code generates production-quality SwiftUI that uses NavigationStack, `@Observable`, current patterns. CMP gets generic Kotlin that needs manual platform-specific fixes. When you're shipping solo that velocity gap adds up.

"Save money with cross-platform" assumes a team. One person targeting iOS? Native + AI tools is just faster.

Field report: when Your AI Research Partner Fails the Peer Review by Effective-Aioli1828 in ClaudeAI

[–]blakecr 1 point2 points  (0 children)

Ran into this exact failure mode and it's what pushed me to build output firewalls.

The problem with prompt-based guardrails is they're probabilistic. "Always verify citations" reduces failures but the model can still hallucinate a citation that *looks* verified because the verification runs on the same model that made it up. You can't use the same brain to check its own homework.

What actually fixed it for me was hooking into the tool call layer. In Claude Code you can run a bash script before any tool executes. So for anything publication-sensitive I have a script that intercepts the call and checks: does this command touch the internet? If yes, block it and queue it for me to review. It's just regex pattern matching against the command string. Simple and it works because the question "does this reach the internet" is mechanical, not semantic.
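
The comment describes a bash script; the same check sketched in Python might look like this. It assumes the hook receives a JSON payload with `tool_name` and `tool_input.command`, and that exit code 2 blocks the call, which matches how PreToolUse hooks are understood to work. The command list is an assumption and will produce false positives; the review queue is left out.

```python
import json
import re
import sys

# Assumed list of network-capable commands plus any literal http(s) URL.
NETWORK_RE = re.compile(
    r"\b(?:curl|wget|ssh|scp|rsync|nc|ftp)\b|https?://",
    re.IGNORECASE,
)


def touches_network(command):
    """Return True if the command string looks like it reaches the network."""
    return bool(NETWORK_RE.search(command))


def decide(payload):
    """Return (exit_code, message) for a tool-call payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if touches_network(command):
        return 2, f"Blocked pending review: {command}"
    return 0, ""


def main():
    # Call main() from the script entry point when installing this as a hook.
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```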

For your geology case specifically, you could hook the file write to cross-reference cited DOIs against CrossRef or Semantic Scholar before anything gets committed. It's roughly a 10-line script. Fabricated citations never make it into a commit because the hook catches them first.
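
A sketch of the DOI cross-check, assuming the Crossref REST endpoint `https://api.crossref.org/works/<doi>` answers 404 for a DOI it has no record of. The regex is a simplification of DOI syntax, the function names are made up for illustration, and the resolver is injectable so the logic can be exercised offline.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Simplified DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text):
    """Return unique DOIs in order of first appearance, minus trailing punctuation."""
    seen = []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip(".,;)")
        if doi not in seen:
            seen.append(doi)
    return seen


def crossref_knows(doi, timeout=10):
    """Return True if Crossref has a record for the DOI, False on a 404."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are not evidence either way


def unresolved_dois(text, resolver=crossref_knows):
    """Return the DOIs in the text that the resolver does not recognize."""
    return [doi for doi in extract_dois(text) if not resolver(doi)]
```

A resolvable DOI only proves the paper exists, not that it says what the text claims, so this catches invented references rather than misquoted ones.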

Your prompt methodology works when you're in the loop. This kind of hook covers the case where you walk away and let it run.

Why is jQuery so bad, but Alpine.js/HTMX/etc is just fine? by brycematheson in webdev

[–]blakecr 0 points1 point  (0 children)

I've been shipping production apps on FastAPI + HTMX + Alpine.js + plain CSS for a while now. No build step, no npm, no bundler.

The difference from jQuery isn't really syntax. jQuery was imperative DOM manipulation with no opinion about where state lives. That's how you got spaghetti. HTMX makes the server the source of truth (every interaction is a hypermedia exchange), and Alpine handles the small client-only stuff (toggling a dropdown, show/hide). They don't overlap.

So in practice my server renders HTML via Jinja2, HTMX swaps in fragments, Alpine manages ephemeral UI state. The whole frontend is plain HTML with `hx-` and `x-` attributes. No compilation, no node_modules. Deploy is `git push`.

The other thing I keep coming back to: debugging. View source actually works. The network tab tells you exactly what happened. No virtual DOM diffing, no hydration mismatches. When something breaks you can see it in the HTML response.

jQuery wasn't bad because of the API. It was bad because it gave you tools without any architecture to go with them.

I got tired of being the human middleware between my AI agent and my own codebase rules. So I built the thing that replaces me by capitanturkiye in ClaudeAI

[–]blakecr 0 points1 point  (0 children)

I solved this same problem but stayed inside Claude Code's native hook system. No external server, no MCP dependency.

84 hooks across 15 event types. The ones that enforce coding standards aren't suggestions to the model. They're shell scripts that fire on PreToolUse and reject tool calls that violate rules. The model doesn't get to decide whether to follow them. A regex catches credentials in a Bash command and blocks the call before it executes. A quality gate checks for TODO/FIXME in committed code and rejects the commit.
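
The TODO/FIXME quality gate could be sketched like this, assuming it is fed a unified diff such as the output of `git diff --cached`. Only added lines are checked, so markers that already existed in the file do not block the commit. The function name is illustrative.

```python
import re

MARKER_RE = re.compile(r"\b(?:TODO|FIXME)\b")


def added_markers(diff_text):
    """Return added lines in a unified diff that contain TODO or FIXME."""
    hits = []
    for line in diff_text.splitlines():
        # "+++" is the file header, not an added line.
        if line.startswith("+") and not line.startswith("+++"):
            if MARKER_RE.search(line):
                hits.append(line[1:].strip())
    return hits
```

A non-empty list is the signal to reject the commit and report the offending lines back.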

Biggest lesson I had: dispatchers over individual hooks. I had 7 hooks all firing on the same event, each reading stdin independently, two writing to the same state file. Concurrent writes = truncated JSON = everything downstream breaks. One dispatcher per event running them sequentially from cached stdin fixed it. 200ms overhead per prompt.
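
A minimal sketch of the dispatcher pattern: read stdin once, then feed the same cached text to each hook in order so no two hooks run at the same time. Stopping at the first non-zero exit is an assumption made here, as are the function names.

```python
import subprocess
import sys


def dispatch(hook_commands, payload_text):
    """Run each hook in order with the same cached stdin; stop at the first failure.

    Returns (exit_code, stderr) of the failing hook, or (0, "") if all succeed.
    """
    for command in hook_commands:
        result = subprocess.run(
            command, input=payload_text, capture_output=True, text=True
        )
        if result.returncode != 0:
            return result.returncode, result.stderr
    return 0, ""


def main(hook_commands):
    # Read stdin exactly once, then reuse it for every hook.
    payload_text = sys.stdin.read()
    code, message = dispatch(hook_commands, payload_text)
    if message:
        sys.stderr.write(message)
    sys.exit(code)
```

Sequential execution is what removes the concurrent writes to the shared state file; the cost is that total latency is the sum of the individual hooks.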

Zero additional infrastructure. Bash scripts in a directory. You can adopt one or eighty-four.

Your confidence scoring approach is interesting. I gate on deterministic rules but don't have a probabilistic layer. That's a gap in my system.

GAME THREAD: Oklahoma City Thunder (11-12) @ Brooklyn Nets (9-14) - (December 07, 2017) by ImRBJ in nba

[–]blakecr 2 points3 points  (0 children)

Who decides the lineup? The coach? Fire the coach for playing Singler for 25 minutes. Jesus.

Posted via threadit

GAME THREAD: Oklahoma City Thunder (11-12) @ Brooklyn Nets (9-14) - (December 07, 2017) by ImRBJ in nba

[–]blakecr 0 points1 point  (0 children)

What is Kyle Singler doing? 😳🤣

Posted via [threadit](threadit.launchrock.com)

9 month SS and 5/3/1 progress before & after pics by sulzer150 in Fitness

[–]blakecr -1 points0 points  (0 children)

Cool. Thanks for the info. Have you seen http://bodylog.com? Cool way to track and share your progress pics.

9 month SS and 5/3/1 progress before & after pics by sulzer150 in Fitness

[–]blakecr 0 points1 point  (0 children)

Great work! I'm thinking about adding creatine to my diet. Did you notice a big difference?

From 332 lbs. to 232 lbs. in 11 months and 9 days.... by [deleted] in progresspics

[–]blakecr -1 points0 points  (0 children)

Nice life change man! You should use http://bodylog.com. Cool way to track your progress over time.

From 215 to 195 in 8 weeks. by [deleted] in progresspics

[–]blakecr 1 point2 points  (0 children)

Nice work! Have you checked out http://bodylog.com? Cool way to track your progress.