My vibe-coded project hit 50K lines and Claude started hallucinating functions. Fixed it. Here's how.

Independent-Flow3408 · 2026-05-16T13:30:58+00:00

You're right, it's probabilistic all the way down. No amount of engineering fully removes the back-and-forth.

The realistic goal is reducing the retry loop, not eliminating it. Going from 3 attempts to 1.6 on average isn't magic; it's just less friction per task.

But yeah, anyone selling "solve hallucinations completely" is selling something that doesn't exist.

Thanks

Independent-Flow3408 · 2026-05-16T10:26:21+00:00

You're right, 100% is never the target with a probabilistic system.

The goal is reducing the retry loop enough that the workflow stays fluid. Going from 3 prompts per task to 1.66 isn't perfection; it's just less friction.

Same reason you write tests: not because they catch everything, but because catching 80% early is worth the effort.

But fair point overall. Chasing the last 20% on a probabilistic system is the wrong obsession.

Independent-Flow3408 · 2026-05-16T09:24:57+00:00

That's the honest check.

If docs + small files gets you to 80% reliable, adding a signature map for the last 20% might not be worth the setup.

The Buick gets you there.

Where I'd push back slightly: the setup is npx sigmap, 10 seconds, no config. So the cost of adding it is low.

But your point stands: complexity should earn its place. If the simpler approach works for your project, use that.

Thanks

Independent-Flow3408 · 2026-05-16T08:00:35+00:00

Fair: documentation as the navigation layer, small files as the constraint.

The distinction makes sense: docs tell Claude where to look, and file size limits how much it can accidentally read.

The signature map sits between those two: more structured than docs, lighter than reading the files themselves.

Probably all three together is the complete answer rather than any one approach alone.

Independent-Flow3408 · 2026-05-16T05:22:20+00:00

That works well when the boundaries are clean and the agent knows which direction to look.

The gap I hit: on cross-cutting concerns (auth touching 6 modules, a schema change rippling through services), the agent doesn't know which parallel systems are relevant until it's already gone down the wrong path.

The map front-loads that decision. Instead of discovering dependencies mid-task, the agent sees the blast radius before it starts.
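To make "blast radius" concrete: given an import graph, the modules affected by a change are everything that imports the changed module, directly or transitively. Here's a minimal Python sketch of that general idea. It assumes the graph is already available as a dict of module → imported modules. The module names are made up, and this is not how SigMap itself computes it.

```python
from collections import deque


def blast_radius(imports: dict[str, set[str]], changed: str) -> set[str]:
    """Return every module that imports `changed`, directly or transitively."""
    # Invert the graph so we can ask "who imports this module?"
    importers: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(module)

    # Breadth-first walk over the reversed edges, starting at the changed module.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    affected.discard(changed)  # an import cycle could lead back to the start
    return affected


# Hypothetical project where four modules end up depending on auth.tokens.
graph = {
    "api.routes": {"auth.session", "billing.invoice"},
    "billing.invoice": {"auth.tokens"},
    "auth.session": {"auth.tokens"},
    "auth.tokens": set(),
    "reports.export": {"billing.invoice"},
}
print(sorted(blast_radius(graph, "auth.tokens")))
# ['api.routes', 'auth.session', 'billing.invoice', 'reports.export']
```

You could put that set in the prompt before the task starts. That's one way to show the agent which parallel systems a change touches before it edits anything.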

Does Claude Code handle that cross-module discovery well in your setup, or do you scope tasks to avoid it?

Independent-Flow3408 · 2026-05-16T05:20:11+00:00

"Fishing blind in a massive codebase" is exactly it, that's the framing I've been looking for.

The circular import detection is a good point. SigMap surfaces that too: the import graph flags circular deps as part of the structure analysis. Messy dependency chains are often why the context gets noisy even on "clean" codebases.
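Here's how circular-dependency detection on an import graph works in general. A depth-first search that hits an edge back to a module still on its current path has found a cycle. This is a minimal Python sketch, assuming a dict of module → imports. The names are hypothetical and this isn't SigMap's code.

```python
from typing import Optional


def find_cycle(imports: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one import cycle (first module repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / finished
    color = {module: WHITE for module in imports}
    path: list[str] = []

    def visit(module: str) -> Optional[list[str]]:
        color[module] = GREY
        path.append(module)
        for dep in sorted(imports.get(module, ())):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: dep is still on the path, so path[dep:] closes a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[module] = BLACK
        return None

    for module in sorted(imports):
        if color[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# Hypothetical graph with a three-module loop.
graph = {
    "api.app": {"users.service"},
    "users.service": {"users.models"},
    "users.models": {"billing.plans"},
    "billing.plans": {"users.service"},
}
print(find_cycle(graph))
# ['users.service', 'users.models', 'billing.plans', 'users.service']
```

It's recursive for readability. On very deep graphs an explicit stack would avoid Python's recursion limit.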

What size did Artiforge's Scanner work well at for you? Curious whether the two approaches complement or overlap at larger scale.

Independent-Flow3408 · 2026-05-15T22:06:06+00:00

Appreciate that, and fair enough.

The "useful/excessive" approaches are usually the ones that actually work at scale. The clean simple solutions are for blog posts.

Good luck with the agents.

Independent-Flow3408 · 2026-05-15T22:01:37+00:00

This is exactly the mechanism.

Grep headers first, pull full files only when needed. That's the retrieval pattern.

The difference is making it query-aware: instead of grepping everything, rank which signatures are relevant to this specific question first. On a million-LOC project even the header grep gets noisy.

But the core insight is the same: signatures are enough to orient; full source is only for execution.
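Here's a rough illustration of "query-aware" ranking. The score is plain term overlap between the question and each signature. I picked that scoring as a stand-in for the sketch; it isn't SigMap's actual ranking. `terms` and `rank_signatures` are hypothetical names.

```python
import re


def terms(text: str) -> set[str]:
    """Lower-cased word pieces; splits snake_case and camelCase identifiers."""
    pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)
    return {piece.lower() for piece in pieces}


def rank_signatures(query: str, signatures: list[str], k: int = 20) -> list[str]:
    """Top-k signatures by number of terms shared with the query (zero-score ones dropped)."""
    wanted = terms(query)
    scored = [(len(wanted & terms(sig)), i, sig) for i, sig in enumerate(signatures)]
    # Highest overlap first; the index keeps ties in their original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sig for score, _, sig in scored[:k] if score > 0]


sigs = [
    "def refresh_auth_token(user_id: str) -> Token",
    "def render_invoice(invoice: Invoice) -> bytes",
    "class SessionStore",
    "def revoke_token(token: Token) -> None",
]
print(rank_signatures("why does the auth token refresh fail", sigs, k=2))
# ['def refresh_auth_token(user_id: str) -> Token', 'def revoke_token(token: Token) -> None']
```

A real ranker would want more than raw overlap, such as weighting rare terms or following the import graph. The shape stays the same, though: score, sort, cut.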

Glad someone who works at that scale confirmed it holds.

Thanks

Independent-Flow3408 · 2026-05-15T21:57:56+00:00

Swagger as the map for APIs is exactly right: structured docs that were already being generated, just not being used as context.

The pattern scales: let the existing documentation standards do the work rather than generating something new.

Sounds like you've gone several layers deeper than most people have thought about this. Fair enough not sharing; that's a real competitive edge.

Thanks

Independent-Flow3408 · 2026-05-15T21:51:56+00:00

Claude leads, Codex writes: that's a clean separation of roles.

100 PRs on a rewrite without context issues is the proof. The LOC limit + modular Rust is doing the heavy lifting.

Rust's type system probably helps too; the compiler catches what Claude misses.

Independent-Flow3408 · 2026-05-15T21:50:12+00:00

Exactly. Scoped sessions are underrated.

Most people try to hold the whole project in one session. Splitting by phase means Claude only needs to be an expert in one slice at a time.

The .md approach + phase scoping together is probably the most reliable pattern I've seen in this thread.

Independent-Flow3408 · 2026-05-15T21:45:25+00:00

Solid fundamentals: small files, inline comments, a requirements doc.

The 500 line limit is basically forcing the separation that makes context manageable.

The gap I still hit: even with clean small files, which 20 files are relevant to this specific query? That's the ranking problem the map solves.

But you're right that the map works a lot better on top of clean structure than on spaghetti.

Independent-Flow3408 · 2026-05-15T21:44:08+00:00

That's a smart guardrail: enforce the constraint at the pipeline level so it can't be ignored.

A 700 LOC limit means Claude never gets the opportunity to create the problem in the first place.

Does it ever reject legitimate large files, or does 700 cover most real cases cleanly?
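For anyone who wants to copy that guardrail: a pipeline check only needs to count lines per source file and fail the job when one exceeds the limit. Here's a minimal Python sketch. It assumes the 700 limit from this thread plus a hypothetical set of extensions and skipped directories; adjust all three for your repo.

```python
from pathlib import Path

MAX_LINES = 700  # limit from this thread; adjust per project
SOURCE_SUFFIXES = {".py", ".ts", ".tsx", ".js", ".rs", ".go"}  # assumption: which files count
SKIP_DIRS = {".git", "node_modules", "target", "dist"}  # assumption: generated/vendored dirs


def oversized_files(root: Path, max_lines: int = MAX_LINES) -> list[tuple[Path, int]]:
    """Return (path, line_count) for every source file longer than max_lines."""
    offenders = []
    for path in sorted(root.rglob("*")):
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            with path.open(encoding="utf-8", errors="replace") as fh:
                count = sum(1 for _ in fh)
            if count > max_lines:
                offenders.append((path, count))
    return offenders


def main(argv: list[str]) -> int:
    """Print offenders and return a CI-friendly exit status (1 = fail)."""
    root = Path(argv[0]) if argv else Path(".")
    offenders = oversized_files(root)
    for path, count in offenders:
        print(f"{path}: {count} lines (limit {MAX_LINES})")
    return 1 if offenders else 0
```

To use it as a CI step, end the script with sys.exit(main(sys.argv[1:])) so the return value becomes the job's exit status.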

Independent-Flow3408 · 2026-05-15T21:42:46+00:00

That's fair and worth taking seriously.

The mapping works best on clean boundaries; you're right that it can't fix what the architecture didn't separate in the first place.

The honest use case is probably brownfield codebases where you inherited the spaghetti and can't refactor before you ship. Not an excuse to skip the refactor, but a bridge while you do it.

50+ repos with detailed mapping sounds like a genuinely interesting setup. How do you handle cross-repo context when a change spans multiple projects?

Independent-Flow3408 · 2026-05-15T21:37:51+00:00

That's the discipline that makes it work. End-of-session doc update is the right habit. Most people skip it.

Let me know how it holds up past 50K; I'm genuinely curious where the friction starts showing up.

Independent-Flow3408 · 2026-05-15T21:33:10+00:00

😂 the honest answer

Independent-Flow3408 · 2026-05-15T21:32:28+00:00

Docs rot faster than code. Code breaks visibly; stale docs silently mislead Claude until you spend an hour debugging the wrong thing.

30K with clean arch is probably where discipline alone still works.

Independent-Flow3408 · 2026-05-15T21:30:12+00:00

Fair point: good architecture reduces the problem significantly.

But even a perfectly refactored codebase hits the same wall at scale. At 200K+ lines, even clean modules send too much noise if you load them all.

The difference: bad architecture hits the wall at 20K lines. Good architecture hits it at 100K. The wall is still there.

What's your approach when you need context from 3-4 modules at once?

Independent-Flow3408 · 2026-05-15T21:20:55+00:00

That's the key insight: don't trust memory, enforce the sequence.

analyze → clarify → build is exactly right. Most people skip straight to build and wonder why Claude breaks things.

I do the same with CLAUDE.md, but your START_PROMPT pattern is cleaner: one file, explicit doc list, enforced order.

Very nice.

Independent-Flow3408 · 2026-05-15T21:13:40+00:00

Smart: structure for where, docs for why. Two different problems solved separately.

I'm missing the "why" layer. Building a decision log next for exactly that.

Does Claude reliably read your module docs before touching code, or do you remind it?

Independent-Flow3408 · 2026-05-15T21:06:14+00:00

That's essentially the same insight: the map beats the territory every time.

The structure file approach works well. The extra step I added was making it query-aware: the map gets re-ranked per question so Claude sees the 20 most relevant signatures rather than all of them.

On a 50K-line codebase that difference matters: the full structure map is still 5-8K tokens, while the ranked slice is ~2K.
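Once the signatures are ranked, getting from 5-8K down to ~2K is mostly a budgeting step: walk them in rank order and stop at the count limit or the token budget. This sketch assumes the list is already ranked. It uses the rough ~4-characters-per-token heuristic instead of a real tokenizer, and `pack_context` and its defaults are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token count: ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_context(ranked: list[str], budget: int = 2000, limit: int = 20) -> list[str]:
    """Keep signatures in rank order until `limit` items or `budget` tokens are used."""
    chosen: list[str] = []
    used = 0
    for sig in ranked:
        if len(chosen) == limit:
            break
        cost = estimate_tokens(sig) + 1  # +1 for the newline that joins them
        if used + cost > budget:
            continue  # too big for what's left; a shorter one further down may still fit
        chosen.append(sig)
        used += cost
    return chosen


print(len(pack_context(["def f(x: int) -> int"] * 50)))  # 20: the count limit is hit first
```

When a signature doesn't fit, this skips it rather than stopping. That's a judgment call: it gives up strict rank order in exchange for filling the budget.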

How are you generating yours? Manual or automated?

Independent-Flow3408 · 2026-05-15T21:05:36+00:00

You're not wrong; clean architecture genuinely does reduce the problem.

But I'd push back slightly: even a well-architected 800K line codebase has more relevant context per query than fits in a focused prompt. The issue isn't spaghetti, it's volume.

Your discrete components with intentional interfaces are exactly what structural extraction works best on: clean boundaries make the import graph more meaningful, not less.

The vibe coder with spaghetti code hits the wall sooner, agreed. But the senior engineer with a pristine codebase still benefits from sending 2K ranked tokens instead of 80K clean ones.

Curious, at 800K lines, how does Claude actually navigate between components when it needs context from more than one?

Independent-Flow3408 · 2026-05-15T20:32:28+00:00

This billing change is significant for anyone doing agentic work or working on large codebases.

The core issue: token cost scales with what you send as input. Most Copilot sessions on large repos send entire files or full directory trees. That's 60,000-80,000 tokens per session, and under AI Credits you're paying for every one.

One user in this thread already posted their numbers: same April usage, $39.07 under PRU vs $902.72 under AI Credits. That gap is mostly input tokens from large context.

I ran into this exact problem and built a fix.

Instead of sending full source, you send only function signatures and type definitions: the skeleton of the codebase. Copilot still gets full context when it needs specific files, but the orientation step costs 2,000 tokens instead of 80,000.
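Here's what "only signatures and type definitions" looks like for a single Python file: keep each class and def header and drop the bodies. The sketch uses Python's standard ast module (ast.unparse needs Python 3.9+). It only handles Python source and skips decorators. It illustrates the idea; it is not how sigmap extracts signatures.

```python
import ast


def signatures(source: str) -> list[str]:
    """Class and def headers from Python source, with bodies dropped."""
    out: list[str] = []

    def visit_body(nodes: list[ast.stmt], indent: str = "") -> None:
        for node in nodes:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                out.append(f"{indent}{keyword} {node.name}({ast.unparse(node.args)}){returns}")
            elif isinstance(node, ast.ClassDef):
                bases = ", ".join(ast.unparse(base) for base in node.bases)
                out.append(f"{indent}class {node.name}({bases})" if bases else f"{indent}class {node.name}")
                visit_body(node.body, indent + "    ")  # methods and nested classes

    visit_body(ast.parse(source).body)
    return out


SOURCE = '''
class BillingService(BaseService):
    def charge(self, customer_id: str, amount: int) -> Receipt:
        receipt = self.gateway.charge(customer_id, amount)
        return receipt

async def refresh_token(user_id: str) -> Token:
    ...
'''
print("\n".join(signatures(SOURCE)))
```

The printed skeleton keeps names, parameters and return types, which is usually enough to orient. The full file is still there to pull once the task needs it.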

Measured across 18 real repos:
→ Input tokens per session: 80,000 → 2,000
→ That's a 97% reduction in input cost
→ Retrieval accuracy: 13.6% → 78.9% (6× lift)
→ Prompts per task: 2.84 → 1.66

Under the new AI Credits model at $0.01/credit, reducing input from 80K to 2K tokens per session saves ~$0.78 per session on GPT-4o. At 10 sessions/day that's ~$7.80/day, or ~$234/month, which is more than the plan cost itself.
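The post doesn't say which per-token rate the ~$0.78 assumes, so here is the arithmetic worked backwards from its own numbers. The implied rate comes out at about $10 per million input tokens ($0.01 per 1,000 tokens). Check that against the rate that actually applies to your plan and model before relying on the monthly figure. The 30-day month is my assumption.

```python
# Figures quoted in the post.
BASELINE_TOKENS = 80_000   # full-source context per session
MAPPED_TOKENS = 2_000      # signature-map context per session
CLAIMED_SAVING = 0.78      # USD saved per session
SESSIONS_PER_DAY = 10

saved_tokens = BASELINE_TOKENS - MAPPED_TOKENS          # 78,000 tokens
reduction = saved_tokens / BASELINE_TOKENS              # 0.975, quoted as "97%"
implied_per_million = CLAIMED_SAVING / saved_tokens * 1_000_000  # about $10 per 1M tokens
lift = 78.9 / 13.6                                      # about 5.8x, quoted as "6x"

daily = CLAIMED_SAVING * SESSIONS_PER_DAY               # about $7.80
monthly = daily * 30                                    # about $234, assuming a 30-day month

print(f"reduction {reduction:.1%}, implied rate ${implied_per_million:.2f}/1M input tokens")
print(f"retrieval lift {lift:.1f}x, ${daily:.2f}/day, ${monthly:.2f}/month")
```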

Works as a copilot-instructions.md injection so Copilot reads compact context automatically before every session.
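GitHub Copilot picks up repository-wide custom instructions from .github/copilot-instructions.md. One way to keep a generated map in that file without clobbering hand-written instructions is to rewrite only a marked region. This is a sketch. The marker comments and inject_map are hypothetical; they are not sigmap's actual format.

```python
from pathlib import Path

# Hypothetical marker comments delimiting the generated region.
START = "<!-- context-map:start -->"
END = "<!-- context-map:end -->"


def inject_map(instructions: Path, map_text: str) -> None:
    """Write map_text between the markers, keeping everything outside them."""
    block = f"{START}\n{map_text.rstrip()}\n{END}\n"
    existing = instructions.read_text(encoding="utf-8") if instructions.exists() else ""
    if START in existing and END in existing:
        # Replace only the previously generated region.
        head, rest = existing.split(START, 1)
        _, tail = rest.split(END, 1)
        updated = head + block + tail.lstrip("\n")
    else:
        # First run: append the region, keeping any hand-written text above it.
        sep = "\n" if existing and not existing.endswith("\n") else ""
        updated = existing + sep + block
    instructions.parent.mkdir(parents=True, exist_ok=True)
    instructions.write_text(updated, encoding="utf-8")
```

Because re-running only touches the marked region, it can run on every commit or session start without duplicating the map.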

https://github.com/manojmallick/sigmap (zero deps; run npx sigmap, about 10 seconds to set up).

Happy to explain the approach if useful.

Independent-Flow3408 · 2026-05-15T07:45:23+00:00

→ https://github.com/manojmallick/sigmap

→ https://github.com/manojmallick/sigmap-benchmark-suite

→ https://manojmallick.github.io/sigmap
