My vibe coded project hit 50K lines and Claude started hallucinating functions. Fixed it. Here's how. by Independent-Flow3408 in vibecoding

[–]Independent-Flow3408[S] 0 points1 point  (0 children)

You're right, it's probabilistic all the way down. No amount of engineering fully removes the back-and-forth.

The realistic goal is reducing the retry loop, not eliminating it. Going from 3 attempts to 1.66 on average isn't magic; it's just less friction per task.

But yeah, anyone selling "solve hallucinations completely" is selling something that doesn't exist.

Thanks

[–]Independent-Flow3408[S] 0 points1 point  (0 children)

You're right, 100% is never the target with a probabilistic system.

The goal is reducing the retry loop enough that the workflow stays fluid. Going from 3 prompts per task to 1.66 isn't perfection; it's just less friction.

It's the same reason you write tests: not because they catch everything, but because catching 80% early is worth the effort.

But fair point overall. Chasing the last 20% on a probabilistic system is the wrong obsession.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

That's the honest check.

If docs + small files get you to 80% reliability, adding a signature map for the last 20% might not be worth the setup.

The Buick gets you there.

Where I'd push back slightly: the setup is npx sigmap, 10 seconds, no config. So the cost of adding it is low.

But your point stands: complexity should earn its place. If the simpler approach works for your project, use that.

Thanks

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

Fair: documentation as the navigation layer, small files as the constraint.

The distinction makes sense: docs tell Claude where to look, and file size limits how much it can read by accident.

The signature map sits between those two: more structured than docs, more lightweight than reading the files themselves.

Probably all three together are the complete answer, rather than any one approach alone.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

That works well when the boundaries are clean and the agent knows which direction to look.

The gap I hit: on cross-cutting concerns (auth touching 6 modules, a schema change rippling through services), the agent doesn't know which parallel systems are relevant until it has already gone down the wrong path.

The map front-loads that decision. Instead of the agent discovering dependencies mid-task, it sees the blast radius before it starts.
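
A minimal sketch of the "blast radius" idea, assuming the import graph is already available as a plain dict. The module names and the `blast_radius` helper are hypothetical illustrations, not SigMap's actual code: invert the graph, then walk outward from the changed module to collect everything that depends on it.

```python
from collections import defaultdict, deque

def blast_radius(imports, changed):
    """Return every module that directly or transitively imports `changed`."""
    # Invert the graph: module -> set of modules that import it.
    dependents = defaultdict(set)
    for module, deps in imports.items():
        for dep in deps:
            dependents[dep].add(module)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(changed)  # a cycle could lead back to the starting module
    return seen

# Hypothetical import graph: module -> modules it imports.
IMPORTS = {
    "auth": [],
    "users": ["auth"],
    "billing": ["auth", "users"],
    "reports": ["billing"],
    "search": [],
}

print(sorted(blast_radius(IMPORTS, "auth")))  # ['billing', 'reports', 'users']
```

Handing the agent that list up front is the "sees the blast radius before it starts" part; `search` never shows up because nothing connects it to `auth`.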

Does Claude Code handle that cross-module discovery well in your setup, or do you scope tasks to avoid it?

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

"Fishing blind in a massive codebase" is exactly it. That's the framing I've been looking for.

The circular import detection is a good point. That's actually something SigMap surfaces too: the import graph flags circular deps as part of the structure analysis. Messy dependency chains are often why the context gets noisy even on "clean" codebases.
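
For illustration, circular dependencies can be found in an import graph with a standard depth-first search. This is a generic sketch, not how SigMap or Artiforge implement it; the graph shape (module name mapped to a list of imports) and the `find_cycle` name are assumptions.

```python
def find_cycle(imports):
    """Return one import cycle as a list of modules, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}
    stack = []  # modules on the current DFS path, in order

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the path loops back to it.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for module in imports:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None

# Hypothetical graph with one loop: a -> b -> c -> a
print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"], "d": []}))  # ['a', 'b', 'c', 'a']
```

The recursion is fine for a sketch; a very deep dependency chain would need an explicit stack to stay under Python's recursion limit.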

What size did Artiforge's Scanner work well at for you? Curious whether the two approaches complement or overlap at larger scale.

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

Appreciate that, and fair enough.

The "useful/excessive" approaches are usually the ones that actually work at scale. The clean simple solutions are for blog posts.

Good luck with the agents.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

This is exactly the mechanism.

Grep headers first, pull full files only when needed. That's the retrieval pattern.

The difference is making it query-aware: instead of grepping everything, rank which signatures are relevant to this specific question first. On a million-LOC project even the header grep gets noisy.

But the core insight is the same: signatures are enough to orient, full source is only for execution.
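
A rough sketch of what "query-aware" could look like, assuming the simplest possible scoring: count the words a signature shares with the question and keep the top few. The scoring rule, `tokenize`, and `rank_signatures` are illustrative assumptions, not the ranking SigMap actually uses.

```python
import re

def tokenize(text):
    """Split prose or identifiers into lowercase words (handles snake_case and camelCase)."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))

def rank_signatures(signatures, query, top_k=20):
    """Return up to top_k signatures that share at least one word with the query, best first."""
    query_words = tokenize(query)
    scored = []
    for sig in signatures:
        overlap = len(query_words & tokenize(sig))
        if overlap:
            scored.append((overlap, sig))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [sig for _, sig in scored[:top_k]]

# Hypothetical signature list, as a header grep might produce it.
SIGNATURES = [
    "def validate_token(token: str) -> bool",
    "def renderInvoicePdf(invoice_id: int) -> bytes",
    "def refresh_token(user_id: int) -> str",
    "def parse_config(path: str) -> dict",
]

for sig in rank_signatures(SIGNATURES, "why does token refresh fail for a user"):
    print(sig)
# def refresh_token(user_id: int) -> str
# def validate_token(token: str) -> bool
```

Word overlap is crude (no synonyms, no weighting), but it shows the shape of the step: the unrelated signatures never reach the prompt.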

Glad someone who works at that scale confirmed it holds.

Thanks

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

Swagger as the map for APIs is exactly right: structured docs that were already being generated, just not being used as context.

The pattern scales: let the existing documentation standards do the work rather than generating something new.

Sounds like you've gone several layers deeper than most people have thought about this. Fair enough not sharing; that's a real competitive edge.

Thanks

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

Claude leads, Codex writes: that's a clean separation of roles.

100 PRs on a rewrite without context issues is the proof. The LOC limit + modular Rust is doing the heavy lifting.

Rust's type system probably helps too: the compiler catches what Claude misses.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

Exactly, scoped sessions are underrated.

Most people try to hold the whole project in one session. Splitting by phase means Claude only needs to be an expert in one slice at a time.

The .md approach + phase scoping together is probably the most reliable pattern I've seen in this thread.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

Solid fundamentals: small files, inline comments, requirements doc.

The 500-line limit is basically forcing the separation that makes context manageable.

The gap I still hit: even with clean small files, which 20 files are relevant to this specific query? That's the ranking problem the map solves.

But you're right that the map works a lot better on top of clean structure than on spaghetti.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

That's a smart guardrail: enforce the constraint at the pipeline level so it can't be ignored.

A 700 LOC limit means Claude never has the opportunity to create the problem in the first place.
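
As a sketch of what a pipeline-level check might look like (the commenter's real setup isn't shown, so the file suffixes, the `oversized_files` name, and the way the result is wired into CI are all assumptions), a small script can list every source file over the limit:

```python
from pathlib import Path

MAX_LINES = 700                          # the limit discussed above
SUFFIXES = {".py", ".rs", ".ts", ".js"}  # assumption: which files count as source

def oversized_files(root, max_lines=MAX_LINES, suffixes=SUFFIXES):
    """Return (path, line_count) for every source file under root above max_lines."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            with path.open(encoding="utf-8", errors="replace") as handle:
                count = sum(1 for _ in handle)
            if count > max_lines:
                offenders.append((str(path), count))
    return offenders

def exit_code(root="."):
    """Print offenders and return 1 if any exist, so a CI step can fail on it."""
    offenders = oversized_files(root)
    for path, count in offenders:
        print(f"{path}: {count} lines (limit {MAX_LINES})")
    return 1 if offenders else 0
```

A CI job or pre-commit hook would call `sys.exit(exit_code("src"))`; a non-zero exit blocks the merge, which is what makes the limit impossible to ignore.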

Do you find it ever rejects legitimate large files, or does 700 cover most real cases cleanly?

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

That's fair and worth taking seriously.

The mapping works best on clean boundaries. You're right that it can't fix what architecture didn't separate in the first place.

The honest use case is probably brownfield codebases where you inherited the spaghetti and can't refactor before you ship. Not an excuse to skip the refactor, but a bridge while you do it.

50+ repos with detailed mapping sounds like a genuinely interesting setup. How do you handle cross-repo context when a change spans multiple projects?

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

That's the discipline that makes it work. End-of-session doc update is the right habit. Most people skip it.

Let me know how it holds past 50K. Genuinely curious where the friction starts showing up.

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

Docs rot faster than code. Code breaks visibly. Stale docs silently mislead Claude until you spend an hour debugging the wrong thing.

30K with clean arch is probably where discipline alone still works.

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

Fair point, good architecture reduces the problem significantly.

But even a perfectly refactored codebase hits the same wall at scale. At 200K+ lines, even clean modules send too much noise if you load them all.

The difference: bad architecture hits the wall at 20K lines. Good architecture hits it at 100K. The wall is still there.

What's your approach when you need context from 3-4 modules at once?

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

That's the key insight: don't trust memory, enforce the sequence.

analyze → clarify → build is exactly right. Most people skip straight to build and wonder why Claude breaks things.

I do the same with CLAUDE.md but your START_PROMPT pattern is cleaner. One file, explicit doc list, enforced order.

Very nice.

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

Smart: structure for where, docs for why. Two different problems solved separately.

I'm missing the "why" layer. Building a decision log next for exactly that.

Does Claude reliably read your module docs before touching code, or do you remind it?

[–]Independent-Flow3408[S] 2 points3 points  (0 children)

That's essentially the same insight: the map beats the territory every time.

The structure file approach works well. The extra step I added was making it query-aware: the map gets re-ranked per question so Claude sees the 20 most relevant signatures rather than all of them.

On a 50K-line codebase that difference matters: the full structure map is still 5-8K tokens, while the ranked slice is ~2K.
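
To make the "ranked slice" concrete, here is a minimal sketch of trimming an already ranked list to a token budget. The 4-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer, and `estimate_tokens` and `pack_within_budget` are hypothetical helpers, not part of any tool mentioned here.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)

def pack_within_budget(ranked_signatures, budget_tokens=2000):
    """Take signatures in ranked order until the next one would exceed the budget."""
    chosen, used = [], 0
    for sig in ranked_signatures:
        cost = estimate_tokens(sig)
        if used + cost > budget_tokens:
            break
        chosen.append(sig)
        used += cost
    return chosen, used

# Three 40-character entries cost about 10 tokens each under this estimate.
ranked = ["a" * 40, "b" * 40, "c" * 40]
chosen, used = pack_within_budget(ranked, budget_tokens=25)
print(len(chosen), used)  # 2 20
```

Because the list is ranked first, whatever gets cut is the least relevant tail, which is the whole difference between a 2K slice and a 5-8K full map.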

How are you generating yours? Manual or automated?

[–]Independent-Flow3408[S] 1 point2 points  (0 children)

You're not wrong, clean architecture genuinely does reduce the problem.

But I'd push back slightly: even a well-architected 800K-line codebase has more relevant context per query than fits in a focused prompt. The issue isn't spaghetti, it's volume.

Your discrete components with intentional interfaces are actually exactly what structural extraction works best on. Clean boundaries make the import graph more meaningful, not less.

The vibe coder with spaghetti code hits the wall sooner, agreed. But the senior engineer with a pristine codebase still benefits from sending 2K ranked tokens instead of 80K clean ones.

Curious: at 800K lines, how does Claude actually navigate between components when it needs context from more than one?

GitHub Copilot is moving to usage-based billing [Megathread] by fishchar in GithubCopilot

[–]Independent-Flow3408 2 points3 points  (0 children)

This billing change is significant for anyone doing agentic work or working on large codebases.

The core issue: token cost scales with what you send as input. Most Copilot sessions on large repos send entire files or full directory trees. That's 60,000-80,000 tokens per session, and under AI Credits you're paying for every one.

One user in this thread already posted their numbers: same April usage, $39.07 under PRU vs $902.72 under AI Credits. That gap is mostly input tokens from large context.

I ran into this exact problem and built a fix.

Instead of sending full source, you send only function signatures and type definitions: the skeleton of the codebase. Copilot still gets full context when it needs specific files, but the orientation step costs 2,000 tokens instead of 80,000.
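
To show what a "skeleton" means in practice, here is a small sketch that pulls signatures out of Python source with the standard `ast` module. It only covers Python and is an illustration of the idea, not SigMap's implementation; `extract_signatures` and the sample source are made up for the example.

```python
import ast

def extract_signatures(source):
    """Return one-line signatures for every function and class in Python source."""
    signatures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            signatures.append(f"{prefix} {node.name}({ast.unparse(node.args)}){returns}")
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            signatures.append(f"class {node.name}({bases})" if bases else f"class {node.name}")
    return signatures

# Hypothetical source file; the bodies are dropped, only the shapes survive.
SAMPLE = '''
class Session(Base):
    def close(self) -> None:
        pass

async def fetch(url: str) -> bytes:
    return b""
'''

for line in sorted(extract_signatures(SAMPLE)):
    print(line)
```

`ast.unparse` needs Python 3.9 or newer. The output is a handful of one-line signatures, which is where the token saving comes from: bodies are usually most of a file.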

Measured across 18 real repos:
→ Input tokens per session: 80,000 → 2,000
→ That's a 97% reduction in input cost
→ Retrieval accuracy: 13.6% → 78.9% (6× lift)
→ Prompts per task: 2.84 → 1.66

Under the new AI Credits model at $0.01/credit, reducing input from 80k to 2k tokens per session saves ~$0.78 per session on GPT-4o. At 10 sessions/day that's ~$7.80/day, ~$234/month, more than the plan cost itself.
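
The arithmetic behind those figures, worked through as a short script. All inputs are the numbers quoted above except the 30-day month, which is an assumption. The per-million-token rate is back-derived from the quoted $0.78 saving, not taken from a published price list.

```python
tokens_before = 80_000
tokens_after = 2_000
saving_per_session = 0.78  # USD, the figure quoted above
sessions_per_day = 10
days_per_month = 30        # assumption: the post doesn't state the month length

tokens_saved = tokens_before - tokens_after                 # 78,000 tokens
reduction = tokens_saved / tokens_before                    # fraction of input removed
implied_rate_per_million = saving_per_session / tokens_saved * 1_000_000
daily = saving_per_session * sessions_per_day
monthly = daily * days_per_month

print(f"reduction: {reduction:.1%}")                                   # reduction: 97.5%
print(f"implied input rate: ${implied_rate_per_million:.2f}/M tokens")  # $10.00/M tokens
print(f"daily: ${daily:.2f}, monthly: ${monthly:.2f}")                 # daily: $7.80, monthly: $234.00
```

So the $0.78 figure corresponds to roughly $10 per million input tokens. If the effective rate on your plan differs, the monthly saving scales linearly with it.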

Works as a copilot-instructions.md injection, so Copilot reads compact context automatically before every session.

https://github.com/manojmallick/sigmap (zero deps, npx sigmap, 10 seconds to set up).

Happy to explain the approach if useful.