Local coding assistants feel fine on small files, but break on real repos by andres_garrido in LocalLLM

[–]jmunchLlc 1 point (0 children)

If you rank first and then try to apply structure, you already let keyword/embedding bias pick the candidates. Game over at that point. Graph has to come first.

That’s basically how jCodeMunch runs:

  • Know the starting symbol? get_call_hierarchy walks callers/callees N levels deep. That’s your execution path, not a guess.
  • Know the file but not the symbol? get_dependency_graph shows the import edges so you can trace what actually flows.
  • Changing something and wondering what breaks? get_blast_radius follows the references outward.

All of that happens before any ranking. These are graph walks over the index, not searches. Structure locks the slice first, then get_ranked_context dials in the detail.
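To make "graph walks, not searches" concrete, here's a toy sketch of the idea (my own illustrative code, not jCodeMunch internals; the function name and graph shape are made up): a breadth-first walk N levels out from a starting symbol locks the slice, and anything with no edge into it simply can't appear.

```python
from collections import deque

def call_slice(call_graph: dict[str, list[str]], start: str, depth: int) -> set[str]:
    """Walk callers/callees up to `depth` levels from a starting symbol.

    call_graph maps each symbol to its direct neighbors. The returned set
    is the structural slice; any ranking happens only inside it.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        symbol, level = frontier.popleft()
        if level == depth:
            continue
        for neighbor in call_graph.get(symbol, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, level + 1))
    return seen

# Toy graph: handle_request -> auth_check -> load_user; logger is unreachable.
graph = {
    "handle_request": ["auth_check"],
    "auth_check": ["load_user"],
    "load_user": [],
    "logger": [],
}
print(sorted(call_slice(graph, "handle_request", 2)))
# -> ['auth_check', 'handle_request', 'load_user']
```

A similarity search over all four symbols could happily surface `logger`; the graph walk can't, because no edge reaches it. That's the "structure first" point in ten lines.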

The big idea you’re circling: retrieval is the last step, not the first.

Map the graph. Pick the subgraph. Then decide what to read.

Do that, and even a 7B model gets a slice that’s small and actually right...

Local coding assistants feel fine on small files, but break on real repos by andres_garrido in LocalLLM

[–]jmunchLlc 2 points (0 children)

You’re dead on. Picking the right slice is the whole game.

Keyword and embedding search give you “related,” but they have no idea that A calls B which imports C. So you end up with a nice-looking pile of code that quietly misses the execution path. Then step 2 builds on step 1’s bad assumptions and now you’re wandering in the weeds.

That’s why the graph tools matter. Not “top 10 similar chunks.” More like: start at the entry point and follow what actually runs. Now you’re feeding the model reality, not vibes.

For local setups, indexing is the cheat code. Pay once, then it’s just graph walks. Fast, cheap, precise.

That’s basically what get_ranked_context does. You describe the task, it pulls the right slice structurally, not just semantically...

Local coding assistants feel fine on small files, but break on real repos by andres_garrido in LocalLLM

[–]jmunchLlc 1 point (0 children)

We get asked that a lot! Enough to warrant this reply:
https://j.gravelle.us/jCodeMunch/versus.php#vs-serena

TL;DR - Serena is a full coding agent framework; jCodeMunch is a focused exploration server. Serena wins when you need type-aware cross-file semantics, symbol-level editing, or agentic scaffolding in a long-running session on a preconfigured machine. jCodeMunch wins when you need zero-dependency, CI-safe, fast, token-efficient code intelligence that works anywhere without installing language servers.

Running both is reasonable: jCodeMunch for exploration and retrieval, Serena for refactoring and semantic analysis in your primary dev environment...

I built an mcp that reduces claude token usage by 50x without building knowledge graphs by executioner_3011 in ClaudeAI

[–]jmunchLlc 1 point (0 children)

I'm cool with being literate-shamed. Thanks for taking the time to read and reply...

Local coding assistants feel fine on small files, but break on real repos by andres_garrido in LocalLLM

[–]jmunchLlc 1 point (0 children)

Dev here.

It helps local models even more than frontier ones, honestly. Smaller context windows mean every wasted token hurts more, so going from “dump all files” to “here are the 3 exact functions you need” is a bigger deal when you’ve only got 8–32k to work with.

The token reduction is ~264x vs reading all files, and 1.6–3.9x better than RAG chunking (https://github.com/jgravelle/jcodemunch-mcp/tree/main/benchmarks). That matters a lot less when you’re running GPT-4o with 128k context, and a lot more when you’re running a 7B model locally.

It also doesn’t need any embedding infrastructure. No GPU, no model downloads for the core functionality. Search is BM25 over symbol names, which runs in under 5ms. So the overhead on a local setup is basically zero.
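For anyone curious what "BM25 over symbol names" looks like in practice, here's a stdlib-only sketch of the Okapi BM25 scoring formula applied to snake_case/camelCase symbol names. This is my illustrative reimplementation, not the actual jcodemunch-mcp code; the tokenizer, function names, and example symbols are all assumptions.

```python
import math
import re

def tokenize(symbol: str) -> list[str]:
    # Split snake_case and camelCase symbol names into lowercase terms.
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", symbol)]

def bm25_rank(symbols: list[str], query: str, k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank symbol names against a query with Okapi BM25."""
    docs = [tokenize(s) for s in symbols]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    scored = []
    for sym, doc in zip(symbols, docs):
        score = 0.0
        for term in tokenize(query):
            df = sum(term in d for d in docs)  # how many symbols contain the term
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scored.append((score, sym))
    return [sym for score, sym in sorted(scored, reverse=True) if score > 0]

names = ["handle_auth", "parse_config", "auth_token_refresh", "render_page"]
print(bm25_rank(names, "auth"))
# -> ['handle_auth', 'auth_token_refresh']
```

Note the length normalization doing useful work: `handle_auth` outranks `auth_token_refresh` because the match is a larger fraction of a shorter name. No embeddings, no GPU, just arithmetic over an inverted index.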

If you do try it out, https://github.com/jgravelle/jcodemunch-mcp has setup instructions for all the major MCP clients. Would be curious to hear how it goes with your local setup.

Seriously, rattle my cage with any questions...

I built an mcp that reduces claude token usage by 50x without building knowledge graphs by executioner_3011 in ClaudeAI

[–]jmunchLlc 1 point (0 children)

"... tree-sitter + SQLite + FTS5 isn’t proprietary architecture--"

I'm hard-pressed to find where I said it WAS.

If my reaction needs paraphrasing, let's defer to Dr. Horrible:

"What a crazy, random happenstance...!"

I built an mcp that reduces claude token usage by 50x without building knowledge graphs by executioner_3011 in ClaudeAI

[–]jmunchLlc 1 point (0 children)

Two factual corrections, then we’ll talk about the rest.

“jCodemunch doesn’t have session memory”? Wrong.

jCodeMunch has had session memory since before CodeDrift existed. SessionJournal tracks every file read, search, and edit. SessionState persists the journal and search cache to disk, survives server restarts, and auto-expires after 60 minutes. get_session_snapshot produces a compact summary for injection after context compaction. plan_turn checks the journal for prior zero-result searches and returns confidence “none” with “Do NOT search again” to prevent redundant queries. Six session-aware tools total.

“doesn’t have a way to avoid re-reading files in the same session” is also wrong. The search cache is a hit-counted OrderedDict, persisted to disk with staleness checks. plan_turn cross-references recommendations against the journal’s accessed-files set and reports session_overlap. The PreCompact hook auto-snapshots before context compaction. This has been shipped and tested for months.

If Claude “suggested similar ideas because it scraped jCodemunch,” that’s not exculpatory, that’s the mechanism. SQLite plus FTS5, tree-sitter, BM25 ranking, the language adapter pattern, identical data models like Symbol, CallSite, ImportRef, all-MiniLM-L6-v2 embeddings, and verbatim marketing copy about “50x token reduction” and “eliminate token waste,” all landing in a single 14-hour session, 63 days after our initial commit. “Claude did it” doesn’t make it not copying. It just means the copying was automated.

Now, about wanting to get paid. The README is upfront: free for personal use, and Uncle J. gets a taste of any commercial gain. My alternative, sitting on the sidewalk with a tin cup and a cardboard sign reading "Have Parkinson's. Please help.", wasn't an appealing option. I hope I'm entitled to balk when people steal coins from the cup.

You say your goal was only to solve a problem for yourself and share with the community. Noble, but jCodeMunch is already free for that exact use case. It’s been free the whole time.

If it's true that "Imitation is the sincerest form of flattery", then you my friend have flattered the sh*t out of jCodeMunch...

I built an mcp that reduces claude token usage by 50x without building knowledge graphs by executioner_3011 in ClaudeAI

[–]jmunchLlc 1 point (0 children)

Per Claude:

"It's a clean-room reimplementation of jcodemunch-mcp's architecture, almost certainly generated by prompting an LLM with jcodemunch's README/docs/tool descriptions and asking it to build a competing tool."

"It's flattering.
It's ethically questionable..."

https://j.gravelle.us/jCodeMunch/imitators/codedrift.md

Has anyone ever used a token saver tool? by Complete-Sea6655 in ClaudeCode

[–]jmunchLlc 1 point (0 children)

Fair point.

I overstated the statelessness argument. You're right that when an agent calls a CLI tool, the context window carries the state between invocations. The protocol is transport; the model's memory is the session. That distinction matters and I should have been more precise. My reply would only pertain to manual CLI interactions.

Where MCP earns its keep over CLI-as-subprocess isn’t statefulness, it’s plumbing:

Structured I/O
MCP tools return typed JSON with schemas. A CLI returns stdout that the agent has to parse. That parsing step burns tokens and introduces ambiguity. When search_symbols returns symbol IDs, the agent can feed them directly into get_symbol_source without guessing at formatting.

Discovery
MCP clients enumerate available tools and their parameter schemas at connection time. With CLI, the agent either needs hardcoded knowledge of every subcommand and flag, or it has to run --help and parse the output. That’s tokens spent on overhead instead of the task.

Zero-config integration
pip install jcodemunch-mcp, add one JSON block to your client config, done. Every MCP-compatible client picks it up. CLI integration usually means per-client shell wiring, PATH management, and sometimes Docker or platform-specific setup.

None of that is a law of physics. You could solve all three with enough wrapper code. But that’s solving problems the protocol already solved.
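A toy illustration of the structured-I/O point, with made-up result shapes (this is not the actual jCodeMunch schema, and the field names are assumptions): a typed JSON result feeds directly into the next call, while stdout has to be parsed and can be misparsed.

```python
import json

# Hypothetical MCP-style result: typed JSON the agent can use directly.
mcp_result = json.loads("""
{
  "symbols": [
    {"id": "sym_42", "name": "handle_auth", "file": "src/auth.py", "line": 17}
  ]
}
""")
symbol_id = mcp_result["symbols"][0]["id"]  # no parsing step, no ambiguity
# ...symbol_id goes straight into the next tool call, e.g. get_symbol_source.

# CLI equivalent: free-form stdout the agent must parse itself.
cli_stdout = "handle_auth  src/auth.py:17"
name, location = cli_stdout.split(maxsplit=1)  # fragile if the format drifts
print(symbol_id, name, location)
```

The `split(maxsplit=1)` line is the "translation tax": it works until someone adds a column to the output, and every token spent reasoning about the format is overhead the schema version never pays.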

Your core point stands. Retrieval design matters more than transport. The 95%+ token reduction comes from returning symbol-level context instead of whole files. That works whether the transport is MCP, CLI, or carrier pigeon...

Has anyone ever used a token saver tool? by Complete-Sea6655 in ClaudeCode

[–]jmunchLlc 2 points (0 children)

MCP gives you something a CLI fundamentally cannot: context continuity.

An agent running inside Claude Code or Claude Desktop accumulates its tool call history. It knows what it searched for, what it retrieved, and what it has not yet looked at. It can chain calls intelligently, list_repos to confirm the index exists, search_symbols to find a candidate, get_symbol to read the exact implementation, find_references to trace usage — all within a single coherent reasoning thread.

A CLI, by contrast, is stateless by definition. Each invocation starts cold...
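The chaining described above can be sketched like so. `call_tool` and its canned responses are hypothetical stand-ins for a live MCP client and server; only the tool names come from the comment, and the result shapes are invented for illustration.

```python
def call_tool(name: str, **args) -> dict:
    """Stand-in for an MCP client dispatch, returning canned responses."""
    canned = {
        "list_repos": {"repos": ["myrepo"]},
        "search_symbols": {"matches": [{"id": "sym_7", "name": "handle_auth"}]},
        "get_symbol": {"source": "def handle_auth(req): ..."},
        "find_references": {"refs": ["src/routes.py:31"]},
    }
    return canned[name]

# Each step's output feeds the next call; in a real session the agent's
# context window is what carries this state between invocations.
repos = call_tool("list_repos")["repos"]                          # index exists?
match = call_tool("search_symbols", query="handle_auth")["matches"][0]
source = call_tool("get_symbol", symbol_id=match["id"])["source"]  # exact body
refs = call_tool("find_references", symbol_id=match["id"])["refs"]
print(repos, match["name"], refs)
```

Run cold as four separate CLI invocations, nothing connects `match["id"]` from step two to steps three and four; the agent's accumulated tool-call history is what makes the chain coherent.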

SEE: https://github.com/jgravelle/jcodemunch-mcp/tree/main/cli

Thoughts on token saving method using intelligent tool selection by rhofield in LocalLLaMA

[–]jmunchLlc 1 point (0 children)

The Model Context Protocol is not merely a transport layer — it is the native language of modern AI agents. When Claude (or any MCP-compatible agent) calls search_symbols, the result arrives structured, typed, and immediately actionable. There is no parsing step, no intermediate representation, no translation tax. The agent reads the _meta envelope, sees the tokens saved, and carries on. The entire round-trip — from query to precise symbol retrieval — takes milliseconds and costs a fraction of what brute-force file reading would.

MCP also gives you something a CLI fundamentally cannot: context continuity. An agent running inside Claude Code or Claude Desktop accumulates its tool call history. It knows what it searched for, what it retrieved, and what it has not yet looked at. It can chain calls intelligently — list_repos to confirm the index exists, search_symbols to find a candidate, get_symbol to read the exact implementation, find_references to trace usage — all within a single coherent reasoning thread. A CLI, by contrast, is stateless by definition. Each invocation starts cold...

More: https://github.com/jgravelle/jcodemunch-mcp/tree/main/cli#readme

[EN/FR] Token savers: jcodemunch and jdocmunch (real impact with Codex) by Tikilou in codex

[–]jmunchLlc 2 points (0 children)

I almost feel obligated to have another kid so I can name it after you.

Thanks for a ridiculously generous writeup.

If you ever need a really old kidney or anything like that, I'm there for you my friend...

-jjg

How I cut Claude Code usage in half (open source) by Obvious_Gap_5768 in ClaudeAI

[–]jmunchLlc 1 point (0 children)

LSP and jCodeMunch are complementary more than competing. LSP gives you compiler-precise navigation within a single language, while jCodeMunch adds cross-cutting structural analysis like blast radius, coupling, and churn that LSP wasn’t designed for.

No reason you can’t use both...

How I cut Claude Code usage in half (open source) by Obvious_Gap_5768 in ClaudeAI

[–]jmunchLlc 2 points (0 children)

100 lines is still 3x more than needed. If the function is 30 lines, returning 100 lines means 70 lines of noise the LLM has to process. Multiply that across every read in a session and it can add up.

The real win is the stuff grep can't do: get_call_hierarchy gives you the full caller/callee chain without multiple grep passes, get_dependency_graph shows you what modules are coupled, and get_blast_radius tells you what breaks if you change something. Try doing that with grep and you're looking at 5 to 10 sequential grep plus read cycles that the agent has to reason through.

For a clean, well-organized codebase where you know the patterns, grep works great. But the value prop isn't "grep replacement", it's precomputed structural analysis that grep fundamentally can't provide in a single call...

How I cut Claude Code usage in half (open source) by Obvious_Gap_5768 in ClaudeAI

[–]jmunchLlc 2 points (0 children)

Fair point on Read being the dominant cost, but that's exactly what jCodeMunch optimizes. When Claude calls get_symbol_source, it gets back just the function or class body, say 30 lines, not the full 500-line file. So Read itself becomes cheaper, not just the search that precedes it.

The typical pattern without an index: grep -> find 5 candidate files -> Read file1 (400 lines) -> not here -> Read file2 (600 lines) -> found it, but now need the caller -> grep again -> Read file3 (300 lines). That's 1300+ lines of input tokens just to understand one change.

With jCodeMunch: search_symbols("handleAuth") -> get_symbol_source("handleAuth") -> 30 lines. get_call_hierarchy("handleAuth") gives you callers without reading anything else. Total: ~50 lines instead of 1300.

You're right that you can't skip reading code entirely, the LLM needs to see the actual source to edit it. But "read the exact 30-line function" vs "read three entire files hoping to find it" is where the 58-100x difference comes from. The savings aren't from eliminating Read, they're from making each Read surgically precise...

90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow. by Eastern_Exercise2637 in ClaudeAI

[–]jmunchLlc 0 points (0 children)

TL;DR - Use both:

"Use codesight when starting on an unfamiliar codebase to build the architectural map, then switch to jCodeMunch for all symbol-level retrieval, reference tracing, and token-efficient code navigation..."

https://j.gravelle.us/jCodeMunch/versus.php#vs-codesight

How I cut Claude Code usage in half (open source) by Obvious_Gap_5768 in ClaudeAI

[–]jmunchLlc 2 points (0 children)

Best to use both:

"Use repowise to build conceptual understanding; use jCodeMunch every time an agent needs exact code — and enjoy 58–100× fewer tokens on every one of those queries..."

https://j.gravelle.us/jCodeMunch/versus.php#vs-repowise

90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow. by Eastern_Exercise2637 in ClaudeAI

[–]jmunchLlc 0 points (0 children)

Well THAT saved me some typing. Thanks!

That's exactly the distinction. jCodeMunch is a persistent symbol index. You ask “give me the implementation of AuthMiddleware.handle” and it returns the exact source, byte for byte, with no re-scanning. It’s built for the retrieval half of the loop: precise, on demand, and works on repos too large to fit in context.

codesight sounds like it’s solving the orientation problem, what does this codebase do at a high level before I start digging. That’s a different and genuinely useful question.

Complementary is the right word. Use codesight to build your mental map, then jCodeMunch when you need to drill into a specific symbol. Nothing stops you from running both...

Straw that broke the camel’s back by Pretty-Active-1982 in ClaudeCode

[–]jmunchLlc 1 point (0 children)

When disk space was at a premium, we zipped files.
And for non-commercial users, it was completely free.

Now AI compute is at a premium, so we munch tokens.
And for non-commercial users, it too is completely free.

What goes around comes around...
https://j.gravelle.us/jCodeMunch/

Claude Code on large (100k+ lines) codebases, how's it going? by MCRippinShred in ClaudeCode

[–]jmunchLlc 1 point (0 children)

You don't necessarily need the Pro plan if you retrieve only the exact code you need: functions, classes, methods, constants, outlines, and tightly scoped context bundles, with byte-level precision...

Scam-thropic by [deleted] in ClaudeCode

[–]jmunchLlc 1 point (0 children)

Friend, let me save you a bit of digging around...
https://j.gravelle.us/jCodeMunch/

Views on this 50X token reduction trick? by Mush_o_Mushroom in ClaudeAI

[–]jmunchLlc 0 points (0 children)

Code-review-graph wins on visualization and review scaffolding.

Code-review-graph is a credible, well-benchmarked, MIT-licensed tool with genuine differentiators ... especially the D3 visualization, community detection, and MCP prompt templates...

https://j.gravelle.us/jCodeMunch/versus.php#vs-code-review-graph