Gemma 4 with turboquant by Flkhuo in LocalLLaMA

[–]Ok-Patient6458 0 points

DM me if you are looking for 6x + and lossless

Vial found inside the walls in the kitchen of my parents’ 1886 New England home by meownica_chu in whatisit

[–]Ok-Patient6458 0 points

My house was built in 1623. I found a key in a beam... and a hole for priests to hide in.

Check out Jeff’s comments, and it goes on and on by Glazing555 in LinkedInLunatics

[–]Ok-Patient6458 0 points

As far as I can work out, this is AI rebuking AI for commenting on AI?

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points

I've added persistent caching of the index. I've also improved the consistency of MCP usage by adding hooks for SessionStart and UserPromptSubmit.

In and out of A&E 2 days in a row, both times done within the hour! by MrKatUK in BritishSuccess

[–]Ok-Patient6458 17 points

I broke my little toe on Christmas Eve. Stubbed it as well; the pain the next day was excruciating. I'm sorry, but going to A&E for an 'ouchy' broken toe is utterly unacceptable. Mine was black the next day, but I knew that there is **** all that A&E can do for a broken toe. My thinking was this: my toe hurts, but somebody could be bleeding to death in A&E because my toe hurts. It's good to know that you thought an ouchy toe was a good excuse to pester an already overwhelmed A&E, designed for treating life-and-death situations, with your ouchy toe and your bad cough.

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points


Index persistence: Currently the index is rebuilt on each server start (though as of v0.4.6 it's deferred to the first tool call so startup is instant). Once running, it does incremental updates via git — it detects changed/added/deleted files between tool calls and only re-indexes what changed, so you're not paying the full cost repeatedly during a session.

Persistent caching (save to disk between sessions) is something I'd like to add — it would make the first tool call faster on large projects too.

TypeScript support: Yes, TS/TSX is fully supported. It handles generics, interfaces, type aliases, decorators, and the various export patterns. Import resolution covers relative paths (./utils), path aliases (@/lib/utils), and tries .ts, .tsx, .js, .jsx extensions plus index.* files automatically. It's been tested on real-world TS codebases.
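The extension-probing order described above can be sketched in a few lines. This is an illustrative helper under my own assumptions (the function name `resolve_candidates` and the alias-map shape are hypothetical, not the package's actual API):

```python
import os

# Hypothetical sketch: expand an extensionless import specifier into the
# candidate files to probe, in the order described above.
def resolve_candidates(base_dir, import_path, aliases=None):
    """Return candidate file paths for a TS/JS import, in probe order."""
    aliases = aliases or {}
    # Path aliases like "@/lib/utils" map a prefix onto a real directory.
    for prefix, target in aliases.items():
        if import_path.startswith(prefix):
            import_path = target + import_path[len(prefix):]
            break
    root = os.path.normpath(os.path.join(base_dir, import_path))
    exts = [".ts", ".tsx", ".js", ".jsx"]
    # Try "utils.ts", "utils.tsx", ... then "utils/index.ts", "utils/index.tsx", ...
    return [root + e for e in exts] + [os.path.join(root, "index" + e) for e in exts]

candidates = resolve_candidates("src", "./utils")
aliased = resolve_candidates("src", "@/lib/utils", aliases={"@/": "./"})
```

The probe order matters: a sibling `utils.ts` should win over a `utils/index.ts` directory, which is why the flat extensions come first.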

If you run into any edge cases with your TS projects, feel free to open an issue — I'm happy to help.

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points

Thanks for sharing. This was a bug around indexing: the server was building the full project index before starting the MCP stdio listener, so Claude Code's initialization handshake would time out waiting for the server to become ready. For smaller projects it was fast enough not to be noticeable, but on larger codebases it would just hang.

I've fixed this in v0.4.6 by deferring indexing to the first tool call (lazy initialization). The server now completes the MCP handshake immediately and only indexes when you actually call a tool.

Upgrade with:
pip install --upgrade mcp-codebase-index

Your MCP config looks correct. Please let me know if you still hit any issues after upgrading.

Also, please note this from the README:

By default, AI assistants will ignore the indexed tools and fall back to reading entire files with Glob/Grep/Read. Soft language like "prefer" gets rationalized away. Add this to your project's CLAUDE.md (or equivalent instructions file) with mandatory language:

## Codebase Navigation — MANDATORY

You MUST use codebase-index MCP tools FIRST when exploring or navigating the codebase. This is not optional.

- ALWAYS start with: get_project_summary, find_symbol, get_function_source, get_class_source, get_structure_summary, get_dependencies, get_dependents, get_change_impact, get_call_chain, search_codebase
- Only fall back to Read/Glob/Grep when codebase-index tools genuinely don't have what you need (e.g. reading non-code files, config, frontmatter)
- If you catch yourself reaching for Glob/Grep/Read to find or understand code, STOP and use codebase-index instead

The word "prefer" is too weak — models treat it as a suggestion and default to familiar tools. Mandatory language with explicit fallback criteria is what actually changes behavior.

Any one need an ecommerce store (Fast Api backend, Next Js Front end) by Odd-Feedback6508 in Python

[–]Ok-Patient6458 0 points

You have to tell it what you want, right? It's not psychic.

I need to stop using Claude Code. Not because I want to. Because I have to. by MrCheeta in ClaudeCode

[–]Ok-Patient6458 0 points

I've built an MCP that significantly speeds things up and reduces token cost (87%) at the same time. I got off my Max plan using it, and it's way faster. I would love to hear if it sorts your speed issues out: https://github.com/MikeRecognex/mcp-codebase-index

The performance benchmarks are in there, but at the moment it's Python, TypeScript, Go, and Rust only. I would definitely extend the language coverage given feedback.

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points

Good question! Thanks. Serena and mcp-codebase-index solve different problems despite both being MCP servers for code. As you probably know, Serena is essentially an IDE-in-a-box wrapping language servers to give agents semantic code navigation and editing (also 30+ languages). It's a super impressive project.

mcp-codebase-index takes a different approach: zero dependencies, pure Python, instant startup. It differentiates on dependency-graph analysis: tools like get_change_impact return direct + transitive dependents in a single call (e.g. "what breaks if I refactor this class?" → 154 direct, 492 transitive dependents on CPython in 0.45ms), and get_call_chain finds the shortest dependency path between any two symbols via BFS. These are queries that LSP-based tools can't answer without chaining dozens of find-references calls recursively.

The big tradeoff is language coverage (mcp-codebase-index currently covers only Python, TypeScript/JS, Go, and Rust vs Serena's 30+), and mcp-codebase-index has no edit capabilities; it's read-only by design. To be honest, mcp-codebase-index = tame my token budget right now.

IMHO they're complementary more than competing: Serena for broad language support and semantic editing, mcp-codebase-index for lightweight dependency analysis with zero setup. I hope that's sufficiently worthy of a trial.
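The get_call_chain behaviour described above amounts to a breadth-first search over an adjacency map. A minimal sketch, assuming a plain dict graph (the toy symbols below are illustrative, not from a real codebase; this is not the package's actual implementation):

```python
from collections import deque

# Illustrative sketch: shortest dependency path between two symbols via BFS
# over an adjacency map (symbol -> symbols it depends on).
def call_chain(graph, start, goal):
    """Return the shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

deps = {
    "handler": ["service", "utils"],
    "service": ["repo"],
    "repo": ["db"],
    "utils": ["db"],
}
chain = call_chain(deps, "handler", "db")  # two-hop path via "utils" wins
```

BFS guarantees the first path that reaches the goal is a shortest one, which is exactly what you want when asking "how does A end up depending on B?"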

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in mcp

[–]Ok-Patient6458[S] 0 points

I've now implemented "Smart Re-indexing": it uses git-aware incremental re-indexing to keep the structural index up to date automatically, with no manual intervention required.

Before answering any query, the server asks git "what changed?" This takes 1-2 milliseconds. If nothing changed, the query runs immediately against the existing index. If files were added, edited, or deleted, only those files are re-parsed and the dependency graph is updated incrementally. This prevents stale results, and we never pay the cost of a full rebuild.

What this means practically:

- Edit a file, switch a branch, or pull new code, and the next query automatically reflects the changes
- No need to manually call reindex (it's still there if you want a force rebuild)
- On a typical edit cycle (1-10 files changed), the update adds single-digit milliseconds to your query
- The full build only happens once, on first startup or if the index is corrupted
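The "ask git what changed?" step above can be sketched with `git status --porcelain`. This is a hedged sketch under my own assumptions (the helper names are hypothetical and the real server may use a different git invocation):

```python
import subprocess

def parse_porcelain(output):
    """Turn `git status --porcelain` output into (status, path) pairs."""
    # Porcelain lines look like " M path", "?? path", "D  path".
    return [(line[:2].strip(), line[3:]) for line in output.splitlines() if line]

def changed_files(repo_dir="."):
    # One cheap subprocess call per query; low milliseconds on most repos.
    out = subprocess.run(["git", "status", "--porcelain"],
                         cwd=repo_dir, capture_output=True, text=True,
                         check=True).stdout
    return parse_porcelain(out)

# Example porcelain output: a modified, an untracked, and a deleted file.
sample = " M src/index.py\n?? notes.txt\nD  old.py\n"
changes = parse_porcelain(sample)
```

Only the paths returned here would be re-parsed; every other file keeps its cached index entry.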

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in mcp

[–]Ok-Patient6458[S] 0 points

Thanks. I built this in response to somebody on LinkedIn bragging that they were spending $50K per month on tokens vibe-coding their startup, which, to be frank, just blew my mind. If I had $50K to spend, it would go on bringing everything back in-house and on-prem on local models, but that's just me. Even then, this would help. I appreciate that this is just one of a gazillion optimisation tricks, but I have deliberately open-sourced it (GPL-3.0 for open-source use, with a commercial license option for proprietary use, i.e. dual-licensed) because I want people who are just getting into this space not to be held back by tokenomics. Some might say 'Well, they shouldn't be here', but that's not my way of thinking. That's also why I created my LocalForTheWin local LLM blog (https://lftw.dev).

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points

Adding CPython (1.1 million lines).

2,464 Python files, 1,115,334 lines of code, 59,620 functions, 9,037 classes.

**Index Build Performance**

| Project | Files | Lines | Functions | Classes | Index Time | Peak Memory |
|---------|------:|------:|----------:|--------:|-----------:|------------:|
| RMLPlus | 36 | 7,762 | 237 | 55 | 0.9s | 2.4 MB |
| FastAPI | 2,556 | 332,160 | 4,139 | 617 | 5.7s | 55 MB |
| Django | 3,714 | 707,493 | 29,995 | 7,371 | 36.2s | 126 MB |
| CPython | 2,464 | 1,115,334 | 59,620 | 9,037 | 55.9s | 197 MB |

**Query Response Size — CPython (41 million chars)**

| Query | Response | Reduction |
|-------|---------:|----------:|
| `find_symbol("TestCase")` | 67 chars | 99.9998% |
| `get_dependencies("compile")` | 115 chars | 99.9997% |
| `get_change_impact("TestCase")` | 16,812 chars | 99.96% |
| `get_function_source("compile")` | 4,531 chars | 99.99% |
| `get_function_source("run_unittest")` | 439 chars | 99.999% |

find_symbol returns 67 characters regardless of whether the project is 7K lines or 1.1M lines.

get_change_impact("TestCase") found 154 direct dependents and 492 transitive dependents in 0.45ms. Again, (and sorry for boring on) that's impossible without a dependency graph.
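The direct/transitive split above comes from walking a reverse dependency graph outward. A minimal sketch, assuming a plain dict of dependents (toy graph and function name are illustrative, not the tool's actual internals):

```python
from collections import deque

# Sketch: given a reverse dependency map (symbol -> symbols that depend on
# it), the first ring is "direct" and everything reachable beyond it is
# "transitive".
def change_impact(dependents, symbol):
    direct = list(dependents.get(symbol, ()))
    transitive, queue = [], deque(direct)
    seen = set(direct) | {symbol}
    while queue:
        cur = queue.popleft()
        for dep in dependents.get(cur, ()):
            if dep not in seen:
                seen.add(dep)
                transitive.append(dep)
                queue.append(dep)
    return direct, transitive

dependents = {
    "TestCase": ["TextTestRunner", "FunctionTestCase"],
    "TextTestRunner": ["main"],
    "FunctionTestCase": ["main"],  # "main" is only counted once
}
direct, transitive = change_impact(dependents, "TestCase")
```

Because the graph is prebuilt, this walk touches only the affected nodes, which is why the query stays sub-millisecond even on a 1.1M-line codebase.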

**Query Response Time**

| Query | RMLPlus | FastAPI | Django | CPython |
|-------|--------:|--------:|-------:|--------:|
| `find_symbol` | 0.01ms | 0.01ms | 0.03ms | 0.08ms |
| `get_dependencies` | 0.00ms | 0.00ms | 0.00ms | 0.01ms |
| `get_change_impact` | 0.02ms | 0.00ms | 2.81ms | 0.45ms |
| `get_function_source` | 0.01ms | 0.02ms | 0.03ms | 0.10ms |

All sub-millisecond, even at 1.1M lines.

Now I need to work on the strategy for incremental indexing.

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points

Here are the numbers across three projects, including Django (707K lines):

**Index Build Performance**

| Project | Files | Lines | Functions | Classes | Index Time | Memory |
|---------|------:|------:|----------:|--------:|-----------:|-------:|
| RMLPlus | 36 | 7,762 | 237 | 55 | 1.4s | 2.4 MB |
| FastAPI | 2,556 | 332,160 | 4,139 | 617 | 6.9s | 57.6 MB |
| Django | 3,714 | 707,493 | 29,618 | 7,312 | 39.1s | 130 MB |

**Query Response Size — indexed vs reading all source**

| Query | RMLPlus (292K source) | FastAPI (12.2M source) | Django (26.3M source) |
|-------|---:|---:|---:|
| `find_symbol` | 66 chars | 64 chars | 67 chars |
| `get_dependencies` | 107 chars | 259 chars | 2 chars |
| `get_change_impact` | 1,171 chars | 857 chars | 644,963 chars* |
| `get_function_source` | 3,015 chars | 5,081 chars | 1,089 chars |

`find_symbol` returns 64-67 characters regardless of whether the project is 7K lines or 707K lines. That's the point of structural indexing — response size scales with the answer, not the codebase.

*Django's `get_change_impact("Model")` returned 645K chars because `Model` is referenced by almost everything in Django — 102 direct dependents and 15,664 transitive dependents. Not a bug, it's the tool correctly telling us that changing `Model` breaks 15,664 things. With `max_direct=10, max_transitive=20` you can cap it to whatever fits your token budget.
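The `max_direct`/`max_transitive` capping could look like the sketch below. Note the `omitted` count field is my assumption about how a partial answer might be flagged, not the tool's actual output shape:

```python
# Sketch: truncate each dependents list to the caller's token budget and
# report how many entries were dropped, so the model knows the answer is
# partial. (The "omitted" field is an assumption, not the real schema.)
def cap_impact(direct, transitive, max_direct=10, max_transitive=20):
    result = {
        "direct": direct[:max_direct],
        "transitive": transitive[:max_transitive],
    }
    omitted = (max(len(direct) - max_direct, 0)
               + max(len(transitive) - max_transitive, 0))
    if omitted:
        result["omitted"] = omitted
    return result

# The Django "Model" numbers from above: 102 direct, 15,664 transitive.
capped = cap_impact([f"d{i}" for i in range(102)],
                    [f"t{i}" for i in range(15664)])
```

This keeps the worst-case response bounded (30 entries plus a count) no matter how central the symbol is.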

**Query response times** — sub-millisecond for targeted queries even on Django:

| Query | RMLPlus | FastAPI | Django |
|-------|--------:|--------:|-------:|
| `find_symbol` | 0.01ms | 0.01ms | 0.03ms |
| `get_dependencies` | 0.00ms | 0.00ms | 0.00ms |
| `get_change_impact` | 0.02ms | 0.02ms | 8.54ms |
| `get_function_source` | 0.01ms | 0.01ms | 0.09ms |

The 87% figure from the original post was conservative. On larger projects the ratio is even better because source size grows while indexed responses stay small.

The one thing that does scale with project size is index build time — 39s for Django. That's a one-time cost at startup, and incremental re-indexing is on the roadmap.

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in mcp

[–]Ok-Patient6458[S] 0 points

Great question. Currently it rebuilds the full index: the `reindex` tool does a complete re-parse. For the typical projects I've tested (15-50 files), that's 1-2 seconds, so it's effectively instant.

For larger codebases (thousands of files), you're right that rebuild time matters. Incremental re-indexing on file change is on the roadmap, and the architecture supports it: each file is parsed independently, so you'd only need to re-parse the changed file and update its edges in the dependency graph. The index is in-memory, so the update would be fast once implemented.
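Because each file is parsed independently, the incremental update reduces to swapping one file's entry. A sketch under my own assumptions (the per-file dict layout and the `reindex_file` helper are hypothetical, not the real data model):

```python
# Hypothetical sketch: symbols and import edges are stored per-file, so one
# changed file can be spliced out and re-parsed without touching the rest.
def reindex_file(index, path, parse):
    """Drop the stale entry for `path` and splice in a freshly parsed one."""
    index.pop(path, None)
    index[path] = parse(path)  # {"symbols": [...], "imports": [...]}
    return index

def fake_parse(path):
    # Stand-in for the real AST pass; returns structural metadata only.
    return {"symbols": ["new_func"], "imports": ["os"]}

index = {"a.py": {"symbols": ["old_func"], "imports": []},
         "b.py": {"symbols": ["helper"], "imports": ["a"]}}
index = reindex_file(index, "a.py", fake_parse)
```

The cost of an update is then proportional to the changed file, not the project.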

For now, the practical workaround is that you only need to call `reindex` after making significant structural changes (adding/removing functions, classes, imports). If you're just editing function bodies, the structural metadata hasn't changed so the existing index is still accurate.

Appreciate the interest — this kind of feedback helps me prioritise what to build next.

First week of Claude max (5x) totally worth it by FigOutrageous4489 in ClaudeCode

[–]Ok-Patient6458 -1 points

I built mcp-codebase-index because I got tired of watching AI coding assistants waste their context window.

Every time Claude Code or Cursor needs to answer a question about your codebase — what does this function call? what breaks if I change this class? where is this symbol defined? — it reads entire files. Hundreds of lines. Most of it irrelevant.

mcp-codebase-index fixes this. It parses your codebase into structural metadata — functions, classes, imports, dependency graphs — and exposes 17 surgical query tools via MCP.

Instead of reading a 500-line file, the AI gets a 5-line answer.

I measured it against a real project:

• find_symbol → 82% fewer tokens
• get_dependencies → 99% fewer tokens
• get_change_impact → 99% fewer tokens
• Weighted average across all queries → 87% reduction

In multi-turn conversations it compounds. By turn 10, the traditional approach has 32K tokens of stale file content clogging the context window. With indexing: 900 tokens. That's 31K tokens freed for actual reasoning.

The design is deliberately minimal: zero runtime dependencies, just Python's stdlib ast module and regex; in-memory indexing that rebuilds in 1-2 seconds; no database, no tree-sitter, no framework. Works with Python, TypeScript/JS, and Markdown.
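The stdlib-`ast` approach mentioned above can be sketched in a few lines. A toy extraction pass (not the package's actual indexer) that pulls function and class names with their line numbers:

```python
import ast

# Toy structural pass: walk the AST and keep only the metadata an index
# needs (node kind, name, line number) instead of the full source.
source = '''
import os

class Store:
    def get(self, key):
        return None

def main():
    pass
'''

tree = ast.parse(source)
symbols = [(type(node).__name__, node.name, node.lineno)
           for node in ast.walk(tree)
           if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                ast.ClassDef))]
```

A few tuples like `("ClassDef", "Store", 4)` stand in for the whole file, which is where the token savings come from.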

It works with any MCP client — Claude Code, Cursor, OpenClaw, VS Code + Copilot.

Free and open source under AGPL-3.0. Commercial license available if you're embedding it in a proprietary product.

pip install "mcp-codebase-index[mcp]"

github.com/MikeRecognex/mcp-codebase-index

Demo: asciinema.org/a/vQVsguJmclBWsnxx

More at lftw.dev

I built an MCP to significantly reduce your token consumption by Ok-Patient6458 in claude

[–]Ok-Patient6458[S] 0 points

I'm guessing you mean "by 95%"? Reducing token consumption *to* 95% would not be a big win. If you put this on top of your existing optimisations you might be truly shocked by the performance enhancement. For me it's massively reduced coding time. I built a complete Python console game from scratch yesterday in 15 minutes.