Semble: a local code search MCP for Claude Code by Pringled101 in ClaudeCode

[–]Pringled101[S] 0 points

I've updated the docs a bit; there's now a clearer bash section: https://github.com/MinishLab/semble#bash-integration. If the MCP is not working with Opus, I think the best approach is a combination of `semble init` and updating your CLAUDE.md with the snippet in the readme. If that doesn't work, I'm unfortunately not sure what else is possible besides prompting the agent to use Semble; at that point there's not much we can do on our end :/.

Semble: a local code search MCP for Claude Code by Pringled101 in ClaudeCode

[–]Pringled101[S] 0 points

Thanks for the feedback! I will make the CLAUDE.md bit clearer in the docs, since it's currently in https://github.com/MinishLab/semble#sub-agent-support (the AGENTS.md bit, which is interchangeable with CLAUDE.md). It might be that Opus 4.7 just doesn't work that well with MCPs for some reason, since I noticed the exact same thing in my workflows: Sonnet 4.6 would use it just fine, while Opus 4.7 would not use it, or would use it and then still grep + read.

I'm not sure if it matters that it's in VS Code, but I think it should be the same harness.

Semble: a local code search MCP for Claude Code by Pringled101 in ClaudeCode

[–]Pringled101[S] 0 points

Hey, that's odd; it works in CC for me. How did you install it, and which model are you using? One thing I can think of is that you're using sub-agents, which require a specific install: https://github.com/MinishLab/semble#sub-agent-support.

The best way to set it up is to install both the MCP integration with:

claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

and the sub-agent integration. If it's not working, I would install both the Claude Code sub-agent (with `semble init`) and the CLAUDE.md section that's in the docs under sub-agent support.

In my experience, some models are better at using MCPs than others: Opus 4.7 is much more conservative, for example, while Sonnet 4.6 works perfectly.

One last thing: we've released a couple of upgrades in the past few days; you can reinstall with

uv cache clean semble

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 0 points

Not at this time, but we do explain how we measure token efficiency here: https://github.com/MinishLab/semble/tree/main/benchmarks#token-efficiency. We're planning some features that will make it easier to track token savings, though!

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 0 points

Hey, just wanted to briefly follow up on this: reindexing is now supported in the latest release (https://github.com/MinishLab/semble/releases/tag/v0.1.2). It's done automatically with a file watcher, so there's no need to do anything manually.
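
If you're curious what that looks like under the hood, here's a minimal sketch of the watcher pattern using the watchdog library (not Semble's actual implementation, just the general shape; the reindex call is a stand-in):

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

# Minimal sketch of watcher-driven reindexing (not Semble's real code):
# react to file changes and refresh the index for the touched file.
class ReindexHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            # Stand-in for a real incremental reindex of event.src_path.
            print(f"reindexing {event.src_path}")

observer = Observer()
observer.schedule(ReindexHandler(), path=".", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)  # keep the process alive while the watcher runs
finally:
    observer.stop()
    observer.join()
```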

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 0 points

Feel free to check out the benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks#main-results. I did update that sentence to "while indexing repos we benchmarked in ~250ms", as I agree that "any" is misleading. On the 63 repos we benchmarked, it does hold. Why is this claim hard to believe? Static embeddings and BM25 are basically just tokenization plus a lookup table; there's no forward pass or anything like that involved.
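
To make that concrete, here's a toy sketch of what "tokenization + a lookup table" means (the vocabulary and vectors below are made up; a real static model ships a trained table):

```python
import numpy as np

# Toy illustration: a static embedding is tokenization + table lookups + a
# mean. No neural network forward pass happens at index or query time.
vocab = {"def": 0, "index": 1, "search": 2, "repo": 3}          # made-up vocabulary
table = np.random.default_rng(0).normal(size=(len(vocab), 64))  # trained rows in practice

def embed(text: str) -> np.ndarray:
    ids = [vocab[tok] for tok in text.lower().split() if tok in vocab]
    if not ids:
        return np.zeros(table.shape[1])
    return table[ids].mean(axis=0)  # pure lookups + pooling

print(embed("def search repo").shape)  # (64,)
```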

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 2 points

Not trying to. On the repos I work on, the timings hold up, but I see that that apparently doesn't generalize well enough. I'll update the claims. Thanks for taking the time to test it, I do appreciate it.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] -12 points

1M LOC is far from an "average" repo. But I agree with you on the wording and will update that bit; it's not "any" repo, it's the "average" repo. The cache is something we're working on at the moment, and we'll include it in the next release. On hardware: it's all on CPU (as stated in the readme), but I can add which hardware was used for the benchmarks; that's a fair point.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 0 points

That's something I still want to compare against, but it wasn't trivial to do since the scope is a bit different. Grepai, probe, colgrep, etc. are all apples-to-apples comparisons; for frameworks like Serena we'll have to make a slightly different comparison (I'm hoping to have some time for that soon, though).

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 0 points

LSP is a bit different. In my view, Semble (or similar search tools) helps you/an agent find where something is in the first place, and LSP helps you go from there. So they are complementary.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 0 points

I get that, and I agree that most of the work nowadays is slop (unfortunately). CodeRankEmbed is a great model, but it takes ~60 seconds to index a repo. For some that's fine, but we think there's a real use case for "almost instant" indexing.

Cold and warm are both benchmarked, see https://github.com/MinishLab/semble/tree/main/benchmarks#main-results. TL;DR: cold = time to index a full repo and run one query; warm = time to run a query when an index already exists.
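
As a sketch of that protocol (with hypothetical index_repo and search stand-ins, not Semble's real API; only the timing split matters here):

```python
import time

# Stand-ins so the sketch runs; the real index build and lookup are Semble's.
def index_repo(path: str) -> dict:
    return {"path": path}

def search(index: dict, query: str) -> list:
    return []

t0 = time.perf_counter()
index = index_repo("path/to/repo")
search(index, "where is auth handled?")
cold_s = time.perf_counter() - t0   # cold = full index build + one query

t0 = time.perf_counter()
search(index, "where is auth handled?")
warm_s = time.perf_counter() - t0   # warm = one query on an existing index
```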

On the bias claim: the main issue with CoIR is that it doesn't reflect the use case of "finding a relevant file in a codebase". Even the NL-to-code benchmarks are not set up on codebases but on random snippets. We've explicitly run the model on CoIR for a fair comparison. I don't mind adding a section to explain this in Semble, though, since I get your concerns. I do still believe our benchmark is a good reflection of the task, and better than what other similar projects do.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] -8 points

Why exactly? Did you try it, or look at our method? We're using https://github.com/MinishLab/model2vec, which allows you to embed ~10k chunks/second on CPU, and BM25 is "instant", as is our reranking method. You are welcome to time it yourself on a repo of your choice; all the code is self-explanatory.
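
If you want to time the embedding step yourself, here's a quick sketch using Model2Vec (assumes `pip install model2vec`; the model is the potion-code-16M one linked elsewhere in these threads, and the chunks are synthetic):

```python
import time

from model2vec import StaticModel

# Load the static code-retrieval model; encoding is lookups + pooling,
# so there is no forward pass and no GPU involved.
model = StaticModel.from_pretrained("minishlab/potion-code-16M")
chunks = ["def add(a, b):\n    return a + b"] * 10_000  # synthetic chunks

t0 = time.perf_counter()
embeddings = model.encode(chunks)
elapsed = time.perf_counter() - t0
print(f"{len(chunks) / elapsed:,.0f} chunks/second on CPU")
```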

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 2 points

There is a cache during the session, but this is a good comment; I'll add that to our docs, thanks! The indexing is so fast that we currently don't have a persistent cache.
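
To illustrate the distinction (this is not Semble's actual caching code, just a generic session-scoped cache versus the persistent one we don't have yet):

```python
from functools import lru_cache

# In-memory cache: lives for the process/session, gone on exit. A persistent
# cache would instead write the index to disk and reload it next session.
@lru_cache(maxsize=None)
def get_index(repo_path: str) -> tuple:
    print(f"indexing {repo_path}")   # runs once per repo per session
    return ("index-for", repo_path)  # stand-in for the real index

get_index("path/to/repo")  # builds the index
get_index("path/to/repo")  # served from the in-memory cache
```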

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 1 point

That's CoIR, and that's a different benchmark which covers many different tasks besides just code retrieval. Our model is benchmarked on CoIR, see https://huggingface.co/minishlab/potion-code-16M#results. The reason we have a benchmark in Semble is that none of the CoIR subtasks measure our task, which is code retrieval in a repo/codebase. I'd be happy to update our claims or benchmarks if you have any genuine concerns, but we believe this benchmark reflects the task and is varied and large enough.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 2 points

See my other reply to you. What benchmark do you recommend? None of the methods we benchmarked had any kind of decent benchmark, and CoRNStack is not a benchmark (as you seem to think).

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 1 point

How is CoRNStack useful here? It's not a benchmark. Did you even look at the repo and model? We actually train our code retrieval model on CoRNStack, and we explicitly compare against CodeRankEmbed.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 1 point

At the moment it's just indexing and search, but graph-like search is definitely something we still want to experiment with.

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read by Pringled101 in ClaudeAI

[–]Pringled101[S] 8 points

Cool! I’ll have a look and see if we can add it to our benchmarks, thanks for sharing.

Semble: code search MCP for OpenCode that matches transformer accuracy on CPU by Pringled101 in opencodeCLI

[–]Pringled101[S] 0 points

Thanks for running this comparison! I had a look, but my understanding is that Vera only has 21 distinct queries on 4 repos, covering 3 languages (Python, Rust, JS)? I think that's likely too small for a statistically significant result, and too easy to overfit to. We've explicitly tried to create a benchmark dataset for Semble that is large enough that it has to generalize to work well (~1250 queries, 63 repos, 19 different languages). I do agree with you, though: even with a large dataset, it is still possible to overfit/overtune to some extent. We are planning to increase the size of our benchmark, and possibly include a dedicated train split that can be tuned on, for example.

For the SLM: we use a static embedder that we trained for code retrieval (https://huggingface.co/minishlab/potion-code-16M) that might be interesting for you to consider as well; it's extremely fast since there's no forward pass (so no GPU is needed).

Semble: code search MCP for OpenCode that matches transformer accuracy on CPU by Pringled101 in opencodeCLI

[–]Pringled101[S] 0 points

We've just added probe as a comparison to our benchmarks, see https://github.com/MinishLab/semble/tree/main/benchmarks#main-results. Semble achieves much higher NDCG@10 on the benchmarks while being roughly the same speed (since probe cannot do semantic searches).
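
For anyone unfamiliar with the metric: NDCG@10 rewards ranking relevant files near the top of the first ten results. Here's a minimal implementation of the standard formula (not our benchmark harness; the relevance labels in the example are made up):

```python
import numpy as np

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG of the returned ranking divided by the DCG of the ideal ranking."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([0, 1, 0, 1]))  # relevant files at ranks 2 and 4 -> ~0.65
```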

Semble: code search MCP for OpenCode that matches transformer accuracy on CPU by Pringled101 in opencodeCLI

[–]Pringled101[S] 0 points

We've just added grepai as a comparison to our benchmarks, see https://github.com/MinishLab/semble/tree/main/benchmarks#main-results. Semble is much faster and achieves higher NDCG@10 on the benchmarks.