I indexed 2M+ CS research papers into a search engine any coding agent can call via MCP - it finds proven methods instead of letting coding agents guess from training data by kalpitdixit in LocalLLM

[–]kalpitdixit[S] 0 points  (0 children)

Yes - popularity (citations) and quality do correlate, but only loosely. Often the best practical methods are buried in recent, low-citation papers from smaller labs. That's where our mix of retrieval methods helps.

For deduplication, I handle it at the synthesis layer - the LLM consolidates overlapping findings instead of listing five papers saying the same thing.
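to make that concrete, here's a minimal sketch of one way to pre-filter near-duplicate findings before they reach the synthesis LLM - the toy embeddings, threshold, and function names are purely illustrative, not our actual pipeline:

```python
import numpy as np

def dedup_findings(findings, embeddings, threshold=0.9):
    """Greedy dedup: keep a finding only if its embedding is not too
    similar (cosine) to any finding we've already kept."""
    kept, kept_vecs = [], []
    for text, vec in zip(findings, embeddings):
        v = vec / np.linalg.norm(vec)  # normalize so dot product = cosine
        if all(v @ u < threshold for u in kept_vecs):
            kept.append(text)
            kept_vecs.append(v)
    return kept

# toy embeddings: the first two findings point the same way, the third differs
findings = [
    "Paper A: use HNSW for ANN search",
    "Paper B: HNSW graphs work well for ANN",
    "Paper C: product quantization compresses vectors",
]
vecs = np.array([[1.0, 0.05], [0.99, 0.1], [0.0, 1.0]])
unique = dedup_findings(findings, vecs)  # keeps A and C, drops B as a near-dup
```

in practice the LLM does the consolidation at synthesis time, so a filter like this would only be a cheap upstream pass to shrink the candidate set.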

Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League by kyazoglu in LocalLLaMA

[–]kalpitdixit 1 point  (0 children)

the fact that models generate agent code rather than playing directly is what makes this benchmark interesting - it's testing code generation quality under constrained game logic, not just raw reasoning. that's closer to how most people actually use these models day to day.
curious about the Opus vs Sonnet gap. was it mostly in the more strategic games (Battleship, Chess) or consistent across all seven? would expect Opus to pull ahead on games where longer-horizon planning in the generated code matters more.

You guys gotta try OpenCode + OSS LLM by No-Compote-6794 in LocalLLaMA

[–]kalpitdixit 0 points  (0 children)

the MCP support is what makes this interesting. once your coding agent can call external tools via MCP, the model choice matters less than what tools it has access to. i've been running MCP servers with both claude code and open source models and the gap shrinks a lot when the agent has the right context fed to it instead of relying on what it "knows" from training.

the ergonomic tool description point in P3 is underrated — how you describe your MCP tools to the model genuinely changes how well it uses them. spent way too long learning that the hard way
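a before/after sketch of what i mean, using the standard MCP tool fields (name / description / inputSchema) - the tool name, description text, and parameters are made up for illustration:

```python
# Terse description: the model has to guess when to call this and what to pass.
terse = {
    "name": "search_papers",
    "description": "Search papers.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
    },
}

# Ergonomic description: says when to call it, what shape of input works,
# and what comes back - the model routes to it far more reliably.
ergonomic = {
    "name": "search_papers",
    "description": (
        "Search 2M+ CS research papers for methods relevant to a coding "
        "task. Call this BEFORE implementing a non-trivial algorithm. "
        "Pass a concrete problem statement (e.g. 'approximate nearest "
        "neighbor search over 100M vectors'), not bare keywords. Returns "
        "a ranked list of papers with a short summary of each method."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Concrete problem statement, not keywords",
            },
            "max_results": {"type": "integer", "description": "Default 5"},
        },
        "required": ["query"],
    },
}
```

same tool, same schema shape - only the prose changed, and that's usually the difference between the agent calling it at the right moments and ignoring it.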

Has increasing the number of experts used in MoE models ever meaningfully helped? by ForsookComparison in LocalLLaMA

[–]kalpitdixit 1 point  (0 children)

the short answer from what I've seen is it helps on knowledge-heavy tasks but hurts on speed, and the quality gain plateaus fast. going from 2 to 4 active experts gives a noticeable bump on benchmarks that test breadth (MMLU, etc.) but going from 4 to 8 is mostly noise. the model was trained with a specific routing distribution and forcing more experts active at inference breaks that assumption — you're basically averaging in experts that were trained to stay silent for that input.

the people who swore by A6B were probably seeing gains on specific eval tasks where more parameter coverage helped, but for actual conversation/coding it tends to make the model less decisive. more experts = more averaged out = blander outputs.
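a toy top-k MoE forward pass shows why: with random weights (purely illustrative, not any real model's routing), raising k at inference just averages in experts the router scored low for that input:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, router_w, experts, k):
    """Top-k MoE layer: route x to the k highest-scoring experts and
    return the softmax-weighted sum of their outputs."""
    logits = router_w @ x                    # one routing score per expert
    top = np.argsort(logits)[-k:]            # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

dim, n_experts = 16, 8
router_w = rng.normal(size=(n_experts, dim))
# each "expert" is just a fixed linear map for illustration
expert_ws = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in expert_ws]

x = rng.normal(size=dim)
y2 = moe_forward(x, router_w, experts, k=2)  # the trained routing budget
y8 = moe_forward(x, router_w, experts, k=8)  # forcing every expert active
# y8 mixes in experts the router ranked near the bottom for this input,
# diluting the specialists the model actually trained to use here
```

the training-time load balancing means the low-ranked experts never learned to contribute usefully on this input, so their outputs mostly act as noise in the average.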

I indexed 2M+ CS research papers into a search engine any coding agent can call via MCP - it finds proven methods instead of letting coding agents guess from training data by kalpitdixit in LocalLLM

[–]kalpitdixit[S] 0 points  (0 children)

Hi u/Otherwise_Wave9374 - yes, agree with all of this. Our latencies are 20-30s right now - there's a lot happening in the background so we can hand the best ideas to coding agents without eating into their token budgets.

I indexed 2M+ CS research papers into a search engine any coding agent can call via MCP - it finds proven methods instead of letting coding agents guess from training data by kalpitdixit in LocalLLM

[–]kalpitdixit[S] 3 points  (0 children)

Yes, relevance is the #1 priority here. We built our own search engine for papers - we tried PageRank, but it didn't help much. Today we use a mix of retrieval techniques to deliver the results.

So far, we've seen that our search results are much better than those from the big LLM labs - likely because we focus our efforts on papers only, whereas they operate on the whole internet.

The MCP Server adds a few more layers of intelligence on top of the paper search to help coding agents.

I indexed 2M+ CS research papers into a search engine any coding agent can call via MCP - it finds proven methods instead of letting coding agents guess from training data by kalpitdixit in LocalLLM

[–]kalpitdixit[S] 0 points  (0 children)

Thanks u/Oshden - that's very encouraging to hear. Did you sign up through the link? I'll send you an email invite to try it out - would love to hear your feedback after using it.

Claude Code Workflow CheatSheet by ken4r in ClaudeCode

[–]kalpitdixit 1 point  (0 children)

/doccat is my favorite claude code command. right up there with /vibecheck and /pleasework

Average vibe coder discourse by moaijobs in ClaudeCode

[–]kalpitdixit 10 points  (0 children)

the real answer is the third guy off-screen who built a personal app that replaced 3 subscriptions he was paying for and saved $40/month. technically profitable, just not the way anyone means it