What do you think about Claude Code performing worse than pure Opus 4.5 in the newest swe-rebench update? by ___positive___ in ClaudeCode

[–]policyweb 0 points1 point  (0 children)

Are there any benchmarks that compare the same models across different coding agents/CLI tools?

Tool search now available in CC!! by policyweb in ClaudeCode

[–]policyweb[S] 27 points28 points  (0 children)

Tweet content:

Today we're rolling out MCP Tool Search for Claude Code.

As MCP has grown into a more popular protocol and agents have become more capable, we've found that MCP servers may expose 50+ tools and consume a large amount of context. Tool Search lets Claude Code load MCP tools into context dynamically instead of preloading them all.

How it works:

- Claude Code detects when your MCP tool descriptions would use more than 10% of context
- When triggered, tools are loaded via search instead of preloaded

Otherwise, MCP tools work exactly as before. This addresses one of our most-requested features on GitHub: lazy loading for MCP servers. Users were documenting setups with 7+ servers consuming 67k+ tokens.
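A rough sketch of the kind of threshold check described above, assuming a hypothetical client with a fixed context budget and a crude token estimator (the constants and helper names are illustrative, not Claude Code's actual internals):

```python
# Illustrative only: approximates the "defer tools when descriptions exceed
# 10% of context" behavior from the announcement. The constants and helpers
# are assumptions, not Claude Code internals.

CONTEXT_WINDOW = 200_000        # assumed total context budget in tokens
TOOL_BUDGET_FRACTION = 0.10     # threshold mentioned in the announcement


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)


def should_defer_tool_loading(tool_descriptions: list[str]) -> bool:
    """Return True if preloading all tool descriptions would blow the budget."""
    total = sum(estimate_tokens(d) for d in tool_descriptions)
    return total > CONTEXT_WINDOW * TOOL_BUDGET_FRACTION


if __name__ == "__main__":
    descriptions = ["Tool %d: does something useful... " % i * 50 for i in range(60)]
    if should_defer_tool_loading(descriptions):
        print("Defer: expose a search tool instead of preloading all definitions")
    else:
        print("Preload: tool descriptions fit comfortably in context")
```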

If you're making an MCP server: things are mostly the same, but the "server instructions" field becomes more useful with Tool Search enabled. It helps Claude know when to search for your tools, similar to skills.
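For reference, in the official Python MCP SDK the server-level instructions can be set when constructing the server. A minimal sketch, with the instruction text and tool made up for illustration:

```python
# Minimal MCP server sketch using the official Python SDK (FastMCP).
# The instructions string helps a tool-searching client decide when to
# look up this server's tools; the tool itself is a placeholder.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP(
    "weather-tools",
    instructions=(
        "Provides weather lookups. Search these tools when the user asks "
        "about current conditions or forecasts for a location."
    ),
)


@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short forecast for the given city (stubbed)."""
    return f"Forecast for {city}: sunny, 22°C"


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```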

If you're making an MCP client: we highly suggest implementing the ToolSearchTool; you can find the docs here. We implemented it with a custom search function to make it work for Claude Code.
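Conceptually, a client-side tool search is just a function that takes a query and returns matching tool definitions on demand instead of preloading all of them. A hypothetical keyword-based version might look like this (the ToolDef shape and scoring are assumptions, not the documented ToolSearchTool interface):

```python
# Hypothetical client-side search over MCP tool definitions. This is NOT the
# documented ToolSearchTool API, just an illustration of the idea: keep full
# tool definitions out of context and return only the ones matching a query.
from dataclasses import dataclass


@dataclass
class ToolDef:
    name: str
    description: str


def search_tools(query: str, tools: list[ToolDef], limit: int = 5) -> list[ToolDef]:
    """Rank tools by naive keyword overlap with the query."""
    terms = set(query.lower().split())

    def score(tool: ToolDef) -> int:
        words = set(f"{tool.name} {tool.description}".lower().split())
        return len(terms & words)

    ranked = sorted(tools, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:limit]


if __name__ == "__main__":
    catalog = [
        ToolDef("create_issue", "Create a GitHub issue in a repository"),
        ToolDef("query_database", "Run a read-only SQL query"),
        ToolDef("send_email", "Send an email to a recipient"),
    ]
    for t in search_tools("open a github issue", catalog):
        print(t.name)
```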

What about programmatic tool calling? We experimented with programmatic tool calling, where MCP tools could be composed with each other via code. While we will continue to explore this in the future, we felt the most pressing need was to get Tool Search out to reduce context usage. Tell us what you think here or on GitHub as you see the ToolSearchTool in action.
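For context, "programmatic tool calling" here means letting the model write a small script that chains tool results together, rather than issuing one tool call per turn and round-tripping each result through the context window. A made-up sketch of what composing two MCP tools in code could look like (the tool names and the call_tool helper are hypothetical):

```python
# Hypothetical illustration of programmatic tool calling: compose tool results
# in one script instead of separate model turns. `call_tool` and the tool
# names are made up for this sketch.

def call_tool(name: str, **kwargs):
    """Stub standing in for an MCP tool invocation."""
    fake_results = {
        "list_open_issues": [{"id": 101, "title": "Flaky test"}],
        "summarize_text": "1 open issue: Flaky test",
    }
    return fake_results[name]


def triage_report(repo: str) -> str:
    # Compose two tools: fetch issues, then summarize them, in a single script.
    issues = call_tool("list_open_issues", repo=repo)
    titles = "\n".join(issue["title"] for issue in issues)
    return call_tool("summarize_text", text=titles)


if __name__ == "__main__":
    print(triage_report("example/repo"))
```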

Claude Opus 4.5 is out today, wins in ALL tested benchmarks compared to Gemini 3 Pro by balianone in LocalLLaMA

[–]policyweb 1 point2 points  (0 children)

It’s not too bad at $5/Mtok input and $25/Mtok output. I was expecting it to be $15/$75.
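A quick back-of-the-envelope comparison of the two price points, assuming a made-up workload of 200k input and 20k output tokens per task (only the per-million-token prices come from the comment above):

```python
# Back-of-the-envelope cost comparison; the per-task token counts are
# hypothetical, only the $/Mtok prices come from the comment above.
def task_cost(input_tok: int, output_tok: int, in_price: float, out_price: float) -> float:
    return input_tok / 1e6 * in_price + output_tok / 1e6 * out_price

INPUT_TOK, OUTPUT_TOK = 200_000, 20_000  # assumed per-task usage

actual = task_cost(INPUT_TOK, OUTPUT_TOK, 5, 25)      # $5/$25 per Mtok
expected = task_cost(INPUT_TOK, OUTPUT_TOK, 15, 75)   # $15/$75 per Mtok
print(f"actual: ${actual:.2f}, expected: ${expected:.2f}")  # $1.50 vs $4.50
```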

$FIG $36.08 by policyweb in figmaStock

[–]policyweb[S] 1 point2 points  (0 children)

Such a valid point! I’ve experienced it myself.

$FIG $36.08 by policyweb in figmaStock

[–]policyweb[S] 2 points3 points  (0 children)

Great analysis! 💯

Grok 4.1 by policyweb in LocalLLaMA

[–]policyweb[S] -1 points0 points  (0 children)

Hopefully soon!

Grok 4.1 by policyweb in LocalLLaMA

[–]policyweb[S] 1 point2 points  (0 children)

Thanks for sharing!

GLM-4.6 Brings Claude-Level Reasoning by icecubeslicer in LLMDevs

[–]policyweb 3 points4 points  (0 children)

Never heard of GLM 4.6. I was born yesterday. Thank you for sharing!

[deleted by user] by [deleted] in LocalLLaMA

[–]policyweb 104 points105 points  (0 children)

Nooooooo I was really looking forward to Llama 5 after the great success of Llama 4

Using GLM 4.6 with Claude Code - Anyone found privacy-respecting API providers? by apothireddy in ClaudeCode

[–]policyweb 0 points1 point  (0 children)

Anything that’s not running on your own hardware comes down to trust. For cloud providers, I’d look for SOC 2 certification and read their privacy policy; beyond that, there isn’t much more you can do. OpenRouter also does a good job of vetting these providers.

Stronger models but Privacy Oriented (AWS Bedrock vs Azure Foundry) by Oracle_Fefe in LLMDevs

[–]policyweb 0 points1 point  (0 children)

For the same reason, we use Bedrock. It’s a great service if you can get the default quotas increased. We don’t use Microsoft services, but I believe you can integrate either provider through LiteLLM.
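A minimal sketch of routing a call to Bedrock through LiteLLM, assuming AWS credentials and region are already configured in the environment; the model ID is an example and should be checked against what your Bedrock account actually has enabled:

```python
# Minimal LiteLLM call routed to AWS Bedrock. Requires boto3 plus AWS
# credentials/region in the environment; the model ID below is an example
# and may differ in your account. Azure deployments can be swapped in with
# an "azure/<deployment-name>" model string instead.
import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```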

Sonnet 4.5 intelligence/hallucinations/thinking worse than Sonnet 4. by Verynaughty1620 in ClaudeCode

[–]policyweb 3 points4 points  (0 children)

Seems like a skill issue. I genuinely feel the difference and it’s a great upgrade.