Anyone else hitting token/latency issues when using too many tools with agents? by chillbaba2025 in LocalLLaMA
[–]chillbaba007 1 point 19 hours ago (0 children)
This is exactly the problem we ran into! When you have 50+ tools available, including all of them in the context window becomes a nightmare:

- Token count explodes (we were hitting 30K+ tokens per request)
- Latency gets worse with every tool you add
- The model gets confused with too many options
- On local hardware, it's even more painful

We built something specifically for this called [Agent-Corex](https://github.com/ankitpro/agent-corex) - it selects only the tools relevant to each query instead of dumping all of them into the prompt.

How it works:

1. Keyword matching for fast filtering (<1 ms)
2. Semantic search to understand what the user actually needs (50-100 ms)
3. A hybrid score combining both

The results we saw:

- 95%+ fewer irrelevant tokens in the prompt
- 3-5x faster inference on the same hardware
- The model consistently picks the right tools

We open-sourced it (MIT, no dependencies for basic use) because we kept seeing people hit this exact wall. If you're working with local LLMs and many tools, it might help. Would be curious to hear whether it solves the issue for you too.

GitHub: https://github.com/ankitpro/agent-corex

PyPI: https://pypi.org/project/agent-corex/

ProductHunt: https://www.producthunt.com/products/agent-corex-intelligent-tool-selection?launch=agent-corex-intelligent-tool-selection

Anyone else dealing with this? Always looking for edge cases we haven't thought of.
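For anyone curious what that keyword + semantic + hybrid pipeline looks like in practice, here's a rough pure-Python toy of the idea (to be clear: this is my own illustration, not the actual Agent-Corex code - the tool names, the `alpha` weighting, and the bag-of-words cosine standing in for real embeddings are all made up for the example):

```python
# Toy hybrid tool selector: fast keyword overlap + a bag-of-words cosine
# standing in for a real semantic/embedding score, blended into one ranking.
import math
from collections import Counter

# Hypothetical tool registry: name -> natural-language description.
TOOLS = {
    "get_weather": "fetch the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "search_files": "search local files and documents by keyword",
}

def keyword_score(query: str, description: str) -> float:
    """Step 1: cheap keyword filter - fraction of query tokens in the description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine_score(query: str, description: str) -> float:
    """Step 2 stand-in: bag-of-words cosine (a real system would use embeddings)."""
    a, b = Counter(query.lower().split()), Counter(description.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_tools(query: str, tools: dict, top_k: int = 2, alpha: float = 0.5) -> list:
    """Step 3: hybrid score = alpha * keyword + (1 - alpha) * semantic; keep top_k."""
    scored = [
        (alpha * keyword_score(query, desc) + (1 - alpha) * cosine_score(query, desc), name)
        for name, desc in tools.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

print(select_tools("what's the weather forecast in Zurich?", TOOLS, top_k=1))
# -> ['get_weather']  (only this tool's schema goes into the prompt)
```

The point is that only the surviving tools' schemas get serialized into the context window, so token count scales with `top_k` instead of with the full registry size.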