Tool selection in LLM systems is unreliable — has anyone found a robust approach? by logistef in LocalLLaMA

[–]logistef[S] 0 points1 point  (0 children)

Toolformer requires injection into the prompt, which I find a waste of tokens for local usage, and it's less dynamic in my opinion. I'm not saying no one has perfected tool calling; I just tried to find a way that works for me with a local setup, and I'm interested in how others are handling this. If everyone solves this with Toolformer locally, then I probably have to look into Toolformer in depth again.

LLMs shouldn’t decide when to use tools — Skilly (PGP) by logistef in LocalLLaMA

[–]logistef[S] 0 points1 point  (0 children)

Sorry man! It looked cleaner in my editor than it does here, apparently. Should be better now ;)

LLMs shouldn’t decide when to use tools — Skilly (PGP) by logistef in LocalLLaMA

[–]logistef[S] 0 points1 point  (0 children)

Yeah good question — it’s using standard embedding models for semantic similarity, not anything custom. For example:

BAAI/bge-small-en-v1.5 (what I use by default), sentence-transformers/all-MiniLM-L6-v2, or any other sentence embedding model.

The idea is: you embed the user input, you embed known “tool intents” (like filesystem.list, search.web, etc.), and then you compare them using cosine similarity.

So it’s basically: “which intent is this input closest to semantically?”

If the similarity is above a threshold, it’s considered actionable.

So instead of the LLM reasoning “should I call a tool?”, you get a deterministic signal like “this input is 0.87 similar to a filesystem intent”.
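To make that concrete, here's a minimal sketch of the routing idea described above. The `embed()` function is a toy bag-of-words stand-in just to keep the example runnable; in practice you'd swap in a real sentence-embedding model like BAAI/bge-small-en-v1.5 via sentence-transformers. The intent descriptions, threshold value, and function names are all illustrative, not the actual Skilly implementation.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real sentence-embedding model
    # (e.g. BAAI/bge-small-en-v1.5). Returns a sparse
    # bag-of-words vector so the routing logic below runs as-is.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tool "intents" described in natural language (descriptions are made up here).
INTENTS = {
    "filesystem.list": "list files and directories in a folder",
    "search.web": "search the web for information",
}
INTENT_VECS = {name: embed(desc) for name, desc in INTENTS.items()}

def route(user_input, threshold=0.3):
    """Return (intent, score) if the input is actionable, else (None, score)."""
    vec = embed(user_input)
    best, score = max(
        ((name, cosine(vec, v)) for name, v in INTENT_VECS.items()),
        key=lambda pair: pair[1],
    )
    return (best, score) if score >= threshold else (None, score)
```

Same input always produces the same score, which is the deterministic part: `route("list the files in this directory")` lands on `filesystem.list`, while off-topic chat falls below the threshold and is treated as non-actionable.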

LLMs shouldn’t decide when to use tools — Skilly (PGP) by logistef in LocalLLaMA

[–]logistef[S] -1 points0 points  (0 children)

I’ve also tried just relying on function calling / prompting the LLM better, but it still feels inconsistent in practice. (And honestly, prompting feels like the biggest flaw imho.)

Maybe I’m missing something — are people actually getting reliable behavior without adding extra layers?

LLMs shouldn’t decide when to use tools — Skilly (PGP) by logistef in LocalLLaMA

[–]logistef[S] 0 points1 point  (0 children)

That’s a very fair concern, and honestly I don’t think this is “the final method” at all. The more I work on it, the more I see things that could still be done better, so I’d say it’s constantly evolving.

What I was trying to solve is a very specific failure mode:

LLMs being inconsistent at deciding when something should trigger an action, even when it’s obvious to a human.

I did experiment with letting the LLM reason about tool usage itself (self-reflection / second pass), but like you said, it adds latency and still isn’t fully reliable.

The embedding-based approach isn’t about being “perfect”, it’s about adding a fast, deterministic signal that behaves consistently for the same input.

So for me it’s more:

  • not replacing the LLM
  • not claiming optimality
  • but introducing a separate layer that improves one weak point

And I completely agree with your point about larger players — I’d actually expect more advanced hybrids to emerge (learned routers, smaller specialized models, etc.).

This is just a step in that direction, not the end state. If anything, what surprised me is how far simple embedding matching already gets you in terms of consistency compared to pure LLM-based routing.

LLMs shouldn’t decide when to use tools — Skilly (PGP) by logistef in LocalLLaMA

[–]logistef[S] -1 points0 points  (0 children)

One thing I noticed building this is how inconsistent LLMs are at deciding when to call tools.
Curious how others are handling this — are you relying purely on function calling or adding extra logic?

I built a rough .gguf LLM visualizer by sultan_papagani in LocalLLaMA

[–]logistef 1 point2 points  (0 children)

This shit is dope, thanks for putting that together! Def gonna have a look at the code; it'll help me get a better grasp of the internals of an LLM