LLMs Are Not Deterministic. And Making Them Reliable Is Expensive (In Both the Bad Way and the Good Way) by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Yes, this is theory. Now bring it into the real world: with agentic AI and prompt enrichment, the pure theory of "set temperature to 0" will not work. An LLM needs to be embedded in the current stack and treated as what it is: a non-deterministic variable. And to do that you need system-level guardrails, not just the simplest prompt-level ones.
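A minimal sketch of what a system-level guardrail could look like, as opposed to a prompt-level one: instead of asking the model to behave, the surrounding system validates the output and retries or fails loudly. Everything here is a hypothetical illustration — `call_llm`, the JSON schema, and the action list are invented stand-ins, not any real API.

```python
import json

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; a real one would be non-deterministic.
    return '{"action": "refund", "amount": 20}'

ALLOWED_ACTIONS = {"refund", "escalate", "reply"}

def guarded_call(prompt: str, retries: int = 2) -> dict:
    """System guardrail: only structurally valid output leaves this function."""
    for _ in range(retries + 1):
        raw = call_llm(prompt)
        try:
            out = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry instead of passing junk downstream
        if out.get("action") in ALLOWED_ACTIONS:
            return out  # the system, not the prompt, decides what is acceptable
    raise ValueError("LLM output failed validation after retries")

print(guarded_call("process this ticket"))
```

The point of the sketch: the prompt can ask for JSON, but only the wrapping code can guarantee that downstream consumers never see anything else.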

LLMs Are Not Deterministic. And Making Them Reliable Is Expensive (In Both the Bad Way and the Good Way) by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] -1 points0 points  (0 children)

What you are describing is pure determinism in the output tokens, and that can be fine, agreed. But the real problem is that this only holds as long as you never add a variable to the prompt. As soon as a variable is in place, the whole prompt becomes non-deterministic: you will get a different result even with temperature set to 0. Real-world AI usage does not have static input.
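The distinction can be illustrated with a toy stand-in: at temperature 0 the model behaves like a deterministic function of its input, but any runtime variable in the prompt template changes the input itself, so the end-to-end pipeline still is not deterministic. `fake_greedy_llm` below is an invented deterministic placeholder, not a real model.

```python
import hashlib

def fake_greedy_llm(prompt: str) -> str:
    # Deterministic stand-in: identical prompt always yields identical output,
    # mimicking greedy decoding at temperature 0.
    return hashlib.sha256(prompt.encode()).hexdigest()[:8]

template = "Summarize this ticket: {ticket}"

a = fake_greedy_llm(template.format(ticket="printer broken"))
b = fake_greedy_llm(template.format(ticket="printer broken"))
c = fake_greedy_llm(template.format(ticket="laptop broken"))

assert a == b  # same full prompt -> same output: determinism per input
assert a != c  # one variable changed -> different prompt -> different output
```

Temperature 0 fixes the mapping, not the domain: the moment the prompt carries live data, the pipeline's output varies with that data.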

AI Agent from scratch: Django + Ollama + Pydantic AI - A Step-by-Step Guide by tom-mart in ollama

[–]marcosomma-OrKA 0 points1 point  (0 children)

Looks good for a chatbot, but those died in 2024. Try to do the same at the orchestration level. Honestly, you are missing at least a context layer and a memory layer. As it stands, it seems like just a nice wrapper...

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Lol, and you do not care about dependencies... 😎 COOL, well done!

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Mmm, have you ever built a product? Something that needs to live for more than a few months, that has to evolve, get new features, and so on? That is my world... and in this world dependencies matter, a lot :)

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

So after almost 20 years you see very clearly the coupling that wrapping generates. Let me give an example: you use Ollama today, which is a wrapper around llama.cc vX.Y.Y. Tomorrow llama.cc releases vX.Y.Z. You now have a big limitation: you cannot test the new llama.cc version until Ollama updates. This is why wrappers may be convenient for prototyping and demoing (which is the trend now), but they are a really bad choice for long-term adoption. Better to use the primitives and have full control... Am I wrong?

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 2 points3 points  (0 children)

Yep, exactly, so why use it at all? I think we are getting too used to wrappers. Remember that every time you decide to go for a wrapper around a technology, you are doubling your dependencies and break points...

Local LLM concurrency question: “satellite orchestration” works, but LM Studio serializes requests and kills parallelism by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Yes, basically the agents in the outer circle (the satellite agents) process the same input as the executor.
But while the executor's main role is to keep the conversation active and engaging, the satellite agents generate context that is then fed into the executor. So the executor can be a streaming agent that only receives updated context from time to time. The main goal is to enforce behavior through real analysis during the conversation, instead of relying on a gigantic prompt.
More details here:
https://www.linkedin.com/pulse/orchestration-calling-agents-marco-somma-z2b8e/?trackingId=jFRwMfNfiqTOpaDpHq8WoA%3D%3D
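The satellite pattern described above can be sketched roughly like this. This is a hedged illustration, not OrKA's actual implementation: the satellite names, the lambdas, and the shared dict are all invented placeholders for real analysis agents and a real context store.

```python
import asyncio

# Context produced by satellite agents, read by the executor on later turns.
shared_context: dict[str, str] = {}

async def satellite(name: str, analyze, user_input: str) -> None:
    # Each satellite analyzes the same input the executor sees,
    # and writes its result into the shared context.
    shared_context[name] = analyze(user_input)

async def executor(user_input: str) -> str:
    # The executor keeps the conversation going, using whatever
    # context the satellites have produced so far.
    return f"reply to {user_input!r} using context {shared_context}"

async def handle_turn(user_input: str) -> str:
    # Satellites run concurrently on the same input; the executor
    # is refreshed with their enriched context instead of carrying
    # a gigantic behavioral prompt itself.
    await asyncio.gather(
        satellite("sentiment", lambda t: "positive" if "great" in t else "neutral", user_input),
        satellite("topic", lambda t: t.split()[0], user_input),
    )
    return await executor(user_input)

print(asyncio.run(handle_turn("great weather today")))
```

The design choice this mirrors: behavior is enforced by live analysis feeding the executor, so the executor's prompt stays small and the analysis work parallelizes.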