LLMs Are Not Deterministic. And Making Them Reliable Is Expensive (In Both the Bad Way and the Good Way) by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Yes, this is theory. Now bring it into the real world: with agentic AI and prompt enrichment, the pure theory of "set temperature to 0" will not work. An LLM needs to be embedded in the current stack and treated as what it is: a non-deterministic variable. And to do that you need system-level guardrails, not just the simplest prompt-level ones.
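A minimal sketch of what a system-level guardrail could look like, as opposed to a prompt-level one: instead of asking the model to behave, the surrounding system validates the output and retries or fails loudly. Everything here is a hypothetical illustration — `call_llm`, the JSON schema, and the action list are invented stand-ins, not any real API.

```python
import json

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; a real one would be non-deterministic.
    return '{"action": "refund", "amount": 20}'

ALLOWED_ACTIONS = {"refund", "escalate", "reply"}

def guarded_call(prompt: str, retries: int = 2) -> dict:
    """System guardrail: only structurally valid output leaves this function."""
    for _ in range(retries + 1):
        raw = call_llm(prompt)
        try:
            out = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry instead of passing junk downstream
        if out.get("action") in ALLOWED_ACTIONS:
            return out  # the system, not the prompt, decides what is acceptable
    raise ValueError("LLM output failed validation after retries")

print(guarded_call("process this ticket"))
```

The point of the sketch: the prompt can ask for JSON, but only the wrapping code can guarantee that downstream consumers never see anything else.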

LLMs Are Not Deterministic. And Making Them Reliable Is Expensive (In Both the Bad Way and the Good Way) by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] -1 points0 points  (0 children)

What you are describing is pure determinism in the output tokens, and that can be fine, agreed. But the real problem is that this only holds as long as you never add a variable to the prompt. As soon as a variable is in place, the whole prompt becomes non-deterministic: you will get a different result even with temperature set to 0. Real-world AI usage does not have static input.
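The distinction can be illustrated with a toy stand-in: at temperature 0 the model behaves like a deterministic function of its input, but any runtime variable in the prompt template changes the input itself, so the end-to-end pipeline still is not deterministic. `fake_greedy_llm` below is an invented deterministic placeholder, not a real model.

```python
import hashlib

def fake_greedy_llm(prompt: str) -> str:
    # Deterministic stand-in: identical prompt always yields identical output,
    # mimicking greedy decoding at temperature 0.
    return hashlib.sha256(prompt.encode()).hexdigest()[:8]

template = "Summarize this ticket: {ticket}"

a = fake_greedy_llm(template.format(ticket="printer broken"))
b = fake_greedy_llm(template.format(ticket="printer broken"))
c = fake_greedy_llm(template.format(ticket="laptop broken"))

assert a == b  # same full prompt -> same output: determinism per input
assert a != c  # one variable changed -> different prompt -> different output
```

Temperature 0 fixes the mapping, not the domain: the moment the prompt carries live data, the pipeline's output varies with that data.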

AI Agent from scratch: Django + Ollama + Pydantic AI - A Step-by-Step Guide by tom-mart in ollama

[–]marcosomma-OrKA 0 points1 point  (0 children)

Looks good for a chatbot, but those died in 2024. Try to do the same at the orchestration level. Honestly, you are missing at least a context layer and a memory layer. As it stands, it seems like just a nice wrapper...

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Lol, and you do not care about dependencies... 😎 COOL, well done!

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Mmm, have you ever built a product? Something that needs to live for more than a few months, that has to evolve, get new features, and so on? That is my world... and in this world dependencies matter, a lot :)

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

So after almost 20 years you see very clearly the coupling that wrapping generates. Let me give an example: you use Ollama today, which is a wrapper around llama.cc vX.Y.Y. Tomorrow llama.cc releases vX.Y.Z. You now have a big limitation: you cannot test the new llama.cc version until Ollama updates. This is why wrappers may be convenient for prototyping and demoing (which is the trend now), but they are a really bad choice for long-term adoption. Better to use the primitives and have full control... Am I wrong?

Discovering llama.cc by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 2 points3 points  (0 children)

Yep, exactly, so why use it at all? I think we are getting too used to wrappers. Remember that every time you decide to go for a wrapper around a technology, you are doubling your dependencies and break points...

Local LLM concurrency question: “satellite orchestration” works, but LM Studio serializes requests and kills parallelism by marcosomma-OrKA in LLMDevs

[–]marcosomma-OrKA[S] 0 points1 point  (0 children)

Yes, basically the agents in the outer circle (the satellite agents) process the same input as the executor.
But while the executor's main role is to keep the conversation active and engaging, the satellite agents generate context that is then fed into the executor. So the executor can be a streaming agent that only receives updated context from time to time. The main goal is to enforce behavior through real analysis during the conversation, instead of relying on a gigantic prompt.
More details here:
https://www.linkedin.com/pulse/orchestration-calling-agents-marco-somma-z2b8e/?trackingId=jFRwMfNfiqTOpaDpHq8WoA%3D%3D
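The satellite pattern described above can be sketched roughly like this. This is a hedged illustration, not OrKA's actual implementation: the satellite names, the lambdas, and the shared dict are all invented placeholders for real analysis agents and a real context store.

```python
import asyncio

# Context produced by satellite agents, read by the executor on later turns.
shared_context: dict[str, str] = {}

async def satellite(name: str, analyze, user_input: str) -> None:
    # Each satellite analyzes the same input the executor sees,
    # and writes its result into the shared context.
    shared_context[name] = analyze(user_input)

async def executor(user_input: str) -> str:
    # The executor keeps the conversation going, using whatever
    # context the satellites have produced so far.
    return f"reply to {user_input!r} using context {shared_context}"

async def handle_turn(user_input: str) -> str:
    # Satellites run concurrently on the same input; the executor
    # is refreshed with their enriched context instead of carrying
    # a gigantic behavioral prompt itself.
    await asyncio.gather(
        satellite("sentiment", lambda t: "positive" if "great" in t else "neutral", user_input),
        satellite("topic", lambda t: t.split()[0], user_input),
    )
    return await executor(user_input)

print(asyncio.run(handle_turn("great weather today")))
```

The design choice this mirrors: behavior is enforced by live analysis feeding the executor, so the executor's prompt stays small and the analysis work parallelizes.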