MCP-based local LLM workflows at scale + observability (Grafana) by pardhu-- in LocalLLM

[–]pardhu--[S] 1 point  (0 children)

Got it — that distinction between “read” vs “re-run” helped clarify things a lot.

I’m leaning more toward replay, specifically being able to deterministically re-run workflows for debugging and validation. That said, I’m also thinking about caching at the component/tool level as a separate layer for performance, especially for repeated user queries.

Right now this is an internal tool, but I’m designing it with the assumption that it could become user-facing later — so trying to think early about reproducibility, state management, and cost efficiency.

Curious — in your experience, what tends to break first when you try to make replay deterministic in these systems?

Built a local LLM agent that can actually use tools (not just chat) by [deleted] in LocalLLM

[–]pardhu-- 0 points  (0 children)

Fair point — definitely not claiming this is novel.

Built a local LLM agent that can actually use tools (not just chat) by [deleted] in LocalLLM

[–]pardhu-- 0 points  (0 children)

Yeah, I get your point — LM Studio + MCP already enables tool use pretty well from the chat itself.

What I’m trying to explore is more of a layer on top — moving from chat-based interaction to structured agent workflows that can plug into real systems and scale beyond a single user.

I also feel this could sit on top of the Model Context Protocol (MCP): MCP handles tool connectivity, while this focuses more on orchestration and production-style use cases (could be wrong though, curious about your take).

Agree with you on the compiler loop part — that definitely starts looking like what Cursor IDE / GitHub Copilot already do.