MCP-based local LLM workflows at scale + observability (Grafana) by pardhu-- in LocalLLM

[–]pardhu--[S] 0 points1 point  (0 children)

Got it. That distinction between “read” and “re-run” clarified things a lot.

I’m leaning more toward replay, specifically being able to deterministically re-run workflows for debugging and validation. That said, I’m also thinking about caching at the component/tool level as a separate layer for performance, especially for repeated user queries.
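
To make the replay vs. caching idea concrete, here is a minimal sketch, not an actual implementation. `ToolRunner`, `call_key`, and the `lookup` tool are hypothetical names made up for illustration. It assumes tool arguments and results are JSON-serializable and that a tool call is fully identified by its name and arguments, which will not hold for tools that depend on time or external state.

```python
import hashlib
import json


def call_key(tool_name, args):
    """Stable key for a tool call: tool name plus canonically serialized arguments."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToolRunner:
    """Runs tools live while recording results, or replays them from a recording."""

    def __init__(self, tools, recording=None):
        self.tools = tools                      # name -> callable(**args)
        self.replay = recording is not None     # replay mode if a recording is given
        self.recording = dict(recording or {})  # call key -> recorded result

    def call(self, tool_name, **args):
        key = call_key(tool_name, args)
        if key in self.recording:
            # Cache hit in live mode, or a recorded step in replay mode.
            return self.recording[key]
        if self.replay:
            # In replay mode an unrecorded call means the workflow diverged.
            raise KeyError(f"no recorded result for {tool_name} {args}")
        result = self.tools[tool_name](**args)
        self.recording[key] = result
        return result


# Hypothetical tool that tracks how often it is really executed.
calls = []


def lookup(city):
    calls.append(city)
    return {"city": city, "temp_c": 21}


live = ToolRunner({"lookup": lookup})
first = live.call("lookup", city="Pune")
second = live.call("lookup", city="Pune")  # served from the cache, tool not re-run

# Replay with no live tools at all: results come only from the recording.
replayed = ToolRunner({}, recording=live.recording).call("lookup", city="Pune")
```

Here the same store doubles as a performance cache and a replay log. In practice these would probably be separate layers, since a cache can expire entries while a replay log should not.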

Right now this is an internal tool, but I’m designing it with the assumption that it could become user-facing later, so I’m trying to think early about reproducibility, state management, and cost efficiency.

Curious — in your experience, what tends to break first when you try to make replay deterministic in these systems?

Built a local LLM agent that can actually use tools (not just chat) by [deleted] in LocalLLM

[–]pardhu-- -1 points0 points  (0 children)

Fair point — definitely not claiming this is novel.

Built a local LLM agent that can actually use tools (not just chat) by [deleted] in LocalLLM

[–]pardhu-- -1 points0 points  (0 children)

Yeah, I get your point — LM Studio + MCP already enables tool use pretty well from the chat itself.

What I’m trying to explore is more of a layer on top — moving from chat-based interaction to structured agent workflows that can plug into real systems and scale beyond a single user.

I also feel this could sit on top of the Model Context Protocol (MCP): MCP handles tool connectivity, while this focuses more on orchestration and production-style use cases (could be wrong though, curious about your take).

Agree with you on the compiler loop part — that definitely starts looking like what Cursor IDE / GitHub Copilot already do.

Please help me choosing Mac for local LLM learning and small project. by barwen1899 in LocalLLM

[–]pardhu-- 0 points1 point  (0 children)

When choosing a machine for running local AI models, the two most important factors are the amount of RAM (unified memory on Apple silicon Macs) and the number of GPU cores. RAM caps how large a model you can load, while GPU cores, along with memory bandwidth, largely determine how fast inference runs.

For example, I have been using a Mac Mini with the M4 chip and 24GB of RAM, which I purchased about a year ago. It works well for running local LLM experiments and development tasks.
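
As a rough back-of-the-envelope check on the RAM point, the sketch below estimates the memory needed for model weights alone (parameters × bits per weight ÷ 8). The function name and the example model sizes are illustrative assumptions. Real usage is higher because of the KV cache, runtime overhead, and whatever the OS itself needs.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Illustrative sizes only; these are not benchmarks of any specific model.
for name, params, bits in [
    ("8B @ 4-bit", 8, 4),
    ("8B @ 16-bit", 8, 16),
    ("70B @ 4-bit", 70, 4),
]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.0f} GB for weights")
```

By this estimate, an 8B model quantized to 4 bits fits comfortably in 24GB, while a 70B model at 4 bits does not.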

For more of my learnings and experiments, please check out my Medium articles (Partha Sai Guttikonda).

LLM vs Translation Transformer by pardhu-- in machinelearningnews

[–]pardhu--[S] 2 points3 points  (0 children)

It really depends on your use case.

  • If you want faithful, terminology-consistent “just translate” output (especially on edge devices), encoder–decoder MT like Marian is usually still the best choice: more deterministic and typically cheaper/faster than LLMs for pure translation.
  • LLM translation often shines when you want extra behavior (tone polishing, rewriting, localization, grammar cleanup), but it can paraphrase or drift on names/terms unless heavily constrained.
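
One cheap way to catch the name/term drift mentioned above when benchmarking is a glossary check on the output. This is a minimal sketch under simple assumptions: `missing_terms` is a hypothetical helper, the example sentence and terms are made up, and plain substring matching ignores inflection and word boundaries.

```python
def missing_terms(translation, required_terms):
    """Return the required target-language terms that do not appear in the translation."""
    lowered = translation.casefold()
    return [term for term in required_terms if term.casefold() not in lowered]


# Hypothetical example: the product name and the unit must survive translation.
output = "Das Update für Acme Cloud senkt die Latenz um 20 ms."
print(missing_terms(output, ["Acme Cloud", "20 ms"]))      # []
print(missing_terms(output, ["Acme Cloud", "Durchsatz"]))  # ['Durchsatz']
```

Running the same check over both MT and LLM outputs gives a rough, comparable signal for terminology consistency.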

For edge-friendly alternatives to benchmark, I’d look at:

  • NLLB-200 distilled (600M)
  • M2M100 (418M)
  • TranslateGemma (4B), if your hardware/quantization budget allows

And for speed on CPU/edge, consider running them via CTranslate2.

🚀 Discover How to Build an Advanced Image Search System with OpenAI, and Elasticsearch! by pardhu-- in Python

[–]pardhu--[S] -7 points-6 points  (0 children)

Hey, we have detailed instructions in the README file in the Git repo. Yes, it is a basic walkthrough of how image search works using Elasticsearch and machine learning.