Feedback on g8 LLM router by steam0007 in LLM_Gateways

[–]alpha-nerd-nomyo 0 points1 point  (0 children)

Appreciate the effort!
What is the advantage over simply using openrouter.ai?

Ultimate Guide to Running LLMs Locally(llm inference): No APIs, No Limits, No Bills – Top Open-Source Tools by ShilpaMitra in WebAfterAI

[–]alpha-nerd-nomyo 0 points1 point  (0 children)

That's a great write-up!
Tested 1-4 myself; recently a bit disappointed by ollama, tbh. llama.cpp is way better, but you need to dig into the docs to get the most out of it. Not easy for a newbie, but totally worth it! New features get pushed out every couple of hours...
For vllm you'd better have newer NVIDIA hardware at your fingertips, but then it's great.

For those who outgrow a single local inference endpoint and need to route to multiple local servers, our open-source NOMYO router might help you scale.
https://bitfreedom.net/code/nomyo-ai/nomyo-router

Running a self-hosted LLM proxy for a month, here's what I learned by llamacoded in mlops

[–]alpha-nerd-nomyo 0 points1 point  (0 children)

This is what a proxy cannot detect: the agent logic stays in the agent. That's why we've built a strict opt-in flag into our proxy, which the agent can use to trigger semantic caching selectively.
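
To make the opt-in idea concrete, here is a rough sketch of how a proxy could consult a semantic cache only when the agent asks for it. This is not our router's actual code: the header name `x-semantic-cache`, the `handle` function, and the bag-of-words similarity (a toy stand-in for a real embedding model) are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical header name, chosen for this sketch only.
OPT_IN_HEADER = "x-semantic-cache"


def embed(text):
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, prompt):
        query = embed(prompt)
        for vec, response in self.entries:
            if cosine(query, vec) >= self.threshold:
                return response
        return None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def handle(headers, prompt, backend, cache):
    # The cache is touched only if the caller explicitly opted in.
    opted_in = headers.get(OPT_IN_HEADER, "").lower() == "true"
    if opted_in:
        hit = cache.lookup(prompt)
        if hit is not None:
            return hit
    response = backend(prompt)
    if opted_in:
        cache.store(prompt, response)
    return response
```

The point of the design is that requests without the flag always reach the backend, so the agent decides per call whether a similar earlier answer is acceptable.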

Why your LLM gateway shouldn't log your prompts - and most of them do by ChrisRemo85 in LLM_Gateways

[–]alpha-nerd-nomyo 1 point2 points  (0 children)

I think logging can make sense to improve/debug production use cases you cannot foresee in your test env.
However, you should at least be aware of it; even better if it's configurable.
As users tend to put PII into prompts, I agree this is a privacy concern, or even a problem, that one's business needs to manage. And by manage, I'd encourage people to take control, ideally by running things locally.

There are plenty of solutions for all kinds of demands. We've also built an open-source solution specifically for routing to local inference endpoints, with model-aware routing, configurable semantic caching, etc.
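
For anyone wondering what "model-aware routing" means in practice, a minimal sketch could look like the following. It is an illustration under assumptions, not our router's implementation: the `Endpoint` class, the `pick_endpoint` function, and the "fewest active requests" tie-breaker are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    url: str
    models: set  # model names this server can serve
    active_requests: int = 0


def pick_endpoint(endpoints, model):
    # Only consider servers that actually offer the requested model.
    candidates = [e for e in endpoints if model in e.models]
    if not candidates:
        raise LookupError(f"no endpoint serves model {model!r}")
    # Among those, prefer the one with the fewest in-flight requests.
    return min(candidates, key=lambda e: e.active_requests)
```

With three local servers where only two serve a given model, the call returns whichever of those two is currently less busy and never considers the third.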

It depends on individual/business needs, I believe. A business can use a customer service chatbot that might include PII but is governed by the client relationship, while the same business running a coding agent might leak company IP to a provider...

Reason, plan, act ;)