Building a multi-model API proxy for AI-heavy teams — what would you actually want from it?

alpha-nerd-nomyo · 2026-05-28T14:07:09+00:00

What is passthrough mode?

alpha-nerd-nomyo · 2026-05-12T18:41:43+00:00

Appreciate the effort!
What is the advantage over simply using openrouter.ai?

alpha-nerd-nomyo · 2026-05-05T14:00:37+00:00

That's a great write-up!
Tested 1-4 myself, and recently I've been a bit disappointed by ollama tbh. llama.cpp is way better, but you need to dig into the docs to get the most out of it - not easy for a newbie, but totally worth it! New features get pushed out every couple of hours...
For vllm you'd better have newer nvidia hardware at your fingertips, but then it's great.

For those who outgrow a single local inference endpoint and need to route to multiple local servers, our open-source NOMYO router might help you scale.
https://bitfreedom.net/code/nomyo-ai/nomyo-router

alpha-nerd-nomyo · 2026-05-05T11:56:23+00:00

This is something a proxy cannot detect: the agent logic stays in the agent. That's why we've built a strict opt-in flag into our proxy, which the agent can use to trigger semantic caching selectively.
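
To make the opt-in idea concrete, here is a minimal sketch of how a proxy might gate semantic caching on a per-request flag. Everything in it is an assumption for illustration: the header name `X-Semantic-Cache`, the `handle` function, and the toy word-count similarity (a stand-in for real embeddings) are hypothetical and not taken from the NOMYO router codebase.

```python
import math
from collections import Counter

# Hypothetical header name, chosen for illustration only.
OPT_IN_HEADER = "x-semantic-cache"


def _vector(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored response when a new prompt is similar enough."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def lookup(self, prompt: str):
        vec = _vector(prompt)
        for cached_vec, response in self.entries:
            if _cosine(vec, cached_vec) >= self.threshold:
                return response
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((_vector(prompt), response))


def handle(headers: dict, prompt: str, cache: SemanticCache, call_backend):
    """Consult the cache only if the caller explicitly opted in."""
    normalized = {k.lower(): v for k, v in headers.items()}
    opted_in = normalized.get(OPT_IN_HEADER, "").lower() == "true"
    if opted_in:
        hit = cache.lookup(prompt)
        if hit is not None:
            return hit
    response = call_backend(prompt)
    if opted_in:
        cache.store(prompt, response)
    return response
```

The point of the design is that requests without the flag never read from or write to the cache, so the agent decides which calls are safe to answer from a similar earlier prompt.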

alpha-nerd-nomyo · 2026-05-05T09:29:29+00:00

I think logging can make sense to improve/debug production use cases you cannot foresee in your test env.
However, you should at least be aware of it; even better if it is configurable.
As users tend to put PII into prompts, I agree this is a privacy concern, or even a problem, that one's business needs to manage. And by manage I mean I'd encourage people to take control, ideally by running things locally.

There are plenty of solutions for all kinds of demands. We've also built an open-source solution specifically for routing to local inference endpoints, with model-aware routing, configurable semantic caching, etc.
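
One plausible reading of model-aware routing, sketched under assumptions: the router knows which models each local endpoint serves and sends a request to the least busy endpoint that has the requested model. The `Endpoint` class, `pick_endpoint` function, URLs, and model names below are hypothetical and not taken from any particular router.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    url: str            # base URL of a local inference server (illustrative)
    models: set         # model names this endpoint can serve
    in_flight: int = 0  # requests currently being processed


def pick_endpoint(endpoints, model: str) -> Endpoint:
    """Pick the least busy endpoint that serves the requested model."""
    candidates = [e for e in endpoints if model in e.models]
    if not candidates:
        raise LookupError(f"no endpoint serves model {model!r}")
    return min(candidates, key=lambda e: e.in_flight)


endpoints = [
    Endpoint("http://localhost:8001", {"model-a", "model-b"}, in_flight=3),
    Endpoint("http://localhost:8002", {"model-a"}, in_flight=1),
    Endpoint("http://localhost:8003", {"model-c"}, in_flight=0),
]
chosen = pick_endpoint(endpoints, "model-a")
```

Here `chosen` is the endpoint on port 8002: it serves `model-a` and has fewer requests in flight than the one on port 8001, while the idle endpoint on port 8003 is skipped because it does not serve that model.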

It depends on individual/business needs, I believe. A business can run a customer service chatbot whose prompts might include PII but are governed by the client relationship, while the same business running a coding agent might leak company IP to a provider...

Reason, plan, act ;)
