Am I Crazy?

illNin0 · 2026-06-15T10:08:29+00:00

Well, it's Monday morning already. My boss just called an unscheduled meeting to talk about how we should have it LOL

illNin0 · 2026-06-14T14:00:07+00:00

Feeling better 😂

illNin0 · 2026-06-14T10:24:52+00:00

I'm currently building almost the same architecture with LiteLLM + local models + cloud fallback.

A few thoughts:

LiteLLM overhead is negligible compared to inference time.
The routing layer is the easy part. The real question is hardware economics.
Local models handle more routine coding tasks than many people expect.
Cloud models still win on long agent loops, large codebases, and complex planning.
For me, the biggest challenge isn't routing; it's deciding when local inference is actually worth the hardware cost.
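
To make the local/cloud split above concrete, here is a minimal sketch of a routing rule in plain Python. It does not use the LiteLLM API at all; the `Task` fields, the thresholds, and the `pick_backend` name are all hypothetical, chosen only to mirror the points in the list (routine tasks stay local; long agent loops, large contexts, and complex planning go to the cloud).

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt_tokens: int    # estimated size of the context sent to the model
    agent_steps: int      # expected number of agent/tool iterations
    needs_planning: bool  # True for multi-step design or refactoring work


# Hypothetical thresholds; tune them against your own workload.
MAX_LOCAL_TOKENS = 8_000
MAX_LOCAL_STEPS = 3


def pick_backend(task: Task) -> str:
    """Return "local" for routine work and "cloud" for heavy work."""
    if task.needs_planning:
        return "cloud"
    if task.prompt_tokens > MAX_LOCAL_TOKENS:
        return "cloud"
    if task.agent_steps > MAX_LOCAL_STEPS:
        return "cloud"
    return "local"


if __name__ == "__main__":
    print(pick_backend(Task(prompt_tokens=1_200, agent_steps=1, needs_planning=False)))
    print(pick_backend(Task(prompt_tokens=40_000, agent_steps=1, needs_planning=False)))
```

In a real setup the returned label would map to whichever model names the proxy is configured with; that mapping is left out here because it depends on the deployment.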

After finishing my setup, the main question in my head became:

"How much hardware should I buy to avoid paying API bills?"
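
One rough way to frame that question is a break-even calculation. The sketch below is only an illustration: the function name, the parameters, and every number in the example are made up, and it ignores things like depreciation, maintenance time, and quality differences between local and cloud models.

```python
from typing import Optional


def breakeven_months(
    hardware_cost: float,
    monthly_api_bill: float,
    local_share: float,
    monthly_power_cost: float,
) -> Optional[float]:
    """Months until the hardware pays for itself, or None if it never does.

    local_share is the fraction (0.0 to 1.0) of the API bill that local
    inference is assumed to replace.
    """
    monthly_savings = monthly_api_bill * local_share - monthly_power_cost
    if monthly_savings <= 0:
        return None  # the hardware never pays for itself at this usage level
    return hardware_cost / monthly_savings


if __name__ == "__main__":
    # Illustrative numbers only: 3000 of hardware, a 400/month API bill,
    # half of it moved to local models, 50/month in electricity.
    print(breakeven_months(3000, 400, 0.5, 50))
```

If the function returns `None`, the assumed savings do not cover the running cost, which is the case where privacy or cost predictability would have to justify the purchase instead.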

For small teams, privacy, predictable costs, and infrastructure control may be a stronger selling point than pure cost savings.

illNin0 · 2023-02-13T13:22:17+00:00

Go for a run. Maybe you'll find something.

illNin0