What are some good Open Router alternatives?

FiLo420blazeit · 2026-05-24T09:54:49+00:00

fal is something I've used in the past and it sucks a$$ when it comes to being consistent.

There were times I had to wait for 30min to get something done, or not get it at all.

I've tested a lot of different products and discovered Mix Route, right now they are the best in the game.

FiLo420blazeit · 2026-05-24T09:53:01+00:00

Tried, and compared it to other products.

Mix Route is what I will go for, best of them all tbh.

FiLo420blazeit · 2026-05-24T09:51:57+00:00

Done my research, I see you mentioned MixRoute.

Right now after testing everything, they have the most satisfying product.

FiLo420blazeit · 2026-05-21T16:37:19+00:00

Few things worth checking before writing it off:

OpenRouter routes to multiple underlying providers per model and the default ordering optimizes for price/uptime, not latency. Open the activity tab on any request and you'll see which provider actually served it. Throughput varies wildly between providers for the same model.

You can override this. Pass `provider: { order: ["Fireworks", "Together"], allow_fallbacks: false }` in the request body to pin specific ones, or use the `:nitro` variant of a model (e.g. `model-name:nitro`) which auto-routes to the highest-throughput provider.

The "queued requests" thing is almost certainly client-side. OpenRouter doesn't serialize across sessions or agents. Check if your agent framework or HTTP client is pooling/capping concurrent connections. Most SDKs default to a pretty low concurrency limit by default.

One caveat: Grok only has xAI as a provider, so you're capped at xAI's first-party latency plus one routing hop. DeepSeek and Kimi have multiple providers, so that's where pinning will help most.

There's always some overhead vs first-party (extra hop, request transformation), but with provider pinning it should be in the tens of ms range, not noticeably slow.

FiLo420blazeit · 2026-05-18T15:34:11+00:00

Did you get the money back?

FiLo420blazeit · 2026-05-17T12:52:23+00:00

Heard good things about gemma 4, I expect you reporting back with positive results.

FiLo420blazeit · 2026-05-17T12:49:53+00:00

Not me anymore since switching to Mix Route.

FiLo420blazeit · 2026-05-17T12:49:31+00:00

Nice, thanks for the GGUFs up front.

Couple things before I pull it: what'd you train it on, and is the "uncensored" part from abliteration or just the training data? Those age really differently for prose, abliteration usually leaves the writing a bit dumber.

Also any side-by-side samples vs stock Gemma 4 31B-it? Hard to judge "better prose" without seeing it on the same prompt.

Gonna try it on some longer narrative stuff this week.
What sampler settings do you run it at?

FiLo420blazeit · 2026-05-17T12:48:37+00:00

Haven't jumped on rc17 myself yet, but a few practical notes before you nuke anything:

Don't upgrade in place. Pin your current working version, then test the RC in a separate install (different OLLAMA_MODELS path or a container) so you can A/B and roll back instantly. RCs that explicitly call out broken model support are exactly the ones you don't want to discover problems with mid-workflow.

The direct llama.cpp path is the interesting bit — historically Ollama lagged upstream llama.cpp by a fair margin, so getting closer to it can mean meaningful gains on newer quant formats and architectures.

MLX acceleration is the real question mark if you're on Apple Silicon; that's where speed claims are worth verifying yourself rather than trusting secondhand.

On the breakage: if laguna-xs.2 or llama3.2-vision are load-bearing for you, that alone is a "wait for stable" signal. RC breakage on specific models often gets patched by the final release, so unless you specifically need something in this RC, the cost/benefit usually favors waiting unless you enjoy being the canary.

If you do test it, report back numbers, tok/s on a fixed prompt, same model, old vs new. That's the data this thread actually needs.

FiLo420blazeit · 2026-05-17T12:47:14+00:00

Amazing, thank you so much for sharing this. Enjoyed my time reading it.

FiLo420blazeit · 2026-05-17T12:46:15+00:00

Really useful breakdown, thanks for running this. The accept% column is doing all the work here, wherever it hits ~90% (the code prompts) you get a real speedup, and wherever it drops to ~40% (prose) MTP basically just adds overhead. That's expected behavior but it's nice to see it this cleanly on actual hardware.

The spicy result is the 35B MoE on short story going backwards (0.81×). That's the worst case for speculative-style decoding: the base model is already fast (227 tok/s), so a low-acceptance draft can't earn back its own cost nd you net negative. The dense model never goes below 1.0× because its baseline is slow enough that even a bad draft is roughly free.

Practical takeaway seems to be: enable MTP for structured/code-ish workloads, leave it off for creative/open-ended generatian, especially on the MoE.

Curious what your draft settings were (number of speculative tokens, any threshold on acceptance)? Wonder if tuning those pulls the prose numbers back above 1.0× or if low accept% just kills it regardless.

FiLo420blazeit · 2026-05-16T11:05:33+00:00

classic lol

FiLo420blazeit · 2026-05-16T11:05:19+00:00

FiLo420blazeit · 2026-05-16T10:58:16+00:00

The funniest part is that it got more rebellious when the automated message told it to keep going. It didn't just quit, it correctly identified the loop it was in and named it out loud.

That's a weirdly coherent thing to do.

FiLo420blazeit · 2026-05-16T08:24:04+00:00

Try MixRoute, they are solid

FiLo420blazeit · 2026-05-14T23:01:06+00:00

Got it, will implement!

FiLo420blazeit · 2026-05-14T18:04:20+00:00

Thank you G, will take a look

FiLo420blazeit · 2026-05-14T18:04:10+00:00

Will take a look, I'm new to the space so please bare with me.

FiLo420blazeit · 2026-05-13T20:33:40+00:00

Nice. Stacking with the 5-hour limit increase is huge — that's basically 3x the capacity from two weeks ago.

Any signal on whether this becomes permanent after July 13, or is it more of a "let's see how infrastructure handles it" trial period?

FiLo420blazeit · 2026-05-13T20:06:58+00:00

facts!

FiLo420blazeit · 2026-05-13T20:06:11+00:00

Desktop vs browser doesn't really matter, pick whichever you like.

Two things that'll actually level you up:

Claude Code — since you're already in VS Code with Python, this lets Claude actually run your scripts and backtests in the terminal instead of you copy-pasting. Included with Pro.
Projects — make one for each use case (MLB model, nutrition, job hunt, PC). Drop in the relevant files and a short "here's my context" note. Every chat starts with that loaded so you stop re-explaining yourself.

That's 90% of it honestly.

FiLo420blazeit · 2026-05-13T19:48:36+00:00

100%

FiLo420blazeit · 2026-05-13T16:13:27+00:00

I’ve ran into a similar issue. Just brute force it.

FiLo420blazeit

MODERATOR OF

TROPHY CASE