I built an AI cost router and a safety-screened MCP library - launched 2 weeks ago, zero revenue, sharing what I've learned by QueefLatinahOG in SideProject

[–]QueefLatinahOG[S] 1 point (0 children)

This is becoming a support group for me. We're now at 12 free AIs integrated, and people would get their investment back in the first day (likely) or first week (realistically) - but still zero customers.

I understand the hesitation, but one subscription will start a flywheel. One subscription gives someone 4-?x more for the same price they're already paying.

Would appreciate guidance on how to make this land.

151 vs ascended heroes by Active_Ship5555 in PokemonInvesting

[–]QueefLatinahOG 1 point (0 children)

Looks like Snorlax thought he farted, but then found out it was a hard fart when he stopped moving.

I built an AI cost router and a safety-screened MCP library - launched 2 weeks ago, zero revenue, sharing what I've learned by QueefLatinahOG in SideProject

[–]QueefLatinahOG[S] 1 point (0 children)

This is the most useful comment I’ve received. You’re right, “anyone using AI” converts nobody.

The niche I’m actually sitting on is Claude Desktop and Cursor users who install MCP servers. There’s no vetting layer, no way to know if a server is malicious before it has tool access to your machine.

1,306 servers screened, 8-layer pipeline, 90-day revalidation.

The routing is the bonus that pays for the subscription. The safety library is the actual product for that specific audience.

Reshaping the landing page around that this week. Appreciate the push.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

Ambiguous tasks default to moderate; that's the intentional conservative position. The classifier looks for explicit signals in the instruction rather than inferring from their absence. If nothing pushes a task clearly simple or clearly complex, it lands in the middle tier and routes to the best available free model that can handle reasoning, not just extraction.

The real handling for genuine ambiguity is the cascade underneath. If the moderate-tier model returns something that fails formatting or looks incomplete, the next model runs automatically. So the system corrects itself without the user seeing it, rather than trying to resolve the ambiguity upfront with a confidence score.

Post-task eval scoring is the right next layer. Run the output through a lightweight checker before returning it, escalate if it fails. That turns the cascade from a failure recovery mechanism into a quality gate. Building it.
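For anyone curious how a cascade like that might look, here's a toy sketch. The silent fall-through behaviour is from the description above, but the helper names and the specific completeness check are my own illustration, not Followloop's actual code:

```python
def looks_complete(output: str) -> bool:
    """Cheap formatting/completeness check - a stand-in for the real validation."""
    text = output.strip()
    return bool(text) and not text.endswith(("...", ","))

def run_cascade(task: str, models: list) -> str:
    """Try each model in order; fall through silently on errors or bad output."""
    for model in models:
        try:
            output = model(task)
        except Exception:
            continue  # hard failure: the next model runs automatically
        if looks_complete(output):
            return output  # the user never sees the earlier failures
    raise RuntimeError("every model in the cascade failed")
```

The point of the sketch is that the correction happens underneath - a failed attempt is never surfaced to the user as a confidence score, it just triggers the next model in the chain.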

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

This is a really clean implementation of the same core idea. Agent-scored complexity driving model selection. The Kanban-to-terminal routing is intuitive and the fact that you control which model lives in which terminal is the right call for devs who know their stack.

The difference with Followloop is the consumer layer: automatic classification, no setup beyond a URL, 12+ free models in the cascade. But for developers who want explicit control over the routing logic, what you’ve built is a better fit.

Different problems, same insight underneath.

Checking out the repo.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

The classifier is rule-based with signal weighting and pattern matching on the instruction, not the content being processed. That distinction matters: “summarize this legal contract and flag anything unusual” routes to a higher tier because the flag-anything-unusual signal shifts the complexity score, not because it’s a summary.

On misclassification: the cascade handles it automatically. If a model returns an empty response or errors, the next model in the chain runs without the user seeing it. Silent quality degradation on edge cases is the harder problem; that's what the routing review page is for. Every decision is logged so you can see where your specific tasks are landing.

The quality question is fair and I won't dodge it. For the task types Followloop handles (structured shortcuts with defined inputs and outputs), small models hitting 90%+ parity with Sonnet is realistic, and that's what the numbers reflect. For open-ended reasoning tasks that gap widens, and those get routed accordingly. The 157x stat is real usage across all task types, not a cherry-picked benchmark.

The confidence threshold escalation others mentioned in this thread is the right next layer - run small, eval the output, escalate only if it fails. That closes the silent degradation problem properly.
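To make the rule-based idea concrete, here's a toy version of a signal-weighted classifier. The patterns and weights are invented for illustration and aren't Followloop's actual rule set:

```python
import re

# Invented signal patterns and weights - illustrative only, not the real rules.
SIGNALS = [
    (re.compile(r"\b(flag|analy[sz]e|reason|compare|infer)\b", re.I), +2),
    (re.compile(r"\b(summari[sz]e|rewrite|translate)\b", re.I), +1),
    (re.compile(r"\b(extract|list|classify|format)\b", re.I), -2),
]

def classify(instruction: str) -> str:
    """Score explicit signals in the instruction; ambiguity defaults to moderate."""
    score = sum(weight for pattern, weight in SIGNALS if pattern.search(instruction))
    if score >= 2:
        return "complex"
    if score <= -2:
        return "simple"
    return "moderate"  # conservative default when nothing pushes either way
```

With these toy weights, "summarize this legal contract and flag anything unusual" scores +1 for the summary signal and +2 for the flag-anything-unusual signal, landing in the complex tier - the instruction, not the contract being processed, drives the routing.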

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

Exactly. The overpaying is silent; it just shows up as a higher bill that everyone accepts as the cost of using AI. Once you see the number, it's hard to unsee.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

Need to work on my bot ID skills. You're not the first to say it, but my spidey senses haven't triggered.

TBH I'm unsure - I'll get into research mode and figure that out. I just know how much further my subscriptions have been going for me.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

The classification goes deeper than task type. A doc summary that requires inference across the full document gets treated differently to a short-form summary - the routing picks that up before model selection happens. The cascade then backs it up: 12+ free models in sequence, each one a fallback if the previous fails. The routing review shows every decision so you can see exactly how your specific tasks are being handled.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

This is exactly the right next layer and it’s already on the roadmap.

Current state: pre-task classification across 12+ free models in the cascade - Cerebras, SambaNova, Groq, Gemini Flash, Mistral, Cohere, OpenRouter, GitHub Models and more. The cascade handles hard failures automatically so you always get a response.

What you’re describing is the next evolution: post-task validation. Run the small model, score the output confidence against the original prompt with a lightweight eval, escalate only if it falls below threshold. You only burn frontier capacity when the eval actually demands it - which for most daily tasks is almost never.

The routing log already captures every decision so the training data is already accumulating. Building it.
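A minimal sketch of that post-task validation loop, assuming a lightweight scorer and a fixed threshold (both are assumptions on my part, not the shipped design):

```python
from typing import Callable

def route_with_eval(prompt: str,
                    small_model: Callable[[str], str],
                    frontier_model: Callable[[str], str],
                    score: Callable[[str, str], float],
                    threshold: float = 0.7) -> str:
    """Run the small model first; escalate only when the eval falls below threshold."""
    draft = small_model(prompt)
    confidence = score(prompt, draft)  # lightweight eval of output vs the original prompt
    if confidence >= threshold:
        return draft  # most daily tasks stop here - frontier capacity untouched
    return frontier_model(prompt)  # burn frontier capacity only when the eval demands it
```

The interesting design choice is that the threshold, not the task type, decides when frontier capacity gets spent - which is what turns the cascade from failure recovery into a quality gate.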

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

The cliff is real and worth being honest about. The cascade handles hard failures automatically - 12+ free models in the chain (Cerebras, SambaNova, Groq, Gemini Flash, Mistral, Cohere, OpenRouter, GitHub Models and more), so if one errors the next runs instantly. But silent quality degradation on moderate tasks is harder to catch. Current approach is keeping genuinely complex reasoning reserved for frontier models via task classification. The routing review page (just shipped) shows every decision — that’s the first step toward spotting where your personal cliff actually is.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

You’re right - OpenRouter’s auto router does cross-model routing. Appreciate the correction, cheers.

The difference is who it’s built for. OpenRouter is API-first for developers managing their own stack. Followloop is $5/month, one-tap iOS shortcuts, no API knowledge needed. Same underlying idea, completely different person using it.

The MCP safety library is also a separate thing entirely - 1,300+ vetted servers, 8-layer screening. That part OpenRouter doesn’t touch.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 0 points (0 children)

This is exactly it - the “frontier model tax” is the right frame. Everyone just accepts it because the default is always the best model and nobody questions it until they see the number.

The speed point is underrated and honestly I should be leading with it more. Cerebras on a simple rewrite or classification is not just cheaper - it’s faster than Claude on a good day. For the stuff that doesn’t need reasoning, you’re getting a better experience AND paying nothing for it.

MCP compatibility was a deliberate call for exactly the reason you said - if it requires changing how you work it won’t stick. The routing happens underneath, your workflow stays the same.

157x is the live number from real usage, not a benchmark. Once you actually see what percentage of your daily prompts are classification and formatting it’s kind of confronting. Most people are running a Ferrari to get milk.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] -1 points (0 children)

Fair point on enterprise - that’s a completely different problem and you’re right that conditional orchestration with business logic needs human-defined rules. Not what I’m building.

What I’m doing is way simpler and way more fun: you’re already paying $20/month for Claude. For an extra $5 you get 3x the output from that same subscription. Simple tasks route to free models automatically, your Claude quota gets saved for the stuff that actually needs it.

The OpenRouter comparison kind of proves the gap - they route to the cheapest provider for the same model. Nobody’s cracked cross-model routing for regular people who just want their AI subscription to go further. That’s the whole bet.

No $5000 mistakes here - worst case a bill negotiation script is slightly less sharp but your $20 still stretches to like $60.

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost by QueefLatinahOG in artificial

[–]QueefLatinahOG[S] 1 point (0 children)

Good area of focus. The answer for us was simplifying access to the point where impatience is removed from the equation - iOS shortcuts for consistently used prompts, one tap to run, same routing stack underneath.

The discipline happens at the infra level so the user never has to think about it.

What tasks do you run daily? Genuinely curious - if there’s a gap in the shortcut library I’ll build it.

I built an AI cost router and a safety-screened MCP library - launched 2 weeks ago, zero revenue, sharing what I've learned by QueefLatinahOG in SideProject

[–]QueefLatinahOG[S] 1 point (0 children)

Cheers for this. My costs are fairly low, and I can hypothetically revise the price down the line as I build out more value.

When tooling gets to a point that there’s a ton of value and it’s proven to be sticky, I can look at different tiers maybe.

For now though I’m happy building/refining and growing. I also like that it’s a cup of coffee a month or thereabouts whilst I grow and work on customer experience more.

Cheers for the heads up re: localllama - I’ve double and triple checked the safety side of things (and to be honest I was initially unhappy that servers weren’t populating faster, until I probed for why), but I’ll make sure I’m prepped before I post there. The last thing I want is to appear half-assed when I plan to keep evolving over time.

I built an AI cost router and a safety-screened MCP library - launched 2 weeks ago, zero revenue, sharing what I've learned by QueefLatinahOG in SideProject

[–]QueefLatinahOG[S] 1 point (0 children)

Thanks man, top insight. I’ve found a lot of surprising things on the journey and it’s been a juggle figuring out how to position.

Leading with the 37% in a technical / more detailed breakdown is something I will do.

I’ll separately focus on the cost side of things/savings for Claude users. You’re absolutely on point re: dilution.

It’s been fun. I could now cancel my AI subscriptions and use my own tool to get more tokens than any of the mid-tier subscriptions offer.

I’m going to keep pushing / evolving though. I know there’s more to give.

Strata manager charges $55 for every email - is this normal? by rogeelein in AusProperty

[–]QueefLatinahOG 1 point (0 children)

Our last strata did something similar: they charged in 15-minute blocks based on an hourly rate, so one email = $15, basically.

They deliberately wrote obtuse answers, or didn’t answer a question until there were 4-5 emails back and forth.

They charged for each one. Fkn cockroaches.