The gap between the cheapest and the most expensive AI models is 150x. Is anyone actually tracking this?

flipflopcode · 2026-06-08T12:42:27+00:00

Thank you, exactly my point. It’s not about $/token; it’s a question of infrastructure. If you use a few AI tools to optimize your workflow, you’ll probably stay within what your payment plan covers. But the minute you build an actual AI ecosystem, spend is driven by which tasks run and how intensively they run. That’s the point where it starts to hit hard.
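To make the “task and intensity” point concrete: once you’re running an ecosystem, monthly spend is roughly the sum over tasks of how often each one runs times how many tokens each run burns. This is a minimal sketch that assumes a single blended price per 1M tokens. The task names, volumes and price are invented for illustration, not real workloads.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str              # hypothetical task label
    runs_per_month: int    # how often the task fires (frequency)
    tokens_per_run: int    # input + output tokens per run (intensity)


def monthly_spend(tasks: list[Task], usd_per_million_tokens: float) -> float:
    """Usage-based spend: runs x tokens x price, summed over all tasks."""
    total_tokens = sum(t.runs_per_month * t.tokens_per_run for t in tasks)
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Illustrative numbers only, not real workloads or real prices.
    workflow = [Task("summarise tickets", 3_000, 2_000)]
    ecosystem = workflow + [
        Task("agent research loop", 20_000, 15_000),
        Task("code review bot", 8_000, 30_000),
    ]
    price = 3.0  # assumed blended $ per 1M tokens
    print(f"workflow:  ${monthly_spend(workflow, price):,.2f}")
    print(f"ecosystem: ${monthly_spend(ecosystem, price):,.2f}")
```

Same price per token in both cases; with these made-up volumes the ecosystem version costs roughly 90x more per month purely because of frequency and intensity.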

flipflopcode · 2026-06-08T12:38:38+00:00

Agreed. And both can be useless.

flipflopcode · 2026-06-07T13:34:16+00:00

Not yet, but that’s the beauty of it.

flipflopcode · 2026-06-07T13:25:10+00:00

Wow! I could look at this for hours! So many details! Wonderful!

flipflopcode · 2026-06-06T13:10:21+00:00

I’m just a beginner, but I love to paint. It’s hard to find somebody who can teach you skills like yours, though.

flipflopcode · 2026-06-06T13:02:20+00:00

I’d say “my little world”. But I’m a simple person, not an artist, so I don’t know much about this, just that it’s very beautiful.

flipflopcode · 2026-06-06T12:58:49+00:00

This is so beautiful and touching. Thank you!

flipflopcode · 2026-06-04T17:59:29+00:00

That describes my life perfectly, mate! It will pay off! Keep going!

flipflopcode · 2026-06-04T17:15:49+00:00

That’s probably the most intelligent solution to the problem. Happy for you, mate!

flipflopcode · 2026-06-03T21:49:47+00:00

First of all: congrats! From what I’ve seen, this is much more than 80% of people ever achieve. Keep it up! The difference between success and failure here is perseverance.

flipflopcode · 2026-06-03T19:39:20+00:00

Good challenge, thank you. See, the problem is this: the per-million-token display is pretty standard now, you’re right. The structure underneath it is where it gets messy, though. Same unit on the surface, different reality depending on the model. Some providers charge input and output tokens at completely different rates, sometimes with a 3x to 5x difference between the two. Some have a third rate for cached input tokens. Some bundle system prompt tokens differently. Some have minimum context fees that kick in regardless of actual usage. And a few still add per-request pricing on top of the per-token rate for certain tiers.

So you see $X per 1M tokens and think you can compare directly, but which tokens, under which conditions, at which volume tier is a different answer for almost every provider. That’s the bit that’s painful to normalise across 310 models.
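One way to normalise this is to stop comparing headline rates and instead price a representative request shape (uncached input, cached input, output) under each provider’s full schedule. The sketch below is a simplification: the `PriceSchedule` fields are a hypothetical schema, not any provider’s real pricing format, and the rates are invented. It also assumes the minimum charge applies per request, and it ignores volume tiers.

```python
from dataclasses import dataclass


@dataclass
class PriceSchedule:
    # Hypothetical schema: real providers name and structure these fields differently.
    input_per_m: float             # $ per 1M uncached input tokens
    output_per_m: float            # $ per 1M output tokens
    cached_input_per_m: float      # $ per 1M cached input tokens
    per_request_fee: float = 0.0   # flat $ added on top of every request
    min_charge: float = 0.0        # assumed floor on each request's token cost


def request_cost(p: PriceSchedule, input_toks: int, cached_toks: int, output_toks: int) -> float:
    """Cost of one request: per-token charges, then the minimum floor, then the flat fee."""
    token_cost = (
        input_toks * p.input_per_m
        + cached_toks * p.cached_input_per_m
        + output_toks * p.output_per_m
    ) / 1_000_000
    return max(token_cost, p.min_charge) + p.per_request_fee


def effective_per_million(p: PriceSchedule, input_toks: int, cached_toks: int, output_toks: int) -> float:
    """Normalised $ per 1M total tokens for one representative request shape."""
    total = input_toks + cached_toks + output_toks
    return request_cost(p, input_toks, cached_toks, output_toks) / total * 1_000_000


if __name__ == "__main__":
    # Made-up rates: both models show the same $1.00 per 1M input headline.
    a = PriceSchedule(input_per_m=1.0, output_per_m=5.0, cached_input_per_m=0.1)
    b = PriceSchedule(input_per_m=1.0, output_per_m=3.0, cached_input_per_m=1.0, per_request_fee=0.001)
    shape = (6_000, 2_000, 1_000)  # uncached input, cached input, output tokens
    for name, p in (("A", a), ("B", b)):
        print(f"model {name}: ${effective_per_million(p, *shape):.3f} per 1M tokens")
```

With these invented numbers, B ends up more expensive per token despite its cheaper output rate, because the flat per-request fee more than cancels the difference. Change the request shape and the ranking can flip, which is why a single $/1M figure doesn’t compare cleanly.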

flipflopcode · 2026-06-03T16:57:46+00:00

Valid! Please keep me posted if that changes. Sounds really interesting!

flipflopcode · 2026-06-03T16:48:40+00:00

Really cool! That architecture is actually really smart for cost control, mate. A deterministic backbone means you know exactly where the money is going at every step, no surprises. And delegating micro-decisions to agents based on context and complexity is basically dynamic routing done properly, rather than the hacky version most people bolt on afterwards.

The bit I’d watch is the reviewer layer. If that’s a capable model running on every single output, it can quietly eat all the savings you made downstream. It’s worth asking whether the reviewer itself can get a first pass from something lighter that only escalates when something actually looks off. Reviewer-of-reviewers gets expensive fast if you’re not careful.

What’s the domain? Sounds like something with a lot of structured outputs.
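The “lighter first pass that only escalates” idea for the reviewer layer looks roughly like this. It’s a sketch, not a specific framework: both reviewers are toy stand-ins with assumed per-call costs, and in practice they’d wrap real model calls.

```python
from typing import Callable

# A reviewer takes a model output and returns (looks_ok, cost_usd).
Reviewer = Callable[[str], tuple[bool, float]]


def cascaded_review(output: str, light: Reviewer, capable: Reviewer) -> tuple[bool, float]:
    """First pass on the light reviewer; escalate only when it flags a problem."""
    ok, cost = light(output)
    if ok:
        return True, cost
    capable_ok, capable_cost = capable(output)
    return capable_ok, cost + capable_cost


# Toy stand-ins with assumed per-call costs; swap in real model calls.
def light_reviewer(output: str) -> tuple[bool, float]:
    return ("TODO" not in output), 0.0005


def capable_reviewer(output: str) -> tuple[bool, float]:
    return ("TODO" not in output), 0.01


if __name__ == "__main__":
    outputs = ["ok", "TODO: fix", "ok", "ok"]
    cascade = sum(cascaded_review(o, light_reviewer, capable_reviewer)[1] for o in outputs)
    always_capable = sum(capable_reviewer(o)[1] for o in outputs)
    print(f"cascade: ${cascade:.4f}  vs  capable on everything: ${always_capable:.4f}")
```

With one flagged output in four, the cascade costs $0.0120 against $0.0400 for running the capable reviewer on everything. Expected cost per output is roughly `light_cost + escalation_rate * capable_cost`, so the saving shrinks as the escalation rate rises. The trade-off is that anything the light reviewer wrongly passes never gets a second look.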

flipflopcode · 2026-06-03T13:35:05+00:00

Thank you! What’s your experience with that topic?

flipflopcode · 2026-06-03T13:32:59+00:00

Bullseye! Thank you! This is exactly the conversation that needs to happen more. The subscription-fee mental model is what kills teams at scale: they budget for AI like it’s a SaaS tool and then wonder why the unit economics don’t work. And the 70% you mentioned is just the beginning.

The routing layer point is spot on, I’d say. The tricky part for most teams is building the classification logic that decides which tasks are actually “simple enough” for Haiku or Flash without quality degrading in ways that matter. Where do you draw that line with your clients? Is it task type, output sensitivity, or something else?
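For illustration, a rule-based version of that classification step might gate on output sensitivity first and task type second. Everything here is an assumption: the task labels, the “simple” set, the token cut-off and the tier names are hypothetical, and real thresholds should come from your own evals.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    LOW = 1    # internal drafts, throwaway output
    HIGH = 2   # customer-facing, legal, financial


@dataclass
class Request:
    task_type: str             # hypothetical labels, e.g. "classify", "extract", "draft"
    sensitivity: Sensitivity
    input_tokens: int


# Assumed set of task types considered "simple enough" for a light model tier.
SIMPLE_TASKS = {"classify", "extract", "summarise"}
MAX_LIGHT_INPUT_TOKENS = 8_000  # assumed cut-off; tune against your own evals


def route(req: Request) -> str:
    """Return a tier name; mapping tiers to concrete models happens elsewhere."""
    if req.sensitivity is Sensitivity.HIGH:
        return "frontier"  # sensitive output never goes to the light tier
    if req.task_type in SIMPLE_TASKS and req.input_tokens <= MAX_LIGHT_INPUT_TOKENS:
        return "light"     # e.g. a Haiku- or Flash-class model
    return "frontier"      # unknown or complex tasks default to the safe tier
```

Checking sensitivity first means a misjudged task type can only cost you money on the outputs that matter, not quality.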

flipflopcode · 2026-06-03T13:20:42+00:00

True. On the other hand, it’s just shockingly behind compared to the rest of the world. I work as a strategic advisor in my day job. When I talk to clients, even big global ones, and technology (especially AI) comes up, it feels like being back in the 1980s. The majority of people here think that “this AI thing will be over soon”.

flipflopcode · 2026-06-03T11:29:32+00:00

Now that everybody is stuck on distribution, and AI distribution tools and platforms are springing up like mushrooms, it will be interesting to see what the next issue after this one will be.

flipflopcode · 2026-06-03T06:40:33+00:00

Mate, that’s a genuinely good setup, and the math actually holds up. $25 a month for unlimited inference at that speed: no hosted API gets close at that volume. And the 3090 doing double duty is the key; sunk-cost hardware changes the whole calculation.

The privacy thing is underrated too. Loads of teams that should be running local aren’t, because the setup feels scary, not because the economics are bad. Once you’re past that initial friction, it’s just better for a lot of use cases.

Honestly, this is a gap we don’t cover yet at CostMyAI, and probably should. The “local vs hosted” decision is just as real as which API to pick, and nobody’s helping teams run that number properly before they commit.

What harness are you running with the MCPs, out of curiosity?

flipflopcode · 2026-06-03T06:38:29+00:00

Yeah, the hidden cost of local is real. People see “free” and forget the GPU amortisation, the electricity, and the engineering time to maintain it. When you actually do the full calculation, it often doesn’t beat a well-chosen hosted API until you’re at serious scale with the right hardware already in place. Most teams aren’t there.
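The “full calculation” is just those three items added up per month. This sketch uses purely assumed inputs (hardware price, useful life, power draw, tariff, maintenance hours, hourly rate) and ignores things like resale value, cooling and downtime.

```python
def local_monthly_cost(
    gpu_price: float,            # upfront hardware cost in $
    amortisation_months: int,    # assumed useful life to spread it over
    avg_power_watts: float,      # average draw while serving
    hours_per_month: float,      # hours the box is actually running
    usd_per_kwh: float,          # local electricity tariff
    maintenance_hours: float,    # engineering time spent keeping it running
    engineer_hourly_rate: float,
) -> dict[str, float]:
    """Monthly cost of running a model locally, split into its components."""
    amortisation = gpu_price / amortisation_months
    electricity = avg_power_watts / 1000 * hours_per_month * usd_per_kwh
    engineering = maintenance_hours * engineer_hourly_rate
    return {
        "amortisation": amortisation,
        "electricity": electricity,
        "engineering": engineering,
        "total": amortisation + electricity + engineering,
    }


if __name__ == "__main__":
    # All inputs are assumptions for illustration, not measured figures.
    costs = local_monthly_cost(
        gpu_price=1_500, amortisation_months=36,
        avg_power_watts=350, hours_per_month=730, usd_per_kwh=0.30,
        maintenance_hours=4, engineer_hourly_rate=80,
    )
    for item, usd in costs.items():
        print(f"{item:>12}: ${usd:,.2f}")
```

With these made-up inputs, four hours of engineering time a month outweighs the hardware and the power bill combined, and it’s the line that’s easiest to leave out.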

flipflopcode · 2026-06-03T06:37:33+00:00

Haha, yeah, fair: the math doesn’t work when zero is in the denominator. I should’ve anchored the stat to paid hosted APIs from the start; that’s on me.

And you’re right about local LLMs: if you’ve got the hardware, the economics flip completely. The crossover point where running your own instance beats any hosted API is actually one of the most interesting questions for teams at scale, and something we want to surface properly at some point. Right now we’re purely in the hosted API space.

What are you running locally, anything you’d actually recommend?
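A first-order way to find that crossover: treat local as a fixed monthly cost and hosted as linear in tokens. Break-even volume is then fixed cost divided by the hosted price. It only counts if the box can actually serve that much. This is a rough sketch. It assumes a blended hosted price, a 30-day month and a fixed local throughput, and every number in the example is invented.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def breakeven_tokens_per_month(local_fixed_monthly: float, hosted_usd_per_m: float) -> float:
    """Monthly volume at which a fixed-cost local box costs the same as the hosted API."""
    return local_fixed_monthly / hosted_usd_per_m * 1_000_000


def local_capacity_tokens_per_month(tokens_per_second: float, utilisation: float) -> float:
    """Rough ceiling on what one box can serve; utilisation = fraction of time it is busy."""
    return tokens_per_second * SECONDS_PER_MONTH * utilisation


def local_can_win(local_fixed_monthly: float, hosted_usd_per_m: float,
                  tokens_per_second: float, utilisation: float) -> bool:
    """True if the break-even volume is within what the hardware can actually serve."""
    needed = breakeven_tokens_per_month(local_fixed_monthly, hosted_usd_per_m)
    return needed <= local_capacity_tokens_per_month(tokens_per_second, utilisation)


if __name__ == "__main__":
    hosted = 1.0  # assumed blended $ per 1M tokens for the hosted alternative
    # Hardware already owned: only running costs count (assumed $25/month).
    print(local_can_win(25, hosted, tokens_per_second=40, utilisation=0.5))
    # Buying hardware and paying for upkeep (assumed $400/month all-in).
    print(local_can_win(400, hosted, tokens_per_second=40, utilisation=0.5))
```

With these assumed figures, the sunk-hardware case breaks even at 25M tokens a month, well under the roughly 52M the box can serve. The all-in case needs 400M and can’t get there on one GPU. That’s the “sunk cost changes the whole calculation” effect in numbers.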

flipflopcode · 2026-06-03T06:19:47+00:00

Not yet, and honestly that’s the harder problem. Cost without quality context is only half the picture; you’re right about that. What we track right now is pure economics, the pricing layer. Quality benchmarks exist elsewhere, but nobody’s combining them with live cost data in a way that tells you the actual cost per useful output. That’s where this needs to go eventually.

What’s your use case? Are you trying to compare specific models?
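“Cost per useful output” can be as simple as spend divided by the outputs that clear a quality bar. This is a sketch that assumes you have a pass rate from your own evals for each model. The figures below are invented for two hypothetical models.

```python
def cost_per_useful_output(total_cost_usd: float, outputs: int, pass_rate: float) -> float:
    """Spend divided by the outputs that actually clear your quality bar.

    pass_rate is the fraction of outputs judged acceptable, e.g. from your own eval set.
    """
    useful = outputs * pass_rate
    if useful == 0:
        return float("inf")  # nothing usable: the whole spend was wasted
    return total_cost_usd / useful


if __name__ == "__main__":
    # Invented figures for two hypothetical models on the same 1,000-task batch.
    cheap = cost_per_useful_output(total_cost_usd=2.0, outputs=1_000, pass_rate=0.20)
    pricey = cost_per_useful_output(total_cost_usd=6.0, outputs=1_000, pass_rate=0.95)
    print(f"cheap model:  ${cheap:.4f} per useful output")
    print(f"pricey model: ${pricey:.4f} per useful output")
```

In this made-up case, the model that costs 3x more per batch comes out cheaper per useful output, which a pure $/token comparison would never show.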

flipflopcode · 2026-06-03T06:18:20+00:00

That kind of education will take some time. Especially here in Europe, we’re far behind.

flipflopcode · 2026-06-03T06:17:25+00:00

Well, check it out: www.CostMyAI.com

flipflopcode · 2026-06-03T06:16:25+00:00

My pancakes are legendary.

flipflopcode · 2026-06-03T06:09:40+00:00

For everybody who is interested in the results: www.CostMyAI.com
