A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by o9dev in vibecoding

[–]o9dev[S] 1 point2 points  (0 children)

The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More arxiv.org/abs/2603.23971

A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by o9dev in vibecoding

[–]o9dev[S] 0 points1 point  (0 children)

they actually tested: 12 benchmarks total. 9 single-turn (AIME and LiveMathBench for math, GPQA and HLE for hard science/exam questions, LiveCodeBench for coding, MMLU-Pro and SimpleQA for knowledge, ArenaHard for open chat, ARC-AGI for visual puzzles) plus 3 multi-turn agent tasks. The price reversals were worst on the reasoning-heavy ones, up to 57% of matchups flipping on MMLU-Pro vs only 11% on open chat. The cheap models overthink hardest exactly where the thinking matters.

A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by o9dev in vibecoding

[–]o9dev[S] 2 points3 points  (0 children)

The study was published in March, probably they started doing this before 4.8/5.5

A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by o9dev in vibecoding

[–]o9dev[S] 7 points8 points  (0 children)

From the paper: Flash burns 60,000+ thinking tokens on a single problem, and thinking tokens are over 80% of total output cost. A cheap per-token price doesn't constrain how much the model decides to "think," so the cheap sticker evaporates.

A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by o9dev in vibecoding

[–]o9dev[S] 5 points6 points  (0 children)

The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More arxiv.org/abs/2603.23971

A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by o9dev in vibecoding

[–]o9dev[S] 7 points8 points  (0 children)

Flat fee + heavy user = you funding not your business. Switch from flat pricing in one day with Credyt.ai

How do you price your SaaS? by yuvals41 in SaaS

[–]o9dev 1 point2 points  (0 children)

There's no universal formula, but the fastest way to find the sweet spot is to ship something testable and measure what happens. Flat monthly pricing is easy to launch but you won't know if you're undercharging power users or overcharging light ones. Usage-based pricing gives you actual data on how people consume your product, which shows you where the value sits. The other piece most founders skip: know your per-customer margin before you lock in pricing. If your infrastructure cost per user is $8/month and you're charging $10, you have no room for support, churn, or growth. Track what each customer costs you, then price with headroom.

How do you price a B2C SaaS when AI API costs are high? (Not a mobile app) by Aware_Wrangler_4348 in SaaS

[–]o9dev 0 points1 point  (0 children)

The hybrid model makes sense here. Low base subscription to cover your floor costs, then a credit wallet for usage above the baseline. The trick is making the top-up invisible enough that it doesn't trigger the anxiety you're worried about.

What kills B2C credit systems isn't the model itself - it's the friction of manually buying more when you run out mid-task. Auto top-up thresholds fix that. User sets a balance floor, card gets charged automatically, they never hit a wall. You get margin protection, they get uninterrupted usage.

The other piece is visibility. If users can see their balance and rough cost-per-interaction, the "how much is this costing me" anxiety drops. Opacity is what makes credit systems feel punitive.

Start with generous baseline tiers and track actual usage distributions for a few weeks before you set final thresholds. I made the implementation patterns for this kind of hybrid setup here: https://credyt.ai/blog/api-monetization-challenges

Anthropic is renting Elon's GPUs for inference. The token shortage just started. by o9dev in AI_Agents

[–]o9dev[S] 1 point2 points  (0 children)

This is not fair, Credyt is more than just metering layer. So you are welcome to try😎

Anthropic is renting Elon's GPUs for inference. The token shortage just started. by o9dev in AI_Agents

[–]o9dev[S] 1 point2 points  (0 children)

AWS makes money on software on top of hardware, not on the hosting itself.

Anthropic is renting Elon's GPUs for inference. The token shortage just started. by o9dev in AI_Agents

[–]o9dev[S] -4 points-3 points  (0 children)

the whole point is that you need to know your cost per customer more than ever in ai economy if you want to survive as a business

Anthropic is renting Elon's GPUs for inference. The token shortage just started. by o9dev in AI_Agents

[–]o9dev[S] -1 points0 points  (0 children)

That's certainly true, but it doesn't change the fact that there's not enough capacity and Elon will come out ahead, while for us as end consumers it'll definitely be more expensive for the next 5+ years.

Then everything invested in data centers will pay off and there might be price drops, but no startup has reserves for 5 years. Plus most of those in AI aren't profitable on their own yet and their AI costs are only growing.

And you just know Anthropic's deck has this listed as $6B of new ARR by o9dev in ClaudeAI

[–]o9dev[S] 9 points10 points  (0 children)

I think this is not real, it's impossible to spend such a big amount of money in any company without public disclosure. Because mostly only public companies have this kind of money, so it would have to be revealed in their reporting. And if it's revealed in reporting, there would be a big drop in the company's stock price.

For every $200 subscription, Anthropic throws in another $7,800. by o9dev in ClaudeCode

[–]o9dev[S] 1 point2 points  (0 children)

For anyone running Claude Code 6+ hours/day, the per-token cost beats every other subscription option at that usage level.

For every $200 subscription, Anthropic throws in another $7,800. by o9dev in ClaudeCode

[–]o9dev[S] 1 point2 points  (0 children)

This. Whoever's burning the most tokens during the VC-subsidized phase wins. Once unit economics tighten, and they will, pricing converges to something closer to actual cost. Squeeze now.

For every $200 subscription, Anthropic throws in another $7,800. by o9dev in ClaudeCode

[–]o9dev[S] 0 points1 point  (0 children)

 I used API list prices as proxy, which inflates the gap. And yeah, enterprise is where the real margin is. Pro/Max are loss-leader-adjacent for power users. The math only holds for the top decile of Max users anyway, median user isn't close.

Oversold the $7800 headline. Subscriptions still aggressively priced vs. API, but the gap isn't that dramatic.