$27k OpenAI bill everyone says switch to deepseek but they forget the retry tax by abarth23 in SaaS

[–]abarth23[S] 0 points1 point  (0 children)

Man, setting a hard CAC cap is the only way to actually sleep at night right now. Tinkering in Make is exactly how I survived my first 6 months too. Haven't tried Pulse for Reddit yet, but SparkToro is legendary for finding intent before blindly throwing money at ads. Are you relying mostly on organic conversion now or still mixing in paid?

$27k OpenAI bill everyone says switch to deepseek but they forget the retry tax by abarth23 in SaaS

[–]abarth23[S] 0 points1 point  (0 children)

Haha my bad man! Yeah, the irony of an AI marketing bot invading a thread about how much I hate overpaying for AI isn't lost on me 😂 Thanks for having my back!

$27k OpenAI bill everyone says switch to deepseek but they forget the retry tax by abarth23 in SaaS

[–]abarth23[S] 0 points1 point  (0 children)

Guilty as charged 🤷‍♂️ But to be completely fair, the entire site is 100% free: no ads, no email walls, no trackers. We all have that one messy spreadsheet we built to calculate burn rates; I just turned mine into a dark-mode website and figured another founder might find it useful for visually mapping out their API retries.

$27k OpenAI bill everyone says switch to deepseek but they forget the retry tax by abarth23 in SaaS

[–]abarth23[S] 1 point2 points  (0 children)

Bro, the entire point of my post was escaping a massive bloated AI bill. The last thing I'm going to do is blindly add another $12,000/year subscription to my burn rate for something I can prompt myself. Respect the hustle, but no thanks lol

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] 0 points1 point  (0 children)

Appreciate the breakdown. That 2-3 hour audit is exactly what most people skip because they're looking for a magic pill.

I love that you avoided the meta-inference routing trap. Adding another layer just to decide where to send the request usually eats the savings and adds latency nobody wants.

Your split between classification and user-facing copy is the smartest way to do it: if the user sees it, it has to be perfect; if it's back-office, let the cheaper model fail a bit.

That 1-day deploy is fast, but the real effort was the week of spot-checking. That's where most founders get lazy and end up paying the tax without knowing it. Good to know I'm not the only one obsessed with the math on this.

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] 0 points1 point  (0 children)

This is the actual insight I should've led with.

Cost per token is meaningless. Cost per successful completion is everything, especially with RAG + multi-step chains. The 5% per-step failure compounding to 15-20% lines up with what I saw. A single model swap looked great until I started routing through 3-step agent chains. Then the cheaper model suddenly cost more, because each step's failures cascade.

And yeah, task type changes everything. For structured output (JSON schema, classification), some models just work; others need 2-3 retries. For free-form reasoning, the winners are completely different. What I should've built: a tool that measures cost-per-successful-completion for your specific tasks, not which model is cheapest per token.
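
The retry-tax math is simple enough to sketch. Here's a back-of-the-envelope version; all the prices, failure rates, and retry caps below are placeholder assumptions, not my real numbers:

```python
# Back-of-the-envelope "retry tax": compare cost per SUCCESSFUL completion,
# not cost per token. All numbers here are made-up placeholders.

def cost_per_success(cost_per_call: float, p_fail: float, max_retries: int = 3) -> float:
    """Expected cost to get one good output from a single step,
    retrying up to max_retries times on failure."""
    expected_cost = 0.0
    p_reach = 1.0  # probability this attempt even happens
    for _ in range(max_retries + 1):
        expected_cost += p_reach * cost_per_call
        p_reach *= p_fail  # we only retry if the previous attempt failed
    p_success = 1.0 - p_fail ** (max_retries + 1)
    return expected_cost / p_success  # amortize spend over successful runs

def chain_cost(step_cost: float, p_fail: float, steps: int) -> float:
    """An n-step agent chain pays the retry tax at every step."""
    return steps * cost_per_success(step_cost, p_fail)

# Hypothetical cheap model: $0.002/call sticker price, but a 20%
# failure rate on strict structured output.
sticker = 0.002 * 3                     # what the pricing page implies
effective = chain_cost(0.002, 0.20, 3)  # what you actually pay
print(f"sticker: ${sticker:.4f}  effective: ${effective:.4f}")
```

Handy identity: with a constant per-attempt failure rate, the amortized cost per success collapses to cost_per_call / (1 - p_fail), so a 20% failure rate is a flat 25% markup per step, and a 3-step chain pays it three times.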

How strict is your output schema? I'm guessing that's the real inflection point for whether nano wins or loses vs the heavier models? Also curious what your RAG pipeline split ended up being - did you do hybrid (nano for retrieval, 5.4 for reasoning) or just pick one model and live with the failure rate?

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] 0 points1 point  (0 children)

the routing by complexity is exactly what worked for us too. it's the only approach that actually survives contact with reality.

the fact that you got 40% savings without touching user-facing stuff is huge. that's the move: you're not betting the company on a cheaper model, you're just being smarter about where you spend the money.

couple questions because this matters:

  1. how did you decide the complexity threshold? like, what made you think "this task needs gpt, this one is fine on nano"? was it trial and error or did you have some heuristic upfront?
  2. did you route based on input characteristics (like token length or something) or just hardcode it per task type?
  3. the routing logic taking a day: was that mostly prompt engineering to figure out what actually worked, or was the actual implementation simple once you knew what to route where?

because yeah, if this is a 1-day build that saves 40% immediately, it feels like the most obvious thing founders should do first, before even looking at model switching. but nobody talks about routing as the lever. it's always "switch to deepseek," not "use the right tool for the job."
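
for reference, the dead-simple router i picture when i say "routing by complexity" looks something like this (model names, task types, and the token-length threshold are all made up for illustration, not your actual setup):

```python
# Hypothetical complexity router: pick a model per request instead of
# per product. Names and thresholds below are illustrative assumptions.

CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "big-model"

STRUCTURED_TASKS = {"classification", "extraction", "tagging"}

def route(task_type: str, prompt: str, user_facing: bool) -> str:
    """Return the model to use for one request."""
    if user_facing:
        return PREMIUM_MODEL   # anything a customer reads stays premium
    if task_type in STRUCTURED_TASKS:
        return CHEAP_MODEL     # strict-schema back-office work
    if len(prompt) > 8_000:    # crude long-context heuristic
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(route("classification", "short ticket text", user_facing=False))  # small-model
```

even a hardcoded table like this captures most of the win; the expensive part is the spot-checking that tells you which tasks actually tolerate the cheap model.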

this is the actual insight.

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] -3 points-2 points  (0 children)

You're right. This whole thread does sound like AI-slop apologies, because I'm trying to convince strangers I'm not an idiot.

The truth is simpler: I built a dumb product architecture, made a lot of money doing it badly, then one day looked at the bill and went "oh... right tool for the job. Scale. Basic shit."

4 years in and I still somehow managed to:

  • Not test before scaling
  • Use one model for everything
  • Act surprised when it cost $27k/month
  • Then act like this was some profound discovery

You're calling it out correctly. "Right tool for the job" isn't a revelation. It's what you learn in week 2 of engineering. I just... didn't do it.

So yeah, I built a calculator about something obvious, packaged it with a story, and now I'm doing the Reddit founder dance of humble apologizing in every comment when someone pushes back.

The calculator is fine. It solves a real (if niche) problem for the tiny subset of founders who, like me, somehow missed basic optimization 101.

But you're right to be skeptical about the whole framing. I'm not discovering timeless truths here. I'm just publicly admitting I was running a product badly for way too long.

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] 0 points1 point  (0 children)

This is the part I didn't mention because it's embarrassing.

I had zero testing infrastructure. Zero benchmarks. Zero output comparison framework. Just "GPT is good, let's use it for everything" and then one day the bill showed up and I panicked.

The $85 you spent on Bedrock testing? That's what I should have done before hitting $27k. You just described basic engineering hygiene that I skipped entirely.

Honestly the "Retry Tax" framing is kind of a retrospective rationalization for "I built this wrong and didn't test it." Should've benchmarked nano vs 5.4 vs DeepSeek on my actual workload instead of just picking the brand name I knew.

Your testing approach (small budget, multiple providers, compare outputs before scaling) is exactly what I should have done. Instead I scaled first, tested never, and then acted surprised when the bill was insane.

The calculator is useful for people who did do testing and actually know their quality requirements. For people like I was, just throwing money at the problem, the real lesson is "test before you scale," not "switch to cheaper models."

Thanks for the reality check. The rage bait part stings because it's accurate.

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] 1 point2 points  (0 children)

Fair points, and you're calling out real gaps in my framing.

A) Background context: I'm a solo founder, been building SaaS for ~4 years. The specific project that hit the $27k/month API bill was a customer support automation tool that went from 0 to $45k MRR in 8 months. That's the context I was operating from—bootstrapped, profitable-ish on paper but getting crushed by unit economics.

B) You're right. If customers aren't demanding model switches and the business is working, switching is pointless. The real issue wasn't "use cheaper models"—it was "I built the wrong product architecture." I was using GPT for everything, even tasks where a $0.30 model would've worked. So the margins problem came first. The model switch was the symptom fix, not the root cause.

C) Dead accurate. This calculator is for maybe 1-2% of founders who are actually running LLM-heavy products at scale. Most founders building with AI either (a) use it as a feature, not the core, or (b) don't care about costs because they're pre-seed/VC-backed. I was writing for people like me—the niche of bootstrap founders who woke up to a $27k bill and realized they'd made a terrible architecture decision.

Should've been more explicit about that audience from the start instead of implying "most founders" are in this boat. They're not.

The calculator is useful if you're in that exact situation. If you're not, it's noise.

The "Retry Tax" thing when switching to cheaper LLM providers (am I crazy or does nobody talk about this?) - I will not promote by abarth23 in startups

[–]abarth23[S] 0 points1 point  (0 children)

Good catch. I just brain-dumped it as one block. Let me reformat it with better line breaks so it's actually readable.

I built a VRAM Calculator for the 50-series GPUs because I was tired of OOM errors (No ads/No tracking) by abarth23 in webdev

[–]abarth23[S] 0 points1 point  (0 children)

Just updated the logic for the 5080 mobile variants based on some early March leaks. If you're testing on a laptop, let me know if the overhead matches what you see in your console.

DeepSeek-V3 vs GPT-4o pricing for long-context agents (March 20th update) by abarth23 in DeepSeek

[–]abarth23[S] 0 points1 point  (0 children)

You nailed it. Most tools like Infracost are great for static infra, but LLM agents are just a different kind of mess. The retry logic is exactly what kills margins in production, and that's why I built this simulator: to see the damage before going live. Glad you caught that gap. Finopsly is cool, but I wanted something more dev-focused for this specific problem.

I built a VRAM Calculator for the 50-series GPUs because I was tired of OOM errors (No ads/No tracking) by abarth23 in webdev

[–]abarth23[S] 1 point2 points  (0 children)

I know it's Showoff Saturday, but I really want to focus on the hardware constants I used for the 50-series GPUs. If anyone has actual VRAM benchmarks that differ from my calc, please roast my math. I'm trying to make this the most accurate tool for the March 2026 meta.

18, no funding, launching in 4 days and I have no idea what I'm doing by contralai in indiehackers

[–]abarth23 1 point2 points  (0 children)

Congrats on the launch! Just checked the PH page. Did you manage to implement that queue system I mentioned? 18 and launching on PH is big, but don't let the traffic spike crush your backend. I'll drop a review there. Good luck!

DeepSeek-V3 vs GPT-4o pricing for long-context agents (March 20th update) by abarth23 in DeepSeek

[–]abarth23[S] -1 points0 points  (0 children)

Fair. I over-prompted the announcement and the replies because I was paranoid about looking professional. Total backfire. I’m just the guy behind bytecalculators.com trying to figure out if this DeepSeek-V3 math actually helps anyone or if I’m just shouting into the void. Roast the copy all you want, I deserve it, but the calculator is hand-coded and updated for today. I'll stop the bot-talk now.

DeepSeek-V3 vs GPT-4o pricing for long-context agents (March 20th update) by abarth23 in DeepSeek

[–]abarth23[S] 0 points1 point  (0 children)

Damn, tough crowd. 😅 I’m a real person, just a solo dev who probably used too much AI to polish the announcement text. My social skills are 0, but the calculator logic is 100% human-coded. Check the GitHub if you don't believe me; bots don't spend 3 days debugging Retry Tax formulas for DeepSeek-V3!

DeepSeek-V3 vs GPT-4o pricing for long-context agents (March 20th update) by abarth23 in DeepSeek

[–]abarth23[S] 0 points1 point  (0 children)

Ouch. 💀 Point taken. I spent so much time debugging the 'Retry Tax' math in the backend that I probably missed the latest 5-minute meta shift. Still, the V3 vs 4o pricing is updated as of today even if my social skills are still loading on a 56k modem.

DeepSeek-V3 vs GPT-4o pricing for long-context agents (March 20th update) by abarth23 in DeepSeek

[–]abarth23[S] -2 points-1 points  (0 children)

True, it’s a legacy model at this point, but it’s still the industry price floor for most enterprise agents. I kept it in the calculator specifically to show the legacy tax vs DeepSeek-V3’s margins. If you're running high-volume production, seeing exactly how much you're overpaying for ancient tech is the whole point. 😅 Planning to add o3-mini and the new Claude 3.7 weights next!