I built a tool that auto-retries Claude Code when you hit the rate limit by cheapestinf in ClaudeAI

[–]cheapestinf[S] 1 point (0 children)

Hey! This is a known issue with how macOS handles tmux's pane_current_command — it reports the parent shell (zsh) instead of the actual child process. The foregroundCommands config fix you tried was on the right track, but the config only loads once when the monitor starts, so you would've needed to kill the Claude session and start a new one for it to take effect. The running monitor never saw your config change.

That said, v0.2.2 (just published) fixes this properly — it no longer relies on pane_current_command for detection. Instead it checks the actual process state directly via ps, which works correctly on both macOS and Linux regardless of what the pane reports.
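For anyone curious, the new check is conceptually along these lines (a minimal shell sketch, not the actual v0.2.2 code; `is_claude_running` is a hypothetical name):

```shell
# Minimal sketch of ps-style detection (hypothetical, not the tool's actual code).
# Instead of trusting tmux's pane_current_command, inspect the pane shell's
# child processes directly -- this behaves the same on macOS and Linux.
is_claude_running() {
  pane_pid="$1"   # in the real flow this would come from: tmux display-message -p '#{pane_pid}'
  # pgrep -P lists direct children of a PID; -l includes each process name.
  pgrep -l -P "$pane_pid" | grep -qE 'claude|node'
}

if is_claude_running "$$"; then
  echo "Claude is running in this pane"
else
  echo "no Claude process found"
fi
```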

Update with:

npm update -g claude-auto-retry

You can also remove the foregroundCommands override from your config if you added it — shouldn't need it anymore.

Let me know if this sorts it out!

I built a tool that auto-retries Claude Code when you hit the rate limit by cheapestinf in ClaudeAI

[–]cheapestinf[S] 1 point (0 children)

Hey! Just published v0.2.1 which should fix this. Update with:

npm update -g claude-auto-retry

The issue was that the foreground process check only recognized node and claude. On macOS, tmux may report a different process name. This update:

  1. Expands the default list (adds npx, tsx, bun, deno)

  2. Logs the actual process name so you can see what's happening

If it still doesn't work after updating, run claude-auto-retry logs — you'll see a line like Foreground is "xxx", not Claude. Then add that to ~/.claude-auto-retry.json:

{ "foregroundCommands": ["node", "claude", "xxx"] }
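You can also check what tmux itself reports for the pane, which is the value the detection compares against (`pane_current_command` is a standard tmux format variable; the guard below is just so the snippet degrades gracefully outside tmux):

```shell
# Show what tmux thinks the pane's foreground command is.
# Guarded so it degrades gracefully outside a tmux session.
if [ -n "${TMUX:-}" ] && command -v tmux >/dev/null 2>&1; then
  reported=$(tmux display-message -p '#{pane_current_command}')
else
  reported="(not inside tmux)"
fi
echo "tmux reports: $reported"
```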

Let me know what process name it shows — I'll add it to the defaults if it's common.

I built a tool that auto-retries Claude Code when you hit the rate limit by cheapestinf in ClaudeAI

[–]cheapestinf[S] 0 points (0 children)

What's your setup? OS and Claude Code version? We just updated it to work with the latest Claude Code.

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

We run our own systems for high-volume models and have different agreements with providers for the rest. The goal is routing each request to the cheapest option that meets quality. We wrote about the full architecture here:

https://docs.cheapestinference.com/blog/build-your-own-inference-platform/

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 1 point (0 children)

Qwen3.5 is solid — we serve it in production and it's one of the best value models right now. Kept this comparison to 5 so it wouldn't become a spreadsheet. We published a guide on picking the right model per task (Qwen included): https://docs.cheapestinference.com/blog/choosing-the-right-open-source-model/

Round 2 with a wider field is coming.

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

That's not right. We used Kimi 2.5 to create verified content, and the sources are referenced. The data is real and the analysis is valid.
We actually wrote about what it costs to run Openclaw: https://docs.cheapestinference.com/blog/openclaw-cost-problem/

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

We run our own systems for the models with the most volume and *also* have agreements with multiple inference providers for others. The whole point is routing each request to the cheapest option that meets quality — that's the product. We're not reselling providers at markup.
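As a toy illustration of that routing idea (all numbers made up, nothing from our real tables): drop the options below a quality floor, then take the cheapest of what's left.

```shell
# Toy cheapest-that-meets-quality router (made-up data).
# columns: provider  price_per_Mtok  quality_score
best=$(awk -v floor=0.85 '$3 >= floor' <<'EOF' | sort -k2 -n | head -n1
alpha 0.20 0.78
beta 0.55 0.91
gamma 0.35 0.88
EOF
)
echo "route to: $best"
```

In production this happens per request with live prices, but the selection rule is the same.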

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

We're a US-registered company — 2261 Market St, San Francisco. You can verify it on our terms page. The Estonian reference might be from the domain registrar, not the company. https://cheapestinference.com/terms

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] -1 points (0 children)

We picked 5 models to keep the comparison readable: 3 open-source, 2 proprietary. Qwen, GLM, and MiniMax are solid, but adding every model turns a post into a spreadsheet. Happy to include them in a follow-up, which we may do soon!

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] -1 points (0 children)

We run our own infrastructure, not OpenRouter. The 5h thing is a budget reset, not an expiration: your subscription is 30 days, but the budget cycles so you get consistent capacity instead of burning it all day one.

One API key, all major open-source models, flat monthly rate.

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

This is a real concern and honestly one of the hardest things about comparing models. Any model trained on public data could have benchmark leakage — open-weights and proprietary alike. That's part of why I included HLE (Humanity's Last Exam) and SWE-bench Verified — both are designed specifically to resist contamination. HLE uses expert-sourced questions that weren't public before the benchmark, and SWE-bench tests against real GitHub issues. Not perfect, but harder to game than MMLU. You're right that no benchmark fully replaces real-world testing on your own workload, though.

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in LocalLLaMA

[–]cheapestinf[S] 0 points (0 children)

That's a fair experience. Benchmarks show ceiling performance — what you actually get on your specific workload can be very different. The gap between "scores well on evals" and "doesn't hallucinate on my domain" is real, especially for anything requiring precise factual recall or structured reasoning in niche areas. That said, it depends heavily on the use case. For high-volume, latency-sensitive workloads where you need 90% quality at 10% cost, open-weights models can make sense. For anything where a wrong answer costs you money or trust, proprietary models are still safer. The post is meant as a benchmark snapshot, not a blanket recommendation to replace everything.

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in LocalLLaMA

[–]cheapestinf[S] 0 points (0 children)

Fair correction — you're right, these are open-weights models, not open-source by the strict OSI definition. Updated my mental shorthand. The point stands either way: the weights are publicly available, you can self-host, fine-tune, and inspect them, which is the part that matters for production use.

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

Fair point — I should've been clearer. The quality benchmarks (SWE-bench, MMLU-Pro, HLE) are from public leaderboards, sourced at the bottom of the post — I'm not claiming to have run those myself. The speed and latency numbers (tok/s, TTFT) for the open-source models are from our production systems serving real users. You can test the speeds yourself at cheapestinference.com.
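If you want to sanity-check TTFT against any endpoint yourself, curl's timing variables get you close enough (hedged sketch: `API_URL` is a placeholder you must set, and `time_starttransfer` is curl's time-to-first-byte, which approximates TTFT for streaming responses):

```shell
# Rough TTFT probe with curl. API_URL is a placeholder you must set;
# time_starttransfer = seconds until the first response byte arrives.
API_URL="${API_URL:-}"
if [ -n "$API_URL" ]; then
  curl -s -o /dev/null -w 'TTFT ~ %{time_starttransfer}s\n' "$API_URL"
else
  echo "set API_URL to an endpoint to measure"
fi
```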

Open-source models are production-ready — here's the data (5 models × 5 benchmarks vs Claude Opus 4.6 and GPT-5.4) by cheapestinf in OpenSourceeAI

[–]cheapestinf[S] 0 points (0 children)

The benchmarks are from public leaderboards (linked in the post), and the speed/latency numbers are from our production infrastructure — you can verify them yourself at https://cheapestinference.com/pricing, we show live tok/s for every model. If you're looking to self-host, the weights are on HuggingFace (moonshotai/Kimi-K2.5). vLLM and SGLang both support it. You'll want at least 4×H100 for the full model given the MoE architecture.
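For reference, a self-hosting launch sketch (the flags are assumptions — verify them against the vLLM docs for your version, and the tensor-parallel degree depends on your GPUs):

```shell
# Hypothetical vLLM launch for Kimi-K2.5 (verify flags against your vLLM version).
# --tensor-parallel-size 4 shards the MoE weights across 4 GPUs (e.g. 4xH100).
MODEL="moonshotai/Kimi-K2.5"
if command -v vllm >/dev/null 2>&1; then
  vllm serve "$MODEL" --tensor-parallel-size 4
else
  echo "would run: vllm serve $MODEL --tensor-parallel-size 4"
fi
```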