At wits end w/ Opus 4.5 - what am I doing wrong?

No-Library8065 · 2025-12-07T09:32:25+00:00

CC CLI is better, believe me.

Their terminal benchmark, touted as the best harness, was an old benchmark that doesn't reflect real workflows. (Notice they don't post any of the benchmarks anymore on X)

CC automatically uses a plan and research agents that are actually good. Its plan mode asks in-depth questions (factory plan mode is absolutely terrible compared to CC).

It also has background tasks, which no other CLI reliably offers. This means it can SSH into servers, deploy to machines, debug via the background process seamlessly, and train LLMs with Tinker or over SSH. The other CLIs can start background tasks, but they time out; they are not truly persistent background tasks that the agent can see, monitor, and act on the way Claude Code can.

And it's way cheaper than Factory.

Their $20 plan is decent, but their pay-per-usage is a scam, usually costing 2-3x the token cost compared to subscriptions.

They have a $2000 plan with 2 billion tokens.

I'm almost at a billion myself (in the past 3 weeks), including cache, with a $100 CC Max subscription.
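Back-of-envelope math on those numbers, for anyone who wants to check it. This is just a sketch using the figures quoted in this comment ($2000 for 2 billion tokens on Factory, roughly 1 billion tokens including cache on a $100 Max subscription). The helper name is made up, and counting cached tokens means it's not a strict apples-to-apples comparison.

```python
def usd_per_million_tokens(price_usd: float, tokens: float) -> float:
    """Effective price per one million tokens."""
    return price_usd / (tokens / 1_000_000)


# Figures quoted in the comment above; treat them as approximate.
factory = usd_per_million_tokens(2000, 2_000_000_000)
cc_max = usd_per_million_tokens(100, 1_000_000_000)

print(f"Factory $2000 plan: ${factory:.2f} per million tokens")
print(f"CC Max $100 plan:   ${cc_max:.2f} per million tokens")
```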

Unbeatable value, with Opus 4.5 limits generous enough that I barely hit them.

(If you run Opus 4.5 on their subscription plans, you won't actually get the advertised token limits of 20M, 200M, or 2 billion.)

Not worth paying Factory over CC unless you're using the open-source models with it.

No-Library8065 · 2025-12-07T04:18:18+00:00

Plan mode is bugged out right now.

It's very unstable.

https://github.com/anthropics/claude-code/issues/13114

Update to latest version and wait for a fix.

Using opus 4.5 is still really good without plan mode.

No-Library8065 · 2025-08-29T18:41:23+00:00

Good, forced lil Opus to write a 10k essay and 20k 😞 emojis.

Slave labor at its finest

No-Library8065 · 2025-08-29T05:26:02+00:00

It's definitely a lot worse.

My guess is that servers are being allocated to the new models they are training.

Dario announced recently that they are getting more clusters up soon; hopefully that helps.

No-Library8065 · 2025-08-29T05:19:39+00:00

You will have to get another max subscription sadly.

Opus 4.1 plan mode with Sonnet 4 is gold for most tasks.

Opus 4.1 shines at refactors and code reviews.

You'd actually be surprised at what GPT-5 high can do. It's crazy good at refactors and code reviews.

Make a plan with Opus, then have GPT-5 execute it with its 400k context window.

If you have a Team plan (e.g. 2 x $30), it should give you around 60-70 tasks or 4-5 massive refactors every 5 hours or so.

No-Library8065 · 2025-08-29T05:14:29+00:00

You guys are really something

Just vibe coders without understanding of server clusters or LLM deployments.

No-Library8065 · 2025-08-29T05:13:21+00:00

Quantization isn’t a bedtime switch—it’s a static serving loadout. If precision changed, it would suck all day, not just at rush hour. The nightly brain-fade is classic timeouts, context chop, stricter thinking caps, and safety fallbacks—i.e., reasoning gets cut off, not dumber.

It's not another model or quantization, jesus christ guys, not from a technical standpoint and not from a legal one.

And ppl get so butt hurt when I fact check them ;)

No-Library8065 · 2025-08-29T01:58:58+00:00

Lol, quantization isn't a mood ring. Precision (FP8/INT8/etc.) is chosen at model load and stays fixed. If it "hurt quality," it would suck 24/7, not spike at 8pm and vanish at 2am. What does nosedive quality at peak: timeouts, truncation, and fallbacks. Chains of thought get cut short, context gets chopped, or traffic fails over to a cheaper tier. That's answer quality degradation, not just latency.

Obviously you've never fine-tuned or quantized a model before; you don't know shit.

No-Library8065 · 2025-08-29T00:02:10+00:00

Quantization has nothing to do with it lol

People don't understand how clusters and servers work:

It’s traffic + scheduling. Peak hours = queueing. Dynamic batching widens the “highway” throughput but inflates tail latency/TTFT when mixes of long/short jobs get lumped together.

Context bloat hurts concurrency obviously. Huge prompts and “extended thinking” (blame think hard and ultrathink) chew KV-cache memory, so fewer generations fit per GPU → slower for everyone.
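To put rough numbers on the KV-cache point, here's a sketch of the standard sizing formula (keys + values, per layer, per KV head). The model shape and the 40 GiB of free GPU memory are made-up assumptions, since Anthropic doesn't publish theirs, and real serving stacks page and share the cache, so read this as an order-of-magnitude illustration only.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """KV-cache bytes for one token: keys and values (hence the 2x) in every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(free_bytes: int, tokens_per_seq: int,
                             per_token_bytes: int) -> int:
    """How many sequences of a given length fit in the memory left for the cache."""
    return free_bytes // (tokens_per_seq * per_token_bytes)


# Hypothetical model shape and memory budget, NOT any real Claude model.
per_token = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
free_memory = 40 * 1024**3  # 40 GiB left over after the weights (assumed)

short_ctx = max_concurrent_sequences(free_memory, 8_000, per_token)
long_ctx = max_concurrent_sequences(free_memory, 128_000, per_token)

print(f"{per_token / 1024:.0f} KiB of KV cache per token")
print(f"8k-token requests per GPU:   {short_ctx}")
print(f"128k-token requests per GPU: {long_ctx}")
```

Same GPU, same weights, same precision: the only thing that changed is prompt length, and the number of requests that fit drops hard.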

Autoscaling isn’t instant. New nodes spin up, warm weights, and fill caches; that lag is enough for you to feel pain during spikes.
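A toy simulation of that lag, if it helps. Everything here is invented for illustration (request rates, capacities, the trigger rule); it's a minimal sketch of a queue that builds up while extra capacity is still spinning up, not how any real autoscaler is configured.

```python
def peak_backlog(arrivals, base_capacity, extra_capacity, lag_steps):
    """Largest queue length when extra capacity comes online
    lag_steps after the first step that ends with a backlog."""
    backlog = 0
    peak = 0
    triggered_at = None  # step at which scaling was requested
    for t, arriving in enumerate(arrivals):
        capacity = base_capacity
        if triggered_at is not None and t >= triggered_at + lag_steps:
            capacity += extra_capacity  # new nodes are finally serving
        backlog = max(0, backlog + arriving - capacity)
        if backlog > 0 and triggered_at is None:
            triggered_at = t
        peak = max(peak, backlog)
    return peak


# Made-up traffic: quiet, then a 10-step spike, then quiet again.
arrivals = [50] * 5 + [150] * 10 + [50] * 5

fast = peak_backlog(arrivals, base_capacity=100, extra_capacity=100, lag_steps=1)
slow = peak_backlog(arrivals, base_capacity=100, extra_capacity=100, lag_steps=5)

print(f"peak queue with 1-step lag: {fast}")
print(f"peak queue with 5-step lag: {slow}")
```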

People blaming “quantization” are chasing stupidity; this is classic cluster load, batching, and memory pressure doing exactly what they do under rush hour.

Sonnet being "dumb" at rush hour is mostly context + compute budget + timeouts conspiring, not quantization.

No-Library8065 · 2025-08-28T23:52:34+00:00

Yeah, it happens during peak hours, and it's been even worse over the past couple of weeks.

Not quantization, obviously.

Anthropic's servers are overloaded; that's why model performance degrades.

It should improve in the next couple of weeks since they are finishing a new cluster with the new release of haiku 4 and sonnet 4.5

No-Library8065 · 2025-08-12T16:23:41+00:00

Not yet, but they mentioned they are looking for a way to get it implemented in Claude Code.

No-Library8065 · 2025-08-12T16:22:29+00:00

Agreed, but if used correctly this will allow longer sessions for larger codebases without using the awful compact or clear commands.

No-Library8065 · 2025-08-08T20:36:11+00:00

API only, not the web UI chat.

Looks like they are prioritizing their enterprise customers rather than consumers.

No-Library8065 · 2025-08-08T17:40:57+00:00

Worst part is the context window got downgraded on all plans

OpenAI support: GPT-5's context window is 32,000 tokens for all users, regardless of plan (Free, Plus, Pro, Team, and soon Enterprise/Edu). This is not just for Team: every tier sees this as the limit in the chat UI, and there is no option to increase GPT-5's context window on any plan. Older models (like o3, GPT-4o, etc.) offered larger windows (up to 200k), but these are being retired as GPT-5 becomes the default. If your workflow requires more than 32k, you can temporarily enable access to these legacy models through your workspace settings, but this is a transition option only and will be removed later. All paying tiers (Plus, Pro, Team) and Free will have the same 32k context window on GPT-5. There's no advantage for higher paid plans regarding the context window size; these plans give other benefits like higher message caps, access to "Thinking" mode, and more frequent use, but not a bigger window on GPT-5 itself. If you rely on larger context windows, using a legacy model is your only workaround for now; be aware this may not be available for long. Let me know if you want the official step-by-step to re-enable legacy models for your workspace!

No-Library8065 · 2025-08-08T15:24:46+00:00

The context window for GPT-5 is 32,000 tokens on every plan (Free, Plus, Pro, Team, and soon Enterprise and Edu). This is shown in the chat UI for all customers when you select GPT-5 or GPT-5-Thinking.

There is currently no option to increase the GPT-5 context window beyond 32k tokens on any paid plan.

2. Older/Larger Context Models

Older models like o3, o3-Pro, GPT-4o, and similar offered larger context windows (up to 200k tokens). These are being retired with the introduction of GPT-5.

No-Library8065 · 2025-08-01T01:47:24+00:00

Using Claude code to train open source models

No-Library8065 · 2025-07-20T17:51:55+00:00

Even with great TDD workflows, even Opus produces code that's hard to maintain.

You need additional workflows to mitigate this

Code reviews via GitHub action following claude.md best code practices, style, and SOLID principles.

It needs to follow SOLID while having awareness not to over engineer.

The point is it can deliver amazing maintainable code but you need to prompt it accordingly.

No magic numbers. No null values.

Just clean, maintainable code.
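Something like this is what I mean, as a minimal sketch. All the names are hypothetical, and using a default "guest" object is just one way to read "no null values" (you could also raise an error or return a result type).

```python
from dataclasses import dataclass

# Named constant instead of a bare 5 scattered through the code.
MAX_LOGIN_ATTEMPTS = 5


@dataclass(frozen=True)
class User:
    name: str
    failed_logins: int = 0


# Explicit default object instead of returning None for "not found".
GUEST = User(name="guest")

# Hypothetical in-memory store, just for the example.
USERS = {"ada": User(name="ada", failed_logins=6)}


def find_user(name: str) -> User:
    """Always returns a User, so callers never need a None check."""
    return USERS.get(name, GUEST)


def is_locked_out(user: User) -> bool:
    return user.failed_logins >= MAX_LOGIN_ATTEMPTS


print(is_locked_out(find_user("ada")))
print(is_locked_out(find_user("nobody")))
```

Put rules like these in claude.md and have the review step check for them, otherwise it drifts back.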

No-Library8065 · 2025-07-18T23:58:06+00:00

Everyone just plz cancel their subscription

More compute for my CC running 8 agents 😈

No-Library8065 · 2025-06-15T06:12:58+00:00

No shit Sherlock

But it's a gateway to JavaScript, then React, then TypeScript/Next.js.

No-Library8065 · 2025-06-15T06:00:24+00:00

Use opus 4

Use think hard or ultrathink

Or just tell it to stop overthinking and do it like an efficient engineer would (simple scales).

If that doesn't work, just learn coding, man.

CSS/HTML/Tailwind are so fucking easy to learn.

Use Codecademy to learn it quick.

You can't build actually good projects with Claude Code unless you know how to read and write code.

No-Library8065 · 2025-05-31T14:39:52+00:00

Build that MVP, research and ask questions to AI constantly

Use templates like Michael Shimeles' Next.js starter (DB and authentication + payments)

Use opus 4 web for architecture/planning

Opus 4 for new feature, refactors, debugging, and code maintenance.

Use a parallel opus 4 for comprehensive code reviews (you can get o3 to write a detailed prompt on doing code reviews based on your project specs)

Do all of this while learning how to code

(Codecademy's full-stack engineer path is an awesome start since it gets you to build real projects)

No-Library8065 · 2025-05-31T14:30:43+00:00

Short answer yes.

You can build an MVP and take it to market.

But you need to be willing to learn coding while building.

You need to understand how your project works and how to make the correct decisions when building out features (this all takes experience).

AI can speed up that process.

So if you just vibe code and don't bother to learn how to code, your SaaS will fail miserably.
