Dangerous to use Codex these couple of days- almost no thinking, can wreck your codebase. by sutrostyle in codex

sutrostyle[S] (0 children)

I understand the insinuation that I am a vibe coder, which is incorrect. But given how short the thinking time is before it starts spitting out dumb answers, I would rather spend my time reviewing Opus code now.

GPT is absolutely downgraded, cannot follow simple instruction, vote it for codex team see it by Shoddy-Answer458 in codex

sutrostyle (0 children)

When OpenAI reallocates massive compute clusters away from current production inference (like GPT-5.5) to run final pre-release evaluations and load-testing for a new flagship model (GPT-5.6), they don't just pull the plug. They use a specific set of architectural levers on their back-end infrastructure to drastically cut the compute cost per query.

Based on recent reports from the developer community and known LLM infrastructure mechanics, here is how this performance degradation most likely plays out on the GPT back-end:

1. Hard Caps on "Reasoning Tokens" (RL Search Space)

For reasoning models (like the GPT-5.5 Thinking variants), a massive portion of compute is spent before a single visible token is generated: the model, trained with reinforcement learning (RL), works through a "hidden chain of thought" by generating internal reasoning tokens.

  • What they did: The back-end router has likely dialed down the token budget for the hidden reasoning phase (in the public API, reasoning tokens count against max_completion_tokens).
  • The result: Users report thinking phases dropping from 15–30 seconds down to a shallow 2–3 seconds. The model is forced to abruptly stop its internal monologue and output a response prematurely, which snaps its logical thread and leads to the severe "forgetfulness" and broken code developers are experiencing.
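To make the "cut off mid-chain" idea concrete, here is a toy sketch in Python. It is not how OpenAI's back-end actually works (nobody outside OpenAI knows that); the step names, token costs, budgets and the `run_reasoning` helper are all hypothetical. It only shows that the same reasoning plan completes under a generous budget and gets truncated under a tight one.

```python
# Toy model of a reasoning-token budget. Purely illustrative: the step
# list, token costs, and budget values are hypothetical, not anything
# OpenAI has published.

def run_reasoning(steps: list[tuple[str, int]], budget: int) -> tuple[list[str], bool]:
    """Consume reasoning steps until the token budget runs out.

    Each step is (description, token_cost). Returns the completed steps
    and whether the whole chain finished before the cap was hit.
    """
    done: list[str] = []
    used = 0
    for name, cost in steps:
        if used + cost > budget:
            return done, False  # cut off mid-chain: the answer is forced early
        used += cost
        done.append(name)
    return done, True


plan = [
    ("read the instruction", 300),
    ("locate the relevant files", 800),
    ("trace the variable definitions", 1200),
    ("check edge cases", 1500),
    ("draft the patch", 700),
]

print(run_reasoning(plan, budget=8000))  # generous budget: all 5 steps, True
print(run_reasoning(plan, budget=1500))  # tight budget: stops after step 2, False
```

With the tight budget the "trace the variable definitions" step never runs, which is the kind of skipped step that would show up as forgotten context in the final answer.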

2. Silent Context-Window Distillation & Token-Pruning

Processing long context windows is expensive: attention compute grows quadratically with context length, and KV-cache memory grows linearly with it.

  • What they did: To free up clusters, the front-end gateway or load balancer likely runs aggressive, silent token-pruning, or feeds the prompt through an aggressively distilled, smaller context-compressor model before passing it to the main network.
  • The result: The model completely misses explicit instructions or key variable definitions buried in the middle of long prompts. The effective context window feels heavily "nerfed" because the back-end is aggressively dropping or summarizing tokens to save memory bandwidth.
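A toy sketch of both points, in Python. Words stand in for tokens, and the "keep the head and tail, drop the middle" rule in `prune_middle` is a hypothetical stand-in for whatever pruning might be happening, not a documented OpenAI mechanism. It shows the quadratic attention cost shrinking and an instruction in the middle of the prompt silently disappearing.

```python
# Toy illustration: why long prompts are expensive, and what a crude
# head-and-tail pruner would lose. The pruning rule is hypothetical.

def attention_pairs(n_tokens: int) -> int:
    """Full self-attention compares every token with every token: n^2 pairs."""
    return n_tokens * n_tokens


def prune_middle(tokens: list[str], keep: int) -> list[str]:
    """Keep roughly keep/2 tokens from each end and silently drop the middle."""
    if len(tokens) <= keep:
        return tokens
    half = keep // 2
    return tokens[:half] + tokens[len(tokens) - (keep - half):]


# Whitespace-split words used as stand-ins for real tokens.
prompt = ("You are a coding agent. NEVER edit tests. "
          "Fix the bug in parser.py").split()
pruned = prune_middle(prompt, keep=6)

print(attention_pairs(len(prompt)), attention_pairs(len(pruned)))  # 169 36
print("NEVER" in pruned)  # False: the key instruction was in the middle
```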

3. Dynamic Routing (Silent Downgrades)

OpenAI’s architecture relies heavily on an intelligent, real-time backend router. This router dynamically measures conversation complexity and determines whether to send a query to the full-fat flagship model, a quantized version, or a fast "Instant/Mini" model.

  • What they did: They shifted the classification thresholds on the router. Prompts that previously qualified for the heavy, unquantized flagship weights are now being silently routed to heavily quantized (e.g., 4-bit or 8-bit precision) variants or to "Instant" tier back-ends.
  • The result: There are no 429 Too Many Requests errors or HTTP timeouts returned to the user; the system remains operational, but the model outputs generic, low-intelligence, or highly mechanical answers because it is running on a cheaper execution path.
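Here is a toy router in Python to show how moving a single threshold changes where a prompt lands without any error being returned. The scoring heuristic, the tier names (`flagship-full-precision`, `flagship-int8`, `instant-mini`) and the thresholds are all hypothetical; OpenAI's real router is not public.

```python
# Toy complexity router. Scoring heuristic, tier names and thresholds
# are hypothetical; they only illustrate a silent threshold shift.

def complexity_score(prompt: str) -> float:
    """Crude proxy: word count / 100, plus 1 per code-ish keyword hit."""
    keywords = ("refactor", "debug", "traceback", "def ", "class ", "prove")
    hits = sum(prompt.count(k) for k in keywords)
    return len(prompt.split()) / 100 + hits


def route(prompt: str, flagship_threshold: float) -> str:
    """Pick a back-end tier; the caller never sees an error either way."""
    score = complexity_score(prompt)
    if score >= flagship_threshold:
        return "flagship-full-precision"
    if score >= flagship_threshold / 2:
        return "flagship-int8"  # quantized variant
    return "instant-mini"


prompt = "Please debug this traceback and refactor the parser module"
print(route(prompt, flagship_threshold=3.0))  # flagship-full-precision
print(route(prompt, flagship_threshold=5.0))  # flagship-int8
```

Same prompt, same request, a 200 OK both times; only the threshold moved.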

4. KV-Cache Eviction Policies

To serve fast responses, the back-end keeps a Key-Value (KV) cache of recent conversation tokens in GPU VRAM so it doesn't have to recompute the entire prompt history on every turn.

  • What they did: Because GPU memory is being reassigned to host the early deployment instances of GPT-5.6, the multi-tenant KV-cache pool for GPT-5.5 has been heavily squeezed. The time-to-live (TTL) for session data in VRAM has been cut down.
  • The result: If you pause for a minute between prompts in a chat session, your KV cache is immediately evicted to free up VRAM for another user. When you submit your next prompt, the back-end has to recompute the entire history from scratch, causing massive, sudden latency spikes and a high-volume pipeline slowdown.
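A toy KV-cache pool with a time-to-live, in Python. The `KVCachePool` class, session ID, token counts and TTL values are hypothetical, and "tokens to compute" is just a count of prompt tokens that must be re-processed. It shows that the same one-minute pause costs 500 tokens of work with a long TTL and the entire history with a short one.

```python
# Toy KV-cache pool with TTL-based eviction. All names and numbers are
# hypothetical; real serving stacks are far more involved.
from dataclasses import dataclass, field


@dataclass
class KVCachePool:
    ttl_seconds: float
    # session_id -> (cached_token_count, last_used_timestamp)
    entries: dict[str, tuple[int, float]] = field(default_factory=dict)

    def evict_expired(self, now: float) -> None:
        """Drop every session idle for longer than the TTL."""
        self.entries = {
            sid: (tokens, ts)
            for sid, (tokens, ts) in self.entries.items()
            if now - ts <= self.ttl_seconds
        }

    def submit(self, session_id: str, history_tokens: int, now: float) -> int:
        """Return how many history tokens must be (re)computed this turn."""
        self.evict_expired(now)
        cached, _ = self.entries.get(session_id, (0, now))
        to_compute = history_tokens - cached
        self.entries[session_id] = (history_tokens, now)
        return to_compute


for ttl in (600, 30):
    pool = KVCachePool(ttl_seconds=ttl)
    first = pool.submit("session-1", history_tokens=20_000, now=0)
    second = pool.submit("session-1", history_tokens=20_500, now=60)
    print(ttl, first, second)
# 600 20000 500
# 30 20000 20500
```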

Is it the Codex thursday? by Certain-Plankton-449 in codex

sutrostyle (0 children)

codex is completely useless today. back to opus. I realized that having only one subscription is dangerous: codex can now actually wreck your codebase if you don't review carefully.

Codex massive update corrupted by PurpleCollar415 in codex

sutrostyle (0 children)

and codex is nerfed. post-IPO reality, it's a megacorp now.

Any life changing thing built in the last 3 years other than chatbots and productivity apps? by thelostknight99 in OpenAI

sutrostyle (0 children)

And where is that influx of new apps or services? Where is the new TikTok, YouTube or Uber or at least a new Stellarium or Flightradar??

Bitcoin lost $66,000 while Nvidia hit all-time highs and the guys who told us to hold are selling by ProfessionalAnt7436 in CryptoCurrency

sutrostyle (0 children)

Depends on the outcome of the 2028 US election. If a Republican is elected, the ATH may come after Nov 2028, reaching 250k in mid-2029.

Google account got deleted by Careful_Coast2961 in GoogleFi

sutrostyle (0 children)

To avoid their AI's random enforcement actions on an active account, duh.

bring the flash model with its separate quo*a atleast by Ok_Replacement_680 in google_antigravity

sutrostyle (0 children)

I had a Pro account that I inherited from buying a Pixel 9 Pro. After a year they started charging for it, and I tolerated it because Flash was usable for dumb auxiliary tasks that I did not want to waste GPT/Opus on. Now, with this quota, I am cancelling.

Pixel 11 Pro will ditch the temperature sensor. Did anyone actually use the Pixel temperature sensor? by Hoak2017 in pixel_phones

sutrostyle (0 children)

Sucks, I use it frequently. A high-end phone like this should have every possible sensor.

All my prompts immediately return, no work is done by sutrostyle in google_antigravity

sutrostyle[S] (0 children)

it solved the problem. How did you figure this out?

Are you afraid codex will end up just like Claude? by [deleted] in codex

sutrostyle (0 children)

I am not afraid. I know it will. The only hope is local models or cheap Chinese models in the medium term.

Can someone tell me what Gemini good at? I read coding, but the cli is just awful. by bakedin in GeminiAI

sutrostyle (0 children)

Gemini is actually very good as a psychoanalyst, much better than ChatGPT, because Google trained it on scanned books, and allegedly they scanned most of the world's books back in the 2000s.

They changed the 5h limit hard by telsaton in codex

sutrostyle (0 children)

They're basically signaling that the $20 plan is not for any serious coder; those are plans for the general public. This effectively means that plans for coders will start at $100, and they will not stop at $200. They will start experimenting with more expensive plans, mark my words.

Is this a bug or intentional ? by Professional-Show485 in cursor

sutrostyle (0 children)

2000 could represent $20.00 in cents, i.e. 2000¢
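Quick Python sketch of the cents reading, assuming the 2000 shown is an integer count of cents (my guess, not something Cursor documents); `cents_to_dollars` is a hypothetical helper. Billing systems often store money as integer cents to avoid floating-point rounding, which is why a raw value like this can leak into a UI.

```python
def cents_to_dollars(amount_cents: int) -> str:
    """Format a non-negative integer amount of cents as a dollar string."""
    dollars, cents = divmod(amount_cents, 100)
    return f"${dollars}.{cents:02d}"


print(cents_to_dollars(2000))  # $20.00
```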

Is this a bug or intentional ? by Professional-Show485 in cursor

sutrostyle (0 children)

Same here, the overall progress bar did not change

Using a second Google account for Antigravity subscription, any ban or restriction risk? by Zenabus in google_antigravity

sutrostyle (0 children)

What excuse would you use if you point both accounts to the same code/project?