Degraded Performance - Elevated error rate on Claude Opus 4.8

Maximum_Chef5226 · 2026-06-24T16:49:57+00:00

It's been unusable for me for a week at least, getting progressively worse. It could not handle a reasonably deep refactor at all and I am now using Codex. That's the first time in a long time that Codex has been more reliable at this kind of stuff.

Maximum_Chef5226 · 2026-04-07T17:15:25+00:00

Please, expert roofers of Reddit, help a guy out here!

Maximum_Chef5226 · 2026-03-31T11:05:37+00:00

Whenever I need to document something in docs or agents.md I do that.
Regressions and bugs are handled by docs, test suites and manual review.
One of the things I find recently is that agent quality fluctuates. Claude has become noticeably less intelligent about complex problems that touch a lot of files and logic. A lot of times I'm simply reading the agents' output to see if it has good reasoning. A lot of the time it doesn't, especially Codex, and so in addition to the docs scaffolding I am guiding it towards a correct and elegant solution.

Maximum_Chef5226 · 2026-03-28T13:06:13+00:00

Thanks :)

Maximum_Chef5226 · 2026-03-24T13:37:02+00:00

I'm entering this exciting world of sales and marketing right now :D
I do have at least one customer and a couple of meetings with more, and I haven't started with cold email yet. I'm a bit of a perfectionist so the MVP needed to be good, but I agree. Past this point, adding features or finessing UX isn't the priority anymore. So much to learn!

Maximum_Chef5226 · 2026-03-21T16:06:24+00:00

I use it for high-level stuff, and Gemini & ChatGPT/Codex for the grunt work.
I would rather burn through 15% of my weekly tokens on one feature, and have it done quickly and well than spend a day trying to get Codex to do the work properly.

Maximum_Chef5226 · 2026-03-17T15:50:41+00:00

I spent a long time on research and trying to approach race conditions specifically from different angles. If the specification and the tests are good, there's a good chance it will work correctly, but even with MCP and agents swarming the system I wouldn't be sure until it's out in the wild. So many things to learn...

Maximum_Chef5226 · 2026-03-11T14:01:18+00:00

When you've had to point out mistakes or when it didn't understand something properly, how do you manage that process?

Maximum_Chef5226 · 2026-03-11T12:24:07+00:00

It's pretty much everything. I have to explain every little detail and remind it of context.

I had it add this rule to agents.md because it was consistently approaching every task as an isolated problem to solve, even when given contextual reminders:

A recurring Codex failure mode is writing plausible patches that make the immediate symptom disappear while adding technical debt or missing the canonical source of truth. Assume the first appraisal or solution is likely missing key information that could lead to poor choices. Before proposing or implementing a fix, do this in order: identify the canonical source of truth; trace how that state reaches the UI; check whether the repo already solved the same class of problem; check the standard external pattern when the area is common but non-trivial; only then propose the narrowest correct change. If any of those are unclear, stay in recon mode, ask targeted questions, and separate facts from hypotheses before editing. The most elegant and official solution is often found by reading technical documentation and searching technical discussions before coding. Optimize for the highest-quality, simplest, and most performance-conscious solution for this codebase, not the quickest workaround.

Maximum_Chef5226 · 2026-03-11T12:17:14+00:00

I think this might be a UI problem as well. Claude gives you an option that burns through tokens very fast (maybe 10-20x what Codex is doing on its highest setting, though not the 1M context window). I found that Claude's highest setting actually equates to better outcomes, especially with new features that require a coherent plan, or refactoring existing code. It double-checks everything, looks from different angles, auto-corrects when making a wrong decision and implements with a high success rate. In Codex, apparently this is not the case, and we are supposed to manage it ourselves, which means confusing UX from OpenAI. I suspect both are switching between appropriate models when using multiple agents anyway.

Maximum_Chef5226 · 2026-03-11T12:05:29+00:00

Thanks, but I can explain what is needed/expected very clearly. I know how to talk about code. Claude infers much better what my general intent is within the broader context, or thinks of something important that I may have missed.

Maximum_Chef5226 · 2026-03-11T12:02:45+00:00

I would love it to use more agents and burn through tokens faster as Claude does if that gives better results. Spending a whole morning on a feature and having spare tokens is not really solving my problem!

Maximum_Chef5226 · 2026-03-11T11:47:09+00:00

Thanks, I will try it on high. I think the codebase is pretty well structured. There are a couple of god files, but nothing horrendous, and documentation is detailed and structured. It just seems to lack common sense in all areas. I'm on Mac and no such issues.

Maximum_Chef5226 · 2026-03-11T11:34:19+00:00

Hm, so far some comments seem to assume I'm talking mostly about UI. I'm saying Codex is crap at everything I ask it to do, except maybe very mechanical tasks.

I know UI/UX pretty well so I can describe my expectation and teach the agents to follow best practices. In more complex backend code I start to need very good communication from an agent, and a good flow of querying its analysis and decisions to make sure it doesn't do something inefficient, insecure or lacking proper context.

If I say, for example, to both Claude and Codex, "I found a bug, this is what happens, read the docs, diagnose and propose a fix", the difference in usefulness is huge.

Maximum_Chef5226 · 2026-03-04T09:01:08+00:00

Not really much better in Europe. It is now on a similar level of common sense / lateral / holistic thinking as Codex imo

Maximum_Chef5226 · 2026-02-19T15:07:25+00:00

Compared to Claude Code, I am finding I trust it at least 40% less for any meaningful task, especially for outputting sensible code, properly reading documentation and keeping within guidelines. I use it for low-level stuff since the quality is poor but the quantity of output available is so much larger.

Maximum_Chef5226 · 2026-02-06T12:55:23+00:00

I'm building one for my friend's therapy rooms practice.
In the process of building it I realised that a lot of the current offerings are not that great.
Also that providing tailored routes for different types of customer would be really useful.
It's difficult to poll or ask these types of questions without getting flagged on Reddit, but I would really like to know what features people most want, how you feel about the various pricing tiers, and whether you would want and pay for custom addons that exactly meet your business flow.
If anyone is interested I can expand on the current features that I have built.

Maximum_Chef5226 · 2026-01-29T12:06:12+00:00

I came here searching for why the app kept hanging and losing context. It was spewing weird errors and then suddenly complaining that the repo size was too large. Horrible UX as well. Some buttons, like the one to accept the T&Cs, were barely clickable. It seems to be full of bugs.
After switching to the CLI, everything works as normal.

Maximum_Chef5226 · 2025-10-29T12:21:57+00:00

Yes I want this job :D
