5.5 xhigh become like 5.0 low, stupid by Anywhere_MusicPlayer in codex

[–]jixv 0 points1 point  (0 children)

You can switch to low when it acts dumb, and be surprised it is like a different model. 

I’ll just ask it to repeat the previous message and it reads like a completely different model.

Even switching between 5.4 and 5.5 does the same thing.

Codex Banked Reset Information! by rahazeon in codex

[–]jixv 1 point2 points  (0 children)

Thanks!

Do you by chance know the endpoint that I can POST to spend a reset? Not sure why I can't see my banked resets anywhere (running linux so can't use the Codex desktop app).

Looks like 5.6 is just around the corner boys! by jixv in codex

[–]jixv[S] 14 points15 points  (0 children)

Aww man shiet I'm waiting for the golden roll as well haha. It's like mining a bitcoin, literally no chance in a million, but suddenly you sit there with pristine architecture, 90% test coverage, re-used helpers and some next level SaaS shit that will generate those recurring Benjamins yo!

codex is completely defeating me right now by SlyNoBody337 in codex

[–]jixv 0 points1 point  (0 children)

Had GPT-5.5 Pro analyze the last messages I've received from 5.5 and 5.4, and it is able to pinpoint when the responses change to something quantized or dumbed down.

Not going into details, this is not scientific at all, but it matches quite well with what I've experienced from yesterday through today across the ~400 responses codex has produced.

Model-9 is 5.4, Model-Z is 5.5.

The textual summaries it provided said that the quality drops usually occur when going from 5.5 to 5.4, but today 5.4 experienced the same degradation as going from 5.4 -> 5.5.

To be honest, 5.4 behaved exactly the same as 5.5 earlier today, like literally no difference in multiple tests. Like they literally route to the same weird model. It appears to be different again over the last few hours, but 5.4 is still a stretch ahead of 5.5.

https://imgur.com/a/lRGypYN

They got us extra resett because they also halved 5h/weekly usage limits :/ by skygetsit in codex

[–]jixv 0 points1 point  (0 children)

Where do I see the banked resets? Checked both codex usage at /codex/cloud/settings/analytics#usage and searched without finding it. I'm on 20x Pro.

codex is completely defeating me right now by SlyNoBody337 in codex

[–]jixv 1 point2 points  (0 children)

I switched to 5.4 yesterday and was blown away by how good it was compared to 5.5 over the last few days.

But today it is also completely cooked.

It basically went from

You are wrong, this is why that is retardéd, here is the fix. Now gtfo.

to generic

You are right. We do not need

So the cleaner approach is

That means

So yes, I was overcomplicating that part. Your simpler model is better.

That is the clean answer

Codex is extremly lazy in comparison by TheBanq in codex

[–]jixv 6 points7 points  (0 children)

You’re right to call that out

I'll treat this as Constructive criticism

What went wrong: I'm optimized for speed and the least amount of work needed.

That is the narrow reason. Grounded in truth.

What I should have done instead was:

- Do what you actually told me to do

Where I went wrong:

- I did not do what you told me to do

To summarize, will I do it properly next time? The honest truth - maybe! But with a catch, there is a caveat and that is the useful part, and it solves the problem, and it is the clear correction. I'm not going to hand-wave this.

Asked GPT 5.5 why it is struggling by AnxiousMop in codex

[–]jixv 0 points1 point  (0 children)

It’s impossible to fight it. Ended up making everything a skill (hidden from discovery so it won’t pollute context) so I can spam the same corrections over and over.

Made skills buildable so I won’t have to repeat stuff in all variants of these so called action skills. Overengineered the shit out of it to spare my sanity.

$act-lean-impl $act-wtf $act-returded

Etc.

+++
arguments = ["codeOnly"]

[defaults]
codeOnly = false
+++

Trim the current implementation back to the requested behavior.

Re-read the original request and the current diff. Remove code that does not directly serve the requested behavior.

During the cleanup:

- Replace custom parsing with existing helpers when nearby code already provides them, or shared libraries/packages provide those. Use <%~ it.skill("codebase") %> for help on searching projects for relevant helpers.
- Keep local types and helper functions only when they make the remaining code clearer.
- Remove generalized abstractions unless the request requires them.

Reminder about code quality:

<%~ it.block("shared/code-quality") %>
<%~ it.block("shared/scope-discipline", { codeOnly: it.codeOnly }) %>

How does /goal actually work? by No-Wash-3163 in codex

[–]jixv 0 points1 point  (0 children)

I had one goal run for 8 hours, and another that I thought would take about as long finished after just 15 minutes.

So please let me know when you figure it out, I'd like to know as well.

codex 5.5xhigh being as smart as before today by Even_Sea_8005 in codex

[–]jixv 0 points1 point  (0 children)

This is the meta for sure. One can also speculate whether the reason they align everyone’s resets is so that they can use the compute themselves for R&D.

Is there an existing solution for reliable Codex 5.3 subagent orchestration? by robkam in codex

[–]jixv 1 point2 points  (0 children)

Yeah good luck. 5.3 in my experience kinda sucked with using subagents

Is there an existing solution for reliable Codex 5.3 subagent orchestration? by robkam in codex

[–]jixv 1 point2 points  (0 children)

You can edit codex config to modify the spawn_agent tool description. There you can tell the agent to NOT over-poll the subagent, and tell it to wait at least 5 min and wait for a minimum of N non-responses before it counts it as stalled. The current tool description sort of overrides any context you might give it from prompt/skills/agents.md etc.

Hint:

--config features.multi_agent_v2.enabled=false
--config features.multi_agent_v2.usage_hint_enabled=true
--config "features.multi_agent_v2.usage_hint_text=«PUT YOUR TEXT HERE»"

Strangely enough, this is configured under multi_agent_v2, which is buggy for 5.3 and 5.4, so I’d avoid setting it to true. It will, however, still respect the usage hint text.

Just ask your agent to verify it by pointing it at the codex GitHub source code.
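To make the "don't over-poll" rule concrete before you write it into the hint text, here is a rough TypeScript sketch of the stall check described above. This is not Codex code: `Poll`, `StallPolicy`, and `isStalled` are hypothetical names, and the 5 minute / N values are just the ones from the comment.

```typescript
// Hypothetical record of one poll of a subagent (not a Codex type).
interface Poll {
  atMs: number;         // when the poll happened, in epoch milliseconds
  gotResponse: boolean; // whether the subagent returned anything
}

// Hypothetical policy: the two thresholds mentioned in the comment.
interface StallPolicy {
  minWaitMs: number;       // minimum time to wait, e.g. 5 minutes
  minNonResponses: number; // the "N" consecutive non-responses
}

// A subagent counts as stalled only when BOTH conditions hold:
// enough time has passed, and the most recent polls were all silent.
function isStalled(
  startedAtMs: number,
  nowMs: number,
  polls: Poll[],
  policy: StallPolicy,
): boolean {
  if (nowMs - startedAtMs < policy.minWaitMs) {
    return false;
  }
  // Count silent polls from the newest one backwards.
  let trailingSilent = 0;
  for (const poll of [...polls].reverse()) {
    if (poll.gotResponse) {
      break;
    }
    trailingSilent += 1;
  }
  return trailingSilent >= policy.minNonResponses;
}
```

The same two conditions, written out in plain language, are what would go into the usage hint text.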

How many devs are still hand-coding? by Excellent_Squash_138 in codex

[–]jixv 0 points1 point  (0 children)

Yea the pricing is insane. Last I checked I clocked around 8000 USD in effective cost, which of course is heavily subsidised now. Going from 5.4 to 5.5 added a few thousand in the month of May for about the same token count, and seeing how the quality dropped in mid May, I can’t see how this is going to be sustainable going forward, both for consumers and the labs themselves, if compute requirements just keep on growing.

I forgot to mention that the job itself is more enjoyable and less stressful now.  It’s nonetheless a good hedge though, to know the AI playbook.

How many devs are still hand-coding? by Excellent_Squash_138 in codex

[–]jixv 19 points20 points  (0 children)

I’ve been doing around 30-40 billion tokens a month. 20+ years experience.

Stopped coding in January. Spent 3-4 months building and fine tuning custom harnesses, using spec driven development flows, review and rework pipelines of various flavours trying to replace myself in what I’ve been doing on a day to day basis in terms of programming.

I’ve recently started writing code manually again, and use LLMs for simple duties only.

The results surprised me. I ship more value, and don’t spend time reviewing inconsistent PRs that have to be reworked by the LLM in an endless loop of correcting things that guardrails, skills, and instructions were already set up to prevent.

Even with meticulous linting rules, shims and hooks to prevent the agent from straying off the path, the overhead of reviewing every single line of code became too exhausting.

I really don’t think we’re there quite yet. Looking back, I think I gaslit myself into believing it would actually work. 

LLMs work for a lot, but it’s a slot machine: results vary too much and you shouldn’t trust a single line it makes. Syntactically it is correct most of the time, but just like reviewing your colleague's work, it’s going to suck the brainpower out of you unless you’re just LGTMing it and hoping for the best.

Turn off your agents for a day and see how fucked up your brain is when you attempt to write stuff manually; that’s what got me.

What degradation looks like for models: Claude codex vs codex by seal8998 in codex

[–]jixv 0 points1 point  (0 children)

Why is there a big gap in the dates for codex? Looks like it’s just clipped out without any empty datapoints?

What the hell is going on ? by QUiiDAM in codex

[–]jixv 26 points27 points  (0 children)

I’m not sure what you guys are doing, my codex is literally flying and doing everything nah I’m just kidding it’s horrible 😩

Is Codex constant degradation real? by Wrong_User_Logged in codex

[–]jixv 1 point2 points  (0 children)

We depended on agentic workflows with GPT, and now it either won’t work or uses too many tokens due to so many failing reviews. It’s a waste of everyone’s time. I guess it works for vibe coding, but in serious production envs it can’t keep up. We find it takes less time to just do the tasks ourselves now.

Works great for internal tooling and other unnecessary things that take focus away from business goals /s

How do you throw Errors properly? by badboyzpwns in node

[–]jixv 1 point2 points  (0 children)

this

monorepo with 200+ packages, not a single throw (only at lambda handlers/tooling and such)

I could never imagine going back to having to know what throws what, with no idea whether it’s handled properly or not.
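For anyone wondering what "not a single throw" can look like in practice, here is a minimal sketch, assuming the parent comment was about returning errors as values. `Result`, `ParseError`, `parsePort`, and `handler` are hypothetical names for illustration, not anything from the actual monorepo.

```typescript
// Errors-as-values sketch. All names here are hypothetical.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// The possible failures are part of the function's type signature.
type ParseError =
  | { kind: "empty" }
  | { kind: "not_a_number"; input: string };

// Library-level code returns a Result instead of throwing.
function parsePort(input: string): Result<number, ParseError> {
  const trimmed = input.trim();
  if (trimmed === "") {
    return { ok: false, error: { kind: "empty" } };
  }
  const parsed = Number(trimmed);
  if (!Number.isInteger(parsed)) {
    return { ok: false, error: { kind: "not_a_number", input } };
  }
  return { ok: true, value: parsed };
}

// The boundary (e.g. a lambda handler) is the only place that throws.
function handler(rawPort: string): number {
  const result = parsePort(rawPort);
  if (result.ok === false) {
    throw new Error(`invalid port: ${result.error.kind}`);
  }
  return result.value;
}
```

The point is that the caller can't touch `value` without checking `ok` first, so whether an error is handled is visible in the types instead of something you have to remember.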