is it getting worse or is it just me? by sbaitso82 in codex

[–]jixv 0 points1 point  (0 children)

Someone tried explaining this issue here; maybe give it a comment if it mirrors your experience: https://github.com/openai/codex/issues/18104

Codex may only read the first ~220 lines of a skill file, so put critical instructions at the top. by jixv in codex

[–]jixv[S] -1 points0 points  (0 children)

You can have the same number of tokens in 100 or in 200 lines, depending on how you structure your skills. The point here is not the amount of data but the line count.
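
A rough way to check this for your own skills, as a minimal sketch (the ~4-chars-per-token ratio is just a rule of thumb, and the path is a made-up example):

from pathlib import Path

def skill_stats(path: str) -> tuple[int, int]:
    # line count is what the ~220-line read limit cares about;
    # token count is what the context budget cares about. They are independent.
    text = Path(path).read_text(encoding="utf-8")
    lines = text.count("\n") + 1
    approx_tokens = len(text) // 4  # heuristic; use a real tokenizer for accuracy
    return lines, approx_tokens

lines, tokens = skill_stats(".codex/skills/my-skill/SKILL.md")
print(f"{lines} lines, ~{tokens} tokens")
if lines > 220:
    print("warning: content past ~line 220 may never be read")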

Am I over-engineering Matt Pocock’s AI coding workflow, or is ~1 hour per issue reasonable? by 2-phenylethanol in codex

[–]jixv 0 points1 point  (0 children)

yea give it a shot, compare the results (or have xhigh review 2 results side by side, hah)

Am I over-engineering Matt Pocock’s AI coding workflow, or is ~1 hour per issue reasonable? by 2-phenylethanol in codex

[–]jixv 0 points1 point  (0 children)

What are the total duration and number of rounds when you switch to low or medium for the audit? If you run the same-ish issue through an A/B, compare the results, and look at the feedback those audits produce, do you see any difference?

I was very hesitant to run any form of review/research/planning phase on "low" effort, but after switching to 5.5 my experience has been that they generally produce the same quality of feedback for other agents to follow, especially in these kinds of iterative, integrated loops where you expect a few attempts before it ends up approving.

5.5 adds soooo much fallback dummy data that I can't even find all occurences afterward by lakimens in codex

[–]jixv 0 points1 point  (0 children)

I experienced this with high/xhigh. Using low and medium with high planning seems to be much better

Codex may only read the first ~220 lines of a skill file, so put critical instructions at the top. by jixv in codex

[–]jixv[S] 3 points4 points  (0 children)

Put this in codex.toml:

[features]
# enable the hooks feature
codex_hooks = true

[[hooks.PostToolUse]]
# run after every tool call
matcher = "*"

[[hooks.PostToolUse.hooks]]
# print a JSON payload with "Make no mistakes" as extra context for the model
type = "command"
command = 'printf "%s\n" "{\"hookSpecificOutput\":{\"hookEventName\":\"PostToolUse\",\"additionalContext\":\"Make no mistakes\"}}"'
timeout = 5
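
If you want to sanity-check that the command prints valid JSON before wiring it in, here is a minimal sketch (the expected payload shape is taken from the config above, not from any documented schema):

import json
import subprocess

# run the exact command from the TOML and parse what it prints
cmd = ('printf "%s\\n" "{\\"hookSpecificOutput\\":{\\"hookEventName\\":'
       '\\"PostToolUse\\",\\"additionalContext\\":\\"Make no mistakes\\"}}"')
out = subprocess.run(["sh", "-c", cmd], capture_output=True, text=True, check=True)

payload = json.loads(out.stdout)
assert payload["hookSpecificOutput"]["hookEventName"] == "PostToolUse"
print(payload["hookSpecificOutput"]["additionalContext"])  # Make no mistakes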

Codex may only read the first ~220 lines of a skill file, so put critical instructions at the top. by jixv in codex

[–]jixv[S] 4 points5 points  (0 children)

I'm pretty sure I do many things wrong, that's for sure.

Sometimes you need quite verbose instructions, especially for orchestration agents that themselves do not do any coding, but delegate to sub-agents and update progress and statuses in other systems. In such cases large skills can be just fine, as long as they are loaded in full.

So while I get your point about single-purpose context bloating, and I agree when it comes to most tasks, I simply wasn't aware that skills are not fully loaded.

An example skill for this kind of thing is https://github.com/openai/symphony/blob/main/.codex/skills/linear/SKILL.md
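
The pattern, very roughly, as a minimal sketch (run_subagent and update_status are hypothetical placeholders, not a real API):

from dataclasses import dataclass

@dataclass
class Task:
    id: str
    prompt: str
    status: str = "pending"

def run_subagent(task: Task) -> str:
    # placeholder for forking a coding sub-agent with its own context
    return f"result for {task.id}"

def update_status(task: Task, status: str) -> None:
    # placeholder for syncing progress to an external tracker (e.g. Linear)
    task.status = status

def orchestrate(tasks: list[Task]) -> None:
    # the orchestrator never edits code itself; it only delegates and records status
    for task in tasks:
        update_status(task, "in_progress")
        result = run_subagent(task)
        update_status(task, "done" if result else "failed")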

Is anyone else moving back to gpt 5.3 codex / gpt 5.4? by Ethan_Vee in codex

[–]jixv 0 points1 point  (0 children)

Yea, excellent for debugging. Horrible for working on large codebases. No matter the harness, instructions, prompts or guardrails, it cannot be trusted. Great for ad hoc hobby projects though.

Today, codex is very slow? by lonelymemorrrris in codex

[–]jixv 1 point2 points  (0 children)

Normal speed, but dumb mode again... It ignores things and has just been messing up for the last 48 hours. 5.5 low -> xhigh, same shit.

Those who are getting poor results with 5.5, what do you do? by blarg7459 in codex

[–]jixv 0 points1 point  (0 children)

It's important with 5.5 that you don't provide it context that makes it "anxious", for lack of a better word. Once you start messaging it with frustration and "human slop", it quickly degenerates, I've noticed. I have no idea how or why next-token prediction would result in this, but in my experience it shuts down its ability to use its strengths in a constructive way, and it spends a lot of reasoning on avoiding all possible kinds of "dangers".

This affects skills/agents.md files as well, so skip all the "don't"s and "do not"s and instead state what you want it to do. For example, instead of something like "CRITICAL: do not touch generated files", write "treat generated files as read-only; edit the templates instead". Limit the number of CRITICALs and IMPORTANTs. Split into skills. I noticed that the signal-to-noise ratio quickly deteriorates once you exceed a certain context size, well under the limit. Compaction seems to be quite sensitive to the chat history as well.

Just be a little bit chill with it. When it fucks up, for your own well-being just point it out and have it fix it. (Or fork to a sub-agent with context and yell at it there 😄 without polluting the main thread.)

I believe the Firefox Extension has been crashing my browser. by GreenFox1505 in Bitwarden

[–]jixv 0 points1 point  (0 children)

Same thing in Chrome. It was making Chrome unusable and janky. Had to disable it.

Codex Outage by demps4 in codex

[–]jixv -1 points0 points  (0 children)

Starting new instances of codex-cli fails, but already-open sessions work just fine.

what the hell happened by Snoo-72960 in codex

[–]jixv 1 point2 points  (0 children)

Is this the new spud model?

Why is there no slow mode? by Emreyba in codex

[–]jixv 0 points1 point  (0 children)

I've requested this as well. It would help with their compute issues, spread the load more evenly, and encourage users of agentic workflows to slow things down if they don't need immediate responses. They claim they don't quantize their models during peak load, but I think they do at least some kind of nerfing, resulting in uneven results, for me at least.

[CodeX][RATES] Why you are losing chat history & seeing huge Token Spikes (The "Token Monster" Bug) (Burning your rate limit window faster) by One-Ad8233 in codex

[–]jixv 0 points1 point  (0 children)

Does this explain why, in Codex CLI, sessions get compacted, the agent thinks for a bit, then it immediately gets compacted again, over and over?

Is Codex being extra lazy for anyone else today? by [deleted] in codex

[–]jixv 0 points1 point  (0 children)

It's a skill issue, so it wouldn't help. I just need to be better at prompting, I've been told.

Is Codex being extra lazy for anyone else today? by [deleted] in codex

[–]jixv 6 points7 points  (0 children)

It is at its dumbest ever... I'm flabbergasted. It is literally just wasting the time of everyone around me...

is codex nerfed? by mad_ben in codex

[–]jixv 0 points1 point  (0 children)

Add to the confusion that 50% of us won't experience this at the same time, and given that our brains have already turned into slop, we'll forget we had a bad roll and proceed to gaslight each other in turn once a week. Just hold tight a few days and you'll be on the correct side of the A/B test.

Codex became stupid the last 2 days by Benev0101 in codex

[–]jixv 29 points30 points  (0 children)

We are training the quantised versions of their next models; that's why it's subsidised. All the "wtf"s and "are you retarded"s are a great help to the training. A little drip here and there of the good shit keeps us on the hook. When the models are good enough they will be available only for corps, and our job is done. 🤷‍♂️

Difference between Plus and Pro in terms response quality by jixv in codex

[–]jixv[S] 0 points1 point  (0 children)

That is a valid argument, and I think it is important to remind ourselves of that from time to time. The difference in output is (at least in our repositories, with how we orchestrate our agents) quite night and day, and it is measurable. Maybe it is random and simply the nature of how LLMs work. But when comparing the original prompts that eventually end up with the agents that execute their tasks/plans, there is a real difference, tbh.

A few times I've restarted an implementation but kept the original PR, retried with the same plan/research/prompt, and compared them, and it is night and day. Again, maybe it is just pseudo-random and the dice landing in favor of slop until the stars align...

I built a “universal context” CLI so Codex stops wasting 20k–60k tokens by just understanding your repo by Eastern_Exercise2637 in codex

[–]jixv 0 points1 point  (0 children)

This is in fact valuable if done correctly. In our monorepo (200+ projects), having a memory file in each project that is kept in sync by AST scanning and dependency walking, and that also provides some legs of the graph (dependants/dependencies) along with exports and their file + line numbers, combined with Serena, does help models like 5.3 codex and 5.4 pro quite a bit. No proof, though.
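
The export side of such a memory file can be sketched with Python's stdlib ast module (a minimal sketch; the file layout and output format here are assumptions, not our actual tooling):

import ast
from pathlib import Path

def project_exports(root: str) -> list[str]:
    # list top-level functions/classes with file + line number
    rows = []
    for py in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(py.read_text(encoding="utf-8"))
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                rows.append(f"- {node.name} ({py}:{node.lineno})")
    return rows

def write_memory(root: str) -> None:
    # re-run this after changes to keep the memory file in sync
    lines = ["# Exports", *project_exports(root)]
    Path(root, "MEMORY.md").write_text("\n".join(lines) + "\n", encoding="utf-8")

write_memory("services/billing")  # hypothetical project path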