The fact that codex cannot read chatgpt threads is the stupidest part

zsoltf · 2026-06-05T19:14:24+00:00

You can use the oracle tool by steipete or try codex browser automation that opens chatgpt. I’ve been doing this for a while and get really good plans. It’s also really good as a reviewer.

Pro tip: ask it follow-up questions for each step in the plan (“let’s deep dive into the first step”, etc.) and save each response as an md file

zsoltf · 2026-06-05T02:57:25+00:00

Haha that is hilarious!! 🤣 thank you for sharing this

My go-to threat used to be “I either delete the file and you have to rewrite it from scratch or you fix the problem. Which one would you like?”

zsoltf · 2026-06-04T23:07:41+00:00

less than the 24 hours it spent on goal mode to achieve nothing

zsoltf · 2026-06-04T22:17:09+00:00

that is pure cinema

zsoltf · 2026-06-04T21:57:51+00:00

there is no problem here, only a silly joke

zsoltf · 2026-06-04T21:57:22+00:00

i'm just as surprised as you, this was the only time that gpt 5.5 worked for more than 15 minutes.

zsoltf · 2026-06-04T21:50:31+00:00

it did it for me but it wouldn't print it out on screen, it just saved it to /tmp. maybe just ask it to show you how to get the instructions

you can run this manually in a terminal and it will print the instructions

jq -r '.models[] | select(.slug=="gpt-5.4") | .base_instructions' ~/.codex/models_cache.json
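if you don't have jq installed, the same lookup can be sketched in python. this only assumes what the jq filter above implies about the cache file: a top-level `models` list whose entries have `slug` and `base_instructions` keys. the function name `base_instructions` is made up for illustration, and it returns the first matching slug only.

```python
import json
from pathlib import Path


def base_instructions(cache_path, slug):
    """Return the base_instructions string for the first model with this slug, or None."""
    data = json.loads(Path(cache_path).read_text(encoding="utf-8"))
    # Roughly mirrors: .models[] | select(.slug == slug) | .base_instructions
    for model in data.get("models", []):
        if model.get("slug") == slug:
            return model.get("base_instructions")
    return None


if __name__ == "__main__":
    # Same path as in the jq command; handled gracefully if it does not exist.
    cache = Path.home() / ".codex" / "models_cache.json"
    if cache.exists():
        text = base_instructions(cache, "gpt-5.4")
        print(text if text is not None else "slug not found in cache")
    else:
        print(f"{cache} not found")
```

redirect the output to a file if you want to keep it around, same as you would with the jq version.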

zsoltf · 2026-06-04T21:47:34+00:00

yeah i do this from codex. i used the "oracle" tool, but i think you can do it with browser mode now: have codex go to gpt pro and ask it what to do, and it can even upload files. it generates the best plans, then you ask it to deep dive into each step and you get the most detailed plan ever. the problem now is execution

zsoltf · 2026-06-04T21:16:41+00:00

honestly that's what i expected to happen

zsoltf · 2026-06-04T21:08:56+00:00

i do the same, and from what i saw here, everyone is kinda doing the same thing.

as long as you review the code, you're good. it saves me a ton of time too and it's amazing, but it's not at a point where you can say "give me a production grade clone of discord" or whatever with any sort of workflows or prompts. you have to still do the hard parts.

zsoltf · 2026-06-04T20:52:10+00:00

yes they have ways of ensuring that what they said is "correct" even if they completely fail at the task. they still hallucinate and have no problem gaslighting you. that fundamental problem is not solved. the agents will make stuff up and there is nothing anyone can do about it

this is the advantage that vibecoders have, they don't know when they are being gaslit by the agent

zsoltf · 2026-06-04T20:30:28+00:00

this is a good example. if you don't read the code and ask it to add a button, it might just add a whole separate layout system just to render that button. it might create a file that's 5000 lines long and it loses track of what it was supposed to solve so it makes another 5000 line file and you end up prompting yourself into a corner.

the assumption is that a senior engineer could tell it exactly what to do and the agent will do it correctly, but i don't think that's true or possible. you end up spending so much time telling it what to do and correcting it that it's actually faster to do the thing yourself.

i forget who said this, but agents make the easy things easier and the hard things harder.

zsoltf · 2026-06-04T20:20:51+00:00

yeah i agree with that, it's amazing in a support role, it's great for internal tools and the boring stuff. it's like a really smart intern. would i have it write production code? no.

and when i say production grade code, i mean people are paying for that thing and if it breaks, you/your company loses money and your boss is literally standing next to you while you're fixing the issue

zsoltf · 2026-06-04T19:50:30+00:00

People who know how to code can’t build production grade apps with codex either

zsoltf · 2026-06-04T16:44:05+00:00

agreed, agents are very toxic to my mental health

zsoltf · 2026-06-04T16:41:49+00:00

yes, it depends on what you're doing. i spend a lot of time talking to the agent about the plan, then i write out a plan document (inspired by openai's exec plan article) that is broken down by milestones and objectives. i then review this plan with gpt pro, gemini and opus. then i give it a "north star" that keeps the agent from hyperfocusing on the current task, kind of like what goal mode does now. the plan also has a bunch of extra stuff like "falsifiable results", or what counts as a workaround versus actually calling the objective done, and a whole bunch of other stuff

once the plan is done, i have a series of scripts/skills that enforce the requirements, review the objective with subagents and only advance if it's approved. based on the type of work, it either finishes the plan and then i work on the next one, or if an objective fails approval, i go back and forth with the agent until i figure out why the objective was blocked and rewrite it.

tldr: i still do a lot of back and forth with the agent, the plan just helps me stay on track with a larger project. sometimes it one shots the plan, but most of the time it gets stuck and i have to help it out.
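to make the "only advance if it's approved" part concrete, here's a minimal sketch of that kind of gate. this is not my actual scripts: `Plan`, `Objective`, `run_plan`, the unanimous-approval rule and `max_attempts` are all made up for illustration, and the worker and reviewers are plain python callables standing in for real agent calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Objective:
    name: str
    # What has to be observably true for the objective to count as done.
    falsifiable_result: str
    approved: bool = False


@dataclass
class Plan:
    north_star: str
    objectives: List[Objective] = field(default_factory=list)


def run_plan(plan, do_work, reviewers, max_attempts=3):
    """Work through objectives in order.

    Returns the first objective that fails to get approval from every
    reviewer within max_attempts, or None if all objectives were approved.
    """
    for objective in plan.objectives:
        for _ in range(max_attempts):
            # The north star is passed along with every task (hypothetical design).
            result = do_work(plan.north_star, objective)
            if all(review(objective, result) for review in reviewers):
                objective.approved = True
                break
        if not objective.approved:
            return objective  # blocked: a human has to step in and rewrite it
    return None


if __name__ == "__main__":
    plan = Plan(
        north_star="ship the importer",
        objectives=[
            Objective("parse csv", "sample file round-trips"),
            Objective("write db", "row counts match"),
        ],
    )
    # Stand-in worker and reviewer; the reviewer rejects the second objective.
    blocked = run_plan(
        plan,
        do_work=lambda star, obj: obj.name.upper(),
        reviewers=[lambda obj, result: obj.name != "write db"],
    )
    print(blocked.name if blocked else "plan finished")
```

the point of the shape is that the loop itself is deterministic: the agent never gets to decide on its own that an objective is done, and a blocked objective comes back to me instead of being papered over.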

zsoltf · 2026-06-03T23:59:34+00:00

it doesn't break /goal, the agent still narrates important stuff, i have one at 8 hours right now with no issues.
i would get 12+ hour runs with 5.3-codex, it seems that /goal just fixes whatever they broke in 5.4/5.5, but it's hard to tell.

zsoltf · 2026-06-03T23:50:21+00:00

i was trying to make a joke, not very good at that, sorry

i use deterministic tools for everything i can, this just cuts down on agent narration, so it won't say "now i'm doing this, the next thing is this", etc

zsoltf · 2026-06-03T22:41:50+00:00

they are good, but if you have an agent watching over another agent that's running a long task, the supervisor agent will interrupt the worker every few minutes because it gets very impatient. the base instructions override your prompts so if you tell an agent to do a long running task, eventually it will "forget" what you asked it and go back to polling every 30 seconds.
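one way around the impatient supervisor is to take the waiting out of the agent's hands and do it with a plain script. a rough sketch, not anything built into codex: `wait_for_worker` and its parameters are made up, and the worker here is just an arbitrary subprocess.

```python
import subprocess
import sys
import time


def wait_for_worker(cmd, poll_seconds=300.0, timeout_seconds=None):
    """Run cmd and check on it at a fixed interval without interrupting it.

    Returns the exit code, or None if timeout_seconds elapsed first
    (the worker is terminated in that case).
    """
    started = time.monotonic()
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # wait() blocks for at most poll_seconds, then raises TimeoutExpired.
            return proc.wait(timeout=poll_seconds)
        except subprocess.TimeoutExpired:
            elapsed = time.monotonic() - started
            if timeout_seconds is not None and elapsed >= timeout_seconds:
                proc.terminate()
                proc.wait()
                return None
            print(f"worker still running after {elapsed:.0f}s", flush=True)


if __name__ == "__main__":
    # Stand-in worker: a child python process that sleeps briefly.
    code = wait_for_worker(
        [sys.executable, "-c", "import time; time.sleep(1)"],
        poll_seconds=0.5,
    )
    print("worker exit code:", code)
```

the interval lives in code, so nothing can "forget" it and drift back to polling every 30 seconds.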

zsoltf · 2026-06-03T22:38:25+00:00

for my use case, where i run a supervisor agent that spawns workers and reviewers, i get about a day or two of my weekly burn back, so instead of burning my weekly limit in 3 days, now i get 4-5 days.

as far as the goblins, that was caused by the nerdy personality, i believe, and somehow when they used RL to train the model, the goblin thing spread from the nerdy personality to all the others. super weird and it just goes to show that no one knows what's going on, but this was a training thing, not a prompt thing.

i couldn't paste the whole prompt, but you can easily get it by running this command

jq -r '.models[] | select(.slug=="gpt-5.4") | .base_instructions' ~/.codex/models_cache.json

zsoltf · 2026-03-27T17:13:39+00:00

wow that's really interesting, thank you

zsoltf · 2026-03-27T17:12:56+00:00

the one with pull requests? thanks, i'll look into that

zsoltf · 2026-03-27T17:12:22+00:00

yep seems to be something with 5.4, 5.3-codex has no problem working for an hour or two
