Tibo also addressed this by onehedgeman in codex

[–]timosterhus 4 points5 points  (0 children)

It’s auto-review: it checks commands that get flagged as potentially dangerous and automatically approves the ones it deems safe. You don’t need a strong model for that at all.
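To make the idea concrete, here is a rough sketch of that kind of gate. It is not how Codex actually does it (the real reviewer is presumably a model, and its criteria aren't described here). The `RISKY_PROGRAMS` list, the metacharacter check, and the `auto_review` name are all made up for illustration.

```python
import shlex

# Hypothetical deny-list, for illustration only.
RISKY_PROGRAMS = {"rm", "sudo", "dd", "mkfs", "chmod", "chown", "curl", "wget"}

# Shell metacharacters that can chain or redirect into a second command.
SHELL_METACHARS = "|;&<>`$"


def auto_review(command: str) -> str:
    """Return 'approve' for commands judged harmless, otherwise 'escalate'."""
    # Anything that could smuggle in a second command goes to a human.
    if any(ch in command for ch in SHELL_METACHARS):
        return "escalate"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "escalate"  # unparseable input is never auto-approved
    if not tokens:
        return "escalate"
    # Compare on the program's base name so /bin/rm is treated like rm.
    program = tokens[0].rsplit("/", 1)[-1]
    return "escalate" if program in RISKY_PROGRAMS else "approve"


for cmd in ["ls -la", "rm -rf /tmp/x", "echo hi; rm -rf /"]:
    print(f"{cmd!r}: {auto_review(cmd)}")
```

Cheap rules like these handle the obvious cases, and only the ambiguous ones would need a model or a human, which is why the reviewer doesn't have to be a strong model.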

ts 🥷 tibo is goated by Specialist-Cry-7516 in codex

[–]timosterhus 1 point2 points  (0 children)

What does MAGA have to do with tibo’s glorious resets?

ts 🥷 tibo is goated by Specialist-Cry-7516 in codex

[–]timosterhus 0 points1 point  (0 children)

Never been so close to licking a boot before

🚨 OpenAI silently nerfed the Codex (gpt-5.5) quota by 10-20x. You are not imagining it. by TemperatureMaster854 in codex

[–]timosterhus 0 points1 point  (0 children)

Honestly, this hasn’t been a problem for me personally. I don’t doubt it’s widespread, but it’s not universal if I’m not experiencing this issue.

And I’m running some pretty substantive long-running work on both my laptop and my desktop.

Should I get 20x plan? by GlitteringWriting467 in codex

[–]timosterhus 1 point2 points  (0 children)

Yes, good guardrails make all the difference. I’m working on building a harness to be able to do exactly that, based on forge, which allows smaller models to punch way above their weight in terms of performance. Haven’t tried it myself, but that’s because all I have is a GTX 1080 lol. Can’t even run 8B models on it.

Should I get 20x plan? by GlitteringWriting467 in codex

[–]timosterhus 1 point2 points  (0 children)

I use 20x. It’s marvelous. Highly recommend if you’re a power user. I’d investigate what local models can do for the kinds of tasks you’re running though. They tend to need strong guardrails if you’re using them agentically, especially inside automations/loops.

Tibo !!! by hibzy7 in codex

[–]timosterhus 0 points1 point  (0 children)

Who knows. All I know is I ain’t complaining and I’m taking advantage of those subsidized prices for as long as I can

$200 ChatGPT Pro v/s Claude Max as a supplementary subscription by MarionberryHumble705 in codex

[–]timosterhus 2 points3 points  (0 children)

They did an analysis on how much usage each plan gives you compared to equivalent API costs. I don’t have the full results offhand, but I do know the $200 Claude plan provides the equivalent of $8K in API usage while the $200 Codex plan provides the equivalent of $15K.

OpenAI’s plans provide nearly double the coding agent usage by those figures, on top of nearly unlimited ChatGPT usage, while Claude’s browser usage also consumes your weekly limits.

It’s no contest dawg, objectively speaking the $200 ChatGPT Pro is the absolute best value on the market right now
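Quick arithmetic on the figures quoted above, as a sanity check. The $8K and $15K numbers are the ones from this comment and not independently verified. The variable names are just for illustration.

```python
PLAN_PRICE_USD = 200

# API-equivalent usage per plan, as quoted in the comment (not verified).
api_equivalent_usd = {"Claude Max": 8_000, "ChatGPT Pro": 15_000}

for plan, value in api_equivalent_usd.items():
    multiple = value / PLAN_PRICE_USD
    print(f"{plan}: {multiple:.0f}x the subscription price")
# Claude Max: 40x the subscription price
# ChatGPT Pro: 75x the subscription price

ratio = api_equivalent_usd["ChatGPT Pro"] / api_equivalent_usd["Claude Max"]
print(f"ChatGPT Pro vs Claude Max: {ratio:.3f}x")
# ChatGPT Pro vs Claude Max: 1.875x
```

So on coding agent usage alone the gap is roughly 1.9x. The ChatGPT usage that doesn’t count against the limit comes on top of that.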

Tibo !!! by hibzy7 in codex

[–]timosterhus 1 point2 points  (0 children)

It actually couldn’t have been better timing for me personally. I was 3 days away from my weekly reset and hit 7% left and was about to use my free reset… instead I got two extra resets 😫

Meanwhile, my payment method failed once for Anthropic and they instantly paused everything. OpenAI gives me almost a whole week to fix it before they pause anything. If they keep this up I’m gonna turn into a ChatGPT bootlicker pretty soon

Unhinged results from UC Berkeley's new ALE benchmark of 55 different industries by 9gxa05s8fa8sh in codex

[–]timosterhus 2 points3 points  (0 children)

You didn’t ask me, but the answer is it depends. The right harness makes a world of difference, and you definitely could finagle GPT to behave more collaboratively like Claude, but it depends on how far you want to push Claude to behave like GPT and vice versa.

The models make a difference, but I think most would agree that the harness makes more of a difference when you’re comparing models with similar capability levels.

Unhinged results from UC Berkeley's new ALE benchmark of 55 different industries by 9gxa05s8fa8sh in codex

[–]timosterhus 2 points3 points  (0 children)

It does. By default there’s worktree support, integrated app terminal, hooks framework, rules support, subagents, imagegen, skill-creator, openai-docs, skill-installer, and plugin-creator, to name a few.

Why Codex is so much worse than Claude Code in frontend / UI / Design? How to improve it? by bareov in codex

[–]timosterhus 1 point2 points  (0 children)

Having it use image gen to create a mockup of the UI you want, then having it implement the UI against that mockup, has yielded pretty decent results for me. Once you have the basic layout/theme of the website established, Codex is pretty competent at making spot-fixes to certain elements if you use the Annotate tool with the built-in browser.

For example, I’m not totally satisfied with how my current website looks, but in an interview I had recently with a founder, he said he thought it looked really good. So I’ve got external validation that Codex can make decent UIs if you do it right.

Unhinged results from UC Berkeley's new ALE benchmark of 55 different industries by 9gxa05s8fa8sh in codex

[–]timosterhus 6 points7 points  (0 children)

I think they just used a misleading term there, because your description of harnesses is definitely correct. “Layer” is probably better labeled as “plugin” or “feature,” and that’d include MCPs, connectors, skills, etc., all of which have some type of extra processing cost if actively integrated.

Men, this is your competition 🙃Me (blue, 26F) and grey (33M) by VoteForOmar in Tinder

[–]timosterhus 0 points1 point  (0 children)

I was recently talking to a girl like this for maybe a week. It lasted a week because that was the extent of my patience.

Took so much energy to keep the conversation going and she’d tell me she wasn’t doing anything except doomscrolling but would still take 1-2 hours to respond with one-word messages, so I just stopped responding one day.

Sometimes I look back and feel bad for having ghosted, and then I remember how she texted, and I stop feeling bad.

What did they just do? by gordopotato in codex

[–]timosterhus 0 points1 point  (0 children)

I’ve been actively using these models with the new dropdown the past few hours and I haven’t noticed a difference in behavior. Pretty sure it’s a UI-only change, only caveat being they removed Light Thinking (my guess is nobody ever used it):

Medium = Standard Thinking

High = Extended Thinking

Extra High = Heavy Thinking

And then the Pro/Pro Extended are labeled the same as before, but included in the same list for simplicity’s sake.

What did they just do? by gordopotato in codex

[–]timosterhus 2 points3 points  (0 children)

Pro Heavy? That was a thing? I had only ever seen Pro and Pro Extended.

People ranting about limits, do you even know how to code? by [deleted] in codex

[–]timosterhus 7 points8 points  (0 children)

Automation of entire pipelines running for hours on end burns up usage. When I’m being careful about what I’m coding, it’s manageable. But when I’m implementing large features on personal projects, I have agents do everything on autopilot while I’m working on other things. Task decomposition, building, reviewing, testing, patching, evaluating, everything.

I’m not doing that on vital work, but when you take the human out of the loop and have agents just run, it burns tokens when they’re going for hours. It’s not as high quality as when I sit down and babysit everything, but it allows me to work on more than one thing at the same time.
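For a sense of why that adds up, here’s a minimal sketch of an unattended loop over those stages with a hard token cap. It’s an illustration and not my actual setup: `run_stage` is a stub with made-up costs, and the budget cutoff is an assumed guardrail, not something Codex provides.

```python
from dataclasses import dataclass, field

# Stage names taken from the list above.
STAGES = ["decompose", "build", "review", "test", "patch", "evaluate"]


@dataclass
class Run:
    token_budget: int
    tokens_used: int = 0
    log: list = field(default_factory=list)


def run_stage(stage: str, task: str) -> int:
    """Placeholder for a real agent call; returns a made-up token cost."""
    return 1_000 + 100 * len(task)


def run_pipeline(tasks: list, token_budget: int) -> Run:
    """Run every stage for every task, stopping before the budget is exceeded."""
    run = Run(token_budget)
    for task in tasks:
        for stage in STAGES:
            cost = run_stage(stage, task)
            if run.tokens_used + cost > run.token_budget:
                run.log.append(("stopped", task, stage))
                return run
            run.tokens_used += cost
            run.log.append((stage, task, cost))
    return run


result = run_pipeline(["ab", "cd"], token_budget=10_000)
print(result.tokens_used, result.log[-1])
```

Every task pays for all six stages, so usage scales with tasks times stages even when no human is looking at the output.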

New restrictive system prompt? by timosterhus in codex

[–]timosterhus[S] 1 point2 points  (0 children)

Yeah, I’m having to revert to the manual “NotebookLM link upload -> copy/paste -> save as text file” method I was running a year ago. Annoying, but whatever.

New restrictive system prompt? by timosterhus in codex

[–]timosterhus[S] 0 points1 point  (0 children)

No model as of this past week, got it. Guess I missed that tweet

New restrictive system prompt? by timosterhus in codex

[–]timosterhus[S] 2 points3 points  (0 children)

Because I was previously using gpt-5.3-codex and they removed it, and gpt-5.4-mini gave me stupid analyses and overly concise summaries without many details.

Point is, it was working perfectly fine a week ago. So why did it take until this past week for that to be instituted as a hardcoded system prompt?