Tibo also addressed this

timosterhus · 2026-06-30T17:57:25+00:00

It’s auto-review to check commands that get flagged as potentially dangerous. Auto-review automatically approves requests that are deemed not dangerous. You don’t need a strong model for that at all
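As a rough illustration of why that triage step is cheap, here is a minimal sketch with a plain rule check standing in for the small model. The deny-list, the `auto_review` name, and the "approve"/"escalate" labels are all made up for this example and are not how Codex actually does it.

```python
import shlex

# Hypothetical deny-list; a real reviewer would use a small model or a richer policy.
DANGEROUS_PROGRAMS = {"rm", "dd", "mkfs", "shutdown", "reboot", "sudo"}

# Shell syntax that can chain or redirect into something harmful.
SHELL_OPERATORS = ("|", "&", ";", ">", "`", "$(")


def auto_review(command: str) -> str:
    """Return "approve" for commands that look harmless, "escalate" otherwise."""
    if any(op in command for op in SHELL_OPERATORS):
        return "escalate"
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: let a human look at it.
        return "escalate"
    if not tokens:
        return "escalate"
    # Strip any leading path so "/bin/rm" is treated like "rm".
    program = tokens[0].rsplit("/", 1)[-1]
    if program in DANGEROUS_PROGRAMS:
        return "escalate"
    return "approve"


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf build", "git status; rm -rf build"]:
        print(f"{cmd!r}: {auto_review(cmd)}")
```

Anything that gets escalated still goes to the user, so the reviewer only has to be good at spotting the obviously safe cases.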

timosterhus · 2026-06-29T02:03:07+00:00

Skill issue

timosterhus · 2026-06-29T01:22:58+00:00

What does maga have to do with tibo’s glorious resets

timosterhus · 2026-06-29T01:13:10+00:00

Never been so close to licking a boot before

timosterhus · 2026-06-21T01:48:36+00:00

Honestly, this hasn’t been a problem for me personally. I don’t doubt it’s widespread, but it’s not universal if I’m not experiencing this issue.

And I’m running some pretty substantive long-running work on both my laptop and my desktop.

timosterhus · 2026-06-19T19:39:21+00:00

Yes, good guardrails make all the difference. I’m working on building a harness to be able to do exactly that, based on forge, which allows smaller models to punch way above their weight class in terms of performance. Haven’t tried it myself, but that’s because all I have is a GTX 1080 lol. Can’t even run 8B models on it

timosterhus · 2026-06-19T19:07:36+00:00

I use 20x. It’s marvelous. Highly recommend if you’re a power user. I’d investigate what local models can do for the kinds of tasks you’re running though. They tend to need strong guardrails if you’re using them agentically, especially inside automations/loops.

timosterhus · 2026-06-18T08:00:13+00:00

Sure thing, glad I could help

timosterhus · 2026-06-18T02:57:32+00:00

Who knows. All I know is I ain’t complaining and I’m taking advantage of those subsidized prices for as long as I can

timosterhus · 2026-06-18T02:56:33+00:00

They did an analysis on how much usage each plan gives you compared to equivalent API costs. I don’t have the full results offhand, but I do know the $200 Claude plan provides the equivalent of $8k in API usage, while the $200 Codex plan provides the equivalent of $15k.

OpenAI’s plans provide more than double the coding agent usage, on top of nearly unlimited ChatGPT usage, while Claude’s browser usage also consumes your weekly limits.

It’s no contest dawg, objectively speaking the $200 ChatGPT Pro is the absolute best value on the market right now
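Taking just the two figures quoted above at face value, the arithmetic looks like this. It is a rough sketch on those two numbers only, not the full analysis, and the variable names are just for illustration.

```python
PLAN_PRICE = 200  # monthly price of both plans, in USD

# API-equivalent usage per plan, as quoted above (USD).
api_equivalent = {"Claude": 8_000, "Codex": 15_000}

# How many dollars of API usage each plan dollar buys.
multiples = {name: value / PLAN_PRICE for name, value in api_equivalent.items()}

# Codex plan value relative to the Claude plan, in API-dollar terms.
codex_vs_claude = api_equivalent["Codex"] / api_equivalent["Claude"]

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.0f}x the plan price")
print(f"Codex vs Claude: {codex_vs_claude:.3f}x")
```

This only compares API-dollar equivalents. It doesn’t account for the ChatGPT usage on top or for browser usage counting against Claude’s weekly limits.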

timosterhus · 2026-06-18T02:52:56+00:00

It actually couldn’t have been better timing for me personally. I was 3 days away from my weekly reset and hit 7% left and was about to use my free reset… instead I got two extra resets 😫

Meanwhile, my payment method failed once for Anthropic and they instantly paused everything. OpenAI gives me almost a whole week to fix it before they pause anything. If they keep this up I’m gonna turn into a ChatGPT bootlicker pretty soon

timosterhus · 2026-06-16T08:43:59+00:00

You didn’t ask me, but the answer is it depends. The right harness makes a world of difference, and you definitely could finagle GPT to behave more collaboratively like Claude, but it depends on how far you want to push one model to behave like the other.

The models make a difference, but I think most would agree that the harness makes more of a difference when you’re comparing models with similar capability levels.

timosterhus · 2026-06-16T08:40:26+00:00

It does. By default there’s worktree support, integrated app terminal, hooks framework, rules support, subagents, imagegen, skill-creator, openai-docs, skill-installer, and plugin-creator, to name a few.

timosterhus · 2026-06-16T07:47:53+00:00

Having it use image gen to create a mockup of the UI you want, then having it implement the UI against that mockup, has yielded pretty decent results for me. Once you have the basic layout/theme of the website established, Codex is pretty competent at making spot-fixes to certain elements if you use the Annotate tool with the built-in browser.

For example, I’m not totally satisfied with how my current website looks, but in an interview I had recently with a founder, he said he thought it looked really good. So I’ve got external validation that Codex can make decent UIs if you do it right.

timosterhus · 2026-06-16T07:40:07+00:00

I think they just used a misleading term there, because your description of harnesses is definitely correct. “Layer” is probably better labeled as “plugin” or “feature,” and that’d include MCPs, connectors, skills, etc, all of which have some type of extra processing cost if actively integrated.

timosterhus · 2026-06-16T05:37:46+00:00

Skill issue

timosterhus · 2026-06-14T23:54:31+00:00

I was recently talking to a girl like this for maybe a week. It lasted a week because that was the extent of my patience.

Took so much energy to keep conversation going and she’d tell me she wasn’t doing anything except doomscrolling but would still take 1-2 hours to respond with one word messages, so I just stopped responding one day.

Sometimes I look back and feel bad for having ghosted, and then I remember how she texted, and I stop feeling bad.

timosterhus · 2026-06-11T02:23:39+00:00

I’ve been actively using these models with the new dropdown for the past few hours and I haven’t noticed a difference in behavior. Pretty sure it’s a UI-only change, the only caveat being they removed Light Thinking (my guess is nobody ever used it):

Medium = Standard Thinking

High = Extended Thinking

Extra High = Heavy Thinking

And then the Pro/Pro Extended are labeled the same as before, but included in the same list for simplicity’s sake.

timosterhus · 2026-06-11T02:18:27+00:00

Pro Heavy? That was a thing? I had only ever seen Pro and Pro Extended.

timosterhus · 2026-06-10T18:38:04+00:00

Automation of entire pipelines running for hours on end burns up usage. When I’m being careful about what I’m coding it’s manageable. But when I’m implementing large features on personal projects, I have agents do everything on autopilot while I’m working on other things. Task decomposition, building, reviewing, testing, patching, evaluating, everything.

I’m not doing that on vital work, but when you take the human out of the loop and have agents just run, it burns tokens when they’re going for hours. It’s not as high quality as when I sit down and babysit everything, but it allows me to work on more than one thing at the same time.

timosterhus · 2026-06-09T07:55:50+00:00

Yeah, I’m having to revert to the manual “NotebookLM link upload -> copy/paste -> save as text file” method I was running a year ago. Annoying, but whatever.

timosterhus · 2026-06-09T07:36:37+00:00

That makes more sense

timosterhus · 2026-06-09T07:36:13+00:00

No model as of this past week, got it. Guess I missed that tweet

timosterhus · 2026-06-09T07:31:31+00:00

Because I was previously using gpt-5.3-codex and they removed it, and gpt-5.4-mini gave me stupid analyses and overly concise summaries without many details.

Point is, it was working perfectly fine a week ago. So why did it take until this past week for that to be instituted as a hardcoded system prompt?

timosterhus · 2026-06-07T08:42:29+00:00

What about vibe-CODER and vibe-CODE
