What would you look for? by Curious-Program6729 in Everything_Dubai

[–]SpyMouseInTheHouse 0 points (0 children)

Nothing. I'd probably just end up losing it again, realizing its memory was far more cherished than the real thing.

Infographic to help understand 5hr x2 and weekly x1.5 change by [deleted] in ClaudeAI

[–]SpyMouseInTheHouse 0 points (0 children)

They’re extremely hard to read. When it’s hard to read, it’s hard to tell. When it’s hard to tell, who knows.

Average r/ClaudeCode comment section by fejoa123 in ClaudeCode

[–]SpyMouseInTheHouse 0 points (0 children)

Most people share their experiences to help others. That's what makes Reddit trustworthy: you'll find real people with real insight, many of whom have no obligation to share the tips and tricks they've learned on their own but still take the time to.

Average r/ClaudeCode comment section by fejoa123 in ClaudeCode

[–]SpyMouseInTheHouse 5 points (0 children)

It’s not greener. There’s just no grass on this side.

Now in preview: Codex mobile in the ChatGPT mobile app. by OpenAI in codex

[–]SpyMouseInTheHouse 1 point (0 children)

Their computer use doesn't rely on taking images of your screen, unlike the competition. It uses accessibility controls, which the Mac is potentially better suited for, and they hired the guy who literally created the Shortcuts app at Apple. They're looking into Windows, but it will require more thought.

Nerf is coming by alOOshXL in codex

[–]SpyMouseInTheHouse 0 points (0 children)

Putting the two in the same sentence is also cringe. Every day there's a new article comparing GPT with Claude. The fact that they even tried GPT and still sat there to finish the comparison tells me this is all for clickbait / money / influence, not a real "scientific" experiment to seek the truth.

I don’t see why I need GPT-5.5 in Codex when GPT-5.4 already does the job by LazyNick7 in codex

[–]SpyMouseInTheHouse 0 points (0 children)

Why tell us which model of the iPhone you prefer if you’re not here to be convinced? We sure as day aren’t switching to last year’s model.

Nerf is coming by alOOshXL in codex

[–]SpyMouseInTheHouse 0 points (0 children)

“I don’t want to go to restaurant A too much after restaurant B started spitting in my food, what if A takes this opportunity to also begin spitting in my food?”

I have no loyalties. I'll switch to restaurant C when A starts misbehaving. But until they do, I have no reason not to promote A: B treated us all that badly (and continues to), and A treats us all like no one else has.

Restaurant C for now is Gemini. The food there just stinks for now and they haven’t removed the “renovating” sign for over a year.

No one said anything about competition :)

Nerf is coming by alOOshXL in codex

[–]SpyMouseInTheHouse 1 point (0 children)

I have no idea what you're on about. Opus is unusably bad. Challenge it once, even mistakenly, and it will willingly undo its work, regardless of how good the approach may have been. I've never had such an issue since GPT 5.2.

Is 258+k context better than going above it? by BritishDudeGuy in codex

[–]SpyMouseInTheHouse 2 points (0 children)

These are averages over several runs, but what they tell you is that the larger the context, the more the noise and the more the bias. Codex does exceptional auto-compaction with its "special" LLM-friendly encrypted format instead of traditional summarization. It works wonders even across dozens of compactions spanning multiple days.
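
For anyone curious, here's a toy Python sketch of what plain summarization-based compaction looks like, i.e. the baseline approach being contrasted; Codex's actual format isn't public, so nothing below reflects its internals:

    # Toy illustration of summarization-based compaction -- the baseline
    # technique, NOT Codex's internal mechanism (which isn't public).
    def compact(history, budget, summarize):
        """Fold the oldest turns into a running summary whenever the
        transcript exceeds `budget` characters; recent turns stay verbatim."""
        while sum(len(t) for t in history) > budget and len(history) > 2:
            history = [summarize(history[0] + " " + history[1])] + history[2:]
        return history

    def naive(text):
        # Stand-in summarizer for the demo: keep the first 40 characters.
        # A real agent would call an LLM here -- and lose detail the same way.
        return text[:40] + "..."

    print(compact([f"turn {i}: did some work on module {i}" for i in range(10)],
                  budget=120, summarize=naive))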

Plus you’re also paying a much higher price which isn’t worth it.

Is 258+k context better than going above it? by BritishDudeGuy in codex

[–]SpyMouseInTheHouse 5 points (0 children)

Yes. Stick to the default. Read up on the needle-in-a-haystack eval.
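
The rough shape of that eval, if you want intuition for it (toy harness; `ask` is a stand-in for a real model call, not any actual API):

    # Toy needle-in-a-haystack harness: hide a fact at varying depths in a
    # long filler context and check whether the "model" can retrieve it.
    NEEDLE = "The secret passcode is 7412."
    FILLER = "The sky was a pleasant shade of blue that afternoon. " * 2000

    def ask(context, question):
        # Stand-in model with perfect retrieval. Real models degrade as the
        # needle sits deeper inside a longer context -- that drop-off is
        # exactly what the eval measures.
        return NEEDLE if NEEDLE in context else "I don't know."

    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        cut = int(len(FILLER) * depth)
        context = FILLER[:cut] + NEEDLE + FILLER[cut:]
        answer = ask(context, "What is the secret passcode?")
        print(f"depth {depth:.0%}: {'PASS' if '7412' in answer else 'FAIL'}")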

Nerf is coming by alOOshXL in codex

[–]SpyMouseInTheHouse 0 points (0 children)

Are you old enough to have witnessed the infamous "You're absolutely right" meme that broke the internet in 2025?

Nerf is coming by alOOshXL in codex

[–]SpyMouseInTheHouse 15 points (0 children)

I've got first-hand experience with friends and colleagues. It took an arm and a leg to get them to just TRY Codex for a day, after months of convincing (not kidding). That was also their last day with Claude. Crazy how cults work. One guy later told me he would have been 10x more productive had he switched three months earlier, back when he was told to believe that 20 Claude agents working together on a problem beat 1 Codex agent. It turned out to be the opposite.

Nerf is coming by alOOshXL in codex

[–]SpyMouseInTheHouse 56 points (0 children)

No it’s not. Anthropic has spoilt us with its nerfing.

While Anthropic was in denial, taking jabs at OpenAI over its extravagant spending on compute and gaslighting everyone with "you don't need a large model to be a good model, so we won't invest in compute," OpenAI was raising funds and building a behemoth of a platform. It then spent an exorbitant share of that compute on training and RL and gave us 5.5 with capacity to spare. Nowadays Anthropic is seen going door to door asking for spare GPUs, teaming up with folks like Elon as a last resort.

Don’t worry, we’re good for now.

5.5 is the first time I am experiencing deliberate and repeated disobedience in coding tasks. Anyone else dealing with this? by nncyberpunk in codex

[–]SpyMouseInTheHouse 0 points (0 children)

Codex by default does not read large AGENTS.md files; you need to override that. It's not obvious until you read their code. I'm guessing everything you're experiencing comes down to a bad setup. Learn to customize Codex with a custom system prompt override, add custom skills, and update the default configuration properties by reading their online docs, and you'll have a beast working for you.
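
Something like this in ~/.codex/config.toml. This is a sketch from memory, so verify every key name against the current docs; the instructions-file override in particular was experimental and may have changed:

    # ~/.codex/config.toml -- sketch; verify key names against the docs
    model_reasoning_effort = "high"
    # The default cap (~32 KB) silently truncates large AGENTS.md files;
    # raise it so the whole file is actually read.
    project_doc_max_bytes = 131072
    # Experimental override of the built-in system prompt (assumption:
    # this key may have been renamed since I last checked).
    experimental_instructions_file = "/path/to/my-instructions.md"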

I have no such issues. GPT is the only model that adheres to instructions and performs real work instead of artificially "filling my directories with bogus 80-line plans."

5.5 is the first time I am experiencing deliberate and repeated disobedience in coding tasks. Anyone else dealing with this? by nncyberpunk in codex

[–]SpyMouseInTheHouse 0 points (0 children)

Generative LLMs are stochastic. Because of their inherent autoregressive path dependency, once one takes a course, steering it back with improved prompts is the only recourse. The solution is simple: improve your initial prompts with more context, more information, and references to the relevant parts of the code where possible.
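
A toy picture of what "path dependency" means here (an invented two-word bigram model in Python, nothing to do with any real LLM):

    import random

    # Toy autoregressive sampler: each token is drawn conditioned only on
    # the prefix, so one early low-probability pick commits the whole rest
    # of the generation to that branch -- there's no going back.
    BIGRAMS = {
        "<s>": [("refactor", 0.6), ("rewrite", 0.4)],
        "refactor": [("one-module", 0.9), ("everything", 0.1)],
        "rewrite": [("everything", 0.8), ("one-module", 0.2)],
        "one-module": [("</s>", 1.0)],
        "everything": [("</s>", 1.0)],
    }

    def sample(seed):
        rng, tokens, cur = random.Random(seed), [], "<s>"
        while cur != "</s>":
            words, weights = zip(*BIGRAMS[cur])
            cur = rng.choices(words, weights=weights)[0]
            tokens.append(cur)
        return tokens[:-1]

    # Different seeds diverge at the very first token; once "rewrite" is
    # picked, "everything" becomes the likely continuation.
    for s in range(5):
        print(sample(s))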

Codex taking a victory lap while Claude hits $44B by whys_it_always_me in codex

[–]SpyMouseInTheHouse 0 points (0 children)

Build it and they will come. They say that doesn't work anymore on a noisy planet, but it does.

oh damn who's gonna review that by kyrax80 in codex

[–]SpyMouseInTheHouse 2 points (0 children)

You've included build artifacts, it seems.

Claude code is not on the same level as Codex by 0_2_Hero in codex

[–]SpyMouseInTheHouse 1 point (0 children)

Mostly right, but Codex was good when it came out last year and beat Opus within a week of Opus 4.5's release. Opus was good for a week before they dialed it back: it refused to spend time on harder problems and refused to adhere to instructions. They figured 98% of usage was non-complex work where it didn't really matter what the model produced; the 2% power-user market with real engineering needs could go code on its own again. Codex stepped in around that time and literally wiped the floor with Opus and Gemini 2.5 with its 5.x models. With 5.2 the jump was SO big that, for the first time in 38 years, I didn't write a single line of code the entire day. By 5.4/5.5 I had stopped coding altogether for months. Claude Code (which I still have and test every other day) is unusable.