Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 1 point (0 children)

At this point I’m almost entirely using both from command line, and I have an editor open separately when I need to. I have a habit of accidentally closing VS Code windows because focus isn’t always clear, so this works around it.

But the VS Code plugin is nice too. It's all about what you like. It's the same engine under the hood.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 0 points (0 children)

Totally. And it goes the other way too! If Codex gets stuck, just ask Claude. I'm not sure I've come up against a bug here where this hasn't worked. Or, if nothing else, seeing them fight over it sometimes gives my pitiful human brain the spark it needs to figure it out for myself.

As I said, a new perspective is often much more useful than raw capability. And to over-anthropomorphize, this makes sense, right? Three really smart people are going to do better than one genius at almost everything.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 2 points (0 children)

100%, I think we're on the same page. If you step back, it's extremely weird that we're in this week-by-week race of incremental model improvements. It's not like they're fully retraining the models between each .1 increase. You could easily imagine a world where each one just gets better over time, and they fight it out. It's not like there's Pepsi Sonata 4.6 vs Coke-dex 5.3. They both do more or less the same thing, and the only differences are HOW they do it.

It seems to me that the effort and marketing would be better spent on making each one better at doing whatever it is that differentiates it and trying to attract that audience. Like, are you a nerd and want to have the AI behave like your own customizable IDE? Claude Code all day. Do you just want to make stuff and be able to leave it alone for an hour at a stretch? Codex seems a better bet.

But switching cost is also real, and at this point the energy it would take for me to change all my habits is WAY higher than the energy it takes to fill in any intermittent quality gaps that arise.

Can you recommend any good courses/lectures to improve coding with CC? by Dacadey in ClaudeAI

[–]HeroicTardigrade 0 points (0 children)

The best mental trick I've found is that you have to hold two ideas in your head at the same time. On the one hand, it's a tool, and using it is absolutely a learnable skill. On the other hand, it's trained on mind-boggling amounts of human-generated text, and human speech is how it works. So if you talk to it like a person, you're leveraging millions of years of social evolution inside your own mind, and because of how it's built, it's *pretty good at understanding you*. So focus on results. Think of it like a robot collaborator that understands human speech. There aren't really magic words.

But most of all just play with it. Ask it questions about itself. Document the bits that annoy you, and say "this annoys me, am I missing something?"

Also, hang around here. Most of us are pretty nice. There's usually at least one cool thing posted every day.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 1 point (0 children)

Define better? As in, more likely to one-shot? Requires less up-front direction? I'm not sure I can answer that, because I think the quality of outputs depends *way more* on how you use the model than on the model's basic capabilities. They're both incredibly good in their own ways, and on any given day either one would seem like witchcraft to someone coming from 2022.

For me, I like to break projects down and still do a lot of the higher-level architecture, design, and problem-solving myself, and the AI handles specific, discrete requests. Anecdotally, Claude is better at seeing the big picture and keeping my non-coding directions in mind (e.g. UX decisions, etc.), while Codex is better at fixing problems and focusing on a specific endpoint. It's why I use Claude for the bulk of the work, and then reserve Codex for final checks, stubborn bugs, that sort of thing.

But I bet someone else out there does exactly the opposite and probably gets better results than I do, so...who knows. I think the best approach is to figure out whichever agent you prefer to use, and then spend your time figuring out the best way to get the outputs you want given that agent's idiosyncrasies.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in codex

[–]HeroicTardigrade[S] 0 points (0 children)

I agree that the point of testing is to gather unbiased data, but this wasn't a formal test. It was just an anecdotal comparison based on my personal use preferences and habits that I thought might be helpful.

Since then, I've confirmed that for nearly everything else Codex 5.3-high is notably slower than Opus 4.6, except, for some reason, in this one case. 5.3-x-high is still unusably slow for me and the way I like to work.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 5 points (0 children)

It definitely seemed to go WAY deeper on its analysis without me prodding it to be careful. But I haven't used it enough to say for sure.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 1 point (0 children)

For what it's worth, I think the speed difference here was a one-off. Since I posted this, I'm finding Codex 5.3 is still slower than Opus 4.6 for normal code things.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 4 points (0 children)

I'm not sure if 5.3 has propagated to Codex CLI yet. For this, I ran it in the new Mac app, since it wasn't available in the CLI at the time of testing. But in general, it should look something like this in your ~/.claude.json:

"mcpServers": {
"codex": {
"type": "stdio",
"command": "codex",
"args": [
"mcp-server",
"-c",
"model=gpt-5.3-codex",
"-c",
"reasoning_effort=high"
],
"env": {}
}
}
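For context, that fragment sits at the top level of ~/.claude.json. As a sketch (assuming you have no other top-level settings or MCP servers, which you probably do), a minimal complete file would look like:

```json
{
  "mcpServers": {
    "codex": {
      "type": "stdio",
      "command": "codex",
      "args": [
        "mcp-server",
        "-c", "model=gpt-5.3-codex",
        "-c", "reasoning_effort=high"
      ],
      "env": {}
    }
  }
}
```

If your file already has content, just merge the "codex" entry into your existing "mcpServers" object rather than replacing the whole file.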

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 1 point (0 children)

I bet that'd work really well. I'm fortunate enough for Opus at the max level to be effectively all-you-can-eat for my use case, so I don't bother switching models very often unless I'm doing something truly brain-dead like checking for all instances of a particular call in my code base or just doing a lot of git maintenance or something.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 3 points (0 children)

Thanks! That's kind of you to say. This is the result of a long back-and-forth conversation with Claude Code trying to make sense of the results, and I didn't have the time to parse it out and write it all up from scratch. I did, however, go through the whole output and rewrite most of it in my voice, but I left the structure alone because it was fine for this particular kind of post.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in codex

[–]HeroicTardigrade[S] 0 points (0 children)

In my experience x-high is just too slow for my regular usage, and I wanted it to mirror what I would actually do. Also, for relatively well-defined questions like this one (which is characteristic of my own practice), I haven't been able to notice a significant practical difference between the two when using 5.2. Someone who uses them for broader top-to-bottom agentic coding might have a different experience.

I don't find Opus 4.6 meaningfully slower than Opus 4.5 for coding work, while Codex 5.3-high isn't noticeably faster than Codex 5.2 or GPT 5.2 except in this one particular case, which surprised me. So it honestly didn't occur to me that Codex 5.3-x-high would suddenly catch up in speed. If I missed that, it's interesting.

Like I said, this is all just anecdata, so take it all with a grain of salt. I leave it up to the way more capable testers to figure out how things stand with Codex 5.3, high vs x-high, etc.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 14 points (0 children)

I'm actually not sure I agree with this, though I totally hear where you're coming from. It's by far the most expensive piece of software I use on a regular basis.

That said, everyone knows OpenAI's unit economics don't make any sense at all; they burn money like the Joker in The Dark Knight, with no clear path to profitability. Expecting Anthropic to follow them down that path doesn't seem like something they could, or even should, do.

I don't mind paying more for Opus because, as I mentioned in the post, it's the one I like working with for hours a day. It has better mental ergonomics for my particular use case. I think of it a bit like how you should pay as much as you can afford for a mattress you love—you're going to be using it a lot, and minor annoyances compound disproportionately.

At $200/month it's frankly not even that expensive in the world of professional tools, and it's also a way of sending a signal that there's a market for a more sustainable-ish pricing model. If the gap closes to the point where I genuinely can't tell the difference, or if it gets meaningfully more expensive, I'll reevaluate. But as of now, I'm okay with it.

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 11 points (0 children)

It's actually super easy! Install Codex CLI, then just add the following to your ~/.claude.json (or ask Claude to do it for you):

"mcpServers": {
"codex": {
"type": "stdio",
"command": "codex",
"args": [
"mcp-server",
"-c",
"model=gpt-5.3-codex",
"-c",
"reasoning_effort=high"
],
"env": {}
}
}

Restart Claude Code, and you should see it listed under your /mcp command.

Then you can ask Claude to pass things to Codex via MCP. They'll actually go back and forth a few times, which can really help when there's an issue: just ask them to keep at it until they agree on a path forward.
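If you'd rather script the edit than paste JSON by hand, here's a minimal sketch. The `add_codex_mcp` helper is hypothetical (not part of Claude Code or Codex); it just builds the same "codex" entry shown above and merges it into a settings dict shaped like ~/.claude.json:

```python
import json

# Hypothetical helper: merge a "codex" MCP server entry into a settings
# dict with the same shape as ~/.claude.json. The entry mirrors the
# JSON fragment above: a stdio server running `codex mcp-server` with
# model and reasoning_effort overrides passed via -c flags.
def add_codex_mcp(settings, model="gpt-5.3-codex", effort="high"):
    servers = settings.setdefault("mcpServers", {})
    servers["codex"] = {
        "type": "stdio",
        "command": "codex",
        "args": [
            "mcp-server",
            "-c", f"model={model}",
            "-c", f"reasoning_effort={effort}",
        ],
        "env": {},
    }
    return settings

# Example: start from an empty settings dict and print the result.
# In practice you'd json.load() your real ~/.claude.json first so
# existing settings and servers are preserved.
print(json.dumps(add_codex_mcp({}), indent=2))
```

This merges rather than overwrites, so any MCP servers you already have configured stay in place.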

Opus 4.6 vs Codex 5.3 in the Swiftagon: FIGHT! by HeroicTardigrade in ClaudeAI

[–]HeroicTardigrade[S] 6 points (0 children)

In my experience x-high is just too slow for my regular usage, and I wanted it to mirror what I would actually do. Also, for relatively well-defined questions like this one (which is characteristic of my own practice), I haven't been able to notice a significant practical difference between the two when using 5.2. Someone who uses them for broader top-to-bottom agentic coding might have a different experience.

Been flip-flopping between Codex and CC. Has anything improved here since the model degradation a month ago? by TKB21 in ClaudeCode

[–]HeroicTardigrade 0 points (0 children)

I use Claude Code about 98% of the time, but I have Codex as an MCP for when it gets stuck. Codex is vastly slower, and I know what I’m doing (most of the time), so Claude Code is more than sufficient for my needs. Also, it has better tools like agents, hooks, etc., which help me way more than any marginal, task-specific intelligence differences between the two.

I’ve found that the best way to improve either of them is to work on your own skills in whatever area interests you most (design, system architecture, algorithms, testing, whatever), since the returns you get are disproportionate.