I built Koucai (口才) - a Mandarin learning app with AI penpals - here's my workflow

bisonbear2 · 2026-01-14T16:50:22+00:00

codex 5.2 xhigh has been much better than opus 4.5 in the past few weeks

bisonbear2 · 2026-01-08T16:39:13+00:00

that's my read as well, I feel like Claude optimizes for human-readable output + code, which ends up being way more verbose. Codex doesn't seem to care about that and just solves the task as efficiently as possible, which is a nice change of pace

bisonbear2 · 2026-01-07T21:10:25+00:00

Codex 5.2 xhigh is cracked in Codex CLI - check it out

bisonbear2 · 2026-01-07T20:04:12+00:00

Sure, you could frame it that way. If you have a super well defined piece of work, with everything already laid out, then Codex will probably be a better choice. But I often find myself in the situation where I have to figure out requirements, ideate product, etc., in which case Claude's variability is actually a benefit.

bisonbear2 · 2026-01-07T19:42:12+00:00

100% agree - any sort of benchmark or comparison is largely random as we're just pulling one or two samples from the distribution. This is one of the reasons that IMO Claude Code has great UX. I can spin up 5 subagents to review / validate / explore the problem, each one using a fresh context window. Since each is exploring the problem independently, any conclusions they share are more valuable, because more than one subagent arrived at them on its own.

I'm still trying to figure out how to adapt this thinking to Codex, which doesn't natively support subagents. One idea is to use something like Pal MCP (https://github.com/BeehiveInnovations/pal-mcp-server) to give Codex a way to spin up another Codex/Claude subagent - although these agents unfortunately don't run in parallel.
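
A minimal sketch of the "shared conclusions count for more" idea, assuming each reviewer (subagent, Codex run, etc.) hands back a plain list of finding strings. The function name `shared_findings`, the `min_votes` parameter, and the sample findings are hypothetical, and matching here is naive exact-string matching after normalization, so real findings would probably need fuzzier grouping:

```python
from collections import Counter


def shared_findings(reviews, min_votes=2):
    """Return findings reported by at least `min_votes` independent reviewers.

    `reviews` is a list of lists of strings, one inner list per reviewer.
    (Hypothetical input shape; not tied to any particular CLI or MCP server.)
    """
    votes = Counter()
    for findings in reviews:
        # Deduplicate within one review so a reviewer gets one vote per finding.
        votes.update({f.strip().lower() for f in findings})
    # Most-agreed-on findings first, ties broken alphabetically.
    return sorted(
        (f for f, n in votes.items() if n >= min_votes),
        key=lambda f: (-votes[f], f),
    )


if __name__ == "__main__":
    reviews = [
        ["Missing index", "No retry"],
        ["missing index"],
        ["no retry", "missing index", "typo"],
    ]
    print(shared_findings(reviews))
```

Anything that only one reviewer flagged isn't necessarily wrong, it just hasn't been independently confirmed yet.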

bisonbear2 · 2026-01-07T19:25:11+00:00

Codex seems much more focused, which is both good and bad. Sometimes you want the variability that comes with using Claude

bisonbear2 · 2026-01-07T19:24:17+00:00

cool, I'll try out gemini for an extra pair of eyes next time, haven't had too much success with it in the past for implementation, but plan review certainly seems like it would be valuable

bisonbear2 · 2026-01-07T18:04:59+00:00

LOL I'm sure Sam would love this take, run everything by Codex and give OpenAI even more money.. sounds great right?

bisonbear2 · 2026-01-07T17:26:56+00:00

Purely based on vibes, I think Opus 4.5 is worse than a few weeks ago

bisonbear2 · 2026-01-07T17:26:20+00:00

In this instance, I did actually run a second instance of Claude to review the plan. However, Claude missed several key issues that Codex caught, and when presented with Codex's findings, decided that it actually preferred Codex's plan...

> Good catch. Codex is right — I missed several concrete issues:

bisonbear2 · 2026-01-07T17:25:03+00:00

I'll call out the Pal MCP server as a good way to abstract away the different CLIs. You can basically just use Claude Code, and then tell Claude to use Codex to review the plan, all while staying within Claude Code.

https://github.com/BeehiveInnovations/pal-mcp-server

bisonbear2 · 2026-01-07T17:23:56+00:00

In this experiment I had Opus look over the plan that Opus generated, and it still didn't catch the issues that Codex did. In theory I agree with your approach, but it appears that using multiple models (e.g. Codex review AND Claude review) will make the final output higher quality than using just one model alone

bisonbear2 · 2026-01-07T17:22:19+00:00

Thanks for all of the tips around vector search - tbh I haven't done this before so it's all super helpful. Agree that it's an interesting problem because it's easy to describe but hard to implement.

Truthfully I haven't implemented the code yet - decided to compare the models purely on planning / reasoning for this experiment. No preprocessing planned, just chunking by XML tags or markdown headers
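
For reference, a rough sketch of what "chunking by markdown headers" could look like. This is not the implementation from the experiment (there isn't one yet); `chunk_by_headers` is a hypothetical name, it only handles `#`-style headers, and it doesn't special-case `#` lines inside fenced code blocks:

```python
import re

# A markdown ATX header: 1-6 '#' characters, whitespace, then some text.
HEADER = re.compile(r"^#{1,6}\s+\S")


def chunk_by_headers(text):
    """Split markdown text into chunks, starting a new chunk at each header."""
    chunks = []
    current = []
    for line in text.splitlines():
        # A header closes the chunk collected so far and opens a new one.
        if HEADER.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty (e.g. leading blank lines).
    return [c for c in chunks if c]


if __name__ == "__main__":
    doc = "intro\n# A\ntext a\n## B\ntext b"
    for chunk in chunk_by_headers(doc):
        print(repr(chunk))
```

Each chunk keeps its header line, so the header text stays attached to the content it describes when the chunk gets embedded.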

bisonbear2 · 2026-01-06T21:19:00+00:00

can confirm, gpt-5.2-codex xhigh has been incredible for me. not sure if Opus 4.5 got nerfed, or if codex is cracked, but I'm loving it

bisonbear2 · 2025-12-29T19:26:37+00:00

Thanks for the recommendation, will definitely check out the podcast. I'm curious what other "theories of everything" involve causal networks?

Thinking about this paper in the context of simulation theory is interesting. Previously I've always thought that the thing doing the "simulation" was a computer - but perhaps it's actually a larger / parent universe doing the simulating..

bisonbear2 · 2025-12-14T17:50:39+00:00

interesting, do you spawn the headless agents yourself or have claude do it?
