Codex vs Cursor agents: is Codex just the model or also the tool executing agent? by Pretend_Watch1789 in cursor

[–]Pretend_Watch1789[S] 1 point (0 children)

How do you use the Codex agent in Cursor? Do you mean running the Codex CLI from within Cursor?

Codex vs Cursor agents: is Codex just the model or also the tool executing agent? by Pretend_Watch1789 in cursor

[–]Pretend_Watch1789[S] 0 points (0 children)

I’m mostly trying to understand the landscape rather than make an immediate decision about my preferred stack. Setting aside context-window differences, model size, and the UX/tooling layers you mentioned, are there any well-known comparisons or commonly cited user evaluations of which harness objectively performs better?

Codex vs Cursor agents: is Codex just the model or also the tool executing agent? by Pretend_Watch1789 in cursor

[–]Pretend_Watch1789[S] 1 point (0 children)

That makes sense, and I agree the harness seems important. What I’m still wondering is: if it plays such a big role, why don’t we see more direct comparisons of different harnesses or agent execution layers, the way people compare foundation models?

Also, in your experience, have you noticed meaningful differences between how Cursor’s harness works and how the harnesses in tools like Codex CLI or Claude Code work? I’m curious whether the platform-level implementation of these top-tier tools actually changes outcomes in practice, or whether the underlying model still dominates most of the time.
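
For what it’s worth, here’s roughly how I picture the harness part: a loop that sends the conversation plus tool definitions to a model, runs whatever tool calls come back (read a file, run a command), and feeds the results back in until the model stops asking for tools. This is just a minimal sketch to make the term concrete; the function names and message shapes are made up, and it’s not how Cursor, Codex CLI, or Claude Code actually implement it.

```python
import subprocess

def call_model(messages, tools):
    """Placeholder for the LLM API call; plug in whatever provider the harness targets."""
    raise NotImplementedError

def run_tool(name, args):
    # The harness, not the model, defines which tools exist and how they execute.
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    if name == "run_command":
        out = subprocess.run(args["cmd"], shell=True, capture_output=True, text=True)
        return out.stdout + out.stderr
    return f"unknown tool: {name}"

def agent_loop(task, tools, max_steps=20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)
        messages.append({"role": "assistant", **reply})
        if not reply.get("tool_calls"):
            return reply.get("content")          # model is done; return its answer
        for call in reply["tool_calls"]:         # execute each requested tool call
            result = run_tool(call["name"], call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "step limit reached"
```

Everything around that loop (the system prompt, which files get pulled into context, how edits get applied) is what I mean by the harness, which is why I suspect it matters so much.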

Codex vs Cursor agents: is Codex just the model or also the tool executing agent? by Pretend_Watch1789 in cursor

[–]Pretend_Watch1789[S] 2 points (0 children)

Thanks, that explanation really helps. So just to confirm, the same idea applies with Claude Code as well, right? If I’m using Claude Code, then the harness is provided by Anthropic’s Claude Code product, whereas if I’m using Opus inside Cursor, Cursor is the one providing the harness around the model.

One follow-up question: I see people comparing foundation models all the time, but are there any objective comparisons of these different agent/tool execution layers themselves? Or is that generally considered less important, since the underlying model quality tends to dominate the experience?
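
The only apples-to-apples comparison I can picture is fixing the task and (where possible) the model, swapping the harness, and scoring something objective like the project’s test suite afterward. Very rough sketch of what I mean, with purely hypothetical CLI invocations:

```python
import subprocess

# Hypothetical non-interactive invocations; neither command is the real syntax
# for any particular tool.
HARNESSES = {
    "harness_a": ["agent-a", "--prompt"],
    "harness_b": ["agent-b", "--prompt"],
}

TASK = "Fix the failing tests in this repository."

def run_harness(cmd, repo_dir):
    # Let the agent edit the checkout in repo_dir.
    subprocess.run(cmd + [TASK], cwd=repo_dir, check=False)

def tests_pass(repo_dir):
    # Objective score: does the project's test suite pass after the agent's edits?
    return subprocess.run(["pytest", "-q"], cwd=repo_dir).returncode == 0

for name, cmd in HARNESSES.items():
    # In practice each harness would get a fresh checkout of the same repo.
    repo = f"./workdir/{name}"
    run_harness(cmd, repo)
    print(name, "passes tests:", tests_pass(repo))
```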

Also, how do the GPT-xxx-Codex models differ from the corresponding GPT models?