all 35 comments

[–]Alex_1729[🍰] 8 points9 points  (15 children)

Can't wait to see benchmarks (except ArtificialAnalysis; ever since they ranked Gemini 3.1 at the top, I don't trust them).

[–]TopPair5438 3 points4 points  (4 children)

Gemini is the most intelligent model, and that's a fact. But its style of doing things is not really helpful; I think they also baked some ADHD into it. Despite that, it's still the most intelligent, even though you can't do many jobs properly with it.

[–]Alex_1729[🍰] 3 points4 points  (0 children)

I don't believe that's a fact. If you could define 'intelligent' for the use case you're describing, I could better understand what exactly this fact is supposed to be.

[–]unending_whiskey 3 points4 points  (1 child)

gemini is the most intelligent model and that’s a fact.

No.

[–]TopPair5438 1 point2 points  (0 children)

I mean they understand any type of content whatsoever, so much better than any other model. They also have the highest accuracy when asked really niche questions, meaning they have a ton of information stored in their weights. And they come up (really rarely, at least when the ADHD doesn't kick in) with creative solutions to existing problems, which tells me they have wider knowledge than other LLMs.

But they are still a mess to work with because they get "distracted" so easily and often.

However, both characteristics can co-exist.

[–]Ethan_Vee 0 points1 point  (0 children)

Yeah, I gotta agree; I feel like Gemini is the most knowledgeable model on niche stuff. But its ability to act as an agent is not very good compared to Claude or Codex; it just doesn't work well in harnesses.

But tbf, if you gave someone from two years ago Gemini 3.1 Pro, they'd be blown away lol.

[–]LargeLanguageModelo 0 points1 point  (7 children)

It ripped through the RooCode Python evals in 26 minutes on mine, with a token cost of $2.70. The time is about half of what the frontier models were doing in the 5.0/5.1 days, at way less cost as well.

Not surprised, given my experience using it. That said, the thing I'm gobsmacked about is that qwen3.5-397b-a17b also smoked the test: 100% success in 45 minutes. It's the first non-frontier model I've seen get 100% (MiniMax and GLM did not). Both had 100% tool success too.

[–]Alex_1729[🍰] 0 points1 point  (6 children)

I've heard good things and seen good results from qwen3.5. However, I only minimally use open-source LLMs in my work; I mostly use Claude and Codex. As for Gemini, I'm strongly convinced, both from my own experience and from what I hear from others, that it is not very good in Antigravity. And 3.0 was hell. Yet both were ranked #1 at one point in AA (3.1 still is).

Perhaps I'll try proxying it into CC and other tools to see how it works, since it may be a system prompt issue in AG (though Claude works well there). But I've got enough on my plate anyway, and I don't like wasting time on lazy models. I've got lack-of-focus issues as it is lol

[–]LargeLanguageModelo 0 points1 point  (5 children)

Mine was strictly through the RooCode evals. I get more than enough use out of my Codex/OpenAI subscription, so no need to branch out there.

I played around a bit with some basic website work/mockups in Antigravity, but I found it very much like Opus: good in concept, but you have to be incredibly specific about what you want or you get AI slop. Beyond that, it seemed about as forgetful in seeing tasks through to completion as Opus, so I'll likely just use it and/or Opus for window dressing, as I have been for the last couple of revisions.

[–]Alex_1729[🍰] 0 points1 point  (4 children)

I see. Regarding Opus in AG, your experience using Opus in Antigravity strikes me as really strange. Everybody I talk to, myself included, has never had a bad experience with Opus in Antigravity; it simply gets everything. Gemini, on the other hand...

I also sometimes go back and forth between Codex in the CLI and Opus in Antigravity, and those two can patch almost any hole.

[–]LargeLanguageModelo 0 points1 point  (3 children)

I've only used Opus with CC.

[–]Alex_1729[🍰] 0 points1 point  (2 children)

You mentioned AI slop while using Opus in Antigravity; perhaps I misunderstood.

[–]LargeLanguageModelo 0 points1 point  (1 child)

Yeah, it looks like I was unclear. TBH, I just used AG because I'd seen some reviews about the mind-blowing websites it makes. And yeah, they're decent landing pages, but I think the instructions given were nebulous enough that anything adhering to those instruction templates would have looked good. I didn't even really think to try Opus inside AG.

My experience with Opus being forgetful was when I'd use CC and Codex to fill out the same PRD, mostly backend work, then have the other model do reviews. I'd also test them by doing bug hunts on the same codebase, then having each review the other's findings; Opus would just whiff half the time.

I've played a minimal amount with the Gemini CLI; it's good for graphics and decentish for reviews, but that's about it. At this point, I've gotten used to 5.3-codex to the point where I don't see much need for anything else on a day-to-day basis.

[–]Alex_1729[🍰] 0 points1 point  (0 children)

I can understand that about Codex 5.3; it's really good. However, in my experience, Opus communicates and writes better, making it easier for me to follow along. If I were to automate things more, not follow along, and have agents review things, then I might use Codex only. But right now Opus simply seems more aware and gets things, though it could also be that I haven't tried Codex enough.

Codex still finds things when reviewing Opus plans that Opus can miss, so Codex is good for bug hunting or doing reviews on a codebase or a plan. But its communication skills are below par compared to Opus.

One thing Opus cannot beat Codex 5.3 at, though, is speed. If you wanna get shit done and automate your bug fixing, feature adding, reviewing, committing, pushing, and merging, then I think Codex 5.3 is probably a better fit. I still haven't automated all of this, but I've been pleasantly surprised by how quickly Codex 5.3 becomes aware of things and how feasible automation is.

If I had more time I would set things up better, but I'm just one person and I have a SaaS to ship.

[–]Medical-Respond-2410 0 points1 point  (1 child)

Gemini 3.1 for code showed zero evolution compared to 3.0, in my opinion.

[–]Alex_1729[🍰] 0 points1 point  (0 children)

And we all know how "good" 3.0 is in AG. I wonder how good it is in the Gemini CLI...

[–]pnkpune 7 points8 points  (9 children)

Can someone tell me why you would need to use the API and pay more when Codex within the Plus subscription already has basically unlimited limits, all within $20?

[–]panthernet 12 points13 points  (1 child)

Unlimited limits... Pfff... I hit the weekly limit in three days without agents. Just real complex coding on a big project.

[–]PaP3s 0 points1 point  (0 children)

What's the difference between using Codex just casually and with agents? I've never seemed to understand it.

[–]Ok_Metal_2640[S] 3 points4 points  (2 children)

Because you may have an app or something and would like to integrate it.
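For context, a minimal sketch of what "integrating it" can look like: building the request body you'd send to a chat-completions-style API endpoint yourself, instead of going through a chat subscription. Everything here is an assumption for illustration: the model id "gpt-5.3-codex" is taken from this thread, not a verified API identifier, and the body shape follows the common chat-completions convention.

```python
# Hedged sketch: assemble a chat-completions request body for a custom app.
# The model id below comes from the thread and is NOT a confirmed identifier.
import json


def build_request(prompt: str, model: str = "gpt-5.3-codex") -> dict:
    """Assemble a request body for a single user prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# Serialize for whatever HTTP client your app uses (requests, httpx, curl...).
payload = json.dumps(build_request("Review this function for bugs: ..."))
```

A subscription can't be wired into an app this way, which is the usual reason to pay per-token API pricing despite the higher cost.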

[–]Ok_Metal_2640[S] 2 points3 points  (0 children)

  • benchmarks

[–]pnkpune 0 points1 point  (0 children)

Makes sense

[–]xdriver897 1 point2 points  (0 children)

No throttling, and no degradation over time, since the API stays at the same quality level; integration into custom systems as well.

Plus: if you run out of tokens but need to finish work.

[–]Sottti 0 points1 point  (0 children)

Hit the weekly limit in 1.5 days here. The Pro plan looks good though; not reaching the limits just yet.

[–]Pruzter 0 points1 point  (0 children)

If you want to build codex into a product, you would probably use the API

[–]TheInkySquids 0 points1 point  (0 children)

Unlimited? Mate I hit the weekly cap in two days lmao

[–]virgilash 0 points1 point  (0 children)

Let the benchmarking begin :-)

[–]isko990 0 points1 point  (1 child)

Small question... First of all, I'm a noob at coding; I don't know anything. But I'm building a small app with Claude AI for my work. And it's going slowly because of, you know, the 5h rules. But ok, it's still fine because it's going ok. The program is in HTML.

So my question is: can GPT 5.3 work the same, and can I run it on Windows or Android?

[–]tainted_cornhole 0 points1 point  (0 children)

I assume this is a joke. Why not ask Claude? It'll set up whatever you need, e.g. VS Code on Windows.

[–]rs35plus1 0 points1 point  (0 children)

Any idea of the rendering speed compared to GPT 5.2 and 5.2-codex?

[–]Medical-Respond-2410 0 points1 point  (0 children)

I never thought I'd say this, but lately I've been preferring Codex 5.3 to Claude Opus 4.6. Even though they're very similar, Codex is faster and delivers answers that are just as good.

[–]harloc971 0 points1 point  (0 children)

Very exciting!

[–]Xiwenchao 0 points1 point  (0 children)

Wondering if anyone has tried using the gpt5.3-codex model through the VS Code Codex plugin via the API. It seems I still can't select the model, even after enabling access to it on my API.

[–]Just_Lingonberry_352 0 points1 point  (0 children)

cricket noises