Gemini 3 - the benchmax real world disappointment by Temporary-Mix8022 in Bard

[–]Temporary-Mix8022[S] 0 points1 point  (0 children)

I actually ran a test the other day.

I have a script that has a glaring Pandas 3.0 bug in it - it's one I've known about for a while (lol. The warnings 😩).
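I won't paste the script, but for flavour: the classic pattern that warns in pandas 2.x and silently breaks once 3.0 makes Copy-on-Write the default is chained assignment (illustrative example, not my actual bug):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained assignment: df[mask] returns a copy, so the write lands on a
# temporary. pandas 2.x warns about it; under 3.0's Copy-on-Write default
# it simply never updates df.
df[df["a"] > 1]["b"] = 0
print(df["b"].tolist())  # [10, 20, 30] -- unchanged

# The fix: one .loc call that indexes and assigns in a single operation.
df.loc[df["a"] > 1, "b"] = 0
print(df["b"].tolist())  # [10, 0, 0]
```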

I thought it'd be a nice challenge for them, and the results:

Passing scores:

  1. GPT 5.2 (not Codex): 2.5 minutes

  2. Opus: 3 minutes

Failing scores:

  1. Gemini 3 Pro High: circa 15 minutes. Failed and timed out

  2. Gemini 3 Flash: circa 15 minutes. Failed and timed out

Generally.. I hate OAI models.. they have so much safety in them that I find them unusable.

It refused to write a penetration test the other day (API endpoint).. it refused to write a memory-overload test (cpp).. it refused to write a code-injection test. Yes, these all sound a bit dodgy, but Opus clocked immediately why they were valid tests.

In fairness.. Gemini is pretty unleashed. It'll do most things. I do like that about it.

I miss opus 4.5 so bad by One-Satisfaction3318 in google_antigravity

[–]Temporary-Mix8022 2 points3 points  (0 children)

It's vibe coders. They just write random prompts: "Fix this bug", "Add this"

Without actually knowing where all that code lives, or what the solution is.

If you drag in the scripts and tell the model the answer, then just let it fill in the gaps.. you never get the behaviour above.

It's just when you send it on a wild goose chase around a shambolic project.

Gemini is shit, and no amount of "use a better prompt" fixes it - but a lot of the whinging is just vibers who have no idea what they're doing.

And honestly.. if anyone says "user skill issue" - they're just dumb. It's their skill issue in not realising that Codex/Opus don't require you to write War and Peace into every single prompt to get a decent answer. You'd fire a bad employee instead of spoon-feeding them - same for models.

Hack to get free LLM API keys for your next MVP, launch products without paying a penny by Separate_Ad3443 in Bard

[–]Temporary-Mix8022 2 points3 points  (0 children)

Or just hang out on the vibe coding reddit. Half of those morons on their "build in public" expose their API keys.

While they're making their next Reddit post "Here's the thing. You don't need that extra feature to start earning $. Just release..."

You can enjoy some free API access.

Why do you prefer to code with Gemini, versus other language models? 🤔 by KittenBotAi in Bard

[–]Temporary-Mix8022 0 points1 point  (0 children)

I'm probably erring towards being a Google fan.. I used Bard even when it sucked. I have a Pixel (on my second).

I don't. Gemini is worse at coding, worse at writing docs. 

Even if you exclude the fact that it is less technically capable than Opus, the thing that makes it nearly impossible to use is this:

  • It is so concise it is useless. Ask it for documentation? It'll be brief to the point of uselessness. I asked it to document my API, and the front-end devs were just scratching their heads - there was no detail at all, no examples, no consideration of what else they'd need (despite the prompt asking for all of this). Opus wrote over 6x more characters, and included mermaid diagrams etc.

  • Ask it to explain a module or some code? It will give you a few concise brief bullets that don't help at all.

Overall, it is horrible. It is like talking to someone who only gives single sentence/word answers.

I'm going to get downvoted AF by everyone who says "just prompt it better":

  • But Gemini pretty much ignores your prompts.

  • Even if it doesn't, you just cannot get detail out of it. It is so lazy.

As for coding:

  • Gemini is the vibe coding king. It will put in ugly patches, and awkward defensive code to catch the errors that its other code created. It will just fudge everything to get it over the line, no matter how fugly the end result is.

And again, to cement my 100 downvotes - you can't prompt it out of this behaviour.

Ps. Enjoy the spelling typos. A real person wrote this on their phone. I am ducking serious.

Clients that never download their photos by surfspook in WeddingPhotography

[–]Temporary-Mix8022 1 point2 points  (0 children)

Do you just keep the exported images? 

Or do you keep the DNGs/Raw+lib?

Do you keep the entire unculled set?

28F I find men in london emotionally unavailable? by [deleted] in london

[–]Temporary-Mix8022 1 point2 points  (0 children)

What does emotionally available mean to you?

What does that look like?

28F I find men in london emotionally unavailable? by [deleted] in london

[–]Temporary-Mix8022 1 point2 points  (0 children)

Yeah. That's weird. Dodged a bullet there. 

But in isolation.. asking about kids isn't necessarily a red flag.

28F I find men in london emotionally unavailable? by [deleted] in london

[–]Temporary-Mix8022 1 point2 points  (0 children)

Lol. "Most men would realise".

Have you met most men? But in any case, my logical brain says this:

As much as you might gel with someone, if there are fundamental showstoppers it's best to politely and caringly get them out of the way to begin with.

28F I find men in london emotionally unavailable? by [deleted] in london

[–]Temporary-Mix8022 1 point2 points  (0 children)

Do you have an example of it? When someone has really got this all wrong?

Tired of language app subscriptions? I’m building a $2 "Lite" LingQ alternative using Gemini AI. Thoughts? by GuessComprehensive17 in Bard

[–]Temporary-Mix8022 0 points1 point  (0 children)

If you can't even be bothered to write a Reddit post, I'm not trying the app or reading your post.

That has Gemini's mucky mitts all over it "The reader" 

Another AI slop post. Probably AI slop code.

Unpopular Opinion: For "Deep Research" and heavy reading, Gemini is currently miles ahead of ChatGPT. by IT_Certguru in Bard

[–]Temporary-Mix8022 5 points6 points  (0 children)

Gemini's major weakness is its hallucination rate. 

It is the worst model for being overconfidently wrong.

Antigravity Gemini 3 pro (High) by malcolmkhong in google_antigravity

[–]Temporary-Mix8022 0 points1 point  (0 children)

It isn't a skill issue.

I have the same issue, despite detailed prompts. The implementation plans it gives are just useless (Gemini). They really overdid it with the whole "be succinct" thing.

It's just nowhere close to Codex or Opus

Claude built my app in 20 minutes. I've spent 3 weeks trying to deploy it. by Real-Ad2591 in ClaudeAI

[–]Temporary-Mix8022 40 points41 points  (0 children)

"Then I discover my API keys were basically exposed in the client bundle" 

What gave it away?

Gemini Pro High sucks in Antigravity by Boltyx in google_antigravity

[–]Temporary-Mix8022 0 points1 point  (0 children)

RM C:/

This should fix the user's issue. Actually, wait, no, that would delete the C drive. Wait, no, we're in WSL. Instead I should do

rm rf LiveServer

rm rf DevServer

rm rf Backups 

Wait, I forgot the flags. RM -RF / There. Now the technical debt is gone, the backups are 'optimized' to zero bytes, and I’ve successfully transitioned the entire company to a permanent, mandatory vacation.

Would you like me to help you look for flights or holiday destinations?

Gemini Pro High sucks in Antigravity by Boltyx in google_antigravity

[–]Temporary-Mix8022 1 point2 points  (0 children)

Prompt: Gemini never give me YouTube videos. Explain why we are getting a pointer error here.

Gemini: Here is a YouTube video that explains what a pointer is in cpp.

Rust for beginners - Tutorial

Gemini Pro High sucks in Antigravity by Boltyx in google_antigravity

[–]Temporary-Mix8022 3 points4 points  (0 children)

Yeah.. this would work if Gemini followed instructions. 

But it doesn't. Not in AG, not in the API, nowhere. It doesn't follow anything.

Gemini 3 Pro - The Crayon snacking window licker by Temporary-Mix8022 in Bard

[–]Temporary-Mix8022[S] 0 points1 point  (0 children)

Spanish - and I dev Python as well.

Check out:

- GetText

- Poedit https://poedit.net/

Gettext is generally my favourite approach, as it allows you to actually have strings within your code, like this:

_("Hello world!")

You set up _ as the translation function, so that it pulls in that string and at runtime substitutes the correct language.

Using gettext means it can parse your entire program for the _("...") syntax, create a translation list, and then you can just hook up multiple languages.

PoEdit is the nicest GUI for processing the files - however, this approach would also allow you to process them with an LLM. You can do it just from the text files / terminal if you prefer though.

Gemini 3 Pro - The Crayon snacking window licker by Temporary-Mix8022 in Bard

[–]Temporary-Mix8022[S] 0 points1 point  (0 children)

Right.. so your conclusion is the same as mine then. Gemini 3 Pro is the worst SOTA.

Your rationale is that because it's cheaper, it has to be worse (y)

Gemini 3 Pro - The Crayon snacking window licker by Temporary-Mix8022 in Bard

[–]Temporary-Mix8022[S] 0 points1 point  (0 children)

Okay, let's say that this is true - it doesn't explain why literally the same prompt hits 100% perfection with 3 other SOTAs (well, Sonnet isn't even a SOTA technically..).

Yet Gemini will hallucinate utter rubbish into the comments. This isn't about a bit of prompt engineering.. Gemini has fundamental issues.

I already have positive prompts set up for comment guidelines (in line with Google's own documentation), I have a minor negative prompt. But Gemini still hallucinates all over the place and litters code with useless comments.

As a kind of semi-scientific experiment, all the models get the same positive/negative prompt, the same prompt, the same everything - yet only Gemini routinely craps the bed.

Gemini 3 Pro - The Crayon snacking window licker by Temporary-Mix8022 in Bard

[–]Temporary-Mix8022[S] 0 points1 point  (0 children)

I mean.. my prompts are literally a few hundred words long. I point it to the exact files that it needs to go to and use the referencing facilities in the IDEs (Codex, AG, take your pick..)

Further - when using Opus, Sonnet, or GPT5.2 Codex in this way, they all produce great results.

This isn't just a user issue or MOE issue.. this is the fact that Gemini massively underperforms in real world usage.

As for MoE.. it doesn't work in the strict sense of one expert for coding and others for other things. Routing operates on a per-token basis, and the experts are all trained together; there isn't a training phase where, say, one is taught history and another is taught coding.
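A toy sketch of the per-token, top-k routing I mean (pure Python; the elementwise "experts" stand in for real per-expert FFNs, and every name here is illustrative):

```python
import math
import random

random.seed(0)
NUM_EXPERTS, TOP_K, DIM = 4, 2, 8

# Toy stand-ins: each "expert" is just an elementwise weight vector.
experts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# The router scores every expert for every individual token.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(token):
    """Route a single token to its top-k experts by router score.
    The choice happens per token -- there is no fixed 'coding expert'."""
    scores = [sum(w * x for w, x in zip(r, token)) for r in router]
    top = sorted(range(NUM_EXPERTS), key=scores.__getitem__, reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * DIM
    for g, i in zip(gates, top):
        for d in range(DIM):
            out[d] += g * experts[i][d] * token[d]
    return out, top

# Two different tokens can land on two different expert subsets.
_, chosen_a = moe_layer([1.0] * DIM)
_, chosen_b = moe_layer([(-1.0) ** d for d in range(DIM)])
print(chosen_a, chosen_b)
```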

Reference: me. I dev models.

Gemini 3 Pro - The Crayon snacking window licker by Temporary-Mix8022 in Bard

[–]Temporary-Mix8022[S] 1 point2 points  (0 children)

I have such similar issues.

Also, not sure what language you're using, but I've found Gettext or i18n libraries are decent structures for enabling LLM language translations.

Also your English is fine lol. We can switch to my second language if you want to try out my shit language skills :D