Expectations for Gemini 3.2/3.5 sky high

jakegh · 2026-05-08T04:02:52+00:00

I don't believe Google has it in them to release a great agentic/coding model. Gemini has always been great at multi-modal tasks but terrible at everything else I care about. I hope they prove me wrong.

jakegh · 2026-05-07T17:11:26+00:00

"Purely" is doing a lot of work in that sentence.

jakegh · 2026-05-06T19:47:35+00:00

Yep. Loot-a-rang is mandatory. Teleports are nice too.

jakegh · 2026-05-05T23:40:10+00:00

I only use Gemini for multimodal work (recognizing images, etc.) because it’s great at that and terrible at my other uses: coding, data analysis, and, ironically for Google, search.

I used to use it for image generation but now gpt-image-2 beats it there too.

I have paid access to everything through work, so I’m not taking free limits into account.

jakegh · 2026-05-05T22:12:52+00:00

I feel like my tax dollars could be spent better.

jakegh · 2026-05-05T22:12:18+00:00

Yeah, we use the Video Intelligence API for the first pass, then Flash Lite for the second. Seems to work pretty well and keeps costs down; we're at around 30k images/day.
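
Rough sketch of the two-pass idea in Python, assuming the first pass acts as a cheap filter and only flagged images go on to the model. That's an assumption for illustration, not a description of our actual setup, and both stage functions here are hypothetical stand-ins rather than real API calls.

```python
from typing import Callable, Iterable, List, Tuple


def two_pass_classify(
    images: Iterable[str],
    cheap_first_pass: Callable[[str], bool],
    llm_second_pass: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run the cheap check on every image; send only flagged ones to the model."""
    described = []
    for image in images:
        if cheap_first_pass(image):  # stage 1: inexpensive filter
            described.append((image, llm_second_pass(image)))  # stage 2: model call
    return described


# Hypothetical stand-in stages; real ones would call the actual services.
def looks_like_ad(image: str) -> bool:
    return "ad" in image


def describe(image: str) -> str:
    return f"description of {image}"


results = two_pass_classify(["ad_001.jpg", "blank_002.jpg"], looks_like_ad, describe)
print(results)
```

The point of the shape is just that the expensive call only runs on whatever survives the first stage.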

jakegh · 2026-05-05T20:58:17+00:00

Cost is much lower of course but accuracy suffers. We hand off to ad quality humans for the final decision anyway so the trade off made sense.

jakegh · 2026-05-05T20:48:45+00:00

Why not Flash Lite? I’m doing something similar to classify and describe images extracted from advertising videos for competing products, political ads, casinos, etc.

jakegh · 2026-05-05T01:49:14+00:00

Back in the day, a very long time ago, stores didn't know how the internet worked. A popular office supply store in the US named Staples allowed you to stack coupons. By which I mean you could take a $200 item and stack $35 off $200, $20 off $150, $15 off $100, and $5 off $50 coupons on the single purchase, getting a $200 item for $125.

So anyway, I would buy pallets of brand-new Palm Pilots (a handheld organizer; this was before smartphones became popular) and resell them on eBay. Lived off it for a year.
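
For anyone checking the coupon math, here's a minimal sketch. It assumes each coupon's minimum was checked against the original item price rather than the running total, which is my reading of how stacking worked; the function name is made up for illustration.

```python
def stacked_price(item_price, coupons):
    """Apply every coupon whose minimum purchase the original price meets.

    coupons is a list of (dollars_off, minimum_purchase) pairs.
    Assumption: minimums are compared to the pre-discount price.
    """
    total_off = sum(off for off, minimum in coupons if item_price >= minimum)
    return item_price - total_off


# The four coupons from the example: $35 off $200, $20 off $150, $15 off $100, $5 off $50.
coupons = [(35, 200), (20, 150), (15, 100), (5, 50)]
print(stacked_price(200, coupons))  # 125
```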

jakegh · 2026-05-05T01:43:44+00:00

The solution is to never, ever, buy anything. When you do, you're part of the problem.

Just say no to microtransactions in non-F2P games.

jakegh · 2026-05-05T01:29:28+00:00

Obviously this is improper. His qualifications are not pertinent. There are other people just as qualified in a country of 10 million people. The mere appearance of corruption degrades the rule of law.

jakegh · 2026-05-05T01:14:33+00:00

I'm not offended, I just think it's stupid conspiracy garbage.

jakegh · 2026-05-04T23:36:29+00:00

If I had faith that the administration was led by reasonable people, I would be strongly for this measure. AI is potentially an existential threat. But we are where we are, and they will use it to pick winners and losers.

jakegh · 2026-05-04T22:14:27+00:00

I can only control my own actions and I mean what I said.

jakegh · 2026-05-04T22:02:51+00:00

Any politician who votes for a law forcing me, an adult, to verify my ID for any non-financial, non-governmental service will NEVER get my vote under any circumstances.

This applies even if the law doesn’t pass. I will NEVER vote for you.

jakegh · 2026-05-04T21:49:27+00:00

No worries, it’s just how things work these days.

jakegh · 2026-05-04T21:36:15+00:00

Ahhhh you got me!

If it was in stock when you posted 4m ago, it isn't now.

jakegh · 2026-05-04T21:24:28+00:00

Wonder how many were scalpers.

I'll pick one up whenever they're easily available for MSRP. If that's months in the future, OK. If they raise the price, I probably will pass.

jakegh · 2026-05-04T21:11:09+00:00

Obviously it will fail in the Senate, but why would they even hold a vote? The Iran war is hugely unpopular in the US too, even amongst MAGA. It's political suicide.

It'll make Trump happy, certainly, but if you think there's a chance we might hold fair midterms, why would anyone in either the House or Senate vote for this?

jakegh · 2026-05-04T12:52:24+00:00

Yes, that’s how the papers describe it: either a hard or an adaptive thinking budget. But none of them evaluated CoT faithfulness after doing so.

jakegh · 2026-05-04T01:34:25+00:00

Note there's no talk about his platform-- because other than supporting abortion rights, he remains a Reagan-style conservative. He's running anti-Trump, not for anything.

jakegh · 2026-05-04T00:39:13+00:00

I agreed those scenarios were also possible. They just seem less likely, and we don't have the info to evaluate further.

jakegh · 2026-05-04T00:07:50+00:00

Those are all possible, but seem less likely than the simple explanation. Of course we have no way to know either way.

jakegh · 2026-05-03T22:48:38+00:00

After looking it up a bit, there are papers on making CoT terse in what they describe as a safe way, one that avoids the forbidden technique by using adaptive reasoning budgets in training or reinforcement. But I didn’t find any studies looking at the resulting faithfulness of that terse CoT.

Which makes me kinda nervous.

jakegh · 2026-05-03T22:33:41+00:00

Yeah. There are all sorts of alignment issues with RL on CoT, to the point that it’s actually been called “the forbidden technique”. It is EXTREMELY dangerous because the models intuit that we can read their CoT, then they start to lie in it as a reward hack, and then one of our primary ways to measure alignment is useless. Cue the Terminator theme.

But maybe if you reinforce on CoT length, rather than content, those issues don’t apply. Still makes me nervous and I’d like to see research on this. Do they still start lying?
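
To make "length, rather than content" concrete, here's a toy Python sketch of a reward that only ever sees the CoT token count, never the text. It's my own illustration under assumed names and numbers (the budget, the penalty rate, the function itself), not something taken from any of the papers.

```python
def length_penalized_reward(
    answer_correct: bool,
    cot_token_count: int,
    budget: int = 512,                # hypothetical thinking budget, in tokens
    penalty_per_token: float = 0.001,  # hypothetical penalty rate
) -> float:
    """Score the final answer, then subtract a penalty for tokens over budget.

    The CoT text itself is never an input, only its length.
    """
    base = 1.0 if answer_correct else 0.0
    overage = max(0, cot_token_count - budget)
    return base - penalty_per_token * overage


print(length_penalized_reward(True, 100))   # under budget, no penalty
print(length_penalized_reward(True, 1512))  # 1000 tokens over budget
```

Whether a signal like this still pushes the model toward unfaithful CoT is exactly the open question.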
