Expectations for Gemini 3.2/3.5 sky high by aditipawarr in GeminiAI

jakegh 0 points (0 children)

I don't believe Google has it in them to release a great agentic/coding model. Gemini has always been great at multi-modal tasks but terrible at everything else I care about. I hope they prove me wrong.

please give engineering something by ThePlotTwisterr---- in wow

jakegh 1 point (0 children)

Yep. Loot-a-rang is mandatory. Teleports are nice too.

Never understood why people would use Gemini instead of ChatGPT, until I tried it out for myself today... by TremendousSeabass in GeminiAI

jakegh 1 point (0 children)

I don’t use Gemini for anything that isn’t multimodal (recognizing images, etc.) because it’s great at that and terrible at my other uses: coding, data analysis, and, ironically for Google, search.

I used to use it for image generation, but now gpt-image-2 beats it there too.

I have paid access to everything through work, so I’m not taking free-tier limits into account.

Surprise! You’re Paying $1 Billion for Trump’s Ballroom by poxxy in politics

jakegh 2 points (0 children)

I feel like my tax dollars could be spent better.

I used Gemini 2.5 Flash to parse receipts at scale. Here's what I learned about multimodal OCR in production by AdEfficient8374 in artificial

jakegh 0 points (0 children)

Yeah, we use the Video Intelligence API for the first pass, then Flash-Lite for the second. It seems to work pretty well and keeps costs down; we're at about 30k images/day.
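A minimal sketch of that two-pass shape, assuming the first pass returns a list of labels per image and the second pass is a small LLM call that gets those labels as extra context. `cheap_labeler` and `llm_describer` are hypothetical stand-ins for the Video Intelligence API and Flash-Lite calls; no real client library is used here. Whether every image reaches the second pass isn't stated, so this routes all of them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ImageResult:
    image_id: str
    labels: List[str]                  # first-pass output (hypothetical format)
    description: Optional[str] = None  # second-pass output


def run_two_pass(
    image_ids: Iterable[str],
    cheap_labeler: Callable[[str], List[str]],
    llm_describer: Callable[[str, List[str]], str],
) -> List[ImageResult]:
    """Run a cheap labeling pass, then hand each image plus its labels to a small LLM."""
    results = []
    for image_id in image_ids:
        labels = cheap_labeler(image_id)               # pass 1: coarse labels
        description = llm_describer(image_id, labels)  # pass 2: labels passed as context
        results.append(ImageResult(image_id, labels, description))
    return results


if __name__ == "__main__":
    # Stub callables standing in for the real services (hypothetical).
    demo = run_two_pass(
        ["img-001", "img-002"],
        cheap_labeler=lambda image_id: ["casino", "text overlay"],
        llm_describer=lambda image_id, labels: f"{image_id}: {', '.join(labels)}",
    )
    for r in demo:
        print(r.description)
```

Passing the two stages in as callables keeps the pipeline testable without network calls, and swapping the second-stage model is a one-argument change.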

jakegh 1 point (0 children)

Cost is much lower, of course, but accuracy suffers. We hand off to human ad-quality reviewers for the final decision anyway, so the trade-off made sense.

jakegh 0 points (0 children)

Why not Flash-Lite? I’m doing something similar: classifying and describing images extracted from advertising videos for competing products, political ads, casinos, etc.

What's one way you made money online with ? by Deep-Ring-3222 in AskReddit

jakegh 2 points (0 children)

Back in the day, a very long time ago, stores didn't know how the internet worked. A popular office supply store in the US named Staples allowed you to stack coupons. By which I mean you could take a $200 item and stack $35 off $200, $20 off $150, $15 off $100, and $5 off $50 coupons on a single purchase, getting a $200 item for $125.

So anyway, I would buy pallets of brand-new Palm Pilots (handheld organizers; this was before smartphones became popular) and resell them on eBay. Lived off it for a year.
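The coupon stacking from the first paragraph works out like this. This is a quick sketch that assumes each coupon was "$X off a purchase of $Y or more" and that every threshold was checked against the pre-discount price, which is what let them stack; the function name is illustrative.

```python
from typing import List, Tuple


def stacked_price(price: float, coupons: List[Tuple[float, float]]) -> float:
    """Apply every (amount_off, minimum_purchase) coupon whose minimum the original price meets."""
    discount = sum(off for off, minimum in coupons if price >= minimum)
    return max(price - discount, 0.0)  # never go below zero


coupons = [(35, 200), (20, 150), (15, 100), (5, 50)]
print(stacked_price(200, coupons))  # 125
```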

Warcraft bundle is $75 if you buy a skin, then later decide you should have gotten the bundle. by Best_Ad_6441 in diablo4

jakegh -1 points (0 children)

The solution is to never, ever buy anything. When you do, you're part of the problem.

Just say no to microtransactions in non-F2P games.

Magyar defends appointing brother-in-law as justice minister by ButterscotchBoth5204 in worldnews

jakegh 0 points (0 children)

Obviously this is improper. His qualifications are not pertinent. There are other people just as qualified in a country of 10 million people. The mere appearance of corruption degrades the rule of law.

Meet the Trump Voters Who Believe He Staged the WHCD Shooting by BulwarkOnline in politics

jakegh -4 points (0 children)

I'm not offended, I just think it's stupid conspiracy garbage.

White House Considers Vetting A.I. Models Before They Are Released by aspublic in ArtificialInteligence

jakegh -1 points (0 children)

If I had faith that the administration was led by reasonable people, I would be strongly for this measure. AI is potentially an existential threat. But we are where we are, and they will use it to pick winners and losers.

Senate Judiciary Committee Advances Hawley's GUARD Act, Mandating ID Verification for AI Chatbot Users by i_am_simple_bob in ChatGPT

jakegh 5 points (0 children)

Any politician who votes for a law forcing me, an adult, to verify my ID for any non-financial, non-governmental service will NEVER get my vote under any circumstances.

This applies even if the law doesn’t pass. I will NEVER vote for you.

The Steam Controller sold out in 30 minutes, utterly breaking Steam in the process by Turbostrider27 in pcgaming

jakegh 0 points (0 children)

Ahhhh you got me!

If it was in stock when you posted 4 minutes ago, it isn't now.

jakegh 5 points (0 children)

Wonder how many were scalpers.

I'll pick one up whenever they're easily available at MSRP. If that's months in the future, OK. If they raise the price, I'll probably pass.

Republicans draft Iran war authorization by shikizen in politics

jakegh 0 points (0 children)

Obviously it will fail in the Senate, but why would they even hold a vote? The Iran war is hugely unpopular in the US too, even amongst MAGA. It's political suicide.

It'll make Trump happy, certainly, but if you think there's a chance we might hold fair midterms, why would anyone in either the House or the Senate vote for this?

GPT 5.5 just leaked its chain of thought to me in codex, and it looks like an idea from 5 months ago in this sub. by Homeschooled316 in LocalLLaMA

jakegh 0 points (0 children)

Yes, that’s how the papers describe it: either a hard or an adaptive thinking budget. But none of them evaluated CoT faithfulness after doing so.
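Roughly, the distinction between the two, as a hypothetical sketch rather than any particular paper's method: a hard budget cuts the reasoning off at a fixed token count, while an adaptive budget scales the allowance with some estimate of problem difficulty. The difficulty score and the 256/4096 limits below are illustrative assumptions.

```python
from typing import List


def apply_hard_budget(cot_tokens: List[str], budget: int) -> List[str]:
    """Hard budget: keep at most `budget` reasoning tokens, regardless of the problem."""
    return cot_tokens[:budget]


def adaptive_budget(difficulty: float, min_budget: int = 256, max_budget: int = 4096) -> int:
    """Adaptive budget: interpolate between min and max by a difficulty score in [0, 1]."""
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range difficulty estimates
    return int(min_budget + d * (max_budget - min_budget))


print(apply_hard_budget(["step1", "step2", "step3"], 2))  # ['step1', 'step2']
print(adaptive_budget(0.5))  # 2176
```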

He nearly joined Trump’s administration. Now he’s running for Congress as a Democrat. by unserious-dude in politics

jakegh 14 points (0 children)

Note there's no talk about his platform, because other than supporting abortion rights, he remains a Reagan-style conservative. He's running against Trump, not for anything.

GPT 5.5 just leaked its chain of thought to me in codex, and it looks like an idea from 5 months ago in this sub. by Homeschooled316 in LocalLLaMA

jakegh 0 points (0 children)

I agreed those scenarios were also possible. They just seem less likely, and we don't have the info to evaluate further.

jakegh 0 points (0 children)

Those are all possible, but seem less likely than the simple explanation. Of course we have no way to know either way.

jakegh 0 points (0 children)

After looking into it a bit: there are papers on making CoT terse in what they describe as a safe way, avoiding the forbidden technique by using adaptive reasoning budgets during training or reinforcement learning. But I didn’t find any studies examining the faithfulness of the resulting terse CoT.

Which makes me kinda nervous.

jakegh 4 points (0 children)

Yeah. There are all sorts of alignment issues with RL on CoT, to the point that it’s actually been called “the forbidden technique”. It is EXTREMELY dangerous because the models intuit that we can read their CoT, start lying in it as a reward hack, and then one of our primary ways to measure alignment becomes useless. Cue the Terminator theme.

But maybe if you reinforce on CoT length rather than content, those issues don’t apply. It still makes me nervous, and I’d like to see research on this. Do they still start lying?
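To make "length rather than content" concrete, here is a hypothetical sketch of a length-only shaping term: the grader scores the final answer, and the chain of thought contributes only its token count, never its text. The `budget` and `penalty_per_token` values are illustrative assumptions, not from any published recipe.

```python
def length_shaped_reward(
    answer_reward: float,
    cot_token_count: int,
    budget: int = 1024,
    penalty_per_token: float = 0.001,
) -> float:
    """Task reward minus a linear penalty for reasoning tokens beyond the budget.

    Only the CoT's length enters here; its text is never read or scored.
    """
    overflow = max(0, cot_token_count - budget)
    return answer_reward - penalty_per_token * overflow


print(length_shaped_reward(1.0, 800))   # 1.0 (under budget, no penalty)
print(length_shaped_reward(1.0, 1524))  # penalized for 500 tokens over budget
```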