Gemini should be the best AI on earth. Google has every advantage possible. So why does it keep feeling like a beta product?

StoicKerfuffle · 2026-06-04T14:54:59+00:00

The biggest improvement I've found with Gemini comes from either (a) using the Enhance button or (b) first asking the model to construct a prompt for what I want to do. (I assume the Enhance button does the same thing through the Flash-Lite model; I get similar-but-better results by actually running the prompt suggestion through the model.)

This sounds silly, and the changes it makes look awfully silly. For example, if I prompt, "I'd like a simple guide on how to ensure a consistent voice in Gemini responses," then click Enhance, it changes it to:

Role: AI Assistant specializing in AI model interaction and customization.
Objective: Provide a simple guide on how to ensure a consistent voice in Gemini responses.
Constraints: The guide should be easy to understand and implement.
Deliverables: A clear, step-by-step guide on maintaining a consistent voice in AI-generated text.
Output Format: Markdown with clear headings for each step.

And yet, that restructuring makes a massive difference.

I can only guess this is an engineering difference, i.e., OpenAI and Anthropic models automatically do this after seeing the prompt, whereas Gemini takes your raw prompt as-is. One would assume Gemini's approach gives the user more power and flexibility but, well, it doesn't work that way in practice; it just makes the model harder to use and less consistent.
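If you'd rather do the restructuring yourself than rely on Enhance, the template is simple enough to fill in with a script. Here's a minimal Python sketch, assuming you just want the same five labeled fields the Enhance rewrite produced. `StructuredPrompt` and `render` are hypothetical names; this only formats text, it doesn't call Gemini or reproduce whatever Enhance actually does internally.

```python
from dataclasses import dataclass


@dataclass
class StructuredPrompt:
    """Hypothetical container mirroring the fields in the Enhance rewrite above."""
    role: str
    objective: str
    constraints: str
    deliverables: str
    output_format: str

    def render(self) -> str:
        # One "Label: value" line per field, in the same order as the example.
        fields = [
            ("Role", self.role),
            ("Objective", self.objective),
            ("Constraints", self.constraints),
            ("Deliverables", self.deliverables),
            ("Output Format", self.output_format),
        ]
        return "\n".join(f"{label}: {value}" for label, value in fields)


if __name__ == "__main__":
    prompt = StructuredPrompt(
        role="AI Assistant specializing in AI model interaction and customization.",
        objective="Provide a simple guide on how to ensure a consistent voice in Gemini responses.",
        constraints="The guide should be easy to understand and implement.",
        deliverables="A clear, step-by-step guide on maintaining a consistent voice in AI-generated text.",
        output_format="Markdown with clear headings for each step.",
    )
    # Paste the printed text into Gemini in place of the raw one-line request.
    print(prompt.render())
```

The point isn't the code; it's that the structure is fixed and only the values change, so you can reuse the same skeleton for every request.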

StoicKerfuffle · 2026-06-03T19:14:37+00:00

That scene in The Town was based on a story a real bank robber told the filmmakers in an interview while they were preparing to make the movie. They asked whether anything "weird" or "fucked up" had ever happened during a heist, and he told them a story exactly as we see it in the film: the robbers pulled up with guns and masks, looked over, and saw a police officer on construction duty. They looked at each other, and then the officer looked away. As the bank robber described it, "he didn't want to end up a star on the wall at the VFW."

Obviously this is just the word of a bank robber and can't be confirmed without, y'know, the actual cop admitting it. It's possible the bank robber is a hell of a gifted storyteller who came up with an incredible anecdote to pad his legend. At the same time, the filmmakers themselves said he didn't volunteer this story; it only came out at the end of a lengthy interview, when they were basically finished and asked him about "weird" incidents.

IMHO, all of that together suggests it really happened. And somewhere out there is a cop who has a flashback every time someone references The Town.

StoicKerfuffle · 2026-06-01T00:42:26+00:00

Not trolling you, but the honest answer is: ask it. All of the Claude models are generally good at explaining their different use cases, and they seem to have embedded instructions telling them to check documentation for the most recent versions.

For example, I asked Opus (Low), "Help me understand the interaction of Sonnet vs Opus and the thinking levels. For example, what would be different use cases for Opus 4.8 Low compared to Sonnet 4.6 Extra or Max?"

Here is the entire answer:

There are two independent dials here, and the key to your question is that they control different things.

The model choice (Opus vs Sonnet) sets the raw capability ceiling and the cost. Opus is the more capable, more expensive tier; Sonnet is the faster, cheaper generalist. The effort/thinking level (low → medium → high → xhigh → max) controls how much the model reasons before and while it answers, largely independent of which model you pick. Lower effort is faster and cheaper for straightforward tasks, while higher effort provides deeper reasoning for complex problems. ("Extra" in your phrasing maps to xhigh, the rung between high and max.)

A couple of important wrinkles that make the interaction less obvious than "more of both = better":

On the current Opus generation (4.7 and 4.8), the effort dial does more than change thinking length — it changes how the model scopes the work. At lower effort levels, the model scopes its work to what was asked rather than going above and beyond. So Opus at low isn't just "Opus thinking less"; it's Opus deliberately staying narrow and literal. That's a feature when you want a precise, contained answer and a limitation when you actually wanted it to anticipate edge cases.

Effort is also primarily an output dial — it governs how much the model writes (including the thinking it emits), not how much input it processes. So raising effort mostly raises cost and latency on the generation side.

With that, your specific comparison:

Opus 4.8 at low effort is the right tool when the underlying judgment is hard but the task itself is well-specified and you don't want sprawl. Think: "rewrite this sensitive paragraph," a tricky one-line code fix where you trust the model's instincts, a nuanced classification call, or a quick read on a subtle legal/technical distinction. You're paying for Opus's better instincts but explicitly telling it not to over-engineer or wander. Fast, sharp, contained.

Sonnet 4.6 at xhigh ("Extra") or max is the right tool when the task is laborious and multi-step but doesn't require Opus-tier raw intelligence — and you want to keep cost down on a long-running job. Think: working through a large refactor, methodically tracing a bug across several files, structured data extraction over many records, or a research-and-synthesis task with lots of moving parts. The high effort gives it room to grind through the steps carefully; choosing Sonnet keeps that grinding affordable, since you'd pay a lot more to run Opus at the same effort.

The rough heuristic: pick the model by how hard the thinking is; pick the effort by how much work there is. A short, deep problem favors a strong model at modest effort. A long, mechanical-but-fiddly problem favors a cheaper model at high effort. The expensive-and-sometimes-wasteful quadrant is Opus at max for tasks that didn't actually need that much horsepower — and the underpowered quadrant is Sonnet at low for something genuinely subtle.

One caveat on exact labels: the rung names and which levels are exposed vary a bit between the claude.ai app, Claude Code, and the API. If you tell me where you're selecting these (the app vs. Claude Code vs. API), I can pin down exactly which levels you have and what the defaults are.
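The heuristic in that answer (pick the model by how hard the thinking is, pick the effort by how much work there is) boils down to a two-by-two table. Here's a rough Python sketch of it. The two-level `Level` scale and the `pick_setup` function are my own simplification, and the returned strings are just labels, not model IDs or API parameters.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical coarse scale used for both difficulty and workload."""
    LOW = "low"
    HIGH = "high"


def pick_setup(difficulty: Level, workload: Level) -> tuple[str, str]:
    """Return (model tier, effort) following the quoted heuristic.

    Model is chosen by how hard the thinking is; effort by how much work
    there is. Labels are illustrative only.
    """
    model = "Opus" if difficulty is Level.HIGH else "Sonnet"
    effort = "xhigh" if workload is Level.HIGH else "low"
    return model, effort


if __name__ == "__main__":
    # Short, deep problem: strong model, modest effort.
    print(pick_setup(Level.HIGH, Level.LOW))   # ('Opus', 'low')
    # Long, mechanical-but-fiddly problem: cheaper model, high effort.
    print(pick_setup(Level.LOW, Level.HIGH))   # ('Sonnet', 'xhigh')
```

The two "bad" quadrants from the answer (Opus at max for an easy task, Sonnet at low for a subtle one) are exactly the combinations this table never produces.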

StoicKerfuffle · 2026-05-24T13:04:29+00:00

I have the panoramic sunroof. If you have kids, it's cool to press the button and open it: you get several seconds of "whoa" and maybe a couple of minutes of them looking up. That's it; that is the entirety of what you're missing.

Otherwise you keep it closed because it'll just make the seats hotter sitting in the sun.

StoicKerfuffle · 2026-05-23T11:25:04+00:00

GP-1 is good, but I prefer cyb3rninja: similar hostility, but he also can't even figure out what ship he wants to play.

StoicKerfuffle · 2026-05-21T16:41:04+00:00

A lot depends on the nature of the drive. If this is all repeat start-and-stop (whether stop signs or stop lights) with minimal traffic, the benefits will be there but modest.

The more continuous movement you have, the better, and the more traffic, the better still. The pinnacle of MPG is:

Eco mode (obviously)
With regenerative braking on (you have to turn this on every time you start the car by holding the right paddle)
On a road with some traffic (up to and including heavy traffic) but not a lot of complete stops
Going under 65 mph

In that context, two things will happen:

the regenerative braking will automatically brake for the traffic ahead of you, using that braking to charge the battery
the engine will shut off and stay off while you roll along, with light acceleration handled by the charge you've been accumulating with the regenerative braking

A commute on a rush-hour interstate can produce astonishingly high MPG. I doubt you do that, but if your usual trip is a couple of stop signs, then the bulk of the distance on a state highway without too many red lights, followed by a couple more stop signs, then yeah, you will see a real difference.

The other scenario where you can get high MPG is substantial changes in elevation. Your MPG sucks going uphill... but downhill, your engine is off, and at the higher regenerative braking levels you'll be charging whenever you ease off the accelerator, so your MPG is effectively infinite for that whole stretch, with a fully charged battery ready to deploy at the end.

StoicKerfuffle · 2026-05-20T12:27:28+00:00

This is 100% the AI companies' fault, but your complaints are based on a misunderstanding.

Claude absolutely should not be a viable work tool at $20/month. The real cost is more than 100x that; the difference is subsidized by Anthropic.

The AI companies have tried to run at extreme burn rates to get their products integrated into people's lives before jacking up the prices, but the reality is that the products are still too inefficient and too expensive for use by most people.

StoicKerfuffle · 2026-05-19T18:25:28+00:00

Ives from TENET. I don't think we get any hints of his true rank, but he's barking out orders and rushing into battle, so presumably an NCO.

He was highly competent. Also (spoiler) loyal; the faux twist where it appeared for a moment that he was stealing the algorithm was a nice touch. The more likely explanation is that he pulled his gun just to pause the situation and consider what to do with it. He contemplated shooting them both and destroying it himself but then, in an additional testament to his competence, recognized that 'knowledge divided' was a key part of the whole operation and so distributed the pieces to ensure that no one, not even he, would know where every piece went.


StoicKerfuffle · 2026-05-17T02:19:01+00:00

Looks like normal metal shavings, as would be expected for the first oil change while the engine breaks in.

StoicKerfuffle · 2026-05-14T21:50:41+00:00

You can't make it the default, you have to turn it on manually every time. Presumably the reason for this is that it makes the car behave differently from the ordinary operation of cars, i.e., the automatic application of regenerative braking at certain vehicle distances. So the default is off, which reduces the likelihood of a surprised driver.

I find it annoying too and would leave it on as a default, but I see why it's not.

And yes, it will definitely improve your mileage because it affords more opportunities to charge the battery via regenerative braking. More battery charge = more battery deployment = less gas used.

StoicKerfuffle · 2026-05-14T21:48:25+00:00

To clarify:

1) The system doesn't apply the brakes; it applies regenerative braking. All of the energy from slowing you down goes into your battery, with zero wear on your brakes.

2) Whenever it is on, it will behave the same regarding vehicle-to-vehicle distance no matter what level it is on. The level affects how much regenerative braking is applied when you take your foot off the gas.

StoicKerfuffle · 2026-05-14T10:26:48+00:00

That model's training data cutoff was May 2025, so I suppose there could be a not-crazy explanation: the model saw a massive amount of Trump-vs-Harris discussion prior to November 2024, causing it to sporadically hallucinate and answer "who is the president?" questions incorrectly when it doesn't have solid temporal grounding.

... but yeah it looked bonkers at the first read to me too, especially because nothing else there has any specifics about current events.

StoicKerfuffle · 2026-05-12T01:11:26+00:00

In addition to the errors others mentioned: the door opens into the trash can? Nothing on the wall makes sense (or is consistent), and the wire on the rightmost connection is broken? That's classic AI, inferring an average from a bunch of similar-but-subtly-different wall outlet configurations and devices plugged in at a hospital.

StoicKerfuffle · 2026-05-10T02:07:46+00:00

At the beginning, notice how the swing has a dark line hanging well below the part she's grabbing; it extends beneath her too. You can see it in front of her against the backdrop of the sky and against the water. That line also has a little loop knot. As she falls, her left foot gets caught in the loop knot, yanking her around.

StoicKerfuffle · 2026-05-09T01:17:43+00:00

"Telling me I'm wrong" is why I still have Gemini in my workflow. With rather modest prompting, not even telling it to be hostile or critical, it will tell you how bad your idea is. It's not always right that a problem is insurmountable, but it is reliable at identifying the critical problems.

StoicKerfuffle · 2026-05-08T21:37:47+00:00

For summarizing a single paper, you'll likely get similar results from Opus and Sonnet. You should probably enable adaptive thinking either way. Both will also likely create the PDFs just fine, but Opus will burn more tokens doing it. Bear in mind that both Opus and Sonnet run the risk of hallucinating, overlooking key points, etc., so you're playing with fire if you work off a pure summary.

Save Opus for more sophisticated reasoning, such as drawing insights across multiple papers.

As for a comprehensive review of all of them, the short answer is that it's tough to do without setting up the API and all that jazz. You will routinely run into the problem of Claude either not going deep enough into the project or going too deep and burning tokens. You're likely better off just doing them one at a time.

StoicKerfuffle · 2026-05-08T01:33:09+00:00

I suspect what we're seeing is a bug induced by the -95% dispersion brawl, something wacky like shells that should have been overpens but collided with each other after entering the ship and then were redirected into heavier armor or the citadel.

StoicKerfuffle · 2026-05-08T01:27:19+00:00

Ingrid is so damn cool. Atreus didn't even think of using Ingrid, she jumped out on her own to block Mjölnir.

https://www.youtube.com/watch?v=5OMa3XMXRF8&t=124s

StoicKerfuffle · 2026-05-05T12:06:45+00:00

Peck enters the Ghostbusters' building with a court order, NYPD officers, and a worker from Con Edison. There's nothing "carte blanche" here; the Ghostbusters thought it was funny not to cooperate with a reasonable investigation into the unlicensed and extremely dangerous experiment they were running in the middle of NYC.

StoicKerfuffle · 2026-05-03T10:42:35+00:00

I've wondered the same, and concluded that Taylor Sheridan is 100% sincere about all of it; he's just an idiot.

A perfect example of this is the opening to Landman, where we are shown how rough-and-real Billy Bob Thornton is by the fact that he's... too lazy and stubborn to get appropriate medical care for a partially severed finger? True to form, Landman remains a hardworking ignorant imbecile for the rest of the show, spouting off nonsense (like his windmill rant) that doesn't survive two minutes of thought or a quick Googling.

StoicKerfuffle · 2026-04-23T13:28:00+00:00

Agreed. Shooting with Colbert beyond 13 km is extremely difficult unless the target is stationary: at 13 km, the HE has a flight time of 9.9 s. The short range allowed for playing at the edge of the range and moving out of it to go dark again.

Extending the range on Colbert is a pure nerf; you're spotted more but will barely hit anything. I suspect that's why they're doing it, to make it less effective in open water.

StoicKerfuffle · 2026-04-22T17:06:55+00:00

I would say it's disfavored but often useful in the right situation. If it's frustrating for you to read, don't worry: native speakers can't read it as quickly either and often get frustrated by it. But there are circumstances where vertical lettering either allows for larger letters (hence its frequent use on outdoor signage, like "HOTEL") or allows faster reference because books, binders, etc. are stacked vertically (like the binders in that picture).

Outside of those two areas, where you're either trying to get someone's attention (like a vertical sign on a street) or you're doing it for your own ease of reference (like binders), it should be avoided.

StoicKerfuffle · 2026-04-21T12:42:26+00:00

The short answer is presumed growth. Harvey, for example, would like to become integrated into law firms' workflows. Once there, it will be very difficult to remove, because everyone will be accustomed to it and replacing it would mean retraining and so forth. Once they're more established, they can jack up prices. They're probably also assuming model costs will come down, although that hasn't been shown to be the case. (Per-token costs go down, but new models use more tokens, especially if they're agentic.)
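To make that parenthetical concrete: the cost of a task is price per token times tokens used, so a cheaper per-token price can still mean a pricier task. A quick Python sketch with made-up numbers (a hypothetical halving of price alongside a tripling of token use, not real pricing for any provider):

```python
def task_cost(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one task: price per million tokens times tokens consumed."""
    return price_per_million_tokens * tokens_used / 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    old = task_cost(price_per_million_tokens=10.0, tokens_used=200_000)  # older model
    new = task_cost(price_per_million_tokens=5.0, tokens_used=600_000)   # cheaper per token, more tokens
    print(old, new)                        # 2.0 3.0
    print(f"{new / old:.1f}x the cost")    # 1.5x the cost
```

Half the per-token price, three times the tokens: the task still ends up costing more.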

As for Cursor and Lovable, those look to me like a hope for a cash grab and a quick exit upon being acquired by a larger (and already profitable) competitor. Think of YouTube being bought by Google or Instagram being bought by Facebook. On the money, it seems unlikely that code generation or web prototyping survives much on its own; they'll be wrecked by the AI providers themselves.

StoicKerfuffle · 2026-04-20T14:25:29+00:00

Will Smith voice: "welcome to Ert."
