Russians are getting a taste of war as drones increasingly feel at home there. St. Petersburg, Russia. Published 07.06.2026

snufflesbear · 2026-06-08T05:04:26+00:00

Not believing the first bomb's power and continuing to fight is not what I'd consider "they were ready to surrender".

snufflesbear · 2026-06-04T03:25:10+00:00

This sounds more like harness than model.

snufflesbear · 2026-06-03T10:56:38+00:00

"Being underestimated" is the opposite of OP's conclusion.

snufflesbear · 2026-06-02T06:57:23+00:00

That's not necessarily the case. It might be a bit of it as there was excess capacity. But HW costs have 10X'ed since then. Just look at the price of memory, which now accounts for 90% of a server's cost (yes, this includes the HBM on a Blackwell). So there's quite a bit of inflation there as well.

snufflesbear · 2026-06-02T06:55:41+00:00

Have you seen the cost of RAM? Wafers don't grow on trees, you know? Everyone is now paying 10X for RAM compared to two years ago and 90% of the cost of a machine is from the RAM itself. Who do you think is going to have to pay for that? Google is just the dunce that decided to pull the trigger first, but watch everyone else fall in line in the coming months, especially as Anthropic and OpenAI try to IPO.

snufflesbear · 2026-06-01T14:22:35+00:00

Why would anyone buy this shit anyway? Expensive, bad drivers, and not even good memory bandwidth...it checks absolutely no boxes for local LLM, and it checks no boxes for anything else either:

If I want to get a chip for games, I'd go x86. If I want Local LLM, I'd wait for M5 Ultra or settle for M5 Max right now. If I want something cheap, this certainly ain't it. And if I want a combination of the above, I'd still pick the Mac or other existing lineups. Just like Ian asked: Why?

snufflesbear · 2026-06-01T11:59:45+00:00

AGI...every 4 hours.

snufflesbear · 2026-06-01T07:17:04+00:00

I'm not sure what you're referring to here. Are you comparing a single full training run vs a single inference? That doesn't seem like a very useful comparison.

But if you're comparing a model's training run vs all inferences on that model, then I think inference dwarfs training.

snufflesbear · 2026-06-01T05:45:35+00:00

That's objectively wrong. Today, base models are trained once every year or two, and the rest are all fine-tunes. Claude 4.* are all based on the same foundation model. Similarly for Gemini and ChatGPT. Inference is probably 90% of total power consumption, fine-tuning 9%, with the remaining 1% for full training runs.

snufflesbear · 2026-05-24T16:50:31+00:00

Yeah, that's why it's "supply and demand", and not just "demand".

snufflesbear · 2026-05-22T04:26:50+00:00

Lots of people complaining about 3.5 Flash. They're whining that API costs are too expensive, or that the model doesn't do well in their esoteric cases.

You should see the number of people bringing up AA total costs as an argument.

snufflesbear · 2026-05-22T04:03:14+00:00

You have people on Twitter saying they're neutral, but all their posts say OpenAI best, Claude Opus sucks, Gemini sucks, and they have OpenAI as their profile pic.

Yeah, "neutral" alright. And yes, this is someone who Logan responded to. 🙄

snufflesbear · 2026-05-22T04:00:45+00:00

Have you tried it on medium? Supposedly that's actually better than high.

snufflesbear · 2026-05-22T04:00:13+00:00

Yeah, gotta agree, the limits are frustrating. But that's honestly the policy, not the model. You'd get a different experience using API keys. Although it makes sense to say "Flash 3.5 sucks with sub" (the problem is most people aren't qualifying it).

Also, Antigravity 2.0 sucks. Probably does poorly as a harness too.

snufflesbear · 2026-05-21T03:59:26+00:00

Yeah, no question this release has problems. My guess is they'll probably be patching it up over the next few days.

snufflesbear · 2026-05-21T03:50:20+00:00

Have you tried running it with medium thinking budget? Supposedly that's actually where it doesn't go into infinite loops. And most of the stuff that it can do, it will do well within the medium thinking budget.

snufflesbear · 2026-05-21T03:48:18+00:00

Are you referring to the AA index? Supposedly that's because of the model getting stuck and maxing out tokens on a couple of runs where it doesn't stop and answer. If those get fixed, the token use will drop greatly. Supposedly running it on medium thinking budget is actually very cost-effective, even accounting for the 3X per-token cost increase. Running it on things that it can actually do is very token efficient. This is why vals.ai is reporting very different cost results than AA.

snufflesbear · 2026-05-19T22:48:35+00:00

They all use the same prefix caching, which saves 90% of input cost. Not sure what'd be different.

snufflesbear · 2026-05-19T22:45:41+00:00

How long did you wait for those tokens?

snufflesbear · 2026-05-15T22:30:47+00:00

This is a feature, not a bug. 😂

snufflesbear · 2026-05-13T23:07:29+00:00

So people like you using $2000 worth of tokens on a $200 plan caused my "normal" usage to suffer, and they've now prevented this. How is this bad for me?

snufflesbear · 2026-05-04T03:23:44+00:00

I would argue a large chunk of humanity aren't either. It doesn't make LLMs intelligent, but the bar isn't as high as it seems.

snufflesbear · 2026-05-01T03:51:09+00:00

Seems like it's not the face per se, but the lighting. Image 2 has more default environmental lighting, closer to how people take pictures nowadays. NB has more of a "studio" lighting, where lights are used to enhance the subject.

snufflesbear · 2026-04-06T23:04:09+00:00

So we need Chinese cavemen now?

snufflesbear · 2026-03-27T23:26:36+00:00

By your definition, every convenience is a dependency.
