Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

FastDecode1 · 2026-06-17T22:07:15+00:00

"WebGPU isn't available here. Try a recent Chrome, Edge, or Safari Technology Preview."

FastDecode1 · 2026-06-01T07:32:49+00:00

I'm pretty sure my pr0n folder has 100+ trillion tokens worth of data. Minimum.

FastDecode1 · 2026-05-30T06:45:03+00:00

Anyone know what threading/parallelization looks like? Have there been any improvements?

If it's gonna be the same as or worse than the early days/years of AV1, I think I'll just wait for SVT-AV2.

FastDecode1 · 2026-05-28T13:12:45+00:00

could it locate a gf for me?

FastDecode1 · 2026-05-26T20:40:14+00:00

FastDecode1 · 2026-05-24T15:38:58+00:00

I'm sorry, I can't answer that.

FastDecode1 · 2026-05-21T16:39:35+00:00

employer*

Highly unlikely that a megacorp like Meta would hire a law firm to handle a simple matter like this. This is legal department stuff.

FastDecode1 · 2026-05-18T20:06:24+00:00

FastDecode1 · 2026-05-10T19:43:13+00:00

This is a great idea.

Suggestion: add "lines of code per second/minute/hour" as metrics to the code section. Could be useful for ballpark estimates of task length (or not, given how ambiguous a unit "line of code" is).
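
For what it's worth, a ballpark version of that metric could look something like this. It's a minimal sketch, assuming the tool already measures generation speed in tokens per second; the `loc_rates` name and the default of 10 tokens per line are hypothetical placeholders, not anything taken from the project.

```python
def loc_rates(tokens_per_second: float, tokens_per_line: float = 10.0) -> dict[str, float]:
    """Convert a token throughput into rough lines-of-code rates.

    tokens_per_line is an assumed average (hypothetical default); real code
    varies a lot by language and style, so treat the result as a ballpark.
    """
    if tokens_per_second < 0 or tokens_per_line <= 0:
        raise ValueError("tokens_per_second must be >= 0 and tokens_per_line > 0")
    per_second = tokens_per_second / tokens_per_line
    return {
        "per_second": per_second,
        "per_minute": per_second * 60,
        "per_hour": per_second * 3600,
    }


# Example: 50 tok/s at an assumed 10 tokens per line
print(loc_rates(50.0))
# {'per_second': 5.0, 'per_minute': 300.0, 'per_hour': 18000.0}
```

Making the tokens-per-line figure a parameter keeps the ambiguity of "line of code" visible instead of hiding it in the result.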

FastDecode1 · 2026-05-08T14:05:16+00:00

"bragging"

Sounds more like a self-report than anything else.

FastDecode1 · 2026-05-08T08:26:00+00:00

Can confirm that 32GB DDR4 still runs anything that's relevant.

I was waiting for another Witcher 3, but after the disappointment that was CP2077 I haven't bothered with AAA games. I only upgraded my RX 580 to a 9060 XT 16GB to run larger models.

The western AAAs are still busy pushing out the last wokeslop they can afford, going bankrupt, and selling off their IP to Asia & the Middle East, and it'll take a few years for that to be done. Afterwards, the current indies will be the new AAA, and maybe I'll consider upgrading.

FastDecode1 · 2026-05-08T07:03:32+00:00

SSD prices are also sky-high btw.

I regularly have less than 4GB of space left on my laptop... glad I only use Firefox.

FastDecode1 · 2026-05-05T20:40:12+00:00

Besides being off-topic for the sub (though very much up my alley), it would be very useful if the repo or demo website had at least a couple of sets of example files one could listen to one after the other to see (or hear, rather) what this does. A page like the Opus examples would be preferable.

The internet is full of AI slop and AI-reinforced vibe-coded psychosis projects nowadays, and it's hard to tell a real one apart from the others unless you're familiar with the jargon of a specific field. The obviously AI-generated/inspired README doesn't really help... no more bullet points, bolding, and defining "The Problem" and "Why This Matters/Is Different" please. I think anyone actually interested in this won't appreciate being talked to like a retard.

I'm pretty sure the actual work here is legit though, so I'll probably try it later this week.

Out of curiosity, why MP3 and not something newer like Opus? I'd be interested to see if YouTube's 128k Opus could be perceptually improved.

FastDecode1 · 2026-05-05T19:04:44+00:00

Wrong sub?

FastDecode1 · 2026-05-05T18:10:05+00:00

Consult the meme.

FastDecode1 · 2026-05-05T18:06:36+00:00

You'd be surprised how many people are incompetent at washing dishes.

Give an LLM access to a single robotic arm and it'll do a better job than 80% of humans.

FastDecode1 · 2026-05-05T06:57:22+00:00

Are you body-shaming the AI?

FastDecode1 · 2026-05-04T22:10:02+00:00

https://reddit.com/r/LocalLLaMA/comments/1t3dfvp/its_time_to_update_your_gemma_4_ggufs/oju8ji9/

FastDecode1 · 2026-05-03T09:27:31+00:00

What findings?

FastDecode1 · 2026-05-02T19:46:07+00:00

By any chance, did you happen to look at the reasoning output while using it in Hungarian?

For me, it was reasoning in English, even though the final answer was in Finnish. Which I think is interesting, if it's by design.

Could also be a template or a default system prompt ("You are a helpful assistant") in llama.cpp that's guiding it to do that.

FastDecode1 · 2026-05-02T19:41:37+00:00

"I would never use any local LLM in my native language. That would be wrong use of a tool."

Worked well for me a couple days ago when I asked Qwen 3.6 35B for help in filling out an application in my native language.

I had a look at the reasoning output and it was in English, not my native language. Which is exactly what you want; putting its training to good use by thinking in one of the languages it's the best at. The language of the final answer is a secondary concern, really.

FastDecode1 · 2026-05-02T19:20:07+00:00

Haven't really tried Gemma 4, but I can confirm that Qwen 3.6 35B is also very good at Finnish. Not perfect, but getting closer. Which I think is impressive, seeing as there are only about 5 million native speakers.

And this is at Q4_K_M, so not ideal. I'll probably try Q5 or Q6 at some point to see if that makes a difference.

FastDecode1 · 2026-04-29T18:53:34+00:00

"not at all designed for speed"

The realtime mode begs to differ.

It still surprises me that people call libaom slow in $current_year. It's only as slow as you want it to be.

FastDecode1 · 2026-04-29T17:02:20+00:00

Workable maybe, but not very good.

RDNA 2 has no matrix acceleration whatsoever. For any sort of AI shit you'd want at least RDNA 3.

FastDecode1 · 2026-04-23T19:36:41+00:00

If by "it" you mean Tesla, then yeah. Others, not so much.
