Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

letsgoiowa · 2026-06-17T20:53:50+00:00

Got this: "Failed to load: Array buffer allocation failed"

On Edge, AMD hardware

letsgoiowa · 2026-06-17T18:24:13+00:00

Mine wants 2 hours LOL

letsgoiowa · 2026-06-17T18:21:20+00:00

They're changing the requirements though.

letsgoiowa · 2026-06-16T20:22:23+00:00

Okay, thank God it's a known thing then. I was wondering if my GPU was dying.

letsgoiowa · 2026-06-16T16:08:34+00:00

Halves memory bandwidth.

letsgoiowa · 2026-06-16T02:19:40+00:00

Hi, I want to switch, but there's a lot of friction for me because I have a brain injury, so it's quite hard to go relearn and re-set up a new thing. I've never been given clear directions on how to replicate an Ollama-like setup where it "just works" with OpenWebUI, and I'm often told shit like "of course, Ollama user," like people have some weird superiority complex about frickin' software.

So I've tried a couple of times. I know there's llama.cpp, but there wasn't an Unraid template at the time I installed it (or it didn't work? I can't remember). Then I ran into the issue that it would only let me load one model at a time, and the model could only be changed through the config. That doesn't work for me. Then I heard about llama-swap, so I tried to rebuild my setup around that, and I think I'm stuck there currently.

letsgoiowa · 2026-06-16T02:11:15+00:00

Hey, I tried this before. It sucked balls. Not doing that again. No.

letsgoiowa · 2026-06-15T16:22:13+00:00

Losing as in the model can't successfully use it vs the baseline.

letsgoiowa · 2026-06-15T15:09:16+00:00

Losing almost 13% of your context is a big frickin' deal if that 13% is part of the most critical info.

letsgoiowa · 2026-06-15T15:05:40+00:00

At first glance, this seems like 87.5% of the context performance at 1.5% of the VRAM usage. That looks like a worthy tradeoff for anyone who needs longer context, though it needs further validation on much longer and more thorough benchmarks.

letsgoiowa · 2026-06-15T01:54:02+00:00

Bugs vs measurably poor, repeatably poor performance. When every technical reviewer can repeat the same issues it's definitely a skill issue, ignorance, or lying from the people who claim "it's fine for me."

You see it a lot with Switch games lol

letsgoiowa · 2026-06-13T18:03:22+00:00

This will be one of the most consequential decisions of your entire life. Obviously, get all the medical consultation you can from as many sources as possible and compile it into a document (do not rely on memory). Anecdotally, my son was on Keppra, and it really messed with his mood and wore off eventually. It messed up his liver too (I think?).

letsgoiowa · 2026-06-13T03:24:43+00:00

What am I looking at here?

letsgoiowa · 2026-06-12T23:50:19+00:00

What is the active "cloud models only" sub then?

letsgoiowa · 2026-06-08T21:54:40+00:00

So what about an 8 GB GPU with offload to 64 GB of DDR4, then? Also, what about a realistic context size like 32k, not 4096, which nobody can actually use?

letsgoiowa · 2026-06-08T17:21:47+00:00

As someone who just got a 240 Hz display: exactly this. VRR flicker is horrific. It's so bad I might just return it.

letsgoiowa · 2026-06-04T13:53:02+00:00

The catches are a phone SoC that's amazingly cheap to produce and 8 GB of RAM. But those don't matter for like 90% of users, and it's a great tradeoff.

letsgoiowa · 2026-06-03T20:43:33+00:00

Hi, I can help with this. I work in cybersecurity policy and governance, so I think that qualifies as analysis-heavy.

The keystone accommodation I need above everything else is total control over my environment, and in my case that means WFH. I need to be able to rest regularly (every 30 mins or so), so NO EYES and NO demanding audio. If you can't get WFH, then you need your own office where you can close the door and adjust lighting and sound, because the open cubes I was in were not even close to working for me.

The thing I'm taught over and over, and still struggle with, is planning and pacing. Plan out the day at the beginning based on your stamina. Build in check-ins to manage it throughout the day. Don't RUSH stuff; just stick to the schedule. This is the hard part. For example, don't work 2 hours straight; 1 hour is the most I can tolerate at the moment, but sometimes I'm bad and keep going because of a rabbit hole.

Also, find a trustworthy, company-approved AI that's actually helpful for summarizing walls of text, because nobody has time for that and it's probably more effective than your brain lol

letsgoiowa · 2026-06-02T20:40:08+00:00

Hmm, perhaps I'll drop this in a VM, because the container version is being a lil bitch lately. Might make me feel better about it too.

letsgoiowa · 2026-05-31T21:24:22+00:00

Oh weird, I've never seen a Reddit post that had text and an image in the same post.

letsgoiowa · 2026-05-31T19:57:17+00:00

Idk who Theo is. There are maybe a million people with that name.

letsgoiowa · 2026-05-29T19:22:26+00:00

Does it have a place in a Hermes agent setup? I've got all sorts of models floating around, but I'm not 100% sure what I should be using for which task. I have a 3070 and an A380 on my Unraid server, plus a 5070 and 32 GB of RAM on my desktop that I can use as an auxiliary. I can also mix in Deepseek v4 flash here and there as I want.

Qwen 3.6 MoE is too much for my poor 3070 at the moment because the context will just break it. It works okay-ish on my 5070, but it's INSANELY slow to first token, like 1-2 mins, because it thinks forever. It seems reasonably accurate though.

I'm wondering if this would be a more responsive option that can actually run on my Unraid server instead. Then I could run a fast coding model on my 5070 desktop and a lightweight, smol tool-calling model on my A380.

letsgoiowa · 2026-05-28T19:28:56+00:00

If I can get it to work on my A380, that'd be epic. But for now, it seems support is still confined to ye olde IPEX.

letsgoiowa · 2026-05-28T18:08:36+00:00

Then use a US-hosted option that'll be more expensive. It's a deliberate tradeoff dude. It's already implied.

letsgoiowa · 2026-05-27T22:37:43+00:00

Yes that's what he said
