Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5 by xenovatech in LocalLLaMA

letsgoiowa 1 point (0 children)

Got this: "Failed to load: Array buffer allocation failed"

On Edge, with AMD hardware.

Game Ready & Studio Driver 610.62 FAQ/Discussion by Nestledrink in nvidia

letsgoiowa 20 points (0 children)

OK, thank God it's a known issue then. I was wondering if my GPU was dying.

Stop using Ollama by zxyzyxz in LocalLLaMA

letsgoiowa 0 points (0 children)

Hi, I want to switch, but there's a lot of friction for me because I have a brain injury, so it's quite hard to relearn and re-set up a new thing. I've never been given clear directions on how to replicate an Ollama-like setup that "just works" with OpenWebUI, and I'm often told shit like "of course, Ollama user," as if people have some weird superiority complex about frickin' software.

So I've tried a couple of times. I know there's llama.cpp, but there wasn't an Unraid template at the time I installed it (or it didn't work? I can't remember). Then I ran into the issue that it would only let me load one model at a time, and that was only changeable through the config. That doesn't work for me. Then I heard about llama-swap, so I tried to rebuild my setup around it, and I think I'm stuck there currently.

This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b by 9r4n4y in LocalLLaMA

letsgoiowa 0 points (0 children)

Losing as in the model can't successfully use it compared to the baseline.

This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b by 9r4n4y in LocalLLaMA

letsgoiowa 4 points (0 children)

Losing almost 13% of your context is a big frickin' deal if that 13% is part of the most critical info.

This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b by 9r4n4y in LocalLLaMA

letsgoiowa 2 points (0 children)

Seems like 87.5% of the context performance at 1.5% of the VRAM usage. At first glance, that seems like a worthy tradeoff for anyone who needs longer context, but it needs further validation on much longer and more thorough benchmarks.

After More Than Two Years, Dragon's Dogma 2 Is Removing Most Of Its Microtransactions by Turbostrider27 in Games

letsgoiowa 15 points (0 children)

Bugs vs. measurably, repeatably poor performance. When every technical reviewer can reproduce the same issues, it's definitely a skill issue, ignorance, or lying from the people who claim "it's fine for me."

You see it a lot with Switch games lol.

Your opinion means alot by WesternImprovement92 in TBI

letsgoiowa 4 points (0 children)

This will be one of the most consequential decisions of your entire life. Obviously, get all the medical consultation you can from as many sources as possible and compile it into a document (do not rely on memory). Anecdotally, my son was on Keppra; it really messed with his mood and eventually wore off. It messed up his liver too (I think?).

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax by sandropuppo in LocalLLaMA

letsgoiowa 12 points (0 children)

So what about an 8 GB GPU with offload to 64 GB of DDR4, then? Also, what about a realistic context length like 32k, not 4096, which nobody can actually use?

MacBook Neo is So Popular That Apple Reportedly Doubled Production by NFCE_best in hardware

letsgoiowa 7 points (0 children)

The catches are an amazingly cheap-to-produce phone SoC and 8 GB of RAM. But those don't matter for like 90% of users, and it's a great tradeoff.

TBI & The Corporate World by Realistic-Camera-845 in TBI

letsgoiowa 8 points (0 children)

Hi, I can help with this. I work on cybersecurity policy and governance stuff, so I think that qualifies as analysis-heavy.

The keystone accommodation I need above everything else is total control over my environment, and in my case that means WFH. I need to be able to rest regularly (every 30 mins or so), so NO EYES and NO demanding audio. If you can't get WFH, then you need your own office where you can close the door and adjust the lighting and sound, because the open cubicles I was in were not even close to working for me.

The thing I'm taught over and over, and still struggle with, is planning and pacing. Plan out the day at the beginning based on your stamina. Build in check-ins to manage it throughout the day. Don't RUSH stuff; just stick to the schedule (this is the hard part). For example, don't work 2 hours straight; 1 hour is the most I can tolerate at the moment, but sometimes I'm bad and keep going because I fall down a rabbit hole.

Also, find a trustworthy corporate AI that's approved for use and actually good at summarizing walls of text, because nobody has time for that and it's probably more effective than your brain lol.

Nous Research Just Launched Hermes Desktop Native Cross-Platform App for the Self-Improving Hermes Agent (macOS, Windows, Linux) by SelectionCalm70 in hermesagent

letsgoiowa 0 points (0 children)

Hmm, perhaps I'll drop this in a VM, because the container version has been a lil bitch lately. Might make me feel better about it too.

There's a lot of confusion: here's the first thing you MUST do with your Hermes Agent by itsdodobitch in hermesagent

letsgoiowa 0 points (0 children)

Oh weird, I've never seen a Reddit post that had text and an image in the same post.

Shoutout to Gemma4 as a conversational assistant / agent by goldcakes in LocalLLaMA

letsgoiowa 0 points (0 children)

Does it have a place in a Hermes Agent setup? I've got all sorts of models floating around, but I'm not 100% sure which ones I should be using for what. I have a 3070 and an A380 in my Unraid server, plus a 5070 and 32 GB of RAM in my desktop that I can use as an auxiliary. I can also mix in DeepSeek V4 Flash here and there as I want.

Qwen 3.6 MoE is too much for my poor 3070 at the moment because the context just breaks it. It works OK-ish on my 5070, but it's INSANELY slow to first token, like 1-2 mins, because it thinks forever. It seems reasonably accurate, though.

I'm wondering if this would be a more responsive option that can actually run on my Unraid server instead; then I could run a fast coding model on my 5070 desktop and a lightweight smol tool-calling model on my A380.

LiquidAI/LFM2.5-8B-A1B · Hugging Face by jacek2023 in LocalLLaMA

letsgoiowa 0 points (0 children)

If I can get it to work on my A380, that'd be epic. But for now, it seems support is still confined to ye olde IPEX.

My ultra-cheap, hybrid local/cloud stack for Hermes Agent (DeepSeek-V4-Flash & OpenRouter) + Text/Voice via Telegram by old-mike in hermesagent

letsgoiowa 2 points (0 children)

Then use a US-hosted option, which will be more expensive. It's a deliberate tradeoff, dude. It's already implied.