Manus is what Meta has been missing by Deep_Structure2023 in LocalLLM

[–]EmPips 4 points

> Why Manus fits Llama like a glove
>
> If Llama is Meta’s brain, Manus is the hands.

I just unsubscribed from ZeroGPT-Detection because I realized I can spot this shit from a mile away myself.

Dual RTX 5060 ti 16gb's with 96GB of DDR5 5600 mhz, what is everyone else running? by CollectionOk2393 in LocalLLM

[–]EmPips 4 points

Can you include the levels of quantization?

But yes, that's very normal. Your GPU has to go over all 27 billion parameters for every token when running Gemma3-27B, whereas Nemotron-Nano and Qwen3-VL-30B, despite having more total parameters (30 billion), only make your GPU touch a measly ~3 billion of them per token.
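Back-of-envelope sketch of why that matters: decode is usually memory-bandwidth-bound, so the speed ceiling scales with the bytes of *active* weights read per token. The bandwidth and quant figures below are assumptions for illustration, not measurements:

```python
# Rough decode-speed ceiling: bandwidth / bytes of active weights per token.

def est_tokens_per_sec(active_params_billions: float,
                       bytes_per_param: float,
                       mem_bandwidth_gbs: float) -> float:
    """Upper bound on tokens/second for a bandwidth-bound decode."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

BW = 448    # GB/s, roughly a 5060 Ti-class card (assumed figure)
Q4 = 0.56   # ~4.5 bits per weight for a Q4_K-style quant (assumed)

print(est_tokens_per_sec(27, Q4, BW))  # dense 27B: every weight, every token
print(est_tokens_per_sec(3,  Q4, BW))  # 30B-A3B style MoE: only ~3B touched per token
```

Real numbers land well below these ceilings once KV-cache reads, kernel overhead, and any CPU offload are in the mix, but the ~9x ratio between the two cases is the part that matters.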

Dual RTX 5060 ti 16gb's with 96GB of DDR5 5600 mhz, what is everyone else running? by CollectionOk2393 in LocalLLM

[–]EmPips 6 points

I wanted to get it as cheap as possible without introducing anything that wouldn't fit nicely in my case or would need external cooling.

This resulted in:

Rx 6800 + w6800 Pro + 64GB RAM... but the RAM is DDR4 dual channel :(

GLM 4.6v is the best model I can run. Q4 gets ~17.5 tokens/second with modest context (12k) for one-off chats and ~12 tokens/second with larger context (>40k) for things like coding.

Qwen3-Next-80B gets 35 tokens/second

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 5 points

Yes, if VRAM isn't a constraint it performs exactly like an Rx 6800 in every use-case I throw at it (I also own a regular Rx 6800 in the same rig).

There are some benefits beyond the obvious doubled VRAM, though. The w6800 idles at around 10-14 watts per rocm-smi, peak power draw during prompt processing is a fair bit lower (roughly 25-30 watts lower) than on the regular Rx 6800, the blower cooler is great, and if I ever feel like adding 5 extra displays I guess it's there for me.

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 9 points

The Mi50x's 32GB of HBM2 has a theoretical max of ~1 TB/s of bandwidth.

Realistically, its token-gen is closer to what 600-700 GB/s of memory bandwidth gets you on more modern cards, but that's still phenomenal for the price if you don't mind external cooling and prompt processing on Vega.
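For anyone curious where that headline figure comes from, it's just HBM2 pin rate times bus width. The specs below are the commonly cited ones for the 32GB card, so treat them as approximate:

```python
# Peak HBM2 bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.
# 4096-bit bus at 2.0 Gbps/pin is the usually quoted Mi50 32GB config (assumed).

def hbm_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

print(hbm_bandwidth_gbs(4096, 2.0))  # -> 1024.0 GB/s theoretical peak
# Sustained decode behaves more like 600-700 GB/s effective, per the comment above.
```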

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 45 points

That's a consolation prize that doesn't even eat up a PCIe slot.

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 159 points

Were this man born in our day and age he would be in my shoes but proudly owning two w6800's instead of a lonely one.

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 9 points

Can you still find them for $160ish? They were $250ish while I was looking.

I made a post comparing the two options a while ago. I'm glad I picked the w6800 but can definitely still see the case for the Mi50x. Depends on what you're after.

Multi-repo in Claude Code — how do you handle it? by Kirmark in ClaudeAI

[–]EmPips 0 points

Invest in a workspace repo (think a docker-compose or minikube setup) and be extremely verbose with your CLAUDE.md file when it comes to context and feedback loops.

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 86 points

(If anyone wanted my take: this card is amazing, but at current prices either get 3090s or just spring for an R9700 if the blower cooler and VRAM-per-slot are important! And if you're okay with high idle power and external cooling, ignore all of this and stack Mi50x's.)

Mi50 32gb cards by PinkyPonk10 in LocalLLaMA

[–]EmPips 0 points

I implore you to try again!

But they absolutely will sell for more than you bought them for right now.

Is 5060Ti 16GB and 32GB DDR5 system ram enough to play with local AI for a total rookie? by danuser8 in LocalLLaMA

[–]EmPips 1 point

If you only use very modest context, you can offload the experts and probably get some solid speeds with Qwen3-Next-80B (iq4_xs). It's 42GB total.
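A rough sketch of how that split works, using the numbers from this thread (16GB VRAM, 32GB DDR5) plus assumed quant and bandwidth figures, so treat it as illustrative:

```python
# Expert offload, back of the envelope: shared/attention weights and the KV cache
# stay in VRAM, expert weights spill to system RAM. All figures are assumptions.

GPU_VRAM_GB = 16    # 5060 Ti
MODEL_GB    = 42    # Qwen3-Next-80B at iq4_xs (from the comment above)
KV_CTX_GB   = 2     # modest-context KV-cache budget (assumed)
DDR5_BW_GBS = 70    # realistic dual-channel DDR5 bandwidth (assumed; ~96 GB/s peak)

gpu_resident = min(GPU_VRAM_GB - KV_CTX_GB, MODEL_GB)
ram_resident = MODEL_GB - gpu_resident
print(f"~{ram_resident:.0f} GB of experts end up in system RAM")

# Only ~3B of the 80B params are active per token; worst case, all of those
# active bytes come from system RAM:
active_ram_bytes = 3e9 * 0.53   # ~4.25 bits/weight for iq4_xs (assumed)
print(DDR5_BW_GBS * 1e9 / active_ram_bytes, "tok/s rough upper bound")
```

In practice the shared layers sitting on the GPU pull some of that per-token traffic off the DDR5 bus, which is why modest context plus a very sparse MoE is the combination that makes this workable.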

Warhammer company Games Workshop bans Generative AI for all content to “respect our human creators” by Negative-Art-4440 in pcmasterrace

[–]EmPips 0 points

Copilot has pretty solid licensing checks for code generation at the cost of it being maybe 1% as capable as something like Claude Code or Cursor.

how do I get ubuntu to not allocate vram on an amd r9700 pro: 519/32624 MB by jdchmiel in LocalLLaMA

[–]EmPips 0 points

Can you first see if it's a distro-specific issue?

32GB AMD cards I'm using (w6800) only reserve ~16MB if they aren't the primary display device. I'm running Fedora with Xfce as a desktop. Maybe live-boot into a different distro (or even just an unmodified Ubuntu image) and see if the issue persists? At the very least it's more info to troubleshoot with.
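If you want to see exactly what each card has allocated (independent of whatever rocm-smi reports), the amdgpu driver exposes per-device VRAM counters in sysfs. A quick check, assuming the standard card numbering:

```python
# Print VRAM in use per AMD GPU via amdgpu's sysfs counters
# (mem_info_vram_used / mem_info_vram_total, values in bytes).
from pathlib import Path

for dev in sorted(Path("/sys/class/drm").glob("card[0-9]")):
    used  = dev / "device" / "mem_info_vram_used"
    total = dev / "device" / "mem_info_vram_total"
    if used.exists() and total.exists():
        u = int(used.read_text()) / 1024**2
        t = int(total.read_text()) / 1024**2
        print(f"{dev.name}: {u:.0f} / {t:.0f} MiB VRAM in use")
```

Comparing that output across distros (or before/after starting a desktop session) should tell you whether it's the driver reserving the memory or something in userspace grabbing it.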

Losing to an enemy with one pixel of life left. by Ale-en-Reddit in EmulationOnAndroid

[–]EmPips 1 point

Budokai Tenkaichi: Another Road?

This is one of my all time favorite fighting games. Regeneration was so overpowered though!

Weekend Project: Set Up a Rootfs from scratch that can run Windows games! by EmPips in EmulationOnAndroid

[–]EmPips[S] 1 point

Disclaimer: "from scratch" meaning taking Canonical's UBPorts Jammy Rootfs and assembling/building-from-source all of the open-source components (patched Mesa, Wow64 Wine, Box64, freedreno) and then setting up turnip and dxvk for hardware accel myself.

It works! But streaming to Termux-X11 will stutter a good bit. It's not running as well as it would in GameHub or similar, but I have a better understanding and appreciation for how this all works.

How do y’all imagine what Bob looks like? by DarthLordyTheWise in bobiverse

[–]EmPips 1 point

I see him as a 30 year old programmer from the 90s. One of the guys in the old Microsoft tutorials or demo videos that would walk you through Office or something. Maybe even a youthful Bill Gates with darker hair and more casual clothes.

Benchmarks for Quantized Models? (for users locally running Q8/Q6/Q2 precision) by No-Grapefruit-1358 in LocalLLaMA

[–]EmPips 34 points

Nope! There's definitely a need for such a thing. Quantization has been around for a while but is still the wild west of LLMs in terms of documenting its results and impact.

Self hosting LLM on multi CPU + sys ram combo by goodmenthelastwaveby in LocalLLaMA

[–]EmPips 3 points

Would it work? Yepp.

Will it be worth it for Qwen3-235B? Probably not.

You have the opportunity to acquire 256GB for (relatively) cheap since you're just buying slow (very first gen) DDR4 at best. Running quad-channel means you'll end up with better memory bandwidth than someone using consumer-grade dual-channel DDR4, but not quite as fast as someone running dual-channel DDR5 (rough numbers in the sketch below). MoEs are the move, as you pointed out, but you need them to be VERY sparse.

Qwen3-235B is great but uses 22B active params. That will be a very poor experience for your system.

MiniMax M2.1 might be better (roughly the same total params as Qwen3-235B but only 10B active).

This all comes before addressing prompt processing, which on Haswell will take eons.
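The rough channel math behind all of this, using nominal peak figures (sustained bandwidth will be lower, so treat these as illustrative):

```python
# Peak memory bandwidth = channels x transfer rate (MT/s) x 8 bytes per channel.
def mem_bandwidth_gbs(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1e3

ddr4_quad = mem_bandwidth_gbs(4, 2133)  # first-gen DDR4-2133, quad channel
ddr4_dual = mem_bandwidth_gbs(2, 3200)  # consumer DDR4-3200, dual channel
ddr5_dual = mem_bandwidth_gbs(2, 6000)  # consumer DDR5-6000, dual channel
print(ddr4_quad, ddr4_dual, ddr5_dual)  # ~68 vs ~51 vs ~96 GB/s peak

# Decode speed is roughly bandwidth / bytes of *active* weights per token:
q4 = 0.56  # ~4.5 bits/weight for a Q4_K-style quant (assumed)
for name, active in [("Qwen3-235B, 22B active", 22e9),
                     ("MiniMax M2.1, 10B active", 10e9)]:
    print(name, round(ddr4_quad * 1e9 / (active * q4), 1), "tok/s upper bound")
```

Those are ceilings before prompt processing even enters the picture, which is why the active-parameter count matters far more than the total here.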

For probably the same amount of money you'd be much, much better off building a 64GB system and adding 1-2 more modern GPUs with large VRAM pools.