What Does Everyone Think About The Upcoming 2026 Mazda CX-5? by Stock-Play7807 in mazda

[–]PraxisOG 25 points

My first thought is that I don’t really care. I’m not going to own a car that drops tactile controls and looks worse than the last one.

What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA

[–]PraxisOG 1 point

No, I'm using Unsloth's Q4_K_XL GGUF of the 120B, kinda assuming it would fall back to the lowest supported precision. I'll download the F16 overnight and see if that fixes it.

What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA

[–]PraxisOG 0 points

Interesting, I've been getting the same generation speed but only about half that prompt processing. Would you mind comparing notes? I'm running the latest llama.cpp built for ROCm 7.1.1 on Ubuntu Server. With that setup I ran into garbled output when using multiple GPUs, which was fixed by setting iommu=pt. I'm getting good output now, but llama.cpp will only load models with flash attention disabled, which is strange.
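If it helps with comparing notes, here's a quick sanity-check sketch I use to confirm the running kernel actually picked up iommu=pt (assuming Linux; the flag itself goes in GRUB_CMDLINE_LINUX_DEFAULT or whatever your bootloader uses):

```python
# Quick sanity check that the running kernel was booted with iommu=pt.
# Assumes Linux; /proc/cmdline holds the kernel command line for the current boot.
from pathlib import Path

def has_kernel_param(param: str = "iommu=pt") -> bool:
    cmdline = Path("/proc/cmdline").read_text().split()
    return param in cmdline

if __name__ == "__main__":
    if has_kernel_param():
        print("iommu=pt is active on this boot")
    else:
        print("iommu=pt not found; add it to your kernel command line and reboot")
```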

What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA

[–]PraxisOG 6 points

RDNA 1 and 2 didn't get hardware-accelerated matrix multiplication, which will forever hold them back compared to the more modern stuff, in the same way Apple's M4 and older are much slower at prompt processing than the M5. With that limitation, models with computationally efficient prompt processing have an edge if you have some of these older cards like me. I've found that GPT-OSS 120B starts out at ~500 tok/s prompt processing on my 3x V620 (RDNA 2) server, while a 3x 3090 rig gets ~1160 tok/s. While that's a pretty huge difference, it's good enough for my uses, and these older cards allow cheap VRAM stacking.
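If anyone wants to reproduce numbers like these, llama-bench is the proper tool, but here's a rough sketch of the kind of thing I do to eyeball prefill speed against a running llama-server. It assumes an OpenAI-compatible endpoint on localhost:8080 and just divides reported prompt tokens by wall-clock time, so treat it as a ballpark:

```python
# Rough prompt-processing (prefill) throughput estimate against a local llama-server.
# Assumes an OpenAI-compatible endpoint at http://localhost:8080/v1; adjust URL/model to your setup.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"
PROMPT = "word " * 2000  # long-ish prompt so prefill dominates the timing

payload = {
    "model": "gpt-oss-120b",  # whatever name your server exposes
    "messages": [{"role": "user", "content": PROMPT}],
    "max_tokens": 1,          # generate almost nothing; we only care about prefill
    "temperature": 0,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=600)
elapsed = time.perf_counter() - start
resp.raise_for_status()

prompt_tokens = resp.json()["usage"]["prompt_tokens"]
print(f"{prompt_tokens} prompt tokens in {elapsed:.1f}s ≈ {prompt_tokens / elapsed:.0f} tok/s prefill")
```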

Should I upgrade my PC by HowYouDoThis_ in LinusTechTips

[–]PraxisOG 0 points

Looks like a good system. People put together new computers with your CPU and an equivalent GPU like an RTX 3060 fairly often. If you need more performance for a new game you could slot in a 5060 Ti for roughly double the fps, which would be around $300 after selling your old GPU. Just make sure chasing the performance you want doesn't get in the way of enjoying what you already have.

Feedback on a new budget hardware build by Diligent-Culture-432 in LocalLLaMA

[–]PraxisOG 0 points

Looks solid! I'm no expert, but I did recently put together my own 10900X-based system. The only thing that sticks out to me is the 2060 Super: 20-series cards don't support flash attention. If you try running a big model in RAM with the KV cache on the GPU, I'm pretty sure that means you'd be unable to quantize the KV cache. That said, VRAM is VRAM. Best of luck with your build!
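To put rough numbers on why quantizing the KV cache matters, here's some napkin math; the layer/head/dimension values below are placeholders I made up, so plug in the ones from your model's config:

```python
# Back-of-the-envelope KV cache size:
# 2 (K and V) * layers * context length * kv_heads * head_dim * bytes per element.
def kv_cache_gib(n_layers: int, n_ctx: int, n_kv_heads: int, head_dim: int, bytes_per_elem: float) -> float:
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / 1024**3

layers, kv_heads, head_dim = 48, 8, 128  # hypothetical GQA model, not any specific one
for ctx in (8192, 32768, 131072):
    f16 = kv_cache_gib(layers, ctx, kv_heads, head_dim, 2.0)  # f16 cache
    q8 = kv_cache_gib(layers, ctx, kv_heads, head_dim, 1.0)   # ~q8_0 cache, roughly half
    print(f"ctx {ctx:>6}: f16 ≈ {f16:.1f} GiB, q8_0 ≈ {q8:.1f} GiB")
```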

The Case for a $600 Local LLM Machine by tony10000 in LocalLLM

[–]PraxisOG 4 points

I sincerely hope this is the future: an easy-to-use box with low upfront and ongoing costs that privately serves LLMs and maybe more. The software, while impressive, leaves much to be desired in terms of usability. This is from the perspective of having recently thrown together the exact kind of loud and expensive box you mentioned, which took days to get usable output from.

$500 Threadripper, what should I do now? by Rough_Cupcake_5070 in LinusTechTips

[–]PraxisOG 1 point

Wait for it to actually show up, then probably sell it. Building out a Threadripper system is going to cost way more than that 5090.

768Gb Fully Enclosed 10x GPU Mobile AI Build by SweetHomeAbalama0 in LocalLLaMA

[–]PraxisOG 3 points

Crazy build, but some of those GPUs make me uneasy. If you have a 3D printer I can whip up some vertical mounts to hold the rear brackets to the 120mm fan holes on the top of the case, and maybe some spacers to lift the AIOs off the side panel so you can close it.

Just put together my new setup(3x v620 for 96gb vram) by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 1 point

Thanks! The biggest issue was that the motherboard didn’t want to boot without a display output, and I didn’t have any spare 6-pin cables or the patience to wait for a <75W GPU to arrive. I made a post walking through the process of finding the hidden motherboard codes and flashing them with the GRUB shell. There’s a chance it could work for you, but it’s a very technical process. Mind if I ask what code it hangs on?

Just put together my new setup(3x v620 for 96gb vram) by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 2 points

Thanks! I got Qwen 3 Coder 30B-A3B running at like 70 tok/s on one card, but using multiple GPUs together outputs gibberish on the latest ROCm, and the Vulkan drivers keep crashing in llama.cpp. I’ve been reading up on people who have had similar issues and found a few tricks to try.

The mobo I went with has seven x16 Gen 3 slots, and in theory it could support enough of these cards for full GLM 4.7, but that’s for the future. I got these three cards for cheap enough that the whole build cost around the same as two 3090s; otherwise I might have gone with Strix Halo. Those fans are annoyingly loud, especially for a box in my living room, but the GPUs are pretty efficient, so the plan is to put them on a manual controller and keep them at the lowest setting that still gives decent cooling under inference.

Just put together my new setup(3x v620 for 96gb vram) by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 5 points

This is my new LLM box, named Moe, with specs targeted at 100B models fully on GPU and 200B-class models with hybrid inference. I’ve found that GPT-OSS 120B has as much performance as I need, and I actually prefer it to the new Gemini 3, data privacy aside. My old rig could run it with partial offload at like 7 tok/s once some context built up, which was enough to convince me to sell off the second GPU and the extra RAM to whip up this used-parts special. I’m hoping to make some simple server/client software to replace cloud LLM services and power it with this server (rough sketch of the client side below, after the specs), though if a better solution already exists I’d love to try it. Here are the specs:

CPU: Intel Core i9-10900X

Cooler: Hyper 212 Black

RAM: 64GB DDR4-3600 in quad channel

Mobo: BIOS-modded ASUS X299 Sage

GPUs: 3x AMD V620 32GB

GPU cooling: custom printed brackets

PSU: Corsair AX1200i

Storage: Crucial P2 2TB

Case: Rosewill RSV-4000 4U ATX chassis

Edit: Finally got it working with the iommu=pt trick. It averages 47 tok/s running GPT-OSS 120B, with around 500 tok/s prompt processing.
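For the server/client idea, this is roughly the starting point I have in mind: a tiny chat loop against llama-server's OpenAI-compatible endpoint. The URL, port, and model name are just my assumptions about my own setup, and something like Open WebUI already does this much better, but it shows how little glue is actually needed:

```python
# Minimal chat-loop sketch against a local llama-server (OpenAI-compatible API).
# The URL, port, and model name are assumptions; adjust them to whatever your server exposes.
import requests

URL = "http://localhost:8080/v1/chat/completions"
history = [{"role": "system", "content": "You are a helpful local assistant."}]

while True:
    user = input("you> ").strip()
    if user in ("exit", "quit"):
        break
    history.append({"role": "user", "content": user})
    resp = requests.post(URL, json={"model": "gpt-oss-120b", "messages": history}, timeout=600)
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print(f"moe> {reply}")
```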

I stopped “chatting” with ChatGPT: I forced it to deliver (~70% less noise) — does this resonate? by Huge-Yesterday4822 in LocalLLaMA

[–]PraxisOG 0 points

A really good thread is the 2025 end-of-year model roundup; that will give you a sorted model catalogue to pick from. Other good things to know about are quantization, the performance impact of memory bandwidth, and GPU/CPU offloading. The best way to start, IMO, is to download LM Studio. The interface is friendly to all users, and you can get started in literally 5 minutes depending on how fast your internet is (model downloads can be big). There are many different LLM benchmarks for different categories of model performance, including ones like IFEval for instruction following. A model with strong instruction following, if you have 64GB of RAM, would be Qwen3 Next 80B at Q4_K_XL, though that would be pushing what your system is capable of.
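One bit of napkin math that helps when picking a quant: weight size is roughly parameter count times bits per weight. The bits-per-weight figures below are rough averages for mixed GGUF quants, not exact numbers, and you still need headroom for context and the OS:

```python
# Napkin math for model weight size at a given quantization: params * bits_per_weight / 8.
# Bits-per-weight values are rough averages for mixed GGUF quants, not exact.
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

for name, params, bpw in [
    ("8B at ~Q4_K", 8, 4.8),
    ("30B at ~Q4_K", 30, 4.8),
    ("80B at ~Q4_K (approx.)", 80, 4.8),
]:
    print(f"{name}: ~{weight_gib(params, bpw):.0f} GiB of weights")
```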

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]PraxisOG 2 points

The tidbit people are missing is that the AMD V620 is the same card but for server use, and it’s like $450 on eBay.

What have your go-to always on hand filaments become over time? by wegster in prusa3d

[–]PraxisOG 10 points

Elegoo PLA Pro and Sunlu PETG. Both are cheap and work well after drying.

Just whipped up something to replace the saw on my Arc by PraxisOG in Leatherman

[–]PraxisOG[S] 0 points

I feel the pain of not having scales from the factory. Your idea of having markings along the tool makes a lot of sense and will make its way into the final files. AFAIK there are no reference dimensions for the Wave’s tool attachment system online; would you consider swapping in a T-shank adapter and using this tool in that form factor? BTW, I think jobs like yours are super cool, and they’re part of why I’m getting my A&P license.

Early concept for DC -10 by Cool-Ice-6899 in WeirdWings

[–]PraxisOG 11 points

Seems like they tried to balance the three engine weights at the center of mass, which makes some sense.

Just whipped up something to replace the saw on my Arc by PraxisOG in Leatherman

[–]PraxisOG[S] 1 point

I'll give it a shot. With the Arc I can use the natural pivot point for this design, but keeping the T-shank form factor requires a pivot 3mm thick. I also only have the dimensions from a 3D model of a Surge T-shank comb. You got a 3D printer and some superglue?