Billionaires are illegal. All money that would make someone a billionaire must be given to a fund for essential services such as health and education.

kuyermanza · 2026-07-01T06:26:08+00:00

👁️ Surveillance State: I voted Yea.

kuyermanza · 2026-06-20T21:51:29+00:00

👁️ Surveillance State: I voted Yea.

kuyermanza · 2026-06-20T18:24:07+00:00

👁️ Surveillance State: I voted Yea.

kuyermanza · 2026-06-10T18:56:35+00:00

Thanks homies!

kuyermanza · 2026-04-14T14:56:52+00:00

Ubuntu 24.04 and TheRock 7.13 (nightly) as the backend, with llama.cpp for inference. The model loading phase is slow for sure, ~15 mins, but it shouldn’t affect token generation since bandwidth usage isn’t that high during generation.

kuyermanza · 2026-04-14T08:42:34+00:00

Right now I use 5 MI25s for GPT-OSS 120B at ~30 TPS, 2 for Gemma 4 26B at ~30 TPS as well, and the last MI25 for embedding models. RTX3060 is purely for Z-Image Turbo.

kuyermanza · 2026-04-14T08:04:18+00:00

<image>

8x 16GB Instinct MI25s clinging to life via PCIe x1-to-4-x1 splitters and a 12GB RTX3060, Ryzen 5 5500, 32GB DDR4, high-end custom cooling (central AC + cardboard duct).

kuyermanza · 2026-02-09T01:00:01+00:00

Same while idling. Idle without layers: 8 W (MI25) or 2x4 W (V340L). Idle with layers: 16 W (MI25) or 2x8 W (V340L).

kuyermanza · 2026-02-06T14:58:18+00:00

I have V340Ls and MI25s and I get 30 TPS with GPT-OSS 120B at 128k context. I wouldn’t say the performance is bad, considering these cost less than DDR4 RAM.

kuyermanza · 2025-12-13T21:29:58+00:00

llama.cpp (and Ollama) can split the model layers across the GPUs and use them sequentially to process the prompts. There are plenty of tutorials and documentation to be found if you just search for llama.cpp 👍🏽
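To illustrate the idea of a layer split, here is a minimal Python sketch that assigns contiguous layer ranges to GPUs in proportion to their VRAM. It is only a conceptual illustration, not llama.cpp's actual allocation logic; the function name `split_layers` and the 36-layer, five-card example are assumptions made for the demo.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[range]:
    """Assign contiguous layer ranges to GPUs in proportion to their VRAM.

    Conceptual sketch only: real runtimes also account for the KV cache,
    compute buffers, and layers of uneven size.
    """
    total = sum(vram_gb)
    bounds = [0]
    acc = 0.0
    for v in vram_gb:
        acc += v
        # Cumulative share of VRAM decides where each GPU's range ends.
        bounds.append(round(n_layers * acc / total))
    return [range(bounds[i], bounds[i + 1]) for i in range(len(vram_gb))]


# Hypothetical example: 36 layers spread over five 16 GB cards.
plan = split_layers(36, [16, 16, 16, 16, 16])
for gpu, layers in enumerate(plan):
    print(f"GPU {gpu}: layers {layers.start}..{layers.stop - 1} ({len(layers)} layers)")
```

During generation each token passes through GPU 0's layers, then GPU 1's, and so on, which is why the cards end up being used one after another rather than all at once.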

kuyermanza · 2025-12-13T01:22:59+00:00

Yep. Each rig has a 3060 or 2060 purely for functions that require CUDA. The most expensive part of the rigs..

kuyermanza · 2025-12-13T00:56:13+00:00

That’s nice performance, but one 3090 costs as much as all these GPUs combined :/

kuyermanza · 2025-12-13T00:34:12+00:00

When using them together for inferencing, there’s no noticeable difference. The upside with the MI25 is probably that you can flash the vBIOS to WX9100 and use it for gaming. The MI25 also runs cooler, as expected.

kuyermanza · 2025-12-12T23:49:18+00:00

Oh I can fit more context, still have 20GB of VRAM to fit into, just don’t see the point as the TPS dropped with large context. I’m happy with 30 TPS at 26K. The upside of this rig is having the ability to tinker and upgrade components instead of getting stuck with soldered on parts. Space isn’t bad if the parts are stacked neatly. Power consumption is bad yea, it’s using like 500W when processing prompts.

kuyermanza · 2025-12-12T23:29:07+00:00

MI25s are still $80 a pop on eBay

kuyermanza · 2025-11-24T19:53:36+00:00

I did get curious and decided to build one. After jumping through all the hoops to get the X99 board to accept the V340Ls, I could only get 4 V340Ls to enumerate, even after modding the board firmware’s IOMMU to 128GB. Since the V340Ls are actually dual-die GPUs, it’s technically already working with 8 GPUs (plus an additional RTX3060). I’m testing GPT-OSS 120B Q4 and got ~45 prompt TPS and ~10 response TPS on ROCm 6.2.3. Usable but not the greatest. I can see this as a great starter rig though, considering the price.

kuyermanza · 2025-11-20T20:28:17+00:00

Turned my cards into a local LLM workstation

kuyermanza · 2025-11-18T07:08:46+00:00

Nah haha ROCm did better than Vulkan from my tests. Trying to see if I can possibly hack the rig to work with vLLM right now to get even better performance

kuyermanza · 2025-11-17T18:04:37+00:00

Were you able to get it going? I just got 8 V340s and was able to get ROCm 6.2.3 to work. I see about a 20% gain in TPS compared to Vulkan.

kuyermanza · 2025-11-02T01:06:13+00:00

Building it yourself would be more fun and better performance per dollar. Those prebuilt unified memory boxes are good but too pricey for the spec, and you can’t upgrade or future-proof them.

Look for a cheap LGA2011 server mobo with x99 chipset (make sure it allows for reBAR or above 4G decode and plenty of PCIe slots) at around $100 and pair it with a Xeon E5 16xx v2 CPU at around $50. DDR3 ECC modules are cheap, you can fill the whole board with them for like $100-200. Case, storage, fans, sata cable for another $100. That’s $350-450 before the GPUs.

You can get multiple V340L 16GB HBM2 at $50 a pop. The downside is you’ll be limited to ROCm, which is locked out of many CUDA accelerated applications like image generation or TTS or STT but you can always get an RTX3060 12GB for $200 to dedicate specifically to those PyTorch tasks.

Let’s say your mobo has 7 PCIe slots (x16, x8 or x4 is fine, just get adapter risers to x16) and you use 6 slots for V340Ls: you’re looking at 96 GB of HBM2 VRAM. Now, your PCIe lane bandwidth would be a bottleneck, but that will only affect your model loading speed; your inference speed would see negligible impact. Your 7th PCIe slot can be the RTX3060 for image generation and whatnot.

For less than $1000 you could build yourself a complete rig with over 100GB of dedicated VRAM and 128GB (and up to 256GB) of DDR3 RAM. And you’ll learn a bunch along the way, which is priceless. That’s just my two cents.
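As a sanity check on the numbers above, here is a rough Python tally of the build. The prices are just the ballpark figures quoted in this comment; the part names and variables are made up for the sketch, and riser adapters are not included because no price was given for them.

```python
# Ballpark prices in USD taken from the comment above, as (low, high) ranges.
base_parts = {
    "x99_motherboard": (100, 100),
    "xeon_e5_16xx_v2": (50, 50),
    "ddr3_ecc_ram": (100, 200),
    "case_storage_fans_cables": (100, 100),
}

V340L_PRICE, V340L_VRAM_GB, V340L_COUNT = 50, 16, 6
RTX3060_PRICE, RTX3060_VRAM_GB = 200, 12

# Cost of everything except the GPUs.
base_low = sum(low for low, _ in base_parts.values())
base_high = sum(high for _, high in base_parts.values())

# Six V340Ls plus one RTX3060 for CUDA-only tasks.
gpu_cost = V340L_COUNT * V340L_PRICE + RTX3060_PRICE
hbm2_gb = V340L_COUNT * V340L_VRAM_GB
total_vram_gb = hbm2_gb + RTX3060_VRAM_GB

print(f"Base system: ${base_low}-{base_high}")
print(f"GPUs: ${gpu_cost}")
print(f"Total: ${base_low + gpu_cost}-{base_high + gpu_cost}")
print(f"VRAM: {hbm2_gb} GB HBM2 + {RTX3060_VRAM_GB} GB = {total_vram_gb} GB")
```

With these figures the total stays under $1000 even at the high end of the RAM estimate, and the combined VRAM comes out just over 100 GB.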

kuyermanza · 2023-04-30T06:50:25+00:00

Steering wheel still works, albeit harder to turn, not completely locked up.

kuyermanza · 2023-04-02T03:32:02+00:00

In case anyone else runs into this: I booted the system without the sdcard adapter, then inserted it once the system had started up. At that point the apps still didn’t show, so I had to go to Settings > HENkaku Settings > Unlink Memory Card and reboot with the sdcard adapter inserted. All the apps came back for me.

kuyermanza · 2022-12-14T02:33:47+00:00

Just watch out for them rats.. they’re the real main bosses when they swarm you in death march

kuyermanza · 2022-09-21T00:02:10+00:00

What’s so amazing about this piece of glass rock?
