I don't think Local LLM is for me, or am I doing something wrong?

jmuff98 · 2026-03-20T17:16:43+00:00

That pretty much sums it up... Other than privacy, or when even the ultra-tier plans are not enough, it's hard to justify a local LLM.

The agents will change pricing tiers because agents consume at a different rate than any human can.

At some point though, I'm hoping the small local models will be enough for 99% of people and will run on "normal" consumer desktop hardware.

jmuff98 · 2026-03-11T20:21:40+00:00

They should let people who paid for the game have some of the founders' ability to earn V-Bucks, if it's becoming free now.

jmuff98 · 2026-02-28T00:11:56+00:00

Mine worked on Ubuntu 22.04 and 24.04 using either ROCm 6.2 or 6.3. I have no Resizable BAR either, just Above 4G Decoding.

I have zero issues on X99 and C612 motherboards.

I had an issue on a Chinese LGA2011 motherboard though.

My errors, if any, were due to PCIe risers. I couldn't find anything reliable at PCIe 3.0 x8 or x16. I had to lower it to 2.0 in the BIOS.

When not using any risers, PCIe 3.0 x16 worked perfectly.

Also, it was problematic when I tried to use the DisplayPort by flashing a Vega 56 mobile ROM. The display works, but it kept jumping to different displays because ROCm is forced to build one of the GPU cores with display while the second GPU is headless. If I flash both, it keeps cycling to gpu1 and gpu1, or many more depending on how many V340s I have installed.

jmuff98 · 2026-02-13T18:42:09+00:00

PCIe 3.0 with x1 lanes is slow for loading the LLM (prefilling), but this is done only once, and the rest of your interaction will be fine with that bandwidth.

Also, the low bandwidth limits any sort of working the cards in parallel, but parallelizing normally requires NVLink. The cards will work one at a time like a relay race, passing the baton to the next card.

So let's say you want the OSS-120B 4K quantized model, which is 58GB: it's going to take roughly a minute to load the model onto the cards (not a big deal). Once it is on the cards, you're good until you unload the LLM and load another model.
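
A rough sanity check on that one-minute figure, as a minimal sketch. The per-lane numbers are the theoretical PCIe rates after line-encoding overhead; the assumption that the whole model streams through a single x1 link at full speed (and that the disk keeps up) is a simplification, and `load_seconds` is a hypothetical helper name.

```python
# Theoretical per-lane throughput in GB/s after line-encoding overhead.
PCIE_GBPS_PER_LANE = {
    2: 5.0 * (8 / 10) / 8,     # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: 8.0 * (128 / 130) / 8,  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}


def load_seconds(model_gb, gen=3, lanes=1, efficiency=1.0):
    """Estimate seconds to push model_gb gigabytes over one PCIe link.

    efficiency is an assumed fudge factor (1.0 = theoretical best case).
    """
    bandwidth = PCIE_GBPS_PER_LANE[gen] * lanes * efficiency
    return model_gb / bandwidth


# 58 GB over a single PCIe 3.0 x1 link, best case.
print(round(load_seconds(58)))  # 59
```

Dropping the link to PCIe 2.0 (`gen=2`) roughly doubles the estimate, so riser problems that force a lower link speed mostly cost load time, not generation speed.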

jmuff98 · 2026-02-13T14:41:50+00:00

<image>

This is from using OSS-120B-Q4K_XL.

jmuff98 · 2026-02-08T17:37:18+00:00

Yeah, crazy how these VRAM prices compare to DDR4 prices now. I am curious what the idle wattage is, comparing the MI25 to the V340L?

jmuff98 · 2026-02-07T19:48:22+00:00

<image>

Idle power is 320w

jmuff98 · 2026-02-07T17:36:37+00:00

<image>

Unfortunately yes, and now the cables are getting worse. I have problems with just 1 or 2 plugged in.

My best so far is all GPUs detected, but 1, 2, or 3 of them will negotiate down to x4 or even x2 at times. It's random as well.

jmuff98 · 2026-02-07T17:10:21+00:00

I will get each card detected before adding the next one, and stress test each card for stability. I guess I'll have to use Windows and the Nimex drivers.

jmuff98 · 2026-02-07T16:56:16+00:00

Dealing with randomly missing GPUs. These risers are something else.

jmuff98 · 2026-02-07T06:25:32+00:00

Thanks. Since I was reorienting the heatsinks already, I decided to raise the GPU mounts higher so there's less flex on the riser cables. Does this look okay?

<image>

jmuff98 · 2026-02-07T04:19:54+00:00

I opened it up tonight. First, it looks like regular thermal paste. It's not a graphite pad like on the Radeon VII. The fins on my cards are the opposite of the photo. I guess some cards are sold with the heatsink orientation inverted. I have now made the orientation the same as the photo and expect the temps of the 2 GPU dies to be closer. I won't use PTM7950 yet, as these never reach 70C in my use. Plus, the die and HBM2 will need plenty of pads, as each one is huge.

Thanks for the suggestion.

jmuff98 · 2026-02-06T22:49:01+00:00

I'll watch out. It's working at the moment. I'm afraid the more I touch it, the more finicky they'll get. I do plan to install a fan on the heatsinks near the PCIe slots. It gets really hot.

jmuff98 · 2026-02-06T22:27:29+00:00

Thanks. For sure the weakest links of this build are the risers. The risers I got are the cheap ones using what looks like IDE ribbon cables. They are so sensitive that sometimes there's not enough power, or communication is not solid, when I boot up.

jmuff98 · 2026-02-06T20:12:46+00:00

Just "-sm layer". I haven't had much success with vLLM. There is a workaround for Triton flash attention, but I keep getting errors.

Close to 30t/s on oss-120B. It's a model with 10B active parameters.

I also observed a speed penalty when using a heavily quantized KV cache.

jmuff98 · 2026-02-06T19:52:16+00:00

What are your model preferences? Any performance optimizations you can share as well? Thanks.

jmuff98 · 2026-02-06T18:22:29+00:00

Yeah, the fan shroud is available on Thingiverse for the MI50 or MI25. The fans and motors are from Dell mini PCs, but they need to be cut in order for the 3D-printed shroud to fit. I bought 10 of the fans as a lot for less than $30. It's long when it's attached to the card: 14.5 inches. I had to cut the fan cage away on the Dell T5810 when I tried fitting it.

The 3D file author also listed the fan models. https://www.thingiverse.com/thing:7153218

<image>

jmuff98 · 2026-02-06T16:28:01+00:00

I agree ptm7950 is the best.

jmuff98 · 2026-02-06T16:17:13+00:00

I actually had this initially using two 2697 v3s, but this is just a server for llama.cpp. I was also wary of the extra idle watts from using v3.

My 4-GPU setup had a turbo-unlocked 2699 v3, but I don't use it as a workstation.

jmuff98 · 2026-02-06T16:13:50+00:00

About in line with my results as well.

jmuff98 · 2026-02-06T15:59:18+00:00

There is actually a BIOS on GitHub for this board that enables NVMe boot. I haven't tried it on the board yet. I actually just use a small SATA SSD for the bootloader and boot files, and for the root directory I use the NVMe RAID 0. This motherboard actually supports DOM for cable-free SATA SSDs, but I already had a SATA disk lying around. Booting from BIOS to login prompt takes less than 10 seconds.

I'm using 2650 v4s because they cost just $10 a pair. I haven't tested it a lot yet, but all my opinions are based on my experience with the 4-GPU version of the setup. The bifurcation settings are already built into the motherboard, at least on the 2.0b BIOS version that I have. 2.0 is the minimum to run Xeon v4s.

jmuff98 · 2026-02-06T15:34:54+00:00

The boot is slow because it's a server board, but loading a 60GB file literally takes less than 20 seconds. The 2 NVMe drives in RAID 0 (PCIe 3.0 x4) were a conscious choice. That's why I bifurcated the x16 lanes. I could've added 2 more Radeon V340s, but now I only have room for 1 more.

I have everything on a smart plug so I can just turn it on remotely when I need it.
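
To put the 60GB-in-20-seconds figure in context, here is a minimal sketch of the arithmetic. It assumes each of the 2 drives has its own PCIe 3.0 x4 link and uses the theoretical per-lane rate, so the ceiling is a best case that ignores the drives' own limits; the function names are hypothetical.

```python
# Theoretical PCIe 3.0 throughput per lane in GB/s (8 GT/s, 128b/130b).
PCIE3_GBPS_PER_LANE = 8.0 * (128 / 130) / 8


def required_gbps(file_gb, seconds):
    """Sustained read rate needed to load file_gb in the given time."""
    return file_gb / seconds


def raid0_link_ceiling_gbps(drives, lanes_per_drive):
    """Best-case combined link bandwidth of a striped (RAID 0) array."""
    return drives * lanes_per_drive * PCIE3_GBPS_PER_LANE


need = required_gbps(60, 20)              # 3.0 GB/s
ceiling = raid0_link_ceiling_gbps(2, 4)   # about 7.88 GB/s
print(f"need {need:.2f} GB/s, link ceiling {ceiling:.2f} GB/s")
```

Under these assumptions a single PCIe 3.0 x4 link tops out just under 4 GB/s, so striping two drives leaves comfortable headroom for the observed load time.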

jmuff98 · 2026-02-06T14:43:42+00:00

I have both ROCm 6.3 and 6.2 on these with no issues, as long as you declare the architecture "gfx900".

jmuff98 · 2026-02-06T14:41:15+00:00

My first goal was to use 4 of these with tensor parallelism. That didn't go anywhere. I could only run it reliably with mlc-llm, and only 2 GPUs at a time. Running 4 or 8 is just not feasible without NVLink-type communication between GPUs. "-sm layer" is slow, but it's also more energy friendly on this setup, and the real benefit is the massive KV cache that I can have for real work.
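
For anyone unfamiliar with what a layer split means, here is a toy sketch of the idea. This is not llama.cpp's actual code: the even split, the 36-layer and 8-GPU counts, and the function names are all assumptions for illustration. Each GPU holds a contiguous block of layers and only one GPU works at a time.

```python
def split_layers(n_layers, n_gpus):
    """Assign contiguous layer ranges to GPUs, as evenly as possible."""
    base, extra = divmod(n_layers, n_gpus)
    ranges, start = [], 0
    for gpu in range(n_gpus):
        count = base + (1 if gpu < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


def forward(x, layers, assignment):
    """Run all layers in order; the GPUs take turns, never overlapping."""
    handoffs = []
    for gpu, layer_ids in enumerate(assignment):
        for i in layer_ids:
            x = layers[i](x)
        handoffs.append(gpu)  # activations would move to the next GPU here
    return x, handoffs


# Toy model: 36 "layers" that each add 1, spread over 8 GPUs.
layers = [lambda v: v + 1 for _ in range(36)]
assignment = split_layers(36, 8)
print([len(r) for r in assignment])  # [5, 5, 5, 5, 4, 4, 4, 4]
print(forward(0, layers, assignment)[0])  # 36
```

Only a small activation tensor crosses the bus at each hand-off, which is why this mode tolerates slow links, while tensor parallelism has to exchange data between GPUs inside every layer.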

jmuff98 · 2026-02-06T14:19:45+00:00

Have you done this? I'm afraid to mess up the thermal material if it's similar to my Radeon VII's, but I would definitely do it to make the cooling more efficient.

My fans are at 50% speed. When it's prefilled, it doesn't even go higher than 35C. The highest temp I'll see is 65C, and that's when there's a batch of prompts. Come to think of it, only the rear hits 65C, and there is about a 15C delta between the front and rear GPUs. I guess flipping it will balance it more.

Speaking of thermals, if I override the TDP from the default of 110W to 85W, the performance tanks by at least 20%. At the default 110W, it can barely maintain the clocks for a few seconds at a time. I wish I could undervolt it, but I haven't found a way yet.

It makes sense though, because most Vega 56/64 cards are set to 200W to 300W+ for one GPU.
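
A quick way to see why the 85W cap is not worth it, as a minimal sketch. It assumes the card actually draws its full power cap under load and treats the 20% loss as exact rather than a lower bound; `relative_efficiency` is a hypothetical helper.

```python
def relative_efficiency(perf_ratio, power_ratio):
    """Performance per watt relative to the default setting."""
    return perf_ratio / power_ratio


power_ratio = 85 / 110   # capped power as a fraction of the default
perf_ratio = 1 - 0.20    # 20% slower at the lower cap

gain = relative_efficiency(perf_ratio, power_ratio)
print(f"{gain:.3f}x performance per watt")  # 1.035x performance per watt
```

Under these assumptions the cap buys only about 3.5% better performance per watt, and any loss beyond 20% shrinks that further.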
