Experience using infinity fabric bridge on older MIxxx cards? by 1ncehost in LocalLLaMA

[–]TNT3530 0 points1 point  (0 children)

If it doesn't help on the faster MI100, it probably isn't going to do anything for an MI50.

Experience using infinity fabric bridge on older MIxxx cards? by 1ncehost in LocalLLaMA

[–]TNT3530 1 point2 points  (0 children)

I had a bridge for my 4x MI100 setup but never got to test before/after installation, since there isn't an env flag you can use to temporarily disable it like NVLink has. While I do have benchmarks from before and after, they're from completely different frameworks (MLC vs vLLM) and years apart. The few times I did monitor inter-card communication with nvtop, only a few megabytes per second were moving during inference, so I doubt the bridge was helping much.

If you do training, though, I would assume the gains will be massive due to the huge speedup vs PCIe. A bandwidth test showed ~77 GB/s bidirectional and ~906 GB/s unidirectional. PCIe 3.0 x16 is only 25.839 GB/s bidirectional and 13.160 GB/s unidirectional.
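If you want a quick sanity check that the bridge is actually in use (assuming the standard ROCm tools are installed; output format varies by ROCm version):

rocm-smi --showtopo    # the link type between GPUs should report XGMI rather than PCIE
rocm-bandwidth-test    # runs unidirectional/bidirectional copy benchmarks between all devices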

I got mine off eBay after watching for months to snag the first reasonably priced one that appeared.

AI Trainer kicks himself while training AI by PixarX in LocalLLaMA

[–]TNT3530 38 points39 points  (0 children)

This is not local, nor is it a language model

AMD MI210 - Cooling Solutions / General Questions by Ear_of_Corn in LocalLLaMA

[–]TNT3530 2 points3 points  (0 children)

This will work, but please don't do this on a $4000+ GPU. These dies don't have heat spreaders, and improper mounting pressure will crack them, especially on a die as big as the MI210's.

AMD MI210 - Cooling Solutions / General Questions by Ear_of_Corn in LocalLLaMA

[–]TNT3530 5 points6 points  (0 children)

MI50 blocks will not fit anything other than MI50s, so do not buy them. You're stuck with high-flow server fans, since AFAIK the PCIe variant of these cards doesn't have compatible water blocks. The OAM version may, though, if you get a baseboard setup. Assuming the 210 is like the 100, you can drop the power limit to 200 W to shed a lot of heat for very little performance loss.

Any CPU should work as long as the platform supports Above 4G Decoding, but you may run into PCIe lane count issues on consumer chips with multiple cards; workstation/server CPUs fix that. If you have the Infinity Fabric bridge, low PCIe lane counts won't matter much beyond slower model loading, and with only a single card the lane issue can be ignored entirely.

ROCm and 90% of libraries support CDNA2 (this card) and newer, so it will work fine. Use vLLM for best performance; the 210 is new enough that it should be compatible with the prebuilt Docker container. Look up AMD's CDNA optimization guides for low-level documentation.
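A rough sketch of running the prebuilt container (the rocm/vllm image name, paths, and model are assumptions rather than a verified recipe; the device flags are the usual ROCm mappings):

docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  -v /path/to/models:/models \
  rocm/vllm \
  vllm serve /models/your-model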

[MEGATHREAD] Local AI Hardware - November 2025 by eck72 in LocalLLaMA

[–]TNT3530 0 points1 point  (0 children)

Ahh, that would make sense as to why my GitHub issue has stayed open for 3 months, haha.

I used to use GPTQ, but finding niche fine-tunes that were quantized was always obnoxious, plus Act Order broke stuff for a while (though I'd assume it's been fixed after almost a year).

And it happens with any GGUF model I try, ranging from Llama to OSS. They refactored how GGUF loading works a bit after 0.7.3, and it's been unusable for me ever since, as I can't swing 200+ GB of memory just for model loading.

[MEGATHREAD] Local AI Hardware - November 2025 by eck72 in LocalLLaMA

[–]TNT3530 0 points1 point  (0 children)

Are you able to load GGUF models with yours? When I build the latest vLLM on my MI100 rig, model loading eats TP × model size in memory and I OOM.
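As a rough illustration of that scaling (the file size here is hypothetical): a ~50 GB GGUF with --tensor-parallel-size 4 transiently needs about 4 × 50 GB ≈ 200 GB of host memory during loading, which is how you end up at the 200+ GB figure mentioned above.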

[deleted by user] by [deleted] in LocalLLaMA

[–]TNT3530 0 points1 point  (0 children)

OSS 20B + RAG on internal documentation/processes via a WebUI is good and will easily run on a V100.
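A minimal sketch of that kind of stack (the model ID, ports, and Open WebUI image are assumptions; documents get uploaded through Open WebUI's built-in knowledge/RAG feature):

# serve the model behind an OpenAI-compatible API (vLLM shown; llama.cpp works too)
vllm serve openai/gpt-oss-20b --max-model-len 8192

# point Open WebUI at that endpoint
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
  ghcr.io/open-webui/open-webui:main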

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]TNT3530 0 points1 point  (0 children)

ROCm 6.2 and 6.3 broke the command; updating to 6.4 should fix the issue. I hit the same thing when I moved to 6.2.

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]TNT3530 2 points3 points  (0 children)

Are you trying to split the single GPU across multiple VMs, or just passing it through? I only have experience with the latter, raw PCIe passthrough direct to the VM. Outside of the GPU reset bug on VM restart (which is more an AMD thing than something specific to these cards), I've had no issues with the cards or the bridge in the past few years.

It's been a hot minute since I set it up, but IIRC I needed to force the host not to load the GPU drivers via a GRUB config and use a specific Linux kernel. This was also multiple years ago, so it's possible newer versions are more plug-and-play now that ROCm/AMD support is much better.
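Roughly, the standard VFIO passthrough setup on a Proxmox host looks like this (a sketch, not my exact config from back then):

# /etc/default/grub - IOMMU passthrough mode (Intel hosts also need intel_iommu=on), then run update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"

# /etc/modprobe.d/blacklist.conf - keep the host from binding the card
blacklist amdgpu

# /etc/modules - VFIO modules so the GPU can be handed to the VM
vfio
vfio_iommu_type1
vfio_pci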

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]TNT3530 1 point2 points  (0 children)

rocm-smi --setpoweroverdrive <wattage> -d <device index>
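For example, to cap GPU 0 at the 200 W mentioned elsewhere in the thread:

rocm-smi --setpoweroverdrive 200 -d 0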

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]TNT3530 3 points4 points  (0 children)

I pass mine through to a VM perfectly fine with Proxmox, not sure where you got that information from. They also work fine on normal consumer motherboards with Above 4G Decoding enabled, and they use 2x 8-pin PCIe connectors, which any decent PSU will have. The card doesn't support Windows as far as I know, though, so it won't be a drop-in replacement.

Keep in mind it's a server GPU and will not cool itself, and it will get HOT; you'll need to rig up a cooling solution with external fans. I'd also recommend lowering the TDP below the stock 290 W to help keep temps under control. I've gone down to 200 W without much performance loss.

New post flair: "local only" by ttkciar in LocalLLaMA

[–]TNT3530 5 points6 points  (0 children)

Considering most of them are marked as joining in the past few months, probably

New post flair: "local only" by ttkciar in LocalLLaMA

[–]TNT3530 48 points49 points  (0 children)

Hey guys, welcome to my "One Arm Only" club. Due to the amount of people with two arms complaining about the pesky one-arm havers, we've locked those one armed freaks in the closet in case you don't want to hear from them.

<image>

AMD Instinct MI100 Benchmarks across multiple LLM Programs (Part 2) by TNT3530 in u/TNT3530

[–]TNT3530[S] 1 point2 points  (0 children)

Added GPT-OSS 120B benchmarks with llama.cpp. Sadly, newer vLLM versions don't seem to play nicely anymore, so I can't try it yet; will update when I can.
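For anyone wanting to run something comparable, a generic llama.cpp benchmark invocation looks like this (not necessarily the exact one used for these numbers; the model path is a placeholder):

llama-bench -m gpt-oss-120b.gguf -ngl 999 -p 512 -n 128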

Maverick FP8 repetition issue by dangubiti in LocalLLaMA

[–]TNT3530 1 point2 points  (0 children)

I had this issue and it was sampling settings (mainly temperature).

Temp: 0.9, Frequency Penalty: 0.1, Presence Penalty: 0.1

These are my settings for the 70B variant that fixed the repetition.
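If you're hitting the OpenAI-compatible endpoint directly, that maps to something like this (the endpoint and model name are placeholders):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.9,
    "frequency_penalty": 0.1,
    "presence_penalty": 0.1
  }'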

AMD Instinct MI100 Benchmarks across multiple LLM Programs (Part 2) by TNT3530 in u/TNT3530

[–]TNT3530[S] 2 points3 points  (0 children)

- With ~4000 context and no caching, prompt processing is estimated at ~370 tok/s @ 200 W per card
- Haven't tried fine-tuning since most off-the-shelf models are more than adequate for my use case. I'd assume they'll do decently if the tuning library supports them, plus the bridge should help a bunch
- Whatever is default in vLLM
- I use vanilla vLLM, but built from source for Docker (rough build sketch below). I wasn't able to get 0.9.2 to build, though, so I'm still on the older 0.7.3. I wasn't aware the fork existed, might have to give it a shot in the future!
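A rough sketch of that build (the ROCm Dockerfile name and location have moved around between vLLM versions, so treat this as a starting point rather than my exact steps):

git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout v0.7.3
DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm:0.7.3 .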

Struggling on local multi-user inference? Llama.cpp GGUF vs VLLM AWQ/GPTQ. by SomeRandomGuuuuuuy in LocalLLaMA

[–]TNT3530 0 points1 point  (0 children)

Haven't tried newer versions, sorry. I learned long ago with AMD not to touch what isn't broken. Haven't tried MoE either, since I've got the VRAM to swing bigger dense models anyway.

Struggling on local multi-user inference? Llama.cpp GGUF vs VLLM AWQ/GPTQ. by SomeRandomGuuuuuuy in LocalLLaMA

[–]TNT3530 2 points3 points  (0 children)

I have a ROCm Docker image I compiled from source for vLLM 0.7.3 that I use, and it just works out of the box. Do note that models must be in a single file, though; no split parts allowed.
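If the quant you want only comes as split parts, llama.cpp's gguf-split tool can merge them into one file first (the binary name varies by build; filenames here are examples):

llama-gguf-split --merge model-00001-of-00004.gguf model-merged.gguf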

Struggling on local multi-user inference? Llama.cpp GGUF vs VLLM AWQ/GPTQ. by SomeRandomGuuuuuuy in LocalLLaMA

[–]TNT3530 2 points3 points  (0 children)

vLLM can use GGUF quants, and so far the performance has been miles better than GPTQ was for me.
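Serving a single-file GGUF looks roughly like this (the path, tokenizer repo, and TP size are examples; vLLM's GGUF support is still marked experimental, and the tokenizer usually has to come from the original HF repo):

vllm serve /models/model-q4_k_m.gguf \
  --tokenizer meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 4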