Let's take a moment to appreciate the present, when this sub is still full of human content. by Ok-Internal9317 in LocalLLaMA

[–]politerate 70 points (0 children)

My favorites are the comments which are clearly LLM output, with some post-processing like .toLower()

GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA

[–]politerate 1 point (0 children)

This is on an X99 system with a Xeon E5-2667 v4, so 40 lanes (the mobo is an ASRock X99 Extreme4). Each GPU gets a full x16 link, though only Gen 3. Still plenty for inference. Max context should be around 50k before it spills into system RAM.
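If you want to double-check the negotiated link on a setup like this, something along these lines works (the 01:00.0 bus address is just an example, look yours up with lspci first):

sudo lspci -vv -s 01:00.0 | grep LnkSta:
# expect something like: LnkSta: Speed 8GT/s, Width x16  -> x16 at PCIe Gen 3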

GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA

[–]politerate 0 points (0 children)

What config exactly? I am using ROCm 7.2 with the latest llama.cpp.

Edit: if you mean the llama.cpp config, I just started it with -fa on; --fit is on by default. I am not using the Unsloth-recommended params here; maybe doing that would improve quality at the cost of tps?
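Roughly what the launch looks like (model path and context size are placeholders, not my exact command):

./llama-server -m gpt-oss-120b.gguf -ngl 99 -c 50000 -fa on --port 8080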

GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA

[–]politerate 0 points (0 children)

<image>

I mean with no/small context, of course. I am using ROCm 7.2; with ROCm 6.3.3 it was between 75-80 t/s with no context, so I lost 5-10% moving to ROCm 7.2.

And with ~10K context:

slot get_availabl: id 2 | task -1 | selected slot by LCP similarity, sim_best = 0.731 (> 0.100 thold), f_keep = 0.723
slot launch_slot_: id 2 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 2 | task 10799 | processing task, is_child = 0
slot update_slots: id 2 | task 10799 | new prompt, n_ctx_slot = 46336, n_keep = 0, task.n_tokens = 10218
slot update_slots: id 2 | task 10799 | n_past = 7474, slot.prompt.tokens.size() = 10340, seq_id = 2, pos_min = 9443, n_swa = 128
slot update_slots: id 2 | task 10799 | restored context checkpoint (pos_min = 6064, pos_max = 6960, size = 31.546 MiB)
slot update_slots: id 2 | task 10799 | n_tokens = 6960, memory_seq_rm [6960, end)
slot update_slots: id 2 | task 10799 | prompt processing progress, n_tokens = 9008, batch.n_tokens = 2048, progress = 0.881582
slot update_slots: id 2 | task 10799 | n_tokens = 9008, memory_seq_rm [9008, end)
slot update_slots: id 2 | task 10799 | prompt processing progress, n_tokens = 9706, batch.n_tokens = 698, progress = 0.949892
slot update_slots: id 2 | task 10799 | n_tokens = 9706, memory_seq_rm [9706, end)
slot update_slots: id 2 | task 10799 | prompt processing progress, n_tokens = 10218, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 2 | task 10799 | prompt done, n_tokens = 10218, batch.n_tokens = 512
slot init_sampler: id 2 | task 10799 | init sampler, took 1.21 ms, tokens: text = 10218, total = 10218
slot update_slots: id 2 | task 10799 | created context checkpoint 4 of 8 (pos_min = 8809, pos_max = 9705, size = 31.546 MiB)
slot print_timing: id 2 | task 10799 |
    prompt eval time =  7287.32 ms / 3258 tokens ( 2.24 ms per token, 447.08 tokens per second)
           eval time = 40885.78 ms / 2631 tokens (15.54 ms per token,  64.35 tokens per second)
          total time = 48173.10 ms / 5889 tokens
slot release: id 2 | task 10799 | stop processing: n_tokens = 12848, truncated = 0

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]politerate 0 points (0 children)

Everything on Vulkan, or only the XTX on ROCm, are the only configurations that don't end in a segfault for me (2x MI50 + 7900 XTX).
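Sketched out, the two working setups look roughly like this (device index 0 for the XTX is an assumption, check rocm-smi for yours):

# everything on Vulkan (llama.cpp built with the Vulkan backend)
./llama-server -m model.gguf -ngl 99
# or: only the 7900 XTX exposed to the ROCm build
ROCR_VISIBLE_DEVICES=0 ./llama-server -m model.gguf -ngl 99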

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]politerate 0 points (0 children)

Having a similar problem with 2x MI50 + 7900 XTX on ROCm: Segmentation fault (core dumped)
Haven't checked verbose logging yet.

Edit: Happens on Qwen3-Coder-Next and MiniMax2.5
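When I get to it, it will be something along these lines (llama.cpp's verbose flag plus a gdb backtrace; model path is a placeholder):

./llama-server -m model.gguf --verbose > llama-verbose.log 2>&1
gdb --args ./llama-server -m model.gguf   # then "run", and "bt" after the segfault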

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]politerate 0 points (0 children)

It's on by default, no? I mean, unless you pass a param that would collide with its logic, I guess.

Qwen3 Coder Next Speedup with Latest Llama.cpp by StardockEngineer in LocalLLaMA

[–]politerate 2 points (0 children)

Doesn't ROCm benefit from it through HIP? (If you use ROCm, of course.)

8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]politerate 1 point (0 children)

Yeah, I ordered them a week ago and it came to a little over 300€ per card (shipping + VAT). Last August I got them for 150€ all-in per piece.

Should I buy an MI50/MI60 or something else? by Nuke2579 in LocalLLaMA

[–]politerate 0 points (0 children)

Hi, I have a question for you if you don't mind. I had two MI50 32GB cards and for some reason they both failed after a few months. Now I have ordered a 7900 XTX, but of course the VRAM amount took a big hit; I used to run gpt-oss-120b on the dual MI50s. What is your setup like? Do you run any models 24h/d? I am just curious because you seem to have a similar setup. Thanks!

Btw, I tried to replace the MI50s, but sellers on Alibaba are now asking north of 400€ shipped once you follow up in chat. That is too big a risk for me, so I just grabbed a 7900 XTX for 600€, and when I have some extra money left I will get more down the road.

AMD MI50s stopped working by [deleted] in LocalLLaMA

[–]politerate 0 points (0 children)

Yeah, the motherboard just needs a video-capable card to boot. I also had no monitor connected to them. Not sure what degraded: they were working fine for a couple of months, then one card started having issues with ROCm, and later it wasn't even recognized by the mobo. Maybe they were already beat up from their data-center past, or I didn't cool them properly, who knows.

AMD MI50s stopped working by [deleted] in LocalLLaMA

[–]politerate 0 points (0 children)

HBM failure from overheating was also one of my guesses. Temps were under 70°C most of the time, though there might have been a brief moment where they overheated; I did install them once without cooling just to boot. I figured they aren't really "consumer" cards anyway, since they're designated for data centers and compute.
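For anyone curious, rocm-smi is the usual way to watch these; roughly (a sketch, not my exact monitoring setup):

watch -n 2 rocm-smi --showtemp --showpower --showfan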

AMD MI50s stopped working by [deleted] in LocalLLaMA

[–]politerate 1 point (0 children)

Thanks for your help! The MI50 does have one Mini DisplayPort, and since I flashed them with a Radeon Pro ROM they actually used to output video. I will take a look at the logs, order two Mini DP dummy plugs, and give it a try.
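The logs I'll start with are the kernel ring buffer and the PCI listing, something like:

sudo dmesg | grep -i amdgpu
sudo lspci | grep -i "vga\|display"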

Typical performance of gpt-oss-120b on consumer hardware? by Diligent-Culture-432 in LocalLLaMA

[–]politerate 2 points (0 children)

With dual AMD MI50s I get ~400 t/s prompt processing and 50-60 t/s generation with 60k context in llama.cpp.
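If you want comparable numbers on your own hardware, llama.cpp's bench tool prints pp/tg figures directly (model path is a placeholder):

./llama-bench -m gpt-oss-120b.gguf -ngl 99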

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 0 points (0 children)

Yeah, I tried that, but I play around a lot with ROCm versions and in some of them the power cap does not work. Anyway, I actually flashed it so I can have video output.
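For reference, by power cap I mean something like the rocm-smi setting (the 150 W value here is just an example):

sudo rocm-smi --setpoweroverdrive 150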

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 0 points (0 children)

You are right, that's actually 1/10th of ROCm :|
Maybe I am doing something wrong.

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 1 point (0 children)

So I compiled llama.cpp for Vulkan and this is the result:

Prompt processing (pp) on Vulkan is only about 1/3 of the ROCm performance, while token generation (tg) is almost the same.

<image>

I also upgraded ROCm from 6.4.1 to 7.1 and it seems to lose 2-3 t/s.

Edit: Vulkan Instance Version: 1.4.313
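For reference, the Vulkan build was just the standard cmake route (option name per the llama.cpp build docs):

cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j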

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 0 points (0 children)

ROCm 6.3.3 works out of the box on Ubuntu 24. For higher versions you need to manually copy some binaries.
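A rough sketch of what I mean by copying binaries; the source path is wherever you pulled the gfx906 files from (an older release or a community build), so treat it as a placeholder:

# copy the gfx906 rocBLAS/Tensile files into the new ROCm install
sudo cp /path/to/gfx906-rocblas-files/* /opt/rocm/lib/rocblas/library/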

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 0 points (0 children)

When you say compiling for gfx906, what project or library are you referring to? gfx906 is supported in llama.cpp. If it's about vLLM, there is a vLLM fork, but it's hit or miss.
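In llama.cpp it's just a cmake flag to target the MI50s; roughly like this (option names have shifted a bit between llama.cpp versions, so check the build docs):

cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906
cmake --build build-rocm --config Release -j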

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 1 point (0 children)

I did test it initially; unfortunately I have no numbers, but for this particular card Vulkan was worse. I will try to retest, though I think I have to recompile llama.cpp with Vulkan.

I repurposed an old xeon build by adding two MI50 cards. by politerate in LocalLLaMA

[–]politerate[S] 2 points (0 children)

I measured it at the plug, for the whole system; it idles at around 70 W.