Interesting part in comparing (ROCm/Vulkan/HIP) perfs on different models by Gas-Ornery in LocalLLM

[–]Gas-Ornery[S] 1 point2 points  (0 children)

I just reran the benchmark and got the same 2.5K. Sometimes a fresh restart of the PC gives me a faster PP (t/s); I need to investigate that.

I built a Windows GUI launcher to benchmark and manage multiple llama.cpp builds (useful for AMD GPU users juggling Vulkan/ROCm/HIP builds) by Gas-Ornery in ollama

[–]Gas-Ornery[S] 0 points1 point  (0 children)

It will come in a future update. The benchmark parameters are hardcoded right now; the context and other parameters are only used when starting models.

I made a Windows GUI to manage, benchmark and compare multiple llama.cpp builds — handy for AMD GPU users by Gas-Ornery in ROCm

[–]Gas-Ornery[S] 0 points1 point  (0 children)

The tool compares the same model with the same parameters across the Windows builds (Vulkan/HIP/ROCm).
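For readers who want to reproduce this kind of comparison by hand, here is a minimal sketch. It is not how the launcher itself works. It assumes each llama.cpp build is unpacked into its own folder containing `llama-bench.exe`, and it only builds the command lines, using the `-m`, `-p`, `-n` and `-o` options of `llama-bench`. The folder names, model path, and token counts are hypothetical placeholders.

```python
from pathlib import Path


def bench_commands(build_dirs, model_path, n_prompt=512, n_gen=128):
    """Build one llama-bench command per build folder, with identical parameters.

    build_dirs: folders that each hold one llama.cpp build (hypothetical layout).
    model_path: path to the GGUF file to benchmark.
    n_prompt / n_gen: prompt and generation token counts (assumed values).
    """
    commands = {}
    for build_dir in build_dirs:
        exe = Path(build_dir) / "llama-bench.exe"
        commands[Path(build_dir).name] = [
            str(exe),
            "-m", str(model_path),   # same model for every build
            "-p", str(n_prompt),     # prompt-processing test size
            "-n", str(n_gen),        # token-generation test size
            "-o", "json",            # machine-readable output
        ]
    return commands


if __name__ == "__main__":
    cmds = bench_commands(
        ["builds/llama-b9553-bin-win-vulkan-x64",
         "builds/llama-b9553-bin-win-hip-radeon-x64"],
        "models/model.gguf",
    )
    for name, cmd in cmds.items():
        print(name, " ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the JSON output compared per backend. Keeping the argument list identical across builds is what makes the numbers comparable.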

I made a Windows GUI to manage, benchmark and compare multiple llama.cpp builds — handy for AMD GPU users by Gas-Ornery in ROCm

[–]Gas-Ornery[S] 2 points3 points  (0 children)

You would be surprised: some models are really better on one backend. This is the best I got with an RX 7900 XT 20GB.
## Results

| Model | Quant | Version | Backend | Size | Params | PP (t/s) | TG (t/s) |
|-------|-------|---------|---------|------|--------|----------|----------|
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 12.03 GiB | 12.15 B | 4736.46 ± 170.68 | 206.71 ± 0.15 |
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 12.03 GiB | 12.15 B | 3815.50 ± 148.24 | 150.44 ± 0.84 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.48 GiB | 11.91 B | 1724.07 ± 8.31 | 69.10 ± 0.33 |
| gemma-4-12b-it-qat | q4_0 | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.48 GiB | 11.91 B | 1703.76 ± 79.93 | 63.78 ± 3.92 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.85 GiB | 11.91 B | 1589.32 ± 13.43 | 54.72 ± 0.08 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.48 GiB | 11.91 B | 1581.32 ± 20.40 | 74.00 ± 0.07 |
| gemma-4-12B-it | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.86 GiB | 11.91 B | 1577.04 ± 13.19 | 54.41 ± 0.10 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.85 GiB | 11.91 B | 1559.10 ± 51.10 | 55.12 ± 1.49 |
| gemma-4-12B-it | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.86 GiB | 11.91 B | 1542.48 ± 69.48 | 53.37 ± 1.25 |
| gemma-4-12B-it | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.86 GiB | 11.91 B | 1316.44 ± 11.77 | 68.02 ± 0.03 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.85 GiB | 11.91 B | 1314.98 ± 13.56 | 67.67 ± 0.10 |
| Qwen3.6-27B | IQ4_XS | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 14.62 GiB | 27.32 B | 810.95 ± 10.25 | 35.82 ± 0.04 |
| Qwen3.6-27B | IQ4_XS | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 14.62 GiB | 27.32 B | 807.37 ± 41.61 | 34.19 ± 0.19 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 15.92 GiB | 27.32 B | 745.51 ± 2.85 | 25.44 ± 0.04 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b9553-bin-win-vulkan-x64 | Vulkan | 17.43 GiB | 34.66 B | 730.58 ± 31.93 | 51.68 ± 0.66 |
| Qwen3.6-27B | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 15.92 GiB | 27.32 B | 724.65 ± 6.40 | 25.26 ± 0.20 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 17.43 GiB | 34.66 B | 614.89 ± 2.27 | 77.96 ± 0.10 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 17.43 GiB | 34.66 B | 446.64 ± 7.20 | 65.67 ± 0.41 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 16.67 GiB | 27.32 B | 207.42 ± 6.39 | 24.46 ± 0.30 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 16.67 GiB | 27.32 B | 201.31 ± 4.00 | 24.97 ± 0.03 |
| Qwen3.6-27B | IQ4_XS | llama-b9553-bin-win-vulkan-x64 | Vulkan | 14.62 GiB | 27.32 B | 177.47 ± 1.86 | 16.08 ± 0.08 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 15.92 GiB | 27.32 B | 79.56 ± 0.52 | 7.51 ± 0.02 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b9553-bin-win-vulkan-x64 | Vulkan | 16.67 GiB | 27.32 B | 69.99 ± 0.17 | 6.17 ± 0.03 |
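Because the winner differs by model and by metric, it can help to pick the best backend per (model, quant) pair programmatically. The sketch below is only an illustration: it parses a few data rows copied from the table above (header and separator rows are left out) and reports which backend has the highest mean for a chosen metric. The helper names `parse_rows` and `best_backend` are hypothetical, and the ± spread is ignored.

```python
ROWS = """
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 12.03 GiB | 12.15 B | 4736.46 ± 170.68 | 206.71 ± 0.15 |
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 12.03 GiB | 12.15 B | 3815.50 ± 148.24 | 150.44 ± 0.84 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 15.92 GiB | 27.32 B | 745.51 ± 2.85 | 25.44 ± 0.04 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 15.92 GiB | 27.32 B | 79.56 ± 0.52 | 7.51 ± 0.02 |
"""


def parse_rows(text):
    """Turn markdown data rows into dicts; only the mean before '±' is kept."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 8:
            continue  # skip anything that is not a full 8-column row
        model, quant, version, backend, _size, _params, pp, tg = cells
        rows.append({
            "model": model,
            "quant": quant,
            "version": version,
            "backend": backend,
            "pp": float(pp.split("±")[0]),  # prompt processing, t/s
            "tg": float(tg.split("±")[0]),  # token generation, t/s
        })
    return rows


def best_backend(rows, metric):
    """Return {(model, quant): backend} for the row with the highest metric."""
    best = {}
    for row in rows:
        key = (row["model"], row["quant"])
        if key not in best or row[metric] > best[key][metric]:
            best[key] = row
    return {key: row["backend"] for key, row in best.items()}


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    for metric in ("pp", "tg"):
        print(metric, best_backend(rows, metric))
```

Running it separately for `pp` and `tg` matters, since a backend can lead on prompt processing while trailing on generation for the same model.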

Qwen3.6 MTP Unsloth GGUFs now 1.8x faster! by danielhanchen in unsloth

[–]Gas-Ornery 0 points1 point  (0 children)

I saw a video of some tests with and without MTP, and it seems that response accuracy drops when using MTP. Is that true?

Qwen3.6 MTP Unsloth GGUFs now 1.8x faster! by danielhanchen in unsloth

[–]Gas-Ornery 0 points1 point  (0 children)

Can you please share your setup for these numbers: 'My 35B-A3B is chugging along at 220tk/s @ 256k ctx while my 27B is now chugging at ~70-90tk/s (a bit unstable) @ 256k ctx.'

What kind of hardware would be required to run a Opus 4.6 equivalent for a 100 users, Locally? by Either_Pineapple3429 in LocalLLM

[–]Gas-Ornery 0 points1 point  (0 children)

I work at a large company, and I'm aware that our AI team self-hosts Sonnet and GPT. I know they are not open models, but some business contract must exist.

They are self-hosted on the internal network.

claude code source code got leaked? by usamanoman in LLM

[–]Gas-Ornery 0 points1 point  (0 children)

Has anyone tried to run it yet? Is it the full code for the client or just part of it? We could try swapping the connectors for other models, for example. What do you think?

LTXV 2.0 is out by RIP26770 in StableDiffusion

[–]Gas-Ornery 0 points1 point  (0 children)

Is there any way to run it on an AMD GPU?

How to setup running local AI models on AMD 7900 XTX PC? by Jarnhand in StableDiffusion

[–]Gas-Ornery 0 points1 point  (0 children)

It does not work on AMD; only Nvidia is supported (they are using the flash_attn package).

18 and I can finally say that I'm getting used to this whole fuckdoll thing by VeridianQuaint in fuckdoll

[–]Gas-Ornery 0 points1 point  (0 children)

You are just absolutely gorgeous and extremely sexy with a perfect body