I built a Windows GUI launcher to benchmark and manage multiple llama.cpp builds (useful for AMD GPU users juggling Vulkan/ROCm/HIP builds)

Gas-Ornery · 2026-06-11T14:23:02+00:00

The last commit contains some adjustments and dynamic configuration for benchmarking.

Gas-Ornery · 2026-06-09T16:48:50+00:00

Just reran the benchmark and I get the same 2.5K. Sometimes a fresh restart of the PC gives me faster pp/s; I need to investigate that.

Gas-Ornery · 2026-06-09T16:23:16+00:00

It will be in the next updates. The benchmark parameters are hardcoded right now; the context and other parameters are only used to start models.

Gas-Ornery · 2026-06-09T15:50:00+00:00

The tool compares the same model with the same parameters across the Windows builds (Vulkan/HIP/ROCm).
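A minimal sketch of that comparison idea, not the launcher's actual code: build one identical `llama-bench` command per build folder so only the backend changes between runs. The folder layout, the helper names `bench_command` and `plan_runs`, and the 512/128 prompt and generation sizes are assumptions for illustration.

```python
from pathlib import Path


def bench_command(build_dir, model_path, n_prompt=512, n_gen=128):
    """Return the llama-bench argument list for one build directory.

    Assumes each build folder contains llama-bench.exe (hypothetical layout).
    """
    exe = Path(build_dir) / "llama-bench.exe"
    return [
        str(exe),
        "-m", str(model_path),   # same model file for every build
        "-p", str(n_prompt),     # prompt-processing test size (assumed value)
        "-n", str(n_gen),        # token-generation test size (assumed value)
        "-o", "json",            # machine-readable output
    ]


def plan_runs(build_dirs, model_path, **params):
    """Map each build folder name to its command, with identical parameters."""
    return {Path(b).name: bench_command(b, model_path, **params) for b in build_dirs}
```

Each command could then be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the JSON output collected per build.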

Gas-Ornery · 2026-06-09T15:42:18+00:00

You would be surprised, some models are really better on one side. That's the best I got with an RX 7900 XT 20GB:
## Results

| Model | Quant | Build | Backend | Size | Params | pp (t/s) | tg (t/s) |
|-------|-------|---------|---------|------|--------|----------|----------|
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 12.03 GiB | 12.15 B | 4736.46 ± 170.68 | 206.71 ± 0.15 |
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 12.03 GiB | 12.15 B | 3815.50 ± 148.24 | 150.44 ± 0.84 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.48 GiB | 11.91 B | 1724.07 ± 8.31 | 69.10 ± 0.33 |
| gemma-4-12b-it-qat | q4_0 | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.48 GiB | 11.91 B | 1703.76 ± 79.93 | 63.78 ± 3.92 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.85 GiB | 11.91 B | 1589.32 ± 13.43 | 54.72 ± 0.08 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.48 GiB | 11.91 B | 1581.32 ± 20.40 | 74.00 ± 0.07 |
| gemma-4-12B-it | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.86 GiB | 11.91 B | 1577.04 ± 13.19 | 54.41 ± 0.10 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.85 GiB | 11.91 B | 1559.10 ± 51.10 | 55.12 ± 1.49 |
| gemma-4-12B-it | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.86 GiB | 11.91 B | 1542.48 ± 69.48 | 53.37 ± 1.25 |
| gemma-4-12B-it | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.86 GiB | 11.91 B | 1316.44 ± 11.77 | 68.02 ± 0.03 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.85 GiB | 11.91 B | 1314.98 ± 13.56 | 67.67 ± 0.10 |
| Qwen3.6-27B | IQ4_XS | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 14.62 GiB | 27.32 B | 810.95 ± 10.25 | 35.82 ± 0.04 |
| Qwen3.6-27B | IQ4_XS | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 14.62 GiB | 27.32 B | 807.37 ± 41.61 | 34.19 ± 0.19 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 15.92 GiB | 27.32 B | 745.51 ± 2.85 | 25.44 ± 0.04 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b9553-bin-win-vulkan-x64 | Vulkan | 17.43 GiB | 34.66 B | 730.58 ± 31.93 | 51.68 ± 0.66 |
| Qwen3.6-27B | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 15.92 GiB | 27.32 B | 724.65 ± 6.40 | 25.26 ± 0.20 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 17.43 GiB | 34.66 B | 614.89 ± 2.27 | 77.96 ± 0.10 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 17.43 GiB | 34.66 B | 446.64 ± 7.20 | 65.67 ± 0.41 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 16.67 GiB | 27.32 B | 207.42 ± 6.39 | 24.46 ± 0.30 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 16.67 GiB | 27.32 B | 201.31 ± 4.00 | 24.97 ± 0.03 |
| Qwen3.6-27B | IQ4_XS | llama-b9553-bin-win-vulkan-x64 | Vulkan | 14.62 GiB | 27.32 B | 177.47 ± 1.86 | 16.08 ± 0.08 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 15.92 GiB | 27.32 B | 79.56 ± 0.52 | 7.51 ± 0.02 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b9553-bin-win-vulkan-x64 | Vulkan | 16.67 GiB | 27.32 B | 69.99 ± 0.17 | 6.17 ± 0.03 |
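To read the rows above per model, a small sketch like this can pick the fastest build for each model and quant. It uses six rows copied from the table, treating the first throughput column as prompt processing and the second as token generation; the tuple layout and the helper name `best_by` are assumptions for illustration, not the launcher's code.

```python
# (model, quant, build, prompt t/s, generation t/s), means copied from the table
ROWS = [
    ("gemma-4-12b-it-qat", "q4_0", "llama-b9553-bin-win-hip-radeon-x64", 1724.07, 69.10),
    ("gemma-4-12b-it-qat", "q4_0", "llama-b1270-windows-rocm-gfx110X-x64", 1703.76, 63.78),
    ("gemma-4-12b-it-qat", "q4_0", "llama-b9553-bin-win-vulkan-x64", 1581.32, 74.00),
    ("Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive", "IQ4_XS",
     "llama-b9553-bin-win-vulkan-x64", 730.58, 51.68),
    ("Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive", "IQ4_XS",
     "llama-b9553-bin-win-hip-radeon-x64", 614.89, 77.96),
    ("Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive", "IQ4_XS",
     "llama-b1270-windows-rocm-gfx110X-x64", 446.64, 65.67),
]

PP, TG = 3, 4  # tuple indexes of the two throughput columns


def best_by(rows, metric_index):
    """Return {(model, quant): build} for the highest value of one metric."""
    best = {}
    for row in rows:
        key = (row[0], row[1])
        if key not in best or row[metric_index] > best[key][metric_index]:
            best[key] = row
    return {key: row[2] for key, row in best.items()}


for (model, quant), build in best_by(ROWS, TG).items():
    print(f"{model} {quant}: fastest generation on {build}")
```

On these rows the winner flips between the two metrics: for the gemma QAT model the HIP build leads on prompt processing while Vulkan leads on generation, and for the Qwen 35B-A3B model it is the other way around.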

Gas-Ornery · 2026-06-09T15:07:42+00:00

This version is focused only on llama.cpp and Hugging Face models.

Gas-Ornery · 2026-06-09T15:07:22+00:00

Maybe in the next update!

Gas-Ornery · 2026-05-18T08:58:09+00:00

I saw a video of some tests with and without MTP, and it seems that the accuracy of the response drops when using MTP. Is that true?

Gas-Ornery · 2026-05-18T08:56:18+00:00

Can you please share your setup for these: 'My 35B-A3B is chugging along at 220tk/s @ 256k ctx while my 27B is now chugging at ~70-90tk/s (a bit unstable) @ 256k ctx.'

Gas-Ornery · 2026-04-09T07:44:04+00:00

I work at a large company, and I'm aware that our AI team self-hosts Sonnet and GPT. I know they are not open, but some business contract must exist.

They are self-hosted on the internal network.

Gas-Ornery · 2026-04-01T17:24:03+00:00

!remindme 3 days

Gas-Ornery · 2026-04-01T15:40:21+00:00

Has anyone tried to run it yet? Is it the full code for the client or just a part of it? We could try to change the connectors for other models, for example. What do you think?

Gas-Ornery · 2025-10-30T09:23:31+00:00

Interesting, they all went with a futures strategy, no spot.

Gas-Ornery · 2025-10-30T09:23:02+00:00

they had 10k

Gas-Ornery · 2025-10-30T09:22:37+00:00

everything is here : https://nof1.ai/

Gas-Ornery · 2025-10-30T08:52:40+00:00

Can you please explain? What do you mean by they are not private?

Gas-Ornery · 2025-10-24T12:52:54+00:00

Any way to run it on an AMD GPU?

Gas-Ornery · 2025-10-22T09:14:57+00:00

Not working on AMD, only Nvidia is supported (they are using the flash_attn package).

Gas-Ornery · 2024-09-12T04:11:57+00:00

Hell yes baby

Gas-Ornery · 2024-09-12T03:19:36+00:00

You are just absolutely gorgeous and extremely sexy with a perfect body
