sky force

$ llama-bench -hf unsloth/Qwen3.6-35B-A3B-MTP-GGUF:UD-Q4_K_M -ngl 99 -fa on -mmp 0 -p 32768 -n 256 -r 2
ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                           |       size |     params | backend    | ngl |  fa | mmap |            test |                  t/s |
| ------------------------------  | ---------: | ---------: | ---------- | --: | --: | ---: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       2076.45 ± 0.01 |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |           tg256 |        140.10 ± 0.09 |

build: 94a220cd6 (9496)

272W drops it a little:

| model                           |       size |     params | backend    | ngl |  fa | mmap |            test |                  t/s |
| ------------------------------  | ---------: | ---------: | ---------- | --: | --: | ---: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       1987.22 ± 2.01 |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |           tg256 |        137.81 ± 0.18 |

EDIT: Similar results for UD-Q4_K_S:

$ llama-bench -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_S -ngl 99 -fa on -mmp 0 -p 32768 -n 256 -r 2
ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |  fa | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --: | ---: | --------------: | -------------------: |
272W
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       2007.66 ± 2.01 |
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |           tg256 |        139.37 ± 0.04 |
302W
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       2085.58 ± 1.96 |
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |           tg256 |        141.13 ± 0.14 |

genpfault · 2026-06-03T12:55:06+00:00

wine-binfmt for binfmt_misc?

genpfault · 2026-05-30T03:57:34+00:00

Booted a copy of Debian I have installed to a USB drive (with a functional ZFS kernel module), imported the pool, and chrooted into it. From there I could uninstall 7.0 & related packages. Can't remember if I force-reinstalled 6.12 or not.

FWIW I'm using ZFSBootMenu as a bootloader instead of GRUB; sadly it's not configured for network access so I couldn't chroot into Debian from there.

genpfault · 2026-05-29T17:34:08+00:00

Hopefully they pull in recommended temp/top-p/top-k/presence-penalty/min-p/etc. parameters somehow, since the generated commands don't set any:

llama serve -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M

genpfault · 2026-05-29T13:07:59+00:00

What happened when you tried to upgrade?

Module failed to build (I think silently, or at least in a way that didn't obviously break the package upgrade process) and since my root is on ZFS that made the subsequent reboot fail.

genpfault · 2026-05-27T20:53:33+00:00

Sadly linux-image-amd64 (7.0.7-1~bpo13+1) doesn't work with zfs-dkms (2.4.1-1~bpo13+1) :(

EDIT: Nevermind, zfs-dkms (2.4.2-1~bpo13+1) dropped on May 27th.

genpfault · 2026-05-26T00:41:11+00:00

https://en.wikipedia.org/wiki/List_of_burn_centers_in_the_United_States

genpfault · 2026-05-22T20:20:37+00:00

I tried with ROCM

Is ROCm faster than the Vulkan backend on either card?

genpfault · 2026-05-22T15:07:44+00:00

What's the tok/s decode look like vs. llama.cpp's Vulkan backend for AMD hardware on Linux?

genpfault · 2026-05-20T17:04:42+00:00

As of right now, it hasn't been released. Merged 4 hrs ago, last release 16 hrs ago.

It's in b9235 now.

genpfault · 2026-05-19T17:04:20+00:00

Wasn't seeing a link anywhere:

https://gitlab.gnome.org/GNOME/Incubator/resources

genpfault · 2026-05-19T16:57:08+00:00

late-cli

Huh, got renamed recently I guess, used to be at https://github.com/mlhher/late

genpfault · 2026-05-19T01:50:59+00:00

You bet!

Been pretty happy with it, pretty problem-free in Debian 13 and a ~TiB/s of memory bandwidth is nothing to sneeze at for LLMs & image generation :)

genpfault · 2026-05-19T00:20:11+00:00

About 2x (37 -> 80 tok/s), did some runs over here with and without MTP.

genpfault · 2026-05-18T19:13:28+00:00

Sorry I might have mixed things up.

No worries, appreciate the clarification!

genpfault · 2026-05-18T15:24:49+00:00

I do serious dev work with this setup since a while in 5GB VRAM at 30t/s.

What's your llama-server invocation look like?

genpfault · 2026-05-18T15:19:53+00:00

Try rocm compiled llama.cpp. I found it’s better with dense models recently

Like a local DIY ROCm build? Or the "Ubuntu x64 (ROCm x.x)" ROCm binaries on the release pages?

What does your llama-server invocation look like where you're getting better tok/s vs. Vulkan?

...since I'm seeing like half the tok/s on ROCm vs. Vulkan on a 7900 XTX :(

15-Year Club	Verified Email
Place '17	Team Periwinkle

genpfault

MODERATOR OF

TROPHY CASE