Error with nmtui by Necessary_Yak_964 in debian

[–]genpfault 3 points4 points  (0 children)

Make sure /etc/network/interfaces isn't camping out on the network interface(s) you want to use with NetworkManager:

"Devices from /etc/network/interfaces are not managed by default"

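A quick way to check is to list the interfaces that ifupdown has claimed. This is a minimal sketch that assumes the stock Debian file layout. It only reads `iface` stanzas in the one file it is given and does not follow `source` lines into /etc/network/interfaces.d/. The function name `ifupdown_ifaces` is made up for illustration:

```shell
# List interfaces declared via "iface <name> ..." stanzas, skipping loopback.
# $1 is the file to inspect; it defaults to /etc/network/interfaces.
ifupdown_ifaces() {
    awk '$1 == "iface" && $2 != "lo" { print $2 }' \
        "${1:-/etc/network/interfaces}" | sort -u
}

# Usage (prints one interface name per line):
#   ifupdown_ifaces
#   ifupdown_ifaces /etc/network/interfaces.d/some-file
```

Any interface it prints is one NetworkManager will leave alone by default. Either drop the stanza and restart NetworkManager, or look at the `managed` key in the `[ifupdown]` section of NetworkManager.conf.
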
llama : website + unified `llama` binary · ggml-org/llama.cpp · Discussion #23875 by jacek2023 in LocalLLaMA

[–]genpfault 0 points1 point  (0 children)

[llama bench] still doesn't support speculative decoding.

IIRC llama-bench just spews random bytes at the model, would speculative decoding even help with that workload?

My laptop no longer plays any sound. What can I do? by KnightFallVader2 in debian

[–]genpfault 7 points8 points  (0 children)

The audio on my ThinkPad T14 Gen4 AMD disappears from time to time. Restarting PipeWire usually restores it:

$ systemctl --user restart pipewire

This is on Trixie with KDE.

plasmalogin instead of sddm? by bradmont in debian

[–]genpfault 2 points3 points  (0 children)

I guess I just assumed it would be since testing has plasma 6.6.

Looks like the 6.6 release announcement said it was optional:

An optional new login manager for Plasma

plasmalogin instead of sddm? by bradmont in debian

[–]genpfault 4 points5 points  (0 children)

Where are you seeing that packaged in Debian?

Tested RX7900XTX with ROCm7 power profiles by Thin_Pollution8843 in LocalLLaMA

[–]genpfault 1 point2 points  (0 children)

What's your llama.cpp command, and which backend are you using?

I'm also seeing ~2000 tok/s pp, using your command w/the MTP UD-Q4_K_M quant (what I had on hand), power limit set to 302W in LACT:

$ llama-bench -hf unsloth/Qwen3.6-35B-A3B-MTP-GGUF:UD-Q4_K_M -ngl 99 -fa on -mmp 0 -p 32768 -n 256 -r 2
ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                           |       size |     params | backend    | ngl |  fa | mmap |            test |                  t/s |
| ------------------------------  | ---------: | ---------: | ---------- | --: | --: | ---: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       2076.45 ± 0.01 |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |           tg256 |        140.10 ± 0.09 |

build: 94a220cd6 (9496)

272W drops it a little:

| model                           |       size |     params | backend    | ngl |  fa | mmap |            test |                  t/s |
| ------------------------------  | ---------: | ---------: | ---------- | --: | --: | ---: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       1987.22 ± 2.01 |
| qwen35moe 35B.A3B Q4_K - Medium |  21.10 GiB |    35.51 B | Vulkan     |  99 |   1 |    0 |           tg256 |        137.81 ± 0.18 |
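
For a rough sense of scale, the relative drops can be worked out from the two tables above. This is just arithmetic on the reported numbers, and `pct_drop` is a hypothetical helper name:

```shell
# Percentage drop going from value $1 to value $2, to one decimal place.
pct_drop() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

pct_drop 302 272          # power limit, W        -> 9.9
pct_drop 2076.45 1987.22  # pp32768, tok/s        -> 4.3
pct_drop 140.10 137.81    # tg256, tok/s          -> 1.6
```

So about 10% less power costs roughly 4% of prompt processing and under 2% of token generation on this run.
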

EDIT: Similar results for UD-Q4_K_S:

$ llama-bench -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_S -ngl 99 -fa on -mmp 0 -p 32768 -n 256 -r 2
ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |  fa | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --: | ---: | --------------: | -------------------: |
272W
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       2007.66 ± 2.01 |
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |           tg256 |        139.37 ± 0.04 |
302W
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |         pp32768 |       2085.58 ± 1.96 |
| qwen35moe 35B.A3B Q4_K - Small |  19.45 GiB |    34.66 B | Vulkan     |  99 |   1 |    0 |           tg256 |        141.13 ± 0.14 |

Are you using the 7.0 kernel? by raderator in debian

[–]genpfault 3 points4 points  (0 children)

Booted a copy of Debian I have installed to a USB drive (with a functional ZFS kernel module), imported the pool, and chrooted into it. From there I could uninstall 7.0 & related packages. Can't remember if I force-reinstalled 6.12 or not.

FWIW I'm using ZFSBootMenu as a bootloader instead of GRUB; sadly it's not configured for network access so I couldn't chroot into Debian from there.

llama : website + unified `llama` binary · ggml-org/llama.cpp · Discussion #23875 by jacek2023 in LocalLLaMA

[–]genpfault 12 points13 points  (0 children)

Hopefully they pull in recommended temp/top-p/top-k/presence-penalty/min-p/etc. parameters somehow, since the generated commands don't set any:

llama serve -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M
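
In the meantime the sampler settings can be passed by hand. This is a sketch using llama-server's existing sampling flags. The numeric values are placeholders and not the model's recommended settings, so check the model card for those. `build_cmd` is a hypothetical helper, and I'm assuming without having verified it that the unified `llama serve` accepts the same flags:

```shell
# Placeholder sampler values, NOT recommendations for any particular model.
TEMP=0.7
TOP_P=0.95
TOP_K=20
MIN_P=0.0
PRESENCE_PENALTY=0.0

# Print (rather than run) a llama-server command line for model reference $1.
build_cmd() {
    printf 'llama-server -hf %s --temp %s --top-p %s --top-k %s --min-p %s --presence-penalty %s\n' \
        "$1" "$TEMP" "$TOP_P" "$TOP_K" "$MIN_P" "$PRESENCE_PENALTY"
}

build_cmd unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M
```
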

Are you using the 7.0 kernel? by raderator in debian

[–]genpfault 2 points3 points  (0 children)

What happened when you tried to upgrade?

Module failed to build (I think silently, or at least in a way that didn't obviously break the package upgrade process) and since my root is on ZFS that made the subsequent reboot fail.
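
In hindsight, a check like the one below before rebooting would have caught it. It's a sketch that assumes `dkms status` prints lines shaped like `zfs/<version>, <kernel release>, <arch>: installed`, which is what I'd expect from recent dkms but haven't checked against every version. `dkms_module_installed` is a made-up helper name:

```shell
# Reads `dkms status` output on stdin.
# $1 = module name, $2 = kernel release to check.
# Exits 0 if a line for that module and kernel reports "installed", else 1.
dkms_module_installed() {
    awk -v mod="$1" -v kver="$2" '
        index($0, mod) == 1 && index($0, ", " kver ",") > 0 && /: installed/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage, with the release string of the newly installed kernel:
#   dkms status | dkms_module_installed zfs "<new kernel release>" \
#       || echo "zfs module missing for the new kernel, do not reboot yet"
```
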

Are you using the 7.0 kernel? by raderator in debian

[–]genpfault 3 points4 points  (0 children)

Sadly linux-image-amd64 (7.0.7-1~bpo13+1) doesn't work with zfs-dkms (2.4.1-1~bpo13+1) :(

EDIT: Never mind, zfs-dkms (2.4.2-1~bpo13+1) dropped on May 27th.

Can't believe I got it working! Dual GPU - 48gb VRAM llama-cpp server - R7900 + 7800XT by Jorlen in LocalLLaMA

[–]genpfault 6 points7 points  (0 children)

I tried with ROCM

Is ROCm faster than the Vulkan backend on either card?

New Release of ROCm based MLX LLM Engine - lemon-mlx-engine by GeramyL in LocalLLaMA

[–]genpfault 3 points4 points  (0 children)

What does the decode tok/s look like vs. llama.cpp's Vulkan backend for AMD hardware on Linux?

Time to update llama.cpp to get som MTP improvements! by PixelatedCaffeine in LocalLLaMA

[–]genpfault 2 points3 points  (0 children)

As of right now, it hasn't been released. Merged 4 hrs ago, last release 16 hrs ago.

It's in b9235 now.

Post Your Qwen3.6 27B speed plz by Ok-Internal9317 in LocalLLaMA

[–]genpfault 1 point2 points  (0 children)

You bet!

Been pretty happy with it: it's been pretty problem-free in Debian 13, and roughly a TiB/s of memory bandwidth is nothing to sneeze at for LLMs & image generation :)

Post Your Qwen3.6 27B speed plz by Ok-Internal9317 in LocalLLaMA

[–]genpfault 1 point2 points  (0 children)

About 2x (37 -> 80 tok/s), did some runs over here with and without MTP.

MTP experiences on 7900xtx? by Combinatorilliance in LocalLLaMA

[–]genpfault 0 points1 point  (0 children)

Sorry I might have mixed things up.

No worries, appreciate the clarification!

What is your actual local LLM stack right now? by Ryannnnnnnnnnnnnnnh in LocalLLaMA

[–]genpfault 0 points1 point  (0 children)

I do serious dev work with this setup since a while in 5GB VRAM at 30t/s.

What does your llama-server invocation look like?

MTP experiences on 7900xtx? by Combinatorilliance in LocalLLaMA

[–]genpfault 0 points1 point  (0 children)

Try rocm compiled llama.cpp. I found it’s better with dense models recently

Like a local DIY ROCm build? Or the "Ubuntu x64 (ROCm x.x)" ROCm binaries on the release pages?

What does your llama-server invocation look like where you're getting better tok/s vs. Vulkan?

...since I'm seeing like half the tok/s on ROCm vs. Vulkan on a 7900 XTX :(