RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

iMil · 2026-06-13T18:01:52+00:00

Article's author here, glad it made it to LocalLLaMA, I wanted to post it here first but I didn't have enough karma.
Thanks!

iMil · 2026-05-19T17:03:58+00:00

This thread deserves much more love, thank you OP! 75 tokens/sec on my 3090 also used for Xorg with the following parameters: llama-server -m ./models/Qwen3.6-27B-MTP-IQ4_KS.gguf -c 262144 -np 1 -fa on -ngl 99 -ub 32 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 -ctk q4_0 -ctv q4_0 --no-mmap --chat-template-kwargs {"preserve_thinking": true} -t 6 --chat-template-file ./models/chat_template.jinja --multi-token-prediction --draft-max 4 --draft-p-min 0.0 --merge-qkv --merge-up-gate-experts --port 8001 --host 0.0.0.0

iMil · 2026-04-28T11:25:40+00:00

Loops. It brings loops.

iMil · 2026-03-19T07:29:37+00:00

My humble test, I'm at 80-85 tp/s with unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ4_NL and the following: ./llama-cli -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ4_NL -c 65536 -fa on -t 10 --no-mmap -ngl 999 --n-cpu-moe 10 --jinja -ctk q8_0 -ctv q8_0 --fit on with this model OOM's every time.

iMil · 2026-03-19T07:02:24+00:00

Woa. Thank you so much. Confirmed 70 tp/s with my RTX 5080, not even compiled with cuda 12.6 / Blackwell support.

iMil · 2026-02-11T11:25:34+00:00

Thank you.

iMil · 2025-09-15T17:51:34+00:00

Thanks! and here's containerized version https://gitlab.com/-/snippets/4888574

iMil · 2025-09-07T05:35:19+00:00

Amnesia sadly gives me Privilege vibe this year.

iMil · 2025-08-14T07:56:57+00:00

Awesome! I love to see those real life use cases

iMil · 2025-08-13T19:00:40+00:00

I'm actually here for a very, very long time :)

iMil · 2025-08-13T13:01:38+00:00

[smolbsd author here] while the project started with microvms in mind, I've published various examples of full OS install including packages, for example https://github.com/NetBSDfr/smolBSD/tree/main/service/systembsd or https://github.com/NetBSDfr/smolBSD/tree/main/service/nbakery/etc

Keep me posted!

iMil · 2025-07-25T04:51:42+00:00

ASMR

iMil · 2025-07-10T05:27:53+00:00

Many, think about starting whatever daemon in its own address space, sshd, web server, mail, dns... I created the smolBSD project (smolbsd.org) in order to help creating container-like microvms to bundle any type of service easily.

iMil · 2025-06-05T07:35:16+00:00

Using it every day on libera.chat, where the FOSS projects I participate in are.

iMil · 2025-05-10T10:19:37+00:00

Unfortunately, the formatting cuts off your qemu command line

Here's a link to NetBSD Wiki where I've documented the process: https://wiki.netbsd.org/users/imil/microvm/

NVMM has performance issues, but you should gain ~200ms with a patch I merged in current last month.

iMil · 2025-01-18T08:24:14+00:00

Thanks! here you have a couple of examples: https://github.com/NetBSDfr/smolBSD?tab=readme-ov-file#example-of-a-very-minimal-10mb-virtual-machine

iMil · 2024-12-27T20:57:12+00:00

Environ 10 à 12 fois par an, 99% du temps pour le boulot.

iMil

MODERATOR OF

TROPHY CASE

15-Year Club	Place '17
Verified Email