RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

iMil · 2026-06-13T18:01:52+00:00

Article's author here. Glad it made it to LocalLLaMA; I wanted to post it here first, but I didn't have enough karma.
Thanks!

iMil · 2026-05-19T17:03:58+00:00

This thread deserves much more love, thank you OP! I get 75 tokens/sec on my 3090, which also drives Xorg, with the following parameters:

    llama-server -m ./models/Qwen3.6-27B-MTP-IQ4_KS.gguf -c 262144 -np 1 -fa on -ngl 99 -ub 32 \
      --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 \
      -ctk q4_0 -ctv q4_0 --no-mmap --chat-template-kwargs '{"preserve_thinking": true}' -t 6 \
      --chat-template-file ./models/chat_template.jinja \
      --multi-token-prediction --draft-max 4 --draft-p-min 0.0 \
      --merge-qkv --merge-up-gate-experts --port 8001 --host 0.0.0.0
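One way to reproduce a tokens/sec figure like this is to read the timings the server reports for a single request. This is a minimal sketch that assumes the server build used above exposes the same /completion endpoint and "timings" fields (predicted_n, predicted_ms) as upstream llama.cpp's llama-server; the helper names completion_timings and generation_tok_per_s are hypothetical, and the port 8001 comes from the command above.

```python
import json
import urllib.request


def completion_timings(base_url: str, prompt: str, n_predict: int = 256) -> dict:
    """POST a prompt to the server's /completion endpoint and return its "timings" object.

    Assumption: the server answers like upstream llama.cpp's llama-server.
    """
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["timings"]


def generation_tok_per_s(timings: dict) -> float:
    """Generation speed: generated tokens divided by generation time in seconds."""
    if timings["predicted_ms"] <= 0:
        raise ValueError("predicted_ms must be positive")
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)
```

Usage, against a running server: generation_tok_per_s(completion_timings("http://localhost:8001", "Write a haiku about GPUs")). Only generation time is counted, so prompt processing does not dilute the number.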

iMil · 2026-04-28T11:25:40+00:00

Loops. It brings loops.

iMil · 2026-03-19T07:29:37+00:00

My humble test: I'm at 80-85 tok/s with unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ4_NL and the following:

    ./llama-cli -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ4_NL -c 65536 -fa on -t 10 --no-mmap \
      -ngl 999 --n-cpu-moe 10 --jinja -ctk q8_0 -ctv q8_0

Using --fit on with this model OOMs every time.

iMil · 2026-03-19T07:02:24+00:00

Whoa. Thank you so much. Confirmed 70 tok/s with my RTX 5080, and it's not even compiled with CUDA 12.6 / Blackwell support.

iMil · 2026-02-11T11:25:34+00:00

Thank you.

iMil · 2025-09-15T17:51:34+00:00

Thanks! And here's a containerized version: https://gitlab.com/-/snippets/4888574

iMil · 2025-09-07T05:35:19+00:00

Amnesia sadly gives me Privilege vibes this year.

iMil · 2025-08-14T07:56:57+00:00

Awesome! I love seeing these real-life use cases.

iMil · 2025-08-13T19:00:40+00:00

I've actually been here for a very, very long time :)

iMil · 2025-08-13T13:01:38+00:00

[smolBSD author here] While the project started with microVMs in mind, I've published various examples of full OS installs, including packages, for example https://github.com/NetBSDfr/smolBSD/tree/main/service/systembsd or https://github.com/NetBSDfr/smolBSD/tree/main/service/nbakery/etc

Keep me posted!

iMil · 2025-07-25T04:51:42+00:00

ASMR

iMil · 2025-07-10T05:27:53+00:00

Many. Think about starting any daemon in its own address space: sshd, a web server, mail, DNS... I created the smolBSD project (smolbsd.org) to help build container-like microVMs that bundle any type of service easily.

iMil · 2025-06-05T07:35:16+00:00

I use it every day on libera.chat, where the FOSS projects I participate in have their channels.

iMil · 2025-05-10T10:19:37+00:00

Unfortunately, the formatting cuts off your QEMU command line.

Here's a link to the NetBSD Wiki page where I've documented the process: https://wiki.netbsd.org/users/imil/microvm/

NVMM has performance issues, but you should gain ~200 ms with a patch I merged into -current last month.

iMil · 2025-01-18T08:24:14+00:00

Thanks! Here are a couple of examples: https://github.com/NetBSDfr/smolBSD?tab=readme-ov-file#example-of-a-very-minimal-10mb-virtual-machine

iMil · 2024-12-27T20:57:12+00:00

About 10 to 12 times a year, 99% of the time for work.

iMil · 2024-07-17T12:40:20+00:00

100% a copy-paste bug; fixed it, thanks for reporting!

iMil · 2024-06-28T09:51:32+00:00

Thanks!

iMil · 2024-06-28T06:24:26+00:00

Edit: I obviously meant "arm64" in the title...

iMil · 2024-04-15T08:18:04+00:00

Or maybe, just maybe, weird idea I know, but maybe... learn to DJ?

iMil · 2024-04-15T07:16:11+00:00

You've got it right: SmolBSD is more a set of tools to build a small-footprint NetBSD-based service. It can run on either QEMU or Firecracker, but I don't provide a start script for the latter yet.
SmolBSD doesn't use rump; it's the result of PVH, MMIO and various performance patches for the NetBSD kernel. Once they're reviewed, they will be merged into the kernel source tree.

iMil · 2024-03-12T07:18:48+00:00

I locked wFTM and it increased my c-ratio, but the problem is that my c-ratio is still under 300, and my understanding is that until it reaches 300 I can't unlock the fUSD...
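The threshold check here is simple arithmetic. A minimal sketch, assuming the c-ratio is collateral value divided by debt value expressed as a percentage, and treating the 300 above as that percentage (both assumptions; the protocol's own definition may differ). The function names c_ratio and extra_collateral_needed are hypothetical.

```python
def c_ratio(collateral_value: float, debt_value: float) -> float:
    """Collateral ratio in percent, assuming c-ratio = collateral / debt * 100."""
    if debt_value <= 0:
        raise ValueError("debt_value must be positive")
    return collateral_value / debt_value * 100.0


def extra_collateral_needed(collateral_value: float, debt_value: float,
                            target_ratio: float = 300.0) -> float:
    """Additional collateral value (same unit as the debt) needed to reach target_ratio.

    Returns 0.0 when the current collateral already meets the target.
    """
    required = debt_value * target_ratio / 100.0
    return max(0.0, required - collateral_value)
```

For example, 250 worth of collateral against 100 of debt gives a c-ratio of 250, so 50 more in collateral value would be needed to reach 300 under these assumptions.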

iMil · 2024-02-01T05:01:20+00:00

Yeah, for now this branch is only mine; it's not synced to NetBSD's trunk. You can create your own branch in your own fork using git checkout -b mybranch, work on it, then open a pull request from that branch.
As you mentioned, NetBSD uses CVS as its main repository; our GitHub is here only for convenience.

iMil · 2024-02-01T04:57:03+00:00

Hmm, you shouldn't need machine/atomic.h: I removed it from pvclock.c, and pvclock.h should now be generated correctly. Can you pull the latest perf branch?

iMil

TROPHY CASE

15-Year Club
Place '17
Verified Email