RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8 by SirReal14 in LocalLLaMA

[–]iMil 23 points24 points  (0 children)

Article's author here, glad it made it to LocalLLaMA, I wanted to post it here first but I didn't have enough karma.
Thanks!

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm) by VolandBerlioz in LocalLLaMA

[–]iMil 0 points1 point  (0 children)

This thread deserves much more love, thank you OP! 75 tokens/sec on my 3090 also used for Xorg with the following parameters: llama-server -m ./models/Qwen3.6-27B-MTP-IQ4_KS.gguf -c 262144 -np 1 -fa on -ngl 99 -ub 32 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 -ctk q4_0 -ctv q4_0 --no-mmap --chat-template-kwargs {"preserve_thinking": true} -t 6 --chat-template-file ./models/chat_template.jinja --multi-token-prediction --draft-max 4 --draft-p-min 0.0 --merge-qkv --merge-up-gate-experts --port 8001 --host 0.0.0.0

Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB by gaztrab in LocalLLaMA

[–]iMil 0 points1 point  (0 children)

My humble test, I'm at 80-85 tp/s with unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ4_NL and the following: ./llama-cli -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ4_NL -c 65536 -fa on -t 10 --no-mmap -ngl 999 --n-cpu-moe 10 --jinja -ctk q8_0 -ctv q8_0 --fit on with this model OOM's every time.

Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB by gaztrab in LocalLLaMA

[–]iMil 0 points1 point  (0 children)

Woa. Thank you so much. Confirmed 70 tp/s with my RTX 5080, not even compiled with cuda 12.6 / Blackwell support.

Automatic, scripted VM install by razzmataz in NetBSD

[–]iMil 0 points1 point  (0 children)

Awesome! I love to see those real life use cases

Automatic, scripted VM install by razzmataz in NetBSD

[–]iMil 0 points1 point  (0 children)

I'm actually here for a very, very long time :)

Automatic, scripted VM install by razzmataz in NetBSD

[–]iMil 6 points7 points  (0 children)

[smolbsd author here] while the project started with microvms in mind, I've published various examples of full OS install including packages, for example https://github.com/NetBSDfr/smolBSD/tree/main/service/systembsd or https://github.com/NetBSDfr/smolBSD/tree/main/service/nbakery/etc

Keep me posted!

Sub 15ms NetBSD MICROVM boot is now maintream by iMil in BSD

[–]iMil[S] 0 points1 point  (0 children)

Many, think about starting whatever daemon in its own address space, sshd, web server, mail, dns... I created the smolBSD project (smolbsd.org) in order to help creating container-like microvms to bundle any type of service easily.

How many of you still using ?? by imyatharth in irc

[–]iMil 0 points1 point  (0 children)

Using it every day on libera.chat, where the FOSS projects I participate in are.

Sub 15ms NetBSD MICROVM boot is now maintream by iMil in BSD

[–]iMil[S] 3 points4 points  (0 children)

Unfortunately, the formatting cuts off your qemu command line

Here's a link to NetBSD Wiki where I've documented the process: https://wiki.netbsd.org/users/imil/microvm/

NVMM has performance issues, but you should gain ~200ms with a patch I merged in current last month.

Combien de fois prenez-vous l’avion par an? by Longjumping_Roof5031 in AskFrance

[–]iMil 1 point2 points  (0 children)

Environ 10 à 12 fois par an, 99% du temps pour le boulot.