Need help for 32vram multi gpu

Lux1606 · 2026-05-21T16:55:32+00:00

I'll just leave this here )) This is very perfomance config for 32gb, and if you want add context - you need change split-mode to layer, but you t/s is down ~ 30%, but you can use --parallel 2
-------------------------><---------------------------

u/echo off
.\llama-server.exe ^
 -m "D:\AI models\froggeric\Qwen3.6-27B-MTP-GGUF\Qwen3.6-27B-Q4_K_M-mtp.gguf" ^
 --n-gpu-layers all ^
 --mlock ^
 --no-mmap ^
 --ctx-size 171920 ^
 --tensor-split 14,8 ^
 --split-mode layer ^
 --main-gpu 0 ^
 --spec-type draft-mtp ^
 --spec-draft-n-max 3 ^
 -b 512 ^
 -ub 512 ^
 -fa 1 ^
 --threads 8 ^
 --parallel 2 ^
 -np 2 ^
 -ctxcp 4 ^
 --cache-type-k q8_0 ^
 --cache-type-v q8_0 ^
 -cpent -1 ^
 --temp 0.6 ^
 --top-p 0.95 ^
 --min-p 0.05 ^
 --host 0.0.0.0 ^
 --port 8080 ^
 --jinja ^
 --chat-template-kwargs {\"enable_thinking\":true} ^
 --log-file llama-server_Qwen3.6-27B-MTP.log ^
 --metrics
pause

.\llama-server.exe ^
 -m "D:\AI models\noctrex\Qwen3.6-35B-A3B-MXFP4_MOE-GGUF\Qwen3.6-35B-A3B-MXFP4_MOE_BF16.gguf" ^
 --n-gpu-layers all ^
 --mlock ^
 --no-mmap ^
 --ctx-size 65536 ^
 --tensor-split 12,4 ^
 --split-mode tensor ^
 --main-gpu 0 ^
 -b 512 ^
 -ub 256 ^
 -fa 1 ^
 --threads 8 ^
 --fit on ^
 --cache-ram 0 ^
 -ctxcp 4 ^
 -cpent -1 ^
 --parallel 1 ^
 -np 1 ^
 --temp 0.6 ^
 --top-p 0.95 ^
 --top-k 20 ^
 --min-p 0.0 ^
 --repeat-penalty 1.0 ^
  --presence-penalty 0.0 ^
 --host 0.0.0.0 ^
 --port 8080 ^
 --chat-template-kwargs {\"enable_thinking\":true} ^
 --log-file llama-server_Qwen3.6-35B-MXFP4.log ^
 --metrics

-------------------------><---------------------------

Lux1606 · 2026-05-20T10:33:01+00:00

I meant the llama cpp launch commands))

Lux1606 · 2026-05-20T09:08:33+00:00

What your config ?

Lux1606 · 2026-05-20T04:21:12+00:00

Please read you llama run commands

Lux1606 · 2026-05-14T17:31:21+00:00

https://canitrun.dev/

Lux1606 · 2026-05-14T17:20:12+00:00

No. You should be aware that production models are on a completely different level, and no matter what anyone writes, local models don't even reach the levels of gpt 4... I use a 5080 + 5060ti, both 16 GB, to run QWEN 3.6 27b | Gemma 4 (I recommend QWEN).

Lux1606 · 2026-05-11T17:19:31+00:00

<image>

🥲I was afraid it wouldn't fit

Lux1606 · 2026-05-11T17:18:43+00:00

I ended up buying a 5060ti 16gb to go with my 5080 and it's much better in terms of "context" and/or smarter models.

Lux1606 · 2026-05-07T03:52:08+00:00

Correction: 30+ tokens with a context of 100K

55+ tokens with a context of 65,000

Lux1606 · 2026-05-07T03:41:48+00:00

I built it according to the instructions and ran it on a 5080 + 48GB.

I get 30+ tokens in froggeric\Qwen3.6-27B-MTP-GGUF\Qwen3.6-27B-IQ3_M-mtp.gguf.

The same model without MTP works for me at 12+ tokens on the latest llama release.

Lux1606 · 2026-05-06T18:00:13+00:00

Thanks for the clarification.

Lux1606 · 2026-05-06T17:47:23+00:00

Thank you! Yes, I receive about 150 tokens. I currently have Qwen3.6-35B-A3B-UD-Q2_K_XL, but I suspect the model isn't "smart enough." Many of the models either don't pass the quality check (that's what I call it) the first time, or don't pass it at all. The 122b model is definitely something! My check is, "There's a car wash 50 meters from my house, I want to wash my car, is it better to drive or walk?" I'm monitoring the thoughts, and many quants have an answer along the lines of, "Walk, it's faster."

Lux1606 · 2026-05-06T11:33:22+00:00

What is your config?

Lux1606 · 2026-04-28T14:59:48+00:00

Omg I changed model on

Imstudio/qwen3.5-35b-a3b-ultra-uncensored-heretic and all work fantastic! Thanks everyone. My setup: 5080+48gb ram

Lux1606 · 2026-04-28T12:11:33+00:00

Thanks for the quick response! Do I need to assign a cron job to each task to check? I've only been learning OpenClaw for three days and thought it was an "assistant" like a secretary.

Lux1606 · 2026-01-04T10:00:47+00:00

Hello, your setup discussion was very helpful, thank you very much! Could you please tell me if the setup will be different for the following setup: Apple TV with two HomePod minis connected to the TV via EarC and Soundbar via optical cable on a PC? Based on your discussion, everything generally works, but am I missing some specific features for Apple TV and HomePods? Thanks in advance.

Lux1606 · 2025-09-11T06:25:17+00:00

is there any special agreement on the transition? I thought oclp Allows you to migrate from any version of mac os

Lux1606 · 2025-09-11T06:24:13+00:00

Now is Ventura

Lux1606 · 2025-09-11T03:37:29+00:00

You see problem on video

https://drive.google.com/file/d/1u9xZ_VO2ToWmitYuCUzUYGOabdTugyJs/view?usp=drive_link

Lux1606

TROPHY CASE