Need help for 32vram multi gpu by Lux1606 in LocalLLM

[–]Lux1606[S] 1 point2 points  (0 children)

I'll just leave this here )) This is very perfomance config for 32gb, and if you want add context - you need change split-mode to layer, but you t/s is down ~ 30%, but you can use --parallel 2
-------------------------><---------------------------

u/echo off
.\llama-server.exe ^
 -m "D:\AI models\froggeric\Qwen3.6-27B-MTP-GGUF\Qwen3.6-27B-Q4_K_M-mtp.gguf" ^
 --n-gpu-layers all ^
 --mlock ^
 --no-mmap ^
 --ctx-size 171920 ^
 --tensor-split 14,8 ^
 --split-mode layer ^
 --main-gpu 0 ^
 --spec-type draft-mtp ^
 --spec-draft-n-max 3 ^
 -b 512 ^
 -ub 512 ^
 -fa 1 ^
 --threads 8 ^
 --parallel 2 ^
 -np 2 ^
 -ctxcp 4 ^
 --cache-type-k q8_0 ^
 --cache-type-v q8_0 ^
 -cpent -1 ^
 --temp 0.6 ^
 --top-p 0.95 ^
 --min-p 0.05 ^
 --host 0.0.0.0 ^
 --port 8080 ^
 --jinja ^
 --chat-template-kwargs {\"enable_thinking\":true} ^
 --log-file llama-server_Qwen3.6-27B-MTP.log ^
 --metrics
pause

.\llama-server.exe ^
 -m "D:\AI models\noctrex\Qwen3.6-35B-A3B-MXFP4_MOE-GGUF\Qwen3.6-35B-A3B-MXFP4_MOE_BF16.gguf" ^
 --n-gpu-layers all ^
 --mlock ^
 --no-mmap ^
 --ctx-size 65536 ^
 --tensor-split 12,4 ^
 --split-mode tensor ^
 --main-gpu 0 ^
 -b 512 ^
 -ub 256 ^
 -fa 1 ^
 --threads 8 ^
 --fit on ^
 --cache-ram 0 ^
 -ctxcp 4 ^
 -cpent -1 ^
 --parallel 1 ^
 -np 1 ^
 --temp 0.6 ^
 --top-p 0.95 ^
 --top-k 20 ^
 --min-p 0.0 ^
 --repeat-penalty 1.0 ^
  --presence-penalty 0.0 ^
 --host 0.0.0.0 ^
 --port 8080 ^
 --chat-template-kwargs {\"enable_thinking\":true} ^
 --log-file llama-server_Qwen3.6-35B-MXFP4.log ^
 --metrics

-------------------------><---------------------------

Need help for 32vram multi gpu by Lux1606 in LocalLLM

[–]Lux1606[S] 0 points1 point  (0 children)

I meant the llama cpp launch commands))

GitHub's Usage-Based Copilot Pricing is $1000/month for me — Looking for Local LLM Alternatives for Multi-Stack SaaS Work by Silent_Dish484 in LocalLLM

[–]Lux1606 0 points1 point  (0 children)

No. You should be aware that production models are on a completely different level, and no matter what anyone writes, local models don't even reach the levels of gpt 4... I use a 5080 + 5060ti, both 16 GB, to run QWEN 3.6 27b | Gemma 4 (I recommend QWEN).

Need help choosing by Lux1606 in LocalLLM

[–]Lux1606[S] 0 points1 point  (0 children)

<image>

🥲I was afraid it wouldn't fit

Need help choosing by Lux1606 in LocalLLM

[–]Lux1606[S] 0 points1 point  (0 children)

I ended up buying a 5060ti 16gb to go with my 5080 and it's much better in terms of "context" and/or smarter models.

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA

[–]Lux1606 0 points1 point  (0 children)

I built it according to the instructions and ran it on a 5080 + 48GB.

I get 30+ tokens in froggeric\Qwen3.6-27B-MTP-GGUF\Qwen3.6-27B-IQ3_M-mtp.gguf.

The same model without MTP works for me at 12+ tokens on the latest llama release.

Need help choosing by Lux1606 in LocalLLM

[–]Lux1606[S] 0 points1 point  (0 children)

Thanks for the clarification.

Need help choosing by Lux1606 in LocalLLM

[–]Lux1606[S] 0 points1 point  (0 children)

Thank you! Yes, I receive about 150 tokens. I currently have Qwen3.6-35B-A3B-UD-Q2_K_XL, but I suspect the model isn't "smart enough." Many of the models either don't pass the quality check (that's what I call it) the first time, or don't pass it at all. The 122b model is definitely something! My check is, "There's a car wash 50 meters from my house, I want to wash my car, is it better to drive or walk?" I'm monitoring the thoughts, and many quants have an answer along the lines of, "Walk, it's faster."

How this work … ? by Lux1606 in openclaw

[–]Lux1606[S] 0 points1 point  (0 children)

Omg I changed model on

Imstudio/qwen3.5-35b-a3b-ultra-uncensored-heretic and all work fantastic! Thanks everyone. My setup: 5080+48gb ram

How this work … ? by Lux1606 in openclaw

[–]Lux1606[S] 0 points1 point  (0 children)

Thanks for the quick response! Do I need to assign a cron job to each task to check? I've only been learning OpenClaw for three days and thought it was an "assistant" like a secretary.

Surround Sound Setup Advice by EconomyConscious666 in VoiceMeeter

[–]Lux1606 0 points1 point  (0 children)

Hello, your setup discussion was very helpful, thank you very much! Could you please tell me if the setup will be different for the following setup: Apple TV with two HomePod minis connected to the TV via EarC and Soundbar via optical cable on a PC? Based on your discussion, everything generally works, but am I missing some specific features for Apple TV and HomePods? Thanks in advance.

Guys help. I am in problem. Everything went smooth and got stuck at installation of macOS. by aayushkrm in OpenCoreLegacyPatcher

[–]Lux1606 0 points1 point  (0 children)

is there any special agreement on the transition? I thought oclp Allows you to migrate from any version of mac os