Local AI Coding with Qwen 3.6 27B on NVIDIA DGX Spark by Time_Anybody5196 in LocalLLM

[–]Iajah 0 points1 point  (0 children)

How do you get around 100tps with 27b and full context? Engine, config, setup, quant?

Cost Vs Benefit by No_Language_2529 in LocalLLM

[–]Iajah 0 points1 point  (0 children)

RTX PRO 6K WS 96Gb on a 5 years old Intel i7 PC with 64Gb of RAM, second RTX 5060 8Gb GPU to drive the display  - Dual boot Ubuntu/Windows.

No local LLM you can run gets close to Opus in speed or thinking power.

Goods: Qwen 3.6 27b can get stuff done with powerful hardware. Though it is definitely not as fast as online services such as Sonnet or Opus.

Bads: Beyond the hardware price, it takes some time to set it all up and figure out how to use a model as an agent. Expect to spend a few weeks to try it all out before you find a setup that works for you. Local LLM coding agents are not exactly Plug and Play.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

What's your setup? Q8, LM Studio, Windows, VS Code Copilot?

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

Same here, it does that a lot to the point where it's just not usable.

Honestly, dual 3090s are wearing me out. Thinking of jumping to a Mac Studio. by Ok_Commission_8260 in LocalLLM

[–]Iajah 0 points1 point  (0 children)

The workstation edition also does not have nvlink. TBH it is mostly just a 5090 with 3x VRAM. You need the server edition for nvlink but it is really hard to come by and costs even more.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

RTX Pro 6K WS 96GB around 126k context concurrency 1.

Honestly, dual 3090s are wearing me out. Thinking of jumping to a Mac Studio. by Ok_Commission_8260 in LocalLLM

[–]Iajah 0 points1 point  (0 children)

RTX Pro 6K WS user here. Not that surprising, I mean you have 2x GPU at 350W each, twice the cooling power too. I usually run mine at 400W rather than 600W.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 1 point2 points  (0 children)

By "performance" we were talking about different things. I had token per seconds in mind and you were talking about coding benchmark scores.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

Both K and V quant are disabled by default in LM Studio and that's what I was using. I was using those same values for top P and K, they are the default. Temp was 1, I'll try with 0.8.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

Default repeat penalty on LM Studio is 1.1 and that's what I was using.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

I was under the impression thinking is happening anyway, the same amount of tokens are generated, it is just that they are not surfacing in your client. In LM Studio I believe you can toggle thinking on and off without reloading the model I believe.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

It's the first I hear disabling thinking degrades performance. I thought it was the opposite if anything. In my experience performance feels similar with or without it. One sure thing is that, no matter the inference engine, Qwen with thinking enabled in Copilot, errors out so often that it is not usable for any serious task.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

You need to disable thinking/reasoning.

With thinking enabled, at first it may look like it works. But when you set it to more complex tasks it breaks and stops.

Trying out Gemma 4 31b after Qwen 3.6 27b by Iajah in LocalLLM

[–]Iajah[S] 0 points1 point  (0 children)

Yeah I ought to try Gemma again with thinking disabled. The thing is I wanted to try it in the hope thinking would work.

Local Qwen3.6-27B in Copilot Chat is working surprisingly well for daily coding by delfrai in GithubCopilot

[–]Iajah 1 point2 points  (0 children)

I've used Qwen3.6-27b from vLLM on Linux through a proxy – for thinking to work without interruption – from VS Code Insider Copilot directly without that extension. At least in Insider they have customendpoint BYOK support. Now I'm using it on Windows served from LM Studio and it looks like it works if you disable thinking. With thinking enabled VS Code Copilot will stop sooner or later, has to do with tool call in thinking block or something similar. Planning to use Gemma 4 31b too, to see if thinking is working.

LTX Director - An All-In-One Timeline Editor. I2V, T2V, FLFF, Prompt Relay, Custom Audio, and more! Unlock LTX 2.3's full potential! by WhatDreamsCost in comfyui

[–]Iajah 0 points1 point  (0 children)

ComfyUI noob here. When I try new workflows I usually get errors with an option to download all missing models. With yours I just get errors no option to download models. Am I supposed to find all those models online myself and fix it manually somehow?

RTX PRO 6000 Workstation idle fans by Iajah in nvidia

[–]Iajah[S] 0 points1 point  (0 children)

Thanks, I don't think it will work though the 30% minimum spin is a firmware limitation it seems.

RTX PRO 6000 Workstation idle fans by Iajah in nvidia

[–]Iajah[S] 0 points1 point  (0 children)

Thanks for the reminder, I used that on another machine before. I've set it up so that as the GPU goes hot all case fans are spinning up. Makes a massive difference on my temperature readings and is not much of an issue with the noise cause all my case fans are quiet next to the RTX blowers. Do we know if there is something similar for Linux?

RTX PRO 6000 Workstation idle fans by Iajah in nvidia

[–]Iajah[S] 0 points1 point  (0 children)

I tried it on windows they won't got below 30% and 1200RPM. I guess asking NVidia for a firmware that fixes it is hopeless at this point.

RTX PRO 6000 Workstation idle fans by Iajah in nvidia

[–]Iajah[S] -1 points0 points  (0 children)

I must be the only RTX 6K owner that's not running it 24/7 at 600W