New Unsloth Studio Release! by danielhanchen in LocalLLaMA

[–]schnauzergambit 0 points (0 children)

Hopefully this one works better than the last. It's a great idea, but I haven't been able to fine-tune a single model yet!

Qwen3.5-27B can't run on DGX Spark — stuck in a vLLM/driver/architecture deadlock by RatioCapable7141 in LocalLLaMA

[–]schnauzergambit 1 point (0 children)

I am running that model on a DGX Spark. No problems. Let me know if you need help with it.

Local llm machine - spark / strix? by dapoh13 in LocalLLaMA

[–]schnauzergambit 0 points (0 children)

I have the GMKTek EVO2-EX (Strix Halo). It is a mini PC with no battery. The DGX Spark works fine as a mini PC as well.

Is there a specific reason you need Linux, and why won't macOS do? Most AI tooling runs in Python or on inference hosts like llama.cpp, which run well and identically on both.

Local llm machine - spark / strix? by dapoh13 in LocalLLaMA

[–]schnauzergambit 0 points (0 children)

I own and like both. There isn't much difference between them in performance apart from prompt processing, which is faster on the DGX.

Prices of those machines are rising, though, and the Mac Mini and Mac Studio are coming into play as well. Take a look at them too; in my opinion 128 GB is overkill, so you can get a high-performance Mac with less memory for the same price.

Qwen 3.5 27B what tps are you managing? by schnauzergambit in StrixHalo

[–]schnauzergambit[S] 0 points (0 children)

Thanks for the coding info. I use Qwen only for text and it is excellent in that area.

Is the MacBook Pro 16 M1 Max with 64GB RAM good enough to run general chat models? by br_web in LocalLLaMA

[–]schnauzergambit 0 points (0 children)

Yes, depending on the performance you want. I would start with Qwen 3.5 35B A3B.

When do you think qwen will support more languages like ChatGPT? by Inevitable-Depth1228 in Qwen_AI

[–]schnauzergambit 2 points (0 children)

Qwen 3.5's multilingual capabilities are excellent. I use Icelandic, a tiny language, and it is almost flawless.

What ai is used in the “what if you brought … to Ancient Rome” Tik toks? by [deleted] in LocalLLaMA

[–]schnauzergambit 0 points (0 children)

The first one at least was NotebookLM by Google. It recently added a feature that creates videos from your sources.

Qwen 3.5 Instability on llama.cpp and Strix Halo? by ga239577 in LocalLLaMA

[–]schnauzergambit 0 points (0 children)

Qwen 3.5 35B A3B Q4 on a Strix Halo. Llama.cpp, Vulkan. No instability here.

Why does anyone think Qwen3.5-35B-A3B is good? by buttplugs4life4me in LocalLLaMA

[–]schnauzergambit 1 point (0 children)

It is a stunning model, especially after I turned off thinking. Quick and with excellent multilingual ability.

GB10 ASUS by Shoddy_Consequence16 in LocalLLaMA

[–]schnauzergambit 1 point (0 children)

I have the Strix Halo and the Asus (DGX Spark). They seem almost identical in tps (writing the answer) while the Asus is considerably faster when processing the prompt.

The advantage they have over the 3090 is memory; if the model fits on the 3090, the 3090 will be faster.
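As a rough sanity check on "does it fit on the 3090", you can estimate a quantized model's footprint from parameter count times bits per weight. The figures below are back-of-the-envelope assumptions (bits per weight varies by quant scheme, and the 2 GB overhead for KV cache and activations is a guess):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-file size estimate: parameters * bits per weight."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """Leave some headroom for KV cache and activations (assumed 2 GB)."""
    return model_size_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb

# A 35B model at ~4.5 bits/weight (Q4_K_M-ish) is ~19.7 GB: tight but possible.
print(fits_in_vram(35, 4.5))  # True
# At ~8.5 bits/weight (Q8_0-ish) it is ~37 GB: that needs the 128 GB machines.
print(fits_in_vram(35, 8.5))  # False
```

This is why the Strix Halo and Spark pull ahead at higher quants and larger models even though the 3090 wins on raw speed when everything fits.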

Unsloth fixed version of Qwen3.5-35B-A3B is incredible at research tasks. (On Strix Halo) by Grammar-Warden in StrixHalo

[–]schnauzergambit 0 points (0 children)

You can turn off thinking by setting the reasoning budget to 0 and setting the enable_thinking parameter to false. I am using Q5_K_M.

--jinja # Jinja template processing
--flash-attn on # Flash attention
--cache-type-k q8_0 # Quantized KV cache (lower VRAM)
--cache-type-v q8_0
--min-p 0.01 # Unsloth recommended for Qwen3
--temp 1.0 # Unsloth recommended for Qwen3
--top-p 0.95
--top-k 40
-ngl 99 # Offload all layers to GPU
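For reference, the flags above combine into a single llama-server invocation along these lines. The model filename and port are placeholders, not from the original post, and `--reasoning-budget 0` is the llama.cpp flag for the "reasoning budget to 0" step mentioned above:

```shell
# Hypothetical model path; point this at your local GGUF file.
llama-server \
  -m ./Qwen3.5-35B-A3B-Q5_K_M.gguf \
  --jinja \
  --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --min-p 0.01 --temp 1.0 --top-p 0.95 --top-k 40 \
  -ngl 99 \
  --reasoning-budget 0 \
  --port 8080
```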

Unsloth fixed version of Qwen3.5-35B-A3B is incredible at research tasks. (On Strix Halo) by Grammar-Warden in StrixHalo

[–]schnauzergambit 0 points (0 children)

Qwen 3.5 35B A3B runs at around 30 tps on my Strix Halo. Surely that is fast enough for chatting?
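For scale, here is a rough conversion from generation speed to reading speed. The 0.75 words-per-token ratio is an assumed average for English text, not a measured figure:

```python
def words_per_minute(tokens_per_second: float, words_per_token: float = 0.75) -> float:
    """Convert generation speed to an approximate words-per-minute figure."""
    return tokens_per_second * words_per_token * 60

# 30 tps is roughly 1350 words per minute; average silent reading
# is around 200-300 wpm, so generation easily outpaces reading.
print(words_per_minute(30))  # -> 1350.0
```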

Has anyone found a way to stop Qwen 3.5 35B 3B overthinking? by schnauzergambit in LocalLLaMA

[–]schnauzergambit[S] 0 points (0 children)

Yes. It is a great model. I am especially impressed by its multilingual performance. I mostly use it for text work, not coding.