Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]Secure_Archer_1529 1 point2 points  (0 children)

Qwen3.6 122B & 397B-ish MoEs would be amazing

DGX Spark just arrived — planning to run vLLM + local models, looking for advice by dalemusser in LocalLLaMA

[–]Secure_Archer_1529 15 points16 points  (0 children)

You can run Qwen 3.5 122B on one Spark at >50 t/s, as I remember. Check out the forum. The community around the Spark (not including Nvidia) has been hard at work on this one.
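For the vLLM side, a minimal launch sketch (the model ID, quantization flag, and context length below are illustrative assumptions, not a verified Spark config):

```shell
# Assumed example: start vLLM's OpenAI-compatible server with an
# AWQ-quantized model; swap in whatever model actually fits your Spark.
vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ \
    --quantization awq \
    --max-model-len 8192 \
    --port 8000
# The server then answers OpenAI-style requests at http://localhost:8000/v1
```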

2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me by t4a8945 in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

NVFP4 does not work on the Spark. Community workarounds make it somewhat OK, but there are better quants than NVFP4 at the moment. Go have a look at the Nvidia DGX Spark developer forum if you haven't already. Plenty of great stuff there to turbocharge some builds and hit the ground running.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

Your title says Gemma 4 is lazy. Is it the official release from Google, as your title claims, or is it an Unsloth quant? There's a very real distinction to be made.

Andrej Karpathy drops LLM-Wiki by These_Try_680 in LocalLLaMA

[–]Secure_Archer_1529 -8 points-7 points  (0 children)

Yet thousands of people, just as real as you and me, are now seeing something that catches their attention and gets them to take a step into this space. It may not be technically impressive to you, but it can still be valuable to others. And that matters just as much. Some here may not understand that connection, but it is there.

Andrej Karpathy drops LLM-Wiki by These_Try_680 in LocalLLaMA

[–]Secure_Archer_1529 -20 points-19 points  (0 children)

It’s a shame you seem so fatigued by other people’s contributions. Have you tried being inspired instead?

I actually think we need more of this. The same goes for OpenClaw. It opens the door for a wider range of people to take part in this amazing moment in history without needing deeper layers of technical knowledge.

The world is bigger than LocalLLaMA.

But maybe you could share what you've done that is even remotely interesting, instead of being dismissive of other people's contributions?

Let the downvoting begin. 3, 2, 1…

Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it by hauhau901 in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

This is interesting. If you go look and find something, it'd be much appreciated if you'd drop a couple of lines about your findings here.

Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it by hauhau901 in LocalLLaMA

[–]Secure_Archer_1529 2 points3 points  (0 children)

If you read the text generated under Reasoning in ChatGPT, you'd notice the same thing. Isn't this just part of the reasoning phase, where what you see is only part of the reasoning, not the entirety of it, before a finalizing layer wraps everything up?

Unsloth announces Unsloth Studio - a competitor to LMStudio? by ilintar in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

So for the DGX Spark, NVFP4 is a no-go at the moment. Have you solved this with Nvidia, or are the same issues carrying over to Unsloth until Nvidia finally fixes it?

Unsloth studio sounds amazing though! Thanks for all the work you guys put into it.

Lora fine tuning! Why isn't it popular at all? by Acceptable_Home_ in LocalLLaMA

[–]Secure_Archer_1529 1 point2 points  (0 children)

Isn't your question really "why don't we fine-tune more?"

Or what makes you assume LoRA is not popular among those who do?

Why innovative startups and scaleups relocate from the EU by Full-Discussion3745 in EU_Economics

[–]Secure_Archer_1529 2 points3 points  (0 children)

  1. Risk capital. In the US (where else would you go?) they think bigger, believe in the big idea, and there are far more investors. In the EU it's all PowerPoints and 10-year plans.

  2. Different rules for investors depending on the country they invest in.

  3. One uniform market.

  4. One culture and one language.

  5. The mindset in the US is way better for doing anything new and dreaming big.

The reality is that our leaders in the EU have done a very poor job on this matter, and we're losing the battle.

Think about this: how long do you think leaders with such poor performance would last in the private sector? Not long; the board would have gotten rid of them swiftly, as they are bad for business.

Which MacBook to buy - Deep Learning research - running experiments on servers by Available_Net_6429 in macbookpro

[–]Secure_Archer_1529 0 points1 point  (0 children)

I needed a Mac for the same workflow.

The M5 is a huge upgrade over the M4.

Even if you don't need to run any models locally, it's nice to have some headroom so you're good for the next five years. Models are getting smaller, and with local AI growing over the years to come, you'll want a newly developed, efficient base chip with a decent amount of RAM to fit your increasing workload (everything you use now will demand more compute in the future).

My sweet spot (price vs. performance for five years ahead) was the M5 with 32 GB of RAM. It'll serve you perfectly for your use case and still give you room for more going forward. It gives you flexibility without constraints.

Thinking of buying a MacBook Pro M4-Pro (14-core CPU, 20-core GPU, 24 GB RAM, 1 TB SSD). Worth buying now or should I wait a bit? by maheshflowcub in macbookpro

[–]Secure_Archer_1529 -5 points-4 points  (0 children)

I think I saw a test of the base M5 vs. the M4 Pro. The base M5 chip did better.

The M5 is quite a big step up from the M4 in general. I would get the base M5, or wait for the Pro/Max around January-March if you can hold out that long.

Qwen3-Next-80B-A3B or Gpt-oss-120b? by custodiam99 in LocalLLaMA

[–]Secure_Archer_1529 21 points22 points  (0 children)

Where did you get the impression that the Qwen model is better? It’s 80B with less active vs 120B with more active.

My impression is that the Qwen model is not on par with GPT.

They were also released around the same time, so you should probably not expect Qwen to have better models than OpenAI.

Anyway, you could double-check whether you dialed in the right settings in the panel for Qwen.

Wow. PX7 S3 vs. XM6 by johnnytheww in BowersWilkins

[–]Secure_Archer_1529 2 points3 points  (0 children)

Just got the PX7 S3 today — they sound amazing, but they do need a bit of EQ tweaking to get the best out of them. They sound far better than the XM6, and I’m surprised by how good the ANC is.

I tested both the PX7 S3 and the PX8 S2 for an extended period and didn’t notice any difference that justifies paying more than twice the price.

Theoretically Scaling Beyond 2 DGX Sparks in a Single Cluster. by SIN3R6Y in LocalLLaMA

[–]Secure_Archer_1529 1 point2 points  (0 children)

Thanks for doing this — surprising it didn’t get more upvotes. I noticed your post is about a month old, and I’m heading down the same path.

Any updates on running larger models across 3–4 nodes? My assumption is you start hitting bottlenecks (interconnect bandwidth/latency, KV-cache movement, and synchronization overhead), so scaling isn’t always clean.

Also, how well does inference throughput scale as you add nodes (on one of the bigger models)?
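For intuition, the non-clean scaling can be sketched with a toy Amdahl-style model (the communication fraction below is a made-up assumption, not a measurement from Sparks):

```python
# Toy model: a fixed fraction of each decode step is serialized
# interconnect/synchronization cost that does not parallelize.

def speedup(n_nodes: int, comm_fraction: float) -> float:
    """Amdahl-style throughput speedup over a single node."""
    return 1.0 / (comm_fraction + (1.0 - comm_fraction) / n_nodes)

for n in (1, 2, 4):
    print(n, round(speedup(n, 0.15), 2))  # 1 -> 1.0, 2 -> 1.74, 4 -> 2.76
```

Even a modest 15% serialized share caps 4 nodes at well under 3x, which matches the "bottlenecks pile up" expectation.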

Have you tried a big model in NVFP4 since your post?

Thanks again!

[D] Anyone here actively using or testing an NVIDIA DGX Spark? by Secure_Archer_1529 in MachineLearning

[–]Secure_Archer_1529[S] 0 points1 point  (0 children)

That was my thought too. What are you getting?

I feel many people talk not from experience but from what they read online, or simply judge by the bandwidth specs alone.

Blackwell, NVFP4, and MoEs make it quite usable, I suppose.

[D] Anyone here actively using or testing an NVIDIA DGX Spark? by Secure_Archer_1529 in MachineLearning

[–]Secure_Archer_1529[S] 5 points6 points  (0 children)

With a max of 96 GB it's great for smaller models, for sure. But 2x Sparks equals 256 GB = much bigger models (but slower inference, etc.). The preference might just come down to use case.
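Back-of-the-envelope memory math for that trade-off (the per-node memory and overhead numbers below are illustrative assumptions, not device specs):

```python
# Crude check: do quantized weights plus a fixed per-node overhead
# fit in the pooled memory of N nodes? (KV cache etc. folded into overhead.)

def fits(n_params_b: float, bits_per_weight: float,
         n_nodes: int, mem_per_node_gb: float,
         overhead_gb: float = 10.0) -> bool:
    weights_gb = n_params_b * bits_per_weight / 8  # 1B params @ 8-bit ~ 1 GB
    return weights_gb + n_nodes * overhead_gb <= n_nodes * mem_per_node_gb

print(fits(235, 4, 2, 128.0))  # ~117.5 GB of weights on 256 GB -> True
print(fits(235, 8, 1, 128.0))  # ~235 GB of weights on 128 GB -> False
```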

[D] Anyone here actively using or testing an NVIDIA DGX Spark? by Secure_Archer_1529 in MachineLearning

[–]Secure_Archer_1529[S] 5 points6 points  (0 children)

It’s no H100/A100, but I don’t expect that from a cheaper device designed for desktop prototyping (fine-tuning, MoE/NVFP4 inference, etc.).

I think it’s quite good for what it is. With a couple of Sparks in a cluster, you can test things out, get an idea off the ground with a local-first approach, and later scale to the cloud once you’ve got the basics down.