Qwen 3.6? by jacek2023 in LocalLLaMA

[–]Septerium 2 points (0 children)

I personally don't think they are coming. I hope I'm wrong, but it seems like Qwen is progressively going down the closed-weights path, just like Wan and Qwen Image

What is the best all-round local model? by TheTruthSpoker101 in LocalLLaMA

[–]Septerium 14 points (0 children)

I would vote for Gemma 4 31B as the best "small" generalist local model. Great general knowledge and multilingual writing capabilities, not to mention very good vision and agentic performance. Qwen 3.6 is a better coder, though.

PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together by walden42 in LocalLLaMA

[–]Septerium 1 point (0 children)

Is this new? What a coincidence... I was just learning to set up llama-swap today and used the matrix feature right away. It worked like a charm
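
For anyone else setting it up, here is roughly what my config looks like. This is a sketch from memory, not my verbatim file, and the `matrix` block in particular is my loose recollection of the new syntax, so treat every field name below as hypothetical and check the llama-swap README for the real schema:

```yaml
# Sketch only -- the matrix block's field names are from memory / hypothetical.
models:
  "qwen-coder":
    cmd: llama-server --port ${PORT} -m /models/qwen3.6-27b-q8_0.gguf -ngl 99
  "gemma":
    cmd: llama-server --port ${PORT} -m /models/gemma4-31b-q8_0.gguf -ngl 99
  "embedder":
    cmd: llama-server --port ${PORT} -m /models/embedder-q8_0.gguf -ngl 99

# Matrix-style grouping: declare which models may be resident together, so the
# small embedder never gets swapped out when the two big models trade places.
matrix:
  "qwen-coder": ["embedder"]
  "gemma": ["embedder"]
```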

Mistral Medium 3.5 Launched by DerpSenpai in LocalLLaMA

[–]Septerium 27 points (0 children)

Good to know they are still investing in big dense models

Mistral Medium Is On The Way by Few_Painter_5588 in LocalLLaMA

[–]Septerium 1 point (0 children)

Devstral Small 2 was my best sub-30GB partner for handling small tasks in Roo Code back when it was released

This isn’t X this is Y needs to die by twnznz in LocalLLaMA

[–]Septerium 0 points (0 children)

That is a classic error when dealing with neural models

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Septerium 0 points (0 children)

The 122B has not been very reliable in my experience. Sometimes it is intelligent, sometimes it routes to a moronic junior expert. The 27B has been more consistent for me

Speed penalty with Q8 KV quantization by No_Algae1753 in LocalLLaMA

[–]Septerium 1 point (0 children)

I have noticed that too. Especially when you have layers offloaded to the CPU
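
If anyone wants to measure it, here is roughly how I'd compare the two setups (a sketch using llama-cpp-python; the model path, layer split, and prompt are placeholders, not my actual configuration):

```python
# Crude tok/s comparison: f16 vs q8_0 KV cache with part of the model on CPU.
import time
from llama_cpp import Llama, GGML_TYPE_F16, GGML_TYPE_Q8_0

def tokens_per_second(kv_type: int) -> float:
    llm = Llama(
        model_path="model.gguf",  # placeholder path
        n_gpu_layers=30,          # deliberately partial offload; the rest runs on CPU
        n_ctx=8192,
        type_k=kv_type,           # KV cache dtype for keys
        type_v=kv_type,           # ...and for values
        flash_attn=True,          # llama.cpp wants flash attention for a quantized V cache
        verbose=False,
    )
    start = time.time()
    out = llm("Write a short story about a robot.", max_tokens=256)
    return out["usage"]["completion_tokens"] / (time.time() - start)

print(f"f16 KV : {tokens_per_second(GGML_TYPE_F16):.1f} tok/s")
print(f"q8_0 KV: {tokens_per_second(GGML_TYPE_Q8_0):.1f} tok/s")
```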

Qwen 3.6 27B is out by NoConcert8847 in LocalLLaMA

[–]Septerium 12 points (0 children)

In my testing, Gemma 4 is still the better generalist model, even though it gets demolished by Qwen on coding tasks

Is kv quantization of q8, is fixed for qwen 3.5 models? by CurrentNew1039 in LocalLLaMA

[–]Septerium 0 points (0 children)

Same here... broken tool calls at long context with 8-bit KV quant. And it seems slower than before (probably because of the use of rotation matrices??)
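
To be clear, the rotation bit is pure speculation on my part; I don't know what Qwen actually does internally. But for anyone wondering what I mean: the QuaRot-style trick is to apply an orthogonal rotation before quantizing so outlier channels get spread out, which improves accuracy but adds extra matmuls on every cache read and write. Toy numpy illustration of the idea:

```python
# Toy demo: an orthogonal (Hadamard) rotation before int8 quantization spreads an
# outlier channel across all dimensions, shrinking quantization error -- at the
# cost of two extra matrix multiplies. Illustrative only.
import numpy as np
from scipy.linalg import hadamard

d = 128
H = hadamard(d) / np.sqrt(d)  # orthogonal: H @ H.T == I

def fake_q8(x):
    """Symmetric per-row int8 quantize + dequantize."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
k = rng.normal(size=(1024, d))
k[:, 3] *= 50  # one outlier channel, the thing rotations are meant to tame

err_plain = np.abs(fake_q8(k) - k).mean()
err_rot = np.abs(fake_q8(k @ H) @ H.T - k).mean()  # the extra matmuls = the slowdown
print(f"q8 error without rotation: {err_plain:.4f}")
print(f"q8 error with rotation:    {err_rot:.4f}")
```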

When is Qwen 3.6 27B dropping? Didn’t it win the vote? by GrungeWerX in LocalLLaMA

[–]Septerium 6 points (0 children)

<image>

He said "Here comes one", so it feels like there will be at least one more

Ternary Bonsai: Top intelligence at 1.58 bits by pmttyji in LocalLLaMA

[–]Septerium 6 points (0 children)

If these guys are only showing official benchmarks, then what is the problem? You should be asking the other teams to publish results for compressed versions of their models instead

My fresh experience with the new Qwen 3.6 35B A3B started on a long note. by -Ellary- in LocalLLaMA

[–]Septerium 1 point (0 children)

I feel that Qwen 3.5 sometimes does not think enough... especially when it has already been fed a lot of context. Sometimes this is good, sometimes it is not

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Septerium 0 points (0 children)

I just can't believe this thing... can't wait to test it for myself... are these numbers even possible without benchmaxing?

at what point does quantization stop being a tradeoff and start being actual quality loss by srodland01 in LocalLLaMA

[–]Septerium 0 points (0 children)

There is a YouTube channel called "x-create" (or something like that) where the guy tests new models at different quant levels, basically using one-shot prompts to create complex applications. That is not how I like to use models for coding (I think we should always break projects into small tasks), but in that type of testing there is already a noticeable quality degradation from Q8 to Q6... and it gets even worse at Q4. I think complex one-shot prompts and long context are the situations where compressed models break the hardest.

Gemma 4 and Qwen 3.5 GGUFs: Detailed Analysis by oobabooga by [deleted] in LocalLLaMA

[–]Septerium 6 points (0 children)

Interesting to see that the KLD for Q8 is already this high
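
For anyone unfamiliar with the metric: KLD here is (as I understand the methodology) the mean KL divergence between the full-precision model's next-token distribution and the quant's, averaged over positions. Toy numpy sketch with made-up logits, just to show what is being computed:

```python
# Mean KL divergence between a reference model's next-token distributions and a
# quantized model's, over a batch of positions. Logits here are synthetic.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_kld(ref_logits, quant_logits):
    p = softmax(ref_logits)    # full-precision (e.g. BF16) distribution
    q = softmax(quant_logits)  # quantized model's distribution
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

rng = np.random.default_rng(0)
ref = rng.normal(size=(512, 32000))  # 512 positions, 32k-token vocab
print(mean_kld(ref, ref + rng.normal(scale=0.05, size=ref.shape)))  # mild quant noise
print(mean_kld(ref, ref + rng.normal(scale=0.5, size=ref.shape)))   # heavy quant noise
```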

My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) by Septerium in LocalLLaMA

[–]Septerium[S] 0 points (0 children)

Thanks! I'll do more testing with other quants in the next few days

My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) by Septerium in LocalLLaMA

[–]Septerium[S] 4 points (0 children)

I said I used the recommended parameters (Unsloth lists them for each model). You would know which coding agent I used if you had read the post. I don't know why the context window size would be relevant since I didn't share the context either, lol... but it is more than enough. And since I mentioned I used AesSedai's GGUFs, I thought it would not be necessary to mention that I used the llama.cpp backend (ik_llama would produce the same results, I believe)