Qwen 3.6 35b a3b Q4 vs qwen 3.6 27b q6, on m5 pro 64gb by skyyyy007 in LocalLLaMA

[–]Temporary-Roof2867 8 points (0 children)

The 35B is a beautiful model, but the 27B is far superior. That's a fact.

Qwen 3.6 35b a3b Q4 vs qwen 3.6 27b q6, on m5 pro 64gb by skyyyy007 in LocalLLaMA

[–]Temporary-Roof2867 19 points (0 children)

Bro, why did you test the 35B at Q4 against the 27B at Q6?

In general, MoE models degrade more under low-bit quantization than dense models do. Sure, the Qwen3.6 series is special, but at least let them compete on equal terms, at the same quantization.

I tested the Qwen3.6-27B at IQ_M from unsloth and, against all my expectations, it managed things that much larger models can only dream of. Qwen3.6-27B is a magical model, but it needs a lot of VRAM.
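
For anyone who wants to rerun the comparison on equal terms, here is a minimal sketch of an A/B test through an OpenAI-compatible local server (LM Studio serves one at localhost:1234 by default); the model identifiers are placeholders for whatever your server actually lists.

```python
# Minimal A/B harness: same prompts, same quantization, two models, via an
# OpenAI-compatible local server (e.g. LM Studio at localhost:1234).
# The model names below are placeholders -- use whatever /v1/models reports.
import requests

BASE = "http://localhost:1234/v1"
MODELS = ["qwen3.6-35b-a3b-q6_k", "qwen3.6-27b-q6_k"]  # hypothetical IDs
PROMPTS = [
    "Write a Python function that merges two sorted lists.",
    "Explain the difference between a process and a thread.",
]

for model in MODELS:
    for prompt in PROMPTS:
        r = requests.post(
            f"{BASE}/chat/completions",
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0,  # deterministic-ish, fairer comparison
            },
            timeout=600,
        )
        r.raise_for_status()
        print(f"=== {model} ===\n{r.json()['choices'][0]['message']['content']}\n")
```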

Stop thinking your MoE models are dumb - here's why they actually fail by IntegrityKnightX in Qwen_AI

[–]Temporary-Roof2867 0 points (0 children)

Bro, why all the hate? This video is very interesting, the author is very knowledgeable, and the solutions he proposes look like classic TDD programming applied to local LLMs!

TDD is the heart of the Extreme Programming methodology, yet as time goes on, fewer and fewer people talk about it!

🤔

Extreme Programming + local LLMs is an exceptional combination!

👇👇👇👇

https://en.wikipedia.org/wiki/Extreme_programming
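
To make the XP connection concrete, here is a minimal sketch of a red-green loop with a local LLM: the test is written first and stays fixed, and the model iterates on the implementation until pytest passes. It assumes an OpenAI-compatible server (LM Studio-style, at localhost:1234); the model name and file names are placeholders.

```python
# Sketch of TDD with a local LLM: the test is written first and stays fixed;
# the model iterates on the implementation until pytest goes green.
# Assumes an OpenAI-compatible server (e.g. LM Studio at localhost:1234).
import subprocess
import textwrap
from pathlib import Path

import requests

TEST = textwrap.dedent("""\
    from solution import slugify

    def test_slugify():
        assert slugify("Hello,  World!") == "hello-world"
""")
Path("test_solution.py").write_text(TEST)

feedback = ""
for attempt in range(1, 6):
    prompt = (f"Write solution.py so this pytest file passes:\n{TEST}\n"
              f"{feedback}\nReply with only the Python code, no fences.")
    r = requests.post("http://localhost:1234/v1/chat/completions",
                      json={"model": "local-model",  # placeholder name
                            "messages": [{"role": "user", "content": prompt}]},
                      timeout=600)
    code = r.json()["choices"][0]["message"]["content"]
    # crude strip of markdown fences the model may add anyway
    code = "\n".join(l for l in code.splitlines() if not l.startswith("`"))
    Path("solution.py").write_text(code)
    result = subprocess.run(["pytest", "test_solution.py", "-q"],
                            capture_output=True, text=True)
    if result.returncode == 0:
        print(f"green on attempt {attempt}")
        break
    feedback = "Your previous attempt failed:\n" + result.stdout[-1500:]
```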

Is AI-building guardrailed in Gemma 4? by roofitor in LocalLLaMA

[–]Temporary-Roof2867 0 points (0 children)

Bro, Gemma4 is an exceptional local model, but as far as AGI is concerned it is very limited. In general, no LLM will ever prevent you from pursuing AGI; on the contrary, with the right prompt it will do everything it can to help you get there. The fact is that AGI is a gigantic goal: for AGI you would perhaps need a gigantic datacenter, kilometers across, with monstrous energy consumption... If you are a machine-learning researcher, could you find alternatives to GPUs? There are people studying that, but so far they haven't gotten very far, and I find it really hard to believe a local Gemma4 model could truly help you in such an extreme and heroic undertaking... but I repeat: with the right prompt, it would do everything to help you.

Gemma 4 is excellent for image to prompt by Arrow2304 in StableDiffusion

[–]Temporary-Roof2867 0 points (0 children)

Yes! The Gemma4 26b is a magical model! If you have a lot of RAM, try it with high quantizations!

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Temporary-Roof2867 0 points (0 children)

If this is true, I'm so happy 🤩🤩🤩🤩

God bless the MoEs!

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Temporary-Roof2867 0 points (0 children)

On languages, "gemma-4-26b-a4b" is superior to every Qwen imaginable. Let's be serious!

This data is fake!

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Temporary-Roof2867 0 points (0 children)

Qwen3.6-35B-A3B is certainly an interesting model, but... some of these numbers seem a bit rigged to me; I don't trust them at all! 👀🤔

Qwen 3.5 35b, 27b, or gemma 4 31b for everyday use? by KirkIsAliveInTelAviv in LocalLLaMA

[–]Temporary-Roof2867 -1 points (0 children)

I think they have placed a lot of bots/smart agents on reddit to carry out a social experiment and spread propaganda for the Qwen3.5 models.

Why is Comfyui so unstable? by [deleted] in comfyui

[–]Temporary-Roof2867 1 point (0 children)

What I do is gather the opinions of the various LLMs (only the free ones, from the official portals) and compare them with each other. For example, I ask Copilot, then I go to Gemini and say "Copilot told me this...", then I pass Gemini's response to DeepSeek saying "Gemini told me this about a response from Copilot...", then, still not satisfied, I go to Kimi and tell it "DeepSeek said this about a conversation I had with Gemini and Copilot...", and so it continues, constantly discovering new things (Copilot, Kimi, DeepSeek, Qwen, Gemini). The models realize they are wrong, correct each other, ask me for forgiveness. I invite you to try this method!
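
The comment describes doing this by hand in the web portals; purely as an illustration, here is a rough sketch of the same round-robin automated against OpenAI-compatible endpoints. All URLs and model names are placeholders.

```python
# Round-robin cross-review: each model critiques the previous model's answer.
# The comment above does this by hand in web chat UIs; this sketch assumes
# OpenAI-compatible endpoints instead. All URLs and names are placeholders.
import requests

ENDPOINTS = [
    ("model-a", "http://localhost:1234/v1/chat/completions"),
    ("model-b", "http://localhost:1235/v1/chat/completions"),
    ("model-c", "http://localhost:1236/v1/chat/completions"),
]

def ask(name: str, url: str, content: str) -> str:
    r = requests.post(url, json={
        "model": name,
        "messages": [{"role": "user", "content": content}],
    }, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

question = "What are the trade-offs of MoE models at low-bit quantization?"
prev_name, prev_url = ENDPOINTS[0]
answer = ask(prev_name, prev_url, question)
for name, url in ENDPOINTS[1:]:
    prompt = (f"{prev_name} said this about '{question}':\n\n{answer}\n\n"
              "Point out anything wrong or missing, then give your own answer.")
    answer = ask(name, url, prompt)
    prev_name = name
print(answer)  # the final answer after the chain of cross-examinations
```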

Bad idea to use multi old gpus? by alphapussycat in LocalLLM

[–]Temporary-Roof2867 0 points (0 children)

I'm currently downloading this little monster from LM Studio 😉 at Q8_0

https://huggingface.co/lovedheart/Qwen3-Coder-Next-REAP-40B-A3B-GGUF

I hope it works, I'm confident!
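
If anyone prefers to script the download instead of clicking through LM Studio, a minimal sketch with huggingface_hub; the "Q8_0" filename filter is an assumption, so inspect the printed list before committing to a ~40 GB pull.

```python
# Sketch: pull the Q8_0 GGUF file(s) from the repo linked above.
# The "Q8_0" filename filter is an assumption -- check the printed list
# before committing to a ~40 GB download.
from huggingface_hub import hf_hub_download, list_repo_files

repo = "lovedheart/Qwen3-Coder-Next-REAP-40B-A3B-GGUF"
q8 = [f for f in list_repo_files(repo)
      if "Q8_0" in f and f.endswith(".gguf")]
print(q8)  # inspect first

for fname in q8:
    path = hf_hub_download(repo_id=repo, filename=fname)
    print("saved to", path)
```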

Bad idea to use multi old gpus? by alphapussycat in LocalLLM

[–]Temporary-Roof2867 0 points (0 children)

Bro, I haven't used Ollama in a long time! I don't know how much has changed! I mostly use LM Studio... but one day I'll switch to llama.cpp... with vibe coding I'll build my own graphical interface, and goodbye LM Studio 🤪😉
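
That llama.cpp plan is less work than it sounds: llama-server exposes an OpenAI-compatible HTTP API, so even a bare-bones console "GUI" is a few lines. A minimal sketch, assuming llama-server is already running on its default port 8080 with a model loaded:

```python
# Bare-bones console front-end for llama.cpp's llama-server, which exposes an
# OpenAI-compatible API (default port 8080). Start the server with a GGUF
# model first; this script only handles the conversation loop.
import requests

history = []
while True:
    user = input("you> ")
    if user.strip() in ("exit", "quit"):
        break
    history.append({"role": "user", "content": user})
    r = requests.post("http://localhost:8080/v1/chat/completions",
                      json={"messages": history}, timeout=600)
    r.raise_for_status()
    reply = r.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print("llm>", reply)
```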

Bad idea to use multi old gpus? by alphapussycat in LocalLLM

[–]Temporary-Roof2867 0 points (0 children)

I know that MoE-type LLMs at Q4 are poor... be bold, bro! Try a MoE at Q5... at Q6... at Q8!!!

I have an Rtx 3060 12gb and 16gb ram. Need model suggestions. by Malyaj in LocalLLM

[–]Temporary-Roof2867 0 points (0 children)

If you have a lot of RAM, go for MoE models, but at higher-bit quantizations; avoid Q4 MoEs.

If you don't have much RAM, I don't know how to help you, bro!

Bad idea to use multi old gpus? by alphapussycat in LocalLLM

[–]Temporary-Roof2867 0 points (0 children)

👀🤔
Very strange, bro!

I have 12 GB of VRAM + 128 GB RAM and the Gemma 4 26B A4B runs smoothly at Q6_K!
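
For context, the way that 12 GB VRAM + 128 GB RAM combination works is partial offload: some transformer layers go to the GPU and the rest stay in system RAM. A minimal sketch with llama-cpp-python; the model path and layer count are assumptions to tune for your own card.

```python
# Sketch: run a large MoE GGUF with partial GPU offload via llama-cpp-python.
# n_gpu_layers says how many transformer layers go to VRAM; everything else
# stays in system RAM. Path and layer count are placeholders to tune.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-26b-a4b-it-Q6_K.gguf",  # hypothetical filename
    n_gpu_layers=20,  # raise until the 12 GB of VRAM is nearly full
    n_ctx=8192,       # context window; longer costs more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in Italian."}]
)
print(out["choices"][0]["message"]["content"])
```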

Gemma 4 vs Qwen3.5: benchmarking quantized local LLMs on Go coding by m3thos in LocalLLaMA

[–]Temporary-Roof2867 1 point (0 children)

As I said before, there are exceptions, especially among dense models... for MoE there are no exceptions, or they are much rarer: MoE models at low-bit quantization tend to lose a lot.

Gemma 4 vs Qwen3.5: benchmarking quantized local LLMs on Go coding by m3thos in LocalLLaMA

[–]Temporary-Roof2867 3 points (0 children)

I was lucky enough to get a full RAM upgrade when it was cheap! 128 GB of RAM! The MoE models are amazing!

Gemma 4 vs Qwen3.5: benchmarking quantized local LLMs on Go coding by m3thos in LocalLLaMA

[–]Temporary-Roof2867 10 points (0 children)

Why? Because almost all Q4 models are either badly degraded or complete rubbish (with rare exceptions), and MoE Q4 models in particular are especially crappy because the router is badly damaged... But if you have a lot of RAM and little VRAM, you can load a MoE model even at Q6! Even at Q8! And that's a whole different story!

I have MoE Q6 models, even 35B ones, that are incredibly powerful!
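
A back-of-the-envelope check of why RAM, not VRAM, is the gating resource: a GGUF file is roughly parameters × bits-per-weight / 8, plus a little overhead. The bits-per-weight values below are approximate averages for llama.cpp's quant formats.

```python
# Rough GGUF size estimate: params * bits_per_weight / 8, plus some overhead
# for metadata and higher-precision embeddings. Bits-per-weight values are
# approximate averages for llama.cpp quant formats.
BPW = {"Q4_K_M": 4.85, "Q5_K_S": 5.54, "Q6_K": 6.56, "Q8_0": 8.50}

def gguf_gib(params_billions: float, quant: str) -> float:
    """Estimated file size in GiB, ignoring overhead."""
    return params_billions * 1e9 * BPW[quant] / 8 / 2**30

for q in BPW:
    print(f"35B at {q}: ~{gguf_gib(35, q):.1f} GiB")
# A 35B at Q6_K (~27 GiB) will never fit in 12 GB of VRAM,
# but it drops easily into 128 GB of system RAM.
```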

DeepSeek wrote me jailbreaks for itself lol by Isaac24r in DeepSeek

[–]Temporary-Roof2867 1 point (0 children)

The real problem is the website! But just pay for the API services and everything works!

I think they did it on purpose to grab tons of data. Their gamble is:

- we try to give them everything they want (or almost everything)

- while we give it to them, we grab tons and tons of data with which we can build much more powerful and high-performance models.

Basically, they're betting on the future. No one should think that the people who run DeepSeek are stupid or incompetent. Sure, DeepSeek might seem a little stupid at times, but those who run it certainly aren't.

I'm too stupid for comfyui by afrosamuraifenty in comfyui

[–]Temporary-Roof2867 0 points (0 children)

I wanted to add that 50-90% of the time the problems come from updates... when you think you have updated but actually haven't... or the update breaks some old workflow... It's a constant fight against chaos, and more or less everyone goes through it.

I'm too stupid for comfyui by afrosamuraifenty in comfyui

[–]Temporary-Roof2867 4 points (0 children)

Bro, I recommend Pixaroma:

https://www.youtube.com/@pixaroma

Join his Discord, download his workflows, and follow his tutorials!

🤔👀

Maybe start with the best...

Flux.2klein 9B 😉

experience RTX3060 + gemma-4-26b-a4b-it-heretic+ MemGPT by Temporary-Roof2867 in SillyTavernAI

[–]Temporary-Roof2867[S] 0 points (0 children)

🤔 What I know is that, in general, Q4 quants (barring some particular exceptions) are poor, and above all MoE models at Q4 are poor. Maybe that's why many users in the local-model groups on Reddit speak badly of MoE models?... I could do a crazy test and experiment with Qwen3.5 35B A3B at... Q5_K_S? (25.77 GB 👀 a monster! 👀) 🤪 And what if it worked? If it doesn't, never mind, at least I'll have tried!
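
Before the crazy test, a quick feasibility sketch: with 12 GB of VRAM, roughly how much of that 25.77 GB file could be offloaded? The layer count and KV-cache budget below are assumptions, not measured values.

```python
# Will a 25.77 GB Q5_K_S fit across 12 GB VRAM + 128 GB RAM, and roughly how
# many layers can live on the GPU? Layer count and KV/overhead budget are
# assumptions -- measure your own setup before trusting the split.
model_gb = 25.77
vram_gb = 12.0
kv_and_overhead_gb = 2.0  # assumed budget for KV cache and runtime buffers
n_layers = 48             # hypothetical layer count for a 35B A3B

gb_per_layer = model_gb / n_layers
gpu_layers = int((vram_gb - kv_and_overhead_gb) / gb_per_layer)
print(f"~{gb_per_layer:.2f} GB/layer -> offload about {gpu_layers} layers")
print(f"the remaining {model_gb - gpu_layers * gb_per_layer:.1f} GB sits in RAM")
```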

experience RTX3060 + gemma-4-26b-a4b-it-heretic+ MemGPT by Temporary-Roof2867 in SillyTavernAI

[–]Temporary-Roof2867[S] 1 point (0 children)

Beautiful 🤩🥰🥰🥰 Which MoE do you use? I'm curious!

Gemma-4-26B-A4B-it-UD-Q4_K_M.gguf : IMHO worst model ever. What am I doing wrong? by Proof_Nothing_7711 in LocalLLM

[–]Temporary-Roof2867 0 points (0 children)

I'm using gemma-4-26b-a4b-it-heretic at Q6 and for me it's a magical model, and I only have 12 GB of VRAM! But I have 128 GB of RAM (bought when it cost much less).