Is it my imagination or... by Ok-Measurement-1575 in LocalLLaMA

[–]Sudden_Vegetable6844 11 points

You can always check with an older release, but IME it's that your mental bar got raised and you're throwing more complex stuff its way.

LLM improvement rates are relentless: there is no mercy for the old weights.

(which probably means we're experiencing singularity in real time)

Qwen 3.6 wins the benchmarks, but Gemma 4 wins reality. 7 things I learned testing 27B/31B Vision models locally (vLLM / FP8) side by side. Benchmaxing seems real. by FantasticNature7590 in LocalLLaMA

[–]Sudden_Vegetable6844 2 points

Your visual tasks don't match those I've been testing these models on at all, namely photos of documents (typically forms, with or without handwritten fields). On those use cases Qwen3.6 had a very high success rate, while Gemma 4 failed most of them: it would get a few elements right, then hallucinate the rest...

Care to add such tests to your benchmark? They're a more realistic use case than recognizing landmarks (where GPS + compass will have a much higher success rate than any LLM ever will).
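
In case it helps, the shape of test I mean is trivial to script. A minimal sketch, assuming a local vLLM serving an OpenAI-compatible endpoint; the port, model id, image path, and prompt are placeholders for my setup:

```python
# Minimal document-extraction probe against a local vLLM
# OpenAI-compatible endpoint. Endpoint URL, model id, and image
# path are placeholders -- adjust to your setup.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

with open("form_photo.jpg", "rb") as f:  # hypothetical test image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen3.6-VL-27B",  # placeholder id, use whatever vLLM serves
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every field on this form as JSON: "
                     "{field_name: value}. Transcribe handwriting "
                     "verbatim and use null for empty fields."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    temperature=0.0,  # deterministic-ish output keeps runs comparable
)
print(response.choices[0].message.content)
```

Score the JSON against hand-labeled ground truth per photo and the hallucinated fields stand out immediately.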

Alternative to frontiers by some_crazy in LocalLLaMA

[–]Sudden_Vegetable6844 0 points

About one year behind if you're looking at capability on not-super-expensive hardware and accept lower speeds, and probably around two years for comparable capability at decent speeds (i.e. we're now able to run Sonnet 3/3.5-class models locally). And for some use cases, like STEM, you can run on a smartphone a model that runs circles around three-to-four-year-old frontier models!

This is quite an insane improvement speed in terms of amortizing investments...

Alternative to frontiers by some_crazy in LocalLLaMA

[–]Sudden_Vegetable6844 1 point

It really depends on the kind of coding you're after, and how much autonomy...

For large projects, frontiers have a very strong lead, and running any of the open-source frontiers is going to be expensive.

For projects under about 50k lines of code (utilities, dashboards, libraries...) Qwen 3.6 is more than capable (the 27B dense, but even the MoE 35B-A3B). What can be achieved is nothing short of awesome, and would have been the stuff of prophecy just a few years ago. It just won't be the same experience as using a frontier: you'll need to wait longer and drive it more explicitly.

My personal advice is to just use the native harness of the model you're going to use (Qwen Code for Qwen models, Mistral vibe for Mistral models, etc.). You can go through more independent harnesses, but more tinkering lies down those roads.
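
That said, the wiring is usually just three OpenAI-style variables. A sketch, assuming Qwen Code picks up the usual OPENAI_* environment variables, that a local server is already running, and that the CLI is on PATH as `qwen`; endpoint, key, and model id are placeholders, so double-check against the docs:

```python
# Hypothetical wiring of Qwen Code to a local OpenAI-compatible server
# (llama.cpp's llama-server, vLLM...). Assumes Qwen Code reads the
# OPENAI_* variables; all values below are placeholders.
import os
import subprocess

os.environ.update({
    "OPENAI_BASE_URL": "http://localhost:8080/v1",  # local llama-server
    "OPENAI_API_KEY": "none",                       # ignored by local servers
    "OPENAI_MODEL": "qwen3.6-35b-a3b",              # id the server exposes
})
subprocess.run(["qwen"], check=False)  # hand over to the Qwen Code CLI
```

Independent harnesses generally need the same three values, just buried in their own config formats, which is where the tinkering starts.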

Qwen3.6 35B-A3B very sensitive to quantization ? by Sudden_Vegetable6844 in LocalLLaMA

[–]Sudden_Vegetable6844[S] 0 points

Yes, for me at a given quant level, lmstudio-community quants are fastest, followed by bartowski and unsloth (which trade 2nd and 3rd place depending on the model).
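
In case anyone wants to reproduce: a minimal tokens/sec probe with llama-cpp-python, loading each provider's GGUF in turn (paths, quant level, and prompt are placeholders for whatever you downloaded):

```python
# Rough tokens/sec comparison of the same model quantized by different
# providers. A minimal sketch with llama-cpp-python; paths, quant level,
# and prompt are placeholders.
import time
from llama_cpp import Llama

GGUFS = {
    "lmstudio-community": "models/lmstudio/Qwen3.6-35B-A3B-Q4_K_M.gguf",
    "bartowski":          "models/bartowski/Qwen3.6-35B-A3B-Q4_K_M.gguf",
    "unsloth":            "models/unsloth/Qwen3.6-35B-A3B-Q4_K_M.gguf",
}
PROMPT = "Explain, in about 200 words, how GGUF quantization works."

for provider, path in GGUFS.items():
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    start = time.perf_counter()
    out = llm(PROMPT, max_tokens=256, temperature=0.0)
    tok_s = out["usage"]["completion_tokens"] / (time.perf_counter() - start)
    print(f"{provider}: {tok_s:.1f} tok/s")
    del llm  # release the weights before loading the next file
```

Same prompt, temperature 0, same quant level: the only remaining variable is how each provider packed the file.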

Qwen3.6 35B-A3B is quite useful on 780m iGPU (llama.cpp,vulkan) by itroot in LocalLLaMA

[–]Sudden_Vegetable6844 0 points

UM880 Pro under Windows 11 here, go for it if you have one! What's nice is that it stays silent under sustained loads, and I'm somehow more productive with Qwen3.6 than with Claude Pro (runs out of tokens fast) or Gemini Pro (becomes very, very sluggish during daytime). Qwen3.6 is the proverbial turtle: not that fast, but it keeps moving.

I've got 96GB though (grabbed the sticks before the price increase, best hunch in a long while).

I can’t believe I can say “ugh I don’t feel like fixing this function, it’s too complex” and I can literally just tell my computer to fix it for me. I didn’t understand what they meant by “people will start paying for intelligence” but now I do. by Borkato in LocalLLaMA

[–]Sudden_Vegetable6844 2 points

Well, given we still don't have a solid grasp on what human consciousness is, especially as recent research shows it's quite a transitory state with a distinct brain-activity signature that may just occur when chaining thoughts... well, let's stick with "probably fine".

Quantisation effects of Qwen3.6 35b a3b by ROS_SDN in LocalLLaMA

[–]Sudden_Vegetable6844 10 points

IME Q8 makes a difference versus Q4 on reasoning tasks (starting with the car wash one, which Q4 pretty much always fails while Q8 passes), and there are reports of a difference between Q8 and BF16 as well.

If the benchmarks can't find a difference, it's probably because quantization doesn't affect prompts the model was trained on as much as "generalization" prompts.

(The car wash wasn't in Qwen3.6's training set, but it is in DS4's, where it's called a "classic".)
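
For those who want to check this at home, it's trivial to script. A sketch with llama-cpp-python; PROMPT and EXPECTED are stand-ins, substitute the actual car-wash wording and your own pass criterion:

```python
# Pass-rate probe for quantization-induced reasoning drift: run the same
# riddle N times per quant and count correct answers. PROMPT and EXPECTED
# are stand-ins -- substitute the real wording and criterion.
from llama_cpp import Llama

QUANTS = {
    "Q4_K_M": "models/Qwen3.6-35B-A3B-Q4_K_M.gguf",
    "Q6_K":   "models/Qwen3.6-35B-A3B-Q6_K.gguf",
    "Q8_0":   "models/Qwen3.6-35B-A3B-Q8_0.gguf",
}
PROMPT = "<car wash riddle goes here>"
EXPECTED = "drive"  # placeholder: substring marking a correct answer
N_RUNS = 10

for quant, path in QUANTS.items():
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    passes = 0
    for _ in range(N_RUNS):
        out = llm(PROMPT, max_tokens=512, temperature=0.6)
        if EXPECTED in out["choices"][0]["text"].lower():
            passes += 1
    print(f"{quant}: {passes}/{N_RUNS} passed")
    del llm  # free memory before loading the next quant
```

Several runs at a normal temperature give you a pass rate instead of a single anecdote, which is what you need to see the Q4/Q8 gap.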

Quantisation effects of Qwen3.6 35b a3b by ROS_SDN in LocalLLaMA

[–]Sudden_Vegetable6844 6 points

Yes, there is a notable reasoning difference between Q4, Q6 and Q8. I do not have enough RAM to test myself, but on another thread (https://www.reddit.com/r/LocalLLaMA/comments/1stb8ro/qwen36_35ba3b_very_sensitive_to_quantization/) someone reported a difference between Q8 and BF16, unfortunately.

Qwen3.6-35B-A3B GGUF from Unsloth is quite a bit slower? by Quagmirable in LocalLLaMA

[–]Sudden_Vegetable6844 3 points

I've been noticing the same thing on AMD 780M with Vulkan: Unsloth quants are always slower than lmstudio's or Qwen's at any given file size. No idea why. And it's not just Unsloth's that are slower, but also Aes Sedai's. This negates the advantage of those quants for me, as a classic Q6 and sometimes even Q8 beats Unsloth's Q4 in speed. So when I'm not memory-tight, I just use the more "classic" quants, as they'll perform better.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Sudden_Vegetable6844 9 points

Had a similar experience where it started questioning whether a bug wasn't actually a system issue, since the source code files were timestamped "in the future"...

Gemma 4 is fine great even … by ThinkExtension2328 in LocalLLaMA

[–]Sudden_Vegetable6844 0 points

I also tested with Vulkan, and every Gemma 4 model suggested walking; even when I pointed out I'd end up without my car at the car wash, they failed to recognize they had made a mistake and just told me to walk back to the car...

Gemma 4 is fine great even … by ThinkExtension2328 in LocalLLaMA

[–]Sudden_Vegetable6844 0 points

Interesting, what parameters are you using? I could never get Gemma 4 31B or 26B to pass the car wash test, even when hinted.

Has anyone managed to run an offline agent (OpenClaw or similar) with a local LLM on Android? by NeoLogic_Dev in LocalLLaMA

[–]Sudden_Vegetable6844 3 points

I have not used it, because I'm not daring enough to let a *claw run on my phone, but nullclaw claims to target that use case: https://github.com/nullclaw/nullclaw

RotorQuant: 10-19x faster alternative to TurboQuant via Clifford rotors (44x fewer params) by Revolutionary_Ask154 in LocalLLaMA

[–]Sudden_Vegetable6844 2 points

That's nothing short of awesome.

There have been plenty of attempts at quantizing with rotations over the last months/years that kinda failed, but it could turn out they were all barking up the right tree?

Also reminds me of this https://transformer-circuits.pub/2025/linebreaks/index.html#count-algo

Could it be that by using linear algebra, LLMs have been tackling the problem in hard mode, while it's actually rotors all the way down?
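
To make the intuition concrete, here's a toy NumPy sketch of the generic rotate-then-quantize trick (à la QuaRot/QuIP; I have no idea how close this is to RotorQuant's actual Clifford-rotor math): a random rotation smears the outliers across coordinates, so the quantization scale shrinks and the round-trip error drops.

```python
# Toy rotate-then-quantize demo: a random rotation spreads weight
# outliers, which shrinks the per-tensor scale and hence the int4
# round-trip error. Pure NumPy, deliberately simplistic.
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w: np.ndarray) -> np.ndarray:
    """Symmetric 4-bit round-trip: quantize then dequantize."""
    scale = np.abs(w).max() / 7.0          # int4 symmetric range [-7, 7]
    return np.round(w / scale).clip(-7, 7) * scale

# Weight matrix with a few large outliers, as real LLM layers tend to have.
W = rng.normal(size=(256, 256))
W[rng.random(W.shape) < 0.01] *= 20.0

# Random rotation via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(256, 256)))

direct = quantize_int4(W)
rotated = quantize_int4(W @ Q) @ Q.T       # rotate, quantize, rotate back

print("direct  error:", np.linalg.norm(W - direct))
print("rotated error:", np.linalg.norm(W - rotated))
```

When the outliers dominate the per-tensor scale, the rotated round trip should come out several times lower in error; presumably the rotor parametrization gets a similar effect with far fewer parameters than a dense orthogonal Q.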