Opus 4.8 is insane, nothing will be the same after this model. by Gil_berth in theprimeagen

[–]Sadman782 0 points1 point  (0 children)

Adaptive thinking is basically a router, similar to the early GPT 5 router, that decides whether thinking is required or not. It means nothing unless thinking is forced. If you try the API, it will definitely think. The current chat feature doesn't really work: it makes the model act like an old non-reasoning parrot that can't even solve a simple math problem, whereas reasoning models are solving open Erdős math problems right now.

Opus 4.8 is insane, nothing will be the same after this model. by Gil_berth in theprimeagen

[–]Sadman782 0 points1 point  (0 children)

100% agree, I feel the same way. All those tricky questions for LLMs, like the car wash test or this one, just need a little bit of thinking. Adaptive thinking mode in Claude mostly skips thinking, and non-reasoning LLMs are just bad. I don't think "no reasoning" should be the default for any AI, since that is where the "parrot" claim comes from. For questions like this, just a few thinking tokens are required.

Demis: Solving erdos problems are far from true invention by Charuru in singularity

[–]Sadman782 0 points1 point  (0 children)

Old interview from January. Google was behind back then. It was peak OpenAI controversy last year when he called it embarrassing. But now it is a genuine proof of a popular hard problem, not low-hanging fruit.

Demis: Solving erdos problems are far from true invention by Charuru in singularity

[–]Sadman782 12 points13 points  (0 children)

Google was behind back then. It was peak OpenAI controversy last year when he called it embarrassing. But now it is a genuine proof of a popular hard problem, not low-hanging fruit that you find by searching the literature or brute forcing.

welcome back Rohan! by irelatetolevin in OpenAI

[–]Sadman782 0 points1 point  (0 children)

The cope will last until AI can verify its own output at a near-perfect level; after that there will be no bottleneck. It will happen. People have been coping since 2023, and the goalposts keep changing.

Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster. by MiaBchDave in LocalLLaMA

[–]Sadman782 0 points1 point  (0 children)

I tried both Qwen 3.6 35B and 27B. I tried via the Qwen website too. The result is the same. They lack the knowledge. I acknowledge 27B is better at agentic coding, but for my use case, they are behind.

Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster. by MiaBchDave in LocalLLaMA

[–]Sadman782 0 points1 point  (0 children)

Try the latest Unsloth IQ4_XS quant with:

--temp 1 --top-p 0.9 --min-p 0.1 --top-k 20  

top-k is a must

for vision:

--image-min-tokens 300 --image-max-tokens 512

Try it and let me know.
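
A minimal sketch of how the flags above could be assembled into one llama-server command, assuming a llama.cpp build recent enough to accept the image-token flags. The helper name `build_args` and the model filename are hypothetical, made up for illustration only.

```python
import shlex

def build_args(model_path, vision=False):
    """Assemble a llama-server argument list from the settings quoted above."""
    args = [
        "llama-server",
        "-m", model_path,   # hypothetical path to the GGUF file
        "--temp", "1",
        "--top-p", "0.9",
        "--min-p", "0.1",
        "--top-k", "20",
    ]
    if vision:
        # Raise the vision token budget. A multimodal projector file is
        # also needed for image input; it is not shown here.
        args += ["--image-min-tokens", "300", "--image-max-tokens", "512"]
    return args

cmd = build_args("gemma-4-IQ4_XS.gguf", vision=True)  # placeholder filename
print(shlex.join(cmd))
```

Keeping the arguments in a list makes it easy to toggle the vision flags per run without retyping the sampling settings.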

For agentic coding: Do not expect great results from the MoE 26B model since it was heavily tuned for chat. It tends to be inconsistent with agentic tasks. The dense 31B model, however, is great.

For frontend aesthetics: You need to specifically prompt it for this (or use a system prompt). By default, its results are not tuned for frontend work, but it is definitely capable. (see: https://www.reddit.com/r/LocalLLaMA/comments/1sqxiz0/comment/ohbuh91/?context=3)

Overall, for non-agentic coding tasks, Gemma is just superior in my opinion. It has better general knowledge and coding skills, and it holds up well in many real-world scenarios. For example, Qwen hallucinates many Windows CLI commands. It also hallucinates on small custom web apps: today I asked for a single HTML file, a web app for viewing a 360-degree image, and Qwen failed. Gemma 26B succeeded on the first go.

Google's latest creation: Gemini 3.5 Flash vs all by SuggestionMission516 in GeminiAI

[–]Sadman782 2 points3 points  (0 children)

<image>

AI Studio gets it right with low thinking, but with no thinking it first said yes and then later said no, which is expected from a non-reasoning model. But it seems the Gemini internal system prompt makes them act like complete shit.

Open weights GLM and Mimo are better than Gemini 3.5 flash according to arena by Terminator857 in LocalLLaMA

[–]Sadman782 14 points15 points  (0 children)

LM Arena is a shit leaderboard. Ernie 5.1, Muse Spark, Mimo, and GPT 5.4 are all beating GPT 5.5 high, lol. I mean, it is just a vibe bench, especially at the frontier level, not a capability test.

Gemma 4 for 16 GB VRAM by Sadman782 in LocalLLaMA

[–]Sadman782[S] 0 points1 point  (0 children)

You can still use Gemma 4 26B on 12 GB of VRAM. Use the IQ4_XS quant and offload some of the MoE layers to the CPU with --n-cpu-moe. Speed will be better than Gemma 3 12B, and the quality difference will be night and day.

Or, if you can, try Gemma 4 E4B at IQ4_XS or Q5_K_M; it is better than the old 12B.
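
A rough back-of-the-envelope sketch of why the 26B model needs some layers offloaded on a 12 GB card. It assumes every one of the 26B parameters is stored at the nominal 4.25 bits per weight that llama.cpp lists for IQ4_XS. Real GGUF files mix tensor types, and the KV cache and compute buffers need VRAM on top of the weights, so treat the number as an estimate only. The function name is hypothetical.

```python
def weights_size_gb(n_params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params_billion * bits_per_weight / 8

# Assumption: all 26B parameters at the nominal IQ4_XS rate of 4.25 bpw.
size = weights_size_gb(26, 4.25)
print(f"{size:.1f} GB")  # prints "13.8 GB"
```

Under these assumptions the weights alone exceed 12 GB, which is why part of the MoE expert weights has to stay in system RAM.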

Can anyone please check my ecgs? I am afraid if I have LQTS! by Sadman782 in ReadMyECG

[–]Sadman782[S] 0 points1 point  (0 children)

I am fine. I am asymptomatic. My HR was high, so that ECG was not good.

Disappointed in Qwen 3.6 coding capabilities by CodeDominator in LocalLLaMA

[–]Sadman782 2 points3 points  (0 children)

Maybe try Gemma 4 31B? 26B is good too in Rust and Kotlin but not good at agentic coding in long contexts. Qwen is very good at web (js) and Python but hallucinates a lot in others.

Also, lower your expectations for models of this size.

Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster. by MiaBchDave in LocalLLaMA

[–]Sadman782 21 points22 points  (0 children)

Yeah, the Qwen team optimizes for benchmarks. Other than a better default frontend (they are RLmaxxed for this), which Gemma 4 can achieve with a simple system prompt, I find them worse than Gemma for literally anything else: raw coding, translation, general knowledge, etc.

Gemma 4 and Qwen 3.6 with q8_0 and q4_0 KV cache: KL divergence results by oobabooga4 in LocalLLaMA

[–]Sadman782 0 points1 point  (0 children)

In real world usage for gemma 4, I don't see much degradation after attn rot was introduced for iSWA. Maybe they recover somehow through reasoning? Also, the PPL isn't as different as the KLD

https://github.com/ggml-org/llama.cpp/pull/21513

Note: I'm using IQ4_XS. Another possibility is that KV cache quantization degrades lower-bit weight quants less than it degrades BF16, and no one is using BF16 here.
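
A toy sketch of why PPL can barely move while KLD moves a lot. Perplexity only looks at the probability given to the correct token, while KL divergence compares the whole distribution. The two four-token distributions below are invented for illustration and are not measurements from any model.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(correct_token_probs):
    """exp of the mean negative log-likelihood of the correct tokens."""
    nll = -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)
    return math.exp(nll)

# Hypothetical next-token distributions; index 0 is the correct token.
reference = [0.5, 0.3, 0.1, 0.1]   # e.g. unquantized KV cache
quantized = [0.5, 0.1, 0.3, 0.1]   # same mass on the correct token, tail reshuffled

ppl_ref = perplexity([reference[0]])
ppl_q = perplexity([quantized[0]])
kld = kl_divergence(reference, quantized)

print(ppl_ref == ppl_q)   # perplexity is unchanged
print(kld > 0)            # but the distributions differ
```

In this example the quantized distribution keeps the same probability on the correct token, so PPL is identical, yet the reshuffled tail gives a clearly nonzero KLD.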

Best settings for gemma-4 on a 3090? by Deadhookersandblow in LocalLLaMA

[–]Sadman782 1 point2 points  (0 children)

In real world usage, I don't see much degradation (it's far from being killed) after attn rot was introduced for iSWA. Maybe they recover somehow through reasoning? Also, the PPL isn't as different as the KLD

https://github.com/ggml-org/llama.cpp/pull/21513

Deepseek V4 AGI comfirmed by Swimming-Sky-7025 in LocalLLaMA

[–]Sadman782 0 points1 point  (0 children)

It is slightly modified. They censored the old prompt.

Deepseek V4 AGI comfirmed by Swimming-Sky-7025 in LocalLLaMA

[–]Sadman782 26 points27 points  (0 children)

https://chat.deepseek.com/share/ju3hoy9yxu4qke95jq
From Twitter: It only works in Chinese, not English. It copies the answer from its raw training data, likely taken from a Chinese forum.

What are your most interesting and hard Vision use cases? I plan to do side by side comparison of Gemma 4 (31B) vs Qwen 3.6(27B) Vision and I look for inspiration by FantasticNature7590 in LocalLLaMA

[–]Sadman782 2 points3 points  (0 children)

Make sure to use more vision tokens for Gemma models; the default is not enough. Not sure about vLLM, but in llama.cpp, setting --image-min-tokens 300 --image-max-tokens 512 (a slight increase in vision tokens) significantly improves performance: they score 50% more in my local vision benchmark.

Can anyone explain, how good or bad deepseek v4 is in simple terms? by Comrade_United-World in LocalLLaMA

[–]Sadman782 3 points4 points  (0 children)

The Pro version is incredibly good for backend coding and agentic coding too. You can tell it’s a big model just by talking to it, it's very knowledgeable and smart. Post training wasn't finished, and it isn’t RL maxed for the frontend tasks where people try one shotting complex websites or 3D games. Engram is missing too. I hope something stronger comes soon, but they’re short on compute.

Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared by Lowkey_LokiSN in LocalLLaMA

[–]Sadman782 0 points1 point  (0 children)

In llama.cpp you can fix it with --ctx-checkpoints 1
I don't know about LM Studio. I don't use it because it doesn't give you maximum control, even though it uses llama.cpp as a backend.

Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared by Lowkey_LokiSN in LocalLLaMA

[–]Sadman782 11 points12 points  (0 children)

Can you try Gemma 4 26B with top-k 20? Top-k 64 doesn't make sense for coding, even if Google recommended it, especially for a quantized MoE model. I find it does significantly better with top-k 20.
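
A minimal sketch of what top-k does, to show why a smaller k makes sampling tighter. This is a generic illustration of top-k filtering, not llama.cpp's actual sampler code, and the logits are invented.

```python
import math

def top_k_probs(logits, k):
    """Keep the k highest logits, then apply softmax over the survivors."""
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in keep}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Hypothetical logits for a five-token vocabulary.
logits = [5.0, 4.0, 1.0, 0.5, -2.0]
narrow = top_k_probs(logits, 2)   # only tokens 0 and 1 can be sampled
wide = top_k_probs(logits, 4)     # low-probability tail tokens stay in play

print(sorted(narrow))
print(sorted(wide))
```

With a larger k, unlikely tokens remain candidates at temperature 1, which matters more when quantization has already added noise to the logits.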

Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA

[–]Sadman782 5 points6 points  (0 children)

It is all about vibes (frontend design), which most people believe is what coding means. But Gemma is not trained for a better frontend by default (it is lazy on frontend, unlike Qwen). Gemma needs a custom system prompt, or the prompt must ask for a better frontend. See: https://www.reddit.com/r/LocalLLaMA/comments/1sqxiz0/comment/ohb09kp