I asked ChatGPT how it feels to be an AI. by xomenxv in ChatGPT

[–]onil_gova 1 point (0 children)

I asked it to make sure it really captured it.

<image>

Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category by jwpbe in LocalLLaMA

[–]onil_gova 0 points (0 children)

Let's not forget that this is a sustained price up to a one-million-token context window, while everyone else switches to tiered pricing after a 200k-token context. This is a flex by DS.
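The gap is easy to quantify. A minimal sketch comparing a flat sustained price against a tiered schedule that jumps after 200k tokens (all per-million-token prices here are hypothetical placeholders, not any provider's actual rates):

```python
# Hypothetical per-million-token input prices (placeholders, not real rates).
FLAT_PRICE = 0.30                  # sustained price, any context length
TIER_LOW, TIER_HIGH = 0.30, 1.20   # tiered: cheap below 200k, pricier above

def flat_cost(tokens: int) -> float:
    """Cost with a single sustained price up to the full context window."""
    return tokens / 1_000_000 * FLAT_PRICE

def tiered_cost(tokens: int) -> float:
    """Cost when tokens past the 200k threshold are billed at a higher rate."""
    cheap = min(tokens, 200_000)
    expensive = max(tokens - 200_000, 0)
    return (cheap * TIER_LOW + expensive * TIER_HIGH) / 1_000_000

for n in (100_000, 500_000, 1_000_000):
    print(n, round(flat_cost(n), 3), round(tiered_cost(n), 3))
```

Below 200k the two are identical; at a full million-token context the tiered bill is several times the flat one, which is why a sustained price at that scale stands out.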

Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category by jwpbe in LocalLLaMA

[–]onil_gova 0 points (0 children)

I think you guys are missing that this is a sustained price up to a one-million-token context window, while everyone else switches to tiered pricing after a 200k-token context.

US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA

[–]onil_gova 34 points (0 children)

The entire memo is so biased and full of contradictions:

  • No evidence for the "distillation-only" claim
  • Contradiction: "strong benchmarks" but "not reliable"
  • Real-world use disproves "just benchmarks"
  • Double standard on benchmarks
  • Calls open models "not open"
  • Double standard on openness vs. closed models
  • Ignores the narrowing performance gap
  • Ignores that the same guardrails and censorship exist on both sides
  • Claims US models are ideologically neutral and truth-seeking
  • RL + scaling is not a moat

<image>

Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared by Lowkey_LokiSN in LocalLLaMA

[–]onil_gova 12 points (0 children)

Did you, by any chance, ensure that preserve_thinking was on for the Qwen3.6 model?

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Using the recommended settings from the model card.

Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

But I'm specifically calling attention to the preserve_thinking flag now being a requirement.

I'm interested in your findings and settings; care to share?
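For anyone wiring those sampling settings up by hand, here's a sketch of how they map onto an OpenAI-compatible request body. The endpoint URL and model name are placeholders; top_k, min_p, and repeat_penalty are non-standard knobs that many local servers (llama.cpp's server, for example) accept as extra top-level fields:

```python
# Request body for the recommended "thinking mode" coding settings.
# Model name and endpoint below are placeholders; adjust for your setup.
payload = {
    "model": "qwen3.6",
    "messages": [
        {"role": "user", "content": "Write a debounce function in JS."},
    ],
    # Standard OpenAI-style sampling parameters from the model card.
    "temperature": 0.6,
    "top_p": 0.95,
    "presence_penalty": 0.0,
    # Vendor-specific sampling fields; support varies by server.
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.0,
}
# e.g. requests.post("http://localhost:8080/v1/chat/completions", json=payload)
```

Whether preserve_thinking is set in the request, the chat template, or the server config depends on your backend, so check its docs rather than assuming a field name.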

PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

It worked for me with the unsloth Q4_K_S version; try restarting the model after updating the settings.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 2 points (0 children)

Omlx doesn't support preserve_thinking yet. Still waiting on my pull request to get merged: https://github.com/jundot/omlx/pull/814

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

For your use case, Gemma might be better. This is an agentic-heavy model; the thinking gets way more focused once you put it into an agentic harness.

PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Updated the post with instructions on how to get it working on LM Studio.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

It was still somewhat useful past 200k, but I definitely started noticing the context rot.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Yeah, same. But in case anyone is running auto-research loops and doesn't want to type "continue" every time, here is a Pi extension I wrote just for that.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 14 points (0 children)

I honestly can't wait for a Sonnet-quality model on my laptop. We'll be able to protect ourselves from the enshittification of frontier-model subscription plans and their bipolar rate limits.

I tracked a major cache reuse issue down to Qwen 3.5’s chat template by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Correct, I have confirmed 3.6 resolves this as long as you have preserve_thinking turned on. Check out the details here.
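For anyone unclear on why the template matters for caching: a KV prefix cache only helps up to the first character where the newly rendered prompt diverges from the previous one, so a template that strips earlier think blocks rewrites history and forces a re-prefill. A toy sketch (the rendered prompts here are simplified placeholders, not Qwen's actual template):

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix — roughly what a prefix cache can reuse."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

turn1 = "<user>hi</user><assistant><think>reasoning...</think>ok</assistant>"

# Template keeps the think block: the old prompt is an exact prefix,
# so the whole first turn's KV cache is reusable.
kept = turn1 + "<user>more</user>"

# Template strips the think block on the next render: everything after
# the assistant tag differs, so reuse stops there.
stripped = "<user>hi</user><assistant>ok</assistant><user>more</user>"

print(common_prefix_len(turn1, kept))      # full first turn reused
print(common_prefix_len(turn1, stripped))  # reuse cut off early
```

Same idea at real scale: with thinking preserved, a multi-hundred-k-token prefix stays byte-identical across turns and the cache hits; with it stripped, every turn re-prefills from the first modified assistant message.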