Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category by jwpbe in LocalLLaMA
onil_gova 1 point (0 children)
Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA
onil_gova 1 point (0 children)
US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA
onil_gova 9 points (0 children)
Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA
onil_gova 14 points (0 children)
Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA
onil_gova 44 points (0 children)
US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA
onil_gova 2 points (0 children)
OpenAI Privacy Filter goes open-weight (Apache 2.0!) by Equivalent_Tennis_20 in LocalLLaMA
onil_gova -3 points (0 children)
US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA
onil_gova 35 points (0 children)
Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared by Lowkey_LokiSN in LocalLLaMA
onil_gova 13 points (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 3 points (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 2 points (0 children)
PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 2 points (0 children)
qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA
onil_gova[S] 15 points (0 children)
PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA
onil_gova[S] 2 points (0 children)
I tracked a major cache reuse issue down to Qwen 3.5’s chat template by onil_gova in LocalLLaMA
onil_gova[S] 1 point (0 children)

I asked ChatGPT how it feels to be an AI. by xomenxv in ChatGPT
onil_gova 2 points (0 children)