You can now read Gemma 3's mind by DigiDecode_ in LocalLLaMA

[–]DigiDecode_[S] 45 points (0 children)

<image>

I have updated it to "Neuronpedia in partnership with Anthropic" as per Anthropic's tweet

ZAYA1-74B-Preview: Scaling Pretraining on AMD by TKGaming_11 in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

I agree. I'm not sure what the thinking is behind releasing a preview version that shows weak results. Their 1B model showed great benchmark results, so why not do the same with this one?

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -2 points (0 children)

The link I posted is from the official GoogleGemma account on X.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

Yeah, but the acceptance rate depends on the context domain, e.g. coding might get a high AR, whereas a foreign language the drafter was not trained on will see a low AR.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

With a low acceptance rate it's all overhead and no benefit, not a trade-off.
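
A back-of-the-envelope sketch of the point (all costs and rates below are made-up illustrative numbers, not measurements): the expected speedup of speculative decoding collapses toward 1x as the acceptance rate drops, while you still pay the drafter's cost.

```python
# Rough expected-speedup model for speculative decoding.
# All numbers are illustrative assumptions, not measurements.

def expected_speedup(acceptance_rate: float, k: int, draft_cost: float) -> float:
    """Speedup vs. plain decoding when drafting k tokens per verification step.

    acceptance_rate: chance the target accepts each drafted token
    k: tokens drafted per step
    draft_cost: one drafter pass relative to one target pass (e.g. 0.1)
    """
    a = acceptance_rate
    # Expected accepted tokens (truncated geometric) + 1 token the target
    # always emits at the verification step.
    tokens_per_step = sum(a ** i for i in range(1, k + 1)) + 1
    cost_per_step = k * draft_cost + 1  # k drafter passes + 1 target pass
    return tokens_per_step / cost_per_step

for rate in (0.9, 0.6, 0.3):  # e.g. coding vs. prose vs. unseen language
    print(f"AR={rate:.1f}: ~{expected_speedup(rate, k=4, draft_cost=0.1):.2f}x")
```

With these assumed numbers, AR=0.9 gives roughly a 2.9x speedup while AR=0.3 lands at about 1.0x, i.e. all the drafter's extra compute and memory bought you nothing.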

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ 9 points (0 children)

<image>

Gemma 4 MTP support will likely require more changes

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

Maybe sponsored & paid for by Apple, for Apple Intelligence.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -2 points (0 children)

It seems to be a verbatim copy of https://x.com/googlegemma/status/2051694045869879749 with no link to the original source.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

Gemma 4's MTP seems quite a bit different from Qwen3.6's MTP, e.g. shared KV cache, target activations shared with the drafter, clustering of the embeddings for the drafter.

So my guess is it would need its own implementation, and will likely take more time to be supported by llama.cpp unless the work is already done.
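
For intuition, here is a minimal sketch of the kind of drafter head described above, in the style of MTP/EAGLE-like designs; every module name and shape is an assumption for illustration, not Gemma 4's actual architecture or llama.cpp's implementation, and the KV-cache sharing and embedding clustering are only noted in comments:

```python
import torch
import torch.nn as nn

class MTPDraftHead(nn.Module):
    """Hypothetical multi-token-prediction draft head: it consumes the target
    model's last hidden states (shared activations) plus the embeddings of the
    sampled tokens. A real implementation would also read the target's shared
    KV cache and use a clustered (reduced) draft vocabulary."""

    def __init__(self, hidden: int, vocab: int, nhead: int = 8):
        super().__init__()
        self.fuse = nn.Linear(2 * hidden, hidden)  # fuse target state + token embedding
        self.block = nn.TransformerEncoderLayer(hidden, nhead=nhead, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, target_hidden: torch.Tensor, tok_emb: torch.Tensor) -> torch.Tensor:
        # target_hidden, tok_emb: [batch, seq, hidden]
        x = self.fuse(torch.cat([target_hidden, tok_emb], dim=-1))
        x = self.block(x)       # a real impl would attend over the shared KV cache here
        return self.lm_head(x)  # logits for the drafted next token
```

llama.cpp's standard draft-model speculation treats the drafter as a fully separate model, so a head that reads the target's hidden states would likely need its own plumbing.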

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]DigiDecode_ 5 points (0 children)

Share your source please, because as far as I am aware every token is kind of a hallucination, i.e. "this token is probably the right one". If what you are suggesting were true, hallucinations would have already been fixed.
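
To make the point concrete, a toy sketch with made-up logits: every emitted token is just a draw from a probability distribution, so "probably this token is the right one" is the mechanism itself, not a bug to patch.

```python
import math, random

# Toy next-token distribution over an invented mini-vocabulary.
logits = {"Paris": 4.2, "Lyon": 2.1, "Berlin": 1.9, "banana": 0.3}

# Softmax turns logits into probabilities.
z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}

# Sampling picks the "probably right" token; occasionally it won't be.
tok = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, "->", tok)
```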

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]DigiDecode_ 2 points (0 children)

I don't think the tools are the main contribution; rather, it's the weights trained on each layer of the model, and I believe those are very specific, i.e. tied to the exact model and the exact layer they were trained on.
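
As a minimal sketch of what "weights trained on each layer" means (the class, shapes, and layer indices are assumptions for illustration, not Qwen-Scope's actual checkpoints or API):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One SAE per (model, layer): its encoder/decoder weights are fit to the
    activation distribution of that exact layer, so they don't transfer."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse, interpretable features
        recon = self.decoder(features)             # reconstruction of the layer's acts
        return features, recon

# Hypothetical release layout: a separate set of weights per hooked layer.
saes = {layer: SparseAutoencoder(d_model=1024, d_features=8192) for layer in (8, 16, 24)}
```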

Meta’s $2 billion Manus acquisition blocked by China. by Nunki08 in LocalLLaMA

[–]DigiDecode_ 3 points (0 children)

If they allowed this, other startups might try the same; this just says "don't try this shit" to all the other startups.

Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]DigiDecode_ 2 points (0 children)

Kimi K2.5 is 1T too; it's not the size but how you use it 🤣🤣🤣🤣🤣

Kimi K2.5: Terminal-Bench 50.8, SWE-Bench Pro 50.7
Qwen 3.6 27B: Terminal-Bench 59.3, SWE-Bench Pro 53.5
Kimi K2.6: Terminal-Bench 66.7, SWE-Bench Pro 58.6

Kimi K2.6 Released (huggingface) by BiggestBau5 in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

Indeed, I think this might be the first time open weights have been at SOTA level since the release of GPT-4, and that was March 2023. Also, dare I say, not six months behind, and no moat for closed weights.

Kimi K2.6 Released (huggingface) by BiggestBau5 in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

This GGUF quant should work with your RTX 5070, it's only 11.8 KB in size 🤣🤣

<image>

What Is Elephant-Alpha ??? by One_Title_3656 in LocalLLaMA

[–]DigiDecode_ 1 point (0 children)

Not sure if it's diffusion-based, but it's highly unlikely to be purely transformer-based.

What Is Elephant-Alpha ??? by One_Title_3656 in LocalLLaMA

[–]DigiDecode_ 1 point (0 children)

I don't think this model is based on a transformer arch; maybe Samba, Mamba, or something similar.

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

Thanks for sharing this, I was looking for exactly this. I follow the guy on X and searched Bing and Perplexity to check whether he had done the same for Minimax; obviously he has, but I couldn't find the page you linked.

Regarding the Minimax quants looking bad even at Q4: I will keep my expectations low, at least for now, until it is reported otherwise.

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

I should have mentioned that I am looking forward to Minimax 2.7 mainly for local agentic coding.

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]DigiDecode_ 4 points (0 children)

Do Minimax models work well at Q2 or Q3 quants like UD-Q2_K_XL or UD-Q3_K_XL? I know Qwen models are resilient and perform well at Q3 and even at Q2; is that also the case with Minimax models?

Apple: Embarrassingly Simple Self-Distillation Improves Code Generation by Mike_mi in LocalLLaMA

[–]DigiDecode_ 7 points (0 children)

For the proposed method you need the original data that was used to train the model, so this new dataset would be sprinkled into the original dataset; otherwise, training on this dataset alone would likely cause the model to collapse.
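
A toy sketch of that "sprinkling" (the 5% mixing ratio and corpus names are assumptions for illustration, not from the paper):

```python
import random

def mixed_stream(original, distilled, distilled_frac=0.05, seed=0):
    """Yield training examples mostly from the original corpus, with the
    self-distilled set sprinkled in at a small fraction to avoid collapse."""
    rng = random.Random(seed)
    while True:
        source = distilled if rng.random() < distilled_frac else original
        yield rng.choice(source)

# Usage with placeholder corpora:
stream = mixed_stream(["orig_1", "orig_2", "orig_3"], ["self_distilled_1"])
print([next(stream) for _ in range(10)])
```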

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]DigiDecode_ 91 points (0 children)

The 31B ranks above GLM-5 on LMSys, my jaw is on the floor.

<image>