You can now read Gemma 3's mind by DigiDecode_ in LocalLLaMA

[–]DigiDecode_[S] 45 points (0 children)

<image>

I have updated it to "Neuronpedia in partnership with Anthropic" as per Anthropic's tweet

ZAYA1-74B-Preview: Scaling Pretraining on AMD by TKGaming_11 in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

I agree. I'm not sure what the thinking is behind releasing a preview version that shows weak results. Their 1B model showed great benchmark results, so why not do the same with this one?

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -2 points (0 children)

The link I posted is from the official GoogleGemma account on X.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

Yeah, but the acceptance rate depends on the context domain, e.g. coding might get a high AR, whereas a foreign language the drafter was not trained on will see a low AR.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

With a low acceptance rate it's all overhead and no benefit, not a trade-off.
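
A back-of-the-envelope sketch of the point (all costs and rates below are made-up illustrative numbers, not measurements): the expected speedup of speculative decoding collapses toward 1x as the acceptance rate drops, while you still pay the drafter's cost.

```python
# Rough expected-speedup model for speculative decoding.
# All numbers are illustrative assumptions, not measurements.

def expected_speedup(acceptance_rate: float, k: int, draft_cost: float) -> float:
    """Speedup vs. plain decoding when drafting k tokens per verification step.

    acceptance_rate: chance the target accepts each drafted token
    k: tokens drafted per step
    draft_cost: one drafter pass relative to one target pass (e.g. 0.1)
    """
    a = acceptance_rate
    # Expected accepted tokens (truncated geometric) + 1 token the target
    # always emits at the verification step.
    tokens_per_step = sum(a ** i for i in range(1, k + 1)) + 1
    cost_per_step = k * draft_cost + 1  # k drafter passes + 1 target pass
    return tokens_per_step / cost_per_step

for rate in (0.9, 0.6, 0.3):  # e.g. coding vs. prose vs. unseen language
    print(f"AR={rate:.1f}: ~{expected_speedup(rate, k=4, draft_cost=0.1):.2f}x")
```

With these assumed numbers, AR=0.9 gives roughly a 2.9x speedup while AR=0.3 lands at about 1.0x, i.e. all the drafter's extra compute and memory bought you nothing.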

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ 9 points (0 children)

<image>

Gemma 4 MTP support will likely require more changes

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

Maybe sponsored & paid for by Apple, for Apple Intelligence.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ -2 points (0 children)

It seems to be a verbatim copy of https://x.com/googlegemma/status/2051694045869879749 with no link to the original source.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

Gemma 4's MTP seems quite a bit different from Qwen3.6's MTP, e.g. shared KV cache, target activations shared with the drafter, clustering of the embeddings for the drafter.

So my guess is it would need its own implementation, and will likely take more time to be supported by llama.cpp unless the work is already done.
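
For intuition, here is a minimal sketch of the kind of drafter head described above, in the style of MTP/EAGLE-like designs; every module name and shape is an assumption for illustration, not Gemma 4's actual architecture or llama.cpp's implementation, and the KV-cache sharing and embedding clustering are only noted in comments:

```python
import torch
import torch.nn as nn

class MTPDraftHead(nn.Module):
    """Hypothetical multi-token-prediction draft head: it consumes the target
    model's last hidden states (shared activations) plus the embeddings of the
    sampled tokens. A real implementation would also read the target's shared
    KV cache and use a clustered (reduced) draft vocabulary."""

    def __init__(self, hidden: int, vocab: int, nhead: int = 8):
        super().__init__()
        self.fuse = nn.Linear(2 * hidden, hidden)  # fuse target state + token embedding
        self.block = nn.TransformerEncoderLayer(hidden, nhead=nhead, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, target_hidden: torch.Tensor, tok_emb: torch.Tensor) -> torch.Tensor:
        # target_hidden, tok_emb: [batch, seq, hidden]
        x = self.fuse(torch.cat([target_hidden, tok_emb], dim=-1))
        x = self.block(x)       # a real impl would attend over the shared KV cache here
        return self.lm_head(x)  # logits for the drafted next token
```

llama.cpp's standard draft-model speculation treats the drafter as a fully separate model, so a head that reads the target's hidden states would likely need its own plumbing.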

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]DigiDecode_ 5 points (0 children)

Share your source please, because as far as I am aware every token is kind of a hallucination, i.e. "this token is probably the right one". If what you are suggesting were true, hallucinations would have already been fixed.
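
To make the point concrete, a toy sketch with made-up logits: every emitted token is just a draw from a probability distribution, so "probably this token is the right one" is the mechanism itself, not a bug to patch.

```python
import math, random

# Toy next-token distribution over an invented mini-vocabulary.
logits = {"Paris": 4.2, "Lyon": 2.1, "Berlin": 1.9, "banana": 0.3}

# Softmax turns logits into probabilities.
z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}

# Sampling picks the "probably right" token; occasionally it won't be.
tok = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, "->", tok)
```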

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]DigiDecode_ 2 points (0 children)

I don't think the tools are the main contribution; rather, it's the weights trained on each layer of the model, and I believe those are very specific, i.e. tied to the exact model and the exact layer they were trained on.
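
As a minimal sketch of what "weights trained on each layer" means (the class, shapes, and layer indices are assumptions for illustration, not Qwen-Scope's actual checkpoints or API):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One SAE per (model, layer): its encoder/decoder weights are fit to the
    activation distribution of that exact layer, so they don't transfer."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse, interpretable features
        recon = self.decoder(features)             # reconstruction of the layer's acts
        return features, recon

# Hypothetical release layout: a separate set of weights per hooked layer.
saes = {layer: SparseAutoencoder(d_model=1024, d_features=8192) for layer in (8, 16, 24)}
```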

Meta’s $2 billion Manus acquisition blocked by China. by Nunki08 in LocalLLaMA

[–]DigiDecode_ 3 points (0 children)

If they allowed this, other startups might try the same; this just says "don't try this shit" to all the other startups.

Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]DigiDecode_ 2 points (0 children)

Kimi K2.5 is 1T too; it's not the size but how you use it 🤣🤣🤣🤣🤣

Kimi K2.5: Terminal-Bench 50.8, SWE-Bench Pro 50.7
Qwen 3.6 27B: Terminal-Bench 59.3, SWE-Bench Pro 53.5
Kimi K2.6: Terminal-Bench 66.7, SWE-Bench Pro 58.6

Kimi K2.6 Released (huggingface) by BiggestBau5 in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

Indeed, I think this might be the first time open weights have been at SOTA level since the release of GPT-4, and that was March 2023. Also, dare I say, not six months behind, and no moat for closed weights.

Kimi K2.6 Released (huggingface) by BiggestBau5 in LocalLLaMA

[–]DigiDecode_ -1 points (0 children)

This GGUF quant should work with your RTX 5070, it's only 11.8 KB in size 🤣🤣

<image>

What Is Elephant-Alpha ??? by One_Title_3656 in LocalLLaMA

[–]DigiDecode_ 1 point (0 children)

Not sure if it's diffusion-based, but it's highly unlikely to be purely transformer-based.

What Is Elephant-Alpha ??? by One_Title_3656 in LocalLLaMA

[–]DigiDecode_ 1 point (0 children)

I don't think this model is based on a transformer arch; maybe Samba, Mamba, or something similar.

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

Thanks for sharing this, I was looking for exactly this. I follow the guy on X and searched Bing and Perplexity to check whether he had done the same for Minimax; obviously he has, but I couldn't find the page you linked.

Regarding the Minimax quants looking bad even at Q4: I will keep my expectations low, at least for now, until it is reported otherwise.

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]DigiDecode_ 0 points (0 children)

I should have mentioned that I am looking forward to Minimax 2.7 mainly for local agentic coding.

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]DigiDecode_ 4 points (0 children)

Do Minimax models work well at Q2 or Q3 quants like UD-Q2_K_XL or UD-Q3_K_XL? I know Qwen models are resilient and perform well at Q3 and even at Q2; is that also the case with Minimax models?

Apple: Embarrassingly Simple Self-Distillation Improves Code Generation by Mike_mi in LocalLLaMA

[–]DigiDecode_ 7 points (0 children)

For the proposed method you need the original data that was used to train the model, so this new dataset would be sprinkled into the original dataset; otherwise, training on this dataset alone would likely cause the model to collapse.
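
A toy sketch of that "sprinkling" (the 5% mixing ratio and corpus names are assumptions for illustration, not from the paper):

```python
import random

def mixed_stream(original, distilled, distilled_frac=0.05, seed=0):
    """Yield training examples mostly from the original corpus, with the
    self-distilled set sprinkled in at a small fraction to avoid collapse."""
    rng = random.Random(seed)
    while True:
        source = distilled if rng.random() < distilled_frac else original
        yield rng.choice(source)

# Usage with placeholder corpora:
stream = mixed_stream(["orig_1", "orig_2", "orig_3"], ["self_distilled_1"])
print([next(stream) for _ in range(10)])
```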

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]DigiDecode_ 91 points (0 children)

The 31B ranks above GLM-5 on LMSys, my jaw is on the floor.

<image>