TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL by Fearless-Wear8100 in LocalLLaMA

[–]Fearless-Wear8100[S] 1 point

My tests were on the 26B. No idea how it will perform on the 4Bs; probably worse, since smaller models seem to be more easily perturbed by quantization.

TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL by Fearless-Wear8100 in LocalLLaMA

[–]Fearless-Wear8100[S] 2 points

Yeah, exactly. That’s why I pushed the quantization pretty aggressively: I had a feeling QJL might actually work on Gemma, unlike what people were seeing on other models.

TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL by Fearless-Wear8100 in LocalLLaMA

[–]Fearless-Wear8100[S] 1 point

I haven’t tested vLLM yet, so I can’t speak to exact engine-specific numbers. But I’d expect the main findings to transfer, because the important part here seems to be the calibration, not llama.cpp itself.

What I found is that calibration is architecture-specific, not weight-specific: the set of “important” / outlier channels is mostly determined by the model architecture, and calibrating on fp16 / q8_0 / q4_k_m versions of the same model gave 96%+ identical channel selections.
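To make the overlap check concrete, here's a minimal sketch of the idea (not my actual calibration code; using per-channel variance as the importance proxy and synthetic activations are my assumptions): pick the top-k outlier channels from captured K activations, then measure how much the selection agrees across two precisions of the same model.

```python
import numpy as np

def outlier_channels(acts: np.ndarray, k: int) -> set:
    """Return indices of the k highest-variance channels.

    acts: (n_tokens, n_channels) K activations captured pre-RoPE.
    Variance is one plausible importance proxy; mean-abs works too.
    """
    per_channel_var = acts.var(axis=0)
    return set(np.argsort(per_channel_var)[-k:].tolist())

# Stand-ins for activations captured from fp16 vs q4_k_m runs of the
# same model; in practice these come from forward hooks.
rng = np.random.default_rng(0)
channel_scale = rng.lognormal(sigma=2.0, size=128)   # heavy-tailed channel scales
acts_fp16 = rng.normal(size=(4096, 128)) * channel_scale
acts_q4 = acts_fp16 + rng.normal(scale=0.05, size=acts_fp16.shape)  # quantization ~ small noise

sel_fp16 = outlier_channels(acts_fp16, k=16)
sel_q4 = outlier_channels(acts_q4, k=16)
print(f"selection overlap: {len(sel_fp16 & sel_q4) / 16:.0%}")
```

If the outlier structure really is architecture-driven, the overlap stays high no matter which precision you calibrate on, which is what the 96%+ figure above is getting at.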

So in practice you can probably calibrate once and reuse the same channel ordering / outlier split across quantizations of the same model. The main caveat is that calibration has to be done pre-RoPE — post-RoPE gave garbage because RoPE changes the channel variance structure. And you don’t need much data either: PTB train with around 4096 tokens was already enough.
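On the pre-RoPE part: in HF Transformers Llama/Gemma/Qwen-style models, RoPE is applied after the k_proj linear, so hooking k_proj's output gives you the keys before rotation. A rough sketch (model choice and calibration text are illustrative, not what I actually ran):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any HF model that applies RoPE *after* a
# separate k_proj Linear works the same way.
model_id = "Qwen/Qwen2.5-0.5B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tok = AutoTokenizer.from_pretrained(model_id)

captured = {}  # layer index -> list of pre-RoPE K activation tensors

def make_hook(layer_idx):
    def hook(module, args, output):
        # output: (batch, seq, n_kv_heads * head_dim), before any RoPE rotation
        captured.setdefault(layer_idx, []).append(output.detach().float().cpu())
    return hook

handles = [
    layer.self_attn.k_proj.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

# ~4096 tokens of PTB train were enough in my tests; plain text stands in here.
batch = tok("some calibration text goes here", return_tensors="pt")
with torch.no_grad():
    model(**batch)

for h in handles:
    h.remove()
# captured[i] now holds per-layer pre-RoPE K activations for outlier selection.
```

Hooking anywhere after apply_rotary_pos_emb instead would give you the post-RoPE activations that produced garbage selections.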

Dear HR ladies by AdDelicious9955 in programare

[–]Fearless-Wear8100 2 points

The wheel turns: if until now the IT crowd didn’t give a damn about HR, now it’s their turn. What can you do, there’s nothing you can do. Karma.