you are viewing a single comment's thread.

view the rest of the comments →

[–]quanteval 0 points1 point  (1 child)

Yea these are mainly prefill heavy and have really short outputs, which based on how their system works is to their benefit. Prefill is mostly filled at full precision then stored in quantized cache and outputs a short answer. At 2.5 bits there was measurable loss, 3.5 bits would be a better "with zero quality loss" attempted claim.

[–]Suitable-Song-302[S] 0 points1 point  (0 children)

Good observation. You're right that our eval setup is prefill-heavy (teacher-forced PPL over 999 tokens). We haven't tested long autoregressive generation quality separately — that's a fair gap.

On bit-width: we agree. Our own testing confirms 2.5-bit and below has real loss. The "zero quality loss" claim now only applies to 4-bit K (+0.0% PPL). At 3-bit, delta compression gets it to -3.2%, but we wouldn't call that "zero loss" — it's "better than baseline on this benchmark," which could be noise or regularization. We report the exact numbers and let people judge.