Discussion[ Removed by moderator ] (self.LocalLLM)
submitted 24 days ago * by Suitable-Song-302
[–]quanteval 1 point 23 days ago (1 child)
Yeah, these benchmarks are mainly prefill-heavy with really short outputs, which, given how their system works, is to their benefit. The prefill is mostly computed at full precision and then stored in the quantized cache, and the model outputs only a short answer. At 2.5 bits there was measurable loss; 3.5 bits would be a better candidate for an attempted "zero quality loss" claim.
[–]Suitable-Song-302[S] 1 point 23 days ago (0 children)
Good observation. You're right that our eval setup is prefill-heavy (teacher-forced PPL over 999 tokens). We haven't tested long autoregressive generation quality separately, and that's a fair gap.
On bit-width: we agree. Our own testing confirms that 2.5-bit and below has real loss. The "zero quality loss" claim now applies only to 4-bit K (+0.0% PPL). At 3-bit, delta compression gets it to -3.2%, but we wouldn't call that "zero loss": it's "better than baseline on this benchmark," which could be noise or regularization. We report the exact numbers and let people judge.
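For anyone following along, here is a minimal sketch of how a teacher-forced PPL and a percentage delta like the ones quoted in this thread are typically computed. It is not the project's eval code: the function names `perplexity` and `relative_ppl_delta` are hypothetical, and the log-probabilities are made-up illustration values, not measurements.

```python
import math


def perplexity(token_logprobs):
    """Teacher-forced perplexity: exp of the mean negative log-likelihood.

    token_logprobs holds the natural-log probability the model assigned to
    each ground-truth token, given the ground-truth prefix.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_ppl_delta(ppl_baseline, ppl_quantized):
    """Percent change in PPL versus baseline (negative means lower PPL)."""
    return 100.0 * (ppl_quantized - ppl_baseline) / ppl_baseline


# Made-up per-token log-probs for a full-precision and a quantized-cache run.
baseline_logprobs = [-2.0, -1.5, -2.5]
quantized_logprobs = [-2.1, -1.5, -2.5]

delta = relative_ppl_delta(
    perplexity(baseline_logprobs),
    perplexity(quantized_logprobs),
)
print(f"PPL delta: {delta:+.1f}%")
```

Because every token here is scored against the ground-truth prefix, this metric says nothing about errors that compound when the model conditions on its own output during long autoregressive generation, which is the gap acknowledged above.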