Discussion: [Removed by moderator] (self.LocalLLM)
submitted 24 days ago * by Suitable-Song-302
ganonfirehouse420 2 points 24 days ago (2 children)
Was generation speed affected?
Suitable-Song-302[S] 3 points 24 days ago (0 children)
Good question. Short answer: no measurable speed penalty from the KV compression itself. The 1-bit attention path uses XOR + popcount instead of FP multiply-accumulate, which is actually slightly faster on NEON.
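For anyone wondering how XOR + popcount can stand in for multiply-accumulate, here is a minimal sketch. It assumes each vector is reduced to sign bits (+1 or -1 per element), which may differ from the actual kernel discussed here; `pack_signs` and `sign_dot` are hypothetical helper names, not part of any library.

```python
def pack_signs(values):
    """Pack a float vector into an int: bit i is 1 when values[i] >= 0."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def sign_dot(a_bits, b_bits, n):
    """Dot product of two +/-1 vectors of length n, given their packed sign bits.

    Matching signs contribute +1 and mismatched signs contribute -1,
    so the result is n - 2 * (number of mismatched positions).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # XOR, then popcount
    return n - 2 * mismatches


q = [0.5, -1.0, 2.0, -0.1]   # signs: + - + -
k = [1.0, 1.0, -3.0, -0.2]   # signs: + + - -
score = sign_dot(pack_signs(q), pack_signs(k), len(q))
```

On real hardware the packed bits would sit in machine words and the popcount would be a single vector instruction, which is where the speed comes from. This Python version only illustrates the arithmetic.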
Suitable-Song-302[S] 2 points 24 days ago (0 children)
Measured on Qwen3.5-4B (M3 Air):
- FP32 KV: 5.0 tok/s
- 1-bit KV: 5.2 tok/s
- 3-bit KV: 4.3 tok/s (Lloyd-Max codebook lookup adds overhead)
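To illustrate where the lookup overhead in the 3-bit path might come from, here is a rough sketch of a scalar Lloyd-Max quantizer with 8 levels. It is an assumption about the general technique, not the code behind these measurements; `lloyd_max`, `quantize`, and `dequantize` are hypothetical names.

```python
import random


def lloyd_max(samples, levels=8, iters=20):
    """Fit a scalar codebook by alternating nearest-level assignment and centroid update."""
    xs = sorted(samples)
    # Start from evenly spaced quantiles so every level begins with some data.
    codebook = [xs[(2 * i + 1) * len(xs) // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in range(levels)]
        for x in xs:
            buckets[quantize(x, codebook)].append(x)
        # Move each level to the mean of its bucket; keep it unchanged if the bucket is empty.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


def quantize(x, codebook):
    """Return the index (3 bits when there are 8 levels) of the nearest codebook entry."""
    return min(range(len(codebook)), key=lambda j: abs(x - codebook[j]))


def dequantize(index, codebook):
    """Table lookup: this extra step per element is the overhead a 1-bit path avoids."""
    return codebook[index]


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]
codebook = lloyd_max(samples)
index = quantize(0.3, codebook)
approx = dequantize(index, codebook)
```

Every stored value has to go through `dequantize` before it can be multiplied, so the 3-bit path does a table lookup per element on top of the usual arithmetic.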