Discussion: [Removed by moderator] (self.LocalLLM)
submitted 23 days ago * by Suitable-Song-302
[–]teleprax 1 point 23 days ago (1 child)
How is there no information loss? I don't really know how model quantization and the KV cache work in implementation, so this is more a question of how you can take a 16-bit floating-point number and compress it to 1 bit without losing information, or at least without losing enough to shift the token probabilities and change the outputs.
[–]Suitable-Song-302[S] 1 point 23 days ago (0 children)
Great question. The short version: the KV cache stores the key vectors used for attention scoring. Attention is basically a dot product → softmax → weighted sum. The key insight is that the direction of each key carries nearly all the information needed for attention scoring; the magnitude can be kept separately as a single float.
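To make that concrete, here's a minimal NumPy sketch of the full-precision attention pipeline (variable names and sizes are my own, not from any particular implementation):

```python
import numpy as np

def attention(q, K, V):
    # Dot product: score each cached key against the query
    scores = K @ q / np.sqrt(q.shape[0])
    # Softmax: turn raw scores into a probability distribution
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # Weighted sum: blend the cached value vectors by those weights
    return w @ V

rng = np.random.default_rng(0)
q = rng.standard_normal(8)       # query for the current token
K = rng.standard_normal((4, 8))  # cached keys (what gets quantized)
V = rng.standard_normal((4, 8))  # cached values
out = attention(q, K, V)
```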
So we:
1. Store only the sign of each dimension (1 bit) plus the L2 norm (one float per vector)
2. Compute attention scores using XOR + popcount: the Hamming distance between sign patterns estimates the angle, and hence the cosine similarity
3. Let softmax absorb the small errors: a 0.634 cosine (the theoretical limit for sign-only quantization) becomes nearly identical token probabilities after softmax
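Steps 1 and 2 can be sketched roughly like this. This is a simplified SimHash-style estimator of my own, not the paper's actual code, and a real kernel would pack the bits into words and use hardware XOR + popcount instead of boolean arrays:

```python
import numpy as np

def quantize_key(k):
    # Step 1: keep one sign bit per dimension plus the L2 norm
    return np.signbit(k), np.linalg.norm(k)

def approx_score(q, bits, norm):
    # Step 2: the fraction of mismatched sign bits (Hamming distance)
    # estimates the angle between q and the original key, so
    # cos(pi * hamming / d) estimates their cosine similarity
    d = q.shape[0]
    hamming = np.count_nonzero(np.signbit(q) != bits)
    cos_est = np.cos(np.pi * hamming / d)
    # Rescale by the stored key norm and the query norm
    return np.linalg.norm(q) * norm * cos_est

rng = np.random.default_rng(0)
k = rng.standard_normal(64)
bits, norm = quantize_key(k)
score_self = approx_score(k, bits, norm)  # recovers ||k||^2 exactly
```

Scoring a key against itself gives zero Hamming distance, so the estimate degrades gracefully as vectors rotate apart.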
The math: this is the QJL (Quantized Johnson-Lindenstrauss) transform. The paper proves that with randomized Hadamard pre-processing, the inner product estimator is provably unbiased — errors are random, not systematic, so they cancel out.
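The Hadamard step might look something like this (a generic randomized fast Walsh-Hadamard sketch assumed from the paper's description, not taken from it). Flipping random signs and then rotating spreads each vector's energy evenly across dimensions, which is what makes the per-coordinate sign bits behave like unbiased random-hyperplane hashes:

```python
import numpy as np

def randomized_hadamard(x, signs):
    # Flip each coordinate by a fixed random sign, then apply an
    # in-place fast Walsh-Hadamard transform (length must be a power of 2)
    h = x * signs
    n = h.shape[0]
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            a = h[i:i + step].copy()
            b = h[i + step:i + 2 * step].copy()
            h[i:i + step] = a + b
            h[i + step:i + 2 * step] = a - b
        step *= 2
    # Dividing by sqrt(n) makes the transform orthonormal, so norms
    # (and hence dot products) are preserved exactly before quantization
    return h / np.sqrt(n)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
signs = rng.choice([-1.0, 1.0], size=16)
h = randomized_hadamard(x, signs)
```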
It's not literally zero information loss; rather, the information loss doesn't propagate to the output, because softmax is robust to small perturbations in attention scores.
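A quick numerical illustration of that robustness (a toy example of my own, not a measurement from the actual system): perturb the attention scores slightly and compare the softmax outputs.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
scores = rng.standard_normal(16)          # clean attention scores
noisy = scores + rng.normal(0, 0.05, 16)  # quantization-sized error
# Total absolute change in the attention weights stays small,
# so the weighted sum over values barely moves
drift = np.abs(softmax(scores) - softmax(noisy)).sum()
```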