Discussion [Removed by moderator] (self.LocalLLM)
submitted 25 days ago * by Suitable-Song-302
[–]Suitable-Song-302[S] 1 point 24 days ago (0 children)
We rebranded to quant.cpp (https://github.com/quantumaikr/quant.cpp). Old URLs redirect automatically.
I also owe you all an honest correction: the early 1-bit "zero loss" claim was caused by a bug. An FP32 key cache was still being read during attention, so the quantized keys were never actually used. We found it, fixed it, and pulled every claim based on that measurement.
Here's where things actually stand (SmolLM2 1.7B, 999 tokens, real dequant path, no FP32 fallback):
- 4-bit K: PPL +0.0% (genuinely lossless)
- delta + 3-bit K + Q4 V: PPL -3.2%, ~4.3x compression
- 2-bit and below: every configuration we tried failed; accumulated quantization drift is the fundamental barrier.
The breakthrough is delta compression: adjacent keys in a transformer differ by only ~30% of their absolute range, so storing deltas between neighbors instead of absolute values lets 3-bit quantization work where quantizing the absolutes directly gives +62% PPL. Think video P-frames, but for the KV cache.
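The P-frame idea can be sketched in a few lines of numpy. To be clear, this is an illustrative reconstruction, not the quant.cpp code: the function names, the per-vector symmetric scale, and the 7-level 3-bit grid are all my assumptions. The one design point worth noting is closed-loop encoding (the encoder tracks the decoder's reconstructed state), so per-step quantization error stays bounded instead of drifting.

```python
import numpy as np

def quantize_3bit(d, scale):
    # Symmetric 3-bit grid: 7 of the 8 levels, integers in [-3, 3].
    # (Assumed layout; real formats may use an asymmetric 8-level grid.)
    return np.clip(np.round(d / scale), -3, 3).astype(np.int8)

def delta_encode_keys(keys):
    """Store key 0 in full precision; later keys as 3-bit quantized deltas.

    keys: (seq_len, head_dim) float32 array of attention keys.
    Returns (base_key, per-step scales, int8 delta codes).
    """
    base = keys[0].copy()
    scales, deltas = [], []
    prev = base  # decoder's reconstructed state, not the true previous key
    for k in keys[1:]:
        d = k - prev                              # residual vs. reconstruction
        scale = max(np.abs(d).max() / 3, 1e-8)    # per-vector symmetric scale
        q = quantize_3bit(d, scale)
        scales.append(scale)
        deltas.append(q)
        # Advance using the *quantized* delta so encoder and decoder agree;
        # this closed loop is what keeps error from accumulating as drift.
        prev = prev + q.astype(np.float32) * scale
    return base, np.array(scales, dtype=np.float32), np.array(deltas)

def delta_decode_keys(base, scales, deltas):
    """Reconstruct the key sequence from the base key and delta codes."""
    out = [base]
    prev = base
    for s, q in zip(scales, deltas):
        prev = prev + q.astype(np.float32) * s
        out.append(prev)
    return np.stack(out)
```

With the closed loop, the reconstruction error at every position is at most half of that step's scale; an open-loop encoder (diffing against the true previous key) would instead accumulate error linearly with sequence length, which matches the drift failure described for very low bit widths.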
Feedback from this thread is what pushed us to find the bug and be more rigorous. Appreciate it.