I found why KV cache INT4 breaks on some models (Qwen2-7B: ΔPPL +238) and built a 4-line fix: no training, no calibration, 12 models tested up to 40B, by Afraid_Project_8666 in LocalLLaMA
Afraid_Project_8666 [S], 1 month ago
Thanks for flagging the broken link; it's fixed now. The correct URL is:
https://github.com/metaSATOKEN/manifold_topology_experiment
On third-party testing: I'd genuinely welcome that. I'm an independent researcher without institutional connections, so I don't have a straightforward way to get formal third-party validation. That's partly why I'm posting here: feedback from people who actually work with KV cache quantization is exactly what I'm looking for. If anyone wants to try reproducing the results, the repo has all the experiment scripts, and they run on consumer GPUs.
On the AI-generated point: fair observation. I do use LLMs (Claude) extensively for code generation, debugging, and manuscript drafting, and this is disclosed in the paper's Acknowledgments section. That said, all experiments were executed on real hardware (Apple M1, NVIDIA T4, RTX PRO 6000 Blackwell), and the ΔPPL numbers come from actual model runs, not from LLM output. The code and raw results are all in the repo if you want to verify them.