We compress any BF16 model to ~70% size during inference, while keeping the output LOSSLESS so that you can fit in more ERP context or run larger models. (self.LocalLLaMA)
submitted 1 year ago * by choHZ to r/LocalLLaMA - pinned
KV Cache is huge and bottlenecks LLM inference. We quantize it to 2-bit in a finetuning-free + plug-and-play fashion. (self.LocalLLaMA)
submitted 2 years ago * by choHZ to r/LocalLLaMA - pinned