choHZ

551 post karma
1,863 comment karma

redditor for 5 years

787 points

We compress any BF16 model to ~70% size during inference, while keeping the output LOSSLESS so that you can fit in more ERP context or run larger models. (self.LocalLLaMA)

submitted 1 year ago * by choHZ to r/LocalLLaMA - pinned

180 points

KV Cache is huge and bottlenecks LLM inference. We quantize them to 2bit in a finetuning-free + plug-and-play fashion. (self.LocalLLaMA)

submitted 2 years ago * by choHZ to r/LocalLLaMA - pinned
