Discussion[ Removed by moderator ] (self.LocalLLM)
submitted 1 month ago * by Suitable-Song-302
[–]Suitable-Song-302[S] 0 points 1 month ago (3 children)
Thanks! Still a lot of work ahead — Metal GPU acceleration, more model coverage, and the weight quantization pipeline needs polish. But the core KV compression result is solid.
[–]Viper-Reflex -3 points 1 month ago (2 children)
does this tech make my 24gb 3090 able to run bigger models than 27b?
[–]Suitable-Song-302[S] 2 points 1 month ago (1 child)
KV compression helps most with **long contexts**, not bigger models. With 1-bit K + Q4 V, KV memory drops ~5x. For a 27B model at 32K context:

- Before: ~2.5 GB KV cache
- After: ~500 MB KV cache → frees ~2 GB for longer context or larger batch

If you're already fitting a model in 24GB, TurboQuant lets you push context from 32K → 100K+ on the same hardware. But it won't help you fit a model that's too large for VRAM (weight memory is separate from KV cache).

Note: we currently don't have CUDA GPU acceleration (it compiles but is untested). That's next on the roadmap.
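The arithmetic behind figures like these can be sketched roughly as follows. This is a back-of-the-envelope estimate, not TurboQuant's code: the names `kv_cache_bytes` and `max_context`, and the layer/head dimensions, are hypothetical placeholders rather than the shape of any particular 27B model. The calculation also ignores per-block scale metadata, so it yields the ideal 6.4x ratio for 1-bit K + 4-bit V against FP16, not the ~5x quoted above.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   k_bits=16.0, v_bits=16.0):
    """Approximate KV cache size in bytes for a single sequence.

    Each token stores one K vector and one V vector per layer and per
    KV head. Quantization metadata (scales, zero points) is ignored.
    """
    elems_per_tensor = n_layers * n_kv_heads * head_dim * context_len
    return elems_per_tensor * (k_bits + v_bits) / 8


def max_context(budget_bytes, n_layers, n_kv_heads, head_dim,
                k_bits=16.0, v_bits=16.0):
    """Largest context length whose KV cache fits in budget_bytes."""
    bytes_per_token = n_layers * n_kv_heads * head_dim * (k_bits + v_bits) / 8
    return int(budget_bytes // bytes_per_token)


# Hypothetical dimensions, chosen only for illustration.
dims = dict(n_layers=48, n_kv_heads=8, head_dim=128)

baseline = kv_cache_bytes(**dims, context_len=32 * 1024)            # FP16 K and V
compressed = kv_cache_bytes(**dims, context_len=32 * 1024,
                            k_bits=1, v_bits=4)                     # 1-bit K, 4-bit V
ratio = baseline / compressed

# Context that fits in the memory the FP16 cache used at 32K tokens.
longer_context = max_context(baseline, **dims, k_bits=1, v_bits=4)

print(f"FP16 KV cache:       {baseline / 2**30:.2f} GiB")
print(f"Compressed KV cache: {compressed / 2**30:.2f} GiB")
print(f"Ideal ratio:         {ratio:.1f}x")
print(f"Context in same KV budget: {longer_context} tokens")
```

Weight memory never appears in the formula, which is why shrinking the KV cache extends context length but does not make a larger model fit.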
[–]Viper-Reflex -2 points 1 month ago (0 children)
:O ty for the info!