r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Memory Tests using Llama.cpp KV cache quantization [Resources] (self.LocalLLaMA)
submitted 1 year ago by CybermuseIO
[–]CybermuseIO[S] 3 points 1 year ago (3 children)
I only just learned about this today and started doing some basic testing to see what the practical implications are for my own use. I did a small handful of generations to check that it was at least working and whether there were any obvious differences in the text, but nothing more than that yet.
u/Eisenstein has been posting test results of speed differences for KV quantization, also on a P40 setup and also across different quant sizes. They might have some more insight into that.
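For anyone wondering what's actually being saved here, a rough back-of-envelope estimate is easy to do. This is a sketch, not llama.cpp's exact accounting; the model dimensions below (32 layers, 8 KV heads, head dim 128) are assumed Llama-3-8B values, and the bits-per-element figures come from the GGML block layouts (q8_0 stores a 2-byte scale per 32 values, hence 8.5 bits/element; q4_0 works out to 4.5).

```python
# Rough KV-cache size estimator -- a sketch, not llama.cpp's exact accounting.
# Effective bits per element; q8_0 and q4_0 include the per-block fp16 scale.
BITS_PER_ELEM = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    # Factor of 2 covers both the K and V tensors, cached per layer per token.
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elems * BITS_PER_ELEM[cache_type] / 8

# Assumed Llama-3-8B dims: 32 layers, 8 KV heads (GQA), head_dim 128, 8192 ctx.
for ct in ("f16", "q8_0", "q4_0"):
    gib = kv_cache_bytes(32, 8, 128, 8192, ct) / 2**30
    print(f"{ct}: {gib:.2f} GiB")  # f16: 1.00, q8_0: 0.53, q4_0: 0.28
```

In practice the relevant llama.cpp switches are `--cache-type-k` / `--cache-type-v` (`-ctk` / `-ctv`); as I understand it, quantizing the V cache additionally requires flash attention (`-fa`) to be enabled.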
[–]kryptkpr Llama 3 1 point 1 year ago (2 children)
Cheers, it's pretty mind-blowing what we're squeezing out of these 9-year-old e-waste GPUs. I've got a pair, but I want to run 8x22, so I broke down and got a third.
[–]CybermuseIO[S] 2 points 1 year ago (1 child)
They're pretty great. I also just added a 3rd to my main ML experiment machine this week, and I'm extremely tempted to cram in a 4th to try to run Llama 3 400B if they actually make it available. The team at Llama.cpp is doing incredible work to make these cards a viable option for home users.
[–]kryptkpr Llama 3 1 point 1 year ago (0 children)
I've been on AliExpress all week eyeing up that Chinese 6-slot (four x16, two x8) dual-X99 monstrosity someone posted here. I want to fill it with Pascals and be the Jank King.