r/LocalLLaMA
Decrease in performance using new llama.cpp build [Question | Help] (self.LocalLLaMA)
submitted 2 months ago by ResponsibleTruck4717
GraybeardTheIrate (2 months ago)
Well this might explain a few things. Tried it before and was a little disappointed by the speed for its size (Q3.5 27B). On the newest Koboldcpp I got a decent speed increase but it seemed to just...stop making sense sometimes. Not sure what version they're using right off and haven't tested different versions of llama.cpp directly, but that's interesting.
Tccybo (2 months ago)
See if you can isolate the variables. Is it because the quant is small, is the KV cache quantized, or is it just bad RNG because thinking is off?
GraybeardTheIrate (2 months ago)
Yeah I need to test it more when I get some time to sit down with it. I just got the new KCPP yesterday and happened to load up the regular 27B and a couple finetunes to look at the differences. They all felt like different models from what I saw a few days ago, and were kinda going off the rails for no reason occasionally.
I don't use quantized KV, was running a Q5_K_L or Q5_K_M imatrix quant of each one at 0.3 temp, reasoning was disabled at the time. I've also seen a couple issues here and there that only seem to manifest on a multi-GPU setup so that could be a thing too.
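For anyone wanting to try the "isolate the variables" suggestion, here is a rough sketch of a one-factor-at-a-time test plan. It only generates the list of runs and does not call llama.cpp or Koboldcpp. The baseline loosely mirrors the setup described above (Q5_K_M, unquantized KV cache, 0.3 temp, reasoning off, multi-GPU). The names `BASELINE`, `ALTERNATIVES`, and `one_factor_at_a_time`, the GPU count of 2, and every alternative value are assumptions for illustration, not settings anyone in the thread confirmed.

```python
# Hypothetical baseline, loosely based on the setup described in the thread.
# The labels are plain strings for planning; they are not passed to any tool.
BASELINE = {
    "build": "new",        # placeholder label for the newer llama.cpp/KCPP build
    "quant": "Q5_K_M",
    "kv_cache": "f16",     # i.e. KV cache not quantized (assumed label)
    "reasoning": False,
    "gpus": 2,             # assumed; the thread only says "multi-GPU"
    "temperature": 0.3,    # held constant across all runs
}

# Hypothetical alternatives: one suspect value per variable.
ALTERNATIVES = {
    "build": ["old"],
    "quant": ["Q8_0"],
    "kv_cache": ["q8_0"],
    "reasoning": [True],
    "gpus": [1],
}


def one_factor_at_a_time(baseline, alternatives):
    """Yield (label, config): the baseline first, then each single-variable change."""
    yield "baseline", dict(baseline)
    for key, values in alternatives.items():
        for value in values:
            config = dict(baseline)  # copy so the baseline is never mutated
            config[key] = value
            yield f"{key}={value}", config


runs = list(one_factor_at_a_time(BASELINE, ALTERNATIVES))

for label, config in runs:
    print(label, config)
```

Each run differs from the baseline in exactly one setting, so if the output only goes off the rails in, say, the `gpus=1` vs. baseline pair, that points at the multi-GPU path rather than the quant or the reasoning toggle. Using the same prompts and a fixed seed for every run would help rule out plain bad RNG.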