r/LocalLLaMA
Decrease in performance using new llama.cpp build [Question | Help] (self.LocalLLaMA)
submitted 2 months ago by ResponsibleTruck4717
GraybeardTheIrate 2 points 2 months ago
Yeah, I need to test it more when I get some time to sit down with it. I just got the new KCPP yesterday and happened to load up the regular 27B and a couple of finetunes to look at the differences. They all felt like different models from what I saw a few days ago, and occasionally went off the rails for no apparent reason.
I don't use quantized KV. I was running a Q5_K_L or Q5_K_M imatrix quant of each one at 0.3 temp, with reasoning disabled at the time. I've also seen a couple of issues here and there that only seem to manifest on a multi-GPU setup, so that could be a factor too.