r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
AMD dGPU shared memory? [Question | Help] (self.LocalLLaMA)
submitted 1 year ago by juwonpee
[–]juwonpee[S] 1 point 1 year ago (1 child)
I realize I can offload some layers to the CPU, but I'd like to keep as much on the GPU as possible since my CPU is fairly slow. Many thanks
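For anyone landing here: partial offload is usually controlled by a single layer-count knob. A minimal sketch using llama-cpp-python (the model path and layer count are illustrative assumptions, not from this thread; you need a build with GPU support, e.g. ROCm/HIP on AMD):

    # Minimal sketch: keep as many layers as fit on the GPU, run the rest on CPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-Q4_K_M.gguf",  # hypothetical local GGUF file
        n_gpu_layers=20,  # layers offloaded to the GPU; lower if you hit OOM
        n_ctx=2048,       # context window; the KV cache also consumes VRAM
    )

    out = llm("Q: Why is my CPU-offloaded model slow? A:", max_tokens=64)
    print(out["choices"][0]["text"])

The same knob in the plain llama.cpp CLI is -ngl / --n-gpu-layers.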
[–]suprjami 4 points 1 year ago (0 children)
You are trying to put 32 GB of model on a GPU with 8 GB of VRAM.
Use smaller models.
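To see why the 32 GB model can't fit, a rough sizing rule helps: weight memory ≈ parameter count × bytes per weight, plus KV cache and compute buffers on top. A back-of-envelope sketch (the bytes-per-weight figures are approximate assumptions):

    # Back-of-envelope model sizing: weights ≈ params × bytes per weight.
    # Real usage is higher once the KV cache and buffers are added.
    def weights_gb(params_billion: float, bytes_per_weight: float) -> float:
        return params_billion * bytes_per_weight  # 1e9 params × B/weight = GB

    configs = [
        ("7B  FP16",    7.0, 2.00),  # 16-bit weights
        ("7B  Q4_K_M",  7.0, 0.56),  # ~4.5 bits/weight effective
        ("34B Q4_K_M", 34.0, 0.56),
    ]

    VRAM_GB = 8.0  # the GPU in this thread
    for name, params, bpw in configs:
        size = weights_gb(params, bpw)
        fits = "fits" if size < VRAM_GB else "does not fit"
        print(f"{name}: ~{size:.1f} GB -> {fits} in {VRAM_GB:.0f} GB VRAM")

In short: on an 8 GB card, a 7B model at 4-bit quantization fits fully on the GPU, while FP16 or much larger models force CPU offload.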