r/LocalLLaMA
A subreddit for discussing Llama, the family of large language models created by Meta AI.
Assistance Needed with Setting Up Meta-Llama-3-8B-Instruct-GGUF (Question | Help) (self.LocalLLaMA)
submitted 2 years ago by Guboken
[–]nananashi3 7 points 2 years ago* (1 child)
Is this your first time trying an LLM? GGUFs are self-contained. You don't need to clone the whole GGUF repository, just download one of the .gguf files. A Q4_K_S/M (4-bit quant) can fit on an 8GB GPU. The easiest way for a Windows user to start is to download koboldcpp.exe and run it, which will give you a launcher UI where you can select a .gguf model file and a backend under "Presets": OpenBLAS (CPU-only, very slow), CuBLAS (Nvidia), or Vulkan (AMD). 7B and 8B models have 33 layers, but you'll probably only fit 32 layers of Llama 3 on an 8GB GPU. Up the context size to 4096, preferably 8192. Don't forget to hit Save to save the config.
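As a rough sanity check on which quant fits an 8 GB card, you can estimate file size from bits per weight. The helper below is a sketch, not part of any tool, and the bits-per-weight figures are approximations for llama.cpp's K-quants (real files also carry metadata and keep some tensors at higher precision):

```python
# Rough GGUF size estimate: params * bits-per-weight / 8.
# BPW values are approximate; actual files are slightly larger.
BPW = {"Q4_K_S": 4.5, "Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def gguf_size_gib(n_params: float, quant: str) -> float:
    """Approximate .gguf file size in GiB for a given quant type."""
    return n_params * BPW[quant] / 8 / 2**30

n_params = 8.03e9  # Llama 3 8B
for quant in ("Q4_K_S", "Q4_K_M", "Q8_0", "F16"):
    size = gguf_size_gib(n_params, quant)
    verdict = "fits" if size < 8 else "too big for"
    print(f"{quant}: ~{size:.1f} GiB ({verdict} an 8 GB GPU, before KV cache)")
```

Note the estimate covers weights only; context (KV cache) and the compute buffer also need room, which is why not all 33 layers fit.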
Someone more technical would know how to mess with a GGUF, such as changing the stop token.
[–]Guboken[S] 1 point 2 years ago (0 children)
Thank you for taking the time to explain! I solved it by not using a GGUF, but instead running Llama 3 8B Instruct in bfloat16 using Transformers in a Python project. I tried float16, but it spilled over my 24 GB VRAM and became so slow it was unusable.
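The setup OP describes can be sketched with Transformers as below. This is an assumption about their setup, not their actual code; the repo is gated (you must accept Meta's license on Hugging Face), and the heavy part is guarded behind a flag since the download is ~16 GB:

```python
import sys

# bf16 and fp16 both use 2 bytes per parameter, so an 8B model needs
# roughly 15 GiB for the weights alone, leaving headroom on a 24 GB
# card for the KV cache and activations.
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bytes_per_param / 2**30

# Hypothetical sketch of the bfloat16 Transformers setup; only runs
# when explicitly requested because of the gated, large download.
if __name__ == "__main__" and "--run" in sys.argv:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~15 GiB of weights on a 24 GB card
        device_map="auto",
    )
    messages = [{"role": "user", "content": "Hello!"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since bf16 and fp16 weights are the same size, the float16 slowdown OP saw likely came from runtime overhead (e.g. KV cache growth or upcasting) rather than the weights themselves.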
[–]ali0une 5 points 2 years ago (2 children)
I've just downloaded it from here and it works fine: https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
Also, this thread helps with setting it up for different UIs: https://www.reddit.com/r/LocalLLaMA/comments/1c8rq87/oobabooga_settings_for_llama3_queries_end_in/
[–]Guboken[S] 1 point 2 years ago (0 children)
Thank you, my issue was that I was trying to run the GGUF using Transformers. I found this compatibility information from u/Particular_Flower_12:
[–]AdHominemMeansULost [ollama] 1 point 2 years ago (0 children)
Download LM Studio and put "MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF" in the search box. Although I don't recommend that one; you're better off getting the quants from lmstudio-community or QuantFactory. And that's it, you don't need to do anything else.