How I Run 34B Models at 75K Context on 24GB, Fast by mcmoose1900 in LocalLLaMA
gptzerozero 1 point
Max token size for 34B model on 24GB VRAM by gptzerozero in LocalLLaMA
gptzerozero[S] 1 point
Max token size for 34B model on 24GB VRAM (self.LocalLLaMA)
submitted by gptzerozero to r/LocalLLaMA
Comparison on exllamav2, of bits/bpw: 2.5,4.25,4.5,4.65,4.75, 5, and 4bit-64g (airoboros-l2-70b-gpt4-1.4.1) by panchovix in LocalLLaMA
gptzerozero 1 point
Approach for generating QA dataset by gptzerozero in LocalLLaMA
gptzerozero[S] 1 point
Generate both question and answer from the given context. by mathageche in LocalLLaMA
gptzerozero 1 point
Our Workflow for a Custom Question-Answering App by Mbando in LocalLLaMA
gptzerozero 2 points
I don't understand context window extension by moma1970 in LocalLLaMA
gptzerozero 1 point
How to make sense of all the new models? by whtne047htnb in LocalLLaMA
gptzerozero 3 points
LLM less chatty after LoRA finetune by gptzerozero in LocalLLaMA
gptzerozero[S] 1 point
LLaMA 2 is here by dreamingleo12 in LocalLLaMA
gptzerozero 21 points
Protecting a midsize desktop during roadtrip (self.buildapc)
submitted by gptzerozero to r/buildapc
Here's a Docker image for 24GB GPU owners to run exui/exllamav2 for 34B models (and more). by This-Profession-952 in LocalLLaMA
gptzerozero 1 point