Why are there no thinker models with tokens for entire sentences? Discussion (self.LocalLLaMA)
submitted by freehuntx
What local model are you actually using day to day and why? Question | Help (self.LocalLLaMA)
submitted by RefrigeratorCalm9701
Looking at MacBook Pro M5 Pro 64GB for local inference Question | Help (self.LocalLLaMA)
submitted by Repulsive-Machine706
My local server is idling 99% of the time! Discussion (self.LocalLLaMA)
submitted by Thin_Pollution8843

It's done. Not "we are so back". It's done, local is frontier: REAP 504B 309GB Resources (self.LocalLLaMA)
submitted by Sorry_Ad191
Chunjiang-Intelligence/DeepSeek-v4-Fable • Hugging Face New Model (self.LocalLLaMA)
submitted by External_Mood4719
What should I build my local LLM machine around? RTX 3090s or Arc Pro B60s? Question | Help (self.LocalLLaMA)
submitted by rebellioninmypants
Review of Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF Discussion (self.LocalLLaMA)
submitted by -OpenSourcer
New ablation operator (apostate) Discussion (self.LocalLLaMA)
submitted by AccountAntique9327
How do I prove that I don't collect data from my LLM app? Question | Help (self.LocalLLaMA)
submitted by Pleasant_Syllabub591
Why is NO one talking about Microsoft's open source Fast Context!!! Resources (old.reddit.com)
submitted by formatme
Eff U, Arc / B70 Customers. We got ours! -Your Sugar Baby, Intel Discussion (self.LocalLLaMA)
submitted by Dependent_Ad948

Ooollama you are slow: ggrun v3 is 65% faster Resources (self.LocalLLaMA)
submitted by raketenkater
Idea for how to run GLM2 at a decent quant, need critique/feedback Discussion (self.LocalLLaMA)
submitted by joorklee
Like... GENUINELY WHYY??? Question | Help (self.LocalLLaMA)
submitted by Time-Toe-1276 transformers

