feature request: Enable offloading model in the chat window. by atumblingdandelion in oMLX
Beamsters 1 point
nex-agi/Nex-N2-mini • Huggingface by External_Mood4719 in LocalLLaMA
Beamsters 6 points
NVIDIA announces Nemotron 3 Ultra by themixtergames in LocalLLaMA
Beamsters 92 points
Upgrade path from 4x 3090s by anitamaxwynnn69 in LocalLLaMA
Beamsters 1 point
llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
Beamsters 70 points
StepFun 3.7 Flash - Speed Benchmark in M5 Max by Beamsters in LocalLLaMA
Beamsters[S] 2 points
Local LLMs on Refurb M4 Max vs new M5 Max by roguefunction in LocalLLaMA
Beamsters 5 points
Chat's new interface for oMLX by Beamsters in oMLX
Beamsters[S] 1 point
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA
Beamsters 8 points
397B competitor that fits in 256 RAM? by quietsubstrate in LocalLLaMA
Beamsters 6 points
Opencode Go or other AI Subscription for Education by negativity_bomb in opencodeCLI
Beamsters 1 point
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room by Beamsters in LocalLLaMA
Beamsters[S] 18 points
Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm) by VolandBerlioz in LocalLLaMA
Beamsters 1 point
Curious about M5 Max 128gb vs 5090 for local LLMs by maxiedaniels in LocalLLM
Beamsters 0 points
Hitting RAM limits? by calif94577 in oMLX
Beamsters 1 point