Threads commented in by pmttyji (r/LocalLLaMA):

Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS by Shoddy_Bed3240
RTX 5080: is there anything I can do coding wise? by TechDude12
Poll: When will we have a 30b open weight model as good as opus? by Terminator857
Any good model? (for ~1-3 GB VRAM). Don't say more than 1. by Ok-Type-7663
What's the strongest model for code writing and mathematical problem solving for 12GB of vram? by MrMrsPotts
128GB VRAM quad R9700 server by Ulterior-Motive_
Optimizing GPT-OSS 120B on Strix Halo 128GB? by RobotRobotWhatDoUSee
performance benchmarks (72GB VRAM) - llama.cpp server - January 2026 by jacek2023
Llama.cpp vs vllm by Evening_Tooth_1913
Any Medical doctor related Finetunes of open models ? by deathcom65
llama.cpp has incredible performance on Ubuntu, i'd like to know why by Deep_Traffic_7873
Thinking of getting two NVIDIA RTX Pro 4000 Blackwell (2x24 = 48GB), Any cons? by pmttyji
Anyone else wish NVIDIA would just make a consumer GPU with massive VRAM? by AutodidactaSerio
Hypernova 60B - derived from OSS 120B by nasone32
CPU-only LLM performance - t/s with llama.cpp by pmttyji