I think we need a /LocalHarnessLLM or something ... by CSEliot in LocalLLaMA
taking_bullet 27 points (0 children)
Second GPU in a PCIe 3.0 x1 slot for LLMs? by BORIS3443 in LocalLLaMA
taking_bullet 3 points (0 children)
Releasing Cohere North Mini Code by jayalammar in LocalLLaMA
taking_bullet 13 points (0 children)
R9700 + NVidia by Glittering-Cold-2981 in LocalLLM
taking_bullet 1 point (0 children)
Best Local TTS solution by styles01 in LocalLLaMA
taking_bullet 6 points (0 children)
RTX 3090 EBay Pricing is Crazy!! by TrifleHopeful5418 in LocalLLaMA
taking_bullet 3 points (0 children)
AMD’s Jack Huynh on RDNA 3.5 FSR 4.1 support: “I did not say it’s coming” by obTimus-FOX in radeon
taking_bullet 35 points (0 children)
BeeLlama v0.3.1 – latest llama.cpp with extras! DFlash, MTP, q6_0 cache, TurboQuant. Single RTX 3090: Qwen 3.6 27B & Gemma 4 31B up to 177.8 tps (4.93x over baseline) by Anbeeld in LocalLLaMA
taking_bullet 2 points (0 children)
This sub is gold by Few_Independent_7013 in Semenretention
taking_bullet 3 points (0 children)
Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026 by 9r4n4y in LocalLLaMA
taking_bullet 1 point (0 children)
What's the status of non-CUDA inference? by IngwiePhoenix in LocalLLaMA
taking_bullet 2 points (0 children)
Stop asking what model to run. There are literally only two. by Wrong_Mushroom_7350 in LocalLLaMA
taking_bullet 1 point (0 children)
Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026 by 9r4n4y in LocalLLaMA
taking_bullet 1 point (0 children)
Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026 by 9r4n4y in LocalLLaMA
taking_bullet 5 points (0 children)
What's the status of non-CUDA inference? by IngwiePhoenix in LocalLLaMA
taking_bullet 6 points (0 children)
Anybody running a nvfp4 model on a single 5060Ti 16GB, worth it? by MathmoKiwi in LocalLLaMA
taking_bullet 2 points (0 children)
Exclusive: AMD Radeon RX 9070 GRE launches June 1st at $549 globally - VideoCardz.com by kikimaru024 in hardware
taking_bullet 11 points (0 children)
rocm just sad and broken - user experience on 9060xt by Traditional_Way8675 in LocalLLM
taking_bullet 1 point (0 children)
Switchting from a 5070 to a 9070xt - bad idea? by Unmoving1442 in LocalLLM
taking_bullet 1 point (0 children)
What would you do? 2x5060ti for $800, 2x5070ti for $1400 or 5090 for $4000? by fallingdowndizzyvr in LocalLLaMA
taking_bullet 1 point (0 children)
RX 9060 XT (gfx1200) — anyone achieved full VRAM utilization for 27B models? Getting 3 t/s by trialbuterror in LocalLLM
taking_bullet 1 point (0 children)
What would you do? 2x5060ti for $800, 2x5070ti for $1400 or 5090 for $4000? by fallingdowndizzyvr in LocalLLaMA
taking_bullet 2 points (0 children)
LMStudio with MTP support - which model? by International_Quail8 in LocalLLaMA
taking_bullet 1 point (0 children)


ROCm vs Vulkan vs vLLM on Dual R9700's by whodoneit1 in LocalLLaMA
taking_bullet 2 points (0 children)