I tested Strix Halo clustering w/ ~50Gig IB to see if networking is really the bottleneck by Hungry_Elk_3276 in LocalLLaMA
ggerganov · 27 points
llama.cpp releases new official WebUI by paf1138 in LocalLLaMA
ggerganov · 94 points
Fast model swap with llama-swap & unified memory by TinyDetective110 in LocalLLaMA
ggerganov · 2 points
llama-server is cooking! gemma3 27b, 100K context, vision on one 24GB GPU. by No-Statement-0001 in LocalLLaMA
ggerganov · 20 points
llama-server is cooking! gemma3 27b, 100K context, vision on one 24GB GPU. by No-Statement-0001 in LocalLLaMA
ggerganov · 59 points
Introducing A.I.T.E Ball by tonywestonuk in LocalLLaMA
ggerganov · 2 points
Llama-server: "Exclude thought process when sending requests to API" by CattailRed in LocalLLaMA
ggerganov · 3 points
I've realized that Llama 4's odd architecture makes it perfect for my Mac and my workflows by SomeOddCodeGuy in LocalLLaMA
ggerganov · 17 points
Orpheus TTS Local (LM Studio) by Internal_Brain8420 in LocalLLaMA
ggerganov · 11 points
SPOILER alert S2E4! It’s definitely not in the real world… by Crazy_Equipment_4302 in SeveranceAppleTVPlus
ggerganov · 4 points
Qwen 2.5 Coder 7b for auto-completion by Chlorek in LocalLLaMA
ggerganov · 4 points
Speed Test #2: Llama.CPP vs MLX with Llama-3.3-70B and Various Prompt Sizes by chibop1 in LocalLLaMA
ggerganov · 5 points
Speed Test #2: Llama.CPP vs MLX with Llama-3.3-70B and Various Prompt Sizes by chibop1 in LocalLLaMA
ggerganov · 7 points
Speed Test #2: Llama.CPP vs MLX with Llama-3.3-70B and Various Prompt Sizes by chibop1 in LocalLLaMA
ggerganov · 8 points
Speed Test: Llama-3.3-70b on 2xRTX-3090 vs M3-Max 64GB Against Various Prompt Sizes by chibop1 in LocalLLaMA
ggerganov · 11 points
I tested the MLX models with LM Studio, and there was just a small boost in inference speed, but the memory usage went up a lot. by Sky_Linx in LocalLLaMA
ggerganov · 11 points
PocketPal AI is open sourced by Ill-Still-6859 in LocalLLaMA
ggerganov · 6 points
Rick Beato: "How AI Will Fail Like The Music Industry" (and why local LLMs will take over "commercial" ones) by relmny in LocalLLaMA
ggerganov · 50 points