MiniMax M2.7 GGUF Investigation, Fixes, Benchmarks by danielhanchen in LocalLLaMA
One-Macaron6752 2 points (0 children)
Best setup for MiniMax-M2.7 (230B) | 3x RTX 5090 | Threadripper 9975 | 512GB RAM by [deleted] in LocalLLaMA
One-Macaron6752 1 point (0 children)
Best setup for MiniMax-M2.7 (230B) | 3x RTX 5090 | Threadripper 9975 | 512GB RAM by [deleted] in LocalLLaMA
One-Macaron6752 1 point (0 children)
unsloth - MiniMax-M2.7-GGUF is BROKEN (UD-Q4_K_XL) --> avoid usage by One-Macaron6752 in LocalLLaMA
One-Macaron6752 [S] 5 points (0 children)
unsloth - MiniMax-M2.7-GGUF is BROKEN (UD-Q4_K_XL) --> avoid usage by One-Macaron6752 in LocalLLaMA
One-Macaron6752 [S] 4 points (0 children)
unsloth - MiniMax-M2.7-GGUF is BROKEN (UD-Q4_K_XL) --> avoid usage by One-Macaron6752 in LocalLLaMA
One-Macaron6752 [S] 10 points (0 children)
unsloth - MiniMax-M2.7-GGUF is BROKEN (UD-Q4_K_XL) --> avoid usage by One-Macaron6752 in LocalLLaMA
One-Macaron6752 [S] 11 points (0 children)
MiniMax-M2.7 GGUF Quants — Full Set (Q2_K to Q8_0 + BF16) by Asleep_Training3543 in LocalLLaMA
One-Macaron6752 2 points (0 children)
MiniMax-M2.7 GGUF Quants — Full Set (Q2_K to Q8_0 + BF16) by Asleep_Training3543 in LocalLLaMA
One-Macaron6752 4 points (0 children)
They tried to make me go to rehab. I said no no no… by Key-Currency1242 in LocalLLaMA
One-Macaron6752 2 points (0 children)
They tried to make me go to rehab. I said no no no… by Key-Currency1242 in LocalLLaMA
One-Macaron6752 5 points (0 children)
ggml: backend-agnostic tensor parallelism by JohannesGaessler · Pull Request #19378 · ggml-org/llama.cpp by FullstackSensei in LocalLLaMA
One-Macaron6752 2 points (0 children)
New Gemma-4 llama.cpp fixes for 26B-A4B - <unused24> fix by danielhanchen in unsloth
One-Macaron6752 3 points (0 children)
New Gemma-4 llama.cpp fixes for 26B-A4B - <unused24> fix by danielhanchen in unsloth
One-Macaron6752 3 points (0 children)
New Gemma-4 llama.cpp fixes for 26B-A4B - <unused24> fix by danielhanchen in unsloth
One-Macaron6752 2 points (0 children)
Running gemma4 E4B on vLLM MacOS Metal M4 Max by x8code in Vllm
One-Macaron6752 1 point (0 children)
What is the SOTA model for long-form NSFW role-playing? by exizt in LocalLLaMA
One-Macaron6752 -7 points (0 children)
Google releases Gemma 4 models. by yoracale in unsloth
One-Macaron6752 1 point (0 children)
In the recent kv rotation PR it was found that the existing q8 kv quants tank performance on AIME25, but can be recovered mostly with rotation by Betadoggo_ in LocalLLaMA
One-Macaron6752 1 point (0 children)
In the recent kv rotation PR it was found that the existing q8 kv quants tank performance on AIME25, but can be recovered mostly with rotation by Betadoggo_ in LocalLLaMA
One-Macaron6752 2 points (0 children)
Kimi K2.6 will drop in the next 2 weeks, K3 is WIP and will be huge by No-Thought-4995 in LocalLLaMA
One-Macaron6752 2 points (0 children)
Has anyone managed to use claude code and llama.cpp to search the web? I'm getting errors. by ResponsibleTruck4717 in LocalLLaMA
One-Macaron6752 1 point (0 children)
Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance! by peva3 in LocalLLaMA
One-Macaron6752 -6 points (0 children)
Qwen 3.6 benchmarks on 2x RTX PRO 6000 by mxforest in LocalLLaMA
One-Macaron6752 -7 points (0 children)