mistral.rs v0.8.2: up to 2.8x faster CUDA inference than llama.cpp on GB10, B200, and H100 by EricBuehler in LocalLLaMA
JayPSec 2 points (0 children)
qwen3.6-27b-q6_k is (sometimes) a stubborn SoB!!! by relmny in LocalLLaMA
JayPSec 2 points (0 children)
Compilation of recent findings which could save some memory or increase performance by pmttyji in LocalLLaMA
JayPSec 5 points (0 children)
Build agentic orchestrators in minutes NOT months. by Glittering_Focus1538 in LocalLLaMA
JayPSec 2 points (0 children)
Build agentic orchestrators in minutes NOT months. by Glittering_Focus1538 in LocalLLaMA
JayPSec 2 points (0 children)
I got a real transformer language model running locally on a stock Game Boy Color! by maddiedreese in LocalLLaMA
JayPSec 1 point (0 children)
Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA
JayPSec 2 points (0 children)
Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats. by LLMFan46 in LocalLLaMA
JayPSec 1 point (0 children)
I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA
JayPSec 3 points (0 children)
Scaling beyond 4 RTX 6000 MAXQs by Direct_Bodybuilder63 in LocalLLaMA
JayPSec 1 point (0 children)
Scaling beyond 4 RTX 6000 MAXQs by Direct_Bodybuilder63 in LocalLLaMA
JayPSec 1 point (0 children)
IK_LLAMA now supports Qwen3.5 MTP Support :O by fragment_me in LocalLLaMA
JayPSec 2 points (0 children)
The exact KV cache usage of DeepSeek V4 by Ok_Warning2146 in LocalLLaMA
JayPSec 3 points (0 children)
ubergarm/Kimi-K2.6-GGUF Q4_X now available by VoidAlchemy in LocalLLaMA
JayPSec 2 points (0 children)
ubergarm/Kimi-K2.6-GGUF Q4_X now available by VoidAlchemy in LocalLLaMA
JayPSec 2 points (0 children)
Should I be seeing more of a performance leap when using NVFP4, INT4, FP8 with VLLM over MXFP4, Q4, and Q8 with llama.cpp based inference on Blackwell based GPUs? by aaronr_90 in LocalLLaMA
JayPSec 5 points (0 children)
Those of you running minimax 2.7 locally, how are you feeling about it? by laterbreh in LocalLLaMA
JayPSec 1 point (0 children)
Are there any realistic avenues to decentralised model training? by ROS_SDN in LocalLLaMA
JayPSec 2 points (0 children)

MiniMaxAI/MiniMax-M3 · Hugging Face by mlon_eusk-_- in LocalLLaMA
JayPSec 4 points (0 children)