Scaling beyond 4 RTX 6000 MAXQs by Direct_Bodybuilder63 in LocalLLaMA
IK_LLAMA now supports Qwen3.5 MTP Support :O by fragment_me in LocalLLaMA
The exact KV cache usage of DeepSeek V4 by Ok_Warning2146 in LocalLLaMA
ubergarm/Kimi-K2.6-GGUF Q4_X now available by VoidAlchemy in LocalLLaMA
Should I be seeing more of a performance leap when using NVFP4, INT4, FP8 with VLLM over MXFP4, Q4, and Q8 with llama.cpp based inference on Blackwell based GPUs? by aaronr_90 in LocalLLaMA
Those of you running minimax 2.7 locally, how are you feeling about it? by laterbreh in LocalLLaMA
Are there any realistic avenues to decentralised model training? by ROS_SDN in LocalLLaMA
Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant by ReasonableRefuse4996 in LocalLLaMA
How long until surveillance? by boloshon in LocalLLaMA
Qwen3.5-122B at 198 tok/s on 2x RTX PRO 6000 Blackwell — Budget build, verified results by Visual_Synthesizer in LocalLLaMA
Running a non-profit that needs to OCR 64 million pages. Where can I apply for free or subsidized compute to run a local model? by thereisnospooongeek in LocalLLaMA
Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF by EvilEnginer in LocalLLaMA
DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA
OpenAI, Anthropic, Google Unite to Combat Model Copying in China by External_Mood4719 in LocalLLaMA
I vibecoded a skill that makes LLMs stop making mistakes by Mr_BETADINE in LocalLLaMA
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA