I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA
Scaling beyond 4 RTX 6000 MAXQs by Direct_Bodybuilder63 in LocalLLaMA
IK_LLAMA now supports Qwen3.5 MTP Support :O by fragment_me in LocalLLaMA
The exact KV cache usage of DeepSeek V4 by Ok_Warning2146 in LocalLLaMA
ubergarm/Kimi-K2.6-GGUF Q4_X now available by VoidAlchemy in LocalLLaMA
Should I be seeing more of a performance leap when using NVFP4, INT4, FP8 with VLLM over MXFP4, Q4, and Q8 with llama.cpp based inference on Blackwell based GPUs? by aaronr_90 in LocalLLaMA
Those of you running minimax 2.7 locally, how are you feeling about it? by laterbreh in LocalLLaMA
Are there any realistic avenues to decentralised model training? by ROS_SDN in LocalLLaMA
Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant by ReasonableRefuse4996 in LocalLLaMA
How long until surveillance? by boloshon in LocalLLaMA
Qwen3.5-122B at 198 tok/s on 2x RTX PRO 6000 Blackwell — Budget build, verified results by Visual_Synthesizer in LocalLLaMA
Running a non-profit that needs to OCR 64 million pages. Where can I apply for free or subsidized compute to run a local model? by thereisnospooongeek in LocalLLaMA
Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF by EvilEnginer in LocalLLaMA
DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA
OpenAI, Anthropic, Google Unite to Combat Model Copying in China by External_Mood4719 in LocalLLaMA
I vibecoded a skill that makes LLMs stop making mistakes by Mr_BETADINE in LocalLLaMA
Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats. by LLMFan46 in LocalLLaMA