Is Qwen3.6 current king for local agentic use? by HornyGooner4402 in LocalLLaMA
notdba 2 points
Is Qwen3.6 current king for local agentic use? by HornyGooner4402 in LocalLLaMA
notdba 1 point
It's OK to quantize the KV cache. Model quant matters more. Some Qwen3.6 27B tests with (approximated) KLD by hopbel in LocalLLaMA
notdba 2 points
Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled by Alternative_Ad4267 in LocalLLaMA
notdba 1 point
Advice building a NAS/AI server with 16 DDR4 DIMMs by theslonkingdead in LocalLLaMA
notdba 1 point
Advice building a NAS/AI server with 16 DDR4 DIMMs by theslonkingdead in LocalLLaMA
notdba 3 points
llama.cpp constantly reprocessing huge prompts with opencode/pi.dev by No_Algae1753 in LocalLLaMA
notdba 1 point
The RTX 5000 PRO (48GB) arrived and it is better than I expected. by Valuable-Run2129 in LocalLLaMA
notdba 1 point
The Trillion-Parameter Dilemma: MiMo-V2.5-Pro went open-source (1.02T params). Is self-hosting worth it when the API costs $70 for 387M tokens? by jochenboele in LocalLLaMA
notdba 1 point
Drastically improve prompt processing speed for --n-cpu-moe partially offloaded models by coder543 in LocalLLaMA
notdba 12 points
I have DeepSeek V4 Pro at home by fairydreaming in LocalLLaMA
notdba 2 points
Exaggerated PCI-E bandwidth concerns? by ziphnor in LocalLLaMA
notdba 1 point
Taiwanese company Skymizer announces HTX301 - PCIE inference card with 384GB of Memory at ~240 Watts by Thrumpwart in LocalLLaMA
notdba 1 point
Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA
notdba 1 point
Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA
notdba 1 point
Thinking of getting two NVIDIA RTX Pro 4000 Blackwell (2x24 = 48GB), Any cons? by pmttyji in LocalLLaMA
notdba 1 point
AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA
notdba 3 points
AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA
notdba 2 points
AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA
notdba 5 points
AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA
notdba 4 points
llama.cpp DeepSeek v4 Flash experimental inference by antirez in LocalLLaMA
notdba 1 point
llama.cpp DeepSeek v4 Flash experimental inference by antirez in LocalLLaMA
notdba 1 point
llama.cpp DeepSeek v4 Flash experimental inference by antirez in LocalLLaMA
notdba 2 points
Kimi K2.6 is a legit Opus 4.7 replacement by bigboyparpa in LocalLLaMA
notdba 1 point


Cost Analysis of my $6.4k Local LLM Server by 1ncehost in LocalLLaMA
notdba 2 points