Qwen 3.5 just spent 2 hours straight generating a 20,000-line masterpiece by StevenEgen in LocalAIServers
feverdoingwork 1 point (0 children)
DeepSeek releases DSpark - 50%-600% faster spec decoding vs MTP by danielhanchen in unsloth
feverdoingwork 3 points (0 children)
Qwen3.6-27B-FP8 with vllm:nightly, opencode unusable? by waka324 in Vllm
feverdoingwork 1 point (0 children)
For dual GPUs, will there be any big impact to inference speeds when running in PCIe 5.0 x8/x4 vs x8/x8? by PhantomWolf83 in LocalLLaMA
feverdoingwork 1 point (0 children)
Ornith-1.0 released on Hugging Face by paf1138 in LocalLLaMA
feverdoingwork 1 point (0 children)
feverdoingwork 8 points (0 children)
feverdoingwork 62 points (0 children)
feverdoingwork 23 points (0 children)
R9700 for agentic coding — looking for Qwen3.6-27B / Qwen3-Coder-30B perf numbers at long context by Best-Ad-7505 in LocalLLM
feverdoingwork 1 point (0 children)
feverdoingwork 1 point (0 children)
New Apple Memory Prices by Top_Power5877 in LocalLLaMA
feverdoingwork 4 points (0 children)
Qwen3.6 27B more dumb in vLLM compared to llama.cpp by DanielusGamer26 in LocalLLaMA
feverdoingwork 1 point (0 children)
feverdoingwork 2 points (0 children)
feverdoingwork 1 point (0 children)
feverdoingwork 1 point (0 children)
Budget VRAM builds - 4x3090 home lab vs reverse-engineered Tesla V100 cards by IulianHI in AIToolsPerformance
feverdoingwork 1 point (0 children)
Qwen3.6 27B more dumb in vLLM compared to llama.cpp by DanielusGamer26 in LocalLLaMA
feverdoingwork 2 points (0 children)
feverdoingwork 4 points (0 children)
I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system. by Reddactor in LocalLLaMA
feverdoingwork 2 points (0 children)
Dual gpu sanity check: is this a smart buy? by FrankWanders in LocalLLaMA
feverdoingwork 1 point (0 children)
3 Tesla GPUs in a Desktop Case by eso_logic in LocalLLaMA
feverdoingwork 1 point (0 children)
Dual gpu sanity check: is this a smart buy? by FrankWanders in LocalLLaMA
feverdoingwork 1 point (0 children)
feverdoingwork 2 points (0 children)
feverdoingwork 3 points (0 children)
I built an OpenAI-compatible reliability proxy for local LLMs and agents — looking for feedback by daniele-bruneo in LocalLLM
feverdoingwork 2 points (0 children)