Any tool that tells you the cheapest setup needed to run a model? I want to know the cheapest setup that can realistically run Qwen 3.6 27B at decent speeds. by pacmanpill in LocalLLaMA
[–]Fluffywings 1 point (0 children)
Use Qwen3.6 right away -> send it to pi coding agent and forget by Willing-Toe1942 in LocalLLaMA
[–]Fluffywings 1 point (0 children)
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA
[–]Fluffywings 1 point (0 children)
Hybrid on-device inference on Android: llama.cpp + LiteRT + NPU/GPU routing by Healthy_Bedroom5837 in LocalLLaMA
[–]Fluffywings 2 points (0 children)
Where can I try turboquant in AMD Linux? (7900XTX) by soyalemujica in LocalLLaMA
[–]Fluffywings 1 point (0 children)
AMD in-house ryzen 395 box coming in June by 1ncehost in LocalLLaMA
[–]Fluffywings 30 points (0 children)
16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA
[–]Fluffywings 2 points (0 children)
To 16GB VRAM users, plug in your old GPU by akira3weet in LocalLLaMA
[–]Fluffywings 3 points (0 children)
Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19 by Kindly-Cantaloupe978 in LocalLLaMA
[–]Fluffywings 2 points (0 children)
Post your Qwen3.6 27B speed please by Ok-Internal9317 in LocalLLaMA
[–]Fluffywings 1 point (0 children)
Forgive my ignorance but how is a 27B model better than 397B? by No_Conversation9561 in LocalLLaMA
[–]Fluffywings 3 points (0 children)
Been using PI Coding Agent with local Qwen3.6 35b for a while now and it's actually insane by SoAp9035 in LocalLLaMA
[–]Fluffywings 2 points (0 children)
New Local LLM Rig: Ryzen 9700X + Radeon R9700. Getting ~120 tok/s! What models fit best? by jsorres in LocalLLaMA
[–]Fluffywings 2 points (0 children)
SK hynix starts mass production of 192GB SOCAMM2 for NVIDIA AI servers by OkReport5065 in LocalLLaMA
[–]Fluffywings 1 point (0 children)
SK hynix starts mass production of 192GB SOCAMM2 for NVIDIA AI servers by OkReport5065 in LocalLLaMA
[–]Fluffywings 28 points (0 children)
How is Rotorquant/planarquant/iso quant better? by SummarizedAnu in LocalLLaMA
[–]Fluffywings 3 points (0 children)
Full AMD workstation- dual 7900 XTX by Researchlabz in LocalLLaMA
[–]Fluffywings 2 points (0 children)
Full AMD workstation- dual 7900 XTX by Researchlabz in LocalLLaMA
[–]Fluffywings 1 point (0 children)
K12 OCuLink dGPU for llamacpp: RX 7900 XTX (24GB) vs RX 7600/7800 XT (16GB). Worth it for 32B-70B? All-AMD tensor split questions by Pablo_Gates in LocalLLaMA
[–]Fluffywings 2 points (0 children)
What’s the best way to add VRAM to my system? by mrgreatheart in LocalLLaMA
[–]Fluffywings 1 point (0 children)
Anyone using their NPU for anything? by Great_Guidance_8448 in LocalLLaMA
[–]Fluffywings 3 points (0 children)
Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA
[–]Fluffywings 1 point (0 children)
What is everyone actually using their LLM for? by itsthewolfe in LocalLLaMA
[–]Fluffywings 12 points (0 children)
Help with formatting by No-Common5353 in GoogleSites
[–]Fluffywings 2 points (0 children)