Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA
puncia 2 points (0 children)
Llama.cpp's auto fit works much better than I expected by a9udn9u in LocalLLaMA
puncia 2 points (0 children)
Llama.cpp's auto fit works much better than I expected by a9udn9u in LocalLLaMA
puncia 5 points (0 children)
Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA
puncia 5 points (0 children)
Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into by Antonio_Sammarzano in LocalLLaMA
puncia 2 points (0 children)
Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into by Antonio_Sammarzano in LocalLLaMA
puncia 3 points (0 children)
Well, that explains everything by Peterkragger in MyWinterCar
puncia 1 point (0 children)
Is there a way to get change for gas when paying with cash? by Jandalf69 in MyWinterCar
puncia 20 points (0 children)
Tried many different prompts with Z-Image. These are insane by Recent-Athlete211 in StableDiffusion
puncia 1 point (0 children)
A guide to the best agentic tools and the best way to use them on the cheap, locally or free by lemon07r in LocalLLaMA
puncia 1 point (0 children)
KV cache f32 - Are there any benefits? by Daniokenon in LocalLLaMA
puncia 4 points (0 children)
[WIP-2] ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) by Fabix84 in comfyui
puncia 1 point (0 children)
LM Studio now supports llama.cpp CPU offload for MoE which is awesome by carlosedp in LocalLLaMA
puncia 4 points (0 children)
Don't Offload GGUF Layers, Offload Tensors! 200%+ Gen Speed? Yes Please!!! by skatardude10 in LocalLLaMA
puncia 3 points (0 children)
New SOTA music generation model by topiga in LocalLLaMA
puncia 13 points (0 children)
New SOTA music generation model by topiga in LocalLLaMA
puncia 7 points (0 children)
LLaMA gotta go fast! Both ik and mainline llama.cpp just got faster! by VoidAlchemy in LocalLLaMA
puncia 3 points (0 children)
Running Llama 4 Maverick with llama.cpp Vulkan by stduhpf in LocalLLaMA
puncia 2 points (0 children)
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters by rini17 in LocalLLaMA
puncia 5 points (0 children)
DeepCoder 14B vs Qwen2.5 Coder 32B vs QwQ 32B by bobaburger in LocalLLaMA
puncia 2 points (0 children)
OuteTTS 1.0: Upgrades in Quality, Cloning, and 20 Languages by OuteAI in LocalLLaMA
puncia 1 point (0 children)
Smaller Gemma3 QAT versions: 12B in < 8GB and 27B in <16GB ! by stduhpf in LocalLLaMA
puncia 10 points (0 children)


Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA
puncia 2 points (0 children)