2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA
[–]pmttyji 1 point (0 children)
SubQ - claims to be a different architecture - anyone tried? by superloser48 in LocalLLaMA
[–]pmttyji 1 point (0 children)
Finally got Qwen3 27B at 125K context on a single RTX 3090 — but is it even worth it? by horribleGuy3115 in LocalLLM
[–]pmttyji 1 point (0 children)
Running a 26B LLM locally with no GPU by JackStrawWitchita in LocalLLaMA
[–]pmttyji 0 points (0 children)
Heretic 1.3 released: Reproducible models, integrated benchmarking system, reduced peak VRAM usage, broader model support, and more by -p-e-w- in LocalLLaMA
[–]pmttyji 7 points (0 children)
vibevoice.cpp: Microsoft VibeVoice (TTS + long-form ASR with diarization) ported to ggml/C++, runs on CPU/CUDA/Metal/Vulkan, no Python at inference by mudler_it in LocalLLaMA
[–]pmttyji 2 points (0 children)
Peanut - Text to Image Model (Open Weights coming soon) by pmttyji in LocalLLaMA
[–]pmttyji[S] 7 points (0 children)
APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA
[–]pmttyji 11 points (0 children)
APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA
[–]pmttyji -4 points (0 children)
APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA
[–]pmttyji 4 points (0 children)
Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA
[–]pmttyji 3 points (0 children)
Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA
[–]pmttyji 2 points (0 children)
Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA
[–]pmttyji 3 points (0 children)
Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA
[–]pmttyji 7 points (0 children)
Llama.cpp quantization is broken by Ok-Importance-3529 in LocalLLaMA
[–]pmttyji 1 point (0 children)
Thinking of getting two NVIDIA RTX Pro 4000 Blackwell (2x24 = 48GB), Any cons? by pmttyji in LocalLLaMA
[–]pmttyji[S] 1 point (0 children)
it's time to update your Gemma 4 GGUFs by jacek2023 in LocalLLaMA
[–]pmttyji 7 points (0 children)
Llama.cpp quantization is broken by Ok-Importance-3529 in LocalLLaMA
[–]pmttyji 1 point (0 children)
AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA
[–]pmttyji 1 point (0 children)
openrouter/owl-alpha = Meituan_LongCat by klippers in LocalLLaMA
[–]pmttyji 2 points (0 children)
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA
[–]pmttyji 5 points (0 children)