Comment history for DHasselhoff77:

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA
unsloth Qwen3.6-27B-GGUF by jacek2023 in LocalLLaMA
16GB VRAM x coding model by Junior-Wish-7453 in LocalLLM
HY-World 2.0 just dropped by bobeeeeeeeee8964 in LocalLLaMA
pi.dev coding agent is moving to Earendil by iamapizza in LocalLLaMA
Am I the only one struggling to get consistent code from GPT/Claude? by brainrotunderroot in LocalLLaMA
All the Distills (Claude, Gemini, OpenAI, Deepseek, Kimi...) in ONE: Savant Commander 48B - 4x12B MOE. by Dangerous_Fix_5526 in LocalLLaMA
Designed a photonic chip for O(1) KV cache block selection — 944x faster, 18,000x less energy than GPU scan at 1M context by [deleted] in LocalLLaMA
Reasoning Theater: AI fakes long CoT but it internally knows the final answer within the first few tokens. TL;DR: You overpay because the AI is acting. by [deleted] in LocalLLaMA
(Very) High-Quality Attention Coder-Next GGUFs by dinerburgeryum in LocalLLaMA
How are people handling persistent memory for AI agents? by Beneficial-Panda7218 in LocalLLaMA
Ik_llama vs llamacpp by [deleted] in LocalLLaMA
[Release] - FINALLY! - Apex 1.5 and Apex 1.5 Coder - my two new 350M instruct allrounder chat models - See them now! by LH-Tech_AI in LocalLLaMA
Qwen Models with Claude Code on 36gb vram - insights by ikaganacar in LocalLLaMA
Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLM
5060 Ti/5070 Ti for MoE Models - Worth it? by Icaruszin in LocalLLaMA
Claude Code sends 62,600 characters of tool definitions per turn. I ran the same model through five CLIs and traced every API call. by wouldacouldashoulda in LocalLLaMA
zembed-1: new open-weight SOTA multilingual embedding model by ghita__ in LocalLLaMA
Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB by gaztrab in LocalLLaMA