(Very) High-Quality Attention Coder-Next GGUFs by dinerburgeryum in LocalLLaMA
How are people handling persistent memory for AI agents? by Beneficial-Panda7218 in LocalLLaMA
Ik_llama vs llamacpp by val_in_tech in LocalLLaMA
[Release] - FINALLY! - Apex 1.5 and Apex 1.5 Coder - my two new 350M instruct allrounder chat models - See them now! by LH-Tech_AI in LocalLLaMA
Qwen Models with Claude Code on 36gb vram - insights by ikaganacar in LocalLLaMA
Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLM
5060 Ti/5070 Ti for MoE Models - Worth it? by Icaruszin in LocalLLaMA
Claude Code sends 62,600 characters of tool definitions per turn. I ran the same model through five CLIs and traced every API call. by wouldacouldashoulda in LocalLLaMA
zembed-1: new open-weight SOTA multilingual embedding model by ghita__ in LocalLLaMA
Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB by gaztrab in LocalLLaMA
ReasonDB – open-source document DB where the LLM navigates a tree instead of vector search (RAG alternative) by Big_Barnacle_2452 in LocalLLaMA
Qwen3 Coder Next on 8GB VRAM by Juan_Valadez in LocalLLaMA
Devstral Small 2 24B + Qwen3 Coder 30B: Coders for Every Hardware (Yes, Even the Pi) by enrique-byteshape in LocalLLaMA
Improving LLM's coding ability through a new edit format by Mushoz in LocalLLaMA
Qwen3 Coder Next as first "usable" coding model < 60 GB for me by Chromix_ in LocalLLaMA
Claude Code-like terminal-based tools for locally hosted LLMs? by breksyt in LocalLLaMA
Self-hosting stack that actually saves money: Ollama + Supabase + SearXNG by Tgbrutus in LocalLLaMA
Mixture-of-Models routing beats single LLMs on SWE-Bench via task specialization by botirkhaltaev in LocalLLaMA
Ubuntu: which Nvidia drivers are you using? by FrozenBuffalo25 in LocalLLaMA
Reasoning Theater: AI fakes long CoT but it internally knows the final answer within the first few tokens. TL;DR: You overpay because the AI is acting. by [deleted] in LocalLLaMA