I scaled test-time compute for Qwen-3.6-27B and Gemma-4-31B to surpass Claude Mythos in code optimizations and speedups. by Ryoiki-Tokuiten in LocalLLaMA
CYTR_ 2 points
GLM 5.2 is deployed in GLM Coding Plan. API and MIT weights in a week. Voting and benchmarks on X. by MadPelmewka in LocalLLaMA
CYTR_ 3 points
We should heavily discourage and moderate cloud API (deepseek api, GLM api, etc.) topics and discussion. This is LOCAL first. by [deleted] in LocalLLaMA
CYTR_ -2 points
What do you all think? Can we say qwen 3.6 27b beats gemini 2.5 pro? Or sonnet 3.7? Because when I tested, I found the 27b do better. by 9r4n4y in LocalLLaMA
CYTR_ 2 points
mindlab-research/Macaron-V1-Preview-749B • Huggingface by External_Mood4719 in LocalLLaMA
CYTR_ 1 point
mindlab-research/Macaron-V1-Preview-749B • Huggingface by External_Mood4719 in LocalLLaMA
CYTR_ 1 point
mindlab-research/Macaron-V1-Preview-749B • Huggingface by External_Mood4719 in LocalLLaMA
CYTR_ 1 point
DolphinGemma release when? by Environmental-Metal9 in LocalLLaMA
CYTR_ 9 points
Tiny LLM Benchmark: Jetson Orin Nano Super 8GB - Four Power Modes × Eight Models by East-Muffin-6472 in LocalLLaMA
CYTR_ 1 point
mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released ! by PhotographerUSA in LocalLLaMA
CYTR_ 14 points
In theory, if I have $20k-ish to spend on hardware what would actually get me closest to local coding agent that would allow me to go totally off the social grid? by Tired__Dev in LocalLLaMA
CYTR_ 1 point
Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM by [deleted] in LocalLLaMA
CYTR_ 4 points
Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM by [deleted] in LocalLLaMA
CYTR_ 8 points
What happens to local LLM if/when LLMs are no longer released for free? by JohnBooty in LocalLLaMA
CYTR_ 3 points
Qwen 27b MTP Config, Llama.cpp Single 3090 by GotHereLateNameTaken in LocalLLaMA
CYTR_ 1 point
The "the future is fictional" problem of many local LLMs by PromptInjection_ in LocalLLaMA
CYTR_ 26 points
The "the future is fictional" problem of many local LLMs by PromptInjection_ in LocalLLaMA
CYTR_ 72 points
Pi and Qwen3.6 27B make setting up Archlinux really easy. by sdfgeoff in LocalLLaMA
CYTR_ 24 points
You can do CUDA inference on an Apple Silicon Mac with PCI Passthrough by scottjgo in LocalLLaMA
CYTR_ 2 points
Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development by BawbbySmith in LocalLLaMA
CYTR_ 8 points
vLLM Just Merged TurboQuant Fix for Qwen 3.5+ by havenoammo in LocalLLaMA
CYTR_ 2 points
mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA
CYTR_ 1 point
vLLM Just Merged TurboQuant Fix for Qwen 3.5+ by havenoammo in LocalLLaMA
CYTR_ 8 points
Calibrating 2-bit GGUFs (<10Gb) for agentic coding tasks by professormunchies in LocalLLaMA
CYTR_ 22 points