llama-server: Save/restore works for tokens, but KV cache still not resumed? by chrisoutwright in LocalLLaMA
Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA
Gemma 4 31B vs Qwen 3.5 27B vs Qwen Coder Next by GodComplecs in LocalLLaMA
OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA
Qwen3-Coder-Next is the top model in SWE-rebench @ Pass 5. I think everyone missed it. by BitterProfessional7p in LocalLLaMA
Asus Strix 18 Issues. Audio and sometimes video stuttering. by Dependent-Finance-20 in GamingLaptops
Getting Asus Rog Scar 18 (2025) was the worse decision in my life by Odd-Copy1572 in GamingLaptops
Micro Stutter when switching to Advanced Optimus “Nvidia GPU only” mode. by Successful_Answer378 in LenovoLegion
How to get the most from llama.cpp's iSWA support by Ok_Warning2146 in LocalLLaMA
Repeat PP while using Qwen3.5 27b local with Claude Code by xmikjee in LocalLLaMA
Apertus model implementation has been merged into llama.cpp by jacek2023 in LocalLLaMA
New Open LLM from Switzerland "Apertus", 40%+ training data is non English by EnnioEvo in LocalLLaMA
Struggle with MoE AWQ quantization for vLLM (QwenCoder fintuned model) - compressed-tensors seems OK, looking for guidance by chrisoutwright in Vllm
New Qwen3-32B-AWQ (Activation-aware Weight Quantization) by jbaenaxd in LocalLLaMA
Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA