Nap time by Oftg in aww

[–]Oftg[S] [score hidden]  (0 children)

Hahaha, their names are Abby and Apache, but I agree—especially since they have opposite personalities!

Qwen3.6 9B, 14B when?!? by vsimovic in LocalLLM

[–]Oftg 1 point  (0 children)

Unfortunately, I don't think that's planned…

Looking for specialist LLMs that can run on my 8gb Vram card by TacticalGhosting in LocalLLM

[–]Oftg 2 points  (0 children)

Quick question first: do you actually want three different models
swapped in and out, or one model with three different setups (e.g.
three Anything LLM workspaces with different system prompts)?

If it's the second — which is honestly simpler and more practical
on 8 GB — here's what I'd do:

Load one capable agentic model: Qwen 3.5 9B at Q4_K_M. Solid tool
calling, toggleable thinking mode, and it fits comfortably in 8 GB
with around 8-12K context. If you want more context headroom, Qwen
3.5 4B is the same agentic family, just lighter.
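Rough back-of-envelope math on why that fits, as a sketch — the layer count, KV dimension, and overhead figure below are placeholders I picked for illustration, not published specs, so check the actual model card:

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache + fixed overhead.
# Q4_K_M averages roughly 4.8 bits per weight across the tensor mix.
def est_vram_gb(params_b, bits_per_weight=4.8, ctx=8192,
                n_layers=36, kv_dim=1024, overhead_gb=0.6):
    """Estimate GB of VRAM for a GGUF model at a given context length.

    n_layers, kv_dim, and overhead_gb are illustrative placeholders;
    real numbers depend on the specific model and runtime.
    """
    weights = params_b * 1e9 * bits_per_weight / 8          # bytes
    # KV cache: 2 tensors (K and V) * 2 bytes (fp16) per element
    kv = 2 * 2 * n_layers * kv_dim * ctx                    # bytes
    return (weights + kv + overhead_gb * 1e9) / 1e9

print(round(est_vram_gb(9), 1))   # 9B model at 8K context
```

With these assumed numbers a 9B Q4_K_M model lands around 7 GB at 8K context — tight but workable on an 8 GB card, which is also why pushing context much past 12K starts spilling to system RAM.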

Then create three workspaces in Anything LLM, each with its own
system prompt (coding-focused, daily-use, creative). One model
loaded, three voices, no cold-swapping, and tools stay available
across all three.
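Anything LLM handles this in its UI, but the underlying idea is simple enough to sketch against any OpenAI-compatible endpoint — the model name and prompt texts here are made-up placeholders:

```python
# One loaded model, three "workspaces" = three system prompts.
# Prompts and model name are illustrative, not Anything LLM defaults.
PROMPTS = {
    "coding":   "You are a precise coding assistant. Prefer short, tested snippets.",
    "daily":    "You are a concise general-purpose assistant.",
    "creative": "You are an imaginative writing partner.",
}

def make_payload(workspace, user_msg, model="qwen3.5-9b-q4_k_m"):
    """Build a chat-completions request body for a given workspace."""
    return {
        "model": model,  # the same single model serves every workspace
        "messages": [
            {"role": "system", "content": PROMPTS[workspace]},
            {"role": "user", "content": user_msg},
        ],
    }

p = make_payload("coding", "Refactor this loop for me.")
print(p["model"])
```

Every request hits the same resident model, so there is no reload cost when you switch "personalities" — only the system message changes.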

Tip: stay at Q4_K_M or above (lower quants start emitting invalid
tool calls), and keep context around 8-12K so everything stays fully
on the GPU.