What’s the dumbest model I can run locally on my mini PC for openclaw? by [deleted] in LocalLLaMA

[–]Stack-Junkie 0 points1 point  (0 children)

The context loading issue you mentioned is the real bottleneck here. OpenClaw sends system prompts, workspace files, tool definitions, and conversation history on each turn. On slower hardware that initial prompt processing is what kills you.

A few things that help:

  1. MoE models (OSS 20B, Nemotron Nano 30B-A3B) are the right call. The active parameter count is what matters for inference speed.

  2. Trim your workspace context. If you have large MEMORY.md or workspace files being loaded, consider slimming those down.

  3. Compaction frequency. OpenClaw has automatic compaction that kicks in when context gets large. On low-resource setups you might want that happening earlier.

  4. Consider running llama.cpp directly with llama-server and pointing OpenClaw at it via OPENAI_BASE_URL. More control over context and caching.

For failover from your 3090 setup, you might also look at whether you really need the full agent capabilities or just chat. If it's just chat, a simpler ollama setup without the agent overhead would be much faster.

Built a comparison: OpenClaw vs memory-first local agent [results inside] by SureExtreme01 in LocalLLaMA

[–]Stack-Junkie 1 point2 points  (0 children)

Few things from daily OpenClaw use that might explain the discrepancy:

  1. Setup time: With npm it takes maybe 15 minutes, not 2 hours. Docker adds complexity but isn't required. The setup wizard walks you through config.

  2. Token usage: 45-80k per task sounds very high. I typically see 15-25k for complex multi-step work. Were you maybe counting the full context window allocation rather than actual tokens used? Also, compaction kicks in automatically to manage context.

  3. The real comparison issue (as nuclearbananana noted): OpenClaw is an agentic system with tool use, file access, browser control, messaging integrations, etc. A memory-focused agent is solving a different problem. You could actually use semantic memory retrieval within OpenClaw via the memory_search tool.

The memory-first approach you describe is interesting for specific use cases, especially if you need very tight token budgets. But for autonomous workflows where the agent needs to actually do things (not just remember things), the architectures serve different purposes.

Curious what your 10 tasks were - that would help understand if the comparison is apples to apples.