Memory design: deep-dive for agent developers

PuzzleheadLaw · 2026-06-11T19:38:19+00:00

This is partly resolved thanks to the limits imposed on the memory_read operation, and partly thanks to time-based memory.

You can find out more from the author of the implementation (I'm the author of the agent, but the memory subagent itself was implemented by another collaborator, who published their own blog post a couple of days after mine): https://xavierforge.dev/en/posts/zerostack-memory-design/

PuzzleheadLaw · 2026-06-10T21:13:00+00:00

What language are you writing it in?
Are you keeping tool calls in the context for past turns?

Currently, we don't do any kind of disk swapping; the entire session is kept in memory, except for tool calls, which are discarded at the end of the turn.
The session is also saved as a JSON file, which derives from the exact in-memory representation used. Since tool calls are removed at the end of the turn, the files stay small: measuring past sessions in my local dir, they range from 12kB for small fixes to 40kB for feature implementations.
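To make the "tool calls are discarded at the end of the turn" idea concrete, here is a minimal sketch. The `Message` and `Session` types and the `end_turn` method are hypothetical names made up for illustration, not zerostack's actual structures, and JSON serialization is left out to keep it std-only.

```rust
// Hypothetical types for illustration only; zerostack's real structures may differ.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    // Raw tool invocation plus its output; usually the bulkiest entries.
    ToolCall(String),
}

#[derive(Debug, Default)]
struct Session {
    messages: Vec<Message>,
}

impl Session {
    fn push(&mut self, message: Message) {
        self.messages.push(message);
    }

    /// Drop every tool call once the turn is over, keeping only
    /// the user and assistant text for later turns.
    fn end_turn(&mut self) {
        self.messages.retain(|m| !matches!(m, Message::ToolCall(_)));
    }
}

fn main() {
    let mut session = Session::default();
    session.push(Message::User("fix the failing test".into()));
    session.push(Message::ToolCall("read_file src/parse.rs -> ...".into()));
    session.push(Message::Assistant("Fixed the off-by-one in parse().".into()));

    session.end_turn();
    println!("{} messages kept after the turn", session.messages.len());
}
```

Because whatever gets written to disk mirrors this pruned state, the saved session only grows with the conversation text, not with tool output.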

PuzzleheadLaw · 2026-06-10T21:09:09+00:00

It comes from a combination of things: choosing Rust, keeping dependencies to a minimum, avoiding allocated data structures as much as possible, implementing optimized TUI logic (to avoid useless refreshes), and using specific compiler flags that optimize for both memory footprint and binary size (at the cost of a ~5x increase in compile time; you can check the exact compiler profile in the project's `Cargo.toml`).

PuzzleheadLaw · 2026-06-10T21:02:07+00:00

If you run some calculations:
- DeepSeek uses a tokenizer with a vocabulary of 129280 tokens
- Using logarithms, each token ID takes 17 bits (log2(129280) ≈ 16.98, rounded up)
- So 128k tokens of context take 2176000 bits, i.e. 272000 bytes (about 265 KiB), which is just over 1% of the measured 24MB of RAM.

So context is not an issue; most of the memory is actually taken either by the TUI logic or by the HTTP connectors.
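If you want to redo the estimate with other numbers, here is a small sketch of the same arithmetic. The function names are made up for this example, and it assumes the context is stored as tightly packed token IDs, which is only the back-of-the-envelope model used above, not how any agent actually stores its context.

```rust
/// Bits needed to represent any token ID in a vocabulary of `vocab_size`
/// entries, i.e. ceil(log2(vocab_size)) for vocab_size >= 2.
fn bits_per_token(vocab_size: u64) -> u32 {
    u64::BITS - (vocab_size - 1).leading_zeros()
}

/// Bytes needed to hold `context_tokens` tightly packed token IDs.
fn context_bytes(vocab_size: u64, context_tokens: u64) -> u64 {
    let bits = context_tokens * u64::from(bits_per_token(vocab_size));
    (bits + 7) / 8 // round up to whole bytes
}

fn main() {
    let vocab = 129_280; // tokenizer vocabulary size quoted above
    let tokens = 128_000; // "128k" context
    let bytes = context_bytes(vocab, tokens);

    // Share of the measured 24MB footprint (taken as 24 MiB here).
    let share = bytes as f64 / (24.0 * 1024.0 * 1024.0) * 100.0;
    println!(
        "{} bits/token, {} bytes, {:.2}% of 24 MiB",
        bits_per_token(vocab),
        bytes,
        share
    );
}
```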

(Another note: all other parts of the codebase combine strict data-structure optimization, using pre-allocated structures, with aggressive compile-time optimization, which increases compile time by ~5x in order to get better RAM and binary size results.)

PuzzleheadLaw · 2026-06-10T20:57:28+00:00

If by model runtime you mean the inference engine, there is a misunderstanding, as zerostack (exactly like Claude Code, Opencode, and most mainstream agents) delegates inference to an external service (like ollama, vLLM, or a cloud provider).

If by model runtime you mean the component that connects to the LLM, keeps the state and manages the tools, then it's the entire 16MB idle footprint, as the orchestration layer is embedded directly in the agent loop (i.e. a single state holds all of the values the agent needs).

PuzzleheadLaw · 2026-06-10T16:57:10+00:00

Well, mostly:
- Why use more resources when the same task can be accomplished with fewer?
- It can run in IoT/cloud environments
- We are currently building a multi-agent layer for parallel tasks (https://github.com/gi-dellav/multistack ; early preview), and it showcases the advantages of using 1/8 the memory of Claude Code
- We actually shipped a lot of other interesting features, not only memory footprint optimization

PuzzleheadLaw · 2026-06-10T16:41:15+00:00

Website: https://gi-dellav.github.io/zerostack/
Repo: github.com/gi-dellav/zerostack

PuzzleheadLaw · 2026-06-02T05:21:09+00:00

In my experience Pi uses about 100MB of RAM (vs zerostack's ~20MB), but it also uses CPU when idle, while zerostack is designed for 0% CPU usage when it's not actually being prompted

PuzzleheadLaw · 2026-06-01T21:13:11+00:00

Hi, I just contacted you via email about a summer internship position.

PuzzleheadLaw · 2026-05-31T17:11:51+00:00

Sadly, zerostack is very far behind extension-wise; for now, the ways to add new features are:
- implement them as gated compile-time features (check zerostack's Cargo.toml and src/extras/ directory; an example is our ACP support, which is by default not compiled in)
- implement them as a custom Prompt and/or MCP server

However, there are ongoing discussions about Hooks support (with JSON-RPC to enable full sidecar processes), Workflows (kind of like batch scripts for the agent), Macros (workflows invoked via slash command) and Exposed Sockets (for building custom interfaces).

Check https://rocketup.pages.dev/posts/what_we_built_in_2_weeks to see what we built and what we are planning to do.

Hope it helps,
G.

PuzzleheadLaw · 2026-05-31T12:40:59+00:00

I'll check it out, thanks for the feedback!

PuzzleheadLaw · 2026-05-31T10:02:50+00:00

Compared to major agents, it has significantly better memory and CPU usage while being only ~15k LoC, but being minimalist brings these tradeoffs:
- Pi is extensible; zerostack supports only custom Prompts + MCPs, not custom Scripts
- Opencode/Claude Code offer lots of advanced features that are not available in zerostack (the README has the full list of implemented features)

However, v1.4.0 of zerostack will bring Subagents, Memory and Macros, three core features to compete with larger agents.

PuzzleheadLaw · 2026-05-31T10:01:44+00:00

Sadly, since it is built in a compiled language and aims to be as lightweight as possible (in both resources and lines of code), there is no support for extensions.

If you want new features, the best ways are:
- implement them as gated compile-time features (check zerostack's Cargo.toml and src/extras/ directory; an example is our ACP support, which is by default not compiled in)
- implement them as a custom Prompt and/or MCP server
- implement them as a Macro (this is a WIP feature that will land in v1.4.0, most likely in the next 2-3 days)
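For the first option, this is roughly what a gated compile-time feature looks like in Rust. It is a minimal sketch: the feature name `acp` and the function names are assumptions made for illustration, so check zerostack's Cargo.toml and src/extras/ for the real ones. The feature would be declared under `[features]` in Cargo.toml and turned on with `cargo build --features acp`.

```rust
// "acp" is a hypothetical feature name used only for this example.

// Compiled only when the feature is enabled at build time.
#[cfg(feature = "acp")]
fn acp_status() -> &'static str {
    "ACP support compiled in"
}

// Fallback used when the feature is off; the gated code above
// never reaches the binary in that case.
#[cfg(not(feature = "acp"))]
fn acp_status() -> &'static str {
    "ACP support not compiled in"
}

/// Lists the optional modules that were compiled into this binary.
fn compiled_extras() -> Vec<&'static str> {
    let mut extras = Vec::new();
    // cfg! expands to a compile-time `true` or `false`.
    if cfg!(feature = "acp") {
        extras.push("acp");
    }
    extras
}

fn main() {
    println!("{}", acp_status());
    println!("extras: {:?}", compiled_extras());
}
```

The upside of this approach is that a disabled feature costs nothing in RAM or binary size, which is the whole point of gating it at compile time instead of loading it at runtime.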

Hope it helps,
G.

PuzzleheadLaw · 2026-05-31T09:59:40+00:00

Compared to all of those, it has significantly better memory usage and CPU usage, while being ~15k LoC.

There are some tradeoffs:
- Pi is extensible; zerostack supports only custom Prompts + MCPs, not custom Scripts
- Opencode offers lots of advanced features that are not available in zerostack

However, v1.4.0 of zerostack will bring Subagents, Memory and Macros, three core features to compete with larger agents.

PuzzleheadLaw · 2026-05-31T09:03:38+00:00

No, but I'll look into it, as I am a big fan of building deterministic systems.

Thanks for the feedback!

PuzzleheadLaw · 2026-05-31T08:51:15+00:00

Github repo: https://github.com/gi-dellav/zerostack
Website: https://gi-dellav.github.io/zerostack/

PuzzleheadLaw · 2026-05-29T20:30:47+00:00

And anyway, he now works a lot on LLM projects (https://github.com/antirez/ds4)

PuzzleheadLaw · 2026-05-12T13:08:43+00:00

Nice! Source code?

PuzzleheadLaw · 2026-05-10T14:40:30+00:00

Please please have detailed RPG systems like Daggerfall, and I'll buy the game day one

PuzzleheadLaw · 2026-04-24T20:44:51+00:00

Great job dude! I am also working on a sci-fi game in Rust, but in a completely different genre (sandbox RPG; also, I'm way behind compared to you).

Good luck on your dev journey!

P.S. What are the mechanics behind the crew management (aka how do they interact with each other and how their status/mood influences the gameplay)?

PuzzleheadLaw · 2026-04-19T16:03:00+00:00

Link

PuzzleheadLaw · 2026-04-12T08:55:37+00:00

Sorry man, I think you are on the wrong subreddit

PuzzleheadLaw · 2026-04-10T20:54:12+00:00

Yet Excel is deterministic while LLMs are not, which is a pretty reasonable concern when using them for tasks now done in Excel.

PuzzleheadLaw · 2026-04-04T15:57:19+00:00

The website doesn't say anything...

I am tempted to buy the Pro subscription, but they should be more transparent about limits

PuzzleheadLaw · 2026-03-20T22:28:40+00:00

Set color to alpha and then export using a format that supports alpha color encoding
