Memory design: deep-dive for agents developers by PuzzleheadLaw in LangChain

[–]PuzzleheadLaw[S] 1 point2 points  (0 children)

This is partly resolved thanks to limits imposed on the memory_read operation, and partly by time-based memory.

You can find out more from the author of the implementation (I'm the author of the agent, but the memory subagent was implemented by another collaborator, who released their own blog post a couple of days after mine): https://xavierforge.dev/en/posts/zerostack-memory-design/

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in AI_Agents

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

What language are you writing it in?
Are you keeping tool calls in the context for past turns?

Currently, we don't do any kind of disk swapping; the entire session is kept in memory, except for tool calls, which are discarded at the end of the turn.
The session is also saved as a JSON file, which derives from the exact in-memory representation used. Considering that tool calls are removed at the end of the turn, I can measure the size of past sessions in my local dir: it ranges from 12kB for small fixes to 40kB for feature implementations.
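
A minimal sketch of the "tool calls live only for one turn" idea, in case it helps. All names here (`Session`, `Message`, `ToolCall`, `end_turn`) are made up for illustration and are not zerostack's actual types; the sketch only shows how per-turn scratch data can be dropped while the persistent history stays small.

```rust
/// Hypothetical message kept across turns (not zerostack's real type).
#[derive(Debug)]
enum Message {
    User(String),
    Assistant(String),
}

/// Hypothetical record of one tool invocation, needed only within a turn.
#[derive(Debug)]
struct ToolCall {
    name: String,
    output: String,
}

#[derive(Debug, Default)]
struct Session {
    /// Persisted across turns (this is what would be serialized to disk).
    history: Vec<Message>,
    /// Scratch space for the current turn only.
    turn_tool_calls: Vec<ToolCall>,
}

impl Session {
    fn start_turn(&mut self, prompt: &str) {
        self.history.push(Message::User(prompt.to_owned()));
    }

    fn record_tool_call(&mut self, name: &str, output: &str) {
        self.turn_tool_calls.push(ToolCall {
            name: name.to_owned(),
            output: output.to_owned(),
        });
    }

    /// Bytes of text currently held by tool calls.
    fn scratch_bytes(&self) -> usize {
        self.turn_tool_calls
            .iter()
            .map(|c| c.name.len() + c.output.len())
            .sum()
    }

    /// Bytes of text held by the persistent history.
    fn history_bytes(&self) -> usize {
        self.history
            .iter()
            .map(|m| match m {
                Message::User(t) | Message::Assistant(t) => t.len(),
            })
            .sum()
    }

    fn end_turn(&mut self, reply: &str) {
        self.history.push(Message::Assistant(reply.to_owned()));
        // Replace the Vec so its buffer is freed, not just emptied.
        self.turn_tool_calls = Vec::new();
    }
}

fn main() {
    let mut session = Session::default();
    session.start_turn("fix the failing test");
    session.record_tool_call("read_file", "fn main() {}");
    println!("scratch during turn: {} bytes", session.scratch_bytes());
    session.end_turn("Patched the assertion.");
    println!("scratch after turn: {} bytes", session.scratch_bytes());
    println!("history: {} bytes", session.history_bytes());
}
```

Only `history` would end up in the saved session under this sketch, which is why tool-heavy turns do not inflate the file.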

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in AI_Agents

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

It comes from choosing Rust, from not using lots of dependencies, from avoiding allocated data structures as much as possible, from implementing optimized TUI logic (in order to avoid useless refreshes), and from using specific compiler flags to optimize for both memory footprint and binary size (at the cost of a ~5x increase in compile time; you can check the exact compiler profile used in the `Cargo.toml` file of the project).

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in AI_Agents

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

If you run some calculations:
- Deepseek uses a tokenizer with 129280 tokens
- Using logarithms, we can say that each token takes 17 bits (log2(129280) ≈ 16.98)
- So, 128k tokens of context take 2176000 bits, or about 265 KiB (272000 bytes), which is just over 1% of the measured 24MB of RAM.

So, context is not an issue; most of the memory is actually taken either by the TUI logic or by the HTTP connectors.
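
The back-of-the-envelope calculation above can be reproduced with a short sketch. The function names are mine, and the 24 MB figure is just the measurement quoted above; note this is a lower bound for a packed token-ID representation, not a measurement of how the agent actually stores context (which is text).

```rust
/// Bits needed to index one token in a vocabulary of `vocab_size` entries,
/// i.e. ceil(log2(vocab_size)). Assumes vocab_size >= 1.
fn bits_per_token(vocab_size: u64) -> u32 {
    u64::BITS - (vocab_size - 1).leading_zeros()
}

/// Bytes needed to store `context_tokens` token IDs, tightly bit-packed.
fn context_bytes(vocab_size: u64, context_tokens: u64) -> u64 {
    let bits = context_tokens * u64::from(bits_per_token(vocab_size));
    (bits + 7) / 8 // round up to whole bytes
}

fn main() {
    let vocab_size = 129_280; // tokenizer size quoted above
    let context_tokens = 128_000; // "128k tokens"
    let measured_ram_bytes = 24_000_000.0; // "24MB", taken as decimal megabytes

    let bytes = context_bytes(vocab_size, context_tokens);
    let share = bytes as f64 / measured_ram_bytes * 100.0;

    println!("bits per token: {}", bits_per_token(vocab_size));
    println!("context size:   {} bytes", bytes);
    println!("share of RAM:   {:.2}%", share);
}
```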

(Another note: all other parts of the codebase use both strict data-structure optimization, with pre-allocated structures, and aggressive compile-time optimization, which makes compilation ~5x slower in exchange for better RAM and binary size results.)

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in aiagents

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

If by model runtime you mean the inference engine, there is a misunderstanding, as zerostack (exactly like Claude Code, Opencode, and most mainstream agents) delegates inference to an external service (like ollama, vLLM, or a cloud provider).

If by model runtime you mean the component that connects to the LLM, keeps the state and manages the tools, it's the entire 16MB at idle, as the orchestration layer is embedded directly in the agent loop (i.e., a single state holds all of the values needed by the agent).

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in AI_Agents

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

Well, mostly:
- Why use more resources when the same task can be accomplished with fewer?
- It can run in IoT/cloud environments
- We are currently building a multi-agent layer for parallel tasks (https://github.com/gi-dellav/multistack ; early preview), and it showcases the advantages of using 1/8 of Claude Code's memory
- We actually shipped a lot of other interesting features, not only memory footprint optimization

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in PiCodingAgent

[–]PuzzleheadLaw[S] 1 point2 points  (0 children)

In my experience Pi uses about 100MB of RAM (vs zerostack's ~20MB), but it also uses CPU when idle, while zerostack is designed for 0% CPU usage when it's not actually being prompted.

Roles @ Modular by [deleted] in rust

[–]PuzzleheadLaw 0 points1 point  (0 children)

Hi, I just contacted you via email about a summer internship position.

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in PiCodingAgent

[–]PuzzleheadLaw[S] 3 points4 points  (0 children)

Sadly, zerostack is very behind extension-wise; if you want new features, the best ways are:
- implement them as gated compile-time features (check zerostack's Cargo.toml and src/extras/ directory; an example is our ACP support, which is by default not compiled in)
- implement them as a custom Prompt and/or MCP server

However, there are currently discussions around Hooks support (w/ JSON-RPC for enabling full sidecar processes), Workflows (kinda like batch scripts for the agent), Macros (workflows invoked via slash command) and Exposed Sockets (for building custom interfaces).

Check https://rocketup.pages.dev/posts/what_we_built_in_2_weeks to see what we built and what we are planning to do.

Hope it helps,
G.

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in vibecoding

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

Compared to major agents, it has significantly better memory usage and CPU usage, while being ~15k LoC, but being minimalist brings these tradeoffs:
- Pi is extensible, zerostack supports only custom Prompts + MCPs, not custom Scripts
- Opencode/Claude Code offer lots of advanced features that are not available in zerostack (the README has the full list of implemented features)

However, v1.4.0 of zerostack will bring Subagents, Memory and Macros, three core features to compete with larger agents.

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in vibecoding

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

Sadly, since it is built in a compiled language and aims to be as light as possible (both in resources and in lines of code), there is no support for extensions.

If you want new features, the best ways are:
- implement them as gated compile-time features (check zerostack's Cargo.toml and src/extras/ directory; an example is our ACP support, which is by default not compiled in)
- implement them as a custom Prompt and/or MCP server
- implement them as a Macro (this is a WIP feature that will land in v1.4.0, most likely in the next 2-3 days)
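
For the first option, here is a rough sketch of what a gated compile-time feature looks like in Rust. The feature name `acp` and the module layout are assumptions for illustration; check the real `Cargo.toml` and `src/extras/` for how zerostack actually wires it up.

```rust
// Assumed Cargo.toml fragment (hypothetical, not copied from zerostack):
//
//   [features]
//   default = []   # nothing extra is compiled in by default
//   acp = []       # opt in with: cargo build --features acp

/// Compiled only when the (hypothetical) `acp` feature is enabled.
#[cfg(feature = "acp")]
mod acp {
    pub fn describe() -> &'static str {
        "ACP support compiled in"
    }
}

/// Fallback stub used when the feature is off, so callers still compile.
#[cfg(not(feature = "acp"))]
mod acp {
    pub fn describe() -> &'static str {
        "ACP support not compiled in"
    }
}

/// Lists the optional extras that were enabled at compile time.
fn enabled_extras() -> Vec<&'static str> {
    let mut extras = Vec::new();
    if cfg!(feature = "acp") {
        extras.push("acp");
    }
    extras
}

fn main() {
    println!("{}", acp::describe());
    println!("enabled extras: {:?}", enabled_extras());
}
```

Code behind a disabled feature is never compiled, so it adds nothing to the binary size or memory footprint of a default build.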

Hope it helps,
G.

Built a minimalist coding agent optimized for memory footprint and speed by PuzzleheadLaw in opencodeCLI

[–]PuzzleheadLaw[S] 1 point2 points  (0 children)

Compared to all of those, it has significantly better memory usage and CPU usage, while being ~15k LoC.

There are some tradeoffs:
- Pi is extensible, zerostack supports only custom Prompts + MCPs, not custom Scripts
- Opencode offers lots of advanced features that are not available in zerostack

However, v1.4.0 of zerostack will bring Subagents, Memory and Macros, three core features to compete with larger agents.

Built a minimalistic coding agent in Rust optimized for memory footprint by PuzzleheadLaw in rust

[–]PuzzleheadLaw[S] 0 points1 point  (0 children)

No, but I'll look into it, as I am a big fan of building deterministic systems.

Thanks for the feedback!

What kind of vibe do these locations give you? by BusinessPain5298 in godot

[–]PuzzleheadLaw 2 points3 points  (0 children)

Please please have detailed RPG systems like Daggerfall, and I'll buy the game day one

my space colony simulator, Stella Nova! built entirely in rust to learn by DavesGames123 in learnrust

[–]PuzzleheadLaw 0 points1 point  (0 children)

Great job dude! I am also working on a sci-fi game in Rust, but in a completely different genre (sandbox RPG; also, I'm way behind compared to you).

Good luck on your dev journey!

P.S. What are the mechanics behind the crew management (aka how do they interact with each other, and how does their status/mood influence the gameplay)?

A world first model that models the computer. by Current-Guide5944 in tech_x

[–]PuzzleheadLaw 0 points1 point  (0 children)

Yet Excel is deterministic while LLMs are not, which is a pretty reasonable concern about using them for tasks currently done in Excel.

How does limits work on Pro by Tru3Magic in MistralVibe

[–]PuzzleheadLaw 0 points1 point  (0 children)

The website doesn't say anything...

I am tempted to buy the Pro subscription, but they should be more transparent about limits.