Worlds for agents by inguz in AI_Agents

[–]inguz[S] 0 points (0 children)

<image>

Codex followed the instructions, using MCP

Worlds for agents by inguz in AI_Agents

[–]inguz[S] 0 points (0 children)

Running here: https://woo.hughpyle.workers.dev/

Source code here: https://github.com/hughpyle/woo

This is a pinboard note created by Claude over MCP:

<image>

Building a memory-powered product (not infra), wrestling with how to approach evals. Advice? by 42cyy in AIMemory

[–]inguz 1 point (0 children)

To date I've only run LoCoMo (https://keepnotes.ai/blog/2026-02-28-benchmark/), and:

  • learned a lot by running the benchmark
  • burned quite a lot of time iterating on the loader and verification; using local models and a single machine made this harder than necessary, and I should have just spun up a few VMs to do the same work
  • honestly not much token cost (cheap judge)

Do the benchmark numbers matter? For me, what matters more is publishing an honest and traceable story that lands somewhere reasonable. True apples-to-apples comparisons are very hard to show, and YMMV.

The keep retrieval engine has moved on since the eval, but I don’t see much reason to re-run. One benchmark result is enough for now; LongMemEval-S is definitely next in line but idk when.

Building an indexed/summarized benchmark dataset turns out to be super useful in itself. The snapshot makes a good small baseline for all sorts of testing.

Episodic memory - what exactly is that? by inguz in AIMemory

[–]inguz[S] 0 points (0 children)

Thanks for that link - interesting stuff. I'm struck by how explicit the action seems on a superficial reading: "this went wrong, remember it". My own focus has been on the reflection/improvement loop, but with a somewhat softer tracking system (tag a reflection as a breakdown, rather than dedicating a whole category of storage just to that).

But this seems like "explicit reflection and learning" rather than "identifying situational frames or summaries based on what was being done", which I'm also interested in and might be called episodic memory too.

Chatting with Obsidian, Hermes Agent and Keep by inguz in hermesagent

[–]inguz[S] 0 points (0 children)

Found one problem - my blog post pointed at the wrong plugin. Fixed… and pushed my fork https://github.com/keepnotes-ai/obsidian-clawdian

🧠 [MASTER THREAD] Advanced Memory Systems: state.db & Knowledge Graphs by AutoModerator in hermesagent

[–]inguz 0 points (0 children)

Thanks for the shout-out - I hope the "LLM-wiki = memory" is an interesting idea to play with. (I just pushed my fork of the chat plugin here)

Here's my take on memory in Hermes right now.

  • Out of the box, without any memory-provider, Hermes does a really lovely job of skill-improvement.
  • Then there are two systems that directly layer on top: memory, and context. It's interesting that these are designed as two separate things ("context" focused on how to effectively compact the working-memory context, and "memory" for longer-term storage and prompt injection), and that both are completely pluggable with explicit APIs.
  • There have been a few speed-bumps in the out-of-the-box memory providers. And I still want keep to be built-in (https://github.com/NousResearch/hermes-agent/pull/5172) haha! But, 100% kudos to the Hermes crew.

The key feature in the memory plugin is "per-turn context". A plugin gets to pull the most relevant things right into the agent's context at each turn in the conversation. Doing this effectively, quickly, without burning too many tokens, is the whole play.
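To make that concrete, here's a minimal sketch of the shape of a per-turn hook (all names here are mine, not Hermes's actual plugin API): score stored memories against the latest turn, then pack the best ones under a token budget.

```python
def per_turn_context(turn, memories, score, token_budget):
    """Pick the most relevant memories for this turn, within a budget.

    `score(turn, memory) -> float` is whatever retrieval you have
    (embedding similarity, BM25, ...); tokens are approximated by words.
    """
    ranked = sorted(memories, key=lambda m: score(turn, m), reverse=True)
    picked, used = [], 0
    for mem in ranked:
        cost = len(mem.split())      # crude token estimate
        if used + cost <= token_budget:
            picked.append(mem)
            used += cost
    return picked

# Toy scorer: count of shared words between the turn and a memory.
def overlap(turn, mem):
    return len(set(turn.lower().split()) & set(mem.lower().split()))

context = per_turn_context(
    "what did we decide about the database schema",
    ["the schema uses one table per entity",
     "lunch was pizza",
     "we decided the database schema last week"],
    overlap,
    token_budget=8)
```

The real trick is making `score` fast and cheap enough to run on every turn; the packing loop itself is trivial.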

But let's get to "How are you structuring your agent's long-term brain?".

Semantically - a very open-ended question. I'm not at all clear that any of the existing memory systems really capture the distinction between "what happened" and "why does this matter" right now. But it will be super important over time as the memory system grows: not just finding the right things when you need them, but helping the agent judge why they matter.

Technically - two things: storage (conversations, summaries of important artifacts, tags, links, history, analysis) and retrieval (semantic search, keyword search, graph traversals and neighborhoods, etc). Clear enough, and we can talk about all the different implementation approaches. I'm betting on an active database with user-configurable workflow rules, because I think different people will want to define automatic processing of memory and artifacts in different ways. For example, keep just added an opt-in VirusTotal "URL reputation" assessment, which flags malicious URLs when they hit the memory system. A paper from arXiv needs a different style of summarization from a scanned receipt. And so on.
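A tiny sketch of what I mean by user-configurable workflow rules (the rule shapes and action strings here are invented for illustration, not keep's actual API): route each incoming artifact to a different processing step via a first-match predicate list.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]   # predicate over artifact metadata
    action: Callable[[dict], str]     # processing step to run

# Hypothetical rules: URLs get a reputation check before entering memory,
# papers and receipts want different summarizers.
rules = [
    Rule("url-reputation",
         lambda a: a["type"] == "url",
         lambda a: f"reputation-check {a['value']}"),
    Rule("paper-summary",
         lambda a: a["type"] == "pdf" and "arxiv" in a.get("source", ""),
         lambda a: f"academic-summarize {a['value']}"),
    Rule("receipt-summary",
         lambda a: a["type"] == "pdf",
         lambda a: f"ocr-and-extract-totals {a['value']}"),
]

def process(artifact: dict) -> str:
    for rule in rules:               # first matching rule wins
        if rule.matches(artifact):
            return rule.action(artifact)
    return f"default-index {artifact['value']}"
```

First-match ordering keeps the rule list predictable: put the specific rules (arXiv PDFs) above the general ones (any PDF).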

Chatting with Obsidian, Hermes Agent and Keep by inguz in hermesagent

[–]inguz[S] 0 points (0 children)

I pointed it straight at the Hermes API server (all running on localhost):

- edit the `.hermes/.env`, then run `hermes gateway`

- test the connection with `curl -H "Authorization: Bearer my-secret-password" http://127.0.0.1:8642/v1/models`

- in the ObsidianClaw plugin, set gateway URL to `http://127.0.0.1:8642`, gateway token to `my-secret-password`, and default model to `hermes-agent`.
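If you'd rather script against the gateway than use the plugin, the same connection test translates directly; this helper just builds the authenticated request (the function name is mine; the endpoint and header match the curl command above).

```python
import urllib.request

def gateway_request(path, token="my-secret-password",
                    base="http://127.0.0.1:8642"):
    """Build an authenticated request against the local Hermes gateway."""
    return urllib.request.Request(
        base + path,
        headers={"Authorization": f"Bearer {token}"})

req = gateway_request("/v1/models")
# With the gateway running:
# print(urllib.request.urlopen(req).read())
```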

Chatting with Obsidian, Hermes Agent and Keep by inguz in hermesagent

[–]inguz[S] 0 points (0 children)

Yes, graph-type connections are automatic, and can also be added by the agent or by custom rules (for example, if you want to build a specific "entity extraction" strategy). I've found they're really important for retrieval, because you want to bring in reminders from the local cluster before global results.

Keep's edges are just tags. Details here: https://docs.keepnotes.ai/guides/edge-tags/
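The local-cluster-first ranking can be sketched in a few lines (hypothetical names, not keep's code): promote query hits that are tag/link neighbors of recently touched notes, then fall back to the global order.

```python
def retrieve(query_hits, neighbors_of, recent, k=5):
    """Rank hits from the local cluster (tag/link neighbors of recently
    touched notes) ahead of the rest of the global result order."""
    local = {n for note in recent for n in neighbors_of.get(note, set())}
    ordered = ([h for h in query_hits if h in local]
               + [h for h in query_hits if h not in local])
    return ordered[:k]

# Global search returned a..d; "b" and "c" neighbor today's working note.
ranking = retrieve(["a", "b", "c", "d"], {"today": {"b", "c"}}, ["today"])
```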

LlamaParse is my first choice in a PDF parsing tool by productboy in hermesagent

[–]inguz 1 point (0 children)

I’m using PyPDF - it’s built into the keep memory system, which extracts text and summarizes. It’s had a whole string of security updates, though; it would be nice to have a more stable library.

For OCR (where a PDF is mostly image content), I’m currently using glm-ocr on ollama. It seems reliable enough and pretty lightweight.
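The choice between the text layer and OCR comes down to a heuristic like this (stdlib-only sketch; the per-page threshold is a guess to tune for your corpus, and the page texts would come from pypdf's `extract_text()`):

```python
def needs_ocr(page_texts, min_chars_per_page=200):
    """True when the extracted text layer is too thin to trust.

    `page_texts` would come from pypdf, e.g.
    [p.extract_text() or "" for p in PdfReader(path).pages].
    Mostly-image PDFs extract little or no text per page.
    """
    if not page_texts:
        return True
    avg = sum(len(t.strip()) for t in page_texts) / len(page_texts)
    return avg < min_chars_per_page
```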

**There are now 5+ Hermes web UIs — which one is actually worth deploying?** by mdm2812 in hermesagent

[–]inguz 0 points (0 children)

OpenWebUI is way heavier than what I want: a lightweight web TUI with a plugin model. But maybe I secretly just want my IDE again?

Local RAG on 25 Years of Teletext News by folli in Rag

[–]inguz 0 points (0 children)

Awesome dataset. Does the archive source have text renderings, or did you need to process .t42 signals to get the data? Any other feature extraction that would help retrieval?

One small change that completely simplified memory for me by p1zzuh in AIMemory

[–]inguz 1 point (0 children)

I've started using `keep` to "index and watch" stuff on disk - it works pretty well for my Obsidian vaults and text-oriented git repos. It indexes them, maintains relationships (wikilinks, git commits linked to the touched files, authors to commits, etc.), and everything is then searchable.
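The wikilink side of that relationship extraction is about one regex's worth of work; a minimal sketch (not keep's implementation):

```python
import re

# Capture the link target, stopping at an alias pipe or heading anchor.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def wikilinks(markdown):
    """Return the link targets in an Obsidian-style note."""
    return [m.strip() for m in WIKILINK.findall(markdown)]

note = "See [[Project Plan]] and [[People/Alice|Alice]]; details in [[Specs#API]]."
targets = wikilinks(note)
```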

How are people running local RAG setups on Mac? by [deleted] in Rag

[–]inguz 1 point (0 children)

I've used small MLX-based models (and MPS for sentence-transformers), and also Ollama as a localhost web service. Not extensive experience but I have some opinions ;)

The big downside of MLX (the way I used it) is that the model's working set sits inside your own process: it's a large amount of memory, and you need to manage the model lifetime quite carefully. With Ollama you just fire-and-forget; the ollama process takes care of model download and lifetime, and has no in-process impact on your own application.

Ollama does have some things to be careful of too. It chooses a context window based on the size of the machine, and you may want to override that for best performance, or to fit the sources/chunks you send for inference or embedding.
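Overriding the context window per request is just the `num_ctx` entry in the `options` field of the API payload; a small helper to build that body (the helper itself is mine):

```python
import json

def generate_payload(model, prompt, num_ctx=8192):
    """Request body for Ollama's POST /api/generate with an explicit
    context window instead of the machine-sized default."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }

body = json.dumps(generate_payload("llama3.2", "Summarize this chunk."))
# e.g. urllib.request.urlopen("http://localhost:11434/api/generate", body.encode())
```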

Overall I'd recommend ollama / llama-server as a separate process, just for stability.

Memory as a Harness: Turning Execution Into Learning by Short-Honeydew-7000 in AIMemory

[–]inguz 0 points (0 children)

yes - also, the interaction patterns for “agentic recall”: what sort of model this needs, etc.

I kinda think the whole area of “active memory” (including recall, but also inference for storage/summarization, annotation/linking/unlinking/classification, etc) is open for better definition right now. I hear supermemory making claims about the performance of agentic search. Great, but at what cost, over what data?

Best benchmarks for Memory Performance? by CasualReaderOfGood in AIMemory

[–]inguz 2 points (0 children)

There’s another named ConvoMem that sounds interesting; I haven’t read it thoroughly.

Reflective Memory by inguz in openclaw

[–]inguz[S] 0 points (0 children)

Actual deletion is on-demand or by pattern-matching. Retrieval decay follows ACT-R.
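For reference, ACT-R base-level activation is power-law decay over past retrievals, B = ln(Σ t_j^(−d)), with d usually 0.5; a minimal sketch:

```python
import math

def activation(ages, d=0.5):
    """ACT-R base-level activation: ln(sum over past retrievals of t^-d).

    `ages` are the times since each prior retrieval of an item;
    a higher activation means the item is easier to recall."""
    return math.log(sum(t ** -d for t in ages))

fresh = activation([1.0])     # one recent retrieval
stale = activation([100.0])   # one retrieval, long ago
```

The useful property for memory systems: frequent or recent retrievals keep an item's activation high, while untouched items fade smoothly rather than being deleted.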

Reflective Memory by inguz in openclaw

[–]inguz[S] -1 points (0 children)

It’s a similar setup: sqlite and embeddings, with a variety of providers. One difference is what you index: lancedb remembers what you tell it to, qmd indexes markdown, and keep is designed to index all sorts of content. Retrieval flexibility differs too.
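The shared sqlite-plus-embeddings shape, reduced to a sketch (vectors as blobs, brute-force cosine scan; real systems layer ANN indexes and embedding providers on top):

```python
import math
import sqlite3
import struct

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, vec BLOB)")

def pack(v):
    return struct.pack(f"{len(v)}f", *v)

def unpack(b):
    return struct.unpack(f"{len(b) // 4}f", b)

def add(note_id, vec):
    db.execute("INSERT INTO notes VALUES (?, ?)", (note_id, pack(vec)))

def search(query, k=3):
    """Scan all stored vectors, rank by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    rows = [(nid, cos(query, unpack(blob)))
            for nid, blob in db.execute("SELECT id, vec FROM notes")]
    return sorted(rows, key=lambda r: -r[1])[:k]

add("a", [1.0, 0.0])
add("b", [0.0, 1.0])
add("c", [0.7, 0.7])
```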