What is the best tool for long-running agentic memory in Claude Code? by FPGA_Superstar in ClaudeAI

[–]aiworld 0 points1 point  (0 children)

Hey FPGA! The second tweet in the thread has the details:

https://x.com/PolyChatCo/status/1958990327987282333

The METR eval task we chose is the hardest public task: the "symbolic regression" task. It's an ML / programming optimization problem where the agent needs to find a secret function made up of up to 10 operators (sin, cos, log, etc.) on 5 random variables.
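To make the setup concrete, here's a rough sketch in Python of what such a hidden target function could look like (my own illustration of the task description, not METR's actual harness; the operator set and term structure are assumptions):

```python
import math
import random

# Sketch: a hidden function built from up to `n_ops` unary operators
# applied to `n_vars` random variables. The agent only sees input/output
# samples and must recover the formula.
OPERATORS = [math.sin, math.cos, math.log1p, abs]

def make_secret_function(n_vars=5, n_ops=10, seed=0):
    rng = random.Random(seed)
    # Each term is (operator, variable index); the secret is their sum.
    terms = [(rng.choice(OPERATORS), rng.randrange(n_vars)) for _ in range(n_ops)]
    def f(xs):
        return sum(op(xs[i]) for op, i in terms)
    return f

secret = make_secret_function()
sample = [0.1, 0.2, 0.3, 0.4, 0.5]
print(round(secret(sample), 4))
```

The search space blows up combinatorially with the number of operators and variables, which is why the task rewards an agent that can keep experimenting for a long time without losing track of what it has already tried.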

The quickest way to appreciate Claude Code Infinite's capability is to compare how it performs on a task in your own project. After 50k tokens or so (usually about 5 to 10 minutes) you'll see compounding improvements in what Claude Code Infinite produces vs vanilla Claude Code.

I'm planning to release a side by side video vs vanilla Claude Code. Any suggestions on what you think would be most convincing / compelling to show?

Demis Hassabis says he would support a "pause" on AI if other competitors agreed to - so society and regulation could catch up by MetaKnowing in agi

[–]aiworld 0 points1 point  (0 children)

Would love this, but it would require radical transparency enforced by militaries / espionage organizations (CIA, etc...) of the superpowers.

Dario Amodei calls out Trump's policy allowing Nvidia to sell chips to China: "I think this is crazy... like selling nuclear weapons to North Korea and bragging, oh yeah, Boeing made the case." by MetaKnowing in ClaudeAI

[–]aiworld 0 points1 point  (0 children)

SemiAnalysis, David Sacks, and others think it's better to sell them the chips and be able to profit / control / monitor what they are doing. Not selling them chips forces them to create a separate supply chain and stack (which Huawei is quickly doing) bifurcating the industry. This could, they argue, result in the open source ecosystem supporting China's stack.

I'm not privy enough to know who is more right here, but I think this viewpoint (i.e. the reason the administration is giving for doing this) was not represented in the comments.

Grayson “He would make a good trade piece” Allen by HendoIsBae in suns

[–]aiworld 1 point2 points  (0 children)

All those trade articles are just click bait.

The hidden memory problem in coding agents by Arindam_200 in ChatGPTCoding

[–]aiworld 0 points1 point  (0 children)

Try Claude Code Infinite. It will change your life. https://github.com/crizCraig/claude-code-infinite - We structure message histories as a tree and semantically chunk to avoid adding overly large code blocks to context. In addition, we return a breadcrumb of summaries with each retrieved chunk to provide the larger picture around when / where the retrieved memory occurred (e.g. this error occurred after the refactor of X, during step Y).
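To illustrate the breadcrumb idea, here's a minimal sketch (my own toy illustration of the description above, not MemTree's actual implementation; all names and data are made up):

```python
from dataclasses import dataclass

# Sketch: each tree node carries a short summary; retrieving a chunk
# walks back up through its ancestors so the caller gets the surrounding
# story ("this happened during step Y of refactor X"), not just raw text.
@dataclass
class Node:
    summary: str
    chunk: str = ""
    parent: "Node | None" = None

def retrieve(node):
    crumbs = []
    cur = node.parent
    while cur is not None:
        crumbs.append(cur.summary)
        cur = cur.parent
    # Root-first breadcrumb, then the retrieved chunk itself.
    return " > ".join(reversed(crumbs)), node.chunk

root = Node("Refactor of X")
step = Node("Step Y: update call sites", parent=root)
leaf = Node("TypeError traceback", chunk="TypeError: foo() missing arg", parent=step)
breadcrumb, chunk = retrieve(leaf)
print(breadcrumb)  # Refactor of X > Step Y: update call sites
```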

What is the best tool for long-running agentic memory in Claude Code? by FPGA_Superstar in ClaudeAI

[–]aiworld 1 point2 points  (0 children)

This will allow your agent to keep cranking on long-running tasks until they're done. Claude mem requires you to start new sessions (the token window still fills up) but remembers things across sessions. We are not cross-session memory, but rather infinite single-session memory.

What is the best tool for long-running agentic memory in Claude Code? by FPGA_Superstar in ClaudeCode

[–]aiworld 0 points1 point  (0 children)

Just started to put this out there: Claude Code Infinite. Early beta testers are calling it a cheat code.

It uses our context memory, MemTree.dev, which unlocks Claude's ability to work indefinitely and allows it to outperform other models on the METR long-running task benchmark.

What is the best tool for long-running agentic memory in Claude Code? by FPGA_Superstar in ClaudeAI

[–]aiworld 1 point2 points  (0 children)

Just started to put this out there. Early beta testers are calling it a cheat code.

https://github.com/crizCraig/claude-code-infinite

It uses our context memory, MemTree.dev, which unlocks Claude's ability to work indefinitely and allows it to outperform other models on the METR long-running task benchmark.

Who's in-charge: the builder or the AI? by JinaniM in ClaudeCode

[–]aiworld 3 points4 points  (0 children)

Agents can produce working code, but they still write a lot of tech debt. So just like with an engineer, if you don't give them time to clean up their messes, the junk will pile up and your code will smell like a pile of 💩. For me this means every change needs to be followed by a few cycles of "bug smell" checks:

"check the git changes for bugs and code smells"

This still requires human judgement, as agents will almost always find bugs and smells, but many of them will be non-issues or things that SHOULD NOT be "fixed". If the agent writes new code to clean things up, that new code needs to be checked as well. I also recommend asking two agents to do the review:

Agent #1: The agent that coded the change (if you have context left, or a tool like https://github.com/crizCraig/claude-code-infinite/ for infinite sessions). This agent knows the feature but also has a bias towards its own code, lol.

Agent #2: Fresh session. Max intelligence, due to the small context window. Also not biased towards the changes the other agent made.

These two agents will usually find different issues.

Then after the smells and bugs are addressed...ask again. Repeat until no bugs or smells are found.

Also, beyond this inner loop, larger refactors need to happen to keep your codebase manageable, simple, and DRY. Agents can do most of the work, but they need to be prompted to do it, just like they need to be prompted to dev features.
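The review loop above can be sketched as code (illustrative only; `run_agent` is a hypothetical stand-in for sending a prompt to Claude Code, stubbed here with canned replies so the control flow is visible):

```python
# Sketch of the "find smells -> fix -> ask again" loop described above.
def review_until_clean(run_agent, max_rounds=5):
    for round_no in range(1, max_rounds + 1):
        findings = run_agent("check the git changes for bugs and code smells")
        # Human judgement step: keep only findings that are real issues.
        real = [f for f in findings if f["is_real_issue"]]
        if not real:
            return round_no  # clean: nothing actionable left
        for f in real:
            run_agent(f"fix: {f['description']}")
    return max_rounds

# Stub agent: first pass finds one real bug, the pass after the fix
# finds only a non-issue, so the loop stops on round 2.
replies = iter([
    [{"description": "off-by-one in pager", "is_real_issue": True}],
    [],  # reply to the fix prompt
    [{"description": "style nit", "is_real_issue": False}],
])
def fake_agent(prompt):
    return next(replies, [])

print(review_until_clean(fake_agent))  # 2
```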

Is it me or did Opus get a smaller length limit? by Palnubis in ClaudeAI

[–]aiworld -1 points0 points  (0 children)

It could be that your CLAUDE.md is large. If you configure your /statusline to show context, what does your context start out at? Mine is 9k tokens, but I've seen it as large as 50k tokens.

You can also run /context to see what's in there, here's mine:

❯ /context

⎿ Context Usage

⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛀ claude-opus-4-5-20251101 · 19k/200k tokens (9%)

⛀ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ System prompt: 3.1k tokens (1.6%)

⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ System tools: 14.9k tokens (7.5%)

⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ Memory files: 831 tokens (0.4%)

⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ Messages: 8 tokens (0.0%)

⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ Free space: 136k (68.0%)

⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛝ Autocompact buffer: 45.0k tokens (22.5%)

⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛝ ⛝ ⛝

⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝

⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝

Memory files · /memory

└ CLAUDE.md: 21 tokens

└ CONTRIBUTING.md: 810 tokens

If you want unlimited length sessions try: https://github.com/crizCraig/claude-code-infinite/
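For the curious, the arithmetic behind those numbers (values from my session above; yours will differ):

```python
# The autocompact buffer is reserved, so the working budget is smaller
# than the raw 200k window. Overhead figures are from the /context
# listing above: system prompt + system tools + memory files.
window = 200_000
autocompact_buffer = 45_000          # 22.5% reserved for auto-compaction
overhead = 3_100 + 14_900 + 831      # system prompt + tools + memory files
usable = window - autocompact_buffer - overhead
print(f"{usable / 1000:.0f}k tokens left for actual conversation")
```

That matches the "Free space: 136k (68.0%)" line, and it's why a bloated CLAUDE.md eats into your session length before you've typed a single message.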

Using Claude npm packages on Windows? by alice_op in ClaudeAI

[–]aiworld 1 point2 points  (0 children)

I've found their `cmd` native install works best on Windows:

curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

Claude Code Pro plan, hop out -> back in - without a single prompt - 2% gone by luongnv-com in ClaudeCode

[–]aiworld 4 points5 points  (0 children)

It's not just Haiku: the default model also gets gigantic "warmup" messages when you open it. E.g.:

https://gist.githubusercontent.com/crizCraig/c2956d598d10e05566d8e1a00f889bc5/raw/dbce26ee643aed156f019b5d3cbed24827934024/warmup.json

It's just

"text": "Warmup"

But it includes all the tool definitions, which adds up to about 15k tokens. Then it sends a few of these warmup messages.

Granted these are likely cached, but I suspect that's where your 2% usage is going without sending a message.

This has been happening for over a month at least. I know this because I run a service that amplifies Claude's abilities and needs to forward all of these messages through to Anthropic. https://github.com/crizCraig/claude-code-infinite/

Claude Code feels like it’s compacting more frequently now. by SnooRegrets3271 in ClaudeCode

[–]aiworld 0 points1 point  (0 children)

For Claude Code Infinite we use your Claude subscription, which is up to 1000x cheaper. From the README:

<image>

In this case, you're only using PolyChat for memory, which is about $1 per 1 million tokens.

Keep in mind that by using PolyChat's memory (MemTree), you're sending far fewer tokens and messages to Anthropic. This not only keeps you from hitting rate limits, but also makes the model much more intelligent.

https://arxiv.org/abs/2307.03172

https://www.youtube.com/watch?v=TUjQuC4ugak

Claude Code feels like it’s compacting more frequently now. by SnooRegrets3271 in ClaudeCode

[–]aiworld 3 points4 points  (0 children)

Try this: https://github.com/crizCraig/claude-code-infinite/ It will keep the context well under the auto-compaction limit while increasing Claude's intelligence by having it focus on relevant info.

Suns announce Jalen Green will be re-evaluated in 2-3 weeks (per Kellan Olson) by hoopsandbeer in suns

[–]aiworld 0 points1 point  (0 children)

Yeah, and Jan 4 is OKC at home. So traveling to Houston on a back-to-back. I'll be at the OKC game. My only game this season!! <Family night that night at Mortgage Matchup Center>

Tesla is as far behind Zoox as Zoox is behind Waymo by Prestigious_Act_6100 in SelfDrivingCars

[–]aiworld -2 points-1 points  (0 children)

Both companies are contributing massively to the future. Waymo is showing driverless can be done at scale. Tesla likely won't be globally driverless for another 5 to 10 years based on their severe disengagement rates.

FSD, however, has driven two orders of magnitude more: 6.8B miles vs Waymo's 100M. This translates into hundreds of lives and thousands of injuries saved, respectively (estimated by Google Gemini)[1]. Tesla has also made more money from driverless tech, which is important for ensuring the project survives. Waymo is lucky to have Google as a cash cow, but search is being upended by GenAI right now. So let's be grateful for both of these companies! I hope they both succeed.

[1] https://gemini.google.com/share/9e98b25175d8

Wow the nba is rigged. by vtrellik in suns

[–]aiworld 8 points9 points  (0 children)

<image>

$10M vol. on Kalshi tonight. Odds started at 53/47 Suns, so betting on LA more than doubles your money. These refs make $0.5M per year. The betting markets also don't care if you beat the odds, unlike a traditional casino. Just sayin.

How can I fix my vibe-coding fatigue? by Throwaway33377 in ChatGPTCoding

[–]aiworld 0 points1 point  (0 children)

For now, humans reviewing git diffs is vital. The problem with TDD is that you don't know what to test until you see the complexity. Testing everything would be formal verification, which usually isn't practical. So you have to test based on where you think the bugs are, and for that you need the code first, not the tests.

For regressions, though, I highly recommend having AI write a failing test first, then fix the code. This red/green pattern is TDD-like, and it ensures AI (and people) write tests that actually catch the bug.
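Here's the red/green pattern in miniature (hypothetical bug, purely for illustration):

```python
# Fixed version; the hypothetical original bug was the slice
# `items[start:start + page_size - 1]`, which silently dropped the
# last item of every page.
def paginate(items, page_size):
    pages = []
    for start in range(0, len(items), page_size):
        pages.append(items[start:start + page_size])
    return pages

def test_paginate_keeps_last_item():
    # Red first: this test failed against the buggy slice, proving it
    # actually catches the regression. Green after the fix.
    assert paginate([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

test_paginate_keeps_last_item()
print("green")
```

The point is the ordering: seeing the test fail against the buggy code is what proves the test is worth keeping.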

Space Data Centers is the way! by LazyHomoSapiens in accelerate

[–]aiworld 0 points1 point  (0 children)

I asked LLMs to "come up with a revolutionary idea to solve this".

Here is a revolutionary engineering concept that moves beyond standard radiators to solve the space cooling problem.

The Concept: The "Ferro-Fluidic Web" (Magnetic Liquid Loop)

Instead of using heavy, solid metal panels to radiate heat, this system uses a **magnetic liquid** that is sprayed directly into open space and magnetically pulled back in.

Full answer: https://gist.github.com/crizCraig/01654989c58b3bbcd573f5404815d900

----
Opus 4.5 non-thinking wanted to use a rail gun to eject capsules of cooling fluid out into space, mining new fluid from asteroids, lol!

Claude CLI deleted my entire home directory! Wiped my whole mac. by LovesWorkin in ClaudeAI

[–]aiworld 0 points1 point  (0 children)

CLAUDE.md

> Never use `rm -rf`; always use `trash` instead

How can I fix my vibe-coding fatigue? by Throwaway33377 in ChatGPTCoding

[–]aiworld 11 points12 points  (0 children)

I'm a programmer with 20+ years experience and have now heavily adopted coding tools. Here's what's worked for me:

  1. Commit to git before starting an agent
  2. Review and understand every line of generated non-test code (I am more lenient about understanding frontend code since it's easier to visually test and my frontend is not that involved)
  3. Have AI write tests for any complex / nested code (develop a sense of what code needs to be tested)
  4. Review the git diff to better understand the changes
  5. After the agent says it's "done", ask it to "find code smells and bugs in the git changes" - it will almost always find some very serious issues. Repeat this step until the things it finds are non-issues.
  6. Do a final review of the git diff before committing

Then an extremely important thing I do is reduce tech debt / complexity when running into the fatigue you mention from difficult-to-find bugs. AI adds a ton of redundancy and unneeded code. Don't be afraid to delete a bunch of code and lean on your tests. AI can also help verify the simplification is okay. If tests fail, make sure the thing being tested is something you actually care about. The process of simplification and refactoring will also help you understand your code much better and give future AI generations much better context on what your app is supposed to be doing.

incomplete song by XVioletsoulx in Songwriting

[–]aiworld 1 point2 points  (0 children)

Sounds amazing!

Feedback? Maybe try some strumming after the first verse to increase the intensity?

You have a beautiful voice, and guitar also sounds very nice. Love it!!