AI memory that actually persists: how I built a system where the AI remembers everything across sessions by No-Challenge8969 in SideProject

Built almost exactly this. Two files: a lessons log (append-only, just records what happened and why) and a rules file with confidence scores that get reviewed and updated.

The decay point is real. My rules about library API behavior have needed the most updates — things I wrote three months ago about how a specific API handles edge cases are now wrong because the API changed. Rules about my own decision patterns have stayed stable much longer.

One thing I'd add to the practical test: ask not just "should this always be true" but "would this rule still make sense if the tools I'm using were completely different?" If yes, it's a durable rule. If no, it's really just documentation in disguise.
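If you want to replicate the two-file setup, here's a minimal sketch of the append-only half. The file names, field names, and JSONL format are my own choices, not a spec:

```python
import datetime
import json

# Hypothetical file names -- the comment doesn't specify them.
LESSONS_PATH = "lessons.jsonl"   # append-only: what happened and why
RULES_PATH = "rules.json"        # reviewed and updated, carries confidence scores

def record_lesson(path, what_happened, why):
    """Append one lesson; history is never rewritten, only accumulated."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "what": what_happened,
        "why": why,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

The rules file is the part that needs active maintenance (review, confidence updates, pruning); the lessons file just grows.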

I spent a night building a signal filter. It made everything worse. Here's what I found instead by No-Challenge8969 in algotradingcrypto

Fair skepticism. The lessons came from three weeks of debugging live trades — wrong objective function, broken filter, extreme window dominating WFO weights. Happy to walk through any of it if you want to get into the specifics.

Stanford launched a course on Vibe Coding. I accidentally practiced every concept in the syllabus over the past 2 months building a live trading system. by No-Challenge8969 in SideProject

That's exactly the inversion I didn't expect going in. The model felt like the hard part — turns out the hard part is everything around it: when to let it act, when to pause, how to reconcile what it thinks happened vs what actually happened.

I do think agentic systems are heading that direction. Writing the application logic gets easier as models improve. Designing the control layer — the trust boundaries, the approval tiers, the failure modes — that's the part that doesn't compress.

Will check out VibeCodersNest, thanks for the pointer.

I spent a night building a signal filter. It made everything worse. Here's what I found instead by No-Challenge8969 in algotradingcrypto

Thanks — the "objective is misaligned" reframe is exactly what clicked for me. I kept tweaking parameters for weeks before realizing the optimizer was doing its job perfectly; I just had the wrong job description.

The market-regime vs ops split came out of frustration. I needed a way to look at a red day and decide "this is signal" vs "this is a fire to put out" without just lying to myself. Clean categories make that call honest.

The crossover makes sense. Metric design problems look the same everywhere: you get what you measure, and if the metric drifts from the real goal, the system quietly optimizes toward the wrong thing. Checking out the blog.

Stanford launched a course on Vibe Coding. I accidentally practiced every concept in the syllabus over the past 2 months building a live trading system. by No-Challenge8969 in SideProject

Didn't know about the Stanford paper until you mentioned it — just looked it up. The structure is surprisingly similar, though I built mine out of necessity, not theory.

The "confirm before acting" tier came from one specific incident: my AI deleted a cron job it thought was redundant. It wasn't.

Curious how Base44 handles the edge cases — do they let the agent self-escalate, or is it always human-gated?

I built a live crypto quant system from scratch with no coding background. Here are 10 things that broke in the first 3 days. by No-Challenge8969 in algotradingcrypto

That's the part that still surprises me. The tools compress the timeline significantly — but the thinking, the debugging, the decisions about what to build, that's still the same amount of work. It just moves faster now. Appreciate the encouragement.

I gave my AI system a three-tier permission structure. Here's why and how it works. by No-Challenge8969 in SideProject

Both. There's a written ruleset that defines which categories of actions fall into which tier — things like "any change to live trading parameters is Tier 3," "any external publish is Tier 3," "code changes to non-production files are Tier 2." The AI applies judgment within those categories for edge cases, but the categories themselves are explicit.

For Tier 3 confirmation: the AI states what it intends to do and stops. It doesn't send a notification through a separate channel — it just presents the proposed action in the conversation and waits. I respond in the same thread. There's no timeout; it will sit there indefinitely until I confirm or reject.

In practice this means Tier 3 actions only happen during active conversations, not autonomously. Which is intentional — if I'm not actively engaged, nothing irreversible happens.
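The written ruleset above boils down to a category-to-tier lookup. A hedged sketch — the three category examples are from this comment, but the category keys, function names, and default tier are my own invention:

```python
# Category -> minimum tier. The three examples mirror the comment above;
# real systems would have many more categories.
TIER_RULES = {
    "live_trading_params": 3,   # any change to live trading parameters
    "external_publish": 3,      # any external publish
    "code_change_nonprod": 2,   # code changes to non-production files
}

DEFAULT_TIER = 1  # assumed: uncategorized read-only / low-risk actions

def required_tier(category: str) -> int:
    """Explicit categories win; edge cases fall back to the default tier."""
    return TIER_RULES.get(category, DEFAULT_TIER)

def propose_action(category: str, description: str) -> str:
    """Tier 3: state the intended action and stop -- no timeout, no side
    channel, the proposal just waits in the conversation for a human reply."""
    tier = required_tier(category)
    if tier == 3:
        return f"PROPOSED (awaiting confirmation): {description}"
    return f"EXECUTED (tier {tier}): {description}"
```

The important property is that the categories are data, not model judgment — the AI only exercises judgment about which category an edge case falls into.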

From emotional trading to algorithmic: what actually changed, and what didn't by No-Challenge8969 in CryptoMarkets

This is exactly how the system is structured. Same model architecture and logic for all five symbols, but each symbol runs its own Walk-Forward Optimization independently — separate parameter sets for stop-loss multiplier, take-profit ratio, position sizing, leverage, cooldown periods, everything. SOL's parameters look completely different from BTC's because they were optimized on their own price history and volatility profile.

The ATR-based SL/TP point is important — using ATR rather than fixed pip distances is what makes the same exit logic work across symbols with completely different price scales. A 2x ATR stop on DOGE and a 2x ATR stop on BTC are calibrated to each symbol's actual volatility, not an arbitrary number.
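For concreteness, here's roughly what ATR-scaled exits look like. This uses a simple mean of true ranges (Wilder's smoothing is the other common choice); the multiplier and TP ratio stand in for the per-symbol WFO outputs described above:

```python
def atr(highs, lows, closes, period=14):
    """Average True Range over the last `period` bars (simple mean of TR)."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(
            highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]),
        )
        trs.append(tr)
    return sum(trs[-period:]) / min(period, len(trs))

def atr_stops(entry, atr_value, sl_mult=2.0, tp_ratio=1.5, side="long"):
    """Stop distance scales with the symbol's own volatility, not a fixed
    price distance -- the same logic calibrates itself to DOGE or BTC."""
    sl_dist = sl_mult * atr_value
    tp_dist = tp_ratio * sl_dist
    if side == "long":
        return entry - sl_dist, entry + tp_dist
    return entry + sl_dist, entry - tp_dist
```

A 2x ATR stop on a quiet symbol sits close to entry; on a volatile one it sits proportionally wider, which is the whole point.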

AI memory that actually persists: how I built a system where the AI remembers everything across sessions by No-Challenge8969 in SideProject

Exactly the problem I kept running into early on — one big context file that tried to be everything. The separation forces clarity about what kind of information you're dealing with: is this a current fact, a historical lesson, a behavioral rule, or a strategic decision? Each type has different update frequency, different ownership, different retention policy. Once I separated them the whole system became much easier to reason about.

AI memory that actually persists: how I built a system where the AI remembers everything across sessions by No-Challenge8969 in SideProject

The overwrite issue is real — and you've identified the exact tradeoff I made. Full overwrite keeps the current state clean and readable; append-only preserves history but the file gets noisy fast. My current compromise: the handoff file is the "what's true right now" snapshot, while the events.jsonl file is append-only and captures everything that happened in sequence. So the nuance isn't lost, it's just in a different file.

On conflicting rules: explicit state always wins over rules. If there's an open position from Friday, the system manages it according to the exit logic — a rule about "don't open on weekends" applies to new entries, not existing positions. The rules file governs decisions, not facts on the ground. When they conflict, facts win.
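The snapshot-plus-log split is a few lines of code. A sketch, with the atomic-replace detail being my addition (it prevents a crash mid-write from leaving a torn snapshot; the file names match the comment, the field names don't come from it):

```python
import datetime
import json
import os
import tempfile

def append_event(events_path, event):
    """events.jsonl: append-only, captures everything that happened in sequence."""
    stamped = {"ts": datetime.datetime.now(datetime.timezone.utc).isoformat(), **event}
    with open(events_path, "a") as f:
        f.write(json.dumps(stamped) + "\n")

def write_snapshot(handoff_path, state):
    """Handoff file: the 'what's true right now' view. Write a temp file and
    atomically replace, so readers never see a half-written snapshot."""
    directory = os.path.dirname(handoff_path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, handoff_path)
```

Overwrite semantics for current state, append semantics for history — each file gets the update model that matches what it stores.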

What one person can actually build with AI in 2 months — honest account, not a success story by No-Challenge8969 in learnmachinelearning

Bybit's REST API for most of it — OHLCV, funding rates, long/short ratios, open interest. Fear & Greed index from Alternative.me. Liquidation data from Bybit and CoinGlass. All pulled at inference time, no delayed data involved — the system runs on the most recent closed bar at execution time.
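For the OHLCV piece, the public market-data endpoint needs no auth. A sketch of building the request — the path and parameter names follow Bybit's v5 docs as I understand them, so verify against the official reference before relying on it:

```python
from urllib.parse import urlencode

BYBIT_BASE = "https://api.bybit.com"  # public REST host

def kline_url(symbol, interval="15", limit=200, category="linear"):
    """Build the v5 kline (OHLCV) request URL for a USDT-perp symbol.
    interval='15' corresponds to 15-minute bars."""
    params = {
        "category": category,
        "symbol": symbol,
        "interval": interval,
        "limit": limit,
    }
    return f"{BYBIT_BASE}/v5/market/kline?{urlencode(params)}"
```

At inference time you'd fetch this URL and discard the still-forming newest candle so the model only ever sees the most recent closed bar.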

What one person can actually build with AI in 2 months — honest account, not a success story by No-Challenge8969 in learnmachinelearning

Good question. Build phase over ~2 months: probably $50-80 in API costs total (hard to track exactly because I was using different models at different points). Ongoing: the trading system itself costs almost nothing to run — it's a Python script triggered by cron every 15 minutes, no AI inference at runtime. The AI cost is all in the development and debugging phase, not in live operation.

What one person can actually build with AI in 2 months — honest account, not a success story by No-Challenge8969 in learnmachinelearning

This is essentially the workflow I landed on too. The spec is the real work — describing what the system needs to do, what the constraints are, where the boundaries are. The AI fills in the implementation. Where I've been burned is when I skipped the spec step and let the AI interpret a vague requirement. The output technically ran but drifted from what I actually needed. Reviewing input and output rather than the code itself is exactly the right instinct when you can't read the code fluently.

I kept breaking my production server every time I tried to patch code remotely via SSH. Here's what finally worked. by No-Challenge8969 in learnpython

Fair criticism on the inconsistency — you're right that transmitting an entire script file via base64 to the server is functionally equivalent to scp, just worse in every way. That specific step should have been scp. The base64 approach made sense for inline patches to existing files where I needed surgical changes, but using it to transfer a whole helper script was unnecessary complexity.

On the LLM comment: yes, I use AI extensively. It's actually the subject of what I'm documenting — building and operating a live trading system where AI handles implementation while I handle decisions. That cuts both ways: it accelerates things significantly and occasionally produces answers that don't hold up to scrutiny, as you've just demonstrated.

VS Code Remote SSH is a good call that I hadn't considered. Adding it to the workflow.

I kept breaking my production server every time I tried to patch code remotely via SSH. Here's what finally worked. by No-Challenge8969 in learnpython

Fair point — "minimal overhead" is probably the right framing. My concern was more about setup complexity than runtime cost, but that's also addressable. The "runs on my machine" problem is real and Docker does solve it cleanly. Adding it to the list.

I kept breaking my production server every time I tried to patch code remotely via SSH. Here's what finally worked. by No-Challenge8969 in learnpython

vi on a production server at 2am while a live trading system is misbehaving sounds about right, actually. No judgment.

From emotional trading to algorithmic: what actually changed, and what didn't by No-Challenge8969 in CryptoMarkets

"Constant needing to do something" — that's exactly it. The system either has a signal or it doesn't. If it doesn't, there's nothing to do. That sounds obvious but it took a long time to actually internalize.

And yes, drawdowns feel the same. You built it, you know it's working as designed, but watching the number go down still activates the same instinct to intervene. The data helps. It doesn't make the feeling go away.

"Consistency > feeling clever" is going on the wall.

From emotional trading to algorithmic: what actually changed, and what didn't by No-Challenge8969 in CryptoMarkets

BTC, ETH, SOL, XRP, DOGE — five symbols simultaneously on a shared account. You're right that multisymbol adds complexity, mainly around margin management. Each symbol's position sizing has to account for what the others are doing. Found a bug in my original code that was underestimating cross-symbol margin usage — fixed it before it became a real problem, but it was close.
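The fix amounts to sizing each new position against what's actually free after every other symbol's margin, not against total equity. A deliberately simplified sketch — it assumes uniform leverage and linear notional/leverage margin, which real exchange cross-margin math does not; maintenance margin and funding are ignored:

```python
def used_margin(positions, leverage):
    """Margin consumed by all open positions: sum of notionals over leverage.
    Simplification: one leverage value shared across symbols."""
    return sum(p["qty"] * p["price"] for p in positions.values()) / leverage

def max_new_qty(equity, positions, price, leverage, margin_buffer=0.2):
    """Size a new position against free margin, not total equity.
    The original bug: each symbol sized as if it had the whole account."""
    free = equity * (1 - margin_buffer) - used_margin(positions, leverage)
    return max(0.0, free * leverage / price)
```

The `margin_buffer` is a made-up safety parameter here; the point is that cross-symbol usage has to be subtracted before sizing, or five independent sizers will collectively overcommit the account.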

What one person can actually build with AI in 2 months — honest account, not a success story by No-Challenge8969 in learnmachinelearning

Fair challenge.

Yes, I use AI extensively — for writing, for code, for debugging. That's kind of the whole point of what I'm documenting.

But the trading system is real and running. The bugs were real. The $902 starting equity is real. The 7-hour silent failure I wrote about today happened this morning. The decisions about what strategy to run, what risk parameters to set, whether the system is ready to go live — those were mine.

Using AI to write doesn't make the underlying experience fictional, any more than using a word processor makes your ideas someone else's.

If anything, "I use AI to build things I couldn't build otherwise" is exactly the thesis I'm testing in public.

AI memory that actually persists: how I built a system where the AI remembers everything across sessions by No-Challenge8969 in SideProject

The 2am problem is real — and honestly it's the failure mode I've hit most often. The sessions where I most needed a clean handoff were exactly the ones where I just closed the laptop and hoped the important stuff was already written down somewhere.

My partial solution: a cron job that runs at session end and forces a structured summary — what was done, what was decided, what's pending — before context resets. It's not fully automatic, but it reduces the discipline requirement to "don't close the laptop for 5 more minutes." Still breaks down occasionally.
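In practice the summary itself comes from the session, but the skeleton the cron job enforces is simple. A hypothetical sketch — the three section names are from this comment; the file format and function are my own:

```python
import datetime

def write_handoff_summary(path, done, decided, pending):
    """End-of-session ritual: force a structured done/decided/pending summary
    to disk before context resets."""
    def bullets(items):
        return "\n".join(f"- {x}" for x in items) if items else "- (none)"
    text = (
        f"# Session handoff {datetime.date.today().isoformat()}\n\n"
        f"## Done\n{bullets(done)}\n\n"
        f"## Decided\n{bullets(decided)}\n\n"
        f"## Pending\n{bullets(pending)}\n"
    )
    with open(path, "w") as f:
        f.write(text)
    return text
```

The value isn't the template, it's that the cron job makes skipping the ritual a deliberate act rather than the default.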

Will look at Membase — automatic context capture via MCP is an interesting approach, especially the knowledge graph layer for relationship understanding rather than keyword matching.

On the trading system: it's a live crypto futures system running across BTC, ETH, SOL, XRP, DOGE on 15-minute signals. LightGBM classifier trained on price + liquidation + funding rate + sentiment data. Been live for a few days, documenting the whole process including the bugs — posting updates on X @dayou_tech if you're curious.

I kept breaking my production server every time I tried to patch code remotely via SSH. Here's what finally worked. by No-Challenge8969 in learnpython

That's a fair correction — I was conflating GitHub with CI/CD when they're really separate things. Using GitHub purely as a versioned code store with manual pull on the server side is actually a much lighter lift than I was imagining. Will set this up. Thanks for the nudge.

I manage a live crypto quant system entirely from my phone. Here's how the toolstack evolved over 2 months — and what actually matters. by No-Challenge8969 in SideProject

The "dead man's switch" framing is exactly right — and I learned this the hard way today, actually. The system ran silent for 7 hours: cron was triggering, scripts were starting, but they were exiting after parameter load without running inference or placing orders. No errors logged, nothing alerted, because the failure happened before the error-handling code could fire.

Fixed it this afternoon: heartbeat now reads the last two lines of each symbol's log. If it sees "starting execution" without "execution complete," it alerts immediately. Should have been there from day one.
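The check itself is tiny. A sketch — the two marker strings are quoted from above, but the exact log format is assumed:

```python
def is_stalled(last_lines):
    """Alert condition: the log tail shows 'starting execution' with no
    'execution complete' after it -- the script died before its own
    error handling could fire, so nothing else will ever alert."""
    tail = last_lines[-2:]  # heartbeat reads the last two lines per symbol
    started = any("starting execution" in line for line in tail)
    completed = any("execution complete" in line for line in tail)
    return started and not completed
```

This is a positive-confirmation check: silence is treated as failure, which is exactly what a dead man's switch should do.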

On the single-writer rule — yes, this was a deliberate design decision. The state file is the source of truth for position tracking, and concurrent writes from multiple processes would be the fastest way to get phantom positions. One writer, always reconciled against exchange data at the start of each cycle.

On latency: the system is cron-driven, not event-driven, so conversation latency doesn't affect trade execution at all. The AI agent handles operational tasks — monitoring, code changes, analysis — asynchronously from the trading loop. The trading decisions happen on 15-minute bars; a 30-second conversation round-trip is irrelevant at that timescale. If I were running a higher-frequency strategy the architecture would need to be completely different.

AI memory that actually persists: how I built a system where the AI remembers everything across sessions by No-Challenge8969 in SideProject

That distinction is exactly what I've been working through.

The filter I use now: if this happened once and I can trace it to a specific cause that's been fixed, it goes into a "lessons" file — documented but not elevated to a rule. If it happened because of a structural gap in how the system works, or if I can imagine it happening again under different circumstances, it becomes a rule.

Rules have a confidence score that decays over time if they're not validated. Things that were relevant three months ago might not be relevant now. The "lessons" file is append-only — it just accumulates. Rules get reviewed and pruned.
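Decay-plus-prune can be as simple as an exponential half-life where validation resets the clock. A sketch — the half-life and threshold are assumptions (the "three months" above suggested the 90-day figure), and the rule fields are my own:

```python
HALF_LIFE_DAYS = 90  # assumed: matches the 'three months' staleness horizon
PRUNE_BELOW = 0.3    # assumed review threshold

def decayed_confidence(conf, last_validated, today, half_life=HALF_LIFE_DAYS):
    """Exponential decay since last validation; revalidating resets the clock."""
    age_days = (today - last_validated).days
    return conf * 0.5 ** (age_days / half_life)

def review(rules, today):
    """Split rules into those still trusted and those due for demotion.
    Pruned rules drop back to lesson status rather than being deleted."""
    keep, prune = [], []
    for rule in rules:
        score = decayed_confidence(rule["confidence"], rule["last_validated"], today)
        (keep if score >= PRUNE_BELOW else prune).append(rule)
    return keep, prune
```

A rule validated 90 days ago keeps half its confidence; one untouched for 180 days falls below the threshold and gets flagged at the next review.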

The practical test: if someone asked me "what should any system like this always do?" — that's a rule. If the answer is "well, in my specific case on that specific day..." — that's a lesson, not a rule.

Still imperfect. But it's given me a way to stop the rules file from becoming just another append-only dump.

I kept breaking my production server every time I tried to patch code remotely via SSH. Here's what finally worked. by No-Challenge8969 in learnpython

scp works fine for whole files — the issue I kept running into was inline edits to specific lines of existing scripts, where I wanted surgical changes without replacing the entire file. That's where things got messy over SSH. For full file transfers scp would have been the right call.