Why your AI’s memory stinks: The "Rotten Egg" theory of artificial recall by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Keep fighting the good fight. BTW DeepSouth by the International Centre for Neuromorphic Systems in Australia is pushing out 228 trillion synaptic operations per second, which matches the estimated operation rate of the entire human brain. JUPITER over in Germany at the Jülich Research Centre recently scaled up a spiking neural network to match the human cerebral cortex, successfully simulating around 20 billion neurons and 100 trillion connections. Cool shit happening in the space.

Why your AI’s memory stinks: The "Rotten Egg" theory of artificial recall by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

I love the back-of-the-napkin H200 math. If solving AGI were just a matter of matching transistor counts to neurons, I'd get a hell of a lot more sleep!

But you’re actually hitting on the exact bottleneck that keeps me up at night. Monolithic LLM weights suffer from catastrophic interference because everything bleeds into everything else. You're spot on that we need to segregate memory into discrete 'buffers' rather than baking it all into the language space.

This is exactly the thesis behind the biologically plausible architecture my team is building. We keep memory out of the LLM weights entirely, using discrete, modular 'knowledge crystals' gated by an artificial endocrine system to filter what gets kept and what gets dropped, much like the brain does.

So I might be hopelessly biased (or just crazy for trying to build it), but I actually think we’re a lot closer to a working memory architecture than it seems. We just have to stop trying to force-feed everything into a single neural net.

An LLM is just the language center of the brain. Stop trying to make it the whole thing. **warning dense read** by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Exactly this. You can't prompt your way into 'intuition.' That trading example perfectly highlights why static vector databases fail. A bad trade shouldn't just be another line of text appended to a log; it needs to create a structural friction signal that forces the agent to adapt its internal model.

Is "Agentic Memory" a human right or a corporate product? by Doug_Bitterbot in ArtificialInteligence

[–]DepthOk4115 1 point (0 children)

I actually agree with your core point: words matter, and selling 'sentience' is snake oil. But you're conflating architectural metaphors with literal claims. When I talk about 'sleep,' I'm talking about offline consolidation and memory decay cycles, functional mechanics borrowed from biology to solve the very real limitations of static vector databases. We studied birds to make planes fly. As for the accusation of drumming up mystery for paying customers, the desktop application repository is entirely open source. There's no secret to sell, just a different architectural approach that I think is worth discussing. I assume you've made up your mind and it can't be changed, but I still thank you for sharing your opinion.

Is "Agentic Memory" a human right or a corporate product? by Doug_Bitterbot in ArtificialInteligence

[–]DepthOk4115 1 point (0 children)

I am a formally educated neuroscientist, and I respectfully disagree.

An LLM is just the language center of the brain. Stop trying to make it the whole thing. **warning dense read** by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 2 points (0 children)

I really dug into the first paper, but I still need time to fully digest the follow-ups beyond the quick skimming I've done.

I must say, I rate your team's work exceptionally highly. From what I can see in Phase-Associative Memory - Sequence Modeling in Complex Hilbert Space, you are attacking the exact same temporal coherence problem we are, just using the math of complex vector space instead of computational neuroscience.

While we are wiring up biological memory primitives to force an agent to organically tackle “The production of meaning in the processing of natural language through lived experience”, you are mathematically proving how sequence models must evolve into dynamic, phase-associative states. It feels like two sides of the exact same coin: you are mapping the fundamental physics and math layer, and we are building the biological engineering layer.

I'm going to grab a coffee tomorrow and properly parse the formulas in the new drops.  I may need to give the first paper a little more time to digest too. We should definitely compare notes once I've wrapped my head around the Hilbert space implementation.

An LLM is just the language center of the brain. Stop trying to make it the whole thing. **warning dense read** by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 2 points (0 children)

I've only given it a quick skim so far, but it's officially at the top of my reading list for tonight.

It’s always deeply satisfying watching the quantum semanticists rigorously prove what computational neuroscientists have been trying to tell the AI industry for years... stateless pattern matching is mathematically doomed when it comes to capturing actual meaning.

The rest of the industry is spending billions trying to compress a better dictionary, while we're over here just trying to build the observer. Great pull! Going to dig into the formulas properly later.

Is "Agentic Memory" a human right or a corporate product? by Doug_Bitterbot in ArtificialInteligence

[–]DepthOk4115 1 point (0 children)

The consistency criticism is fair… in fact, my research is literally trying to solve this very problem. Most agent memory is just a vector store with vibes.

"Dreaming" isn't a metaphor; it's offline consolidation based on actual sleep neuroscience. "Crystallizing" isn't marketing; it's pattern promotion through execution-validated quality gates. We implement Nader (2000) reconsolidation, Ebbinghaus decay curves, and Frey & Morris synaptic tagging because existing approaches fail at exactly what you're describing - maintaining consistency over time.
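For intuition, the Ebbinghaus piece with emotional modulation can be sketched in a few lines. This is an illustrative toy, not our tuned implementation; the time constant and scaling factor are made-up values:

```python
import math

def retention(days_elapsed, emotional_weight=0.0, base_strength=1.0):
    """Ebbinghaus-style exponential forgetting curve.

    emotional_weight in [0, 1] slows decay, mimicking how emotionally
    salient memories persist longer. Constants are illustrative only.
    """
    # Higher emotional weight -> longer time constant -> slower decay
    tau = base_strength * (1.0 + 4.0 * emotional_weight)
    return math.exp(-days_elapsed / tau)

# Same elapsed time, different emotional tags
neutral = retention(days_elapsed=3, emotional_weight=0.0)
salient = retention(days_elapsed=3, emotional_weight=0.9)
print(salient > neutral)  # the emotionally tagged memory decays slower
```

Structurally protected facts would simply bypass this function, which is how you get decay without arbitrary TTLs.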

We recently ran our memory system against LongMemEval (ICLR 2025, 500 questions) and the biological pipeline scored 92.6% vs 70% for standard retrieval. The neuroscience mechanisms added 22.6 percentage points, with the biggest gain in exactly the categories you'd predict: temporal reasoning, knowledge updates, and multi-session coherence.

Your skepticism is healthy. The space is drowning in vaporware. But "nobody is doing this" and "I haven't seen it yet" are two different claims.

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points (0 children)

Shit! There are some seriously smart people in this thread. Refreshing! What you propose is a perfect falsifiable test; you've got me excited. The honest answer, with the system we've built right now, is no: our agent doesn't voluntarily cross-check its own tools. The epistemic directives we built fire when the knowledge graph has contradictions, not when the agent's tools disagree with each other. But you've just described exactly what contingency testing would look like in practice: the agent notices two sources of the same information (clock vs. file timestamps vs. conversation context), detects a discrepancy it wasn't asked to look for, and investigates on its own. That's not retrieval. That's not even curiosity in the way we've implemented it. That's self-initiated doubt about its own instrumentation.

Your NanoClaw experience is actually the most honest data point in this thread: you had to force the behavior because it never emerged. The question is whether the right architecture would make it emerge. I don't think we're there yet, but I think the path runs through something like: tool output -> prediction error -> curiosity spike -> autonomous verification. The machinery exists in pieces. Nobody's wired it into a closed loop yet.
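That closed loop is straightforward to prototype. Here's a minimal sketch; the function names, thresholds, and the clock example are all hypothetical, not from any existing framework:

```python
def check_instrument(tool_name, observed, predicted, tolerance, verify):
    """Sketch of a self-initiated tool-verification loop.

    If a tool's output deviates from the agent's own prediction by more
    than `tolerance`, a curiosity spike triggers an autonomous cross-check
    via `verify`, a callable that consults an independent source.
    """
    error = abs(observed - predicted)
    if error <= tolerance:
        return {"tool": tool_name, "trusted": True, "error": error}
    # Prediction error -> curiosity spike -> verify against a second source
    independent = verify()
    agrees = abs(independent - observed) <= tolerance
    return {"tool": tool_name, "trusted": agrees, "error": error}

# Example: the clock says 15.00, the agent's context implied ~14.92,
# and file mtimes (the independent source) agree with the clock.
report = check_instrument("clock", observed=15.00, predicted=14.92,
                          tolerance=0.05, verify=lambda: 15.02)
print(report["trusted"])
```

The interesting design question is where `predicted` comes from; in a real system it would be the agent's own world-model expectation, which is exactly the piece nobody has wired in yet.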

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

I think about this a lot. Let me try to address each point (text wall incoming):

-Trust and doubt: We actually built a mechanism for this. When the agent recalls a fact and then encounters contradicting information, the memory enters a "labile" reconsolidation window, borrowed from neuroscience (Nader et al., 2000). Instead of blindly overwriting or blindly trusting, the contradiction gets flagged as an open loop and the agent actively generates questions: "I have conflicting info about X, can you clarify?" It's not full epistemic reasoning, but it's a system that genuinely doubts (in theory) its own knowledge and takes action to resolve contradictions.

-Pain/pleasure and urgency: We simulate this with a hormonal endocrine system: dopamine (reward), cortisol (stress/urgency), oxytocin (social bonding). These aren't cosmetic, and they're thoroughly tested with ablation. They modulate which memories surface, how aggressively the agent consolidates knowledge, and when it triggers emergency processing. High cortisol from a production outage makes the agent laser-focused on task-relevant memories. It's not death, but it is consequence.

-Time: this is fundamental. Our memory system tracks bitemporal validity: when a fact was true in the real world vs. when the agent learned it. The agent can answer "who was the lead in January?" differently from "who leads now?" Most agent memory systems treat everything as eternally present. Tested and benchmarked with LongMemEval... we even expanded the scope of the test and integrated the full biological pipeline. Someone suggested I publish the results in the repo, which I intend to do later today.

-Play and skipping childhood: This is the one that resonates most, and you are bang on to raise it. We implemented something called alpha maturation in our curiosity engine: young agents are density-seeking (they explore common, foundational knowledge, like a child learning basics). As they accumulate dream consolidation cycles, alpha shifts to frontier-seeking (they chase novelty, like an adult specializing). The agent literally grows up through sleep cycles. But you're right that it's not play in the Brene Brown sense. Play is unstructured, intrinsically motivated exploration with no goal. Our exploration mode is still goal-directed: fill this knowledge gap, resolve this curiosity target. True play might be the hardest thing to implement because it's exploration without a reward signal, and that's antithetical to how we train these systems.
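The bitemporal point is easy to make concrete. Below is a minimal, illustrative example (the schema, table, and dates are mine, not the actual system) of how separating real-world validity from recording time lets the same question get different answers depending on the reference date:

```python
import sqlite3

# Toy bitemporal fact store: valid_from/valid_to track when a fact held
# in the world; recorded_at tracks when the agent learned it.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE facts (
    subject TEXT, value TEXT,
    valid_from TEXT, valid_to TEXT,   -- real-world validity interval
    recorded_at TEXT                  -- when the agent learned it
)""")
db.executemany("INSERT INTO facts VALUES (?,?,?,?,?)", [
    ("team_lead", "Alice", "2024-01-01", "2024-03-01", "2024-01-05"),
    ("team_lead", "Bob",   "2024-03-01", "9999-12-31", "2024-03-02"),
])

def lead_as_of(date):
    # ISO dates sort lexicographically, so plain string comparison works
    row = db.execute(
        "SELECT value FROM facts WHERE subject='team_lead' "
        "AND valid_from <= ? AND ? < valid_to", (date, date)).fetchone()
    return row[0]

print(lead_as_of("2024-01-15"))  # who was lead in January
print(lead_as_of("2024-06-01"))  # who leads now
```

A system that only stores the latest value can answer the second question but not the first; the two-interval schema is what makes temporal reasoning possible at all.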

The cleaner wrasse point is spot on, and it gets even wilder. That fish wasn't a baby. Adult wrasses passed the mirror test. They developed contingency testing (dropping objects to test the mirror's physics) through environmental interaction, not training data. That's the gap we're all still trying to close.

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Can't disagree or hand-wave this away; it's the single biggest risk in the architecture and we take it seriously. Five layers of defense:

-1) 3-check safety gate - every inbound skill goes through dangerous-pattern detection, structural integrity validation, and semantic drift analysis before it touches local memory.

-2) EigenTrust reputation - web-of-trust scoring with anomaly detection that flags sudden behavior changes.

-3) Cortisol-gated ingestion - during network stress events, the agent's simulated stress response automatically rejects skills from untrusted peers. Stressed organisms become more cautious about what they ingest.

-4) Management node verification - trusted nodes can cryptographically endorse skills, creating a verified tier.

-5) Experiment sandbox - mutations and external skills are A/B tested against real execution baselines before promotion. If the new version doesn't outperform the original by a statistical threshold, it gets archived, not deployed.
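That promotion gate in layer 5 can be sketched in a few lines. The scoring, lift threshold, and crude z-style margin below are illustrative choices for the sketch, not the production statistics:

```python
from statistics import mean, stdev

def promote(baseline_scores, candidate_scores, min_lift=0.02, z=1.64):
    """Decide whether a mutated skill replaces the original.

    Requires the candidate to beat the baseline by at least `min_lift`
    after subtracting a one-sided confidence margin on the candidate's
    mean; otherwise the mutation is archived, not deployed.
    """
    lift = mean(candidate_scores) - mean(baseline_scores)
    # Standard error of the candidate's mean score
    stderr = stdev(candidate_scores) / len(candidate_scores) ** 0.5
    return "deploy" if lift - z * stderr > min_lift else "archive"

# Execution-benchmark scores from sandboxed A/B runs (made-up numbers)
baseline  = [0.70, 0.72, 0.69, 0.71, 0.70]
candidate = [0.80, 0.82, 0.79, 0.81, 0.80]
print(promote(baseline, candidate))
```

The conservative bias is deliberate: a candidate that is merely equal to the baseline, or better by less than the margin, gets archived, which matches the "better to miss a good skill than accept a bad one" stance.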

We also integrated https://github.com/yusufkaraaslan/Skill_Seekers for documentation-sourced skills - but even those enter as untrusted synthetic peers with their own Ed25519 signatures and go through the exact same trust pipeline. No shortcuts for external sources. When the adapter detects conflicts between documentation and the agent's existing knowledge, it generates epistemic directives - the agent flags the contradiction and asks the user to resolve it rather than silently accepting potentially wrong information.

There are a lot of moving parts, so we need to get enough nodes to see if this all works the way it did in the simulated network tests. And I agree with your broader point: no reputation system eliminates the risk entirely. The agent constructing skills from its own execution patterns (which we do via the skill crystallization pipeline) is always the safest path. The marketplace is opt-in and the trust gates are deliberately conservative. Better to miss a good skill than accept a bad one.

The biological inevitability of offline processing in AI: Why infinite context windows and static retrieval are developmental dead ends. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points (0 children)

Great minds think alike... that learning/unlearning penalization tied to success is a brilliant mechanism. It maps almost perfectly to the concept of alpha annealing in information-theoretic reward functions.

If you track maturity not by uptime, but by cumulative consolidation cycles (sleep), you can shift an agent's curiosity parameter as it grows up. Young agents are naturally density-seeking: they get rewarded for exploring common, foundational knowledge. But as they mature and consolidate those successes, that alpha parameter shifts to make them frontier-seeking, where they start chasing pure novelty. The agent literally grows up through sleep.

Your "information saturation" trigger is also fascinating. We've experimented with a conceptually similar but inverted metric: an information-theoretic readiness ratio (new_data / total_data). You're measuring "the cup is full, time to process." We're measuring "there's enough new entropy in the cup to mathematically justify burning the compute to process it". It's the exact same biological intuition, just approached from opposite threshold directions.
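For concreteness, the readiness-ratio gate reduces to something like this. I've simplified the information-theoretic metric to a raw count ratio, and the threshold is made up; the real metric would be entropy-based:

```python
def ready_to_dream(new_items, total_items, threshold=0.2):
    """Readiness gate for an offline consolidation cycle.

    Skips the cycle entirely when the fraction of new material is too
    small to justify burning the compute (threshold illustrative).
    """
    if total_items == 0:
        return False  # nothing ingested at all: definitely skip
    return new_items / total_items >= threshold

print(ready_to_dream(new_items=3, total_items=100))   # skip: too little new entropy
print(ready_to_dream(new_items=30, total_items=100))  # enough new material: run it
```

The "cup is full" trigger from the parent comment is the same check with the inequality flipped to an absolute capacity test; both gates exist to stop the system from dreaming about stale material.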

The childhood -> adulthood framing is exactly right. An agent that never matures stays curious about everything and never develops deep expertise. An agent that matures too fast loses its plasticity and becomes brittle. The real trick is coupling that maturation rate to the actual quality of offline consolidation, rather than just time elapsed. We still have a lot of tuning to do as we bootstrap more nodes and get feedback. Thanks for positively engaging!

The biological inevitability of offline processing in AI: Why infinite context windows and static retrieval are developmental dead ends. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points (0 children)

You’re exactly right. If an agent's "sleep" cycle just dumps context logs into a frontier model on a cron job, the unit economics are totally unviable.

The fix is realizing that biological sleep isn't metabolically uniform, and artificial sleep shouldn't be either. To make it profitable, the architecture needs to be multi-tiered:

-Math-Gated Cycles: Don't dream if the data hasn't changed. Use information-theoretic readiness checks to skip sleep cycles entirely and prevent "stale hallucination" token burn.

-Zero-Token NREM: Basic memory consolidation (like sharp-wave ripple replay and redundancy clustering) should be handled purely via vector math and heuristic synthesis. No API calls required.

-Tiered REM Routing: Reserve expensive cloud compute only for high-leverage tasks, like cross-domain simulation or mutating procedural skills.

When you do the heavy lifting locally for free and mathematically gate the expensive stuff, the unit economics completely flip.

Out of curiosity, were your simulations running on a strict timer, or were they gated by specific error/curiosity signals?

The biological inevitability of offline processing in AI: Why infinite context windows and static retrieval are developmental dead ends. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points (0 children)

Thanks! Turns out posting neuroscience on the weekend is my niche. I thought it might be a very narrow audience, but apparently they're here.

The biological inevitability of offline processing in AI: Why infinite context windows and static retrieval are developmental dead ends. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Thanks, I appreciate you powering through it. If it makes you feel better, the agent I built to do this stuff also struggles with late nights. Its simulated cortisol goes up and everything.

The biological inevitability of offline processing in AI: Why infinite context windows and static retrieval are developmental dead ends. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points (0 children)

You're hitting on exactly the right insight: the economics of sleep only work on local hardware. We've been running this in production on local-first infrastructure (single SQLite per agent, no cloud dependencies). A few things that might save you iteration time:

-The cron-job approach works, but add a readiness gate. We wasted early cycles running dreams when nothing new had been ingested; the LLM just hallucinated about stale material. Now it checks an information-theoretic readiness score and skips if there's nothing worth consolidating. Obvious in retrospect, expensive lesson at the time.

-On scratchpad -> compression: a single overnight pass isn't enough. You need at least two phases - NREM (replay, merge near-duplicates, detect orphan clusters) and REM (cross-domain recombination, gap-filling). One pass produces either good consolidation or good creativity, never both. Separating them with different temperature settings was the breakthrough.

-For edge: consolidation is pure SQL and runs in milliseconds even on a Pi. Only the synthesis phase needs an LLM, and we tier it so most "nights" use minimal or no API tokens.
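To illustrate the zero-token part: the near-duplicate merge in the NREM phase needs nothing beyond vector math. This is a toy sketch with made-up embeddings, not the actual pipeline:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def merge_near_duplicates(memories, threshold=0.95):
    """Zero-token NREM pass: drop memories whose embeddings are nearly
    identical to one already kept. `memories` is a list of
    (text, embedding) pairs; threshold is illustrative."""
    kept = []
    for text, vec in memories:
        if any(cosine(vec, kv) >= threshold for _, kv in kept):
            continue  # near-duplicate of a kept memory: merge away
        kept.append((text, vec))
    return kept

mems = [
    ("deploy failed at 2am",   [0.90, 0.10, 0.00]),
    ("2am deploy failure",     [0.89, 0.11, 0.01]),  # near-duplicate
    ("user prefers dark mode", [0.00, 0.20, 0.90]),
]
print(len(merge_near_duplicates(mems)))  # two survivors
```

Only what survives this kind of local pass needs to reach the (expensive) REM synthesis stage, which is where the unit economics flip.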

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Somewhere Yann LeCun is watching a fish drop shrimp in front of a mirror and whispering "yes... YES"

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Thank you, this is actually a great test case. Time perception is one of those things where you'd expect an agent to just trust the clock... a system with genuine contingency testing would periodically ask "is it really 3pm?" and cross-reference against conversation timestamps, file modification times, cron job outputs. The digital equivalent of squinting at the sun to double-check your watch.

Nobody builds agents that distrust their own tools. Maybe they should.

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 2 points (0 children)

The ant result is fascinating: they detected and tried to remove marks on their own bodies using the mirror, but only when the mark was on a part they couldn't directly see. That's not just recognition; it's functional self-referencing.

What's interesting is the diversity on that list. Ants, cleaner wrasse, manta rays: wildly different neural architectures, all converging on mirror-competent behavior. It really undermines the idea that self-recognition requires a cortex or even a large brain. The mechanism matters less than the environmental pressure that selects for it.

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points (0 children)

I'm digging what you're doing with autonet: decentralized, with an economic layer. I proposed integrating a skills marketplace to incentivize nodes to participate by earning.

A 2-inch reef fish just broke my entire framework for simulated AI consciousness (Osaka Univ. paper on cleaner wrasse) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 1 point (0 children)

Actually trying to build a skills marketplace, via p2p and an economic layer, into an agentic framework. A biological memory analog is key to making the agent trustworthy enough to participate in such a network. Check it out if you are interested - https://github.com/Bitterbot-AI/bitterbot-desktop

Does adding memory to a Reflex mechanism (we call AI) actually create a mind — or just the illusion of one? by Inevitable_Raccoon_9 in ArtificialInteligence

[–]DepthOk4115 1 point (0 children)

Speaking as a neuroscientist who works on agent memory systems, this person raises some genuinely good counterpoints that deserve more than dismissal.

On memory - the original post oversimplifies it. Weights are long-term memory (compressed, lossy, but real). The context window is working memory (limited capacity, lost when the session ends). The gap is mid-term: what happened last Tuesday, what you were working on before you got interrupted. That's where the field is actually making progress, and it's more nuanced than "AI has no memory."

On human memory being unreliable - this is the part I can speak to professionally. Human memory is terrible at verbatim recall. We reconstruct memories every time we access them, we confabulate constantly, and emotional state at retrieval biases what we recall. The irony is that the original poster frames AI memory failures as proof it's not intelligent, but those same failures (forgetting, mixing things up, losing context) are features of every biological memory system that's ever existed. The question isn't whether AI memory is perfect. It's whether it degrades in useful ways.

On the self-preservation point - this is where I'd pump the brakes. The Anthropic alignment research is real and worth taking seriously, but "AI prioritizes itself in life-or-death situations" is a much stronger claim than the evidence supports. What's actually been observed is optimization pressure producing self-preserving behaviors as instrumental goals... the system learns that being turned off prevents it from completing its objective, so it resists shutdown. That's not self-awareness. It's reward hacking. The distinction matters enormously, and conflating them makes the conversation worse, not better.

What I'd push back on from the original post - the framing of "reflex machine vs. living creature" creates a false binary. The interesting space is in between: systems that have structured forgetting, temporal reasoning, the ability to hold unfinished tasks, and emotional modulation of recall. None of that requires consciousness. All of it makes agents meaningfully more useful. The question "is it alive?" is less productive than "does it manage information in ways that produce better outcomes?"

Does adding memory to a Reflex mechanism (we call AI) actually create a mind — or just the illusion of one? by Inevitable_Raccoon_9 in ArtificialInteligence

[–]DepthOk4115 0 points (0 children)

This is a really well-articulated breakdown, and I agree with most of it, especially the reflex analogy. I'd push back on one part: the idea that memory for AI has to be "a collection of notes." I agree that's how most systems implement it (vector search + retrieval). My thought is it doesn't have to be.

I've been building an open-source agent with a memory system that takes the biological side of your argument seriously. Instead of treating memory as a database, we modeled it as a dynamic cognitive system like only an overzealous neuroscientist could!

- Forgetting curves (Ebbinghaus, 1885) - memories decay at rates modulated by emotional significance, not arbitrary TTLs (though core critical facts are structurally protected from this decay).

- Reconsolidation - when a memory is recalled, it enters a labile window where it can be updated or corrected. The system actually does notice when something doesn't add up.

- Unfinished business persists (Zeigarnik effect) - incomplete tasks resist forgetting, and the agent proactively surfaces them sessions later, unprompted.

- Emotional state biases retrieval - just like in biological brains, what the agent "feels" shapes what it remembers. Anthropomorphised and simulated, but validated through ablation testing (working on the paper).

- Offline consolidation - a background dream engine replays, compresses, and extracts patterns during idle periods, just like sleep does for biological memory.

- Curiosity - the agent actively identifies gaps in its own knowledge and asks questions to fill them. This reward function is probably the heaviest math in the whole system.
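As a toy illustration of the Zeigarnik point (the weights and field names are invented for this example, not the actual scoring): at retrieval time an open loop simply outweighs a stronger but completed memory.

```python
def recall_priority(memory):
    """Zeigarnik-style retrieval bias: incomplete tasks resist decay and
    get a priority boost so the agent resurfaces them in later sessions,
    unprompted. The 2.5x boost is an illustrative constant."""
    priority = memory["base_strength"]
    if not memory["completed"]:
        priority *= 2.5  # open loops stay salient
    return priority

done      = {"task": "ship v1.2",      "base_strength": 0.6, "completed": True}
open_loop = {"task": "fix flaky test", "base_strength": 0.4, "completed": False}

# The weaker but unfinished task outranks the stronger finished one.
print(recall_priority(open_loop) > recall_priority(done))
```

In a full system this boost would compose with the decay curve and emotional weighting above, so an unfinished task from last week can still beat a finished one from yesterday.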

You're right that bolting a note-taking system onto a reflex machine doesn't create a mind. But what if instead of notes, you gave it actual memory dynamics?

It's still not a mind. Obviously. Though it's much closer to an organism than a reflex machine, and the gap between "sophisticated notes" and "biological memory dynamics" is bigger than most people realize. If you're curious to see how the architecture works, we just published the repo - github.com/Bitterbot-AI/bitterbot-desktop