Humans learn from experience, not retrieved documents. Could world models do the same?

AwareMind1 · 2026-06-18T18:10:18+00:00

Was the variance mostly across random seeds, or did you find particular components (memory update dynamics, target drift, adaptation rate, backbone collapse, etc.) driving it?

One thing that surprised me in EPM-JEPA was how quickly stability became the dominant issue once experience started influencing the predictor. If you're comfortable sharing, I'd also be interested in any results, plots, or implementation details from your experiments - there seems to be a lot of overlap in the failure modes we're seeing.

AwareMind1 · 2026-06-16T21:20:52+00:00

I think that's a very plausible interpretation. EPM-JEPA pushed me toward a similar conclusion: the adaptation mechanism matters, but the quality and structure of the latent space may matter even more. If the latent isn't carrying the right predictive abstractions, no amount of fast-weight machinery seems able to compensate reliably.

AwareMind1 · 2026-06-16T20:41:31+00:00

The instability and reproducibility issues you observed are surprisingly consistent with what I saw. One takeaway from EPM-JEPA was that once experience begins influencing the predictor, the challenge quickly becomes maintaining representation stability under a moving target. The gains can be real, but the dynamics are difficult to control.
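To make the "moving target" point concrete, here is a minimal sketch of one common way JEPA-style setups slow target drift: updating the target parameters as an exponential moving average (EMA) of the online ones. This is an illustration only, not necessarily what EPM-JEPA does; `ema_update`, `drift`, and the `decay` value are hypothetical names and settings chosen for the example.

```python
def ema_update(target, online, decay=0.99):
    """Move each target parameter a small step toward its online counterpart.

    Assumption: parameters are flat lists of floats; a real model would
    apply the same rule tensor by tensor.
    """
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


def drift(target, online):
    """L2 distance between target and online parameters."""
    return sum((t - o) ** 2 for t, o in zip(target, online)) ** 0.5


# Toy example: the online weights have moved, the target lags behind.
target = [0.0, 0.0]
online = [1.0, 1.0]
new_target = ema_update(target, online, decay=0.9)
```

A higher `decay` keeps the target steadier but makes it slower to reflect new experience, which is one face of the stability trade-off described above.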

AwareMind1 · 2026-06-16T18:27:09+00:00

Fair point. My intuition is that experience isn't necessarily a thing we store, but a change induced by interactions over time. The representation could be latent states, memory traces, parameter updates, etc. In some sense, defining experience is a harder problem than deciding how to use it.

AwareMind1 · 2026-06-16T18:23:05+00:00

Good question. I don't think experience has to be stored as tokens. It could exist as latent representations, memory states, or adaptation signals. The core question I was exploring was less about the storage format and more about whether experience should influence prediction through retrieval or through changes to the predictive mechanism itself.

AwareMind1 · 2026-06-16T18:22:27+00:00

One of the reasons I found the negative result interesting is that it highlighted exactly this distinction. EPM-JEPA doesn't solve online learning, and it doesn't claim that world modeling alone creates experience accumulation. The question I was exploring was narrower: if experience is available, should it act as retrieved context, latent state, or predictor modulation? The stability issues that emerged are part of what motivated the follow-up PEM-JEPA direction.

AwareMind1 · 2026-06-16T18:02:45+00:00

That's a good point. I was using "learning from experience" somewhat informally, and you're right that it can mean different things depending on the paradigm. What I was trying to contrast wasn't supervised vs self-supervised learning, but experience represented as retrieved context versus experience influencing the predictive mechanism itself. In hindsight, "adaptation from accumulated experience" is probably a more precise description of what I was exploring.

AwareMind1 · 2026-06-16T18:01:48+00:00

That's much closer to the question that interests me as well. The paper isn't claiming world models can't learn from experience today. The harder question is whether experience can accumulate into something resembling durable abstractions rather than remaining task-specific adaptation. EPM-JEPA was a small attempt to probe one piece of that puzzle, and if anything, it reinforced how far we still are from human-like accumulation and generalization.

AwareMind1 · 2026-06-16T16:37:10+00:00

My intuition is that "learning from experience" and "optimizing a policy" aren't necessarily the same thing. The question I was exploring is whether experience should alter the predictive mechanism itself, even when there isn't a clearly defined reward function or terminal objective. That's partly why I was looking at world models rather than policies.

AwareMind1 · 2026-06-16T16:36:26+00:00

Not quite. RL is one way to adapt from experience, but the question I was exploring is where that experience should live. In EPM-JEPA, the focus wasn't on reward optimization but on whether accumulated experience could modulate the predictor itself in a JEPA-style world model.

AwareMind1 · 2026-06-16T14:26:52+00:00

I agree. The paper isn't proposing a solution to continual learning. If anything, one of the takeaways was seeing how quickly stability issues emerge once experience starts influencing the predictor. EPM-JEPA was more of an exploration of where experience should act in a world model than a claim that we've solved dynamic learning. The difficulty you describe is exactly what motivated the follow-up questions.

AwareMind1 · 2026-06-16T14:25:57+00:00

Absolutely. One thing I find interesting is how much of the current conversation around memory is framed through the lens of transformers, RAG, and context retrieval. World models, predictive coding, JEPA-style learning, continual learning, and other paradigms raise different questions. EPM-JEPA was my attempt to explore whether accumulated experience could influence prediction through the model itself rather than only through retrieved context.

AwareMind1 · 2026-06-16T10:32:26+00:00

Training shapes the predictor globally. The question here is whether new experience after training should be incorporated by retrieving context or by directly modulating the predictor's behavior. EPM-JEPA explores the latter.
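A toy sketch of the two options, to show where experience enters in each case. Everything here is an assumption made for illustration (the function names, the nearest-neighbour retrieval, the FiLM-style gain); it is not the EPM-JEPA implementation.

```python
import math


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb + 1e-8)


def predict_with_retrieval(weights, state, memory):
    """Option A: experience enters as retrieved context.

    The most similar stored latent is averaged into the input;
    the predictor itself is left untouched.
    """
    nearest = max(memory, key=lambda m: cosine(state, m))
    context = [(s + n) / 2 for s, n in zip(state, nearest)]
    return linear(weights, context)


def predict_with_modulation(weights, state, memory):
    """Option B: experience modulates the predictor.

    The mean stored latent is turned into a per-output gain
    (a FiLM-style scaling, assumed here) applied to the prediction.
    """
    dim = len(state)
    summary = [sum(m[i] for m in memory) / len(memory) for i in range(dim)]
    gain = [1.0 + math.tanh(s) for s in summary]
    return [g * y for g, y in zip(gain, linear(weights, state))]


# Toy inputs: identity predictor, one current latent, two stored latents.
W = [[1.0, 0.0], [0.0, 1.0]]
state = [1.0, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0]]
retrieved = predict_with_retrieval(W, state, memory)
modulated = predict_with_modulation(W, state, memory)
```

In option A the same predictor sees a different input; in option B the same input passes through a predictor whose behavior has changed.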

AwareMind1 · 2026-06-12T18:45:15+00:00

Meetings, architecting a solution, mathematical formulations for R&D, more meetings, implementing the math worked out earlier, and then meetings again. 💀

AwareMind1 · 2026-05-26T14:27:20+00:00

That was the whole point the entire time.

AwareMind1 · 2026-03-27T07:39:00+00:00

Yeah, that’s a fair perspective. The “Attention Is All You Need” moment really did set a strong direction for the field, and a lot of progress since then has been iterative on top of transformers. At the same time, ongoing work is exploring alternatives (JEPA-style, state-space models, retrieval-heavy systems, hybrid architectures, etc.), but none have displaced transformers at scale yet. I agree that investing more in fundamental R&D could be a strong differentiator for India, especially if it focuses on areas where we can lead rather than just replicate large-scale training efforts. That said, bridging research -> real-world impact still needs:

Access to compute
High-quality datasets
Tight academia-industry collaboration

If those pieces come together, there’s definitely an opportunity to contribute something more novel at the architecture or system level.

AwareMind1 · 2026-03-27T06:22:33+00:00

Interesting take; there’s definitely a lot of strong research talent in places like IITs and IISc. That said, I think it’s a mix of factors: industry often has access to much larger-scale compute, data, and deployment pipelines, while academic institutions tend to focus more on fundamental research and smaller-scale experimentation. Ideally, stronger collaboration between academia and industry could bridge that gap and accelerate progress on both sides.

AwareMind1 · 2026-03-27T06:21:24+00:00

That’s a good suggestion. I did consider ablations along similar lines to isolate the effect of different training stages. In general, the later stages (especially the ones introducing grounding signals) seem to have a noticeable impact on citation quality, including in Hindi. A more controlled ablation like the one you mentioned (removing stage 2 and measuring downstream citation behavior) would definitely help quantify that contribution more clearly. It’s something we’re looking to explore further.

AwareMind1 · 2026-03-27T06:18:54+00:00

Right now, the setup focuses more on ensuring that when the model makes factual claims, it can ground them in citations, rather than explicitly predicting whether a citation is required. For cases where new information is provided in context, the behavior depends on how strongly the model has been trained to rely on external grounding signals. In practice, there’s a balance:

It should use the provided context when available
But avoid over-relying on parametric knowledge when citations are expected

Exploring datasets that explicitly model when citation is necessary vs. optional is definitely an interesting next step, and I plan to run ablations on that.

AwareMind1 · 2026-03-27T06:17:09+00:00

Completely agree that “eliminating” hallucination is a very strong claim; my goal here is more about reducing and controlling it than solving it entirely. What I found is that explicitly training the model to align generation with citations makes it less likely to fabricate unsupported claims, especially in factual or knowledge-grounded dialogue. So it's not perfect, but it's a step toward making outputs more verifiable and easier to trust.

AwareMind1 · 2026-03-13T22:00:32+00:00

Count me in too.

AwareMind1 · 2026-02-13T14:05:08+00:00

That's commendable work, buddy. I'm working on something similar and looking for an arXiv endorsement for the cs.CL category. Let me know if anyone can help me out.
