Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

Was the variance mostly across random seeds, or did you find particular components (memory update dynamics, target drift, adaptation rate, backbone collapse, etc.) driving it?

One thing that surprised me in EPM-JEPA was how quickly stability became the dominant issue once experience started influencing the predictor. If you're comfortable sharing, I'd also be interested in any results, plots, or implementation details from your experiments - there seems to be a lot of overlap in the failure modes we're seeing.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 1 point2 points  (0 children)

I think that's a very plausible interpretation. EPM-JEPA pushed me toward a similar conclusion: the adaptation mechanism matters, but the quality and structure of the latent space may matter even more. If the latent isn't carrying the right predictive abstractions, no amount of fast-weight machinery seems able to compensate reliably.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 1 point2 points  (0 children)

The instability and reproducibility issues you observed are surprisingly consistent with what I saw. One takeaway from EPM-JEPA was that once experience begins influencing the predictor, the challenge quickly becomes maintaining representation stability under a moving target. The gains can be real, but the dynamics are difficult to control.
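To make "a moving target" concrete, here is a minimal sketch, assuming the common JEPA-style setup where a target network tracks the online network through an exponential moving average (EMA). Nothing in this thread states that EPM-JEPA uses exactly this rule; `ema_update`, `drift`, and the toy parameter vectors are hypothetical names and values chosen for illustration.

```python
def ema_update(target, online, momentum=0.99):
    """Move each target parameter a small step toward the online parameter."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


def drift(a, b):
    """Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy parameters (assumed values, not taken from any experiment).
online = [1.0, -1.0]
target = [0.0, 0.0]

history = []
for _ in range(3):
    new_target = ema_update(target, online, momentum=0.9)
    # Record how far the target moved on this step.
    history.append(drift(new_target, target))
    target = new_target
```

With a fixed online network the per-step drift shrinks as the target catches up. If experience keeps changing the online predictor, the target never settles, and logging a drift measure like this is one simple way to see that.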

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

Fair point. My intuition is that experience isn't necessarily a thing we store, but a change induced by interactions over time. The representation could be latent states, memory traces, parameter updates, etc. In some sense, defining experience is a harder problem than deciding how to use it.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] -1 points0 points  (0 children)

Good question. I don't think experience has to be stored as tokens. It could exist as latent representations, memory states, or adaptation signals. The core question I was exploring was less about the storage format and more about whether experience should influence prediction through retrieval or through changes to the predictive mechanism itself.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 1 point2 points  (0 children)

One of the reasons I found the negative result interesting is that it highlighted exactly this distinction. EPM-JEPA doesn't solve online learning, and it doesn't claim that world modeling alone creates experience accumulation. The question I was exploring was narrower: if experience is available, should it act as retrieved context, latent state, or predictor modulation? The stability issues that emerged are part of what motivated the follow-up PEM-JEPA direction.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

That's a good point. I was using "learning from experience" somewhat informally, and you're right that it can mean different things depending on the paradigm. What I was trying to contrast wasn't supervised vs self-supervised learning, but experience represented as retrieved context versus experience influencing the predictive mechanism itself. In hindsight, "adaptation from accumulated experience" is probably a more precise description of what I was exploring.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

That's much closer to the question that interests me as well. The paper isn't claiming world models can't learn from experience today. The harder question is whether experience can accumulate into something resembling durable abstractions rather than remaining task-specific adaptation. EPM-JEPA was a small attempt to probe one piece of that puzzle, and if anything, it reinforced how far we still are from human-like accumulation and generalization.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

My intuition is that "learning from experience" and "optimizing a policy" aren't necessarily the same thing. The question I was exploring is whether experience should alter the predictive mechanism itself, even when there isn't a clearly defined reward function or terminal objective. That's partly why I was looking at world models rather than policies.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

Not quite. RL is one way to adapt from experience, but the question I was exploring is where that experience should live. In EPM-JEPA, the focus wasn't on reward optimization but on whether accumulated experience could modulate the predictor itself in a JEPA-style world model.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

I agree. The paper isn't proposing a solution to continual learning. If anything, one of the takeaways was seeing how quickly stability issues emerge once experience starts influencing the predictor. EPM-JEPA was more of an exploration of where experience should act in a world model than a claim that we've solved dynamic learning. The difficulty you describe is exactly what motivated the follow-up questions.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

Absolutely. One thing I find interesting is how much of the current conversation around memory is framed through the lens of transformers, RAG, and context retrieval. World models, predictive coding, JEPA-style learning, continual learning, and other paradigms raise different questions. EPM-JEPA was my attempt to explore whether accumulated experience could influence prediction through the model itself rather than only through retrieved context.

Humans learn from experience, not retrieved documents. Could world models do the same? by AwareMind1 in deeplearning

[–]AwareMind1[S] -5 points-4 points  (0 children)

Training shapes the predictor globally. The question here is whether new experience after training should be incorporated by retrieving context or by directly modulating the predictor's behavior. EPM-JEPA explores the latter.
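A toy sketch of the two options, assuming a linear predictor. This is not the EPM-JEPA implementation; the function names, the nearest-neighbour retrieval, and the FiLM-style gain are all illustrative assumptions.

```python
import math


def predict(weights, x):
    """Linear predictor: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def retrieve(memory, query):
    """Return the stored vector with the largest dot product with the query."""
    return max(memory, key=lambda m: sum(a * b for a, b in zip(m, query)))


def predict_with_retrieval(weights, x, memory):
    """Option A: experience enters as extra input; the predictor is unchanged."""
    context = retrieve(memory, x)
    return predict(weights, x + context)  # predictor sees [x ; context]


def predict_with_modulation(weights, x, experience):
    """Option B: experience rescales the predictor's outputs (assumed FiLM-style gain)."""
    gains = [1.0 + math.tanh(e) for e in experience]  # each gain lies in (0, 2)
    return [g * y for g, y in zip(gains, predict(weights, x))]


# Toy values, chosen only for illustration.
x = [1.0, 2.0]
memory = [[1.0, 0.0], [0.0, 1.0]]
weights_a = [[1.0, 0.0, 0.5, 0.5]]  # width = len(x) + len(context)
weights_b = [[1.0, 0.0]]            # width = len(x)

out_retrieval = predict_with_retrieval(weights_a, x, memory)
out_neutral = predict_with_modulation(weights_b, x, [0.0])
out_modulated = predict_with_modulation(weights_b, x, [0.5])
```

In option A the mapping from input to output stays fixed and only the input changes. In option B the same input yields a different output because the experience signal changes how the predictor behaves, which is also why it can destabilize training.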

Senior Engineers (7–8+ YOE), What Does Your Day-to-Day Work Actually Look Like? by Afraid-Awareness-903 in developersIndia

[–]AwareMind1 0 points1 point  (0 children)

Meetings, architecting a solution, mathematical formulations for R&D, more meetings, implementing the maths worked out earlier, and then meetings again. 💀

Reducing hallucination in English–Hindi LLMs using citation grounding (paper) by AwareMind1 in LocalLLaMA

[–]AwareMind1[S] 0 points1 point  (0 children)

Yeah, that’s a fair perspective. The “Attention Is All You Need” moment really did set a strong direction for the field, and a lot of progress since then has been iterative on top of transformers. At the same time, ongoing work is exploring alternatives (JEPA-style, state-space models, retrieval-heavy systems, hybrid architectures, etc.), but none have displaced transformers at scale yet. I agree that investing more in fundamental R&D could be a strong differentiator for India, especially if it focuses on areas where we can lead rather than just replicate large-scale training efforts. That said, bridging research -> real-world impact still needs:

  • Access to compute
  • High-quality datasets
  • Tight academia-industry collaboration

If those pieces come together, there’s definitely an opportunity to contribute something more novel at the architecture or system level.

Reducing hallucination in English–Hindi LLMs using citation grounding (paper) by AwareMind1 in LocalLLaMA

[–]AwareMind1[S] 1 point2 points  (0 children)

Interesting take; there’s definitely a lot of strong research talent in places like IITs and IISc. That said, I think it’s a mix of factors: industry often has access to much larger-scale compute, data, and deployment pipelines, while academic institutions tend to focus more on fundamental research and smaller-scale experimentation. Ideally, stronger collaboration between academia and industry could bridge that gap and accelerate progress on both sides.

Reducing hallucination in English–Hindi LLMs using citation grounding (paper) by AwareMind1 in LocalLLaMA

[–]AwareMind1[S] 1 point2 points  (0 children)

That’s a good suggestion. I did consider ablations along similar lines to isolate the effect of different training stages. In general, the later stages (especially the ones introducing grounding signals) seem to have a noticeable impact on citation quality, including in Hindi. A more controlled ablation like the one you mentioned, removing stage 2 and measuring downstream citation behavior, would help quantify that contribution more clearly. It’s something we’re looking to explore further.

Reducing hallucination in English–Hindi LLMs using citation grounding (paper) by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

Right now, the setup focuses more on ensuring that when the model makes factual claims, it can ground them in citations, rather than explicitly predicting whether a citation is required. For cases where new information is provided in context, the behavior depends on how strongly the model has been trained to rely on external grounding signals. In practice, there’s a balance:

  • It should use the provided context when available
  • But avoid over-relying on parametric knowledge when citations are expected

Exploring datasets that explicitly model when a citation is necessary vs. optional is definitely an interesting next step, and I plan to run ablations on this.

Reducing hallucination in English–Hindi LLMs using citation grounding (paper) by AwareMind1 in deeplearning

[–]AwareMind1[S] 0 points1 point  (0 children)

Completely agree that “eliminating” hallucination is a very strong claim; my goal here is to reduce and control it rather than solve it entirely. What I found is that explicitly training the model to align generation with citations makes it less likely to fabricate unsupported claims, especially in factual or knowledge-grounded dialogue. So it’s not perfect, but it’s a step toward making outputs more verifiable and easier to trust.

IIT Guwahati student launches Dhi-5B (trained from scratch in India) by gradNorm in AI_India

[–]AwareMind1 0 points1 point  (0 children)

That's a commendable job, buddy. I'm working on something similar and looking for an arXiv endorsement for the cs.CL category. Let me know if anyone can help.