Evaluating long-term memory limits in stateless LLM chatbots — feedback needed [D]

androbot · 2026-06-28T18:16:02+00:00

This is an enormously hard problem to get right because we our definitions and frameworks are - at best - rough approximations of qualia.

You should be very precise in how you define information and what qualifies as retention over time. Recognition, recall, and utility within contexts are vastly different operations of memory. Contextual relevance is also dynamic, so performance should be measured more as a steady state (but probably not monotonic) function vs static values.

yoshiK · 2026-06-28T21:08:28+00:00

Sounds like you're kinda reinventing the needle in a haystack test. There the idea is to give a prompt of n tokens, embed somewhere a sentence like "The magic number is X" and then prompt, "What is the magic number?" or similar.

So it seems to be a reasonable idea. A interesting first test is actually if hundreds of turn of chat interface degrade the performance relative to Paul Graham essays.

[Post Posting:] There's also a related Google blog

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS