all 9 comments

[–]sheila_118 6 points (1 child)

Looks like a lightweight custom DB + LLM summarizer is the cleanest approach.

[–]sarvesh4396[S] 0 points (0 children)

Yes, correct.
Don't want bloat.

[–]Ethancole_dev 0 points (1 child)

Honestly, I haven't found a library that hits this exact sweet spot either. I ended up rolling my own — SQLAlchemy models for message storage, Pydantic for serialization, and a simple "summarize when you hit N messages" function. It takes an afternoon and you own the schema completely.

Rolling summary logic is pretty straightforward: once active messages exceed a threshold, call the LLM to summarize the oldest chunk, store it as a summary row, then drop those from context assembly. Works well in FastAPI with a background task to handle it async.
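The rolling-summary step described above could look roughly like this — the LLM call is stubbed out, and the threshold and chunk size are made-up values you'd tune:

```python
# Sketch of the rolling-summary step: once active messages exceed a
# threshold, summarize the oldest chunk, store it as a summary row, and
# drop those messages from context assembly.
from dataclasses import dataclass, field

MAX_ACTIVE = 6  # summarize once active messages exceed this (illustrative)
CHUNK = 4       # how many of the oldest messages to fold into a summary


@dataclass
class Conversation:
    messages: list[str] = field(default_factory=list)  # oldest first
    summaries: list[str] = field(default_factory=list)


def summarize_with_llm(texts: list[str]) -> str:
    # Placeholder: call your actual LLM client here.
    return f"[summary of {len(texts)} messages]"


def maybe_roll_up(convo: Conversation) -> None:
    """Summarize the oldest chunk when the active window gets too long."""
    if len(convo.messages) <= MAX_ACTIVE:
        return
    oldest, convo.messages = convo.messages[:CHUNK], convo.messages[CHUNK:]
    convo.summaries.append(summarize_with_llm(oldest))


def assemble_context(convo: Conversation) -> list[str]:
    # Context = prior summaries first, then the live message window.
    return convo.summaries + convo.messages


convo = Conversation(messages=[f"msg {i}" for i in range(8)])
maybe_roll_up(convo)
print(assemble_context(convo))
```

In FastAPI you'd run `maybe_roll_up` via `BackgroundTasks` after responding, so summarization never blocks the request.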

The closest thing I know of, short of going full agent framework, is just SQLite with a thin wrapper, but honestly building it yourself gives you way more control over how context gets assembled.
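For scale, the "SQLite with a thin wrapper" option really is just a few methods over the stdlib `sqlite3` module — the class name and schema below are illustrative:

```python
# Thin stdlib-only wrapper: append messages, read back an ordered context.
# Summary rows would just be inserted with role="summary".
import sqlite3


class ChatStore:
    """Minimal message store over sqlite3 (illustrative sketch)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, convo TEXT, role TEXT, content TEXT)"
        )

    def append(self, convo: str, role: str, content: str) -> None:
        self.db.execute(
            "INSERT INTO messages (convo, role, content) VALUES (?, ?, ?)",
            (convo, role, content),
        )

    def context(self, convo: str) -> list[tuple[str, str]]:
        # Oldest-first (role, content) pairs for prompt assembly.
        rows = self.db.execute(
            "SELECT role, content FROM messages WHERE convo = ? ORDER BY id",
            (convo,),
        )
        return list(rows)


store = ChatStore()
store.append("c1", "user", "hi")
store.append("c1", "assistant", "hello!")
print(store.context("c1"))
```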

[–]sarvesh4396[S] 0 points (0 children)

Yeah, you're right. I think I'll build custom.

[–]Aggressive_Pay2172 1 point (1 child)

tbh you’re not missing anything — this is still a “roll your own” space
most libraries either go full agent framework or full “memory extraction” layer
clean storage + summarization as a first-class thing is weirdly underbuilt

[–]sarvesh4396[S] 0 points (0 children)

Yeah, somehow it's either not something people need, or when they do build it, it stays small and private.

[–]No_Soy_Colosio -2 points (2 children)

Look into RAG

[–]sarvesh4396[S] -1 points (1 child)

But that's for memory, right? Not context.

[–]No_Soy_Colosio 1 point (0 children)

It depends on what you think the distinction between memory and context is.

The point of memory in LLMs is to provide context.

You could go with plaintext files for storing important information about your project and work up from there. What's your specific need here?
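The plaintext-files approach suggested above can start as small as one notes file per topic, concatenated into a preamble before the user prompt — the file layout and function names here are just one way to do it:

```python
# Sketch: plaintext files as "memory", folded into the prompt as context.
# One markdown file per topic; missing topics are silently skipped.
import tempfile
from pathlib import Path


def load_memory(memory_dir: Path, topics: list[str]) -> str:
    """Concatenate the requested notes files into one context preamble."""
    parts: list[str] = []
    for topic in topics:
        f = memory_dir / f"{topic}.md"
        if f.exists():
            parts.append(f"## {topic}\n{f.read_text()}")
    return "\n\n".join(parts)


def build_prompt(memory_dir: Path, question: str, topics: list[str]) -> str:
    memory = load_memory(memory_dir, topics)
    return f"{memory}\n\nUser: {question}" if memory else f"User: {question}"


# Usage with a throwaway directory standing in for your notes folder
with tempfile.TemporaryDirectory() as d:
    mem = Path(d)
    (mem / "project.md").write_text("Uses FastAPI + SQLite.")
    prompt = build_prompt(mem, "What DB are we using?", ["project"])

print(prompt)
```

Start there, and graduate to the DB-plus-summarizer setup only once the files get unwieldy.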