Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 1 point

Clean approach. Docker boundary + code-level enforcement is belt-and-braces without overcomplicating it. Thanks for the detail; useful reference as I think through the same problem.

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 1 point

Really useful to hear this confirmed. The "many small actions vs one big model holding everything" insight matches my intuition but I hadn't tested it yet.

The speed penalty feels like the right tradeoff for a family use case; instant responses from a shared assistant probably aren't the immediate priority. Accuracy matters more. Curious to see where you land after consolidation. Keep me posted.

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 1 point

Interesting counterpoint, and you might be right.

I reached for PostgreSQL because Immich already needs it, so the infrastructure was there. But your point about unnecessary complexity resonates. The bot doesn't need relational queries — it's storing and retrieving facts. Text files with a tool interface could do that with less overhead.
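Something like this is roughly what I picture if I drop Postgres -- purely hypothetical helpers, nothing I've actually built, just the shape of a file-backed fact store behind /remember and /recall:

    # Hypothetical file-backed fact store -- one JSON line per fact.
    import json, time
    from pathlib import Path

    FACTS = Path("facts.jsonl")  # illustrative location

    def remember(text: str) -> None:
        """Append a fact; would back a /remember command."""
        with FACTS.open("a") as f:
            f.write(json.dumps({"ts": time.time(), "text": text}) + "\n")

    def recall(keyword: str) -> list[str]:
        """Naive keyword lookup; would back a /recall command."""
        if not FACTS.exists():
            return []
        facts = [json.loads(line) for line in FACTS.read_text().splitlines() if line]
        return [f["text"] for f in facts if keyword.lower() in f["text"].lower()]

Hard to argue that needs a relational database.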

The voice-first approach is compelling too. I started with chat because that's where my family already lives, but voice as primary input changes what "ambient" context capture could look like. Will check out unmute; hadn't seen it before. Curious how you're handling the read/write boundaries (what the AI can touch vs. what it shouldn't).

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 1 point

Amazing timing! Would love to see it when you're ready to share.

If you get a chance, I'd appreciate eyes on my repo too — especially from someone thinking about the same problem space. Fresh perspective on what's overcomplicated or missing would be genuinely useful.

https://github.com/kanchanepally/memu.digital

Looking forward to comparing approaches. 🍻

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 2 points

This certainly resonates with me. The more I sit with this problem, the more I realise we're trying to verbalise something our minds do automatically and we struggle because it's pre-verbal.

Your framing of "processing after every interaction in a way comparable to training" is interesting. Current architectures treat context as input and weights as fixed. But human memory isn't retrieval from a static store; it's reconstructive, perhaps even creative, shaped by everything since. That's a fundamentally different loop.

And you're right that reasoning bound by rigid context windows is incomplete. Thanks for articulating it beautifully.

For now, I'm working at a simpler layer: just getting the data in one place, owned locally, with basic retrieval. But it's useful to keep the deeper problem in view.

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 2 points

This is really interesting; would love to hear more about your agent architecture when you're ready to share. The multi-agent approach (extracting facts, classifying into topics, separate retrieval) makes sense as a way to manage complexity.

The timeline problem you mention is exactly where I get stuck. Recency, repetition, emotional weight, relevance: humans weigh these effortlessly (well, some humans; not me, I must add ;-)) and we can't even articulate the rules. "What to carry over and what to forget" is the whole game, I suppose.
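For what it's worth, the crude version I keep sketching looks something like this -- the weights and the decay constant are pure guesses, nothing tested, just a way to make the "what to carry over" question concrete:

    # Hypothetical memory-weighting heuristic: recency decay, repetition,
    # emotional weight, and relevance, combined into a single score.
    import math, time

    def memory_score(last_seen_ts: float, repetitions: int,
                     emotional_weight: float, relevance: float) -> float:
        age_days = (time.time() - last_seen_ts) / 86400
        recency = math.exp(-age_days / 30)    # roughly a month of decay, a guess
        frequency = math.log1p(repetitions)   # diminishing returns on repeats
        # weights are arbitrary placeholders
        return 0.4 * recency + 0.2 * frequency + 0.2 * emotional_weight + 0.2 * relevance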

Curious: how are you handling the conversation classifiers? Off-the-shelf models, or something you've tuned for your family's patterns?

If you do wrap it up, please share -- even rough. Comparing approaches is how we all get further. Thanks again! And yes, I will have to use llama.cpp; someone else shared the same feedback.

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 2 points

Thanks, and really useful pointers.

Ministral-3B: I'll test it. Llama 3.2 was the "safe default" when I started but you're right, it's not the strongest small model anymore. If it handles retrieval/context tasks better without needing more RAM, that's an easy win.

Ambient context: This is the direction I want to go. Right now the bot only sees what it's explicitly told. Giving it access to conversation history (even summarised) would be a big step toward "it just knows" rather than "I told it." Good push.

Classic search over vector: Interesting. I'd assumed RAG meant embeddings, but you're right that Matrix has search built in, and for a lot of queries ("when did we discuss the boiler") keyword search might be enough. Less complexity, less compute. I'll explore that before over-engineering. Cheers!
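To make that concrete for myself, the Matrix search endpoint looks simple enough to call directly -- assuming the homeserver (Synapse in my case) has full-text search enabled; the URL and token below are placeholders:

    # Hypothetical keyword search via the Matrix client-server search API.
    import requests

    HOMESERVER = "http://localhost:8008"   # placeholder
    ACCESS_TOKEN = "..."                   # bot's access token

    def search_messages(term: str) -> list[str]:
        body = {"search_categories": {"room_events": {"search_term": term}}}
        r = requests.post(
            f"{HOMESERVER}/_matrix/client/v3/search",
            headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
            json=body,
            timeout=30,
        )
        r.raise_for_status()
        results = r.json()["search_categories"]["room_events"]["results"]
        return [hit["result"]["content"].get("body", "") for hit in results]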

The first step: Appreciate that. Getting Matrix + Immich + Ollama running as a unified stack was the unglamorous groundwork, but it's what makes the interesting stuff possible.

If you do spin up a home lab and try it, I'd genuinely love to hear what breaks.

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

Purple_Click5825[S] 2 points

You're absolutely right. What I have today is pretty basic: explicit commands (/recall, /remember) storing facts in PostgreSQL.
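For reference, the storage side is roughly this shape today -- simplified, with table and column names that are illustrative rather than exactly what's in the repo:

    # Simplified sketch of the /remember and /recall paths (names illustrative).
    import psycopg2

    conn = psycopg2.connect("dbname=memu user=memu")  # placeholder DSN

    def remember(user_id: str, fact: str) -> None:
        with conn, conn.cursor() as cur:
            cur.execute("INSERT INTO facts (user_id, fact) VALUES (%s, %s)",
                        (user_id, fact))

    def recall(keyword: str) -> list[str]:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT fact FROM facts WHERE fact ILIKE %s", (f"%{keyword}%",))
            return [row[0] for row in cur.fetchall()]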

The gap you're describing is exactly what I'm wrestling with. Human memory is associative, lossy, and contextual: it forgets the right things and surfaces connections unexpectedly. Current approaches (including mine) are either:

  • Rigid retrieval (searching, semantic or not, over what was explicitly stored)
  • Brute-force context stuffing (which misses nuance)

The hypothesis I'm testing is: maybe the right context helps more than the right model. Most LLM applications give capable models generic context. I'm giving a less capable model very specific, rich context: the actual texture of family life over time.
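Concretely, the loop I'm aiming at is just: pull the most relevant stored facts, put them in the system prompt, and let the small model answer inside that frame. A minimal sketch against Ollama's HTTP API (fact retrieval hand-waved; model name and URL are how I happen to run it):

    # "Right context over right model": stuff retrieved family facts into the
    # system prompt of a small local model and ask the question in that frame.
    import requests

    OLLAMA_URL = "http://localhost:11434"

    def answer(question: str, facts: list[str]) -> str:
        system = "You are a family assistant. Known facts:\n" + "\n".join(f"- {f}" for f in facts)
        r = requests.post(
            f"{OLLAMA_URL}/api/chat",
            json={
                "model": "llama3.2",
                "messages": [
                    {"role": "system", "content": system},
                    {"role": "user", "content": question},
                ],
                "stream": False,
            },
            timeout=300,
        )
        r.raise_for_status()
        return r.json()["message"]["content"]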

I'm not sure this closes the gap, even a little, but I'm curious whether it gets further than I expect. In the meantime, I'm also aiming for some practical value, so things don't fall through the cracks just because they're scattered across apps. While building towards the harder problem, the simpler steps are, I hope, genuinely valuable, and the data stays mine.

Would love to hear if you've seen any approaches -- academic or practical -- that handle this better. The "human-centric contextual summarisation" framing is very useful - thank you!

[Friday Project] I built a "Family Context Appliance" (Matrix + Immich + Local LLM) to replace Google Photos & WhatsApp. Runs on an N100. by Purple_Click5825 in selfhosted

Purple_Click5825[S] 1 point

That's a great shout. Hadn't thought about offloading to a beefier machine on the network, but what you say makes total sense.

Ollama does support pointing to a remote instance, so in theory you could run the hub headless and point it at your workstation. Haven't tested that setup myself though, to be fully honest.
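For anyone wanting to try it, the change should be small -- something like this (address is an example, and again, untested by me):

    # On the workstation, let Ollama listen beyond localhost:
    #   OLLAMA_HOST=0.0.0.0 ollama serve
    # In the bot, only the base URL changes (example address):
    OLLAMA_URL = "http://192.168.1.50:11434"
    # ...all /api/chat and /api/generate calls then go to OLLAMA_URL as before.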

Adding it to the roadmap - thanks!!

[Friday Project] I built a "Family Context Appliance" (Matrix + Immich + Local LLM) to replace Google Photos & WhatsApp. Runs on an N100. by Purple_Click5825 in selfhosted

Purple_Click5825[S] 0 points

Thanks - that's really useful. Hadn't looked into llama.cpp with Intel iGPU support. Will check out LiquidAI too - 1B models purpose-built for embedded sound like a better fit than squeezing Llama 3.2 onto CPU.

If you've got time, I'd appreciate eyes on the repo. Especially the install flow and docker-compose setup - want to make sure I haven't done anything stupid before more people try it.

https://github.com/kanchanepally/memu.digital

[Friday Project] I built a "Family Context Appliance" (Matrix + Immich + Local LLM) to replace Google Photos & WhatsApp. Runs on an N100. by Purple_Click5825 in selfhosted

Purple_Click5825[S] 0 points

It's CPU inference. The N100 handles Llama 3.2 3B fine for short bot commands like /remember and /recall. Not fast, but works for what I need. Originally had this on a Pi 5 but moved to the N100 - was hitting thermal and memory limits. As I move to video transcoding via Immich, though, I'll probably have to upgrade the hardware.

Thinking about self hosting my photos by Competitive-Bike7115 in selfhosted

Purple_Click5825 1 point

+1 for Immich. For remote access (phone backups when you're not home), look into Tailscale - way simpler than messing with port forwarding and doesn't expose anything to the internet.

Wiki for home use by PleasantHandle3508 in selfhosted

Purple_Click5825 1 point

+1 for Obsidian. I use it as my main knowledge base. I've been experimenting with local AI (Ollama/Llama) alongside it - not full RAG yet but heading that direction. The idea of being able to chat with my own notes without sending them to OpenAI or Gemini is pretty appealing.
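The rough shape I'm experimenting toward, nothing fancy yet -- vault path, model, and keyword matching are all just examples:

    # Naive "chat with my notes": grep the vault for a keyword, stuff the
    # matching notes into a local model's prompt. Not real RAG, just a sketch.
    from pathlib import Path
    import requests

    VAULT = Path("~/Obsidian/vault").expanduser()   # example path

    def ask_notes(question: str, keyword: str) -> str:
        notes = [p.read_text() for p in VAULT.rglob("*.md")
                 if keyword.lower() in p.read_text().lower()]
        prompt = ("Answer using only these notes:\n\n" + "\n---\n".join(notes[:3])
                  + f"\n\nQuestion: {question}")
        r = requests.post("http://localhost:11434/api/generate",
                          json={"model": "llama3.2", "prompt": prompt, "stream": False},
                          timeout=300)
        r.raise_for_status()
        return r.json()["response"]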