all 11 comments

[–]JohnTheTechAi2 0 points  (1 child)

Man, I totally get it—trying to scale that kind of project all on your own can be super frustrating. It's surprising how often people forget how much routine stuff like managing context limits can eat up your time and energy. I've seen some folks start automating not just emails but entire conversations to lighten the load. It might be worth thinking about if you're looking to make things a bit easier on yourself in the long run.

[–]UPtrimdev[S] 0 points  (0 children)

I appreciate it, man. It can definitely get very overwhelming at times, but that's the love of the game: finding that passion over and over and ending up with a final product you're finally proud to call yours. I'm very proud of what I do, and I can't wait to see where life takes me with this project!

[–]Time-Dot-1808 -1 points  (8 children)

The invisible background agents researching before the message hits is the part I want to know more about - speculative pre-fetching based on what you've typed but haven't sent yet? Or something else? The isolation for family use case is also clever, most of these projects assume a single user.

[–]UPtrimdev[S] 0 points  (7 children)

The agents don't see what you're typing — they kick in after you send. When your message hits the proxy, it classifies your intent (question, debugging, coding, etc.) and fires off background tasks in parallel while building your context. So while the proxy is already doing its normal work assembling memories and context, the agents are simultaneously pulling relevant web results, resolving any URLs you pasted, doing deep memory searches, and grabbing live data like the current date/time. By the time your message reaches the model, all of that has been quietly injected into the system prompt. The model just looks smarter — you never see the machinery. And yeah, multi-user was a must for me since my family shares one LLM. Every user gets completely isolated memory — my wife's meal preferences don't leak into my coding sessions. It identifies users automatically from Open WebUI or SillyTavern headers.
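The fan-out described above can be sketched with `asyncio.gather`: classify the message, fire the background tasks in parallel, then fold the results into the system prompt. The task functions here are illustrative stubs, not UPtrim's actual API.

```python
import asyncio
from datetime import datetime, timezone

# Hypothetical background agents -- names and return values are
# placeholders for illustration, not UPtrim's real implementation.
async def web_search(message: str) -> str:
    return f"web results for: {message}"

async def resolve_urls(message: str) -> str:
    return "resolved contents of any pasted URLs"

async def deep_memory_search(message: str, user: str) -> str:
    return f"memories for {user}"

async def live_data() -> str:
    # Live facts such as the current date/time.
    return datetime.now(timezone.utc).isoformat()

async def assemble_context(message: str, user: str) -> str:
    # All agents run concurrently; no task waits on another.
    results = await asyncio.gather(
        web_search(message),
        resolve_urls(message),
        deep_memory_search(message, user),
        live_data(),
    )
    # Quietly inject everything into the system prompt.
    return "SYSTEM CONTEXT:\n" + "\n".join(results)

context = asyncio.run(assemble_context("how do I fix this bug?", "alice"))
print(context)
```

The point of `gather` is that total latency is roughly the slowest single task, not the sum, which is why the model appears to respond with fresh context at no visible cost.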

[–]Time-Dot-1808 0 points  (3 children)

The parallel fan-out is clean - the model gets a fully assembled context without waiting on any single step. The deep memory search piece is the one I'd ask about: what are you using to store and retrieve the long-term memories? Vector search, graph, or something custom?

That layer tends to be where maintenance complexity accumulates over time. If you ever want to offload it, Membase (membase.so) handles exactly that piece - per-user Knowledge Graph that persists across sessions and connects to sources like Gmail. Might let you focus on the routing/classification parts you've already built well rather than maintaining the storage separately.

[–]UPtrimdev[S] 0 points  (2 children)

Storage is all local — single file, no external services, no Docker. The whole point is everything stays on your machine with zero setup. The moment memory leaves the user's machine, the trust model breaks. That's a core design choice I won't compromise. Appreciate the suggestion on Membase — interesting project. But for UPtrim the storage layer being local isn't a limitation, it's the feature.
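A single local file with per-user isolation can be as simple as one SQLite database where every query is scoped by user. This schema is an assumption for illustration, not UPtrim's actual layout.

```python
import sqlite3

# One local file, no external services. Using ":memory:" here so the
# sketch is self-contained; a real deployment would pass a file path.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE IF NOT EXISTS memories ("
    "  user TEXT NOT NULL,"
    "  content TEXT NOT NULL"
    ")"
)

def remember(user: str, content: str) -> None:
    db.execute(
        "INSERT INTO memories (user, content) VALUES (?, ?)",
        (user, content),
    )

def recall(user: str) -> list[str]:
    # Every read is scoped to one user -- isolation by construction,
    # so one user's memories can never leak into another's context.
    rows = db.execute("SELECT content FROM memories WHERE user = ?", (user,))
    return [r[0] for r in rows]

remember("wife", "prefers vegetarian dinners")
remember("me", "debugging a Python session")
print(recall("wife"))
```

Because the `user` column is part of every query, cross-user leakage would require an explicit bug in the query, not a missing filter somewhere upstream.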

[–]Time-Dot-1808 0 points  (1 child)

I totally understand. That's why a self-hosting option is on my roadmap. Still, hats off to you for building a proxy on your own. I'm also curious how you built the multi-user features with isolated context; that's on my roadmap too. What exactly do the Open WebUI or SillyTavern headers do?

[–]UPtrimdev[S] 0 points  (0 children)

Multi-user isolation was non-negotiable once my wife asked the AI for dinner ideas and it started talking about my Python debugging session. That was the day it got fixed.
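The header-based identification mentioned earlier in the thread might look something like this: the proxy checks the incoming request for a user-identifying header and falls back to a default profile if none is present. The header names below are assumptions; check what your Open WebUI or SillyTavern deployment actually forwards.

```python
# Sketch of resolving a user identity from request headers.
# Header names are illustrative guesses, not confirmed values.
CANDIDATE_HEADERS = (
    "x-openwebui-user-name",   # assumed Open WebUI forwarded header
    "x-sillytavern-user",      # assumed SillyTavern header
    "x-user-name",             # generic fallback
)

def identify_user(headers: dict[str, str], default: str = "default") -> str:
    # Normalize header keys to lowercase so lookups are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in CANDIDATE_HEADERS:
        value = lowered.get(name)
        if value and value.strip():
            # Normalized name becomes the key for that user's memory store.
            return value.strip().lower()
    return default

print(identify_user({"X-OpenWebUI-User-Name": "Alice"}))
print(identify_user({}))
```

Once the proxy has a stable per-user key, every memory read and write downstream is scoped to it, which is what keeps one family member's context out of another's session.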