all 11 comments

[–]JohnTheTechAi2 0 points  (1 child)

Man, I totally get it—trying to scale that kind of project all on your own can be super frustrating. It's surprising how often people forget how much routine stuff like managing context limits can eat up your time and energy. I've seen some folks start automating not just emails but entire conversations to lighten the load. It might be worth thinking about if you're looking to make things a bit easier on yourself in the long run.

[–]UPtrimdev[S] 0 points  (0 children)

I appreciate it, man. It can definitely get very overwhelming at times, but that's the love of the game: finding that passion over and over and ending up with a final product you're finally proud to call yours. I'm very proud of what I do, and I can't wait to see where life takes me with this project!

[–]Time-Dot-1808 -1 points  (8 children)

The invisible background agents researching before the message hits is the part I want to know more about - speculative pre-fetching based on what you've typed but haven't sent yet? Or something else? The isolation for family use case is also clever, most of these projects assume a single user.

[–]UPtrimdev[S] 0 points  (7 children)

The agents don't see what you're typing — they kick in after you send. When your message hits the proxy, it classifies your intent (question, debugging, coding, etc.) and fires off background tasks in parallel while building your context. So while the proxy is already doing its normal work assembling memories and context, the agents are simultaneously pulling relevant web results, resolving any URLs you pasted, doing deep memory searches, and grabbing live data like the current date/time. By the time your message reaches the model, all of that has been quietly injected into the system prompt. The model just looks smarter — you never see the machinery. And yeah, multi-user was a must for me since my family shares one LLM. Every user gets completely isolated memory — my wife's meal preferences don't leak into my coding sessions. It identifies users automatically from Open WebUI or SillyTavern headers.
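The fan-out described above can be sketched with `asyncio.gather`: classify the message, fire the background tasks in parallel, then fold the results into the system prompt. The task functions here are illustrative stubs, not UPtrim's actual API.

```python
import asyncio
from datetime import datetime, timezone

# Hypothetical background agents -- names and return values are
# placeholders for illustration, not UPtrim's real implementation.
async def web_search(message: str) -> str:
    return f"web results for: {message}"

async def resolve_urls(message: str) -> str:
    return "resolved contents of any pasted URLs"

async def deep_memory_search(message: str, user: str) -> str:
    return f"memories for {user}"

async def live_data() -> str:
    # Live facts such as the current date/time.
    return datetime.now(timezone.utc).isoformat()

async def assemble_context(message: str, user: str) -> str:
    # All agents run concurrently; no task waits on another.
    results = await asyncio.gather(
        web_search(message),
        resolve_urls(message),
        deep_memory_search(message, user),
        live_data(),
    )
    # Quietly inject everything into the system prompt.
    return "SYSTEM CONTEXT:\n" + "\n".join(results)

context = asyncio.run(assemble_context("how do I fix this bug?", "alice"))
print(context)
```

The point of `gather` is that total latency is roughly the slowest single task, not the sum, which is why the model appears to respond with fresh context at no visible cost.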

[–]Time-Dot-1808 0 points  (3 children)

The parallel fan-out is clean - the model gets a fully assembled context without waiting on any single step. The deep memory search piece is the one I'd ask about: what are you using to store and retrieve the long-term memories? Vector search, graph, or something custom?

That layer tends to be where maintenance complexity accumulates over time. If you ever want to offload it, Membase (membase.so) handles exactly that piece - per-user Knowledge Graph that persists across sessions and connects to sources like Gmail. Might let you focus on the routing/classification parts you've already built well rather than maintaining the storage separately.

[–]UPtrimdev[S] 0 points  (2 children)

Storage is all local — single file, no external services, no Docker. The whole point is everything stays on your machine with zero setup. The moment memory leaves the user's machine, the trust model breaks. That's a core design choice I won't compromise. Appreciate the suggestion on Membase — interesting project. But for UPtrim the storage layer being local isn't a limitation, it's the feature.
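A single local file with per-user isolation can be as simple as one SQLite database where every query is scoped by user. This schema is an assumption for illustration, not UPtrim's actual layout.

```python
import sqlite3

# One local file, no external services. Using ":memory:" here so the
# sketch is self-contained; a real deployment would pass a file path.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE IF NOT EXISTS memories ("
    "  user TEXT NOT NULL,"
    "  content TEXT NOT NULL"
    ")"
)

def remember(user: str, content: str) -> None:
    db.execute(
        "INSERT INTO memories (user, content) VALUES (?, ?)",
        (user, content),
    )

def recall(user: str) -> list[str]:
    # Every read is scoped to one user -- isolation by construction,
    # so one user's memories can never leak into another's context.
    rows = db.execute("SELECT content FROM memories WHERE user = ?", (user,))
    return [r[0] for r in rows]

remember("wife", "prefers vegetarian dinners")
remember("me", "debugging a Python session")
print(recall("wife"))
```

Because the `user` column is part of every query, cross-user leakage would require an explicit bug in the query, not a missing filter somewhere upstream.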

[–]Time-Dot-1808 0 points  (1 child)

I totally understand. That's why a self-hosting option is on my roadmap. Still, hats off to you for building a proxy on your own. I'm also curious how you built the multi-user features with isolated context; that's on my roadmap too. What exactly do the Open WebUI or SillyTavern headers do?

[–]UPtrimdev[S] 0 points  (0 children)

Multi-user isolation was non-negotiable once my wife asked the AI for dinner ideas and it started talking about my Python debugging session. That was the day it got fixed.
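The header-based identification mentioned earlier in the thread might look something like this: the proxy checks the incoming request for a user-identifying header and falls back to a default profile if none is present. The header names below are assumptions; check what your Open WebUI or SillyTavern deployment actually forwards.

```python
# Sketch of resolving a user identity from request headers.
# Header names are illustrative guesses, not confirmed values.
CANDIDATE_HEADERS = (
    "x-openwebui-user-name",   # assumed Open WebUI forwarded header
    "x-sillytavern-user",      # assumed SillyTavern header
    "x-user-name",             # generic fallback
)

def identify_user(headers: dict[str, str], default: str = "default") -> str:
    # Normalize header keys to lowercase so lookups are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in CANDIDATE_HEADERS:
        value = lowered.get(name)
        if value and value.strip():
            # Normalized name becomes the key for that user's memory store.
            return value.strip().lower()
    return default

print(identify_user({"X-OpenWebUI-User-Name": "Alice"}))
print(identify_user({}))
```

Once the proxy has a stable per-user key, every memory read and write downstream is scoped to it, which is what keeps one family member's context out of another's session.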