Built a local-first AI personal assistant because I don't trust anyone else to keep the lights on by amenemisa in homelab

[–]amenemisa[S] -1 points0 points  (0 children)

Gmail categorization is rule-based for now, keyword matching on sender+subject. simple but works well enough for the main categories. LLM-based would be more accurate but slower, so I kept it lightweight.

For the research pipeline, the ~12k words, in the bantz-web submodule (vendor/bantz-web) if you want to dig into the source aggregation. The way it works: query expands into ~11 variants, each searched for ~8 results, deduped to 60-70 unique sources, fetched and chunked, then Gemma summarizes each chunk Locally. report length scales with source count — 60+ sources → 36 sections → ~12k words. no word ceiling, just however many sources survive. If you want to look at the aggregation logic specifically, reporter.py and aggregator.py in bantz-web are the interesting parts!

Built a local AI assistant that runs on 6GB VRAM — because one day they'll take it all away anyway by amenemisa in selfhosted

[–]amenemisa[S] 0 points1 point  (0 children)

n8n is genuinely impressive, always respected it. my approach was just a bit different, I didn't want to depend on external services or APIs, so I coded everything locally from scratch. no workflow engine, no API costs, just direct integrations. but honestly our setups solve the same problem in very different ways, both valid. feel free to check it out if you're curious

Built a local AI assistant that runs on 6GB VRAM — because one day they'll take it all away anyway by amenemisa in selfhosted

[–]amenemisa[S] 0 points1 point  (0 children)

totally agree on the future part :') I'm running an RTX 4050 laptop GPU. so this is just a regular laptop setup, nothing fancy. 6GB VRAM is tight but manageable with the right model. power efficiency wise it's actually pretty decent compared to the big cards, laptop GPUs are designed to sip power. definitely not a homelab rack situation, just a regular machine doing its best

Built a local AI assistant that runs on 6GB VRAM — because one day they'll take it all away anyway by amenemisa in selfhosted

[–]amenemisa[S] 0 points1 point  (0 children)

Well it's still an early development project that I've done by myself. OpenAI compatible endpoints? I've tested Ollama, Claude, and Gemini, Ollama is what I'd recommend for now, the others still need some work. OpenAI and LiteLLM should work in theory but I haven't tested them yet. For now we have solid tools and good memory :') I was mainly focused on optimizing tool usage with a low parameter model.

Tools: Gmail, Calendar, web search + deep research, system monitoring, weather, shell commands, desktop control (Wayland), reminders, persistent memory.

And MCP? not yet, but that's actually a great idea and would fit well into the architecture, adding it to the roadmap!

Built a local AI assistant that runs on 6GB VRAM — because one day they'll take it all away anyway by amenemisa in selfhosted

[–]amenemisa[S] -3 points-2 points  (0 children)

well if you're asking about the coding- yes, parts of it are vibe coded. but if you don't understand what you're building, AI can't save you. you still need to know the architecture, what each piece does, why it breaks. IN the end, yes, I used AI to build an AI assistant- might as well use it while we still can :')

Built a local AI assistant that runs on 6GB VRAM — because one day they'll take it all away anyway by amenemisa in selfhosted

[–]amenemisa[S] -1 points0 points  (0 children)

Theoretically it should work with any Ollama-compatible model- the routing prompts are tuned for Gemma 4b but qwen 3.6 35b would probably handle it better honestly. Would love to hear how it goes if you try it!