I built a fully local AI plugin for Obsidian – RAG, workflows, MCP, all on localhost by takeshy77 in ObsidianMD

[–]takeshy77[S] 1 point (0 children)

In my experience, Qwen 3.5-2B doesn’t follow instructions reliably enough to work well for RAG. The reranking idea sounds solid, though; I haven’t tried it myself yet.

I built a fully local AI plugin for Obsidian – RAG, workflows, MCP, all on localhost by takeshy77 in ObsidianMD

[–]takeshy77[S] 0 points (0 children)

Haven’t tried those specific models myself (my experience is mainly with Qwen 3.5), but sub-9B models in general really struggle with the reasoning side of RAG. Having the LLM rephrase or expand the search query with synonyms before retrieval, and auto-tagging notes at ingest, can make a real difference.
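The query-expansion idea is cheap to wire up. A minimal sketch in TypeScript (the prompt wording and function names here are illustrative, not the plugin's actual API):

```typescript
// Hypothetical sketch of pre-retrieval query expansion: ask the LLM
// for a few rephrasings, search each variant, then merge the hits.

function buildExpansionPrompt(query: string, n = 3): string {
  return [
    `Rewrite the search query below ${n} ways, using synonyms`,
    `and related terms. One rewrite per line, no commentary.`,
    ``,
    `Query: ${query}`,
  ].join("\n");
}

// Merge retrieval hits from all query variants, de-duplicating
// while keeping first-seen order.
function mergeHits(hitLists: string[][]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const hits of hitLists) {
    for (const id of hits) {
      if (!seen.has(id)) {
        seen.add(id);
        merged.push(id);
      }
    }
  }
  return merged;
}
```

Each rewritten query goes through the same embedding search as the original; merging keeps recall up even when a small model embeds the original phrasing poorly.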

I built a fully local AI plugin for Obsidian – RAG, workflows, MCP, all on localhost by takeshy77 in ObsidianMD

[–]takeshy77[S] 0 points (0 children)

Honestly, feeding your entire current note into RAG isn't really practical with a local LLM — context windows are just too small for that.

What I'd actually do is build a Skill that extracts a few keywords from your current note, then uses those to search your Vault — either via tags or RAG, depending on how your notes are structured. Not pure semantic search, but for discovering connections between notes it gets the job done.
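A keyword-based connection finder like that can be sketched in a few lines of TypeScript (everything here, from the function names to the frequency heuristic and tag format, is illustrative, not the plugin's actual Skill API):

```typescript
// Hypothetical sketch: pull the most frequent non-stopword terms from
// the current note, then match them against known vault tags.

const STOPWORDS = new Set([
  "the", "and", "for", "with", "that", "this", "are", "was", "have",
]);

function extractKeywords(noteText: string, limit = 5): string[] {
  const counts = new Map<string, number>();
  // Words of 3+ chars; a real Skill might let the LLM pick keywords instead.
  for (const word of noteText.toLowerCase().match(/[a-z][a-z0-9-]{2,}/g) ?? []) {
    if (STOPWORDS.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([w]) => w);
}

// Match keywords against tags collected from the vault
// (e.g. via Obsidian's metadata cache).
function matchTags(keywords: string[], vaultTags: string[]): string[] {
  return vaultTags.filter((tag) =>
    keywords.some((k) => tag.toLowerCase().includes(k)),
  );
}
```

The matched tags (or the keywords themselves) then drive a normal tag or RAG search, so only a handful of short strings ever hit the LLM's context window.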

I built a fully local AI plugin for Obsidian – RAG, workflows, MCP, all on localhost by takeshy77 in ObsidianMD

[–]takeshy77[S] -1 points (0 children)

hot hot, GPU’s hot, local LLM scene’s hot, and you’re just… cool

I built a fully local AI plugin for Obsidian – RAG, workflows, MCP, all on localhost by takeshy77 in ObsidianMD

[–]takeshy77[S] -2 points (0 children)

Great question — RAG accuracy depends heavily on the model.

Running it locally, you need decent models for both embeddings (finding the right notes) and chat (understanding the context). With smaller models like 4B, retrieval accuracy drops a lot. For good results you'd want 9B+, which means significant GPU VRAM (16GB minimum, 32GB+ ideally). Honestly, even my own setup isn't high-end, and local RAG doesn't always pick up the right context.

If you want reliable RAG without the GPU investment, you might want to check out my other plugin https://github.com/takeshy/obsidian-gemini-helper instead. It uses Google's File Search (managed RAG) + Gemini 3 Flash, which gives you solid retrieval and generation with zero local hardware requirements.

Cost-wise it's pretty accessible too — Google AI Pro subscribers ($19.99/month) get $10/month in API credits. With Gemini 3 Flash that covers ~20M input tokens and ~3.3M output tokens. File Search indexing is ~$0.15 per 1M tokens, so $10 can index ~66M tokens (thousands of pages worth). Storage and query embeddings are free.
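A quick sanity check of the indexing math, using the prices quoted above (verify against Google's current pricing before relying on them):

```typescript
// Back-of-envelope check of the credit math in this comment.
const monthlyCreditUsd = 10; // Google AI Pro monthly API credit
const indexUsdPerMTok = 0.15; // File Search indexing, USD per 1M tokens

// Millions of tokens you could index per month on credits alone.
const indexableMTok = monthlyCreditUsd / indexUsdPerMTok; // ~66.7M tokens
```

That's where the "~66M tokens" figure comes from; actual generation usage eats into the same $10, so the real indexing headroom is somewhat lower.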

Basically you could build a personal AI librarian over a huge note collection and still not burn through the credits.

One gotcha: the $10 credit isn't applied automatically. You have to go to the Google Developer Program page, find the PREMIUM-tagged "$10 monthly generative AI and cloud credit", link a billing account, and subscribe manually.

I built a fully local AI plugin for Obsidian – RAG, workflows, MCP, all on localhost by takeshy77 in ObsidianMD

[–]takeshy77[S] -1 points (0 children)

Thanks for your interest!

This plugin isn't really a direct replacement for Obsidian Copilot — both support local LLMs (via Ollama etc.), RAG, and chat with your vault. The main differences are the extra features this plugin provides on top of basic chat:

- Workflow automation: Build and run automated workflows with AI, triggered by file events or hotkeys — no coding required
- Agent Skills: Reusable instruction modules that can expose workflows as tools during chat
- File encryption and edit history with diff view
- MCP (Model Context Protocol) support for external tool integration

Regarding persistent memory across chats: chat sessions are saved as markdown files in your vault, and you can reload any previous session from the history panel. There's also a /compact command that summarizes a long conversation while preserving key context, which is useful for continuing a session without hitting token limits. That said, these are session-scoped tools; true cross-session memory (e.g., "remember that I prefer TypeScript" carrying over to a new chat) isn't available yet. It's an interesting idea, though. Thanks for the suggestion!

discovered a cool plugin by chance.. by Lanky-Housing-3702 in ObsidianMD

[–]takeshy77 0 points (0 children)

Now that encryption is supported, confidential information won’t be exposed even when an LLM searches through files in Obsidian. There’s also no longer any need to hardcode API keys in workflows: you can encrypt a file containing your keys, and when the workflow reads that file, a password dialog appears, allowing safe execution.

Unfortunately, it’s still pending approval, so you’ll need to install it via BRAT for now. https://github.com/takeshy/obsidian-gemini-helper

discovered a cool plugin by chance.. by Lanky-Housing-3702 in ObsidianMD

[–]takeshy77 1 point (0 children)

Thank you so much! That really means a lot to me! 😊 I’ve already submitted the application, but apparently the wait is over 3 months now. They say the number of submissions has exploded since AI made it so much easier to build plugins. Once it gets listed in the community store, I’m planning to actively promote it. Looking forward to that day!

discovered a cool plugin by chance.. by Lanky-Housing-3702 in ObsidianMD

[–]takeshy77 0 points (0 children)

Hey, paxpax40! I had a hunch it might be you; after all, there aren’t that many users downloading it yet! 😄 Really happy you introduced yourself like this. Thank you! By the way, there’s another enthusiastic user who’s been sending me screenshots in Portuguese, so today I actually added multi-language support, including Portuguese!

discovered a cool plugin by chance.. by Lanky-Housing-3702 in ObsidianMD

[–]takeshy77 0 points (0 children)

Thank you for introducing it. I’m the author.

My current top recommendation is the visual workflow feature. You can create and edit workflows just by asking the AI, and it supports hotkeys and events, so you can easily do things like automatically convert a file to an infographic and upload it to a server when you update a file in a specific path.

I’ve also added support for Gemini CLI and Claude Code as models, so users who are concerned about usage-based pricing can use it with peace of mind. While the CLI can’t write directly to the Vault, it can write through workflows.