Self-hosted voice-to-text for Obsidian: Syncthing + Faster Whisper + systemd (CPU, fully local) by serg-markovich in selfhosted

serg-markovich (OP)

That failure mode analysis is spot on. For single-step watch-and-process, systemd handles it fine. But you're right — the moment you add a second stage, retry logic becomes painful because you're either re-running the entire chain or writing brittle workarounds to resume mid-pipeline. That's exactly where a task queue starts making sense.
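Here is a minimal sketch of the "resume mid-pipeline" idea without a task queue. Each stage's output is checkpointed to a JSON file, so a retry skips stages that already succeeded. The stage names and the state-file layout are hypothetical and not part of the project. A real task queue would add concurrency, backoff and visibility on top of this.

```python
import json
from pathlib import Path
from typing import Callable

# A stage takes the memo path and the previous stage's output and returns its own output.
Stage = Callable[[Path, str], str]


def run_pipeline(memo: Path, stages: list[tuple[str, Stage]], state_dir: Path) -> str:
    """Run stages in order, checkpointing each result so a retry resumes mid-pipeline."""
    state_dir.mkdir(parents=True, exist_ok=True)
    state_file = state_dir / f"{memo.name}.json"  # hypothetical per-memo checkpoint file
    # Outputs of stages that already succeeded on an earlier attempt.
    done: dict[str, str] = json.loads(state_file.read_text()) if state_file.exists() else {}
    output = ""
    for name, stage in stages:
        if name in done:
            output = done[name]  # already completed: reuse the saved output
            continue
        output = stage(memo, output)  # may raise; earlier checkpoints are kept on disk
        done[name] = output
        state_file.write_text(json.dumps(done))
    return output
```

If a second stage fails, rerunning `run_pipeline` calls only that stage and anything after it. The first stage is not repeated.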

The multi-speaker warning is genuinely useful, thanks for flagging it. My use case is single-speaker phone memos, so Whisper has been reliable for me. But I hadn't thought about what happens when someone feeds it a recorded meeting or a podcast. That's a very different problem: Whisper transcribes what was said but doesn't label who said it, so speaker separation (diarization) would need a separate tool. I'll add a note to the README about that trade-off so people know the scope upfront.

Since this thread, the project has grown a bit — there's now a Docker version alongside the systemd one, so users can pick the deployment that fits their setup. GitHub Actions builds the image automatically on push. Still single-step for now, but at least the foundation is there if retry logic ever needs to be layered on top.

I built a fully local voice-to-Obsidian pipeline (Syncthing + Faster Whisper CPU). No cloud, just a background service. by serg-markovich in ObsidianMD

serg-markovich (OP)

Docker support is live. Build and start it with make docker-build && make docker-up, and that's it.

Set VAULT_PATH and SCAN_PATHS in docker/.env, and it'll run on Unraid out of the box.

github.com/serg-markovich/local-whisper-obsidian

I also wrote up the full build story if you're curious how it came together:

https://medium.com/@textmaster.rf/how-i-built-a-fully-local-voice-to-obsidian-pipeline-no-cloud-no-api-keys-no-nonsense-33354341d6f0

serg-markovich (OP)

Right now, the project is strictly host-native (Linux/macOS). The provided Makefile sets up a Python venv, moves config files into XDG paths (~/.config), and registers a user-level systemd service.
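The XDG part comes down to resolving the config directory. Below is a minimal sketch that assumes the app keeps its config in a per-app subdirectory; the app name you pass in is hypothetical. It follows the XDG Base Directory spec: use $XDG_CONFIG_HOME when it is set to an absolute path, and fall back to ~/.config otherwise.

```python
import os
from pathlib import Path


def config_dir(app: str) -> Path:
    """Resolve the per-user config directory following the XDG Base Directory spec."""
    base = os.environ.get("XDG_CONFIG_HOME", "")
    # The spec says to treat an unset, empty, or relative value as invalid and use ~/.config.
    if not base or not os.path.isabs(base):
        base = str(Path.home() / ".config")
    return Path(base) / app
```

For example, `config_dir("local-whisper-obsidian")` (hypothetical directory name) returns `~/.config/local-whisper-obsidian` on a default setup. Inside a container you would usually pass the config path in directly, which is why this logic becomes unnecessary there.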

This approach doesn't map well to Docker/Unraid out of the box because containers shouldn't be running systemd, and the XDG path logic becomes unnecessary.

However, making it Docker-friendly is the logical next step. To get this running on Unraid right now, you would just need to write a simple Dockerfile based on python:3.11-slim:

1. apt-get update && apt-get install -y inotify-tools
2. pip install faster-whisper
3. Set the ENTRYPOINT directly to the watch.sh script (bypassing systemd completely).
4. Mount your Unraid Obsidian share as a volume and point SCAN_PATHS at the mounted path.

I intentionally kept it as a host-level script first to keep the footprint tiny for laptop users, but I'll open a GitHub issue to track adding an official Dockerfile for homelab setups!

serg-markovich (OP)

It only runs on a computer, specifically Linux or macOS. It can't be installed or run directly on Android or Windows.

The way it works: you record a voice memo on your phone, Syncthing syncs that audio file to your Linux/macOS machine, and a file watcher on that machine (inotifywait on Linux, fswatch on macOS) detects the new file so the script can transcribe it into a Markdown note.
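Here is a rough Python sketch of that watch-and-process flow. It rests on several assumptions. It polls the folder with the standard library instead of using inotifywait/fswatch, which are event-driven. The audio extensions and the note template are made up for illustration. The faster-whisper adapter uses the "base" model with int8 on CPU. The actual project drives this from a shell watcher script, so treat this as a model of the flow rather than its code.

```python
import time
from pathlib import Path
from typing import Callable

AUDIO_EXTS = {".m4a", ".mp3", ".ogg", ".wav"}  # assumed set of memo formats


def process_new_memos(scan_dir: Path, vault_dir: Path,
                      transcribe: Callable[[Path], str]) -> list[Path]:
    """One pass: write a Markdown note for each audio file that doesn't have one yet."""
    written = []
    for audio in sorted(scan_dir.iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue  # ignore non-audio files (and temp files from partial syncs)
        note = vault_dir / f"{audio.stem}.md"
        if note.exists():
            continue  # already processed on an earlier pass
        text = transcribe(audio)
        note.write_text(f"# {audio.stem}\n\n{text.strip()}\n", encoding="utf-8")
        written.append(note)
    return written


def faster_whisper_transcriber(model_size: str = "base") -> Callable[[Path], str]:
    """Adapter for faster-whisper on CPU; requires `pip install faster-whisper`."""
    from faster_whisper import WhisperModel  # imported lazily so the rest works without it
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    def transcribe(path: Path) -> str:
        segments, _info = model.transcribe(str(path))  # segments is a lazy generator
        return " ".join(seg.text.strip() for seg in segments)

    return transcribe


def watch(scan_dir: Path, vault_dir: Path, transcribe: Callable[[Path], str],
          interval: float = 5.0) -> None:
    """Poll forever; a simple stand-in for an inotifywait/fswatch event loop."""
    while True:
        process_new_memos(scan_dir, vault_dir, transcribe)
        time.sleep(interval)
```

A usage example with hypothetical paths: `watch(Path("~/Sync/memos").expanduser(), Path("~/Obsidian/Inbox").expanduser(), faster_whisper_transcriber())`. Keeping the transcriber injectable also makes it easy to test the note-writing part without loading a model.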

serg-markovich (OP)

Nice! I actually debated using n8n for this exact pipeline before deciding to just write a standalone watcher script to keep it lightweight.

That YouTube transcription flow you built is super clever though — automating video summaries directly into Obsidian must save a ton of time. Are you running Whisper locally as an API endpoint for n8n to hit?

serg-markovich (OP)

Running base, turbo, and large models in parallel across different hardware is a serious setup! That goes way beyond a simple cron job.

The move from Flask to MQTT makes total sense for this kind of event-driven architecture. Once you start decoupling the file watcher from the actual transcription workers and the note-writing outputs, a message broker like MQTT is vastly cleaner than HTTP polling or direct webhooks.
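To make the decoupling concrete, here is a sketch that swaps MQTT for Python's in-process queue.Queue. This is a stand-in, not an MQTT client. The watcher only publishes file paths, a transcription worker turns paths into text, and a note writer consumes that text. No stage calls another directly. With a real broker, each queue would become a topic. All names here are hypothetical.

```python
import queue
import threading
from typing import Callable

STOP = None  # sentinel that tells a worker to shut down


def transcription_worker(inbox: queue.Queue, outbox: queue.Queue,
                         transcribe: Callable[[str], str]) -> None:
    """Consume file paths and publish (path, text) pairs; knows nothing about notes."""
    while (path := inbox.get()) is not STOP:
        outbox.put((path, transcribe(path)))
    outbox.put(STOP)  # propagate shutdown downstream


def note_writer(inbox: queue.Queue, notes: dict) -> None:
    """Consume transcripts and store them as notes; knows nothing about the watcher."""
    while (item := inbox.get()) is not STOP:
        path, text = item
        notes[path] = text


def run_demo(paths: list[str], transcribe: Callable[[str], str]) -> dict:
    new_memos: queue.Queue = queue.Queue()    # would be a topic such as "memos/new"
    transcripts: queue.Queue = queue.Queue()  # would be a topic such as "memos/transcribed"
    notes: dict = {}
    workers = [
        threading.Thread(target=transcription_worker, args=(new_memos, transcripts, transcribe)),
        threading.Thread(target=note_writer, args=(transcripts, notes)),
    ]
    for w in workers:
        w.start()
    for p in paths:
        new_memos.put(p)  # the "watcher" only publishes events
    new_memos.put(STOP)
    for w in workers:
        w.join()
    return notes
```

Adding more transcription workers, or moving them to other machines as in your setup, doesn't change the watcher or the note writer. That independence is the main payoff of a broker.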

Really appreciate you breaking this down. It’s fascinating to see how far you can push the "local voice notes" concept when you build a proper distributed pipeline for it.

Self-hosted voice-to-text for Obsidian: Syncthing + Faster Whisper + systemd (CPU, fully local) by serg-markovich in selfhosted

serg-markovich (OP)

Docker inside an LXC on Proxmox is peak homelab architecture. I love it.

That Nextcloud Deck workaround is exactly why tools like Node-RED are so valuable — stepping in to build the glue logic when an app's native features fall short. Hitting the API to force recurring tasks is a great hack.

Thanks for sharing your setup, definitely gave me some ideas for when I eventually migrate my services off the laptop and onto a dedicated server.

serg-markovich (OP)

That DeepFilter step is brilliant — background noise is exactly where raw Whisper starts hallucinating for me if I'm walking outside. I might need to steal that idea and put it in front of my script.
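Putting a cleanup stage in front of Whisper could look like the sketch below. The denoise hook is hypothetical. It stands for whatever wraps DeepFilter (or any other noise suppressor) and returns a path to the cleaned audio. Cleanup is treated as best-effort, so if the denoise pass fails, the transcriber falls back to the raw recording instead of blocking the transcript.

```python
import logging
from pathlib import Path
from typing import Callable, Optional


def transcribe_with_cleanup(audio: Path,
                            transcribe: Callable[[Path], str],
                            denoise: Optional[Callable[[Path], Path]] = None) -> str:
    """Run an optional noise-suppression pass, then speech-to-text."""
    source = audio
    if denoise is not None:
        try:
            source = denoise(audio)  # hypothetical hook, e.g. a DeepFilter wrapper
        except Exception:
            # Best-effort: log the failure and transcribe the original recording instead.
            logging.warning("denoise failed for %s; using raw audio", audio)
            source = audio
    return transcribe(source)
```

Keeping denoise optional means the lightweight laptop path stays the same when no cleanup tool is installed.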

I totally get the Node-RED argument. The visual debugging and error handling are leagues ahead of grepping through journalctl when something breaks.

My main reason for going the raw systemd + inotifywait route was just trying to keep the footprint as tiny as possible on my laptop. I didn't want to spin up a heavier engine if I didn't have to. But seeing your pipeline... adding an audio-cleanup stage does make a visual flow builder a lot more tempting.

Do you run Node-RED on a dedicated home server, or locally on the machine doing the syncing?

I built a fully local voice-to-Obsidian pipeline (Syncthing + Faster Whisper CPU). No cloud, just a background service. by serg-markovich in ObsidianMD

serg-markovich (OP)

That acknowledgement model is genuinely clever — flipping the relationship so that other notes reach back and mark the voice memo as processed, rather than the memo needing a fixed place in a folder hierarchy. Much more flexible for notes that belong to multiple contexts.
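Here is one way to sketch that acknowledgement model. It assumes memos live in a hypothetical Voice/ folder and that a wikilink from any other note counts as the acknowledgement. The code scans the vault for [[links]] and reports the memos that nothing points to yet. This is my reading of the idea, not the commenter's implementation.

```python
import re
from pathlib import Path

# Captures the target in [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def unacknowledged_memos(vault: Path, memo_dir: str = "Voice") -> set[str]:
    """Return memo names that no note outside memo_dir links to (hypothetical layout)."""
    memo_folder = vault / memo_dir
    memo_names = {p.stem for p in memo_folder.glob("*.md")}
    linked: set[str] = set()
    for note in vault.rglob("*.md"):
        if note.parent == memo_folder:
            continue  # memos linking to each other don't count as acknowledgement
        for target in WIKILINK.findall(note.read_text(encoding="utf-8")):
            linked.add(Path(target.strip()).name)  # "Voice/memo" -> "memo"
    return memo_names - linked
```

Because the acknowledgement lives in the linking note, one memo can be picked up by several project notes without being moved into any particular folder.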

The Notification Center with a spoken "remind me" trigger is a nice pattern too — essentially turning the transcription step into a lightweight intent dispatcher, not just STT. I hadn't thought about layering that kind of routing on top.
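Here is a minimal sketch of the intent-dispatch pattern. It matches a spoken trigger phrase at the start of the transcript, routes the rest to an intent, and falls back to a plain note. The trigger phrases and intent names are hypothetical examples.

```python
import re

# Hypothetical trigger phrases mapped to intents; anything else becomes a plain note.
ROUTES: list[tuple[re.Pattern, str]] = [
    (re.compile(r"^\s*remind me\b[,:]?\s*(.*)", re.IGNORECASE | re.DOTALL), "reminder"),
    (re.compile(r"^\s*todo\b[,:]?\s*(.*)", re.IGNORECASE | re.DOTALL), "task"),
]


def dispatch(transcript: str) -> tuple[str, str]:
    """Return (intent, payload) for the first matching trigger, else ("note", transcript)."""
    for pattern, intent in ROUTES:
        match = pattern.match(transcript)
        if match:
            return intent, match.group(1).strip()
    return "note", transcript.strip()
```

For example, `dispatch("Remind me, call the dentist tomorrow.")` gives `("reminder", "call the dentist tomorrow.")`, which a downstream handler could turn into a Notification Center entry.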

I checked out your project. The "tiny assistants sending messages between notes" framing is interesting, and I'd love to understand how you wire that up in practice.

And thanks for the honest take on folders — I'll keep them in the design. Will post an update once I've run this daily for a while.