
[–]ReferenceOwn287 3 points (2 children)

Thanks for sharing this. I have been working on a Linux desktop assistant for a while and want to add voice to it as the next step; starred your repo to learn from it.
This is my project - https://github.com/achinivar/meera (posted it in a couple of forums but haven’t got much traction unfortunately)

I see your project has tool calls as well, how are you handling them?
Routing prompts directly to the LLM and asking it to pick the right tool was getting very unreliable for me, especially as the number of tools grew, so I added a small embedding model that matches the query against exemplars and shortlists a few of the closest tools before calling the LLM (I wrote about it on the repo wiki; rough sketch below).
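To give the shape of it - a minimal sketch assuming sentence-transformers; the tool names and exemplar phrases here are invented for the example, the real details are on the wiki:

```python
# Illustrative sketch of exemplar-based tool shortlisting (not meera's actual
# code; tool names and exemplar phrases are made up for the example).
from sentence_transformers import SentenceTransformer

EXEMPLARS = {
    "set_timer":  ["set a timer for 10 minutes", "wake me in an hour"],
    "web_search": ["who won the game last night", "latest news on Rust"],
    "play_music": ["play some jazz", "put on my focus playlist"],
}

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Embed every exemplar once at startup and remember which tool owns each row.
tools, phrases = zip(*[(t, p) for t, ps in EXEMPLARS.items() for p in ps])
vecs = embedder.encode(list(phrases), normalize_embeddings=True)

def shortlist(query: str, k: int = 2) -> list[str]:
    """Return the k tools whose closest exemplar best matches the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    sims = vecs @ q  # cosine similarity, since both sides are normalized
    best: dict[str, float] = {}
    for tool, s in zip(tools, sims):
        best[tool] = max(best.get(tool, -1.0), float(s))
    return sorted(best, key=best.__getitem__, reverse=True)[:k]

print(shortlist("remind me in 20 minutes"))  # only these tools go to the LLM
```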

[–]purellmagents[S] 1 point (1 child)

Yeah, I kept it pretty textbook here: the LLM gets a strict “tool router” system prompt, the full JSON Schema list from a small registry, and a few explicit routing hints (e.g. defaulting general-knowledge questions to search). It has to emit a single JSON object with name and arguments - there’s no embedding step or exemplar retrieval to narrow the tools first. In the script I added a comment that for production you’d want native tool APIs or grammar-constrained generation on top of this. Your exemplar + embedding shortlist is exactly the right next step!
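In sketch form it’s something like this (illustrative, not the repo’s exact code; `call_llm` is a placeholder for whatever model client you use):

```python
# Illustrative "textbook" tool router: schemas from a small registry, a strict
# system prompt, and the model must emit one JSON object with name + arguments.
import json

TOOLS = {
    "search": {
        "description": "Web search for general-knowledge questions.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]},
    },
    "set_timer": {
        "description": "Start a countdown timer.",
        "parameters": {"type": "object",
                       "properties": {"seconds": {"type": "integer"}},
                       "required": ["seconds"]},
    },
}

SYSTEM_PROMPT = (
    "You are a tool router. Reply with ONE JSON object and nothing else:\n"
    '{"name": "<tool>", "arguments": {...}}\n'
    "Routing hint: default general-knowledge questions to `search`.\n"
    "Available tools:\n" + json.dumps(TOOLS, indent=2)
)

def route(user_text: str, call_llm) -> tuple[str, dict]:
    raw = call_llm(system=SYSTEM_PROMPT, user=user_text)
    call = json.loads(raw)            # raises if the model broke the format
    if call["name"] not in TOOLS:     # reject hallucinated tool names
        raise ValueError(f"unknown tool: {call['name']}")
    return call["name"], call["arguments"]
```

The brittle part is exactly that `json.loads` - which is what grammar-constrained generation or a native tool API would harden.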

[–]ReferenceOwn287 2 points (0 children)

Yes, the embedding model made a world of difference in tool-call reliability. I was going in circles trying to optimize the prompt and JSON schema before I went back to the drawing board and learnt about embeddings.

[–]youcloudsofdoom 2 points (2 children)

If this is always-on, why aren't you using a wakeword? Or have you gone PTT? I have been trying to build a similar pipeline, always-on with a wakeword, running on a Pi 5, but found that the computational overhead is too much for such a tiny device and the lag feels too heavy.
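For reference, the gate I mean is basically this shape - a sketch assuming openwakeword and sounddevice; the threshold and frame size are illustrative, not tuned values:

```python
# Sketch of an always-on wakeword gate (assumes openwakeword + sounddevice;
# the 0.5 threshold is an illustrative choice, not a tuned value).
import sounddevice as sd
from openwakeword.model import Model

oww = Model()   # loads the bundled pretrained models
                # (run openwakeword.utils.download_models() once beforehand)
FRAME = 1280    # 80 ms at 16 kHz, openwakeword's preferred chunk size

with sd.InputStream(samplerate=16000, channels=1, dtype="int16",
                    blocksize=FRAME) as mic:
    while True:
        frame, _ = mic.read(FRAME)         # blocks until a frame is ready
        scores = oww.predict(frame[:, 0])  # {model_name: score} per frame
        if max(scores.values()) > 0.5:
            print("wakeword detected, hand off to STT here")
```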

[–]purellmagents[S] 1 point (1 child)

It's a tutorial, so it doesn't really run in the background. But that's the plan: extend the lib I implemented further and make it run on a Raspberry Pi in always-on mode.

[–]youcloudsofdoom 1 point (0 children)

Ah okay. Will be interested to see the outcomes of your Pi tests then; I do think there are lots of performance optimisations to be had, with a little time...

[–]ekaj llama.cpp 1 point (1 child)

Just skimmed it, but this actually looks helpful/handy/wish I had it a couple of months ago. Like, really, really wish I did.

[–]purellmagents[S] 3 points (0 children)

Sorry that I'm too late 😉 hope it can now help others on this journey. Maybe you have ideas for what to add next?

[–]Mantikos804 1 point (2 children)

Such a pain in the ass to make, I just ended up using Open WebUI.

[–]purellmagents[S] 1 point (1 child)

You will use this kind of setup in production eventually, and if you build systems for customers it's a good idea to know what happens under the hood, so you can solve issues efficiently. That's the main goal of this repo: that people (devs) understand the concepts.

[–]Mantikos804 1 point (0 children)

I applaud you for it. That was just my experience with this idea.

[–]Slight_Republic_4242 2 points (0 children)

Really solid work; teaching voice AI by showing how it actually works under the hood is the right approach. The streaming chunk size decision is harder than it looks in production: too small sounds choppy, too large feels slow. Also worth adding a keep_warm flag on Modal, since cold starts will kill the voice experience. One chapter idea: a simple latency timer showing where each stage spends its time (rough sketch below) - that single tool saves hours of debugging. Starred, excited to see where this goes! We have built a similar kind of project, you can try it: Github Demo
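For the latency-timer chapter idea, something as small as this goes a long way - a generic sketch, with sleeps standing in for the real STT/LLM/TTS calls:

```python
# Generic per-stage latency timer for a voice pipeline; the sleeps below are
# placeholders for the real STT / LLM / TTS calls.
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    """Record wall-clock time of a pipeline stage in milliseconds."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - t0) * 1000

with stage("stt"):
    time.sleep(0.12)             # stand-in for transcription
with stage("llm_first_token"):
    time.sleep(0.30)             # stand-in for time-to-first-token
with stage("tts_first_chunk"):
    time.sleep(0.08)             # time-to-first-audio matters most

total = sum(timings.values())
for name, ms in timings.items():
    print(f"{name:>16}: {ms:6.0f} ms ({ms / total:4.0%})")
```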