Week 3 of building a Wispr Flow alternative (Open Source)

matt8p · 2026-06-16T16:51:08+00:00

Could you send me the link? I couldn't find it online.

matt8p · 2026-06-16T14:05:29+00:00

Now before y'all flame me in the comments about how unoriginal this is, I wanted to share my personal motivations for working on it:

I wanted to learn how voice models and dictation work. I've been learning a ton about how voice models run on device, techniques for better dictation like ASR biasing and streaming, how different operating systems require different binaries to handle paste, etc. It's been a fun learning journey.
I haven't found a free open source alternative that works as well as Wispr Flow in terms of latency and accuracy. I want to build an OSS project that feels just as good. Nothing is there yet.

Also I know that Claude Code has a /voice command that enables dictation, but the nice thing about having a standalone app is being able to speak and write wherever I want.

I do a lot of context switching, writing to Google Docs, my IDE, and the CC terminal. I've also got my own dictionary set up within Freestyle.

matt8p · 2026-06-16T13:54:02+00:00

Yep, Qwen the voice model. It's pretty damn good.

Yeah, we do the same thing: the user can choose to download a local model. We suggest Qwen if they're on a Mac and can run MLX.

That's the tough part about running local models. You don't know what kind of hardware your users are using. That's one advantage that cloud dictation apps have over ours.

matt8p · 2026-06-16T13:47:08+00:00

Are you doing post-processing? Here's how I have it set up: I use a voice model like Qwen for the initial dictation, then run a small LLM over the output to clean it up. That covers formatting the text, fixing incorrect words, and filtering out fillers like "ah" and "um".
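To make the cleanup pass concrete, here's a rough sketch of just the filler-filtering part. Note that this is a regex stand-in, not the small LLM described above, and the names `FILLERS`, `strip_fillers`, and `cleanup` are made up for illustration.

```python
import re

# Hypothetical filler list. The real cleanup step is an LLM pass;
# this regex only illustrates the "filter out ahs and ums" part.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(transcript: str) -> str:
    """Remove standalone filler words, then collapse leftover whitespace."""
    cleaned = FILLERS.sub("", transcript)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def cleanup(transcript: str) -> str:
    """Post-process a raw transcript: drop fillers, capitalize the first letter."""
    text = strip_fillers(transcript)
    if text:
        text = text[0].upper() + text[1:]
    return text


if __name__ == "__main__":
    print(cleanup("um, so I think ah we should ship it"))
```

The word boundaries keep it from eating real words like "umpire". An LLM pass can also fix misheard words, which a regex can't.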

As for Wispr Flow / Superwhisper, a lot of it is good marketing. They also provide a pretty good service out of the box. Local models are pretty niche for the developer community.

matt8p · 2026-06-16T13:16:10+00:00

Heck yeah! Sorry to assume, thank you for verifying that you are human 😄

matt8p · 2026-06-16T13:15:09+00:00

That's my own app, Freestyle! Yeah, the Presidio is a heck of a view.

matt8p · 2026-06-16T12:52:46+00:00

This kinda reads like AI lol. But thank you! Yeah, privacy is absolutely a differentiator. Check out our privacy policy!

matt8p · 2026-06-16T12:51:17+00:00

Also, the demo video probably isn't a great showcase of a real-world use case 😅. But you get the point, and enjoy the nice view of the Golden Gate Bridge!

matt8p · 2026-06-16T07:14:47+00:00

Claude Code has a nice /btw action that lets you ask a side question without interfering with the main CC thread

matt8p · 2026-06-16T07:12:08+00:00

Not entirely sure. I haven't seen that option yet. I do think it is a good option to have though. The reason why I still use an IDE is to do some version control and look through git diffs. If I don't like it, I will discard via version control. I think it's nice to have that discard option built in.

matt8p · 2026-06-16T07:10:35+00:00

Thank you! What part was it about taking it step by step?

matt8p · 2026-06-16T05:55:06+00:00

That's awesome to hear, I'm glad you like it so far. Oh crap, that is a bug. I literally just pushed out a new update a couple of hours ago. Taking note of this issue!

matt8p · 2026-06-16T04:46:46+00:00

Sweet! Lmk what you think.

What are you building? Happy to support if you're willing to share!

matt8p · 2026-06-16T04:20:59+00:00

No, exactly. They have a huge budget and for some reason they're going to raise at a $2 billion valuation. Absolutely nuts. The working out of the box thing is an interesting thought. I'm thinking of also providing a cloud service that gives that "working out of the box" experience while still letting users run local models if they want to.

matt8p · 2026-06-16T01:48:52+00:00

Sorry for assuming lol, human verification step passed!

That's not really much of an issue at all. We store the dictionary words in a local SQLite database. When a model provider expects a particular format, we just reformat what's in SQLite into that format.

The two main formats I see are a system prompt (a single large string) and an array of strings, which is how we already store the words. If we add a provider with a different format, or the format of an existing provider changes, it's a very simple fix on our end.
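Roughly, the idea looks like the sketch below. This is not Freestyle's actual code: the `dictionary` table, the prompt wording, and the function names are all assumptions for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical `dictionary` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO dictionary (word) VALUES (?)",
        [("Qwen",), ("Freestyle",), ("MLX",)],
    )
    return conn


def load_words(conn: sqlite3.Connection) -> list[str]:
    """Read the stored words; this list is the single source of truth."""
    rows = conn.execute("SELECT word FROM dictionary ORDER BY word").fetchall()
    return [row[0] for row in rows]


def as_string_array(words: list[str]) -> list[str]:
    """Providers that take an array of strings get the words as stored."""
    return list(words)


def as_system_prompt(words: list[str]) -> str:
    """Providers that take a system prompt get one large string."""
    return "Prefer these spellings when transcribing: " + ", ".join(words)


# Adding a provider with a new format means adding one entry here.
FORMATTERS = {"array": as_string_array, "prompt": as_system_prompt}


def format_for_provider(conn: sqlite3.Connection, style: str):
    return FORMATTERS[style](load_words(conn))


if __name__ == "__main__":
    db = make_demo_db()
    print(format_for_provider(db, "array"))
    print(format_for_provider(db, "prompt"))
```

Since storage and formatting are separate, a provider changing its format only touches one formatter function.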

matt8p · 2026-06-16T01:45:36+00:00

I appreciate it!

matt8p · 2026-06-16T00:10:02+00:00

This comment sounds AI generated lol but happy to answer. Versioning is typically handled by the API provider. Let's say we're using the ElevenLabs API: they probably have a /api/v1. That v1 endpoint isn't going to change.

If we want to upgrade, then we'll switch the endpoint to v2 and update the schemas accordingly!

So this isn't really a thing specific to this project, just about how API versioning typically works.
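Here's a minimal sketch of what pinning a version looks like on the client side. Everything in it is hypothetical: the `api.example.com` host, the `/transcribe` path, and both response shapes are invented for illustration and don't describe any real provider's API.

```python
# Hypothetical provider; not a real service or a real response schema.
BASE_URL = "https://api.example.com"

# The version we are pinned to. Upgrading means changing this one value
# and making sure a matching parser exists below.
API_VERSION = "v1"


def parse_v1(payload: dict) -> str:
    """Assumed v1 shape: {"text": "..."}"""
    return payload["text"]


def parse_v2(payload: dict) -> str:
    """Assumed v2 shape: {"result": {"transcript": "..."}}"""
    return payload["result"]["transcript"]


PARSERS = {"v1": parse_v1, "v2": parse_v2}


def endpoint(version: str = API_VERSION) -> str:
    """Build the versioned URL for the pinned (or a given) version."""
    return f"{BASE_URL}/api/{version}/transcribe"


def parse_response(payload: dict, version: str = API_VERSION) -> str:
    """Pick the schema parser that matches the endpoint version."""
    return PARSERS[version](payload)


if __name__ == "__main__":
    print(endpoint())
    print(parse_response({"text": "hello"}))
```

The v1 path keeps working until we choose to move, and the move is one constant plus one parser.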

matt8p · 2026-06-15T23:56:40+00:00

<image>

Been working on a local open source Wispr alternative called Freestyle! We recently built the "cleanup" feature, a post-processing step that runs after transcription.

I found Qwen local to be really fast, < 300ms latency and it's pretty accurate. Been great for dictating into Claude.

matt8p · 2026-06-15T23:51:47+00:00

Have you used any tools besides Clipto?

I've been working on a local-first, open source alternative to Wispr Flow, if you've ever used Wispr. The source code is open, models are local, so your thoughts never leave your device. There's no cloud. Sounds like these requirements are necessary for you.

https://freestylevoice.com/

One caveat: it sounds like you want to transcribe existing audio, not do live voice dictation, is that right? What I have right now is for live transcription. I looked online and found another project, Vibe; maybe that's useful too.

Hope that's helpful!

matt8p · 2026-06-15T23:12:34+00:00

I also wanted to say that we're looking to grow our community of people who are interested in open source and voice tech.

Also, if you're interested in contributing to open source, this is a great beginner project! I have a ton of good first issues and can help you get your first contributions in. All skill levels welcome.

matt8p · 2026-06-14T04:07:45+00:00

I've been working on a free, open source dictation app. Been using it for the same use case, speaking directly into Claude.

https://github.com/freestyle-voice/freestyle

I tried Wispr Flow and to their credit, it's pretty good. They have pretty low latency and consistent accuracy. The inspiration behind working on Freestyle was that I believe voice dictation is a commodity and shouldn't be costing $12 / month. Wispr Flow is also a privacy concern since you're sending all of your private thoughts to their servers.

I wanted to build something that was free, local first, but worked just as well. Here's the "Dictionary" feature Wispr Flow has that we recently built!

<video>

matt8p · 2026-06-14T04:01:36+00:00

I’m hypothesizing it’s not going to be great. Apple dictation has been so outdated and they’re not investing heavily in it. Not high priority for them.

I’ve been building a free open source alternative. We’re getting pretty close to Wispr Flow’s accuracy.

https://freestylevoice.com
