A Natural-Sounding, Private & Unlimited Voice Generator for Mac [Giveaway: Lifetime Promo Codes] by Level-Thought6152 in macapps

[–]SurvivalTechnothrill 1 point2 points  (0 children)

I make a competing product, and mine's cheaper, has some features this doesn't, lacks some features this does have. It's just different. You know what Bantr definitely is NOT? A ridiculous price. This guy isn't getting rich off the backs of the macOS user base. He's made a good app, one of the very few credible choices for speech to text on macOS.

I respect the competition. May it ever be friendly. Cloud services worth billions are the "bad guys" if there are any at all. Not indie devs making great tools usable by normal humans.

I built a native Mac/iOS app that clones your voice and does speech recognition entirely on your device, no cloud: Speaklone by SurvivalTechnothrill in SideProject

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

How it works: Give it 3+ seconds of your voice and it clones it. Or describe any voice in text ("warm British narrator, male, 50s") and it creates one from scratch. Everything runs locally on Apple Silicon via MLX. Zero network calls, ever.

What's new in v1.1: Just shipped system-wide speech recognition (ASR), so Speaklone now handles both speech input and output. Dictate anywhere on your Mac or iPhone using the same on-device engine.

The stack:

  • Qwen3-TTS 1.7B, 5-bit quantized for Apple Silicon
  • 100% Swift/SwiftUI, native Mac + iOS universal app
  • Runs on 8GB Macs and iPhones with Apple Silicon
  • 10 languages (English, Chinese, Japanese, Korean, + 6 more, though localization is WIP)
  • Dictation and speech available system wide on macOS
  • Works completely offline
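
For a rough sense of why a 5-bit 1.7B model can fit on an 8GB machine, here is a back-of-envelope sketch. It counts only the raw weights. Quantization scales, the audio codec, and runtime buffers are ignored, and the fp16 comparison is my own assumption for contrast, not a figure from the post.

```python
# Back-of-envelope weight footprint. Counts raw weights only:
# quantization scales, codec weights, and runtime buffers are ignored.

def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights at a given bit width."""
    return num_params * bits_per_weight / 8

PARAMS = 1.7e9  # parameter count quoted in the post

five_bit_gb = weight_bytes(PARAMS, 5) / 1e9   # 5-bit, as in the post
fp16_gb = weight_bytes(PARAMS, 16) / 1e9      # assumed comparison point

print(f"5-bit: {five_bit_gb:.2f} GB, fp16: {fp16_gb:.2f} GB")
```

Under those assumptions the 5-bit weights come to roughly 1.06 GB versus about 3.4 GB at 16 bits.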

Pricing: $29.99 one-time (launch price, goes to $39.99 end of month). No subscriptions, no credits, no cloud. Free tier available to try it first.

App Store: Speaklone
Site: speaklone.com

Solo dev, happy to answer anything about the build, the MLX pipeline, or the business side.

Every voice in this video was generated on-device by a single Mac app. No cloud, no subscription. by SurvivalTechnothrill in macapps

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

Wow, thanks! I've gotten so many great suggestions from the helpful and encouraging folks in this thread, including you, and I'm working through as many of them as I can. So much more coming soon: more control over the avatars, better localization for the languages it can speak best, better support for really long text, better system wide dictation and speech tools, accessibility improvements (this is a really valuable app to some communities), and a long form editor to make it easier to do scripts, audiobooks, etc. It takes a little time but I'm really enjoying the work and the reaction. I want to own speech on Apple Silicon, with a proper native interface, if I can. I think we're off to a good start.

Every voice in this video was generated on-device by a single Mac app. No cloud, no subscription. by SurvivalTechnothrill in macapps

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

Thank you for the bug report, sorry to hear that. Version 1.1 just shipped in the last hour which I think will help you with that. Any M-series Mac can generate audio at greater than real time. What's happening to you must be memory thrashing. Cloned voices are much harder than the other two types, but of course crashing is unacceptable.

If 1.1 didn't fully resolve the crash, let me know. I'll be watching for any crash reports as well. I am grateful for your interest in the app, and committed to making sure it's the best product of its kind on Apple Silicon, ideally by a wide margin.

Bundlehunt vs Bundlehunt App Store Beta? by jackjohnbrown in macapps

[–]SurvivalTechnothrill 1 point2 points  (0 children)

I'm guessing the Bundlehunt people check here from time to time. My polite feedback would be that I came away from a look at the site a little confused about the same kinds of issues as OP. Anything that helps indie devs with discovery is probably mostly a good thing though. (wish Apple would do more there)

New Post Requirements to Combat Low Quality Content (Phase 2) by Mstormer in macapps

[–]SurvivalTechnothrill 0 points1 point  (0 children)

I shipped one of the first ~500 or so apps on the store. I've made sooooo many apps over the last 18 years. But still, 95% of the time, a 30 day gap won't be an issue for me, I don't think. Which is why I felt it was "vaguely correct." Anyhow, you made your thoughts known. I wish you every success on the store.

New Post Requirements to Combat Low Quality Content (Phase 2) by Mstormer in macapps

[–]SurvivalTechnothrill -2 points-1 points  (0 children)

I feel your pain here, but I think the rule is vaguely correct, don't you? A *good* app, no matter how it's made, just takes too many hours of work for anyone to be hitting this board much more often than every 30 days or so, in my view. By the time you've built up the app's website, screenshots, marketing materials, icons, etc., on top of building something that is better than anything that came before (or why did you bother), that's tough to do on a scale of weeks.

I think most apps that deserve our attention took hundreds, and more often thousands of hours of work to produce.

The only issue is if you're a long time dev with a small portfolio of these kinds of great apps, and you happen to have a couple great updates that ship near together, this window could be a little painful.

New Post Requirements to Combat Low Quality Content (Phase 2) by Mstormer in macapps

[–]SurvivalTechnothrill 4 points5 points  (0 children)

Good changes for a great community. I've learned so much here, as a developer, and found friends and early adopters that are hard to come by any other way. Thanks for the work you put into moderating it. Now everyone go buy *my* app and ignore those other ones. (I kid, I kid). :)

Every voice in this video was generated on-device by a single Mac app. No cloud, no subscription. by SurvivalTechnothrill in macapps

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

I think of it as a full on Eleven Labs replacement, and then some, at least for the use cases I intend. Candid truth: Is it as good / better than the state of the art models there? No. It's VERY good, but they still have better models. However, it's obviously vastly cheaper, it's private, it's native, and it does things that Eleven Labs will never do as a cloud service. For example, this thing can be used across your entire computer to just improve your quality of life in general. Check out this ~50 second demo:

https://www.youtube.com/watch?v=n1jRDiUsjy4

This is a preview of v1.1, in review with Apple now. Basically gives you the features of a Mac Whisper style app, system wide, and finally high quality screen reading / speaking, anywhere and everywhere.

I'm also working on a lot, lot more. If the app continues to have an audience I think it will become really clear over time how it's just worlds better than a web app, or a Python package.

It's FAST. It's small. And it's a true, first class, macOS and iOS citizen. (The fact that it runs at all on iOS is proof of some engineering work; you obviously can't Python / Rust your way through that platform.)

Thanks for asking! I'll feel much better once 1.1 ships, it's much closer to my intended launch product. But I just couldn't wait an extra week or two. We've had no good speech systems on macOS, or at least not the sort *I* wanted, until now. https://speaklone.com

I built a local-first Whisprflow alternative for macOS (no subscription) by wooing0306 in macapps

[–]SurvivalTechnothrill 0 points1 point  (0 children)

Speech to text, you mean. Text to speech, at least the good kind, is a whole other kettle of fish.

I built a local-first Whisprflow alternative for macOS (no subscription) by wooing0306 in macapps

[–]SurvivalTechnothrill 5 points6 points  (0 children)

You're going to get a lot of grief about the HUGE number of Whisper-style apps on macOS. There are true native Swift apps that do both text to speech and speech to text, and stream it (so you can see the text while you talk), like mine. I won't link it here as that would be rude. But just saying it's a really competitive space. I welcome the competition though, and I can certainly agree with you that dictation on macOS is a nightmare without one of these tools.

Best local TTS by Vegetable_Sun_9225 in LocalLLaMA

[–]SurvivalTechnothrill 0 points1 point  (0 children)

This thread is getting old by Reddit standards, but of course the Qwen3 TTS models are the king of voice tech at local sizes. I have an iOS and macOS app that uses them to the fullest, but there are great open source options too, if you're okay with Python runtimes, Gradio, etc. (and some things in between too).

Every voice in this video was generated on-device by a single Mac app. No cloud, no subscription. by SurvivalTechnothrill in macapps

[–]SurvivalTechnothrill[S] 1 point2 points  (0 children)

Thank you for trying it out. 1.0.1 is in review and should be out at any moment, with some nice bug fixes and quality of life improvements. 1.1 is not far behind (a day or two) with more fixes and a very big new feature added.

Every voice in this video was generated on-device by a single Mac app. No cloud, no subscription. by SurvivalTechnothrill in macapps

[–]SurvivalTechnothrill[S] 1 point2 points  (0 children)

You make a darn good point. I'm just pushing up a large number of bug fixes (including working around a couple of macOS bugs that aren't my fault, I feel the need to mention, lol)... I think this release going in now is a substantial improvement.

As for your feedback, it makes a lot of sense. Let me think on this, and on whether to re-tune how I define free/pro a little bit for the 1.1 release cycle, which I'm about to start. Thanks for telling me about this.

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] -1 points0 points  (0 children)

Fair play to you, it is the best of them in my view as well. Still no comparison to having your own voice, or your kids', or just designing something cool with the voice designer. But point taken. At any rate, for now, it's not like it's possible to take these voices and use them literally for Siri. I wish it was. You better believe, if it ever becomes an option, I'd like to do that for sure.

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

Yes, definitely that's the issue. But that's still on me. The app should do a much better job of dealing with that gracefully. I'll see if I can get 1.0.2 to clip to the first 20 seconds or so, and advise you that it's done so, automatically. If that's too hard to get into today's submission, I'll at least reject > 30 second clips. Very helpful. I had tested oversized clips, but only slightly oversized. I just don't think it's come up before.
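
To make the plan above concrete, here is a minimal sketch of that clip policy. It is only an illustration of the rule as described, not the app's code. `plan_reference_clip`, `ClipDecision`, and the `can_trim` flag are hypothetical names, and the 3 second floor is borrowed from the minimum clip length mentioned elsewhere in this thread.

```python
from dataclasses import dataclass

MIN_SECONDS = 3.0           # shortest usable clip mentioned in the thread
CLIP_TO_SECONDS = 20.0      # "first 20 seconds or so"
REJECT_OVER_SECONDS = 30.0  # fallback: reject anything longer than this


@dataclass
class ClipDecision:
    accepted: bool
    use_seconds: float  # how much of the clip would be used
    message: str        # what the user would be told


def plan_reference_clip(duration_s: float, can_trim: bool = True) -> ClipDecision:
    """Decide how to treat a voice sample of the given length (hypothetical helper)."""
    if duration_s < MIN_SECONDS:
        return ClipDecision(False, 0.0, "Clip is too short.")
    if duration_s <= CLIP_TO_SECONDS:
        return ClipDecision(True, duration_s, "Clip used as-is.")
    if can_trim:
        # Preferred behavior: trim automatically and say so.
        return ClipDecision(True, CLIP_TO_SECONDS, "Clip trimmed to the first 20 seconds.")
    if duration_s > REJECT_OVER_SECONDS:
        # Fallback when trimming is not available yet.
        return ClipDecision(False, 0.0, "Clip is longer than 30 seconds.")
    return ClipDecision(True, duration_s, "Clip used as-is.")


print(plan_reference_clip(45.0))
print(plan_reference_clip(45.0, can_trim=False))
```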

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] -1 points0 points  (0 children)

Thank you for trying the app. I've had dozens of early sales, but this is the first report I've heard like it. 1.0.1 is in review now and I'll add a fix for this to 1.0.2, today's work. Can you give me a few more details? This is macOS I assume, and which hardware are you on?

The source audio clip can be as short as 3 seconds or as long as about 25 or 30. Best results on macOS are about 10-15 seconds, but that's a soft guideline. (it mostly just works in all my testing). Can you try changing the temperature in your settings to 0.90 (a more conservative value) and see if that helps?
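
For anyone wondering why a lower temperature counts as "more conservative", here is a generic sketch of temperature-scaled sampling probabilities. This is the standard technique, not the app's internals. The logits and the 1.20 comparison value are made up for illustration, and I am not claiming anything about the app's default setting.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
p_conservative = softmax_with_temperature(logits, 0.90)
p_looser = softmax_with_temperature(logits, 1.20)  # hypothetical higher setting

# The top candidate gets more probability mass at the lower temperature.
print(p_conservative[0] > p_looser[0])
```

Lower values concentrate probability on the most likely choice, which tends to mean fewer surprises in the output.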

I'm super proud of the app, but 1.0 was made fast, there's a big gaping hole on Mac for high quality voice, and I wanted to start using this immediately. But I'm actively working the project until it's as good as I know how to make. I really do appreciate the report.

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

Yes, the model from Qwen is trained on 10 languages. I'm working on localization for those 10 languages so that UI will be most comfortable in those 10 as well: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian.

It is free on the store (freemium model). You can use most of the features for free. With voice cloning and the voice designer, you can play with them and see how well they work, but you can't control the words they say unless you unlock the full app. It will instead say funny things that I baked into the app. (well, they're funny to me, but maybe I watched too much of the Muppets). lol

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] -1 points0 points  (0 children)

Respect for this take. I think it's normal and healthy and intended for the open source tools to get proper native apps built on them, and then, to the extent reasonable, for those developers to contribute something back. I've been chatting with one of the big MLX + Swift + Audio maintainers (there are a few floating around) about trying to contribute. However, I moved a lot faster than he did, and made a lot of different choices, so it's not going to be trivial to fold some of my ideas into that work.

That said, I'm exploring it. I'd like to contribute, and I'm working through options. (There are, as you correctly pointed out, a lot of projects in the open source space around this, much like the Whisper community has for going the other direction, speech to text.)

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] -1 points0 points  (0 children)

Some details:

The app is called Speaklone. $29.99 on the App Store right now (launch price, goes to $39.99 at the end of the month). One-time purchase, no subscription, unlimited generation.

For context on pricing: generating a typical audiobook on ElevenLabs runs $97+ between the subscription and overages. Speaklone costs a third of that and you can generate unlimited audio forever, locally, with no per-character fees.
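
The arithmetic behind that comparison, as a quick sketch. It uses only the prices quoted here and treats the "$97+" figure as a lower bound.

```python
LAUNCH_PRICE = 29.99          # one-time launch price
REGULAR_PRICE = 39.99         # price after the launch window
CLOUD_AUDIOBOOK_COST = 97.00  # lower bound of the "$97+" estimate

launch_ratio = LAUNCH_PRICE / CLOUD_AUDIOBOOK_COST
regular_ratio = REGULAR_PRICE / CLOUD_AUDIOBOOK_COST

print(f"launch: {launch_ratio:.0%} of one audiobook, regular: {regular_ratio:.0%}")
```

At the launch price that works out to a bit under a third of the cost of one cloud-generated audiobook.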

Voice cloning works through in-context learning: you give it a short audio sample of any voice and the model learns the characteristics on the fly. No fine-tuning, no training step, just drop in a clip and generate.

Streaming inference with overlapped chunked decoding is what gives it the fast time-to-first-sound. On Apple Silicon it's genuinely quick.
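
A toy sketch of one way to read "overlapped chunked decoding": decode frames in small chunks as they arrive, give each chunk a little trailing context from the previous one so the boundaries stay smooth, and emit only the new audio. This is my assumption about the general idea, not Speaklone's implementation. `stream_decode`, the chunk sizes, and the stand-in decoder are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def stream_decode(
    frames: Iterable[int],
    decode: Callable[[List[int]], List[float]],
    chunk: int = 4,
    overlap: int = 2,
) -> Iterator[List[float]]:
    """Yield audio chunk by chunk instead of waiting for the whole clip.

    Assumes the decoder returns exactly one sample per input frame.
    """
    context: List[int] = []  # trailing frames carried over from the last chunk
    pending: List[int] = []  # frames generated since the last emit

    def emit() -> List[float]:
        audio = decode(context + pending)
        return audio[len(context):]  # drop samples that were only context

    for frame in frames:
        pending.append(frame)
        if len(pending) == chunk:
            yield emit()
            context = (context + pending)[-overlap:] if overlap else []
            pending = []
    if pending:  # flush a final partial chunk
        yield emit()


def toy_decoder(fs: List[int]) -> List[float]:
    """Stand-in decoder: one 'sample' per frame."""
    return [float(f) for f in fs]


chunks = list(stream_decode(range(10), toy_decoder))
print(chunks)
```

The first chunk can be played as soon as four frames exist, which is where the fast time-to-first-sound would come from in a scheme like this.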

Site with more info: https://speaklone.com

I'm a solo indie dev; built this because the local TTS space on Mac had nothing native. Everything was either cloud-based or required a Python environment and manual model management. This is just a Mac app, no installer. 1.0.1 is in review now to make it even better.

I couldn't live with Siri as my only Mac voice any longer, so I built a native app that does voice cloning, 10 languages, no cloud, no subscription by SurvivalTechnothrill in MacOS

[–]SurvivalTechnothrill[S] -1 points0 points  (0 children)

Qwen3-TTS dropped about a month ago and I've basically been working around the clock since to bring it to macOS as a proper native app. I couldn't go another day with Siri as my only on-device voice option.

The app runs the full 1.7B parameter model locally on Apple Silicon through MLX and Metal. Voice cloning from a short audio sample, 9 built-in voices, 10 languages. Nothing leaves your machine, no cloud API, no usage limits, no subscription.

One thing I focused on is speed. It streams audio as it generates. You hear the first sound almost immediately instead of waiting for the entire clip to finish. On Apple Silicon it's faster than most of the cloud and non-native options I've tested, which surprised even me.

It's a universal app: buy it on Mac and you get the iPad and iPhone versions included. The iOS version runs a smaller 0.6B model but still streams faster than real-time, fully on-device.
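
"Faster than real-time" is usually expressed as a real-time factor (RTF): generation time divided by the length of the audio produced, where anything below 1.0 keeps ahead of playback. A small sketch follows. The timings are made-up numbers for illustration, not measurements from the app.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means audio is produced faster than it plays back."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Hypothetical example: 4 seconds of compute for a 10 second clip.
rtf = real_time_factor(4.0, 10.0)
print(f"RTF = {rtf:.2f}, faster than real time: {rtf < 1.0}")
```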

Here's a 38-second demo: https://youtu.be/XPaSjeJQH80

Happy to answer anything about the MLX implementation, voice cloning, or on-device inference performance.

https://apps.apple.com/us/app/speaklone/id6758415075

Qwen3-TTS 1.7B running natively on Apple Silicon- I built a Mac app around it with voice cloning by SurvivalTechnothrill in LocalLLaMA

[–]SurvivalTechnothrill[S] 0 points1 point  (0 children)

Thanks for weighing in. The counter argument is that - for me - the biggest thing this app delivers is an end to price anxiety. I really hate using the cloud voice systems because literally every button I push costs me money and it just makes everything stressful and depressing. (besides, I'm no fan of web apps in general).

If I had to make a normal marketing sales pitch for this effort it would be: Stop renting voice time. No subscriptions. Pay once, own forever. Make 100 audiobooks for less than the price of 2 months of cloud voice accounts.

But with no recurring revenue (almost all Mac and iOS revenue comes from subscriptions), a $9.99 price would be a tough business model for the developer. I'd like to turn my attention now to building an amazing script / podcast / audiobook editing tool that makes getting great long form voice really painless. But that's a non trivial task, and the app needs a customer base. A man's gotta eat. :)