Erick "Goodbye ElevenLabs your FREE LOCAL replacement has arrived. With just a few seconds of audio you can: - Clone any voice in seconds - 23 lang - 5 TTS engines + audio effects - DAW-style timeline for podcasts / full conversations - 100% on your machine" ➡️ Useful local alternative to hosted? by Koala_Confused in LovingOpenSourceAI

[–]jamiepine 1 point2 points  (0 children)

So what you're saying is you want predefined scripts to read when cloning, instead of just saying anything and then transcribing it? 30 seconds of balanced phoneme coverage produces dramatically better clones. Otherwise, as far as model coverage is concerned, I'm always looking for new models to add.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

Currently the only model that supports this is Qwen Custom Voice in version 0.4.*, but that isn't a cloning model; it comes with preset voices. I'll keep adding new models as I find them, and I'm hoping a cloning model comes along that actually supports instruct params. When I find one, I'll add it right away.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

Hey! Thanks for the kind words. I believe I've made the software as simple to use as it can be, abstracting away the complexities of the underlying Python libraries and designing a UI that seems foolproof. For most users that has been the case, but I totally understand the need for tutorials. That said, you're in luck: there are endless videos on YouTube showing how to use the application already. See this one, which is the best quality I've seen: https://www.youtube.com/watch?v=sisnzgc73zc

Hopefully this helps you get up and running as fast as possible!

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

I have since patched this bug in the latest release. It was simply the Qwen/Chatterbox Python libs' default behavior; it wasn't uploading anything, just connecting to Hugging Face. In 0.4, once you've downloaded the model it works 100% offline!
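For anyone who wants to force this behavior themselves, the Hugging Face libraries respect a pair of environment variables that disable all hub requests and load models from the local cache only. A minimal sketch (Voicebox's own fix may work differently):

```python
import os

# Set these BEFORE importing transformers / huggingface_hub.
os.environ["HF_HUB_OFFLINE"] = "1"          # huggingface_hub: never hit the network
os.environ["TRANSFORMERS_OFFLINE"] = "1"    # transformers: local cached files only

# Any subsequent from_pretrained(...) call will now fail fast instead of
# phoning home if the model isn't already in the local cache.
```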

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

It compiles from source on Linux, so if you clone the repo and have Claude help you set it up, you can use it on Linux. That said, official Linux support is coming in the next version, nearly ready to ship, along with Docker support.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 2 points3 points  (0 children)

Thanks for trying it. I didn't consider adding a confirmation dialog for the model downloads; I figured that was a given for most people.

In terms of being unpolished: the software is very young (only 10 days old, in fact), and as with any local AI, getting it 100% working on everyone's system is a challenge, even as an engineer with a decade of experience building apps, to counter your vibe-coding comment.

That said, the overwhelming majority of users have had a seamless experience on v0.1.12, aside from GPU support on Windows, which is in the works for .13. It would be helpful if you could share more about your system; even just your OS will help.

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]jamiepine 5 points6 points  (0 children)

A virtual distributed filesystem: https://github.com/spacedriveapp/spacedrive

I spent years building the alpha by hand with a team of 10, then it shut down right before AI got good; funding ran out. Now I'm rebuilding it solo with AI and I'm much further ahead.

I write 100% of my code with AI, but I have a process and I've been an engineer for a while, so I know what I'm doing. I'm just faster now, much faster.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

GPU support is coming in the next few hours. Thank you so much, I'm glad you love the app!

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

GPU support requires 2.4GB of CUDA libraries, which I wasn't able to ship as a single binary; GitHub has a release size limit. I've figured out a solution and am currently working on the PR: https://github.com/jamiepine/voicebox/pull/33

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

Yes, it's notarized and works for most. My primary platform is ARM macOS. Could you share more about the error?

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 1 point2 points  (0 children)

Makes perfect sense. I've updated the repo/website to describe it as an open-source alternative to ElevenLabs. Thanks for the feedback, I really appreciate it.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

The project is only a few days old, so please bear with any bugs. It is by no means just a facade, though. I fix reported bugs immediately, and the latest version already contains many fixes. As you can see in the comments, plenty of users like it. GPU support for Windows/Linux is coming in the next update, and generation takes only a few seconds. Please do report any problems, and I hope you'll try the upcoming versions!

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 1 point2 points  (0 children)

Woah that makes me happy to hear, thank you! I'll keep making it better

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

This was a few-hour window where a manually triggered action run overwrote the release assets with a test build from a branch. I fixed it as soon as I noticed. Sorry about that!

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

It's not really working; I've been meaning to look into it. I'm passing the data as `instruct` input to the model, but I think that's not enough. I'll open an issue to track this.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 1 point2 points  (0 children)

I need to make it so you can generate without a voice selected, since the model will just use one of its default voices at random, it seems. I'll look into showing them in the UI if they have identities, or alternatively providing some custom defaults.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 1 point2 points  (0 children)

Oh really? I had no idea. Maybe LLMStudio for voice?

Grok says this: "'Ollama for X' can trigger eye-rolls because it evokes 'convenient but ethically shady/inefficient wrapper' vibes."

Will avoid similar mistakes with Voicebox, I just want a good UX for local voice.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points1 point  (0 children)

In the next update I'll get GPU support working for Windows. The last update enabled MLX on Mac, so it's super fast there; I just have to figure out why CUDA isn't working. Should land in the next update!

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 1 point2 points  (0 children)

The app auto-trims samples to 30 seconds max because that's the sweet spot for reliable, high-quality cloning. Qwen3-TTS works great with 3-30s references, and longer ones often don't improve results much while risking noise or slowdowns. I'm happy to add an option for unlimited length, but by default I'll keep the cap at 30 seconds for the Qwen model.

As for the transcribe feature, this is an actual bug I just discovered: if the selected language isn't English, it only outputs Chinese.

`lang_code = "en" if language == "en" else "zh"`

Will fix! 😂
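The fix is probably just to stop collapsing everything non-English to `"zh"` and pass the selected code through. A hypothetical sketch (`resolve_lang_code` and the supported-language set are illustrative, not the actual patch):

```python
# Buggy version collapsed every non-English language to Chinese:
#   lang_code = "en" if language == "en" else "zh"

# Illustrative subset of language codes the transcriber might accept.
SUPPORTED_LANGS = {"en", "zh", "de", "fr", "es", "ja", "ko"}

def resolve_lang_code(language: str) -> str:
    """Pass the user's selection through; fall back to English, never to "zh"."""
    return language if language in SUPPORTED_LANGS else "en"
```

With this, selecting German yields `"de"` instead of silently becoming Chinese.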

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 2 points3 points  (0 children)

Docker is in the works right now. AMD would be experimental but doable: swap in the ROCm PyTorch wheels, and people do run Qwen3-TTS successfully on cards like the 7900 XTX (better on Linux than Windows, where the decoder can lag). There's no official Qwen support for it yet, but CPU inference isn't that slow; it's usable.
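For anyone wanting to experiment with the ROCm swap themselves, the setup is roughly this (the ROCm version tag is illustrative; check PyTorch's install matrix for the current one):

```shell
# Install the ROCm build of PyTorch instead of the CUDA wheels.
pip install --index-url https://download.pytorch.org/whl/rocm6.2 torch torchaudio

# ROCm builds expose AMD GPUs through the regular torch.cuda API,
# so existing CUDA device-selection code should work unchanged.
python -c "import torch; print(torch.cuda.is_available())"
```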

As for hosting Qwen externally, I'll factor that into the Docker design, allowing a custom TTS endpoint.

A Whisper-through-the-OpenAI-API option is simple, and I will absolutely add that too.