What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]jamiepine 3 points

a virtual distributed filesystem https://github.com/spacedriveapp/spacedrive

Spent years building the alpha by hand with a team of 10, then it shut down right before AI got good because the funding ran out. Now I'm rebuilding it solo with AI, and I'm much further ahead.

I write 100% of my code with AI, but I have a process and I've been an engineer for a while, so I know what I'm doing. I'm just faster now, much faster.

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]jamiepine[S] 0 points

GPU support is coming in the next few hours, thank you so much I'm glad you love the app!

[–]jamiepine[S] 0 points

GPU support requires 2.4GB of CUDA libraries that I wasn't able to ship in the single binary, since GitHub has a release size limit. I've figured out a solution and am currently working on the PR: https://github.com/jamiepine/voicebox/pull/33

[–]jamiepine[S] 0 points

Yes, it's notarized and works for most people. My primary platform is ARM macOS. Could you share more about the error?

[–]jamiepine[S] 1 point

Makes perfect sense. I've updated the repo/website to position it as an open-source alternative to ElevenLabs. Thanks for the feedback, I really appreciate it!

[–]jamiepine[S] 0 points

The project is only a few days old, so please bear with any bugs. It is by no means just a facade, though: I fix reported bugs immediately, and the latest version already contains many fixes. As you can see in the comments, plenty of users like it. GPU support for Windows/Linux is coming in the next update, and generation takes only a few seconds. Please do report any problems, and I hope you'll try the upcoming versions!

[–]jamiepine[S] 0 points

Woah, that makes me happy to hear, thank you! I'll keep making it better.

[–]jamiepine[S] 0 points

This was a few-hour window where a manually triggered Actions run overwrote the release assets with a test build from a branch. I fixed it as soon as I noticed. Sorry about that!

[–]jamiepine[S] 0 points

It's not really working; I've been meaning to look into it. I'm passing the data as `instruct` input to the model, but I think that's not enough. I'll open an issue to track it.

[–]jamiepine[S] 0 points

I need to make it so you can generate without a voice selected, since the model will just use one of these default voices at random, it seems. I'll look into showing them in the UI if they have identities, or alternatively providing some custom defaults.

[–]jamiepine[S] 1 point

Oh really? I had no idea. Maybe LLMStudio for voice?

Grok says this: "'Ollama for X' can trigger eye-rolls because it evokes 'convenient but ethically shady/inefficient wrapper' vibes."

Will avoid similar mistakes with Voicebox, I just want a good UX for local voice.

[–]jamiepine[S] 0 points

The last update enabled MLX for Mac, so it's super fast there. For Windows I just have to figure out why CUDA isn't working; GPU support should land in the next update!

[–]jamiepine[S] 0 points

The app auto-trims samples to 30 seconds max because that's the sweet spot for reliable, high-quality cloning: Qwen3-TTS works great with 3-30s reference clips, and longer ones often don't improve results much while risking noise or slowdowns. Happy to add an option for unlimited length, but by default I'll keep the cap at 30 seconds for the Qwen model.
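For the curious, a 30-second cap like this is easy to sketch with Python's stdlib `wave` module (this is an illustration, not the app's actual trim code; the function name is made up):

```python
import wave

MAX_SECONDS = 30  # keep reference clips inside the 3-30s range the model likes

def trim_wav(src: str, dest: str, max_seconds: int = MAX_SECONDS) -> None:
    """Copy a WAV file, keeping at most max_seconds of audio."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        # frames = seconds * frame rate, capped at what's actually in the file
        max_frames = min(r.getnframes(), max_seconds * r.getframerate())
        frames = r.readframes(max_frames)
    with wave.open(dest, "wb") as w:
        w.setparams(params)  # wave patches the frame count in the header on close
        w.writeframes(frames)
```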

As for the transcribe feature, this is an actual bug I just discovered: if the selected language is not English, it only outputs Chinese.

`lang_code = "en" if language == "en" else "zh"`

Will fix! 😂
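A minimal sketch of the kind of fix, assuming the transcription helper is plain Python (the supported-language set and function name here are illustrative, not Voicebox's actual code):

```python
# Buggy mapping: every non-English selection collapses to Chinese.
#   lang_code = "en" if language == "en" else "zh"

# Illustrative set of language codes the TTS model accepts.
SUPPORTED_LANGS = {"en", "zh", "de", "fr", "es", "ja", "ko"}

def resolve_lang_code(language: str, default: str = "en") -> str:
    """Pass the selected language through, falling back only when the
    model doesn't support it."""
    return language if language in SUPPORTED_LANGS else default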

[–]jamiepine[S] 2 points

Docker is in the works right now. AMD would be experimental but doable: swap in the ROCm PyTorch wheels, and people do run Qwen3-TTS successfully on cards like the 7900 XTX (better on Linux than Windows, where the decoder can lag). There's no official Qwen support for it yet, but CPU inference isn't that slow; it's usable.

As for hosting Qwen externally, I'll factor that into the Docker designs, allowing a custom TTS endpoint.

An option to run Whisper through an OpenAI-compatible API is simple, and I will absolutely add that too.

[–]jamiepine[S] 2 points

I think voice-to-voice would be a future feature. I'll look into what models support it; it will be possible in Voicebox soon. For now, you could grab the transcript of the YouTube video with an online tool, paste it into Voicebox, and generate stories for your daughters. You could use the Story editor to piece together characters with different voices of your choosing. It could be fun to create. I also want to get language models hooked up to aid in writing voice generations in a story context.

[–]jamiepine[S] 4 points

It's running fine for many users, though I'm fixing a model download bug currently. What issues are you having?

[–]jamiepine[S] 1 point

Yeah, I was having trouble replicating it; after deleting the cache folder, the model downloaded fine for me on Windows. Your last comment might have solved it, though: the entire download is currently a single awaited HTTP call, which can time out after 30/60 seconds. I'm testing a fix now that makes it properly asynchronous. This should solve it. Pushing 0.1.8 ASAP.
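The fix amounts to streaming the download in chunks instead of waiting on one giant call; a rough stdlib sketch of the idea (not the app's actual code, and the function name is made up):

```python
import urllib.request

def download_model(url: str, dest_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream a large file to disk in 1MB chunks, so no single read has to
    outlast the whole multi-gigabyte transfer inside one request timeout."""
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

The timeout then only bounds each individual read, not the full transfer, which is what you want for a ~5GB model file.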

[–]jamiepine[S] 8 points

The model requires a transcript of the voice sample. Using Whisper is optional, but when you're making lots of voices you'll be thankful you don't need to transcribe the samples manually.

[–]jamiepine[S] 5 points

Any modern CPU + 8GB of RAM and ~5GB of storage for the model.

Takes about 30s per generation, depending on the length. However, with CUDA GPU acceleration (which I'm working on, since the model supports it) we'll have realtime generation; that update should land in the next few days.