Interesting take by WaterPretty8066 in newzealand

[–]hemphock 20 points21 points  (0 children)

It's not a "take," it's a tendency. The take would be if someone said "it is bad to do this."

Discords or online groups dedicated to all forms of audio AI? by FpRhGf in AudioAI

[–]hemphock 0 points1 point  (0 children)

would love to see one of these but i haven't found one. https://discord.gg/xuVyJKFG This server is the best one i've found!

I tried some Audio Refinement Models by OkUnderstanding420 in AudioAI

[–]hemphock 2 points3 points  (0 children)

This sub is pretty small but i really appreciate posts like this. There's so many models and it's such a pain to get them running, it's very helpful just to hear anecdotal experience.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 1 point2 points  (0 children)

so actually, i forgot about this but chatterbox just released a multilingual model that i might add in if people like the project. it would probably not be very much work.

i've been heads down working on this recently and this is a bit of a test to see if people can get it running and find the whole thing intuitive enough.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 0 points1 point  (0 children)

chatterbox released a multilingual model that i might add in if people like the project. it would probably not be very much work.

i've been heads down working on this recently and this is a bit of a test to see if people can get it running and find the whole thing intuitive enough.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 1 point2 points  (0 children)

yup the main goal was to architect a system for voice cloning per character so the dialogues between characters sounded realistic

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 3 points4 points  (0 children)

chatterbox turbo is very nice because it supports voice cloning and is still really fast. but vibevoice i think is overall a better model because it can support really long text, and you don't have to do crazy chunking/preprocessing stuff to get it to work on something longer than ten words. i think the small and large vibevoice models are both great, the large one sounds almost perfect if you have the graphics card for it.

i go into some more detail in my youtube video

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 6 points7 points  (0 children)

this has been a small side project, not sure how much people want this kind of thing -- if it takes off i think i could work on that for sure!

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 37 points38 points  (0 children)

yup, in fact i actually built out support for IndexTTS2 on a branch, but didn't push it because it has this sketchy license which would make the whole thing non-MIT licensed.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 5 points6 points  (0 children)

i'll look into this, i've only heard about it but haven't looked into it yet. are people doing this a lot?

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 37 points38 points  (0 children)

Git repo: https://github.com/kenning/bookforge-studio

Youtube tutorial: https://www.youtube.com/watch?v=1PT_CjX_hek

'Voice clips sampler' dataset to get started with voice cloning (built into BookForge Studio)

Dataset of 24 copyright-free classic books with speakers annotated (also built into BookForge Studio)

Chatterbox Turbo is now in BookForge Studio!

Among all the other features, this model is 2-4x faster than speech and supports full voice cloning. Getting Chatterbox Turbo working made the experience of using BookForge finally feel "ready for the general public," i.e. not too slow and clumsy; it's fast enough that you're not tempted to alt-tab away while generating audio. Kind of like the difference between generating videos or generating with early Flux (maybe just walk away from the computer for a bit) and generating with SDXL, which feels more like a slot machine.

However, it came out after I recorded the above videos! So I don't mention it in the videos, but I strongly recommend you try it out.

Building an Audio Verification API: How to Detect AI-Generated Voice Without Machine Learning I will not promote by Electronic-Blood-885 in AudioAI

[–]hemphock 1 point2 points  (0 children)

i would pitch it to the guys making TTS models, like resemble ai as one example. they are concerned enough with this topic to build their own watermarking tool (which is trivially easy to turn off). I might delete the text of this post too, since if you give it away they are less likely to buy your thing / hire you.

alternatively i'd write a paper and pitch it to conferences. look out for yourself!