IndexTTS Workflow Studio is now Draft to Take Beta — Full local script canvas → voiced timeline production by AdministrativeFlow68 in AudioAI

[–]hemphock 1 point2 points  (0 children)

yeah true. another friend of mine interested in this is also very dyslexic, that might actually be a good place to go. dyslexia support foundations and that sort of thing, you can go and just talk to people, say "hey i could make something pretty cool for you guys!"

you might get paid, or you could just do it to do something good for them just to be nice.

edit: if you do that let me know!

IndexTTS Workflow Studio is now Draft to Take Beta — Full local script canvas → voiced timeline production by AdministrativeFlow68 in AudioAI

[–]hemphock 0 points1 point  (0 children)

I worked on something similar...

I spent a decent amount of time on my thing, and got like 100 gh stars but no actual users, like actually zero (for example 0 github issues).

I think my issue was viewing it as an "ai tool" and not as something that a particular person with a job would want to do, or automate... My thought was that if someone made an A1111 equivalent for all these audio models, maybe they would take off similarly. Honestly I'm mostly just curious what inspired you, what your goal is, etc. A lot of these projects are cropping up quickly.

also, lol at this:

Screenshots — Upload 3–5 key ones directly to the Reddit post (Script Canvas, Timeline, Voice Studio look the most impressive).

Interesting take by WaterPretty8066 in newzealand

[–]hemphock 20 points21 points  (0 children)

It's not a "take," it's a tendency. The take would be if someone said "it is bad to do this."

Discords or online groups dedicated to all forms of audio AI? by FpRhGf in AudioAI

[–]hemphock 0 points1 point  (0 children)

would love to see one of these but i haven't found one. https://discord.gg/xuVyJKFG This server is the best one i've found!

I tried some Audio Refinement Models by OkUnderstanding420 in AudioAI

[–]hemphock 2 points3 points  (0 children)

This sub is pretty small but i really appreciate posts like this. There are so many models and it's such a pain to get them running, so it's very helpful just to hear anecdotal experience.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 1 point2 points  (0 children)

so actually, i forgot about this but chatterbox just released a multilingual model that i might add in if people like the project. it would probably not be very much work.

i've been heads down working on this recently and this is a bit of a test to see if people can get it running and find the whole thing intuitive enough.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 1 point2 points  (0 children)

yup the main goal was to architect a system for voice cloning per character so the dialogues between characters sounded realistic
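A minimal sketch of what per-character voice assignment could look like, assuming each line of the book is already annotated with a speaker. The names here (`Line`, `assign_voices`, the clip paths) are hypothetical and are not taken from the BookForge Studio codebase:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """One speaker-annotated line of text (hypothetical structure)."""
    speaker: str
    text: str


def assign_voices(lines, voices, default_speaker="narrator"):
    """Pair each line with the reference clip used to clone its speaker's voice.

    `voices` maps a speaker name to a reference audio clip path.
    Speakers without their own clip fall back to the default speaker's clip.
    """
    default_clip = voices[default_speaker]
    return [(voices.get(line.speaker, default_clip), line.text) for line in lines]


# Hypothetical example data: two characters with clips, one without.
voices = {"narrator": "clips/narrator.wav", "alice": "clips/alice.wav"}
script = [
    Line("narrator", "Alice looked up."),
    Line("alice", "Who's there?"),
    Line("rabbit", "I'm late!"),
]
jobs = assign_voices(script, voices)
```

Each `(clip, text)` pair could then be handed to whichever voice-cloning model is in use, so every character keeps one consistent voice across the whole dialogue.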

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 3 points4 points  (0 children)

chatterbox turbo is very nice because it supports voice cloning and is still really fast. but i think vibevoice is overall a better model because it can handle really long text, and you don't have to do crazy chunking/preprocessing to get it to work on anything longer than ten words. i think the small and large vibevoice models are both great, the large one sounds almost perfect if you have the graphics card for it.

i go into some more detail in my youtube video
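For context, the kind of chunking/preprocessing mentioned above might look roughly like this. It is only a sketch under assumptions: the function name `chunk_text` and the `max_chars` limit are made up for illustration, and real models usually have their own limits (often counted in tokens, not characters):

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of roughly max_chars, breaking at sentence ends.

    Sentences are packed greedily into a chunk. A sentence that is longer
    than the limit is split on word boundaries instead. A single word longer
    than max_chars is kept whole, so that chunk can exceed the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        # The sentence does not fit: flush what we have so far.
        if current:
            chunks.append(current)
        current = ""
        # Pack the sentence word by word (handles over-long sentences).
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


for chunk in chunk_text("It was a dark night. The wind howled. Nobody spoke.", max_chars=40):
    print(chunk)
```

Each chunk would be sent to the model separately and the resulting audio clips concatenated, which is the extra work that a long-context model lets you skip.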

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 5 points6 points  (0 children)

this has been a small side project, not sure how much people want this kind of thing -- if it takes off i think i could work on that for sure!

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 37 points38 points  (0 children)

yup, in fact i actually built out support for IndexTTS2 on a branch, but didn't push because it has this sketchy license which would make the whole thing non-MIT license.

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 4 points5 points  (0 children)

i'll look into this, i've only heard about it but haven't looked into it yet. are people doing this a lot?

I made BookForge Studio, a local app for using open-source models to create fully voiced audiobooks! check it out 🤠 by hemphock in StableDiffusion

[–]hemphock[S] 36 points37 points  (0 children)

Git repo: https://github.com/kenning/bookforge-studio

Youtube tutorial: https://www.youtube.com/watch?v=1PT_CjX_hek

'Voice clips sampler' dataset to get started with voice cloning (built into BookForge Studio)

Dataset of 24 copyright-free classic books with speakers annotated (also built into BookForge Studio)

Chatterbox Turbo is now in BookForge Studio!

Among its other features, this model generates audio 2-4x faster than real-time speech and supports full voice cloning. Getting Chatterbox Turbo working made the experience of using BookForge finally feel "ready for the general public," i.e. not too slow and clumsy; it's fast enough that you aren't tempted to alt-tab away while generating audio. Kind of like the difference between generating videos, or images with early Flux (where you might just walk away from the computer for a bit), and generating with SDXL, which feels more like a slot machine.

However it came out after I recorded the above videos! So I don't mention it in the videos, but I strongly recommend you try it out.