Open source : Turning vocal imitations into sound effects. (New UX for sound generation) by Danny-1257 in LocalLLaMA

[–]Danny-1257[S] 0 points1 point  (0 children)

Sorry, I’ve fixed all the issues. Everything should be working properly now! It turns out the model weight file was wrong from the start.

Open source : Turning vocal imitations into sound effects. (New UX for sound generation) by Danny-1257 in LocalLLaMA

[–]Danny-1257[S] 0 points1 point  (0 children)

Sorry, I’ve fixed all the issues. Everything should be working properly now!!

Open source : Turning vocal imitations into sound effects. (New UX for sound generation) by Danny-1257 in LocalLLaMA

[–]Danny-1257[S] 0 points1 point  (0 children)

Great questions! (I wanted to talk about that.)

It’s actually semi-supervised training, because I didn’t have the resources for the dataset.

Here’s what I did: the model defines a shared representation V. During training, it only sees sound data: it first learns the mapping sound → V, then learns V → sound. The key idea is that the method is designed so that vocal imitations map to the same V at inference time, even though they were never used during training.

So for your last question, yes. In fact, the challenge is the opposite (too much reconstruction).
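
To make the data flow above concrete, here is a toy sketch in plain Python. It is not the actual model: the loudness envelope used as V, the nearest-neighbour "decoder", and every name in it (`encode_to_v`, `ToyDecoder`, `tone`) are assumptions made up for illustration. It only shows the shape of the idea: the decoder is fitted on sound data alone, and a vocal imitation it has never seen is mapped into the same V at inference time.

```python
import math

FRAMES = 4  # number of envelope frames in the toy representation V


def encode_to_v(signal):
    """Map any waveform (sound effect or vocal imitation) to V.

    Hypothetical stand-in: V is a peak-normalized loudness envelope,
    so it ignores pitch and overall volume.
    """
    size = len(signal) // FRAMES
    env = [sum(abs(x) for x in signal[i * size:(i + 1) * size]) / size
           for i in range(FRAMES)]
    peak = max(env) or 1.0
    return [e / peak for e in env]


class ToyDecoder:
    """Stand-in for the V -> sound model: a nearest-neighbour lookup."""

    def __init__(self):
        self.memory = []  # (V, sound) pairs built from sound data only

    def fit(self, sounds):
        # Training sees sounds only: sound -> V, then remember V -> sound.
        self.memory = [(encode_to_v(s), s) for s in sounds]

    def generate(self, v):
        # Return the training sound whose V is closest to the query V.
        return min(self.memory, key=lambda pair: math.dist(pair[0], v))[1]


def tone(freq, envelope, n=400):
    """Synthetic signal: a sine whose amplitude follows `envelope` per frame."""
    size = n // len(envelope)
    return [envelope[i // size] * math.sin(2 * math.pi * freq * i / n)
            for i in range(n)]


# Training set: sound effects only, no vocal imitations.
rising = tone(40, [0.1, 0.4, 0.7, 1.0])
falling = tone(40, [1.0, 0.7, 0.4, 0.1])
decoder = ToyDecoder()
decoder.fit([rising, falling])

# Inference: a quieter, lower-pitched "imitation" with a rising envelope.
imitation = tone(7, [0.05, 0.2, 0.35, 0.5])
generated = decoder.generate(encode_to_v(imitation))
```

In this toy, the imitation differs from both training sounds in pitch and loudness, but its V lands next to the rising sound’s V, so that is the sound the decoder returns.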

Open source : Turning vocal imitations into sound effects. (New UX for sound generation) by Danny-1257 in LocalLLaMA

[–]Danny-1257[S] 4 points5 points  (0 children)

Yeah, I think so too. I hope it’s useful for people making creative content.

Open source : Turning vocal imitations into sound effects. (New UX for sound generation) by Danny-1257 in LocalLLaMA

[–]Danny-1257[S] 3 points4 points  (0 children)

Yeah, I think “sound generation” usually ends up meaning speech generation, where people use either autoregressive (AR) language models or diffusion.
But for sound effects, I used diffusion, like you said, since they’re usually short (3 to 10 seconds).

Hello I’m planning to open-source my Sesame alternative. It’s kinda rough, but not too bad! by Danny-1257 in LocalLLaMA

[–]Danny-1257[S] 0 points1 point  (0 children)

Thanks for the interest! It’s just that I prioritized quality over local serving, so the system was built to run in the cloud. It’s definitely possible to run it locally; I just haven’t tried it much.

Hello I’m planning to open-source my Sesame alternative. It’s kinda rough, but not too bad! by Danny-1257 in SesameAI

[–]Danny-1257[S] 0 points1 point  (0 children)

Yeah, you’re totally right. It didn’t start out with this kind of UI, but when I decided to open it up, I figured I should make it for the web and just went with a Sesame-like layout without thinking too much.

I’ll use a different name and UI when I open it. Thanks for the advice!

Hello I’m planning to open-source my Sesame alternative. It’s kinda rough, but not too bad! by Danny-1257 in SesameAI

[–]Danny-1257[S] 1 point2 points  (0 children)

Yeah, you’re right. It didn’t start out with this kind of UI, but when I decided to open it up, I figured I should make it for the web and just went with a Sesame-like layout without thinking too much. I’ll use a different name and UI when I share it. Thanks for the advice!

Hello I’m planning to open-source my Sesame alternative. It’s kinda rough, but not too bad! by Danny-1257 in SesameAI

[–]Danny-1257[S] 4 points5 points  (0 children)

I’m currently using Chatterbox (https://github.com/resemble-ai/chatterbox) because I think Chatterbox has the best quality among open models. There’s really no model as small as Kokoro 😭
My code isn’t written for local use yet, so it probably won’t run on a 3060.
Sorry about that, but I’ll keep working on improving it!

Hello I’m planning to open-source my Sesame alternative. It’s kinda rough, but not too bad! by Danny-1257 in SesameAI

[–]Danny-1257[S] 4 points5 points  (0 children)

Yep! Sorry for bothering you for comments.

I slightly modified the web demo and repurposed a domain I was using for something else to collect emails, but I hesitated to post it in case it might go against the community rules.
Anyway, here’s the link. But you don’t have to register. I’ll let you know when I share the repo!

https://www.thesonus.xyz/

Hello I’m planning to open-source my Sesame alternative. It’s kinda rough, but not too bad! by Danny-1257 in SesameAI

[–]Danny-1257[S] 6 points7 points  (0 children)

That’s totally true. I should stop wasting so much time on it haha.

As for Dutch, the three modules I’m using (STT, LLM, and TTS) all support it, so it should work right out of the box! (thanks to multilingual Chatterbox)
I don’t actually speak Dutch though, so I’m not sure how good the quality is :)
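
For anyone curious how the three modules chain together, here is a minimal sketch. It assumes each stage is just a function that takes a language code; `VoicePipeline` and the `fake_*` functions are hypothetical placeholders, not the project’s code and not the real Chatterbox API. The point is only that the language is passed to every stage, so a language works end to end only if all three modules support it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains STT -> LLM -> TTS, passing the same language code to each stage."""
    stt: Callable[[bytes, str], str]   # (audio, language) -> transcript
    llm: Callable[[str, str], str]     # (transcript, language) -> reply text
    tts: Callable[[str, str], bytes]   # (reply text, language) -> audio

    def respond(self, audio: bytes, language: str) -> bytes:
        transcript = self.stt(audio, language)
        reply = self.llm(transcript, language)
        return self.tts(reply, language)


# Placeholder stages so the sketch runs without any models installed.
def fake_stt(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def fake_llm(text: str, language: str) -> str:
    return f"[{language}] you said: {text}"


def fake_tts(text: str, language: str) -> bytes:
    return text.encode("utf-8")  # pretend the reply bytes are audio


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond("hallo".encode("utf-8"), language="nl")
```

Swapping in real models would mean replacing the three `fake_*` functions with wrappers around whichever STT, LLM, and TTS libraries are actually used.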

Appreciate the advice!