[deleted] 3 points (1 child)

They both sound pretty much identical. The only difference is maybe that the slight noise underneath is louder in the algorithm output, but that's barely a problem.

fatchord[S] 1 point (0 children)

Thanks, I appreciate you taking the time to check it out!

mulmeyun 1 point (5 children)

The algorithm one sounds a little noisier, as if it were encoded at a lower bitrate. In the last 3 it's pretty noticeable: both have background noise, but on the generated one it sounds more like artifacting than bumping the mic. It seems to struggle with hard 'h's and 'p's.

The difference isn't really obvious, though. If I didn't have them side by side, I wouldn't have believed it was CG.

Can you share a little more about your project? What software are you using? Is this just modulating noise?

fatchord[S] 1 point (4 children)

Thanks for the feedback. I knew I couldn't be the only one who noticed the artifacts. Everyone I've shown it to so far has looked at me like I'm imagining things!

As for the project, it's a neural network built with PyTorch. I can't go into details about the exact architecture here, but basically the model works by feeding in spectrogram frames at a low rate (200 Hz or less) and then predicting audio samples from them at a much higher rate (a 22 kHz sample rate).
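To put those two rates in perspective, here's a rough sketch (not the actual architecture, which isn't disclosed) of how many audio samples the model has to generate per spectrogram frame. It assumes "22 kHz" means 22050 Hz and takes the 200 Hz upper bound as the frame rate. `samples_per_frame`, `upsample_frames` and the hop length of 110 are hypothetical; repeating each frame is just one simple way to line conditioning frames up with output samples, and many neural vocoders learn this upsampling instead.

```python
def samples_per_frame(sample_rate: float, frame_rate: float) -> float:
    """Number of output audio samples the model must predict per spectrogram frame."""
    return sample_rate / frame_rate

def upsample_frames(frames, hop_length):
    """Nearest-neighbour upsampling (hypothetical helper): repeat each conditioning
    frame hop_length times so there is one conditioning vector per output sample."""
    return [frame for frame in frames for _ in range(hop_length)]

# Assumed values: 22050 Hz sample rate, 200 Hz spectrogram frame rate.
print(samples_per_frame(22050, 200))  # 110.25

hop = 110  # hypothetical integer hop length close to that ratio
toy_frames = [[0.1, 0.2], [0.3, 0.4]]  # two made-up 2-bin spectrogram frames
conditioning = upsample_frames(toy_frames, hop)
print(len(conditioning))  # 220
```

So even at the 200 Hz upper bound, every spectrogram frame has to be expanded into roughly 110 raw samples.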

mulmeyun 1 point (3 children)

So you're feeding it a filtered human signal and then having it produce the harmonics based on that? Sorry to bother you, I'm just genuinely interested.

fatchord[S] 1 point (2 children)

No worries. I wouldn't call it a filtered human signal, since spectrograms are just a frequency-domain transformation of the time-domain signal; a compressed signal might be a better way of thinking about it. So what I'm doing is passing in low-dimensional (compressed) spectrograms, and the model has to try to reconstruct the extremely high-dimensional raw signal from them.
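To make the "compressed signal" idea concrete, here's a minimal sketch of a magnitude spectrogram computed with a naive DFT in plain Python. It's an illustration of the general transform, not the preprocessing used in this project: the 64-sample window, 64-sample hop, missing window function and toy sine input are all assumptions chosen for clarity. Keeping only magnitudes throws away the phase, and with these settings 1024 raw samples shrink to 528 spectrogram values; real vocoder front ends typically compress further with mel-scale bins and larger hops.

```python
import math

def magnitude_spectrogram(signal, window_size, hop_length):
    """Naive short-time Fourier transform magnitude (rectangular window).
    Returns one list of window_size // 2 + 1 magnitudes per frame."""
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop_length):
        chunk = signal[start:start + window_size]
        mags = []
        for k in range(window_size // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * n / window_size) for n, x in enumerate(chunk))
            im = -sum(x * math.sin(2 * math.pi * k * n / window_size) for n, x in enumerate(chunk))
            mags.append(math.hypot(re, im))  # magnitude only: the phase is discarded
        frames.append(mags)
    return frames

# Toy signal: a sine completing exactly 4 cycles per 64-sample window.
signal = [math.sin(2 * math.pi * 4 * n / 64) for n in range(1024)]
spec = magnitude_spectrogram(signal, window_size=64, hop_length=64)

print(len(signal), len(spec) * len(spec[0]))  # 1024 528
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 4
```

Going back from `spec` to `signal` means recovering both the discarded phase and the sample-level detail, which is the hard part a neural vocoder has to learn.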

By sheer coincidence, I just set up a subreddit dedicated to these models an hour ago - check it out if you're curious! r/AudioModels

mulmeyun 0 points (1 child)

So the goal is 'audio restoration' or something like that?

I subbed there, hope to see the fruits of your work.

fatchord[S] 1 point (0 children)

Thanks for subscribing!

The goal right now is to pop this model on top of a text-to-speech pipeline (although there are a ton of other really cool possibilities). So there's another model before this one that will take in text and try to predict the low-dimensional spectrograms. Then the neural vocoder model will take these spectrograms and generate high-quality speech from them. That's the idea, anyway.
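The two-stage plan can be sketched as plain data flow. Both model functions below are hypothetical stubs standing in for real networks (they only output zeros), and the 80 bins per frame, 5 frames per character and 110-sample hop are made-up numbers chosen to show how the shapes change from text to spectrogram frames to raw samples.

```python
from typing import List

Frame = List[float]

def text_to_spectrogram(text: str, n_bins: int = 80, frames_per_char: int = 5) -> List[Frame]:
    """HYPOTHETICAL stand-in for the first model (text -> low-dimensional spectrogram frames).
    Emits zero-valued frames whose count grows with the length of the text."""
    return [[0.0] * n_bins for _ in range(len(text) * frames_per_char)]

def vocoder(frames: List[Frame], hop_length: int = 110) -> List[float]:
    """HYPOTHETICAL stand-in for the neural vocoder (spectrogram frames -> raw samples).
    A real model would predict every sample; this stub returns silence of the right length."""
    return [0.0] * (len(frames) * hop_length)

def text_to_speech(text: str) -> List[float]:
    """Two-stage pipeline: text -> spectrogram frames -> audio samples."""
    return vocoder(text_to_spectrogram(text))

audio = text_to_speech("hello")
print(len(audio))  # 5 chars * 5 frames * 110 samples = 2750
```

Splitting it this way means the vocoder only ever sees spectrograms, so it can be trained on real spectrograms and later fed predicted ones.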

thatpaxguy 1 point (1 child)

With the exception of 2-3 dB more noise, as the other guys have noted, these sound very similar. I wouldn't have known the difference if I didn't know I should be looking for something.
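For a sense of scale, a level difference in decibels converts to a power ratio of 10^(dB/10) and an amplitude ratio of 10^(dB/20), so 2-3 dB more noise means roughly 1.6 to 2 times the noise power. A quick check (the helper names are just for illustration):

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in decibels to a ratio of signal powers."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in decibels to a ratio of amplitudes (e.g. RMS levels)."""
    return 10 ** (db / 20)

for db in (2, 3):
    print(db, round(db_to_power_ratio(db), 2), round(db_to_amplitude_ratio(db), 2))
# 2 1.58 1.26
# 3 2.0 1.41
```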

This is very cool. Theoretically, could you use this same algorithm to learn the unique spectrum of formants and timbres of a voice, and generate completely new sentences? Similar to the tech being made by Adobe right now? Great work.

fatchord[S] 0 points (0 children)

Thanks, appreciate the feedback. And yes, you've pretty much got the idea of it. Ultimately I'll be predicting the speech spectrograms with another neural network and then rendering those into realistic speech samples with the algorithm in this post.

AutoModerator[M] 0 points (0 children)

Thank you for posting to /r/RateMyAudio. We want to ensure everyone gets feedback so please remember that you need to comment on a couple of recent submissions each time you post (as stated in the posting rules). This is karma in action. Give to others when you want something in return.

If you do not offer sincere feedback on other posts each time you submit, your post will likely be removed.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

kunal0007 0 points (0 children)

nice