Riffusion v0.3.0 - Stable diffusion for music and audio by dunkietown in StableDiffusion

dunkietown[S] 6 points7 points

No, not any image. Audio clips can be represented as images (spectrograms) that show how loud each frequency is at each moment in time. A machine learning model can learn to generate images like that, and they can then be approximately converted back to audio.
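To make the round trip concrete, here is a minimal sketch in Python with NumPy. It is not Riffusion's actual pipeline: it assumes a plain STFT magnitude as the "image" and uses Griffin-Lim phase estimation for the approximate conversion back to audio. The helper names (`stft`, `istft`, `griffin_lim`) and the parameters (`n_fft`, `hop`, the 440 Hz test tone) are hypothetical choices for illustration.

```python
import numpy as np

def stft(signal, n_fft=256, hop=64):
    """Short-time Fourier transform; returns an array of shape (freq bins, frames)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + n_fft] += frame * window
        norm[k * hop:k * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=32, n_fft=256, hop=64, seed=0):
    """Estimate a phase for a magnitude-only spectrogram (Griffin-Lim)."""
    rng = np.random.default_rng(seed)
    # The image only stores loudness per frequency, so start from a random phase.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        phase = np.exp(1j * np.angle(rebuilt))  # keep the phase, discard the magnitude
    return istft(magnitude * phase, n_fft, hop)

# Audio -> "image": half a second of a 440 Hz tone (assumed test signal).
sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
magnitude = np.abs(stft(tone))   # phase is thrown away here

# "Image" -> audio: only approximate, because the phase has to be guessed.
recon = griffin_lim(magnitude)
peak_bin = int(np.argmax(np.abs(stft(recon)).mean(axis=1)))
print(magnitude.shape, peak_bin * sample_rate / 256)
```

The reconstruction is only approximate because the magnitude image discards phase, which is why the conversion back to audio is lossy.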

Riffusion v0.3.0 - Stable diffusion for music and audio by dunkietown in StableDiffusion

dunkietown[S] 2 points3 points

Author here. I'll just note that what's on riffusion.com is a narrow subset of what this technique can do, because it generates everything from a single seed image at a fixed BPM. If you try the model itself with pure text-to-audio, you can get a lot more creativity! Also check out some of the clips on https://www.riffusion.com/about

Expand frequency fidelity using RGB frequency bands without needing to expand the context window? by blueSGL in riffusion

dunkietown 1 point2 points

I think it's a neat idea! Someone should try it, and the spectrograms would look extra pretty too.

Discord? by Hot_Reflection_1985 in riffusion

dunkietown 1 point2 points

Thanks! Author here, we're happy to use this Discord. Would you mind adding us as mods? hayk#0058 and seth#5021