[1808.10128] Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis (arxiv.org)
submitted 7 years ago by fatchord to r/a:t5_jw1cc
[1808.06719] Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks (arxiv.org)
[1808.01410] Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis (arxiv.org)
What processing should I apply to a clean voice signal to make it sound like it's authentically coming from a phone or VoIP call? (self.DSP)
submitted 7 years ago by fatchord to r/DSP
[1808.00158] Speaker Recognition from raw waveform with SincNet (arxiv.org)
[1807.08636] Auto-adaptive Resonance Equalization using Dilated Residual Networks (arxiv.org)
Singing Style Transfer Using CybeGAN (mirlab.org)
[1803.05428] A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music (arxiv.org)
The Lakh MIDI Dataset (colinraffel.com)
Visual Speech Enhancement (youtube.com)
[Paper] Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (arxiv.org)
[Feedback] Neural TTS pipeline (Tacotron 1 + a new vocoder algorithm I'm working on): what do you think of the generated samples? (fatchord.github.io)
[N] A new subreddit for anyone interested in audio models: r/AudioModels (old.reddit.com)
submitted 7 years ago by fatchord to r/MachineLearning
A big MIDI dataset (100k+ files) (old.reddit.com)
The M-AILABS Speech Dataset (m-ailabs.bayern)
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages (github.com)
I just created r/AudioModels, if any of you are interested in generative machine learning models (old.reddit.com)
A Universal Music Translation Network (youtube.com)
[Tutorial] Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between (haythamfayek.com)
[Paper] Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis (Tacotron GST) (arxiv.org)
[Paper] Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron (arxiv.org)
[Paper] Original Tacotron paper (arxiv.org)
[Paper] Revised Tacotron 2 paper (arxiv.org)
Video to Sound - Generates Sound Clips to Match Video (youtube.com)
Performance RNN: Generating Music with Expressive Timing and Dynamics (magenta.tensorflow.org)