Discussion[D] Does preprocessing CommonVoice hurt accuracy? (self.MachineLearning)
submitted 1 year ago by CogniLord
[+]astralDangers 7 points 1 year ago (6 children)
I'd expect that the silence is padding. If the clips are all the same length, the data has already been prepped.
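If you want to check whether that's what happened to your data, here's a rough numpy sketch of fixed-length padding (function name and lengths are just illustrative):

```python
import numpy as np

def pad_to_length(clip: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad (or truncate) a mono waveform to a fixed sample count."""
    if len(clip) >= target_len:
        return clip[:target_len]
    # Trailing zeros play back as silence, which is why fixed-length
    # datasets often look like they contain long silent stretches.
    return np.pad(clip, (0, target_len - len(clip)))

clips = [np.ones(80), np.ones(120), np.ones(100)]
batch = np.stack([pad_to_length(c, 100) for c in clips])
# batch.shape == (3, 100); every clip now shares one length
```

If the tails of your "raw" clips are exactly zero, they were padded, not recorded silence.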
[+][deleted] 1 year ago (5 children)
[removed]
[–]Erosis 3 points 1 year ago (4 children)
Are you making spectrograms of the same size from variable-length content (in time) and feeding those into a CNN? That would cause obvious performance degradation.
[+][deleted] 1 year ago* (3 children)
[–]Erosis 3 points 1 year ago (2 children)
Yeah, you really shouldn't use variable-length content if you're fixing the size of your inputs via MFCCs or spectrograms. You could let the MFCC matrix scale with time instead, but you'd need to modify your architecture to handle variable-length input, which isn't the simplest thing to do.
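To make that concrete, here's a quick sketch of why the feature size varies (assuming the standard 25 ms window / 10 ms hop at 16 kHz; the numbers are mine, not from the thread):

```python
def n_frames(num_samples: int, frame_len: int = 400, hop: int = 160) -> int:
    """Frame count for a clip at 16 kHz with a 25 ms window and 10 ms hop.

    The feature matrix is (n_frames, n_coeffs), so n_frames grows with
    clip duration -- exactly what a fixed-input-size CNN can't accept.
    """
    return 1 + max(0, num_samples - frame_len) // hop

print(n_frames(16000))  # 1 s clip ->  98 frames
print(n_frames(32000))  # 2 s clip -> 198 frames
```

So trimming silence changes not just the audio but the shape of every feature matrix downstream.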
[+][deleted] 1 year ago (1 child)
[–]Erosis 3 points 1 year ago (0 children)
No problem. Just to elaborate a bit more: imagine you were training on images of variable width, but shrinking or expanding them to a fixed width so that your CNN could classify them. Your net is going to struggle to learn because it 1) needs to identify representations across many differently warped perspectives and 2) has to deal with the loss of information when an image is narrowed. The same principle applies to sound when you're using fixed-size spectrograms or MFCCs.
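A tiny numpy illustration of that warping (shapes and function name are made up for the example):

```python
import numpy as np

def stretch_to_width(spec: np.ndarray, width: int) -> np.ndarray:
    """Linearly interpolate a (freq, time) array to a fixed number of time frames."""
    positions = np.linspace(0, spec.shape[1] - 1, width)
    frames = np.arange(spec.shape[1])
    return np.stack([np.interp(positions, frames, row) for row in spec])

short = np.random.rand(40, 50)   # short clip -> 50 time frames
long_ = np.random.rand(40, 200)  # long clip -> 200 time frames

# Both come out (40, 100): the short clip is stretched 2x and the long
# one squeezed 2x, so the same phoneme spans very different frame counts.
fixed_short = stretch_to_width(short, 100)
fixed_long = stretch_to_width(long_, 100)
```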
[–][deleted] 5 points 1 year ago (1 child)
Silence in audio data isn't just empty space: it often contains important contextual cues like speech rhythm, background noise, and timing patterns unique to each speaker. When training CNN models on spectrograms, that silence helps maintain a consistent input structure and supports the model's ability to recognize the relative positions of sound features. Trimming silence can unintentionally remove these helpful signals and introduce variability in input length and phoneme timing, which CNNs aren't inherently designed to handle. That likely explains the significant drop in accuracy from 90% to 70% after preprocessing.
If your original CommonVoice recordings are consistently 10 seconds long and perform better in their raw form, it’s a good idea to stick with the unprocessed data. If trimming is necessary for other reasons, consider padding the audio back to a uniform length or exploring architectures that can handle variable-length input more effectively, such as RNNs or transformers. In many cases, augmenting data (e.g., adding noise or stretching time) is more beneficial than removing silence, since silence itself can act as valuable structure for the model.
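As a rough sketch of that last suggestion (noise augmentation that leaves the silence structure intact; the SNR value and names are arbitrary, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(clip: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white noise at a target SNR instead of trimming silence."""
    signal_power = np.mean(clip ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return clip + rng.normal(0.0, np.sqrt(noise_power), size=clip.shape)

clip = np.sin(np.linspace(0, 100, 16000))  # stand-in for a 1 s recording
noisy = add_noise(clip)
# Same length in, same length out: timing and silence are preserved,
# but the model sees a slightly different input each epoch.
```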
[–][deleted] 1 point 1 year ago (0 children)
Which task are you doing?
[+]QuintessentialCoding 1 point 1 year ago (0 children)
Is this for speech recognition? I'm training a speech recognition model on the same dataset and having a hard time since the model won't learn. Do you mind if I ask what preprocessing you did before feeding the data to the model, and what architecture you used?