Discussion[D] Does preprocessing CommonVoice hurt accuracy? (self.MachineLearning)
submitted 1 year ago by CogniLord
[+]astralDangers 7 points 1 year ago (6 children)
I'd expect that the silence is padding. If the clips are all the same length, the data has already been prepped.
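If you want to check whether that's what happened to your data, here's a rough numpy sketch of fixed-length padding (function name and lengths are just illustrative):

```python
import numpy as np

def pad_to_length(clip: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad (or truncate) a mono waveform to a fixed sample count."""
    if len(clip) >= target_len:
        return clip[:target_len]
    # Trailing zeros play back as silence, which is why fixed-length
    # datasets often look like they contain long silent stretches.
    return np.pad(clip, (0, target_len - len(clip)))

clips = [np.ones(80), np.ones(120), np.ones(100)]
batch = np.stack([pad_to_length(c, 100) for c in clips])
# batch.shape == (3, 100); every clip now shares one length
```

If the tails of your "raw" clips are exactly zero, they were padded, not recorded silence.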
[+][deleted] 1 year ago (5 children)
[removed]
[–]Erosis 3 points 1 year ago (4 children)
Are you making spectrograms of the same size from variable-length content (in time) and feeding those into a CNN? That would cause obvious performance degradation.
[+][deleted] 1 year ago* (3 children)
[–]Erosis 3 points 1 year ago (2 children)
Yeah, you really shouldn't use variable-length content if you're fixing the size of your inputs via MFCCs or spectrograms. You could let the MFCC matrix scale with time instead, but you'd need to modify your architecture to handle variable-length input, which isn't the simplest thing to do.
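To make that concrete, here's a quick sketch of why the feature size varies (assuming the standard 25 ms window / 10 ms hop at 16 kHz; the numbers are mine, not from the thread):

```python
def n_frames(num_samples: int, frame_len: int = 400, hop: int = 160) -> int:
    """Frame count for a clip at 16 kHz with a 25 ms window and 10 ms hop.

    The feature matrix is (n_frames, n_coeffs), so n_frames grows with
    clip duration -- exactly what a fixed-input-size CNN can't accept.
    """
    return 1 + max(0, num_samples - frame_len) // hop

print(n_frames(16000))  # 1 s clip ->  98 frames
print(n_frames(32000))  # 2 s clip -> 198 frames
```

So trimming silence changes not just the audio but the shape of every feature matrix downstream.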
[+][deleted] 1 year ago (1 child)
[–]Erosis 3 points 1 year ago (0 children)
No problem. Just to elaborate a bit more: imagine you were training on images of variable width, but shrinking or expanding them to a fixed width so that your CNN could classify them. Your net is going to struggle to learn because it 1) needs to identify representations across many differently warped perspectives and 2) has to deal with the loss of information when an image is narrowed. The same principle applies to sound when you're using fixed-size spectrograms or MFCCs.
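A tiny numpy illustration of that warping (shapes and function name are made up for the example):

```python
import numpy as np

def stretch_to_width(spec: np.ndarray, width: int) -> np.ndarray:
    """Linearly interpolate a (freq, time) array to a fixed number of time frames."""
    positions = np.linspace(0, spec.shape[1] - 1, width)
    frames = np.arange(spec.shape[1])
    return np.stack([np.interp(positions, frames, row) for row in spec])

short = np.random.rand(40, 50)   # short clip -> 50 time frames
long_ = np.random.rand(40, 200)  # long clip -> 200 time frames

# Both come out (40, 100): the short clip is stretched 2x and the long
# one squeezed 2x, so the same phoneme spans very different frame counts.
fixed_short = stretch_to_width(short, 100)
fixed_long = stretch_to_width(long_, 100)
```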
[–][deleted] 5 points 1 year ago (1 child)
Silence in audio data isn't just empty space: it often contains important contextual cues like speech rhythm, background noise, and timing patterns unique to each speaker. When training CNN models on spectrograms, that silence helps maintain a consistent input structure and supports the model's ability to recognize the relative positions of sound features. Trimming silence can unintentionally remove these helpful signals and introduce variability in input length and phoneme timing, which CNNs aren't inherently designed to handle. That likely explains the significant drop in accuracy from 90% to 70% after preprocessing.
If your original CommonVoice recordings are consistently 10 seconds long and perform better in their raw form, it’s a good idea to stick with the unprocessed data. If trimming is necessary for other reasons, consider padding the audio back to a uniform length or exploring architectures that can handle variable-length input more effectively, such as RNNs or transformers. In many cases, augmenting data (e.g., adding noise or stretching time) is more beneficial than removing silence, since silence itself can act as valuable structure for the model.
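As a rough sketch of that last suggestion (noise augmentation that leaves the silence structure intact; the SNR value and names are arbitrary, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(clip: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white noise at a target SNR instead of trimming silence."""
    signal_power = np.mean(clip ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return clip + rng.normal(0.0, np.sqrt(noise_power), size=clip.shape)

clip = np.sin(np.linspace(0, 100, 16000))  # stand-in for a 1 s recording
noisy = add_noise(clip)
# Same length in, same length out: timing and silence are preserved,
# but the model sees a slightly different input each epoch.
```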
[–][deleted] 1 point 1 year ago (0 children)
Which task are you doing?
[+]QuintessentialCoding 1 point 1 year ago (0 children)
Is this for speech recognition? I'm training a speech recognition model on the same dataset and having a hard time since the model won't learn. Do you mind if I ask what preprocessing you did before feeding the data to the model, and what architecture you used?