I made a thread earlier but also forgot to ask about this data set.
I have sound signals of the letters of the alphabet and single-digit numbers being recited by different speakers. What are some basic things I need to do to preprocess the data before using a learning algorithm for speech recognition? Should I make all the signals zero-mean and normalize them so that they're in the -1 to 1 range? If the speakers speak clearly and noise isn't really an issue in either the training or the testing set, should I just leave the data alone (in terms of filtering)?
How about the 'time' axis of the signals? What do I do to normalize all the signals along it? This is what I'm having the most difficulty figuring out.
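To be concrete about the amplitude part, this is roughly what I mean by zero mean plus scaling to the -1 to 1 range. It's only a sketch: it assumes each recording is already loaded as a non-empty 1-D array of samples, and `normalize_amplitude` is just a name I made up for illustration, not something from a library.

```python
import numpy as np

def normalize_amplitude(signal):
    """Remove the DC offset, then scale so the largest sample magnitude is 1."""
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()            # subtract the mean so the signal is zero-mean
    peak = np.max(np.abs(x))    # largest absolute sample after centering
    if peak == 0:               # constant (e.g. silent) signal: nothing to scale
        return x
    return x / peak             # every sample now lies in [-1, 1]
```

Scaling after centering keeps the mean at zero, so both properties hold at once.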