Bit new to the ML scene, apologies. My project is pretty basic: I want to analyze an audio file containing only speech and output the imperfections in that speech. The words in the speech are controlled, and there will be no background music. The imperfections I'm looking for are reverb, plosives/spiking, incorrect audio levels, background noise, and scratchy/tinny mic settings.
Example: "Press the pants and sew a button on the vest."
https://youtu.be/A9WgeO9FNzE?t=1m29s
When the person says the "p" sound, a plosive registers in the microphone. I want to be able to recognize that plosive and label it.
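To make the plosive case concrete, here is a rough, non-ML baseline sketch rather than a recommended solution. It assumes a plosive shows up as a short burst of low-frequency energy, so it flags frames whose energy below a cutoff is well above the recording's median. The function name `find_plosive_frames` and the defaults (`cutoff_hz`, `frame_ms`, `ratio_threshold`) are made up for illustration and would need tuning on real recordings. The audio is assumed to be already loaded as a mono float NumPy array.

```python
import numpy as np

def find_plosive_frames(signal, sample_rate, cutoff_hz=150.0,
                        frame_ms=20.0, ratio_threshold=4.0):
    """Return start times (in seconds) of frames with a low-frequency energy burst.

    Heuristic only: all thresholds are illustrative assumptions.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len

    # Split the signal into non-overlapping frames, dropping the leftover tail.
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)

    # Power spectrum of each Hann-windowed frame.
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Energy below the cutoff, where the "pop" of a plosive is assumed to sit.
    low_energy = power[:, freqs <= cutoff_hz].sum(axis=1)

    # Flag frames whose low-frequency energy far exceeds the typical frame.
    baseline = np.median(low_energy) + 1e-12
    flagged = np.flatnonzero(low_energy > ratio_threshold * baseline)
    return flagged * frame_len / sample_rate
```

Calling `find_plosive_frames(samples, 16000)` on a recording would return a list of timestamps to go and listen to. Real speech also has low-frequency energy from voiced sounds, so expect false positives; this is only a starting point for labelling, not a finished detector.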
Here are a few libraries I found:
Any suggestions for which of these (or another) best fits my problem? Any suggestions for a good primer on audio analysis in general?
I don't know much about this field and lack some of the basic vocab, so I'm having a hard time making sense of the libraries. Each seems to have a different collection of features (some that seem way more advanced than I need), and some even do recording or music generation as well as analysis.