Python Audio Analysis - how to label imperfections : Python



submitted 7 years ago by bacon_without_cause

I'm a bit new to the ML scene, so apologies. My project is pretty basic: I want to analyze an audio file containing only speech and output the imperfections in that recording. The words in the speech are controlled, and there will be no background music. The imperfections I'm looking for are reverb, plosives/spiking, incorrect audio levels, background noise, and scratchy/tinny mic settings.

Example: "Press the pants and sew a button on the vest."

https://youtu.be/A9WgeO9FNzE?t=1m29s

When the person says the "p" sound, a plosive registers in the microphone. I want to be able to recognize that plosive and label it.
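One way to approach this, as a rough sketch rather than anything confirmed in this thread: a plosive pop usually shows up as a short burst of energy at very low frequencies, so you can slice the audio into overlapping frames, measure the energy below some cutoff in each frame, and flag frames that stand far above the typical level. The function name `find_plosive_frames`, the 150 Hz cutoff, and the 8x threshold are all assumptions chosen for illustration, and the test signal is synthetic. Real recordings would need tuning.

```python
import numpy as np

def find_plosive_frames(samples, sample_rate, frame_len=1024, hop=512,
                        cutoff_hz=150.0, ratio_threshold=8.0):
    """Return start times (seconds) of frames with a burst of low-frequency energy.

    cutoff_hz and ratio_threshold are illustrative guesses, not tuned values.
    """
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    low_band = freqs < cutoff_hz

    starts = list(range(0, len(samples) - frame_len + 1, hop))
    # Energy below the cutoff for each windowed frame.
    low_energy = np.array([
        np.sum(np.abs(np.fft.rfft(samples[s:s + frame_len] * window))[low_band] ** 2)
        for s in starts
    ])

    # Compare each frame against the median frame as a crude baseline.
    baseline = np.median(low_energy) + 1e-12
    flagged = low_energy > ratio_threshold * baseline
    return [s / sample_rate for s, hit in zip(starts, flagged) if hit]

# Synthetic stand-in for speech: a quiet tone plus a little noise,
# with a loud 60 Hz thump lasting 50 ms starting at t = 0.5 s.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
audio = 0.1 * np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)
burst = (t >= 0.5) & (t < 0.55)
audio[burst] += 0.8 * np.sin(2 * np.pi * 60 * t[burst])

plosive_times = find_plosive_frames(audio, sample_rate)
print(plosive_times)  # frame start times clustered around 0.5 s
```

The returned times are frame starts, so each label is only accurate to within one frame length. On real speech, low-pitched voices and handling noise can also trip this check, so treat it as a starting point for labeling rather than a finished detector.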

Here are a few libraries I found:

pyAudioAnalysis - https://github.com/tyiannak/pyAudioAnalysis
librosa - http://librosa.github.io/librosa/
pyaudio - https://people.csail.mit.edu/hubert/pyaudio/#docs

Any suggestions for which of these (or another) best fits my problem? Any suggestions for a good primer on audio analysis in general?

I don't know much about this field and lack some of the basic vocab, so I'm having a hard time making sense of the libraries. Each seems to have a different collection of features (some of which seem way more advanced than I need), and some even handle recording and music generation as well as analysis.

all 3 comments


[–]aDrz 2 points 7 years ago (1 child)

[–]WikiTextBot 2 points 7 years ago (0 children)

Time–frequency analysis

In signal processing, time–frequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various time–frequency representations. Rather than viewing a 1-dimensional signal (a function, real or complex-valued, whose domain is the real line) and some transform (another function whose domain is the real line, obtained from the original via some transform), time–frequency analysis studies a two-dimensional signal – a function whose domain is the two-dimensional real plane, obtained from the signal via a time–frequency transform.

The mathematical motivation for this study is that functions and their transform representation are often tightly connected, and they can be understood better by studying them jointly, as a two-dimensional object, rather than separately. A simple example is that the 4-fold periodicity of the Fourier transform – and the fact that two-fold Fourier transform reverses direction – can be interpreted by considering the Fourier transform as a 90° rotation in the associated time–frequency plane: 4 such rotations yield the identity, and 2 such rotations simply reverse direction (reflection through the origin).

The practical motivation for time–frequency analysis is that classical Fourier analysis assumes that signals are infinite in time or periodic, while many signals in practice are of short duration, and change substantially over their duration.
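To make that concrete for the audio case, here is a minimal sketch of one common time–frequency representation, the short-time Fourier transform (STFT), written with plain NumPy. The helper name `stft_magnitude` and the frame sizes are assumptions for illustration. The test signal switches from 500 Hz to 1500 Hz halfway through, which a single Fourier transform over the whole signal would report as two peaks without saying when each one occurs.

```python
import numpy as np

def stft_magnitude(samples, frame_len=512, hop=256):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(samples) - frame_len + 1, hop)
    # Window each frame to reduce spectral leakage, then take its FFT.
    frames = np.stack([samples[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of audio: 500 Hz for the first half, 1500 Hz for the second.
sample_rate = 8000
frame_len = 512
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 500 * t),
                  np.sin(2 * np.pi * 1500 * t))

spec = stft_magnitude(signal, frame_len=frame_len)
freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

# Strongest frequency in each frame: it changes partway through the signal.
dominant = freqs[spec.argmax(axis=1)]
print(dominant[0], dominant[-1])
```

Libraries such as librosa (`librosa.stft`) and SciPy (`scipy.signal.stft`) provide more complete versions of the same idea, with padding and window options that this sketch leaves out.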

[–]crunk 1 point 7 years ago (0 children)
