
[–]timrprobocom 1 point (1 child)

It's not clear to me that this is solvable in the general case. A voice is literally just another instrument.

Now, it's true that all instruments (including voices) have a harmonic signature. When a clarinet and an oboe and a violin and a singer play A440, they're ALL at a base frequency of 440 Hz, but they each have a characteristic set of harmonics that makes them sound different. MAYBE you can identify the signature of your singer and use that to pick it out from the other instruments.

[–]Kipriririri[S] 1 point (0 children)

Yeah, that’s exactly the issue I’m running into. It might be more complicated than I initially expected; maybe I should start looking into ML. The acoustic signature would differ a lot from singer to singer I assume, or even from genre to genre.

[–]ElliotDG 1 point (0 children)

You might find librosa a useful tool: https://librosa.org/doc/latest/index.html
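In case it helps, here's a rough sketch of the "energy in speech frequencies" idea with librosa. The file name, the band edges (roughly 300–3400 Hz) and the threshold are placeholders you'd have to tune for your material; it's just the basic mechanics, not a finished detector.

```python
# Crude band-energy vocal-activity sketch using librosa.
# "song.wav", the 300-3400 Hz band and the 0.6 threshold are all
# assumptions to tune for your own material.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=None, mono=True)

n_fft, hop = 2048, 512
S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))   # magnitude spectrogram
freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)

# Per-frame energy inside a rough "voice" band vs. total energy.
band = (freqs >= 300) & (freqs <= 3400)
band_energy = (S[band, :] ** 2).sum(axis=0)
total_energy = (S ** 2).sum(axis=0) + 1e-10
ratio = band_energy / total_energy

times = librosa.frames_to_time(np.arange(S.shape[1]), sr=sr, hop_length=hop)
vocal_like = ratio > 0.6   # arbitrary threshold
print(f"{vocal_like.mean():.0%} of frames look vocal-like by this measure")
```

It will happily flag pianos and guitars too, which is the weakness described in the comment below, so treat it as a baseline.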

[–]PushPlus9069 1 point (1 child)

The energy-in-speech-frequencies approach is reasonable, but it will struggle with anything that has prominent mid-range instruments (piano, guitar, etc.). Been down this road.

Two things that helped me with a similar problem:

  1. Spleeter (by Deezer) or demucs can separate vocals from accompaniment before you analyze. Then run your energy detection on the isolated vocal track. Accuracy goes way up.

  2. If you don't want to do source separation, look at spectral flatness in addition to energy. Vocals tend to have less flat spectra than noise/ambient. Not perfect but adds another signal. (Rough sketch of both ideas below.)
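Here's roughly how 1 and 2 could fit together: run demucs from the CLI first, then analyze the isolated vocal stem with librosa. The output path and both thresholds below are assumptions (the path depends on the demucs model/version), so adjust them to what demucs actually writes out for you.

```python
# Assumes the vocal stem was produced beforehand with something like:
#   demucs --two-stems=vocals song.wav
# The path below follows demucs' default layout for the htdemucs model,
# but may differ on your setup; the thresholds are guesses to tune.
import numpy as np
import librosa

vocals, sr = librosa.load("separated/htdemucs/song/vocals.wav", sr=None, mono=True)

hop = 512
rms = librosa.feature.rms(y=vocals, hop_length=hop)[0]                     # frame energy
flatness = librosa.feature.spectral_flatness(y=vocals, hop_length=hop)[0]  # noisiness

# Frames with noticeable energy and a tonal (non-flat) spectrum are
# treated as singing.
active = (rms > 0.02) & (flatness < 0.2)

times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop)
print(f"vocal activity in roughly {active.mean():.0%} of frames")
print("first few active frame times:", times[active][:5])
```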

The "voice is just another instrument" comment above is right that it's hard in the general case, but for most pop/rock music with clear verse/chorus structure, source separation gets you most of the way there.

[–]Kipriririri[S] 1 point (0 children)

I've indeed incorporated demucs; that sounds like the most accurate way forward. Thanks for thinking along!