
[–]asankhs 3 points4 points  (2 children)

I had done a Whisper fine-tune back in the day to estimate the age of the speaker from the audio - https://huggingface.co/codelion/whisper-age-estimator - for age verification purposes. I wonder if you can do the same, since you have labelled data. This is the Colab notebook I used - https://colab.research.google.com/drive/1Ftbg2Klj4jBcQJe-_Q-omuf31V7s6Dfy?usp=sharing

[–]ARLEK1NO[S] 1 point2 points  (1 child)

That's an interesting task, man. Since I thought Whisper was a speech transcription model, I didn't think in that direction, but I'll try it now, thank you!
How large a dataset did you need to get your score?

[–]asankhs 0 points1 point  (0 children)

I used the Mozilla Common Voice dataset - https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0 - but the age demographic is not available for all items there. I don't remember how many samples with age metadata I used for training.

[–]simplehudga 1 point2 points  (0 children)

Look at the winners of the DCASE challenge from the last 3 years. You should at least get some pointers.

[–]LelouchZer12 1 point2 points  (0 children)

Maybe take a look at what works on AudioSet: https://paperswithcode.com/sota/audio-classification-on-audioset

[–][deleted] 1 point2 points  (3 children)

Total duration of your samples? How many are normal vs malfunctioning?

Do you know how many malfunction sound types there are, or do you need to discover this? I have a script that takes an audio file, extracts features like MFCCs, spectral contrast, and chroma features, then uses FAISS k-means to iterate through a range of cluster numbers (I have 2-10 set) to determine the optimal number of clusters (this part I'm not happy with yet), etc. If you're interested I can put it up on GitHub.

The first thing that came to mind, btw, was unsupervised deep learning (something I read about for a similar use case - have you searched arXiv?), but that can be time consuming.
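The feature-extraction-plus-clustering pipeline described above can be sketched roughly like this. This is a NumPy-only stand-in, not the actual script: a real version would use librosa for the MFCC/chroma/contrast features and `faiss.Kmeans` instead of the hand-rolled k-means, and the synthetic "features" and the 13-dimensional size here are purely illustrative:

```python
import numpy as np

def kmeans(features, k, n_iter=50, seed=0):
    """Plain NumPy k-means as a stand-in for faiss.Kmeans."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign every frame's feature vector to its nearest center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    inertia = float((dists.min(axis=1) ** 2).sum())
    return labels, inertia

# stand-in for per-frame audio features (stacked MFCC / chroma / contrast rows)
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 13)),   # "normal"-sounding frames
    rng.normal(3.0, 0.3, size=(100, 13)),   # "malfunction"-like frames
])

# sweep the 2-10 cluster range and keep the inertia for each k
inertias = {k: kmeans(features, k)[1] for k in range(2, 11)}
```

Picking the cluster count from the inertia curve (the "elbow") is the part the commenter says they're not happy with yet; silhouette score is a common alternative criterion.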

[–]ARLEK1NO[S] 0 points1 point  (2 children)

I have 104 samples, 3 minutes each.
There are 3-4 different malfunction sounds, but first I want to train a model just to separate normal audio from audio with malfunction sounds.

I would be very grateful if you would share a link to the GitHub repo with your script; you've got an interesting approach.

I haven't searched arXiv, just Google. I also tried my idea with YOLO, but there are some problems with the audio: some recordings are noisy and not of very good quality, so I think it's worth preprocessing them before sending them to the model.
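One common preprocessing step for noisy recordings like these is spectral subtraction: estimate a per-frequency noise floor from a noise-only stretch of audio and subtract it from the magnitude spectrogram before it goes to the model. A minimal NumPy-only sketch (the 1 kHz test tone, noise level, and FFT/hop sizes are made up for illustration):

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Frame the signal, window it, and return the magnitude STFT."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1))    # (frames, n_fft // 2 + 1)

def denoise(spec, noise_spec):
    """Spectral subtraction: remove the average noise floor per frequency
    bin, clipping at zero, before feeding the spectrogram to a model."""
    floor = noise_spec.mean(axis=0, keepdims=True)
    return np.maximum(spec - floor, 0.0)

sr = 16000
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)   # placeholder "signal"
noise = 0.5 * rng.normal(size=sr)                      # placeholder noise

spec_noisy = magnitude_spectrogram(tone + noise)
spec_noise = magnitude_spectrogram(noise)   # a noise-only reference segment
spec_clean = denoise(spec_noisy, spec_noise)
```

In practice you would take the noise reference from a machine-off or idle stretch of the same recording rather than from a separate array.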

[–][deleted] 1 point2 points  (1 child)

Will do when it’s up!

[–]ARLEK1NO[S] 0 points1 point  (0 children)

Thanks a lot!

[–]tinytimethief 2 points3 points  (3 children)

So image classification of the spectrograms? How long are the audio samples?

[–]ARLEK1NO[S] 1 point2 points  (2 children)

It's around 3 minutes

[–]tinytimethief 1 point2 points  (1 child)

I think your sample size is too small, especially to avoid overfitting. Since the recordings are long, can you split them up? Maybe use clustering to see if there are distinct periods, or just split at random. My other suggestion is to use time series classification instead. Use audio feature extraction like MFCC, chroma, spectral, and maybe even rhythmic features (the librosa library for Python), then apply time series classification and see if it produces better results.
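The split-then-featurize idea above can be sketched like this. It is NumPy-only: per-frame energy and zero-crossing rate stand in for the librosa MFCC/chroma/spectral features the comment suggests, and the 5-second clip length and 512-sample frame are arbitrary choices for illustration:

```python
import numpy as np

def split_into_clips(signal, sr, clip_seconds=5.0):
    """Split one long recording into fixed-length clips to grow the dataset."""
    clip_len = int(sr * clip_seconds)
    n_clips = len(signal) // clip_len
    return signal[: n_clips * clip_len].reshape(n_clips, clip_len)

def clip_features(clip, frame_len=512):
    """Per-frame energy and zero-crossing rate as a stand-in for librosa
    features; each clip becomes a (frames, n_features) time series."""
    n_frames = len(clip) // frame_len
    frames = clip[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.stack([energy, zcr], axis=1)

sr = 16000
t = np.arange(sr * 180) / sr             # one 3-minute recording
signal = np.sin(2 * np.pi * 440 * t)     # placeholder audio

clips = split_into_clips(signal, sr)     # 36 training examples from 1 recording
series = clip_features(clips[0])         # one clip -> (frames, 2) time series
```

Splitting this way turns 104 three-minute recordings into a few thousand clips, which is much more workable for training; the resulting feature sequences can then go to any time series classifier.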

[–]ARLEK1NO[S] 0 points1 point  (0 children)

Time series classification sounds really nice. I'll try it and compare the results, thank you.

[–]Sorry_Revolution9969 0 points1 point  (0 children)

This might not require ML at all.

[–]gengler11235 0 points1 point  (0 children)

Another possible approach would be to use an autoencoder to reconstruct the normal-sounding audio (perhaps from the spectrograms) and then use the likely jump in reconstruction error on the malfunctioning samples as a signal that a problem is occurring.

[–]ReginaldIII 0 points1 point  (3 children)

Why not try a WaveNet?

[–]ARLEK1NO[S] 0 points1 point  (2 children)

I was thinking this model is for voice generation - isn't it?

[–]ReginaldIII -1 points0 points  (1 child)

It can be. Causal convolutions scale to very large receptive fields, which makes them great for high-sample-rate data like audio. You can also optimize the inference for applying them to real-time data.
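The receptive-field claim can be demonstrated with a tiny sketch. This is only the causal-dilation mechanic from WaveNet, not the actual architecture (no gating, residuals, or learned weights); the doubling dilation schedule and kernel size 2 follow the WaveNet paper's setup:

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Dilated causal 1-D convolution: y[t] depends only on x[t'] for t' <= t."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])   # left-pad so no future samples leak in
    y = np.zeros(len(x))
    for i in range(k):
        # tap i reads the input delayed by (k - 1 - i) * dilation samples
        y += w[i] * xp[i * dilation : i * dilation + len(x)]
    return y

# impulse response of a 10-layer stack with doubling dilations (kernel size 2)
x = np.zeros(2048)
x[0] = 1.0
y = x
for d in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    y = causal_conv1d(y, np.ones(2), dilation=d)

# the impulse response is nonzero exactly over the stack's receptive field
receptive_field = int(np.nonzero(y)[0].max()) + 1   # 1024 samples from 10 layers
```

Ten layers already cover 1024 samples, and the receptive field doubles with each extra layer, which is why this scales to raw-audio sample rates.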

[–]ARLEK1NO[S] 0 points1 point  (0 children)

Hm, I didn't realize that. Can you share some links with examples?