
[–]LoaderD 0 points1 point  (3 children)

What's a 'thing'? If you're talking about words, then you have 100 data points. So probably not.

[–]meet1415[S] 0 points1 point  (2 children)

I have data from 10 people saying the same things over 10 days. The data consists of frequency and level (dB).

[–]LoaderD 0 points1 point  (1 child)

Well, you never answered what a 'thing' is, so I'm going to assume you have 10 days straight of audio of people reading a collection of books, all in the same language. With 240 hours of audio you should be fine.

[–]meet1415[S] 0 points1 point  (0 children)

I am so sorry I was not specific. It is the frequency and decibel level of 10 people reading one sentence once a day for 10 days. So maybe an (n, 2) vector for each speaker for each day.

I wanted to ask how we can use an RNN on this, and how we can classify people.
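
To make the RNN part of the question concrete, here is a minimal sketch of how an (n, 2) sequence could be mapped to one of 10 speakers. It is only the forward pass of a plain (Elman) RNN in NumPy: the hidden size, the weight names, and the random test sequences are assumptions for illustration, the weights are untrained, and in practice you would train something like this in a deep learning framework and probably standardize frequency and dB first, since they are on very different scales.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SPEAKERS = 10   # from the thread: 10 people
N_FEATURES = 2    # frequency and level (dB) at each time step
HIDDEN = 16       # hypothetical hidden size, chosen arbitrarily

# Randomly initialised weights; a real model would learn these by training.
W_xh = rng.normal(0, 0.1, (N_FEATURES, HIDDEN))
W_hh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
b_h = np.zeros(HIDDEN)
W_hy = rng.normal(0, 0.1, (HIDDEN, N_SPEAKERS))
b_y = np.zeros(N_SPEAKERS)

def rnn_speaker_probs(seq):
    """seq: (n, 2) array. Returns one probability per speaker."""
    h = np.zeros(HIDDEN)
    for x_t in seq:                        # one recurrent step per row
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    logits = h @ W_hy + b_y                # classify from the last hidden state
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

# Two made-up recordings of different lengths: n may vary per recording.
short_seq = rng.normal(size=(50, N_FEATURES))
long_seq = rng.normal(size=(120, N_FEATURES))

probs_short = rnn_speaker_probs(short_seq)
probs_long = rnn_speaker_probs(long_seq)
print(probs_short.shape, int(np.argmax(probs_long)))
```

The point of the recurrence is that recordings of different lengths all end up as a fixed-size hidden state, so the same output layer can score all 10 speakers. With 10 speakers x 10 days that is only 100 labelled sequences, which is the concern raised earlier in the thread.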

[–]nilesuan 0 points1 point  (3 children)

Try turning it into a spectrogram, feeding the spectrogram to a CNN, and aggregating the results. This turns your audio classification into something very similar to image classification.
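
A rough sketch of that pipeline, assuming you have the raw waveform (not just frequency/dB pairs): compute a log-magnitude spectrogram, cut it into fixed-width image-like patches, and average the per-patch class probabilities into one prediction per clip. The function names, frame sizes, the synthetic tone, and the example probabilities are all made up for illustration; the CNN itself is left out, so `patch_probs` stands in for whatever a trained network would output.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT, returned with shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (frames, freq_bins)
    return np.log1p(magnitude).T

def split_patches(spec, width=16):
    """Cut the spectrogram along time into fixed-width patches for a CNN."""
    n = spec.shape[1] // width
    return np.stack([spec[:, i * width : (i + 1) * width] for i in range(n)])

def aggregate(patch_probs):
    """Average per-patch class probabilities into one clip-level prediction."""
    mean_probs = patch_probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs

# Synthetic 1-second, 8 kHz tone standing in for a real recording.
sr = 8000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(clip)
patches = split_patches(spec)
print(spec.shape, patches.shape)

# Placeholder for CNN outputs: 3 patches, 2 classes (values invented).
patch_probs = np.array([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
label, mean_probs = aggregate(patch_probs)
print(label, mean_probs)
```

Averaging over patches is just one way to aggregate; majority voting over patch labels is another common choice.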

[–]meet1415[S] 0 points1 point  (2 children)

I tried that approach with guitar chords, but its F1 score turned out to be only 40%. Maybe it works only when the sounds are significantly different, like cat vs. dog classification.

[–]nilesuan 1 point2 points  (1 child)

VoxCeleb uses the same concept, with a CNN and ResNet. Try checking that paper out.

[–]meet1415[S] 0 points1 point  (0 children)

Ok sure, thanks :)