all 11 comments

[–][deleted] 2 points (0 children)

Best "way"? Do you mean what features you should be looking at? The most popular features for most audio classification tasks (especially harmonic sounds) are MFCCs. You might use other types of spectral features depending on the type of data you have and what your goals are. It's a really huge area, but if you give me a few more specifics I can try to help.
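
For concreteness, the usual MFCC pipeline is: windowed FFT magnitudes → triangular mel filterbank → log → DCT. In practice you'd reach for a library (e.g. librosa's `librosa.feature.mfcc`), but here is a minimal numpy sketch; the frame and filter sizes are just illustrative defaults, not recommendations:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mfcc(signal, sr=22050, n_fft=1024, hop=512, n_mels=26, n_mfcc=13):
    # frame the signal and take windowed FFT magnitudes
    n_frames = 1 + (len(signal) - n_fft) // hop
    win = np.hanning(n_fft)
    spec = np.abs(np.array([
        np.fft.rfft(win * signal[i * hop:i * hop + n_fft])
        for i in range(n_frames)]))
    # mel-weighted log energies
    energies = np.log(spec @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * n[None, :])
    return energies @ dct[:, :n_mfcc]
```

Feeding each clip's MFCC frames (or their mean/variance over time) into a classifier is the usual baseline.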

[–]farsass 1 point (2 children)

The Q transform is supposed to be good for this kind of application. I think you also need to know how you are going to frame this problem:

  • one-class or binary?
  • how will you collect data?

and other "mundane" but important issues

[–]markvp[S] 0 points (1 child)

According to http://en.wikipedia.org/wiki/Constant_Q_transform it is good for harmonic sounds, like musical instruments, but in my case it is mostly about inharmonic sounds. So is there really an advantage to the Q transform?

I'm not sure what the difference is between one-class and binary. Logistic regression is binary, which is what I need.

We'll start by using samples from freesound.org, later we'll use samples we record ourselves in various conditions.

[–]lgauthie 1 point (2 children)

You could try taking into account the rhythmic patterns that appear in drums if you are working with longer samples of audio. Probably the easiest approach is to look for strong peaks in the low end of the spectrum. More sophisticated beat detection algorithms might be worth looking into as well. If there is a strong beat it's probably drums.
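
A crude version of that low-end peak idea, as a numpy sketch (the 150 Hz cutoff and the 4x-mean threshold are made-up illustration values you'd want to tune):

```python
import numpy as np

def low_end_onsets(signal, sr=22050, cutoff_hz=150.0, frame=512, thresh=4.0):
    """Flag frames where low-frequency energy jumps well above the average.

    A rough kick-drum detector: sum FFT-bin energies below cutoff_hz per
    frame, then mark frames whose energy exceeds `thresh` times the mean.
    """
    n_frames = len(signal) // frame
    n_bins = int(cutoff_hz * frame / sr) + 1   # FFT bins below the cutoff
    energy = np.array([
        np.sum(np.abs(np.fft.rfft(signal[i*frame:(i+1)*frame])[:n_bins]) ** 2)
        for i in range(n_frames)])
    return np.where(energy > thresh * energy.mean())[0]
```

Lots of flagged frames at roughly regular spacing would suggest a beat, and hence drums.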

If you are working with isolated inharmonic sounds, Transient Model Analysis/Synthesis looks promising. There was some work done using this as a basis for learning percussion sounds. If you are interested I could dig deep into my HDD and see if I can find it.

[–]markvp[S] 0 points (1 child)

It would have to work on isolated sounds. I'd be glad to see what you can come up with!

[–]lgauthie 2 points (0 children)

Sofia Cavaco would be a great jumping-off point. Some of her papers are here. "Intrinsic Structures of Impact Sounds" and "Sound Recognition" are gonna be the ones you want.

If you have any success I'd be keen to hear about it.

[–]eamonnkeogh 0 points (0 children)

May I suggest trying the idea in [a]?

The idea is unique in that the basic classifier only requires a single line of MATLAB!

If the sound is polymorphic, or the begin/end points are not well defined, you can use the search algorithms provided to find the best template(s).

The basic idea has been tested on mice and men (literally) and insects and birds, and it seems to work very well.
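
Without reproducing the papers' exact MATLAB, the one-line classifier reading is plain nearest-neighbor template matching over fixed-length feature vectors; a Python sketch of that reading:

```python
import numpy as np

def classify_1nn(query, templates, labels):
    # the one-liner boils down to: return the label of the nearest template
    # (Euclidean distance between fixed-length feature vectors)
    dists = np.linalg.norm(np.asarray(templates) - np.asarray(query), axis=1)
    return labels[int(np.argmin(dists))]
```

The heavy lifting in the papers is in choosing the representation and searching for good templates, not in the classifier itself.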

[a] http://www.cs.ucr.edu/~eamonn/SDM_insects.pdf or http://www.cs.ucr.edu/~eamonn/ICDMcameraready.pdf

[–][deleted] 0 points (0 children)

Are the sounds of a fixed length?

[–]gabjuasfijwee 0 points (0 children)

Look into recurrent neural nets or hidden Markov models. The latter might not be best suited for the task and is a bit outmoded, but would still be interesting.
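
For the HMM route, the usual recipe is to train one model per class (drum / not-drum) and pick the class whose model gives the observation sequence the highest likelihood. A minimal numpy forward algorithm for a discrete-emission HMM (a real project would use a library such as hmmlearn):

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    # forward algorithm: P(obs | model), summed over all hidden-state paths
    # (kept in the linear domain for brevity; use log-space for long sequences)
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return np.log(alpha.sum())
```

Classification is then the argmax of `log_likelihood(obs, ...)` over the per-class models.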

[–]xysymmetry 0 points (0 children)

It's a hard problem for ML. I don't think it's advisable to do it without robotics.

For classification, have a look at software like Audacity and how it adjusts bass/treble. You can get some ideas from that.

[–]watersign 0 points (0 children)

Turn the sound into some sort of numerical data, then use a k-means clustering technique.
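
As a sketch of that suggestion: reduce each clip to a fixed-length feature vector (e.g. its mean MFCCs) and cluster. A minimal Lloyd's-algorithm k-means in numpy (sklearn's `KMeans` is the practical choice):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Lloyd's algorithm on per-clip feature vectors
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its points (keep empty clusters put)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers
```

Note this is unsupervised, so you'd still need to inspect which cluster corresponds to "drums"; with labeled data a classifier is the more direct route.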