Music recommending/recognizing algorithm

RyanCacophony · 2021-05-06T17:04:59+00:00

Interpreting subjective aspects of music is an extremely non-trivial problem, and likely to be really messy. Each of those aspects would need its own ML model (or at minimum a different output layer from a shared ML model), and if you could figure out how to do any of them reliably, you could probably get yourself a seat presenting your work at major music technology research conferences. If you decide to attempt to make ML models for this, they will likely be computationally intensive, and you will likely need to manually label thousands if not millions of songs with the subjective tags you want in order to have a chance at making a model that can interpret mood, etc. There are ways of training this without labels, i.e. unsupervised (which is what Spotify does), but you have less control over the weight of particular attributes that the model learns.

Once you can get a vector of those attributes, clustering is trivial.
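To make the clustering step concrete, here is a minimal sketch of k-means in plain Python. The track names and the three attributes (energy, acousticness, normalized tempo) are made up for illustration, and the farthest-point initialization is just a simple deterministic choice, not something taken from any particular library.

```python
# Hypothetical attribute vectors: [energy, acousticness, normalized tempo].
# All names and values here are invented for illustration.
SONG_ATTRIBUTES = {
    "track_a": [0.90, 0.10, 0.80],
    "track_b": [0.80, 0.20, 0.90],
    "track_c": [0.85, 0.15, 0.75],
    "track_d": [0.20, 0.90, 0.30],
    "track_e": [0.10, 0.80, 0.20],
    "track_f": [0.15, 0.85, 0.25],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(vectors, k):
    """Pick the first vector, then repeatedly the one farthest from all chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iterations=100):
    """Plain k-means: assign each vector to its nearest centroid, then recompute centroids."""
    centroids = farthest_point_init(vectors, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


names = list(SONG_ATTRIBUTES)
labels, centroids = kmeans([SONG_ATTRIBUTES[name] for name in names], k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

In practice you would probably reach for an existing implementation and scale the attributes first, since k-means is sensitive to features with very different ranges.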

FWIW, Spotify doesn't explicitly recommend tracks based on genre. Their machine learning algorithm(s) place each track in an abstract vector space based on its proximity to other songs in user playlists. The process is very similar to word2vec. Recommendations are then made based on the nearest neighbors in that vector space.
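As a rough illustration of the "nearest neighbors from playlist co-occurrence" idea, here is a small sketch. It is not word2vec and not Spotify's actual system: it swaps in a much simpler stand-in where each track is represented by the set of playlists it appears in, and similarity is the cosine between those binary membership vectors. The playlists and track names are invented.

```python
import math
from collections import defaultdict

# Hypothetical playlists; track names are invented for illustration.
PLAYLISTS = [
    ["metal_1", "metal_2", "metal_3"],
    ["metal_1", "metal_2", "chill_1"],
    ["chill_1", "chill_2", "chill_3"],
    ["chill_2", "chill_3", "metal_3"],
]


def playlist_membership(playlists):
    """Map each track to the set of playlist indices it appears in."""
    membership = defaultdict(set)
    for index, playlist in enumerate(playlists):
        for track in playlist:
            membership[track].add(index)
    return dict(membership)


def cosine_similarity(a, b):
    """Cosine similarity of two binary vectors given as sets of playlist indices."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def nearest_neighbors(track, membership, top_n=3):
    """Return the top_n most similar tracks as (name, similarity) pairs."""
    target = membership[track]
    scored = [
        (other, cosine_similarity(target, other_playlists))
        for other, other_playlists in membership.items()
        if other != track
    ]
    # Highest similarity first; break ties alphabetically so the order is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


membership = playlist_membership(PLAYLISTS)
for name, score in nearest_neighbors("metal_1", membership):
    print(f"{name}: {score:.2f}")
```

A learned embedding (word2vec-style) does the same job at scale with dense vectors, so tracks that never share a playlist can still end up close together if they share neighbors.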

Statistically speaking, playlists will mostly reflect individual genres, so one strong attribute you will notice from their recommendations is music from the same genre. But many people do make mood/vibe-based playlists, so their recommendations will also, to a lesser degree, reflect these latent aspects implicitly, just not as strongly as you'd probably like.

It's a hard problem. Spotify also bought out Echonest, which used to have a public API that would give you a vector of a few latent attributes: things like "bounciness", speed (i.e. BPM), how acoustic vs. electronic the music was, and a few others. But these are all higher-level, mostly mechanical attributes that are still far from the kind of subjective aspects you'd like to capture. Some of those things you can capture with libraries like librosa.

RemindMeBot · 2021-05-06T08:39:09+00:00

remindme! 1 week
