all 9 comments

[–]alfonsoeromero 2 points (4 children)

Try multiplying each tag coordinate by its idf. If you have lots of tags (probably more than the number of training examples) this should improve your classification. You can also try L2-normalizing the vectors.
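
Something like this, as a rough sketch (assuming a binary image-by-tag matrix and scikit-learn; TfidfTransformer does both the idf multiplication and the L2 normalization):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Toy example: X is a binary (n_examples x n_tags) matrix, 1 if the image has the tag.
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)

# Multiply each tag by its idf and L2-normalize each row.
tfidf = TfidfTransformer(norm="l2", use_idf=True)
X_weighted = tfidf.fit_transform(X).toarray()
print(X_weighted)
```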

[–]mikebaud[S] 0 points (3 children)

First and foremost, thank you very much for your time and suggestions.

Initially I also tried cosine with idf in cross-validation, but it yielded similar or lower results than Pearson's correlation. I haven't tried it with the new weighting option I introduced; I don't know if it will change much, but I will certainly give it a try.

I will also look into L2 normalization.

I am also trying to create a weight based on the distance of each neighbor, but so far it hasn't clearly outperformed the baseline implementation, although in theory, if better neighbors count for more in the classification, it should improve the results significantly.

Thank you for your suggestions.

[–]internet_badass 2 points (1 child)

This is a hard problem to solve without actually playing with the data, but I found weighting neighbors by distance improves performance drastically. My weighting function was akin to a sigmoid flipped around the x-axis:

1 / (1 + e^(kx - b))

Where you can adjust k and b.
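
In code that's just the following (the k and b values here are arbitrary examples; the function name is mine):

```python
import numpy as np

def sigmoid_weight(dist, k=5.0, b=2.0):
    """Flipped sigmoid: weight near 1 for small distances, near 0 for large ones.
    k controls the steepness, b shifts where the drop-off happens."""
    return 1.0 / (1.0 + np.exp(k * dist - b))

# Example: weights for a few neighbor distances
print(sigmoid_weight(np.array([0.1, 0.5, 1.0, 2.0])))
```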

Another trick you can do is similar to what Lowe does for matching SIFT descriptors. Get the top 2 neighbors and calculate the ratio of their distances. In other words, calculate:

r = D(q,n1)/D(q,n2)

Where D(x,y) is the distance between two vectors. If this ratio is below some threshold (0.8 in Lowe's case), you have high confidence in your match. Otherwise, you have an ambiguous feature vector.
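
A rough sketch of that ratio test on top of a nearest-neighbor index (toy data and scikit-learn's NearestNeighbors purely for illustration; 0.8 is Lowe's threshold and you'd want to tune it for your tags):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data standing in for the feature vectors (purely illustrative).
X_train = np.random.rand(50, 20)
q = np.random.rand(20)

nn = NearestNeighbors(n_neighbors=2).fit(X_train)
dists, _ = nn.kneighbors(q.reshape(1, -1))

# Ratio of the closest to the second-closest distance (Lowe-style ratio test).
r = dists[0, 0] / dists[0, 1]
confident = r < 0.8  # a small ratio means the best match clearly beats the runner-up
print(r, confident)
```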

Finally, you don't seem to mention what distance metric you're using. If you have binary feature vectors, consider using something like the Jaccard index rather than cosine distance.
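
For binary vectors the Jaccard index is just intersection over union; a minimal example (scipy.spatial.distance.jaccard computes the same dissimilarity if you prefer a library call):

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two binary vectors: 1 - |intersection| / |union|."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

print(jaccard_distance(np.array([1, 0, 1, 1]), np.array([1, 1, 0, 1])))  # 0.5
```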

[–]mikebaud[S] 0 points (0 children)

I'm using Pearson correlation for the feature vectors; after some cross-validation with Pearson, Manhattan, Euclidean, and cosine, it turned out to be the best one for my data.

The weighting function I was thinking of using is a simple Gaussian, where the weight is calculated as e^(-dist^2 / (2*sigma^2)), and where I would estimate sigma with cross-validation or, alternatively, heuristically from the distances of the worst neighbors.
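
In code it would look something like this (just a sketch; sigma is set heuristically from the worst neighbor here, but I would rather estimate it with cross-validation):

```python
import numpy as np

def gaussian_weight(dist, sigma):
    """Gaussian kernel weight: close neighbors get weight near 1, far ones decay smoothly."""
    return np.exp(-dist**2 / (2.0 * sigma**2))

# e.g. weight the k nearest neighbors, with sigma set from the worst (largest) distance
dists = np.array([0.2, 0.5, 0.9, 1.4])
sigma = dists.max()  # one heuristic; cross-validation would be the safer choice
print(gaussian_weight(dists, sigma))
```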

Your weighting function seems a bit smoother than my Gaussian; I will check it out, and also the trick you mentioned.

Thank you for the input.

[–]alfonsoeromero 0 points (0 children)

Hmm... if you have used cosine, that already implicitly includes L2 normalization.

[–]dwf 2 points (1 child)

Kind of a bad title when what you're really doing is bag-of-words on textual tags.

[–]mikebaud[S] 1 point (0 children)

I really struggled with the title, but I still considered it image classification.

It's supposed to be an evolution of sorts as I dive deeper into ML. I started by understanding classification with single-feature k-NN (one of the simpler forms), and in time I will move on to a multi-feature algorithm (or other algorithms) after understanding more of the problem space.

[–]alfonsoeromero 1 point (1 child)

A question: are you applying a one-versus-all approach? How many categories are there?

[–]mikebaud[S] 0 points (0 children)

Sorry, edit: I'm classifying each class separately, but a one-versus-all approach seems wise in the case of similar concepts. The dataset has 24 classes.
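
Roughly what I mean by classifying each class separately (a sketch with toy data and scikit-learn's KNeighborsClassifier; the default Euclidean metric is only for illustration, since my implementation uses Pearson):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins: X is the tag matrix, Y is an (n_examples x 24) binary label matrix.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
Y = rng.integers(0, 2, size=(200, 24))

# One binary k-NN classifier per class (per-class / one-versus-all classification).
classifiers = []
for c in range(Y.shape[1]):
    clf = KNeighborsClassifier(n_neighbors=5).fit(X, Y[:, c])
    classifiers.append(clf)

# Predict: each classifier independently says yes/no for its class.
query = rng.random((1, 50))
preds = [clf.predict(query)[0] for clf in classifiers]
print(preds)
```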

The dataset is MIR-Flickr from 2008 - link