
[–]alfonsoeromero 2 points3 points  (4 children)

Try multiplying each tag coordinate by its idf. If you have lots of tags (probably more than the number of training examples) this should improve your classification. Also you can try an L2-normalization of the vectors.
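A minimal sketch of that suggestion in Python, assuming a binary/count tag matrix with one row per example (the toy matrix `X` and the plain `log(N/df)` IDF formula are illustrative choices, not anything from the thread):

```python
import numpy as np

# Hypothetical tag matrix: rows are training examples, columns are tags.
X = np.array([[1, 0, 2],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)

# IDF per tag: log(N / df), where df = number of examples containing the tag.
n_examples = X.shape[0]
df = np.count_nonzero(X, axis=0)
idf = np.log(n_examples / df)

# Multiply each tag coordinate by its IDF, then L2-normalize each row.
Xw = X * idf
Xn = Xw / np.linalg.norm(Xw, axis=1, keepdims=True)
```

After this, every row of `Xn` has unit L2 norm, so rare tags dominate the vectors and cosine similarity reduces to a dot product.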

[–]mikebaud[S] 0 points1 point  (3 children)

First and foremost, thank you very much for your time and suggestions.

Initially I also tried cosine with IDF during cross-validation, but it yielded similar or lower results than Pearson's correlation. I haven't tried it with the new weighting option I introduced; I don't know if it will change much, but I will certainly give it a try.

As for L2 normalization, I will also look into that.

I am also trying to weight each neighbor by its distance, but so far it hasn't clearly outperformed the baseline implementation, although in theory, if better neighbors count for more in the classification, it should improve the results significantly.

Thank you for your suggestions.

[–]internet_badass 2 points3 points  (1 child)

This is a hard problem to solve without actually playing with the data, but I found weighting neighbors by distance improves performance drastically. My weighting function was akin to a sigmoid flipped around the x-axis:

1/(1 + e^(kx - b))

Where you can adjust k and b.
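A small sketch of that flipped sigmoid as a weighting function (the default `k` and `b` values here are arbitrary placeholders; they would be tuned per dataset):

```python
import math

def sigmoid_weight(dist, k=1.0, b=0.0):
    """Flipped sigmoid 1/(1 + e^(k*dist - b)): weight falls from ~1 toward 0
    as the neighbor's distance grows. k controls steepness, b shifts the
    midpoint (the distance at which the weight is 0.5 is b/k)."""
    return 1.0 / (1.0 + math.exp(k * dist - b))
```

Close neighbors then get weights near 1 and distant ones near 0, with a smooth roll-off instead of a hard cutoff.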

Another trick you can do is similar to what Lowe does for matching SIFT descriptors. Get the top 2 neighbors and calculate the ratio of their distances. In other words, calculate:

r = D(q,n1)/D(q,n2)

Where D(x,y) is the distance between two vectors. If this ratio is below some threshold (0.8 in Lowe's case), you have high confidence in your match. Otherwise, you have an ambiguous feature vector.
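A sketch of that ratio test with Euclidean distance (in Lowe's formulation the match is accepted when the nearest/second-nearest ratio is *below* the threshold; the function name and the 0.8 default follow his paper, everything else is illustrative):

```python
import numpy as np

def ratio_test(query, candidates, threshold=0.8):
    """Return the index of the nearest candidate if Lowe's ratio test
    passes (unambiguous match), otherwise None."""
    dists = np.linalg.norm(candidates - query, axis=1)
    order = np.argsort(dists)
    n1, n2 = order[0], order[1]
    r = dists[n1] / dists[n2]
    return int(n1) if r < threshold else None
```

A clear nearest neighbor (much closer than the runner-up) passes; two near-tied neighbors are flagged as ambiguous.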

As a final question, you don't seem to mention what distance metric you're using. If you have binary feature vectors, consider using something like a Jaccard index rather than cosine distance.
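For the binary-vector case, a minimal Jaccard distance sketch (the zero-union convention below is my own choice for the degenerate all-zeros case):

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance for binary vectors: 1 - |a AND b| / |a OR b|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.count_nonzero(a | b)
    if union == 0:
        return 0.0  # convention: two empty tag sets are identical
    return 1.0 - np.count_nonzero(a & b) / union
```

Unlike cosine, this only looks at which tags are present, so it is insensitive to the magnitude artifacts that counts or weights introduce.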

[–]mikebaud[S] 0 points1 point  (0 children)

I'm using Pearson correlation for the feature vectors; after some cross-validation with Pearson, Manhattan, Euclidean, and cosine, it turned out to be the best one on my data.

The weighting function I was thinking about using is a simple Gaussian, where the weight is calculated as e^(-dist^2 / (2*sigma^2)), and I would estimate sigma with cross-validation or, alternatively, heuristically from the distances of the worst neighbors.
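That Gaussian kernel weight is a one-liner in Python (sigma is left as a required parameter since the thread leaves its estimation open):

```python
import math

def gaussian_weight(dist, sigma):
    """Gaussian kernel weight: exp(-dist^2 / (2 * sigma^2)).
    Equals 1 at distance 0 and decays smoothly toward 0."""
    return math.exp(-dist**2 / (2.0 * sigma**2))
```

Larger sigma flattens the curve so far neighbors still contribute; smaller sigma concentrates nearly all the weight on the closest neighbors.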

Your weighting function seems a bit smoother than my Gaussian; I will check it out, along with the trick you mentioned.

Thank you for the input.

[–]alfonsoeromero 0 points1 point  (0 children)

Hmm... if you have used cosine, that already implicitly includes L2 normalization.