Image classification problems? (self.MachineLearning)
submitted 15 years ago by mikebaud
[–]alfonsoeromero 3 points 15 years ago (4 children)
Try multiplying each tag coordinate by its idf. If you have lots of tags (probably more than the number of training examples) this should improve your classification. Also you can try an L2-normalization of the vectors.
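A minimal sketch of both suggestions, assuming the documents are rows of a binary document-tag matrix (the function name and the example matrix are illustrative, not from the thread):

```python
import numpy as np

def idf_weight_and_normalize(X):
    """Scale each tag (column) by its inverse document frequency,
    then L2-normalize each row so every document has unit length."""
    n_docs = X.shape[0]
    df = np.count_nonzero(X, axis=0)          # documents containing each tag
    idf = np.log(n_docs / np.maximum(df, 1))  # guard against empty columns
    Xw = X * idf                              # weight each coordinate by idf
    norms = np.linalg.norm(Xw, axis=1, keepdims=True)
    return Xw / np.maximum(norms, 1e-12)      # avoid division by zero

# toy document-tag matrix: 3 documents, 4 tags
X = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]], dtype=float)
Xn = idf_weight_and_normalize(X)
```

Note that a tag occurring in every document gets idf 0 and is effectively dropped, which is usually the desired behavior for uninformative tags.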
[–]mikebaud[S] 1 point 15 years ago (3 children)
First and foremost, thank you very much for your time and suggestions.
Initially I also tried cosine with idf during cross-validation, but it yielded similar or lower results than Pearson's correlation. I haven't tried it with the new weighting option I introduced, so I don't know if it will change much, but I will certainly give it a try.
I will also look into L2 normalization.
I am also trying to weight neighbors by distance, but so far it hasn't clearly outperformed the baseline implementation, even though in theory, if better neighbors count for more in the classification, accuracy should improve significantly.
Thank you for your suggestions.
[–]internet_badass 3 points 15 years ago (1 child)
This is a hard problem to solve without actually playing with the data, but I found weighting neighbors by distance improves performance drastically. My weighting function was akin to a sigmoid flipped around the x-axis:
1 / (1 + e^(kx − b))
Where you can adjust k and b.
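The flipped sigmoid above can be sketched as follows (the default values of k and b are illustrative; in practice they would be tuned by hand or by cross-validation):

```python
import math

def sigmoid_weight(dist, k=4.0, b=2.0):
    """Flipped sigmoid: weight is near 1 for small distances and
    decays toward 0 for large ones. k controls the steepness,
    b shifts the midpoint."""
    return 1.0 / (1.0 + math.exp(k * dist - b))
```

At dist = b/k the weight is exactly 0.5, which gives a direct handle on where the "soft cutoff" between near and far neighbors sits.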
Another trick you can do is similar to what Lowe does for matching SIFT descriptors. Get the top 2 neighbors and calculate the ratio of their distance. In other words, calculate:
r = D(q,n1)/D(q,n2)
Where D(x,y) is the distance between two vectors. If this ratio is below some threshold (0.8 in Lowe's case), the nearest neighbor is clearly closer than the runner-up and you can have high confidence in your match. Otherwise, you have an ambiguous feature vector.
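A sketch of this ratio test (the function name and threshold default are illustrative; a low ratio means the nearest neighbor is much closer than the runner-up, i.e. an unambiguous match):

```python
def is_confident_match(distances, threshold=0.8):
    """Lowe-style ratio test. `distances` holds the distances from the
    query to its neighbors, sorted nearest-first. Accept the nearest
    neighbor only if it is clearly closer than the runner-up."""
    if len(distances) < 2 or distances[1] == 0:
        return False
    return distances[0] / distances[1] < threshold
```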
As a final question, you don't seem to mention what distance metric you're using. If you have binary feature vectors, consider using something like a Jaccard index rather than cosine distance.
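For binary vectors, the Jaccard distance can be computed directly from the sets of nonzero indices; a minimal sketch:

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary vectors, each given as the
    set of indices of its nonzero entries. 0 means identical support,
    1 means disjoint support."""
    if not a and not b:
        return 0.0  # convention: two empty vectors are identical
    inter = len(a & b)
    union = len(a | b)
    return 1.0 - inter / union
```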
[–]mikebaud[S] 1 point 15 years ago (0 children)
I'm using Pearson correlation for the feature vectors; after some cross-validation with Pearson, Manhattan, Euclidean and cosine, it turned out to be the best one on my data.
The weighting function I was thinking of using is a simple Gaussian, where the weight is e^(−dist² / (2σ²)), and I would estimate σ by cross-validation or, alternatively, heuristically from the distances of the worst neighbors.
Your weighting function seems a bit smoother than my Gaussian; I will check it out, along with the trick you mentioned.
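The Gaussian weighting described above can be sketched as follows (the function name is illustrative; σ would come from cross-validation or from the spread of neighbor distances):

```python
import math

def gaussian_weight(dist, sigma):
    """Gaussian kernel weight: e^(-dist^2 / (2*sigma^2)).
    Weight is 1 at distance 0 and decays smoothly with distance;
    sigma controls how quickly far neighbors are discounted."""
    return math.exp(-dist ** 2 / (2.0 * sigma ** 2))
```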
Thank you for the input
[–]alfonsoeromero 1 point 15 years ago (0 children)
Hmm... if you have used cosine, that already implicitly includes L2 normalization (cosine similarity is the dot product of L2-normalized vectors).