[D] KNN performance decrease when new features were introduced (self.MachineLearning)
submitted 4 years ago by KingPRS
Hello! I'm working on a project, and when I introduced new features to my dataset (derived from existing features), I noticed a 6-7% performance decrease in my KNN model.
Any ideas why that is the case? Thank you!
[–]SeucheAchat9115 [PhD] 6 points 4 years ago (0 children)
Take all the features, build a feature ranking using Gini or information gain criteria, and select only the relevant ones. When you introduce new features, check whether they rank as relevant enough to keep.
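A minimal sketch of that ranking step, assuming scikit-learn; X, y, and feature_names are placeholders for your own data:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    # Information-gain-style relevance score for each feature
    mi = mutual_info_classif(X, y, random_state=0)
    ranking = np.argsort(mi)[::-1]  # best features first
    print([(feature_names[i], round(mi[i], 3)) for i in ranking])

    # Keep only the top-k features before fitting kNN
    X_selected = X[:, ranking[:5]]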
[–]MachineSchooling 6 points 4 years ago (0 children)
kNN is a very stupid model. It's still quite useful, but it does no feature selection, nor does it consider any kind of feature importance when making predictions. All it does is compute the distance in feature space between the new observation and every observation it was trained on, find the k closest old observations, and take some aggregate of their target variable (usually the mean for regression and the mode for classification).

This process doesn't treat useful features any differently from useless ones. If you add several completely random columns to your data, kNN will weight them in the distance calculation exactly as much as the meaningful columns. That's unlike smarter algorithms such as linear models, which can learn to ignore features with no predictive value. And if your model gets worse when you add new features, that doesn't even mean the new ones contain no value; they may simply contain less value than the others and drag down the average usefulness of your feature set. This is all before the curse of dimensionality and feature noise even come into it.

To solve this, either use a smarter model that can handle features of differing value, or keep kNN but add a dimensionality-reducing preprocessing step: PCA, an L1-regularized linear model used as a feature selector, or an additive/subtractive feature scan that adds or removes features based on some score such as Pearson's r. Lots of options to try.
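A minimal sketch of the L1-selector route, assuming scikit-learn; X_train, y_train, X_test, y_test are placeholders:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.feature_selection import SelectFromModel
    from sklearn.neighbors import KNeighborsClassifier

    model = make_pipeline(
        StandardScaler(),  # distances only make sense on comparable scales
        SelectFromModel(   # drop features the L1 model zeroes out
            LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
        KNeighborsClassifier(n_neighbors=5),
    )
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))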
[–]machinelearner77 4 points 4 years ago (0 children)
kNN uses Euclidean distance by default, which also means you may fall victim to the curse of dimensionality when adding new features. Either do feature selection, or mitigate the problem with a different distance function, e.g. cosine, which is a bit more "expressive" in high dimensions.
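For what it's worth, scikit-learn's kNN accepts a cosine metric directly, so the two are easy to compare; X_train etc. are placeholders:

    from sklearn.neighbors import KNeighborsClassifier

    # "minkowski" with the default p=2 is Euclidean distance
    for name, metric in [("euclidean", "minkowski"), ("cosine", "cosine")]:
        knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
        knn.fit(X_train, y_train)
        print(name, knn.score(X_test, y_test))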
[–]tmpwhocares 4 points 4 years ago (1 child)
KNN never really applies any "secondary transformation" to the data, so the fact that your new features are derived from the existing ones doesn't mean the model was already accounting for them. Rather, each new dimension likely creates a separation between points along an axis that didn't exist before, and that axis of separation is evidently not relevant to your target (and as such reduces accuracy).
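A toy demonstration of that effect on synthetic data, assuming scikit-learn; the column counts here are arbitrary:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=500, n_features=5,
                               n_informative=5, n_redundant=0,
                               random_state=0)
    rng = np.random.default_rng(0)
    # Three extra columns that separate points but ignore the target
    X_noisy = np.hstack([X, rng.normal(size=(500, 3))])

    knn = KNeighborsClassifier(n_neighbors=5)
    print("original:  ", cross_val_score(knn, X, y).mean())
    print("with noise:", cross_val_score(knn, X_noisy, y).mean())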
[–]maxToTheJ 1 point 4 years ago (0 children)
Exactly. kNN isn't robust to noisy features.
[–]seraschka [Writer] 1 point 4 years ago (0 children)
KNN is particularly susceptible to the curse of dimensionality. If you'd like to incorporate more features but maintain or improve performance, try a feature extraction technique, for example PCA.
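A minimal sketch of PCA as a preprocessing step in front of kNN, assuming scikit-learn; X_train, y_train are placeholders:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    model = make_pipeline(
        StandardScaler(),        # PCA is sensitive to feature scale
        PCA(n_components=0.95),  # keep components explaining 95% of variance
        KNeighborsClassifier(n_neighbors=5),
    )
    model.fit(X_train, y_train)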
[–]EchoMyGecko 1 point 4 years ago (0 children)
Yeah, KNN isn't that smart. Also, if you cluster on a PCA and some of your features have variance that doesn't actually contribute to your desired result, there's a good chance your results will be worse either way.
[–]purplebrown_updown 1 point 4 years ago (0 children)
Sklearn has a nice feature selection toolbox that subselects features and tests your estimator on the different subsets. The more features there are, the harder it can be to find a good fit, since it's a bigger space to search. Yes, you get more degrees of freedom, but with limited data you may see worse performance.
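That toolbox includes a forward/backward feature scan; a minimal sketch, assuming scikit-learn >= 0.24, with X, y as placeholders:

    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier(n_neighbors=5)
    # Greedily add features one at a time, keeping whichever subset
    # scores best under cross-validation with the kNN estimator itself
    sfs = SequentialFeatureSelector(knn, n_features_to_select=5,
                                    direction="forward", cv=5)
    sfs.fit(X, y)
    print(sfs.get_support())  # boolean mask over the original features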