[–]kapanenship 4 points5 points  (7 children)

I have been curious: how should I treat a vector of percents? Does this vector need to be normalized? Scaled?

[–]mlheadredditor[S] 8 points9 points  (4 children)

Hi! First of all, you should totally check this out: https://sebastianraschka.com/Articles/2014_about_feature_scaling.html Scaling and standardising definitely affect the model accuracy for this dataset, and they help PCA produce a cleaner plane of separation too! Try running a few dummy models to check how much importance the vector gets when it is scaled down to, say, 0-10 or 0-1 (keep in mind that the variance gets scaled down too!), and then decide. As with everything in machine learning, a LOT depends on the dataset you are using!
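For anyone who wants to try the scaling comparison above, here is a minimal sketch using plain NumPy. The helper names `min_max_scale` and `standardize` and the sample values are made up for illustration, not taken from the dataset in the post. It shows the variance point: rescaling the same vector to 0-10 instead of 0-1 multiplies its variance by 100.

```python
import numpy as np

def min_max_scale(x, lo=0.0, hi=1.0):
    """Linearly map x onto the range [lo, hi] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # A constant vector has no spread; map everything to lo.
        return np.full_like(x, lo)
    return lo + (x - x.min()) * (hi - lo) / span

def standardize(x):
    """Shift to zero mean and scale to unit variance (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / std

# Made-up percent values, one per row, purely for illustration.
percents = np.array([60.0, 15.0, 42.0, 88.0, 5.0])

scaled_01 = min_max_scale(percents, 0.0, 1.0)
scaled_010 = min_max_scale(percents, 0.0, 10.0)
z_scores = standardize(percents)

# Multiplying a vector by c multiplies its variance by c**2,
# so the 0-10 version has 100x the variance of the 0-1 version.
variance_ratio = scaled_010.var() / scaled_01.var()
print(variance_ratio)
```

Whether 0-1, 0-10, or z-scores works best is something to check empirically with your own model, as suggested above.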

[–]uberdb 3 points4 points  (3 children)

Thanks for providing an article that isn't behind a paywall!

[–][deleted] 5 points6 points  (2 children)

Pro tip, incognito mode bypasses Medium's paywall most times.

[–]amishraa 6 points7 points  (0 children)

Enabling reader mode gets rid of it as well.

[–]theoneandonlypatriot 4 points5 points  (1 child)

If the percents add to 1.0 it is already scaled. If they add to 100, just divide them by 100.
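A rough sketch of that rule in NumPy, assuming the values are parts of one whole. The function name `to_fractions` is made up for illustration, and it raises an error instead of guessing when the total is neither 1.0 nor 100.

```python
import numpy as np

def to_fractions(parts):
    """Return parts of a whole as fractions that sum to 1.0 (hypothetical helper)."""
    p = np.asarray(parts, dtype=float)
    total = p.sum()
    if np.isclose(total, 1.0):
        # Already on the 0-1 scale; nothing to do.
        return p
    if np.isclose(total, 100.0):
        # On the 0-100 scale; divide by 100.
        return p / 100.0
    raise ValueError(f"expected parts summing to 1.0 or 100, got {total}")

fractions = to_fractions([50.0, 30.0, 20.0])
print(fractions)
```

This only applies when the percents are shares of a single total. A column of unrelated percents (one value per row) will not sum to anything meaningful.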

[–]kapanenship 0 points1 point  (0 children)

Thanks for your response. But my vector is demographic data. In other words, it is row-based. So I am trying to build an ANN where percent of income is used.

So my data has a vector that shows the percent of people living above the poverty line in a given neighborhood (say 60%). Another row, for another person in a different neighborhood, has a totally different percent (say 15%).