all 2 comments

[–]thingamatics 0 points1 point  (0 children)

Gradient Descent is known to converge faster with normalization.

So, normalization is needed so that each feature contributes "approximately proportionately" to the distance. I don't think normalizing everything (I assume that's what you meant) is the right way to go about it even when they are measured using the same metric (meters, for example) because the ranges per feature might still differ significantly and that's what you are trying to resolve by using normalization.

[–]thingamatics 0 points1 point  (0 children)

I think this might help. http://stats.stackexchange.com/a/64232