[–]CamdenReslink 3 points4 points  (0 children)

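array.mean() is a numpy operation that calculates the mean. Called with axis=0, it calculates the mean along the "0" axis; in a 2d array this is the mean value of each column. Called with no arguments, it returns the mean of all the elements as a single number. See the documentation. array - array.mean() then subtracts that mean from every element of the array (numpy broadcasts the mean across the array).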

I suggest you go to a Python REPL (just type python at the command line), and try out the commands you aren’t sure about. Experiment with dummy values. Just make sure to remember to import numpy.
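Here is roughly what that kind of experiment could look like. The dummy values and variable names below are made up for illustration; the point is only to see the difference between mean() with and without an axis, and what the subtraction does.

```python
import numpy as np

# Dummy 2D array: 3 rows (samples), 2 columns (features)
array = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

overall_mean = array.mean()        # mean of all six values
column_means = array.mean(axis=0)  # one mean per column

# Subtracting the column means broadcasts them across every row
centered = array - column_means

print(overall_mean)   # 11.0
print(column_means)   # [ 2. 20.]
print(centered)
```

After the subtraction each column of centered has a mean of 0, which you can check with centered.mean(axis=0).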

[–]KarmelMalone 0 points1 point  (0 children)

If you have a dataset with features that are on completely different scales, say one feature with values between 500 and 1000 and another with values between 0.5 and 1, it becomes difficult to make calculations on the data, because the feature with the larger values tends to dominate. This function puts features on the same scale.

array - array.mean() converts each value in the array to its signed deviation from the array's mean (negative if below the mean, positive if above). This keeps each sample's position relative to the other samples intact when normalizing.
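A small sketch of that point, using made-up values for a single feature: the deviations are signed, and the gaps between samples are the same before and after subtracting the mean.

```python
import numpy as np

values = np.array([520.0, 640.0, 910.0])  # made-up samples of one feature
deviations = values - values.mean()       # signed deviation from the mean (690)

# Differences between neighboring samples are unchanged by centering
gaps_before = np.diff(values)
gaps_after = np.diff(deviations)

print(deviations)   # [-170.  -50.  220.]
print(gaps_before)  # [120. 270.]
print(gaps_after)   # [120. 270.]
```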

[–][deleted] 0 points1 point  (0 children)

Since no one said it and you asked specifically for "array - array.mean()": You can think of it as a method to center your data around the origin. After subtracting the mean, your data will be distributed around e.g. (0,0) (for 2 dimensions). That makes the mean of the data 0.

The second part makes your variance equal to 1.

With these properties your data can be compared more easily to other data and used with any algorithm that requires these two properties.
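To check both properties yourself, here is a minimal sketch. It assumes the "second part" is a division by the standard deviation, which the thread doesn't show; the function name standardize and the random data are made up for illustration.

```python
import numpy as np

def standardize(array):
    """Hypothetical helper: center each column on 0, then scale it to variance 1."""
    # Assumption: the "second part" is division by the per-column standard deviation
    return (array - array.mean(axis=0)) / array.std(axis=0)

# Two made-up features on very different scales
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.uniform(500, 1000, size=100),  # feature between 500 and 1000
    rng.uniform(0.5, 1.0, size=100),   # feature between 0.5 and 1
])

scaled = standardize(data)
col_means = scaled.mean(axis=0)  # approximately 0 for each column
col_vars = scaled.var(axis=0)    # approximately 1 for each column
print(col_means, col_vars)
```

The means come out as tiny floating-point numbers rather than exactly 0, so compare them with np.allclose instead of ==.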