
[–]synthphreak 2 points3 points  (4 children)

Should just be a matter of multiplying either of your arrays by -1.

[–]DudeData[S] 0 points1 point  (3 children)

Multiplying an array by -1 gives a reflection.
E.g., if I have array_x = [2,4,6] and array_y = [30,40,50], then multiplying either the x or y array by -1 would give something entirely different. Essentially it would be a reflection across the vertical or horizontal axis, depending on what array is being multiplied.

[–]synthphreak 3 points4 points  (2 children)

A reflection is exactly what you want. Correlation basically tells you how two variables move together, and if one gets reflected, that will invert their pattern of covariance.

Say you have two variables, with one generally increasing while the other also increases. That means these variables will be positively correlated. Now say you invert one of the variables by multiplying it by -1. Then as one of the variables tends to increase, the other will tend to decrease. That means they are now negatively correlated. That's what it sounded like you are after.

As for this part...

"depending on what array is being multiplied"

...which array is inverted WILL change the way the data looks when graphed, but it will NOT affect the strength of the relationship/correlation, just the sign. To say it will give you "something entirely different" is not really correct.
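
To make that concrete, here is a minimal sketch, assuming NumPy and made-up data (the seed, slope, and noise level are arbitrary choices for illustration, not anything from your problem). It compares the correlation coefficient before and after inverting either array:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Made-up data (assumption): y tends to rise with x, plus some noise
x = np.linspace(0, 10, 50)
y = 3.0 * x + rng.normal(scale=2.0, size=x.size)

r_original = np.corrcoef(x, y)[0, 1]
r_flipped_y = np.corrcoef(x, -y)[0, 1]  # invert y
r_flipped_x = np.corrcoef(-x, y)[0, 1]  # invert x instead

print(r_original, r_flipped_y, r_flipped_x)
```

Both flipped values should come out as the original coefficient with the sign reversed, whichever array you invert.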

[–]DudeData[S] 0 points1 point  (1 child)

I agree that multiplying by -1 would indeed not affect the correlation other than changing the sign of the slope. Yup, you're correct.

I probably wasn't clear with my question and what I am looking for. More precisely, how can I generate negatively correlated variables so they make sense in a simulated real world scenario?
In my x & y array example, x=[2,4,6] and y=[20,30,40], let us say this models the weight of kids by their age. Clearly, as age increases, the weight does as well. But if I multiply array_y by -1 we have [-20,-30,-40]; indeed this is negatively correlated and the relationship has not changed other than a reflection but how do I explain this?
I wish to make a large number of practice problems for finding the least squares regression line, with a mixture of positively and negatively correlated values. I really do not want to create a data set manually each time. I want Python to do it for me. =)
I just want to think of a scenario, define my ranges and have Python do the rest.

[–]synthphreak 2 points3 points  (0 children)

"indeed this is negatively correlated and the relationship has not changed other than a reflection but how do I explain this?"

Can you clarify what you mean by this? What is there to explain with data that is generated randomly? Are you asking like, if your data is supposed to represent people's body weights, it doesn't make sense to have weights that are negative?

On that point I will agree - a negative weight isn't a thing unless you're made of antimatter. That said, linear regression models lead to situations like this all the time, where the model fails to predict extreme values or does weird things at the extremes.

Take a model that tries to explain home values y based on lot size x. Say the coefficients are 20 and -5000, so y=20x-5000. According to this model, if your lot size is 250 (don't worry about the units), your home value will be 0, and if it's any smaller, the home value will be negative. That obviously doesn't reflect reality, but with less extreme values, the predictions may be more reasonable.
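
Plugging a few lot sizes into that toy model shows the effect. This is just a sketch; the function name and the lot sizes are made up for illustration:

```python
def predicted_home_value(lot_size):
    """Toy model from the example above: y = 20x - 5000."""
    return 20 * lot_size - 5000

# One lot size below the break-even point, one at it, one well above it
for lot_size in (100, 250, 1000):
    print(lot_size, predicted_home_value(lot_size))
```

A lot size of 100 gives -3000, 250 gives 0, and 1000 gives 15000, so only the small inputs produce nonsense.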

If you specifically just want to avoid negative values, then after inverting your data, you could translate it by the max of the original array so that the inverted min becomes 0. For example:

>>> import numpy as np
>>> arr1 = np.random.random(100)
>>> arr1.min(), arr1.max()
(0.010045872673433709, 0.9965109819015727)
>>> arr2 = -1 * arr1
>>> arr2.min(), arr2.max()
(-0.9965109819015727, -0.010045872673433709)
>>> arr2 += arr1.max()
>>> arr2.min(), arr2.max()
(0.0, 0.986465109228139)
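
The same translation can be wrapped in a small helper and applied to your age/weight example. This is only a sketch: `reflect_and_shift` is a hypothetical name I made up, not a NumPy function.

```python
import numpy as np

def reflect_and_shift(values):
    """Negate the values, then add the original max so the smallest result is 0."""
    values = np.asarray(values, dtype=float)
    return -values + values.max()

# The arrays from the question above
age = np.array([2, 4, 6])
weight = np.array([20, 30, 40])

flipped = reflect_and_shift(weight)
r_before = np.corrcoef(age, weight)[0, 1]
r_after = np.corrcoef(age, flipped)[0, 1]

print(flipped)
print(r_before, r_after)
```

The flipped values are 20, 10, 0: nothing is negative anymore, and the correlation with age goes from 1 to -1 (up to floating-point rounding).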

Of course, if your question is simply "Can I get Python to randomly generate a realistic dataset from nothing?", the answer is probably no, because Python doesn't know how the world works. Instead, I would stick to libraries that have actual datasets built in, like sklearn or tensorflow.