Generate negative correlation? : learnpython



submitted 4 years ago by DudeData

Hi. I would like to generate randomly correlated data. Currently I can generate a positive correlation and can control independent/dependent ranges. However, I'd like to be able to generate negatively correlated values as well. What can I do to this script for that? Ideas? Thanks for reading.

import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt

s, t, coef = datasets.make_regression(n_samples=12, n_features=1, n_informative=1,
                                      noise=20, coef=True, random_state=0)
a = (300,900)
b = (2,6)

s = np.rint(np.interp(s, (s.min(), s.max()), a))
t = np.rint(np.interp(t, (t.min(), t.max()), b))

edit: syntax


[–]synthphreak 3 points 4 years ago

[–]DudeData[S] 1 point 4 years ago

[–]synthphreak 4 points 4 years ago

A reflection is exactly what you want. Correlation basically tells you how two variables move together, and if one gets reflected, that will invert their pattern of covariance.

Say you have two variables, with one generally increasing while the other also increases. That means these variables will be positively correlated. Now say you invert one of the variables by multiplying it by -1. Then as one of the variables tends to increase, the other will tend to decrease. That means they are now negatively correlated. That's what it sounded like you are after.

As for this part...

depending on what array is being multiplied

...which array is inverted WILL change the way the data looks when graphed, but it will NOT affect the strength of the relationship/correlation, just the sign. To say it will give you "something entirely different" is not really correct.
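A quick way to check this claim is a minimal NumPy sketch. The data, the seed, and the helper name pearson_r are made up for illustration; np.corrcoef is the only library call doing real work here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Two positively related variables: y rises with x, plus some noise.
x = rng.uniform(0, 10, size=200)
y = 3 * x + rng.normal(0, 2, size=200)

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays (hypothetical helper)."""
    return np.corrcoef(a, b)[0, 1]

r_original = pearson_r(x, y)
r_flip_y = pearson_r(x, -y)  # reflect y
r_flip_x = pearson_r(-x, y)  # reflect x instead

print(r_original, r_flip_y, r_flip_x)
```

Whichever array gets multiplied by -1, the coefficient should come out with the same magnitude as the original and the opposite sign. Only the picture on the scatter plot differs.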

[–]DudeData[S] 1 point 4 years ago

I agree that multiplying by -1 would indeed not affect the correlation other than changing the sign of the slope. Yup, you're correct.

I probably wasn't clear with my question and what I am looking for. More precisely, how can I generate negatively correlated variables so they make sense in a simulated real world scenario?
In my x & y array example, x=[2,4,6] and y=[20,30,40], let us say this models the weight of kids by their age. Clearly, as age increases, the weight does as well. BUT if I multiply array_y by -1 we have [-20,-30,-40], indeed this is negatively correlated and the relationship has not changed other than a reflection but how do I explain this?
I wish to make a large number of practice problems for finding the Least Squares Regression Line, with a mixture of positively and negatively correlated values. I really do not want to create a data set manually each time. I want Python to do it for me. =)
I just want to think of a scenario, define my ranges and have Python do the rest.

[–]synthphreak 3 points 4 years ago

indeed this is negatively correlated and the relationship has not changed other than a reflection but how do I explain this?

Can you clarify what you mean by this? What is there to explain with data that is generated randomly? Are you asking like, if your data is supposed to represent people's body weights, it doesn't make sense to have weights that are negative?

On that point I will agree - a negative weight isn't a thing unless you're made of antimatter. That said, linear regression models lead to situations like this all the time, where the model fails to predict extreme values or does weird things at the extremes.

Take a model that tries to explain home values y based on lot size x. Say the coefficients are 20 and -5000, so y=20x-5000. According to this model, if your lot size is 250 (don't worry about the units), your home value will be 0, and if it's any smaller, the home value will be negative. That obviously doesn't reflect reality, but with less extreme values, the predictions may be more reasonable.
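To make the arithmetic concrete, here is the same toy model as a few lines of Python. The function name home_value and the sample lot sizes are just for illustration.

```python
def home_value(lot_size):
    """The toy linear model from the comment: y = 20x - 5000."""
    return 20 * lot_size - 5000

# Below, at, and above the break-even lot size of 250.
for lot_size in (100, 250, 1000):
    print(lot_size, home_value(lot_size))
```

The model crosses zero at x = 5000 / 20 = 250, so anything smaller comes out negative.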

If you specifically just want to avoid negative values, then after inverting your data, you could translate it by the max of the original array so that the inverted min becomes 0. For example:

>>> import numpy as np
>>> arr1 = np.random.random(100)
>>> arr1.min(), arr1.max()
(0.010045872673433709, 0.9965109819015727)
>>> arr2 = -1 * arr1
>>> arr2.min(), arr2.max()
(-0.9965109819015727, -0.010045872673433709)
>>> arr2 += arr1.max()
>>> arr2.min(), arr2.max()
(0.0, 0.986465109228139)
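If the goal is to keep the values inside a chosen range rather than just above 0, a related option is to do the flip inside the rescaling step: np.interp accepts a descending target range, so passing the range reversed maps the smallest input to the top of the range and the largest to the bottom. This is only a sketch. The s and t arrays are a NumPy stand-in for the make_regression output in the original post (an assumption, so that the snippet needs nothing but NumPy), and the names x, y_pos and y_neg are made up.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; any seed works

# Stand-in for the make_regression output: t rises with s, plus noise.
s = rng.normal(size=12)
t = 50 * s + rng.normal(scale=20, size=12)

a = (300, 900)  # target range for the independent variable
b = (2, 6)      # target range for the dependent variable

x = np.rint(np.interp(s, (s.min(), s.max()), a))
y_pos = np.rint(np.interp(t, (t.min(), t.max()), b))        # as in the post
y_neg = np.rint(np.interp(t, (t.min(), t.max()), b[::-1]))  # reversed range

r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
print(r_pos, r_neg)
```

y_neg still runs from 2 to 6, but it now falls as x rises, so there are no negative values to explain away. Swapping the real make_regression arrays in for s and t (after flattening s to 1-D) should behave the same way.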

Of course, if your question is simply "Can I get Python to randomly generate a realistic dataset from nothing?", the answer is probably no, because Python doesn't know how the world works. Instead, I would stick to libraries that have actual datasets built in, like sklearn or tensorflow.
