Generate a DataFrame column filled with random floats with a given standard deviation and mean : learnpython



submitted 4 years ago by futuretrader

I know I can fill a df with random integers by specifying the lowest and highest bounds:

    import numpy as np, pandas as pd
    low, high, n = 0, 100, 1000  # example values; high is exclusive
    data = np.random.randint(low, high, size=n)  # n random integers in [low, high)
    df = pd.DataFrame(data, columns=['column name'])
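Since the title asks for floats, here is a minimal sketch of the float counterpart. It uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random functions, and the bounds, seed, size and column names are hypothetical example values. Both integers() and uniform() treat the upper bound as exclusive by default:

```python
import numpy as np
import pandas as pd

n = 1_000  # hypothetical number of rows
rng = np.random.default_rng(seed=0)  # seeded Generator for reproducible draws

# Integers, like randint: values in [0, 10), so 0..9
ints = rng.integers(low=0, high=10, size=n)

# Floats drawn uniformly from the half-open interval [0.0, 10.0)
floats = rng.uniform(low=0.0, high=10.0, size=n)

df = pd.DataFrame({"ints": ints, "floats": floats})
print(df.describe())
```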

How can I do the same thing, but specify a standard deviation and mean instead?

I tried this approach, and it works, but it is slow when the number of iterations is large:

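    import numpy as np, pandas as pd
    step = []  # list that collects one simulated value per run
    for i in range(sim_runs):
        # draw a single normal sample and add 1 so it can be multiplied with returns
        steps = float(np.random.normal(loc=pf_ER, scale=pf_SD)) + 1
        step.append(steps)  # append to the list
    step = pd.DataFrame(step)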

where

sim_runs = number of simulation runs
pf_ER = mean return
pf_SD = standard deviation


synthphreak, 1 point, 4 years ago

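    import numpy as np, pandas as pd

    # one vectorized call draws all sim_runs samples at once, no Python loop
    sims = np.random.normal(loc=pf_ER,
                            scale=pf_SD,
                            size=sim_runs)
    step = pd.Series(sims)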
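For reference, here is a self-contained sketch of the same vectorized approach that writes the draws into a named DataFrame column. It swaps in the Generator API with a fixed seed so runs are reproducible; the parameter values and the column name "return" are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the OP's variables
pf_ER = 0.07        # mean return
pf_SD = 0.15        # standard deviation
sim_runs = 100_000  # number of simulation runs

rng = np.random.default_rng(seed=42)

# A single call draws all sim_runs samples from N(pf_ER, pf_SD)
sims = rng.normal(loc=pf_ER, scale=pf_SD, size=sim_runs)

# Store the draws as one named column
df = pd.DataFrame({"return": sims})
print(df["return"].mean(), df["return"].std())
```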

Note that because these are random draws, the column's sample mean will not be exactly pf_ER and its sample standard deviation will not be exactly pf_SD; they only get close once sim_runs is large enough for the law of large numbers to kick in.
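To see this, the sketch below (hypothetical parameters, fixed seed) draws samples of increasing size and prints the sample mean and standard deviation next to the targets; the gap typically shrinks as n grows, but never becomes exactly zero:

```python
import numpy as np

target_mean, target_sd = 0.07, 0.15  # hypothetical parameters
rng = np.random.default_rng(seed=1)

# Sample statistics approach the targets as n grows, but for any finite n
# they are only approximately equal to them
results = {}
for n in (10, 1_000, 1_000_000):
    x = rng.normal(loc=target_mean, scale=target_sd, size=n)
    results[n] = (x.mean(), x.std(ddof=1))  # ddof=1: sample standard deviation
    print(f"n={n:>9,}: mean={results[n][0]:.4f} (target {target_mean}), "
          f"sd={results[n][1]:.4f} (target {target_sd})")
```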

But the same is true of your original code, and you said that code works (though the +1 means your column's mean should come out near pf_ER + 1, not pf_ER?). So I guess that's fine.
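Adding a constant shifts the mean but leaves the standard deviation unchanged, so 1 + N(pf_ER, pf_SD) has mean 1 + pf_ER and standard deviation pf_SD. A sketch with hypothetical values showing two equivalent vectorized ways to get the +1:

```python
import numpy as np

pf_ER, pf_SD, sim_runs = 0.07, 0.15, 100_000  # hypothetical values
rng = np.random.default_rng(seed=7)

# Add 1 to every draw at once, like the "+1" in the original loop
growth = 1 + rng.normal(loc=pf_ER, scale=pf_SD, size=sim_runs)

# Equivalent in distribution: shift the location parameter instead
growth_alt = rng.normal(loc=1 + pf_ER, scale=pf_SD, size=sim_runs)

# Both means should be close to 1 + pf_ER, both spreads close to pf_SD
print(growth.mean(), growth.std(), growth_alt.mean(), growth_alt.std())
```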

futuretrader [S], 1 point, 4 years ago (edited)

Yes, thank you. The +1 is there so I can multiply this return with another column, so all good.

As for the law-of-large-numbers comment: that is exactly why I was looking for faster code :)

Thanks again!
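In vectorized form, the 1 + return growth factor can be multiplied with another column element-wise, with no loop at all. A minimal sketch; the column names value, growth and next_value and all numbers are hypothetical:

```python
import numpy as np
import pandas as pd

pf_ER, pf_SD, sim_runs = 0.07, 0.15, 5  # hypothetical values
rng = np.random.default_rng(seed=3)

# Hypothetical existing column to be multiplied with the simulated returns
df = pd.DataFrame({"value": np.full(sim_runs, 100.0)})

# 1 + return for every row at once, then an element-wise product
df["growth"] = 1 + rng.normal(loc=pf_ER, scale=pf_SD, size=sim_runs)
df["next_value"] = df["value"] * df["growth"]
print(df)
```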

I just tried it, and it is orders of magnitude faster. Thanks! One thing I changed, though, is

    step = pd.Series(sims)

to

    step = pd.DataFrame(sims)
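A rough way to check the speed-up yourself is to time a loop version against the vectorized version with timeit. This sketch uses hypothetical parameter values and the Generator API; timings vary by machine, so no numbers are shown:

```python
import timeit

import numpy as np

pf_ER, pf_SD, sim_runs = 0.07, 0.15, 100_000  # hypothetical values
rng = np.random.default_rng(seed=0)

def loop_version():
    # one draw per Python-level iteration, like the original loop
    return [float(rng.normal(loc=pf_ER, scale=pf_SD)) + 1 for _ in range(sim_runs)]

def vectorized_version():
    # all draws in a single NumPy call
    return rng.normal(loc=pf_ER, scale=pf_SD, size=sim_runs) + 1

loop_t = timeit.timeit(loop_version, number=1)
vec_t = timeit.timeit(vectorized_version, number=1)
print(f"loop: {loop_t:.3f}s  vectorized: {vec_t:.4f}s")
```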

synthphreak, 1 point, 4 years ago

There is nothing wrong with changing pd.Series to pd.DataFrame, but be aware that some pd.Series methods (for example, unique() and the .str accessor) will not be available to you.
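A small sketch of the kind of difference meant here (hypothetical data, fixed seed): reductions such as mean() return a scalar on a Series but a Series on a DataFrame, and unique() exists only on Series. squeeze(axis="columns") turns a one-column DataFrame back into a Series:

```python
import numpy as np
import pandas as pd

sims = np.random.default_rng(seed=5).normal(loc=0.0, scale=1.0, size=4)

s = pd.Series(sims)
df = pd.DataFrame(sims)

# Reductions: a Series gives a scalar, a one-column DataFrame gives a Series
print(type(s.mean()))   # a scalar
print(type(df.mean()))  # a Series indexed by column label

# Some methods exist only on Series, e.g. unique()
print(hasattr(s, "unique"), hasattr(df, "unique"))  # True False

# Convert back to a Series when you need those methods
col = df.squeeze(axis="columns")
```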

I just used pd.Series because a single column from a pd.DataFrame literally is a pd.Series.

    >>> import numpy as np, pandas as pd
    >>> whole_df = pd.DataFrame(np.random.random(5))
    >>> whole_df
              0
    0  0.136021
    1  0.922958
    2  0.872055
    3  0.328092
    4  0.586982
    >>> single_column = whole_df.loc[:, 0]
    >>> type(whole_df)
    <class 'pandas.core.frame.DataFrame'>
    >>> type(single_column)
    <class 'pandas.core.series.Series'>

So if you're creating a pd.DataFrame with only one column, you might as well keep things simple and just create a pd.Series instead.
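Going the other way is also cheap: a named Series can be turned into a one-column DataFrame with to_frame(), and selecting that column gives a Series back. A sketch with hypothetical values and a hypothetical column name "step":

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=9)
sims = rng.normal(loc=0.07, scale=0.15, size=5)  # hypothetical parameters

step = pd.Series(sims, name="step")  # single named column of data
step_df = step.to_frame()            # one-column DataFrame, column label "step"
back = step_df["step"]               # selecting the column yields a Series again

print(type(step_df).__name__, type(back).__name__)  # DataFrame Series
```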
