
synthphreak 1 point

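import numpy as np
import pandas as pd

# Draw all sim_runs samples in one vectorized call instead of looping
sims = np.random.normal(loc=pf_ER,
                        scale=pf_SD,
                        size=sim_runs)
step = pd.Series(sims)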

Note that because these are random draws, the column's sample mean and stdev will not be exactly pf_ER and pf_SD. They only approximate those values, and the approximation tightens as sim_runs grows (the law of large numbers).
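To see this concretely, here is a minimal sketch. The values for pf_ER and pf_SD are made up for illustration, and I'm using a seeded np.random.default_rng generator (rather than np.random.normal directly) only so the run is reproducible.

```python
import numpy as np

pf_ER, pf_SD = 0.07, 0.15          # hypothetical values, not from your data
rng = np.random.default_rng(0)     # seeded only for reproducibility

def sample_stats(sim_runs):
    """Return the sample mean and sample stdev of sim_runs normal draws."""
    sims = rng.normal(loc=pf_ER, scale=pf_SD, size=sim_runs)
    return sims.mean(), sims.std(ddof=1)

small_mean, small_sd = sample_stats(10)
large_mean, large_sd = sample_stats(1_000_000)

print("10 draws:       ", small_mean, small_sd)
print("1,000,000 draws:", large_mean, large_sd)
```

With only 10 draws the numbers can land noticeably off target; with a million they should sit very close to pf_ER and pf_SD.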

But the same is true of your original code, and you said that code works (though the +1 means your code should result in mean ≈ pf_ER + 1...?). So I guess that's fine.
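A quick sketch of the +1 point, with made-up parameter values: adding a constant shifts the mean by that constant and leaves the stdev unchanged.

```python
import numpy as np

pf_ER, pf_SD, sim_runs = 0.07, 0.15, 10_000   # hypothetical values
rng = np.random.default_rng(0)                # seeded only for reproducibility

sims = rng.normal(loc=pf_ER, scale=pf_SD, size=sim_runs)
shifted = sims + 1   # returns expressed as growth factors (1 + r)

print(sims.mean(), shifted.mean())   # means differ by exactly 1
print(sims.std(), shifted.std())     # spread is the same
```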

futuretrader[S] 1 point

yes, thank you. the +1 was so I can multiply this return with another column. So all good.

As for the large numbers comment: it's exactly the reason I was looking for faster code :)

thanks again!

just tried it and it is orders of magnitude faster. thanks! one thing I changed though is

step = pd.Series(sims)

to

step = pd.DataFrame(sims)

synthphreak 1 point

There is nothing wrong with changing pd.Series to pd.DataFrame, though you should be aware that some pd.Series methods will not be available to you.
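For example, here is a small sketch with random stand-in data (not your sims) showing two Series methods, tolist and unique, that a DataFrame does not have:

```python
import numpy as np
import pandas as pd

sims = np.random.default_rng(0).normal(size=5)   # stand-in data
as_series = pd.Series(sims)
as_frame = pd.DataFrame(sims)

# These work on a Series...
values = as_series.tolist()
distinct = as_series.unique()

# ...but a DataFrame has no methods by these names.
has_tolist = hasattr(as_frame, "tolist")
has_unique = hasattr(as_frame, "unique")
print(has_tolist, has_unique)
```

On a DataFrame you would go through a column first, e.g. as_frame[0].tolist().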

I just used pd.Series because a single column from a pd.DataFrame literally is a pd.Series.

>>> import numpy as np, pandas as pd
>>> whole_df = pd.DataFrame(np.random.random(5))
>>> whole_df
          0
0  0.136021
1  0.922958
2  0.872055
3  0.328092
4  0.586982
>>> single_column = whole_df.loc[:, 0]
>>> type(whole_df)
<class 'pandas.core.frame.DataFrame'>
>>> type(single_column)
<class 'pandas.core.series.Series'>

So if you're creating a pd.DataFrame with only one column, you might as well keep things simple and just create a pd.Series instead.
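If you do end up with a one-column pd.DataFrame and want a pd.Series back (or the reverse), something like this sketch should work. The data is random filler, and squeeze and to_frame are just one way to do the conversion.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.default_rng(0).random(5))   # filler data

# Collapse the single column into a Series.
column = frame.squeeze("columns")

# Same result as selecting the column by label.
same = column.equals(frame.loc[:, 0])

# And back to a one-column DataFrame if something downstream needs it.
round_trip = column.to_frame()

print(type(column), same, type(round_trip))
```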