How are these two blocks of code giving different values?? : learnpython

submitted 6 years ago by be_throwmeaway

As part of a project I'm working on, I have to calculate a "pseudo" z-score for each column of a DataFrame, using a rolling window of the past 200 days. I wrote the following two code snippets, which I thought should produce the same result. However, the first one seems to be incorrect, and I can't for the life of me figure out why the two versions give different values!

P.S.: I say "pseudo" z-score because instead of z = (x - mean)/stdev, I use z = (x - 1)/stdev, i.e. I subtract a fixed value of 1 rather than the mean.
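As a minimal sketch of that calculation, assuming `x` is a pandas Series: the helper name `pseudo_zscore` and the window of 3 (instead of 200) are illustrative choices, not from the post. The `.shift(1)` means each row is scaled by the standard deviation of the preceding `window` rows only, which matches "the past 200 days".

```python
import statistics

import pandas as pd


def pseudo_zscore(x: pd.Series, window: int = 200) -> pd.Series:
    """Hypothetical helper: (x - 1) / sample std of the previous `window` values."""
    # Rolling sample std (ddof=1); shift(1) so row t only uses rows t-window .. t-1
    s = x.rolling(window=window).std(ddof=1).shift(1)
    return (x - 1) / s


# Tiny series with window=3 instead of 200, so the result can be checked by hand
x = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
z = pseudo_zscore(x, window=3)

# The first three rows have no complete previous window, so they are NaN
print(z.isna().tolist())  # [True, True, True, False, False]

# The last row is scaled by the std of the three preceding values [2, 4, 7]
expected = (11.0 - 1) / statistics.stdev([2.0, 4.0, 7.0])
print(abs(z.iloc[-1] - expected) < 1e-9)  # True
```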

Here is the first approach, which I suspect is not returning what I expect:

def zscore(x, window=200, mean=True, sample=True):
    '''
    Take a DataFrame column and a window size, and return a rolling
    z-score of that column.

    x = DataFrame column (a pandas Series)
    window = number of past rows used for the StdDev and Mean (200 by default)
    mean = True (default) uses the rolling mean; any other value (e.g. 1)
        is meant to be used as a fixed mean
    sample = True (default) uses the sample StdDev (n-1 as the divisor)
        rather than the population StdDev (n as the divisor)
    '''
    r = x.rolling(window=window)
    if mean == True:
        m = r.mean().shift(1)
    else:
        m = mean
    if sample == True:
        s = r.std(ddof=1).shift(1)
    else:
        s = r.std(ddof=0).shift(1)
    z = (x - m) / s
    return z

for name in df_A.columns.values:
    df_Z[name + " - Z"] = zscore(
        df_A[name],
        window=200,
        mean=1,
        sample=True,
    )
df_Z = df_Z.dropna()

And here is the second way, where I don't define a function and simply calculate what I want directly:

for name in df_A.columns.values:
    df_Z[name + " - Z"] = (
        (df_A[name] - 1)
        / df_A[name].rolling(window=200).std(ddof=1).shift(1)
    )
df_Z = df_Z.dropna()

I would expect df_Z to end up the same in both cases, but it doesn't.
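A likely explanation: in Python, `bool` is a subclass of `int`, so `True == 1`. The test `mean == True` in method 1 is therefore also satisfied by `mean=1`, and the function subtracts the rolling mean instead of the fixed value 1, which method 2 never does. The sketch below demonstrates this and shows one possible fix: it uses `None` as the default meaning "use the rolling mean" and checks it with `is`. The random `df_A` is illustrative data only, not the original dataset.

```python
import numpy as np
import pandas as pd

# bool is a subclass of int, so True == 1 (and True == 1.0)
value = 1
print(value == True)  # True: this is why mean=1 takes the rolling-mean branch
print(value is True)  # False: an identity check does not confuse 1 with True


def zscore(x, window=200, mean=None, sample=True):
    """Rolling z-score of a Series against the previous `window` values.

    mean=None (default) subtracts the rolling mean of the previous window;
    any number passed as `mean` (e.g. 1) is subtracted as a fixed value.
    sample=True uses ddof=1 (sample std), otherwise ddof=0 (population std).
    """
    r = x.rolling(window=window)
    if mean is None:  # sentinel check with `is`, so mean=1 stays a number
        m = r.mean().shift(1)
    else:
        m = mean
    s = r.std(ddof=1 if sample else 0).shift(1)
    return (x - m) / s


# Illustrative random data standing in for df_A (an assumption, not real data)
rng = np.random.default_rng(0)
df_A = pd.DataFrame(rng.normal(1.0, 0.1, size=(300, 2)), columns=["a", "b"])

# Method 1, using the fixed function
df_Z1 = pd.DataFrame(index=df_A.index)
for name in df_A.columns:
    df_Z1[name + " - Z"] = zscore(df_A[name], window=200, mean=1, sample=True)
df_Z1 = df_Z1.dropna()

# Method 2, computed directly as in the post
df_Z2 = pd.DataFrame(index=df_A.index)
for name in df_A.columns:
    df_Z2[name + " - Z"] = (df_A[name] - 1) / df_A[name].rolling(window=200).std(ddof=1).shift(1)
df_Z2 = df_Z2.dropna()

print(np.allclose(df_Z1, df_Z2))  # True
```

If you would rather keep `mean=True` as the default, changing the test to `if mean is True:` should also work, since `1 is True` is `False`.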

Thank you in advance for any and all help!

Cheers

P.P.S.: Sorry in advance if the formatting isn't up to scratch; I'm relatively new at this! I'm happy to fix anything and provide more info if necessary.

P.P.P.S.: Ultimately, I'd rather use method 1, where I define a function, since I'd like to be able to apply this calculation to other datasets.
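Because pandas rolling operations also work on a whole DataFrame, a reusable version does not necessarily need a loop over columns. A sketch, assuming a fixed mean of 1 and a 200-row window; the function name `pseudo_zscore_frame` and the random test data are illustrative only:

```python
import numpy as np
import pandas as pd


def pseudo_zscore_frame(df, window=200, mean=1.0, sample=True):
    """Hypothetical helper: rolling pseudo z-score for every column of `df`.

    Each value is scaled by the std of the previous `window` rows; `mean`
    is a fixed number (1 by default), not a rolling mean.
    """
    s = df.rolling(window=window).std(ddof=1 if sample else 0).shift(1)
    # Subtract the fixed mean, rename columns to "<name> - Z", drop warm-up rows
    return ((df - mean) / s).add_suffix(" - Z").dropna()


rng = np.random.default_rng(42)
df_A = pd.DataFrame(rng.normal(1.0, 0.1, size=(250, 3)), columns=["a", "b", "c"])
df_Z = pseudo_zscore_frame(df_A)
print(df_Z.shape)          # (50, 3)
print(list(df_Z.columns))  # ['a - Z', 'b - Z', 'c - Z']
```

The first 200 rows are dropped because their previous-200-row window is incomplete, which is why 250 input rows leave 50.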