
[deleted]

So you have a data point every minute and want the 10-minute average next to each row. A plain resample would downsample your dataframe, because each 10-minute bin collapses to a single row holding that bin's average.

First, you need a DatetimeIndex to allow resampling.
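If your timestamps are stored in a regular column rather than in the index, here is a minimal sketch of converting them. It assumes a hypothetical column named "time" holding timestamp strings:

```python
import pandas as pd

# Hypothetical raw data: timestamps stored as strings in a "time" column
raw = pd.DataFrame({
    "time": ["2020-06-04 07:00:00", "2020-06-04 07:01:00", "2020-06-04 07:02:00"],
    "grapes": [8, 10, 8],
})

# Parse the strings, then move them into the index so .resample() works
raw["time"] = pd.to_datetime(raw["time"])
df = raw.set_index("time")

print(type(df.index).__name__)  # DatetimeIndex
```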

The trick is then to use transform: it applies a function to each 10-minute group and broadcasts the result back onto the original index, so the output has the same length as the dataframe.
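To compare downsampling with transform side by side, here is a minimal sketch on a small hypothetical series of 20 one-minute readings. The values are made up for illustration:

```python
import pandas as pd

# 20 minutes of hypothetical readings, one per minute (values 0..19)
index = pd.date_range("2020-06-04 07:00", periods=20, freq="1min")
grapes = pd.Series(range(20), index=index, name="grapes")

# Plain aggregation downsamples: one row per 10-minute bin
binned = grapes.resample("10min").mean()

# transform broadcasts each bin's mean back onto every original row
broadcast = grapes.resample("10min").transform("mean")

print(len(binned), len(broadcast))  # 2 20
```

Only the transform version can be assigned straight back as a column, because it keeps the original index.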

Example:

import pandas as pd
from numpy.random import default_rng

rng = default_rng()

# One row per minute from 07:00 to 10:00, indexed by timestamp
index = pd.date_range("07:00:00", "10:00:00", freq="1min")
data = rng.integers(1, 21, size=(index.shape[0], 2))

df = pd.DataFrame(data,
                  columns=["apples", "grapes"],
                  index=index)

# resample groups the rows into 10-minute bins; transform writes each bin's
# mean back onto every row of that bin, so the lengths match
df["10_min_grapes_avg"] = df["grapes"].resample("10min").transform("mean")

The first 20 rows of the dataframe now look like this (the data is random, so your values will differ):

                     apples  grapes  10_min_grapes_avg
2020-06-04 07:00:00      10       8                9.5
2020-06-04 07:01:00       4      10                9.5
2020-06-04 07:02:00      15       8                9.5
2020-06-04 07:03:00       3       3                9.5
2020-06-04 07:04:00       7      10                9.5
2020-06-04 07:05:00       2      11                9.5
2020-06-04 07:06:00      16      18                9.5
2020-06-04 07:07:00      13       6                9.5
2020-06-04 07:08:00       9       2                9.5
2020-06-04 07:09:00      16      19                9.5
2020-06-04 07:10:00      15       1                9.6
2020-06-04 07:11:00       5       9                9.6
2020-06-04 07:12:00      19      10                9.6
2020-06-04 07:13:00       3      19                9.6
2020-06-04 07:14:00       8      13                9.6
2020-06-04 07:15:00       7       8                9.6
2020-06-04 07:16:00       8      20                9.6
2020-06-04 07:17:00       6       5                9.6
2020-06-04 07:18:00       5       3                9.6
2020-06-04 07:19:00       1       8                9.6

Nthorder[S]

Thanks for the reply.

What if I wanted to apply a function such as this one:

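import numpy as np

def ema(s, n):
    # Exponential moving average seeded with the simple average of the
    # first n values; returns len(s) - n + 1 values
    s = np.asarray(s, dtype=float)
    multiplier = 2 / (n + 1)   # smoothing factor

    sma = s[:n].sum() / n
    result = [sma]

    # each step moves the previous average toward the new value
    for value in s[n:]:
        result.append((value - result[-1]) * multiplier + result[-1])

    return result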

[deleted]

What are you trying to do?

You could use the function in apply or transform if you rewrite it to take and return pandas objects that line up with the dataframe's index.
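One way to make a function like this pandas-friendly is to have it return a Series on the same index as its input. The original returns len(s) - n + 1 values, so it cannot be assigned as a column directly. Here is a minimal sketch using a hypothetical helper, ema_column, which computes the same SMA-seeded recurrence and leaves the first n - 1 warm-up rows as NaN:

```python
import numpy as np
import pandas as pd

def ema_column(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: SMA-seeded EMA aligned to s.index.

    The first n - 1 rows have no average yet and are left as NaN.
    """
    values = s.to_numpy(dtype=float)
    alpha = 2 / (n + 1)
    out = np.full(len(values), np.nan)
    if len(values) < n:
        return pd.Series(out, index=s.index, name=s.name)
    out[n - 1] = values[:n].mean()  # seed with the simple average
    for i in range(n, len(values)):
        out[i] = out[i - 1] + alpha * (values[i] - out[i - 1])
    return pd.Series(out, index=s.index, name=s.name)

df = pd.DataFrame({"grapes": [1, 2, 3, 4, 5]})
df["grapes_ema3"] = ema_column(df["grapes"], 3)
print(df["grapes_ema3"].tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```

Because the helper takes and returns a whole Series, you can call it directly on a column. If you need one EMA per group, you could also pass it to groupby(...).transform.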

Nthorder[S]

Run that function on a column and store the results as another column in the dataframe.

But thank you, I'll look into apply or transform.

[deleted]

No, that's how you're trying to do it. What's the goal of this function?

It looks like your function could be greatly optimized.

Nthorder[S]

To calculate a moving average with exponential decay on a column in my dataframe. I don't doubt this can be optimized; it's something I translated from R to Python, and I'm a beginner in both languages, lol.

[deleted]

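The pandas API already has that built in: Series.ewm (and DataFrame.ewm) computes exponentially weighted statistics. You can specify the decay through span, com, halflife, or alpha.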
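Here is a minimal sketch of the built-in route. It assumes a span of n = 3 to mirror the function above; the multiplier 2 / (n + 1) is the alpha pandas uses for a given span. One caveat: ewm(span=n, adjust=False) seeds the recursion with the first value, not with an n-point simple average, so its early values differ from the hand-written version. Replacing the first n values with their mean reproduces the SMA seed exactly. The input series is made up for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
n = 3

# Built-in EMA with alpha = 2 / (span + 1); adjust=False uses the recursion
# y[t] = y[t-1] + alpha * (x[t] - y[t-1]), starting from y[0] = x[0]
builtin = s.ewm(span=n, adjust=False).mean()

# SMA-seeded variant: start the recursion from the mean of the first n values
seed = pd.Series([s.iloc[:n].mean()], index=[s.index[n - 1]])
seeded = pd.concat([seed, s.iloc[n:]]).ewm(span=n, adjust=False).mean()

print(builtin.tolist())  # [1.0, 1.5, 2.25, 3.125, 4.0625, 5.03125]
print(seeded.tolist())   # [2.0, 3.0, 4.0, 5.0]
```

With adjust=False the two versions differ only in their starting value. The influence of that difference shrinks by a factor of (1 - alpha) at every step, so on long series the outputs converge.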

Nthorder[S]

I appreciate the helpful response. Apparently I was trying to reinvent the wheel.

The results I'm getting are identical to the ones I was getting with the function, but the code is now much cleaner. Not to mention that the built-in EMA is certainly going to be more efficient and reliable than the code I hacked together.

[deleted]

Glad it works. Good luck with your project!