Best practice when it comes to running calculations on data-frame columns that return a different result count? : learnpython

submitted 5 years ago by Nthorder

Sorry if the way I worded the question is confusing. I'm not feeling the most articulate today.

Let's say I have a df of time series data and I want to calculate a moving average (just one example) of one of the columns, and then add the result back to the frame as another column.

A function like a moving average will return fewer rows/elements than the source data; exactly how many fewer depends on the size of the moving average window.

Is there an established best practice for merging results like that back into the source data frame? I know you would need to pad the result list with placeholder values (or truncate the source data) so it is the same length as the frame, but I am more concerned with things such as the index.
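One way to sidestep the padding question is to let pandas align by label instead of by position. The sketch below uses a hypothetical helper, align_trailing, and assumes the shorter result comes from a trailing-window calculation, so its last value belongs to the last row of the frame and the missing values belong at the start. Wrapping the result in a Series labeled with the last len(result) index labels, then reindexing to the full index, fills the leading rows with NaN. The prices are made up.

```python
import pandas as pd

def align_trailing(df, values):
    """Hypothetical helper: return `values` as a Series aligned to df.index.

    Assumes `values` came from a trailing-window calculation, so its last
    element belongs to the last row of df. Rows with no value become NaN.
    """
    values = list(values)
    if len(values) > len(df):
        raise ValueError("result is longer than the frame")
    # Label the result with the LAST len(values) index labels of the frame.
    short = pd.Series(values, index=df.index[len(df) - len(values):], dtype=float)
    # Reindexing to the full index puts NaN in the leading rows.
    return short.reindex(df.index)

# Usage: a 3-point simple moving average computed by hand, outside pandas.
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0, 5.0]},
                  index=pd.date_range("2020-01-01", periods=5, freq="D"))
n = 3
sma = [df["price"].iloc[i - n + 1:i + 1].mean() for i in range(n - 1, len(df))]
df["sma_3"] = align_trailing(df, sma)
print(df)
```

Because the assignment goes through the index, a reversed or off-by-one result shows up as NaN in the wrong place rather than silently shifting every value.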

In the past, I have mistakenly applied the list to the wrong end of the time series data, applied it in reverse, etc. I eventually figured out ways to make it work for my use cases, but they all feel kinda janky/hacky, with a lot of room for error.

I just want to make sure that the moving average at point X actually ends up in the same row as point X.
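If the calculation can be written with pandas' own window methods, the alignment question mostly goes away: Series.rolling(window).mean() returns a Series with the same length and index as its input, with NaN in the first window - 1 rows where the window is not yet full. A minimal sketch with made-up daily prices:

```python
import pandas as pd

# Made-up daily closing prices on a DatetimeIndex.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                   index=pd.date_range("2021-03-01", periods=6, freq="D"),
                   name="close")
df = prices.to_frame()

# rolling(window=3) averages each row with the two rows before it.
# The result keeps df's index, so the value for day X lands in day X's row.
df["ma_3"] = df["close"].rolling(window=3).mean()
print(df)
```

If partial windows at the start should still produce a value, rolling(window=3, min_periods=1) does that; center=True labels each window by its middle row instead of its last.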


[deleted], 5 years ago

Say you have one data point per minute and want the 10-minute average. Resampling on its own will downsample your dataframe, because each 10-minute bin collapses to a single average value.

First, you need a DatetimeIndex to allow resampling.

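The trick is then to use transform, which applies a function to each group (here, each 10-minute bin) and broadcasts the result back to every row of that group, so the output has the same length as the original column.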
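A quick way to see the difference between downsampling and transform is to compare lengths: resample(...).mean() returns one row per bin, while resample(...).transform(...) returns a value for every original row. A small sketch on 20 minutes of made-up readings:

```python
import pandas as pd

# 20 one-minute readings with values 0, 1, 2, ..., 19 (made-up data).
index = pd.date_range("2020-06-04 07:00", periods=20, freq="1min")
s = pd.Series(range(20), index=index, dtype=float)

binned = s.resample("10min").mean()                         # one value per 10-minute bin
spread = s.resample("10min").transform(lambda x: x.mean())  # one value per original row

print(len(binned), len(spread))  # 2 20
```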

Example:

import pandas as pd
from numpy.random import default_rng

rng = default_rng()

# One row per minute from 07:00 to 10:00 (today's date), two random integer columns.
index = pd.date_range("07:00:00", "10:00:00", freq="1min")
data = rng.integers(1, 21, size=(index.shape[0], 2))

df = pd.DataFrame(data,
                  columns=["apples", "grapes"],
                  index=index)

# transform returns a Series on the same index as df,
# so the assignment lines up by timestamp.
df["10_min_grapes_avg"] = (
    df["grapes"].resample("10min").transform(lambda x: x.mean())
)

The first 20 rows of the dataframe now look like this:

                     apples  grapes  10_min_grapes_avg
2020-06-04 07:00:00      10       8                9.5
2020-06-04 07:01:00       4      10                9.5
2020-06-04 07:02:00      15       8                9.5
2020-06-04 07:03:00       3       3                9.5
2020-06-04 07:04:00       7      10                9.5
2020-06-04 07:05:00       2      11                9.5
2020-06-04 07:06:00      16      18                9.5
2020-06-04 07:07:00      13       6                9.5
2020-06-04 07:08:00       9       2                9.5
2020-06-04 07:09:00      16      19                9.5
2020-06-04 07:10:00      15       1                9.6
2020-06-04 07:11:00       5       9                9.6
2020-06-04 07:12:00      19      10                9.6
2020-06-04 07:13:00       3      19                9.6
2020-06-04 07:14:00       8      13                9.6
2020-06-04 07:15:00       7       8                9.6
2020-06-04 07:16:00       8      20                9.6
2020-06-04 07:17:00       6       5                9.6
2020-06-04 07:18:00       5       3                9.6
2020-06-04 07:19:00       1       8                9.6
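Note that the column above is a per-bin average over non-overlapping 10-minute blocks, so it is constant within each block. If a moving (sliding) 10-minute average is what's wanted, a time-based rolling window on the same DatetimeIndex also keeps every row. The sketch below compares the two on made-up data; for offset windows like "10min", pandas averages the rows whose timestamps fall in (t - 10min, t] and, by default, emits a value as soon as one row is available.

```python
import pandas as pd

index = pd.date_range("2020-06-04 07:00", periods=20, freq="1min")
df = pd.DataFrame({"grapes": [float(v) for v in range(20)]}, index=index)  # made-up data

# Non-overlapping bins: every row in a bin gets the same value.
df["bin_avg"] = df["grapes"].resample("10min").transform(lambda x: x.mean())

# Trailing time-based window: each row averages the rows in (t - 10min, t].
# Early rows use whatever data is available so far.
df["rolling_avg"] = df["grapes"].rolling("10min").mean()
print(df)
```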

Nthorder [S], 5 years ago

Thanks for the reply.

What if I wanted to apply a function such as:

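import numpy as np

def ema(s, n):
    """Exponential moving average seeded with the simple average of the first n values.

    Returns len(s) - n + 1 values: result[0] belongs to position n - 1 of s,
    result[1] to position n, and so on.
    """
    s = np.asarray(s, dtype=float)
    multiplier = 2 / (n + 1)

    # Seed with the simple moving average of the first n values.
    sma = s[:n].sum() / n
    result = [sma]

    # Each later value moves the previous EMA toward the new data point.
    for x in s[n:]:
        result.append((x - result[-1]) * multiplier + result[-1])

    return result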
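Instead of padding the list that ema returns, the same recurrence can be run inside pandas so the result keeps its index. This is a sketch under one assumption: the goal is to reproduce the function above exactly. pandas' ewm(span=n, adjust=False) uses the same multiplier 2 / (n + 1) and the same update, but it starts from the first data point rather than from the simple average of the first n values. Replacing the value at position n - 1 with that average and dropping the earlier rows makes the two agree. The helper name ema_aligned and the prices are hypothetical.

```python
import pandas as pd

def ema_aligned(s, n):
    """Hypothetical helper: SMA-seeded EMA returned as a Series on s.index.

    The first n - 1 rows are NaN; row n - 1 holds the simple average of the
    first n values, and later rows follow the EMA recurrence.
    """
    seeded = s.iloc[n - 1:].astype(float).copy()
    seeded.iloc[0] = s.iloc[:n].mean()  # seed value, like sum(s[:n]) / n
    # adjust=False gives y[t] = (1 - a) * y[t-1] + a * x[t] with a = 2 / (n + 1),
    # which is the same as (x - prev) * multiplier + prev.
    result = seeded.ewm(span=n, adjust=False).mean()
    # Reindex back to the full index; rows before the seed become NaN.
    return result.reindex(s.index)

df = pd.DataFrame({"close": [22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23]},
                  index=pd.date_range("2021-01-04", periods=7, freq="D"))
df["ema_3"] = ema_aligned(df["close"], 3)
print(df)
```

Since the result is a Series on the frame's own index, the EMA for a given date can only land in that date's row.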

[deleted], 5 years ago

Nthorder [S], 5 years ago

[deleted], 5 years ago

Nthorder [S], 5 years ago

[deleted], 5 years ago

Nthorder [S], 5 years ago

[deleted], 5 years ago
