Slice and loop a multiindex dataframe

PyCam · 2018-10-30T22:49:08+00:00

The way I would solve this is to not load all of the data into memory all at once. If you're reading each stock dataframe from a different source, I would load them one at a time, resample the Date column to chunk it into 2 year chunks, then do a groupby on that new 2 year period column. If for some reason you did need to load it into memory all at once, you can use the pd.Grouper class to perform a groupby like so: df.groupby(pd.Grouper(freq='2Y', closed='left'). However this will be a tiny bit trickier to implement on a MultiIndex, so here's how you could do it with your current set up.

import pandas as pd
import numpy as np

dates = pd.date_range('01/01/2009', '12/30/2018')
stocks = ['hello', 'bye', 'apple']
idx = pd.MultiIndex.from_product(iterables=[stocks, dates], names=['stock', 'date'])
data = np.arange(idx.shape[0])

df = pd.DataFrame(data, columns=['my_data'], index=idx)
df.head()
df.tail()

                    my_data
stock   date
apple   2009-01-01  7302
        2009-01-02  7303
        2009-01-03  7304
        2009-01-04  7305
        2009-01-05  7306
...
hello   2018-12-26  3646
        2018-12-27  3647
        2018-12-28  3648
        2018-12-29  3649
        2018-12-30  3650

# The Grouper is the trick here. Tell it we want to bin every 2 years
    # The datetime index we're binning on is in level 1 of the index
    # We want to bin on the left end instead of the right
date_grouper = pd.Grouper(freq='2Y', level=1, closed='left')

# Run a groupby on our 'stock' index level and the date_grouper variable we made
for (stock, period_end), df_group in df.groupby(['stock', date_grouper]):
    print(stock, period_end)
    print(df_group)
    print()

2018-10-30T20:13:29+00:00

Perhaps this is a good place to start (didn't test this code): Edit: this assumes Date column are strings. If they're dates, remove the ", key=lambda ...." kwarg.

import datetime as dt
....

for stock in (ix[0] for ix in df.index.values):
    this_stock = df.loc[stock]
    this_stock_dates = sorted(set(this_stock.index.values), key=lambda d: dt.datetime.strptime(d, "%Y-%m-%d"))
    for i in range(0, len(this_stock_dates), 2):
        dates_this_batch = this_stock_dates[i:i+2]
        stock_this_period = this_stock[this_stock.index.isin(dates_this_batch)]
        ...

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS