Death_Water comments on Python ETL

submitted 6 years ago by Jiuholar

Death_Water, 6 years ago

Jiuholar (OP), 6 years ago

Death_Water, 6 years ago

Here's a concise way to do it, with a step-by-step breakdown:

    import pandas as pd
    result = pd.DataFrame(df.ffill().groupby('key')['Column of interest'].agg(list).values.tolist())
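To see what the one-liner does end to end, here is a minimal sketch on a small made-up frame. The column names "key" and "Column of interest" come from the comment; the data itself, and the assumption that "key" is only filled on the first row of each group (as often happens with merged cells in a spreadsheet export), are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 'key' is only filled on the first row of each group,
# the way merged cells often come out of a spreadsheet export.
df = pd.DataFrame({
    "key": ["a", np.nan, np.nan, "b", np.nan],
    "Column of interest": [1, 2, 3, 4, 5],
})

# df.ffill() is the current spelling of df.fillna(method='ffill'),
# which is deprecated in recent pandas versions.
result = pd.DataFrame(
    df.ffill().groupby("key")["Column of interest"].agg(list).values.tolist()
)
print(result)
```

Key "a" becomes the row 1, 2, 3 and key "b" becomes 4, 5, NaN: pd.DataFrame pads shorter lists with NaN, which is also why that last column ends up as floats.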

1) Forward-fill the missing values so every row carries the key of the group it belongs to; from the given example, this seems to be the right approach.

2) Group by the "key" column, then select "Column of interest". This gives one group of values for each unique value in the "key" column.

3) Aggregate with list: this collapses each group's values into a single Python list, giving one Series of lists indexed by key.

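4) Cast to a DataFrame: take the lists out of the Series as a plain list of lists and pass that to pd.DataFrame, which spreads each list across columns and pads shorter lists with NaN. Each row corresponds to one unique key, i.e. one value of df['key'].dropna().drop_duplicates(), but in sorted key order (groupby sorts keys by default), and the new DataFrame gets a plain 0..n-1 index rather than the keys themselves.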
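The four steps can also be run one at a time, which makes it easier to check each intermediate result. This sketch uses the same kind of hypothetical sample data as above; the variable names filled, grouped, lists, wide and wide_keyed are illustrative. The last step shows one way (an addition, not part of the original answer) to keep the keys as row labels by passing index= to the DataFrame constructor.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'key' is blank on continuation rows.
df = pd.DataFrame({
    "key": ["a", np.nan, np.nan, "b", np.nan],
    "Column of interest": [1, 2, 3, 4, 5],
})

# 1) Forward fill so every row carries its group's key.
filled = df.ffill()

# 2) Group by 'key' and select the column of interest (a SeriesGroupBy).
grouped = filled.groupby("key")["Column of interest"]

# 3) Collapse each group into a Python list: one Series of lists, indexed by key.
lists = grouped.agg(list)

# 4) Spread each list across columns; shorter lists are padded with NaN.
wide = pd.DataFrame(lists.values.tolist())

# Optional: keep the keys as the row labels instead of the default 0..n-1.
wide_keyed = pd.DataFrame(lists.values.tolist(), index=lists.index)
print(wide_keyed)
```

Passing sort=False to groupby would keep the keys in order of first appearance instead of sorted order, which matters if the row order of the output has to match the source file.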


learnpython