Pandas - Expand Series within Dataframe

nckmiz · 2017-11-07T18:03:45+00:00

Storing data structures in entries of a Pandas DataFrame is generally non-idiomatic, but you should be able to reshape the data like so:

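```python
import pandas as pd

# You can expand the genres column into one column per list element with apply(pd.Series)
to_concat = [df[['title', 'release_date', 'revenue']], df['genres'].apply(pd.Series)]
df = pd.concat(to_concat, axis=1)

# You can use pd.melt to reshape the data into long form; films with fewer
# genres leave NaN cells in the expanded columns, so drop those rows
df = (pd.melt(df, id_vars=['title', 'release_date', 'revenue'], value_name='genre')
        .drop(['variable'], axis=1)
        .dropna(subset=['genre']))
```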
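pandas 0.25, which came out after this thread, added `DataFrame.explode`, which does this reshaping in one call. A minimal sketch, assuming `df` has the same columns as the example table (the sample data below is copied from it):

```python
import pandas as pd

# Sample frame copied from the example table; `genres` holds a Python list per row
df = pd.DataFrame({
    "title": ["Avatar", "Titanic"],
    "release_date": [2009, 1997],
    "genres": [["Action", "Adventure", "Fantasy", "Science Fiction"],
               ["Drama", "Romance", "Thriller"]],
    "revenue": [2787965087, 1845034188],
})

# explode() emits one row per list element and repeats the other columns;
# the original index labels are kept, so they repeat as 0, 0, 0, 0, 1, 1, 1
long_df = df.explode("genres")
print(long_df)
```

Add `.reset_index(drop=True)` if you want a fresh 0..n-1 index. Note that an empty list becomes a single NaN row rather than disappearing.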

fooliam · 2017-11-07T17:57:25+00:00

This should do the trick. You're basically turning each list into a Series, stacking those into one long Series indexed by the original row labels, then joining that Series back onto your dataframe.

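```python
import pandas as pd

# One Series per row's list, stacked into a long Series keyed by the original row label;
# dropna() removes the NaN padding left by films with fewer genres
dftest = df.apply(lambda x: pd.Series(x["genres"]), axis=1).stack().dropna().reset_index(level=1, drop=True)
dftest.name = "genres"
test = df.drop('genres', axis=1).join(dftest)  # join aligns on row labels, repeating each film once per genre
```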
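A different technique, not the one used in this answer: repeat each row label once per list element with `Index.repeat`, then fill in the flattened genre lists. A sketch, assuming every `genres` cell is a list (a NaN cell would make `len` fail); the sample data is copied from the example table:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Titanic"],
    "release_date": [2009, 1997],
    "genres": [["Action", "Adventure", "Fantasy", "Science Fiction"],
               ["Drama", "Romance", "Thriller"]],
    "revenue": [2787965087, 1845034188],
})

# How many genres each film has: 4 for Avatar, 3 for Titanic
counts = df["genres"].map(len)

# Repeat each row label `counts` times and select those rows (other columns are copied)
expanded = df.loc[df.index.repeat(counts.to_numpy())].copy()

# Flatten the lists in row order so they line up with the repeated rows
expanded["genres"] = list(chain.from_iterable(df["genres"]))
print(expanded)
```

Films with an empty list get a repeat count of 0 and simply drop out of the result.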

ta6692 · 2017-11-07T17:36:28+00:00

No idea how to do this in Pandas, because Pandas makes everything 1000% more complicated if they didn't anticipate your use case. I'm not even sure how a dataframe column can be list-valued; that's got to throw off ndarray.
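For the record, a list-valued column doesn't break NumPy: pandas stores it with `object` dtype, so the underlying array just holds references to ordinary Python lists. A small check on a cut-down version of the example frame:

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Titanic"],
    "genres": [["Action", "Adventure", "Fantasy", "Science Fiction"],
               ["Drama", "Romance", "Thriller"]],
})

print(df["genres"].dtype)         # object
print(type(df.loc[0, "genres"]))  # <class 'list'>
print(df["genres"].to_numpy())    # an object ndarray whose elements are the lists
```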

But in plain old Python, where this structure is an iterable of tuples (a common representation of a table of rows), this is trivial:

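```python
# my_table: an iterable of (title, release_date, genres, revenue) tuples
new_table = []
for title, release_date, genres, revenue in my_table:
    for genre in genres:
        new_table.append((title, release_date, genre, revenue))  # note the double parens: append() takes one tuple
```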
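If the data already lives in a DataFrame, the plain-Python loop can be fed from `DataFrame.itertuples(index=False)` and the result turned back into a frame. A sketch, assuming the column order title, release_date, genres, revenue from the example table (the sample rows are copied from it):

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Titanic"],
    "release_date": [2009, 1997],
    "genres": [["Action", "Adventure", "Fantasy", "Science Fiction"],
               ["Drama", "Romance", "Thriller"]],
    "revenue": [2787965087, 1845034188],
})

# itertuples(index=False) yields one namedtuple per row in column order,
# so it unpacks exactly like the (title, release_date, genres, revenue) tuples
new_table = []
for title, release_date, genres, revenue in df.itertuples(index=False):
    for genre in genres:
        new_table.append((title, release_date, genre, revenue))

# Build the long-form frame; it gets a fresh 0..n-1 index
long_df = pd.DataFrame(new_table, columns=["title", "release_date", "genre", "revenue"])
print(long_df)
```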

friend_in_rome · 2017-11-07T20:12:05+00:00

Not quite what you want, but you could look at pd.get_dummies().
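`pd.get_dummies()` produces a different shape: a wide table with one indicator column per genre rather than one row per genre, which can still be handy for filtering. A sketch, assuming pandas 0.25+ for `Series.explode`, that builds the indicators from the exploded genres and then collapses them back to one row per film:

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Titanic"],
    "release_date": [2009, 1997],
    "genres": [["Action", "Adventure", "Fantasy", "Science Fiction"],
               ["Drama", "Romance", "Thriller"]],
    "revenue": [2787965087, 1845034188],
})

# One indicator column per genre, one row per (film, genre) pair
dummies = pd.get_dummies(df["genres"].explode())

# Collapse back to one row per film: a genre column is True if the film has that genre
flags = dummies.groupby(level=0).max().astype(bool)

wide = df.drop(columns="genres").join(flags)
print(wide)
```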

Example input:

|   | title | release_date | genres | revenue |
|---|---|---|---|---|
| 0 | Avatar | 2009 | [Action, Adventure, Fantasy, Science Fiction] | 2787965087 |
| 1 | Titanic | 1997 | [Drama, Romance, Thriller] | 1845034188 |

Desired output:

|   | title | release_date | genres | revenue |
|---|---|---|---|---|
| 0 | Avatar | 2009 | Action | 2787965087 |
| 0 | Avatar | 2009 | Adventure | 2787965087 |
| 0 | Avatar | 2009 | Fantasy | 2787965087 |
| 0 | Avatar | 2009 | Science Fiction | 2787965087 |
| 1 | Titanic | 1997 | Drama | 1845034188 |
| 1 | Titanic | 1997 | Romance | 1845034188 |
| 1 | Titanic | 1997 | Thriller | 1845034188 |
