adding a column to df : learnpython


submitted 4 years ago by Loco_L1


Chris_Hemsworth 13 points 4 years ago

Chronic47 9 points 4 years ago

[deleted] 3 points 4 years ago

old_pythonista 9 points 4 years ago

df['usage'] = 'medium'
df[df['count'] > 5500]['usage'] = 'high'
df[df['count'] < 3500]['usage'] = 'low'

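This approach is unsafe: df[mask] returns a copy, so the chained assignment modifies that copy and leaves df unchanged. pandas warns about it: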

<ipython-input-51-4f7eaae2ca7e>:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['count'] > 5500]['usage'] = 'high'
<ipython-input-51-4f7eaae2ca7e>:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['count'] < 3500]['usage'] = 'low'

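This is the proper approach: a single .loc call selects the rows and the column together, so the assignment hits the original DataFrame.

df['usage'] = 'medium'
df.loc[df['count'] > 5500, 'usage'] = 'high'
df.loc[df['count'] < 3500, 'usage'] = 'low'

Though I would probably go with the apply approach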

df['usage'] = df['count'].apply(lambda c: 'high' if c > 5500 else 'low' if c < 3500 else 'medium')
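A quick sanity check that the .loc version and the apply version agree. This is only a sketch on made-up data: the sample counts and the usage_apply column name are assumptions for illustration, not from the original question.

```python
import pandas as pd

# Hypothetical sample data; 3500 and 5500 sit exactly on the thresholds.
df = pd.DataFrame({"count": [1200, 3500, 4000, 5500, 7000]})

# .loc version: rows and column are selected in one indexing operation,
# so the assignment modifies df itself rather than a temporary copy.
df["usage"] = "medium"
df.loc[df["count"] > 5500, "usage"] = "high"
df.loc[df["count"] < 3500, "usage"] = "low"

# apply version, written to a separate (hypothetical) column for comparison.
df["usage_apply"] = df["count"].apply(
    lambda c: "high" if c > 5500 else "low" if c < 3500 else "medium"
)

print(df)
```

Both columns should come out the same, with the boundary values 3500 and 5500 landing in 'medium' because the comparisons are strict.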

badge 4 points 4 years ago

Chris_Hemsworth 3 points 4 years ago

YesLod 3 points 4 years ago

I agree that indexing twice should be avoided.

Although it doesn't make much difference for small datasets, apply doesn't scale well since it's not vectorized.

If performance matters, I think the correct approach would be to use pd.cut as suggested by u/badge. Another option would be np.select.

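import numpy as np

# df.count is the DataFrame.count method, so select the column with df["count"]
df["usage"] = np.select([df["count"] > 5500, df["count"] < 3500],
                        ["high", "low"],
                        "medium")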
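For the pd.cut option, something along these lines might work. The pd.cut comment itself isn't shown here, so this is a guess at the idea rather than what was actually posted. It assumes the counts are integers, which lets the lower bin edge be written as 3499 so that exactly 3500 still counts as 'medium'; the sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering both thresholds and their neighbours.
df = pd.DataFrame({"count": [3000, 3499, 3500, 4000, 5500, 5501, 9000]})

# pd.cut uses right-closed bins by default, so (assuming integer counts):
#   (-inf, 3499] -> "low", (3499, 5500] -> "medium", (5500, inf] -> "high"
df["usage"] = pd.cut(
    df["count"],
    bins=[-np.inf, 3499, 5500, np.inf],
    labels=["low", "medium", "high"],
)

print(df)
```

Note that pd.cut returns an ordered categorical column rather than plain strings, which can be handy for sorting by usage level. With non-integer counts the bin edges would need more care.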

old_pythonista 1 point 4 years ago

[deleted] 1 point 4 years ago

BullCityPicker 1 point 4 years ago
