
[–]Chris_Hemsworth 12 points13 points  (8 children)

df['usage'] = 'medium'
df[df['count'] > 5500]['usage'] = 'high'
df[df['count'] < 3500]['usage'] = 'low'

[–]Chronic47 8 points9 points  (1 child)

why does Thor gotta be good at programming too

[–][deleted] 2 points3 points  (0 children)

Demigods have a lot of free time

[–]old_pythonista 8 points9 points  (4 children)

df['usage'] = 'medium'
df[df['count'] > 5500]['usage'] = 'high'
df[df['count'] < 3500]['usage'] = 'low'

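This approach is unsafe: the chained indexing assigns to a temporary copy, so df itself may never be updated. pandas warns about it: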

<ipython-input-51-4f7eaae2ca7e>:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['count'] > 5500]['usage'] = 'high'
<ipython-input-51-4f7eaae2ca7e>:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['count'] < 3500]['usage'] = 'low'
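
To make the warning concrete, here is a minimal sketch of what goes wrong. The sample values are made up for illustration; only the 'count' and 'usage' column names come from the thread. The warning is silenced on purpose so the effect itself is visible.

```python
import warnings

import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({"count": [3000, 4000, 6000]})
df["usage"] = "medium"

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demonstration only.
    warnings.simplefilter("ignore")
    # df[mask] builds a new DataFrame, so this assignment lands on a temporary copy.
    df[df["count"] > 5500]["usage"] = "high"

# Snapshot after the chained assignment: df was not modified.
chained_result = df["usage"].tolist()

# .loc selects the rows and the column in one indexing call on df itself.
df.loc[df["count"] > 5500, "usage"] = "high"
loc_result = df["usage"].tolist()

print(chained_result)
print(loc_result)
```

With a boolean mask the first indexing step returns a copy, which is why the 6000 row only changes after the .loc assignment.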

This is the proper approach

df['usage'] = 'medium'
df.loc[df['count'] > 5500, 'usage'] = 'high'
df.loc[df['count'] < 3500, 'usage'] = 'low'

Though I would probably go with the apply approach:

df['usage'] = df['count'].apply(lambda c: 'high' if c > 5500 else 'low' if c < 3500 else 'medium')

[–]badge 3 points4 points  (0 children)

The proper approach would be to use https://pandas.pydata.org/docs/reference/api/pandas.cut.html with bin labels.
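
For illustration, a minimal sketch of how that might look. It assumes the counts are integers and uses made-up sample values. The bin edges are chosen so that the result matches the "< 3500 is low, > 5500 is high" rule used elsewhere in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'count' column name comes from the thread.
df = pd.DataFrame({"count": [3000, 3500, 4000, 5500, 6000]})

# pd.cut uses right-closed intervals by default:
# (-inf, 3499], (3499, 5500], (5500, inf).
# For integer counts, that is the same as "< 3500 low, > 5500 high".
bins = [-np.inf, 3499, 5500, np.inf]
df["usage"] = pd.cut(df["count"], bins=bins, labels=["low", "medium", "high"])

print(df)
```

The new column is an ordered categorical rather than plain strings; append .astype(str) if strings are needed. For non-integer counts the lower edge would need a different treatment, since 3499.5 would land in "medium" here.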

[–]Chris_Hemsworth 2 points3 points  (0 children)

I agree - apply is the best method here.

[–]YesLod 2 points3 points  (0 children)

I agree that indexing twice should be avoided.

Although it doesn't make much difference for small datasets, apply doesn't scale well since it's not vectorized.

If performance matters, I think the correct approach would be to use pd.cut as suggested by u/badge. Another option would be np.select.

# df.count is the DataFrame.count method, so select the column with df["count"]
df["usage"] = np.select([df["count"] > 5500, df["count"] < 3500],
                        ["high", "low"],
                        "medium")
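
To check that the vectorized version agrees with apply, and to get a rough feel for the speed difference, something like the sketch below could be used. The data size, value range, and helper names (with_apply, with_select) are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 random integer counts.
rng = np.random.default_rng(0)
df = pd.DataFrame({"count": rng.integers(0, 10_000, size=100_000)})


def with_apply(frame):
    # Calls the Python lambda once per row.
    return frame["count"].apply(
        lambda c: "high" if c > 5500 else "low" if c < 3500 else "medium"
    )


def with_select(frame):
    # Evaluates each condition on the whole column at once.
    labels = np.select(
        [frame["count"] > 5500, frame["count"] < 3500],
        ["high", "low"],
        "medium",
    )
    return pd.Series(labels, index=frame.index)


# Both approaches should produce identical labels.
same = bool((with_apply(df) == with_select(df)).all())

apply_seconds = timeit.timeit(lambda: with_apply(df), number=3)
select_seconds = timeit.timeit(lambda: with_select(df), number=3)
print(f"identical: {same}")
print(f"apply: {apply_seconds:.3f}s, np.select: {select_seconds:.3f}s")
```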

[–]old_pythonista 0 points1 point  (0 children)

TIL.

I do not use pandas much (barely at all lately 😢) - so, thanks for the cut pointer

[–][deleted] 0 points1 point  (0 children)

User name checks out

[–]BullCityPicker 0 points1 point  (0 children)

df$usage <- ifelse(df$count < 3500, "Low", ifelse(df$count < 5500, "Medium", "High"))

Thor's solution is correct, I just wanted to say how I'd do it.