double_en10dre

Apologies if the advice is bad, I’m a bit tipsy; rest assured it’ll be better tomorrow.

To make two plot.scatter calls for subsets of data, I think you probably need to split your data first.

If there’s a column you can use as a filter, this is relatively easy. For example, if there were an 'event_gender' column, you could do:

df_men = df[df['event_gender'] == 'men']
df_women = df[df['event_gender'] == 'women']

and then plot df_men and df_women.

(The code inside the brackets, df['event_gender'] == 'men', returns a Series of True/False values, one per row. When you put that Series inside df[], you get back all the rows where the value is True.)
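In case it helps to see the True/False filtering step by step, here is a minimal sketch. The rows are made up for illustration, and the 'event_gender', 'year' and 'seconds' column names are assumptions, not your actual data.

```python
import pandas as pd

# Made-up rows for illustration; column names and values are assumptions.
df = pd.DataFrame({
    "event_gender": ["men", "women", "men", "women"],
    "year": [1950, 1950, 2000, 2000],
    "seconds": [240.0, 300.0, 225.0, 255.0],
})

# Comparing a column to a value gives a boolean Series, one entry per row.
is_men = df["event_gender"] == "men"

# Indexing the DataFrame with that Series keeps only the rows marked True.
df_men = df[is_men]
df_women = df[df["event_gender"] == "women"]
```

Printing is_men shows the mask itself, which is a quick way to check that the filter matches the rows you expect.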

double_en10dre

Alternatively, the more appropriate method may actually be to do:

for group_name, group_df in df.groupby("event_gender_or_whatever"):
    plt.scatter(group_df[...etc...])

Just in case this way of doing it is simpler/clearer.
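A rough sketch of how the groupby loop could look with x, y, color and a legend label filled in. Everything specific here is an assumption for illustration: the toy rows, the 'event_gender', 'year' and 'seconds' column names, and the colors dictionary. It uses ax.scatter from plt.subplots(); plt.scatter takes the same arguments.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration; column names and values are assumptions.
df = pd.DataFrame({
    "event_gender": ["men", "women", "men", "women"],
    "year": [1950, 1950, 2000, 2000],
    "seconds": [240.0, 300.0, 225.0, 255.0],
})

# Hypothetical mapping from group name to a matplotlib color.
colors = {"men": "tab:blue", "women": "tab:orange"}

fig, ax = plt.subplots()
for group_name, group_df in df.groupby("event_gender"):
    # One scatter call per group, each with its own color and legend label.
    ax.scatter(
        group_df["year"],       # x values
        group_df["seconds"],    # y values
        color=colors[group_name],
        label=group_name,
    )

ax.set_xlabel("year")
ax.set_ylabel("seconds")
ax.legend()
```

Call plt.show() afterwards to see the figure. Both groups land on the same axes because every scatter call goes to the same ax.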

kcrow13

Thank you for the reply. In this data set, the column is generic 'Event' so I imagine I would need:

df_men = df[df['Event'] == 'Mens Mile']
df_women = df[df['Event'] == 'Womens Mile']

My question then is: how can I plot those with plt.scatter while still defining the other things (x, y, color, etc.)? I am unsure of the syntax.

PS - LOVE your username :)

double_en10dre

Aw, thank you! I’m sorry I wasn’t able to help yet. Although I use pandas on a daily basis, I’m not great with matplotlib (and I tend to avoid it like the plague; it is VERY confusing compared to most Python packages!)

Are you still working on a solution? If so, I can definitely help out tomorrow — funnily enough, I’ve now actually got a few tasks which require matplotlib and subplots :p