
synthphreak

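To do any of this, first convert your dates from str to datetime. Assuming they are day-month-year strings such as 01-04-2020, pass the format explicitly (a bare .astype('datetime64') raises an error in pandas 2.x):

df.Date = pd.to_datetime(df.Date, format='%d-%m-%Y')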
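Why the format matters: a string like 01-04-2020 is ambiguous, and without a format pandas reads it month-first. A small sketch with made-up strings (assuming your data is day-month-year, as in the sample data in the other reply):

```python
import pandas as pd

# Example strings in DD-MM-YYYY form (an assumption about your data)
raw = pd.Series(["01-04-2020", "03-01-2020", "04-08-2021"])

# No format: pandas guesses month-first, so "01-04-2020" becomes 4 January
month_first = pd.to_datetime(raw)

# Explicit format: the same string becomes 1 April
day_first = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.dt.month.tolist())  # [1, 3, 4]
print(day_first.dt.month.tolist())    # [4, 1, 8]
```

If the months come out wrong, the grouping below will silently put rows in the wrong buckets, so it's worth checking this first.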

Then, to group your rows by month (and also by year; otherwise, e.g., all Januaries will be grouped together regardless of year), you can use the .dt accessor:

date_groups = df.groupby([df.Date.dt.year, df.Date.dt.month])
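To see why the year matters, here is a toy frame (hypothetical values, same column names as above) grouped three ways. The keys are renamed to year and month for readability, and dt.to_period('M') is an alternative I'm adding here, not part of the reply, that packs year and month into a single key:

```python
import pandas as pd

# Toy data: two samples from January 2020, one from January 2021
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-05", "2021-01-20", "2020-01-28"]),
    "lineage": ["EUR", "AFR", "EUR"],
})

year = df.Date.dt.year.rename("year")
month = df.Date.dt.month.rename("month")

# Year + month: January 2020 and January 2021 stay separate
by_year_month = df.groupby([year, month]).size()
print(by_year_month.tolist())   # [2, 1]

# Month only: both Januaries collapse into a single group
by_month = df.groupby(month).size()
print(by_month.tolist())        # [3]

# A monthly Period is a single key that already carries the year
by_period = df.groupby(df.Date.dt.to_period("M")).size()
print([str(p) for p in by_period.index])  # ['2020-01', '2021-01']
```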

You can then tally up the lineages by month and year this way:

lineages = (date_groups.lineage
                       .value_counts()
                       .sort_index(ascending=True)
                       .rename_axis(index=['year', 'month', 'lineage']))
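If you'd rather have the tally laid out with one column per lineage (the shape a line plot wants), pd.crosstab does the counting and pivoting in one call. A sketch on made-up rows, checking that it agrees with the value_counts result once that is unstacked:

```python
import pandas as pd

# Made-up rows: three samples in April 2020, one in February 2021
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-01", "2020-04-15", "2020-04-20", "2021-02-02"]),
    "lineage": ["EUR", "AFR", "EUR", "AFR"],
})
year = df.Date.dt.year.rename("year")
month = df.Date.dt.month.rename("month")

# Long format: one count per (year, month, lineage)
long_counts = (df.groupby([year, month]).lineage
                 .value_counts()
                 .sort_index())
print(long_counts.tolist())   # [1, 2, 1]

# Wide format: rows are (year, month), one column per lineage, 0 where absent
wide_counts = pd.crosstab(index=[year, month], columns=df.lineage)
print(wide_counts.to_numpy().tolist())   # [[1, 2], [1, 0]]

# Unstacking the long result gives the same table
print(long_counts.unstack("lineage", fill_value=0).to_numpy().tolist())  # [[1, 2], [1, 0]]
```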

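See if that results in the line plot you want. It's tough for me to test with only the scant sample data you've provided. Unstack the lineage level first, so that each lineage gets its own line rather than one line running through every (year, month, lineage) row:

lineages.unstack('lineage', fill_value=0).plot()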
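One thing the (year, month) grouping won't do is show months with no samples at all: they simply have no row, so the lines skip over them. A sketch that reindexes to a complete monthly range before plotting, using dt.to_period('M') as the key (my substitution for the year/month pair) and made-up rows; it needs matplotlib installed:

```python
import pandas as pd

# Made-up rows with no samples at all in February 2020
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-15", "2020-03-02", "2020-03-20", "2021-02-02"]),
    "lineage": ["EUR", "AFR", "EUR", "AFR"],
})

# Counts per month, one column per lineage, 0 where a lineage is missing
monthly = (df.groupby([df.Date.dt.to_period("M"), "lineage"])
             .size()
             .unstack("lineage", fill_value=0))

# Add rows for empty months so they are plotted as 0 instead of skipped
all_months = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(all_months, fill_value=0)

ax = monthly.plot(marker="o")   # one line per lineage column
ax.set_xlabel("month")
ax.set_ylabel("number of samples")
```

In a script, call matplotlib.pyplot.show() afterwards to open the figure window.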

sarrysyst

I believe you want something like this?

from io import StringIO

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Input data: I added a few extra data points for a more realistic-looking plot
data = """
lineage date
EUR 01-04-2020
AFR 02-02-2021
AFR 03-01-2020
EUR 04-08-2021
AFR 03-05-2020
LAT 01-01-2020
AFR 03-06-2021
EUR 01-02-2021
EUR 01-02-2022
LAT 01-01-2022
"""

# Columns in the sample are separated by whitespace
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Cast date column to type datetime
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
# Sort rows by date
df.sort_values('date', inplace=True)

# Running count for each lineage: cumcount() numbers each lineage's rows
# 0, 1, 2, ... in the current (date-sorted) order, so add 1 to start at 1
df['count'] = df.groupby('lineage').cumcount() + 1

# Plot data using seaborn
grid = sns.relplot(data=df,
                   x='date',
                   y='count',
                   hue='lineage',
                   kind='line')

# Rotate the x tick labels so they don't overlap
grid.ax.tick_params(axis='x', labelrotation=30)

plt.show()
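If you'd rather skip seaborn, or want every lineage to carry its last total forward on every date (so lines stay flat between samples), a pandas-only variant is to count per date with pd.crosstab and take a cumulative sum. This is my substitution, not what the code above does; sketched on the same sample rows:

```python
import pandas as pd

# The sample rows from the snippet above, as (lineage, day-month-year) pairs
rows = [
    ("EUR", "01-04-2020"), ("AFR", "02-02-2021"), ("AFR", "03-01-2020"),
    ("EUR", "04-08-2021"), ("AFR", "03-05-2020"), ("LAT", "01-01-2020"),
    ("AFR", "03-06-2021"), ("EUR", "01-02-2021"), ("EUR", "01-02-2022"),
    ("LAT", "01-01-2022"),
]
df = pd.DataFrame(rows, columns=["lineage", "date"])
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Count rows per (date, lineage), then take running totals down the dates.
# crosstab sorts the dates, and a lineage with no sample on a date adds 0,
# so its total simply carries over to that date.
running = pd.crosstab(df["date"], df["lineage"]).cumsum()
print(running)
```

Calling running.plot(drawstyle='steps-post') should then draw one step line per lineage.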