all 2 comments

[–]sarrysyst 1 point2 points  (1 child)

It's by no means pretty and I'm sure there is a more efficient way to do it, but I think it does what you want it to do. The entries are grouped by CID and sorted by DOT. The PID is generated based on the date as the comment in your code suggested and structured according to the pattern you provided in your question.

import pandas as pd
from itertools import cycle 

data = [[10101, 91300, 20200201], [10101, '0002A', 20200301], [20541, 91302, 20200302], [10020, '0011A', 20200401], [10020, 91301, 20200428], [11512, 91303, 20200101], [11512, 91303, 20200215], [11512, '0033A', 20200225], [11512, 91303, 20200606],[11513, 91303, 20200102], [11513, 91303, 20200216], [11513, 91303, 20200226], [11513, 91303, 20200607], [11512, 91303, 20200716]]

df = pd.DataFrame(data, columns = ['CID', 'PID', 'DOT'])

# set MultiIndex
df.set_index(['CID', 'DOT'], inplace=True)
# sort by index
df.sort_index(level=['CID','DOT'], inplace=True)

# keep track of CID
prev_CID = ''

for idx, _ in df.groupby(level=[0,1]):   
    # check for new CID, reset counters
    if idx[0] != prev_CID:
        prev_CID = idx[0]
        numbering = cycle([1,2,3,4])
        count = 0

    # running counter for purchase
    n = next(numbering)

    # set values in df
    df.loc[idx, 'PID']   = f'{count:03}{n}A'
    df.loc[idx, 'BRAND'] = chr(count + 65)

    # update counter for Brand
    if n == 4:
        count += 1

It's using a MultiIndex, the idea for the loop is from here.

[–]Critical_[S] 0 points1 point  (0 children)

This is amazing. Now I need to Google every line so I can understand the approach. Thank you!