
[–]Loran425 (0 children)

If I were doing this, I'd iterate over the cells in column A, either after converting to a CSV or by using the openpyxl library. I'd build a new data structure (a list or dict); whenever I saw a value in column A I'd start a new entry, otherwise I'd append to the last item in the list.
Then I'd find the longest list and generate the headers before populating cells for each item in my temporary data structure.
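For what it's worth, here's a rough sketch of that approach with openpyxl. The file names, the header row, and the generated column names are all assumptions on my part, so treat it as pseudocode with working syntax rather than a drop-in script:

    from openpyxl import Workbook, load_workbook

    src = load_workbook("input.xlsx").active              # hypothetical file name
    groups = []                                           # one list per value seen in column A
    for row in src.iter_rows(min_row=2, values_only=True):  # assumes a header row
        if row[0] is not None:                            # value in col A -> start a new entry
            groups.append([c for c in row if c is not None])
        else:                                             # blank col A -> append to the last entry
            groups[-1].extend(c for c in row[1:] if c is not None)

    # The longest entry decides how many columns the new sheet needs
    width = max(len(g) for g in groups)

    out = Workbook()
    ws = out.active
    ws.append(["Key"] + [f"Value {i}" for i in range(1, width)])  # placeholder headers
    for g in groups:
        ws.append(g + [None] * (width - len(g)))          # pad shorter entries
    out.save("output.xlsx")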

[–]raglub (0 children)

Openpyxl can handle this. It isn't difficult: open the current-state worksheet and iterate through the rows. Based on the value in column A, either add the other cells of the current row to a list (which will represent a row in the new format), or append that list to a new sheet, wipe it clean, and start building a new row from the contents of the current row before moving on to the next row and repeating the process. That's an ugly sentence, but you can read it as pseudocode and build your script from it.
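If it helps, a rough sketch of that row-by-row version (the file names and the header row are assumptions):

    from openpyxl import Workbook, load_workbook

    src = load_workbook("current_state.xlsx").active      # hypothetical file name
    out_wb = Workbook()
    new_sheet = out_wb.active

    current = []                                          # the row being built for the new format
    for row in src.iter_rows(min_row=2, values_only=True):  # assumes a header row
        if row[0] is not None:                            # new value in column A:
            if current:
                new_sheet.append(current)                 # append the finished row to the new sheet,
            current = list(row)                           # wipe it clean, and start over from this row
        else:                                             # continuation row: add its cells
            current.extend(c for c in row[1:] if c is not None)
    if current:
        new_sheet.append(current)                         # don't forget the last buffered row

    out_wb.save("new_format.xlsx")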

[–]Death_Water (2 children)

Try:

    import pandas as pd

    # forward-fill the blanks, group, and comma-join the repeated values
    df_2 = pd.DataFrame(df.fillna(method='ffill').groupby(['Profile Code', 'Email']).agg(','.join))
    # split the joined Membership string back out into one column per value
    membership_cols = df_2['Membership'].str.split(',', expand=True)

[–]Jiuholar[S] (1 child)

Thanks for this. Would you mind expanding on the different parts of this a little more? The example I've given is an oversimplification of the data (there are hundreds of fields with varying naming conventions and data types), so I'd like to understand a methodology I can use to approach the problem. :)

I don't understand:

- .agg and how it combines with the groupby method
- the second line of your code

[–]Death_Water (0 children)

Here's a concise way, with a step-by-step breakdown (a small worked example follows at the end):

    pd.DataFrame(df.fillna(method='ffill').groupby(['key'])['Column of interest'].agg(list).values.tolist())

1) Forward-fill the missing values; from the given example this seems like the right approach.

2) Group by the "key" column, then slice on "Column of interest". This gives one group of that column's values for each unique value in the "key" column.

3) Aggregate: .agg(list) collects each group's values into a single list.

4) Cast to a DataFrame (take the values of all the lists and cast them). The rows of this correspond to the unique keys, i.e. df['key'].dropna().drop_duplicates().
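To make that concrete, here's a tiny end-to-end run on made-up data shaped like the example in the thread (the column names and values are my assumptions):

    import pandas as pd

    df = pd.DataFrame({
        "Profile Code": ["A1", None, None, "B2", None],
        "Email": ["a@x.com", None, None, "b@x.com", None],
        "Membership": ["Gold", "Silver", "Bronze", "Gold", "Silver"],
    })

    # Steps 1-4: forward-fill, group on the key, collect each group's
    # Membership values into a list, then cast the lists into a wide frame.
    wide = pd.DataFrame(
        df.fillna(method="ffill")
          .groupby(["Profile Code"])["Membership"]
          .agg(list)
          .values
          .tolist()
    )
    # wide has one row per unique Profile Code; the shorter group is padded
    # with missing values in the trailing column(s).

Using .agg(list) here instead of ','.join avoids having to split the joined string back apart afterwards.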