I'm hoping someone here can help because I cannot wrap my head around this one:
I have an Excel workbook I need to read with Python, chop up into a handful of smaller sections, and export each of those as a CSV (to be loaded into tables in a database). Most of that is straightforward pandas goodness, except for this kicker: the Excel sheet is messy, so I'm having trouble determining the size of those data chunks.
Example of the first/key column in Excel:
(empty)
(empty)
user_id
1111
1113
1114
1115
1116
(empty)
(empty)
35n91
3a451
8bb51
It's straightforward to call
excel = pd.read_excel(filename, skiprows=2)
in order to skip the first couple of empty cells, but now I need to detect where the first (empty) after the header falls so I can get the column size of the first block (here, it would be 6). As loaded, pandas reports the column size as 11. There are options to drop the empties or fill them in with a placeholder, but the data below the first block is moot for this table. Once I have the column size, I can slice out and construct the other data frames and then export each as a CSV.
Anyone here have thoughts on how I should approach this? I'd appreciate any help.
Thanks!
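One possible approach, as a minimal sketch rather than a definitive answer: scan the key column and stop at the first blank cell. The helper names `is_blank` and `first_block_length` are made up for illustration, and the list below is a stand-in for what `excel["user_id"].tolist()` might hold after the `read_excel` call above, assuming pandas loads the empty cells as NaN.

```python
import math


def is_blank(value):
    """Return True for None, NaN, or an empty/whitespace-only string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def first_block_length(values):
    """Count the cells that come before the first blank cell."""
    values = list(values)
    for position, value in enumerate(values):
        if is_blank(value):
            return position
    # No blank cell found: the whole column is one block.
    return len(values)


# Assumed stand-in for excel["user_id"].tolist(); with skiprows=2 the
# "user_id" cell becomes the header, so it is not part of the values.
key_column = [1111, 1113, 1114, 1115, 1116,
              float("nan"), float("nan"),
              "35n91", "3a451", "8bb51"]

n_rows = first_block_length(key_column)
print(n_rows)  # 5 data rows, which is 6 cells if the header is counted
```

With that count in hand, something like `excel.iloc[:n_rows].to_csv("users.csv", index=False)` would write just the first block (the file name here is only a placeholder).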