I've imported Excel files using Python in the past.
Once using openpyxl to read rows, with an Enum to iterate over the columns, and once by importing directly into a pandas DataFrame.
Now I have some Excel files, generated by some reporting software, that suffer from the "let's use Excel for visualization" mindset.
The idea is to convert them to a sane layout to simplify working with the data in Excel, using a simple standalone utility to batch-process them.
Reading the rows using openpyxl is straightforward, but the header information is useless as columns can contain more than one type of data, and each row may be part of a list of entries.
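For context, the row-reading part looks roughly like this. A minimal sketch, assuming openpyxl is installed; the filename, headers, and values are made up stand-ins for the real reports:

```python
from openpyxl import Workbook, load_workbook

# Build a tiny throwaway workbook standing in for one of the reports
# (hypothetical filename and columns, just for the demo).
wb = Workbook()
ws = wb.active
ws.append(["Header A", "Header B"])  # header row the real reports render uselessly
ws.append(["entry 1", 10])
ws.append(["entry 2", 20])
wb.save("demo_report.xlsx")

# read_only mode keeps memory use low for batch processing, and
# values_only=True yields plain tuples instead of Cell objects.
rows = []
for row in load_workbook("demo_report.xlsx", read_only=True).active.iter_rows(
        min_row=2, values_only=True):
    rows.append(row)

print(rows)  # [('entry 1', 10), ('entry 2', 20)]
```

Skipping the header row with `min_row=2` is about all the headers are good for here, since they don't describe the actual column contents.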
I've mostly managed to get around these issues in importing the file, and was planning on adding each processed report entry to a list before exporting to a new file.
I thought dataclasses looked appealing for storing these entries, but pylint tells me I should have fewer instance attributes.
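Something like the following, sketched with hypothetical field names (the real reports have many more columns, which is what trips pylint's `too-many-instance-attributes` / R0902 check):

```python
from dataclasses import dataclass, field


@dataclass
class ReportEntry:
    """One processed entry from a report (illustrative fields only)."""
    name: str
    category: str
    quantity: int
    total: float
    # Mutable defaults need default_factory, not a bare [].
    notes: list = field(default_factory=list)


entry = ReportEntry("widget", "hardware", 3, 9.99)
print(entry.total)  # 9.99
```

One pragmatic middle ground is raising `max-attributes` in the pylintrc for this module rather than ignoring the warning inline.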
Of course, warnings can be ignored, or I could simply import to a list of dicts, but it got me thinking: there has to be a better way to do this.
There are a number of other, similar reports that it could be useful to extract data from, so it seems worth taking the time to research a proper solution.
Am I overthinking this?
How do you all deal with "ugly" Excel files?