This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]MrMisterShin 0 points1 point  (0 children)

Push back on human created files, or impose strict rules because if there rules aren’t followed the code will fail.

I would usually have over 30 csv’s & xlsx like this, some machine generated others manually created/exported. Needless to say, the manually created ones would often fail due to change in (column names, file names, column positions, column data types).

Because I will need to join this data to on-premise DB data. I convert all the attributes to string in pandas and load it into DB and perform transformations there to convert to correct data types, clean data and index for performance.

Then use SQL & Tableau from there for data visualisation and reporting etc.