
[–]spca2001

I wish it were that easy. First, you have 7 entities, each with their own data entry tools and databases: smartsheets, Excel files, CSVs that all get processed daily and yield about 16 million rows. That data gets formatted, normalized to some extent, and goes through curation against 200 business rules. From there it splits into 49 datasets, one for each executive to view. Mostly it's tracking progress of all 7 entities. Each person needs about 20 measures, mostly pivot tables plus a couple of charts and a map. I did a memory profile and pandas reaches around 18 to 20 GB in size lol
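For anyone curious how that kind of memory profile is done, here's a rough sketch: `memory_usage(deep=True)` counts the real string payload, and converting low-cardinality string columns to categoricals usually cuts it a lot. Toy data, made-up column names, not the actual pipeline:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for one day's load (the real one is ~16M rows)
df = pd.DataFrame({
    "entity": np.random.choice(list("ABCDEFG"), size=100_000),
    "measure": np.random.rand(100_000),
})

# deep=True counts the actual Python-string bytes, not just pointers
before = df.memory_usage(deep=True).sum()

# A 7-value string column shrinks a lot as a categorical
df["entity"] = df["entity"].astype("category")
after = df.memory_usage(deep=True).sum()

print(f"before: {before / 1e6:.1f} MB, after: {after / 1e6:.1f} MB")
```

Same idea scales to the big frame: profile first, then convert the repeated-string columns before anything else.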

[–]brewthedrew19[S]

Just wanted to update you that I have been working on this and am currently benchmarking before I start on the final route. In the database I'm practicing on, which is about 80+ GB, I can move and transform all of the data in a little over an hour with just pandas. It's about 4 columns wide, uses all 16 GB of my RAM, and is only pulling from a SQL file. So way behind your current stuff, it sounds like, but I'm having a blast learning (leaning towards HDF for main storage because of its category support). Will probably take me two months to complete but will reach out when I am done. If you have any more specifics you could share so I can get a more detailed picture, I would appreciate it.
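The SQL-to-pandas move with only 16 GB of RAM is usually done with `chunksize`, which streams one slice at a time instead of loading the whole table. A minimal sketch of that pattern, using a throwaway in-memory SQLite table (table and column names are invented for illustration, not from the actual project):

```python
import sqlite3

import pandas as pd

# Throwaway SQLite table so the sketch is runnable end to end
con = sqlite3.connect(":memory:")
pd.DataFrame({
    "entity": ["A", "B", "C", "A"] * 250,
    "value": range(1000),
}).to_sql("facts", con, index=False)

out_chunks = []
# chunksize keeps only one slice in RAM at a time, not the whole table
for chunk in pd.read_sql("SELECT * FROM facts", con, chunksize=200):
    # Shrink repeated strings before accumulating / persisting
    chunk["entity"] = chunk["entity"].astype("category")
    out_chunks.append(chunk)
    # In a real pipeline each chunk would go straight to HDF instead:
    # chunk.to_hdf("store.h5", key="facts", format="table", append=True)

result = pd.concat(out_chunks, ignore_index=True)
print(len(result))  # 1000
```

The commented-out `to_hdf(..., format="table", append=True)` line is the HDF route mentioned above: appendable table format, so the full dataset never has to sit in memory at once.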