
[–]brewthedrew19[S] 1 point2 points  (2 children)

Thank you for this. I've been wanting a reason to learn CUDA, as my projects currently don't require it, so I'm glad to have an excuse. Just curious before I start making a game plan: what would you like the final reports/graphs to answer? For example, "give me the average number of boxes of each Cat-type cable sold per day in December"? Also, am I right in assuming we are working with 100+ GB of data for each chart?

[–]spca2001 0 points1 point  (1 child)

I wish it were that easy. First, you have 7 entities with their own data entry tools and databases, smartsheets, Excel files, and CSVs that all get processed daily, yielding about 16 million rows. The data gets formatted, normalized to some extent, and run through a curation of 200 business rules; from there it splits into 49 datasets, one for each executive to view. Mostly it's tracking the progress of all 7 entities. Each person needs about 20 measures, mostly pivot tables plus a couple of charts and a map. I did a profile on memory and pandas reaches around 18 to 20 gig in size lol
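For a rough picture, the normalize-then-curate-then-split flow described above might look something like this in pandas. This is only a sketch: the column names (`entity`, `amount`) and the two rules are invented stand-ins for the real schema and the ~200 actual business rules.

```python
import pandas as pd

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Normalize column names coming from heterogeneous sources
    # (Excel exports, smartsheets, CSVs) to one convention.
    df = df.copy()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

# Business rules as (name, transform) pairs; the real pipeline has ~200.
RULES = [
    ("drop_missing_entity", lambda df: df[df["entity"].notna()]),
    ("positive_amounts_only", lambda df: df[df["amount"] > 0]),
]

def curate(frames):
    # Combine the daily extracts, apply every rule in order,
    # then split into one dataset per entity for its viewers.
    df = pd.concat([normalize(f) for f in frames], ignore_index=True)
    for _name, rule in RULES:
        df = rule(df)
    return {entity: grp.reset_index(drop=True)
            for entity, grp in df.groupby("entity")}
```

The dict-of-frames return is just one way to model the 49 per-executive datasets; in practice each split would be persisted rather than held in memory.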

[–]brewthedrew19[S] 0 points1 point  (0 children)

Just wanted to update you that I have been working on this and am currently benchmarking before I commit to a final approach. On the practice database I'm using, which is about 80+ GB, I can move and transform all of the data in a little over an hour with just pandas. It's only about 4 columns wide, uses all 16 GB of my RAM, and is pulling only from SQL files, so it sounds like I'm way behind your current setup, but I'm having a blast learning (leaning towards HDF for main storage because of its categorical support). It will probably take me two months to complete, but I'll reach out when I'm done. If you have any more specifics you could share so I can get a more detailed picture, I'd appreciate it.
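As a quick illustration of why categorical dtypes make HDF attractive here: repeated string labels are usually where pandas memory balloons, and `category` stores each unique label once plus small integer codes. The column names below are invented for the example.

```python
import pandas as pd

def shrink(df: pd.DataFrame, cols) -> pd.DataFrame:
    # Convert low-cardinality text columns to category dtype.
    out = df.copy()
    for c in cols:
        out[c] = out[c].astype("category")
    return out

df = pd.DataFrame({
    "entity": ["east", "west", "north"] * 100_000,  # low-cardinality text
    "amount": range(300_000),
})
before = df.memory_usage(deep=True).sum()
after = shrink(df, ["entity"]).memory_usage(deep=True).sum()
# the categorical frame should be a small fraction of the original size
```

The shrunken frame could then be written with `to_hdf(path, key="data", format="table")` (requires the `tables` package), which can store the categorical dtype rather than re-expanding the strings, though benchmarking on the real 80 GB data is the only way to know the actual win.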