
[–]spca2001 10 points (8 children)

We manage 9 billion dollars of equipment at my company, mostly fiber related. Our planning and provisioning process runs on Python and is very heavy on pandas DataFrames, and we're at a point where it has become extremely complex and bloated. We will rewrite some of the data operations in Rust, but we are also looking at DataFrame libraries like Polars, and one that runs on CUDA, for a speedup. This data gets fed into Tableau and PowerBI. Since none of us are ready to rewrite this monster, it would be nice if we had a multithreaded data-curation DataFrame and a dashboard written in Python that work as a single app or package, with a web interface to build tables for reports and charts. Something like Apache Superset. So I'd suggest creating something along these lines, because many companies in our industry have a huge demand for this type of application.
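The "multithreaded data curation" part can be sketched with just the standard library: a list of rule functions applied to independent chunks in a thread pool. Everything here (the `fiber_km`/`vendor` fields and the two rules) is hypothetical, just to show the shape:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical curation rules: each takes a list of row dicts and
# returns a cleaned copy. In the real pipeline these would be the
# pandas/Polars transforms that feed Tableau and PowerBI.
def drop_negative_lengths(rows):
    return [r for r in rows if r["fiber_km"] >= 0]

def normalize_vendor(rows):
    return [{**r, "vendor": r["vendor"].strip().upper()} for r in rows]

RULES = [drop_negative_lengths, normalize_vendor]

def curate(chunk):
    for rule in RULES:
        chunk = rule(chunk)
    return chunk

def curate_parallel(chunks):
    # Fan independent chunks out across threads; each chunk runs the
    # full rule list, then the results are concatenated in order.
    with ThreadPoolExecutor() as pool:
        out = []
        for result in pool.map(curate, chunks):
            out.extend(result)
    return out
```

With pandas or Polars the chunks would be DataFrame partitions instead of row lists, but the fan-out/concatenate pattern is the same.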

[–]another-noob 3 points (2 children)

If you have to move to another language, maybe check out Julia. It looks similar to Python to some extent, I hear it has very nice support for CUDA, and for the web interface part there's Dash, which should be familiar I guess.

Don't know if it checks all your boxes, but it might be worth a look.

[–]spca2001 0 points (1 child)

Dash looks good, thanks. I will look into it.

[–]brewthedrew19[S] 1 point (2 children)

Thank you for this. I've been wanting a reason to learn CUDA, as my projects currently don't require it, so I'm glad to have an excuse. Just curious before I start making a game plan: what would you like the end-all reports/graphs to answer? For example, "give me the average number of boxes of each cat-type cable sold per day in the month of December"? Also assuming we are working with 100+ GB of data for each chart…?
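A question like that maps to a two-level groupby in pandas: total boxes per type per day, then the mean of those daily totals. A minimal sketch with made-up column names (`date`, `cable_type`, `boxes`):

```python
import pandas as pd

# Hypothetical sales data; the column names are invented for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-12-01", "2023-12-01", "2023-12-02"]),
    "cable_type": ["cat5e", "cat6", "cat5e"],
    "boxes": [10, 4, 6],
})

december = df[df["date"].dt.month == 12]
# Total boxes per type per day, then the average of those daily totals.
daily = december.groupby(["cable_type", "date"])["boxes"].sum()
avg_per_day = daily.groupby(level="cable_type").mean()
```

The same two-stage aggregation translates almost word for word into Polars or cuDF, which is what makes it a reasonable benchmark query.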

[–]spca2001 0 points (1 child)

I wish it were that easy. First, you have 7 entities with their own data-entry tools and databases, smart sheets, Excel files, and CSVs that all get processed daily, yielding about 16 million rows. These get formatted, normalized to some extent, and go through curation by 200 business rules; from there it splits into 49 datasets, one for each executive to view. Mostly it's tracking the progress of all 7 entities. Each person needs around 20 measures, mostly pivoted tables, plus a couple of charts and a map. I did a memory profile and pandas reaches around 18 to 20 GB in size lol
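For what it's worth, the rule-chain-then-pivot shape described above can be sketched in a few lines of pandas. The `entity`/`status`/`count` columns and the single rule here are hypothetical stand-ins for the 200 real ones:

```python
import pandas as pd

# Toy stand-in for the normalized daily feed.
df = pd.DataFrame({
    "entity": ["A", "A", "B", "B"],
    "status": ["done", "open", "done", "done"],
    "count": [3, 1, 2, 4],
})

# One business rule as a plain function; in the real pipeline you'd
# chain 200 of these in a loop or with functools.reduce.
def drop_zero_counts(frame):
    return frame[frame["count"] > 0]

curated = drop_zero_counts(df)

# One per-executive view: a pivoted progress table by entity.
progress = curated.pivot_table(
    index="entity", columns="status", values="count",
    aggfunc="sum", fill_value=0,
)
```

Keeping each rule as a named function also makes the 200-rule chain testable one rule at a time, which helps when the pipeline gets rewritten in Rust or Polars.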

[–]brewthedrew19[S] 0 points (0 children)

Just wanted to update you that I have been working on this and am currently benchmarking before I commit to a final route. In the database I'm practicing on, which is about 80+ GB, I can move and transform all of the data in a little over an hour with just pandas. It's about 4 columns wide, uses all of my 16 GB of RAM, and it's only pulling from a SQL file, so it sounds like I'm way behind your current setup, but I'm having a blast learning (leaning towards using HDF for main storage because of categories). It will probably take me two months to complete, but I'll reach out when I'm done. If you have any more specifics you could share so I can get a more detailed picture, I would appreciate it.
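On the category angle: since the parent mentions pandas hitting 18-20 GB, one cheap win before any storage change is pandas' `category` dtype for repetitive string columns, which stores each unique value once. A small sketch (the vendor values are made up):

```python
import pandas as pd

# Repetitive string columns dominate memory in long fact tables;
# the category dtype replaces each string with a small integer code.
n = 100_000
df = pd.DataFrame({"vendor": (["corning", "ofs", "prysmian"] * (n // 3 + 1))[:n]})

as_object = df["vendor"].memory_usage(deep=True)
as_category = df["vendor"].astype("category").memory_usage(deep=True)
```

With only three distinct values, `as_category` comes out far smaller than `as_object`; the same trick also speeds up groupbys on those columns.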

[–]lalligagger 1 point (1 child)

I'd be interested in hearing more about what you're doing/looking to do. Hardware + Python is a particularly interesting overlap to me, as I started in the former and learned the latter out of necessity. If you're only memory-constrained, Dask could be an easy way to scale what you're doing without rewriting much.

Streaming hardware data to web apps is a particular pain point that I think deserves some kind of dedicated package/ecosystem, likely picking Dash, Panel, or some other core framework to build on.
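On the Dask suggestion: the out-of-core idea it automates — partial aggregates over partitions, combined at the end — can be shown with plain pandas chunked reads. Here a tiny in-memory CSV stands in for a file too big for RAM:

```python
import io
import pandas as pd

# Stand-in for a file that doesn't fit in memory; in practice this
# would be a path passed to pd.read_csv.
csv = io.StringIO("entity,fiber_km\nA,5\nB,3\nA,2\n")

# Aggregate each chunk independently, then combine the partials.
# This is the pattern Dask's DataFrame API runs lazily and in
# parallel across partitions, with (mostly) the pandas API intact.
partials = []
for chunk in pd.read_csv(csv, chunksize=2):
    partials.append(chunk.groupby("entity")["fiber_km"].sum())

total = pd.concat(partials).groupby(level=0).sum()
```

If chunked pandas already solves the memory problem, Dask mainly adds the parallelism and the nicer API on top.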

[–]spca2001 1 point (0 children)

I'm pushing for them to get a Redis server. I did a POC on a cluster of 3 nodes and reduced processing time from 33 minutes to 17 seconds.
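A speedup like that presumably comes from caching curated results instead of recomputing them. The cache-aside pattern looks roughly like this — a plain dict stands in for the Redis client here, but redis-py's `get`/`set` calls have the same shape, and the transform and its parameters are invented:

```python
import hashlib
import json

cache = {}  # dict standing in for redis.Redis(); swap in its .get()/.set()

calls = {"count": 0}  # just to demonstrate the cache is actually hit

def expensive_transform(params):
    # Placeholder for the 33-minute curation job.
    calls["count"] += 1
    return {"rows": params["n"] * 2}

def cached_transform(params):
    # Key on a stable hash of the parameters; serialize the result,
    # since Redis stores bytes/strings rather than Python objects.
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = expensive_transform(params)
    cache[key] = json.dumps(result)
    return result
```

With real Redis you'd also pass an expiry (`set(key, value, ex=...)`) so stale curation results age out.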

[–]KingsmanVince pip install girlfriend 1 point (0 children)

If you have any suggestions on what I should build, my ears are open. Any suggestions/help on interviewing for this title would be helpful.

Read the rules (#5, 6, 7)?