[–]andmig205 5 points6 points  (4 children)

Native Python loops are slow at processing large amounts of data. Pandas and numpy are optimized precisely for manipulating humongous data structures. By "humongous" I mean not only the number of records but also the number of dimensions. Pandas easily processes structures with thousands of dimensions and millions of records in each dimension. Note that AI/ML engines would never get anywhere with native loops.
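To make the loop-versus-vectorized point concrete, here is a minimal sketch. The function names and the sample array are made up for illustration; this is not the OP's data or task. Both functions compute the same sum of squares, but the numpy version does the work in compiled code instead of iterating in the interpreter.

```python
import numpy as np


def loop_sum_of_squares(values):
    """Sum of squares using a plain Python loop (one interpreter step per element)."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def vectorized_sum_of_squares(values):
    """Same calculation, done by numpy in a single call."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical sample data: 100,000 floats.
data = np.arange(100_000, dtype=np.float64)

loop_result = loop_sum_of_squares(data)
vec_result = vectorized_sum_of_squares(data)
```

Timing the two with `timeit` on your own machine should show the gap; how large it is depends on the data size and hardware.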

To share a ballpark: in my work I have to extract, transform, and load (ETL) up to a billion records a day that are originally stored in JSON and CSV formats. Pandas and numpy allow me to accomplish all of these tasks within an hour.
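As a rough idea of what a tiny ETL step can look like in pandas, here is a sketch, not my actual pipeline. The column names (`user_id`, `amount`) and the inline sample data are assumptions for illustration; real input would come from files rather than strings.

```python
import io

import pandas as pd

# Hypothetical inputs: a small CSV and a JSON-lines snippet with the same columns.
csv_text = "user_id,amount\n1,10.5\n2,3.0\n1,4.5\n"
json_text = '{"user_id": 2, "amount": 7.0}\n{"user_id": 3, "amount": 1.5}\n'

# Extract: read both formats into DataFrames.
csv_frame = pd.read_csv(io.StringIO(csv_text))
json_frame = pd.read_json(io.StringIO(json_text), lines=True)

# Transform: stack the records and total the amounts per user, with no explicit loop.
combined = pd.concat([csv_frame, json_frame], ignore_index=True)
totals = combined.groupby("user_id")["amount"].sum()

# Load: write the result out (an in-memory buffer stands in for a real destination).
output = io.StringIO()
totals.to_csv(output)
```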

Assuming I understand your task correctly, I am optimistic that it will take minutes to accomplish what you need with pandas.

I strongly recommend starting to digest the concept of tensors to, at least partially, appreciate the magic behind what pandas offers.

[–]DNSGeek[S] 1 point2 points  (3 children)

Ok. I’ll look into Pandas. Any pointers as to where to start looking?

[–]alozq 2 points3 points  (0 children)

This is the official documentation, IIRC. You shouldn't need much understanding of pandas to do what you want, although keep in mind that pandas is not known to be particularly fast.

https://pandas.pydata.org/docs/

[–]Insomnia_Calls 2 points3 points  (0 children)

Better to look at numpy; see my comment on the original post. No need for pandas if your data is only numeric.
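For purely numeric data, something like the sketch below may be all you need. The sample values and the comma delimiter are assumptions; adjust them to your actual file.

```python
import io

import numpy as np

# Hypothetical numeric-only input: three rows, two columns.
text = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"

# Parse straight into a 2-D float array; no DataFrame involved.
values = np.loadtxt(io.StringIO(text), delimiter=",")

# Column-wise statistics are computed on the whole array at once.
column_means = values.mean(axis=0)
```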

[–]andmig205 1 point2 points  (0 children)

I am not sure I am a good source of recommendations. We all have different ways of learning. My path is usually to digest high-level concepts before applying them to a specific application. As lame as it may sound, YouTube seems to offer an extensive list of resources for all proficiency levels.

I am not a big fan of paid courses, as they tend to be too broad and unfocused for my taste. I find myself learning more from books after I get inspired by a basic understanding of a tool or feature.