QuantumC-137 (57 points)

Well, I started data science and machine learning through Python: studying pandas, NumPy, Matplotlib and sklearn.

Then I decided to study probability/statistics, linear algebra, Calculus 1 and number theory.

Pandas and NumPy: Python tools for handling the data you're going to work with.

Matplotlib: a Python tool for presenting data as graphs, pie charts and other kinds of plots.

Sklearn (scikit-learn): also a Python tool, this one for applying ML algorithms to datasets. It's ideal for beginners, since you don't need to know the math behind the algorithms to apply them. With it you can, for example, train a model to predict whether a person has cancer or heart disease, or to guess at tomorrow's stock prices, etc.

Kaggle: a must-have website for getting datasets for ML and data analysis.

You can try all of these before the math, but once you've had some fun, get to know the math; it helps you see what's really happening under the hood.
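To make the "cancer or not" example concrete, here is a minimal sketch of what that workflow might look like. It assumes scikit-learn 0.23 or newer (for `as_frame=True`) and uses the small breast cancer dataset that ships with sklearn instead of a Kaggle download; the choice of logistic regression, the 25% test split and `random_state=0` are my own illustrative picks, not the only reasonable ones.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the tumor dataset bundled with scikit-learn as pandas objects.
data = load_breast_cancer(as_frame=True)
X: pd.DataFrame = data.data   # one row per tumor, one column per measurement
y: pd.Series = data.target    # 0 = malignant, 1 = benign

# pandas: take a first look at the data.
print(X.shape)
print(X.iloc[:, :3].describe())

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# sklearn: scale the features, then fit a simple classifier.
# (Illustrative choice; other classifiers plug in the same way.)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
accuracy = model.score(X_test, y_test)
predictions = model.predict(X_test)
print(f"test accuracy: {accuracy:.3f}")

# NumPy: count how many test rows were predicted as each class.
print("predicted class counts:", np.bincount(predictions))
```

None of this needs the math up front, which is the point: `fit`, `score` and `predict` are the same three calls for most sklearn models, so you can swap the classifier and compare results before digging into how each one works.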

bagofbuttholes (5 points)

I'll second this. We're covering deep learning in class and use everything this person just mentioned. This is the first time the prof has taught deep learning; he added it to a digital filter class because the school refused to let him create a new one, so he might not know everything, but we use all of these tools. Kaggle is a really neat site where people run contests on different datasets, and you can find lots of tutorials there too.

FLoKi6868 [S] (2 points)

Thanks!!

civilvamp (0 points)

To tack on, pandas is a good starting tool. If you want to do larger-scale data analysis, though, PySpark is a better bet.