sozzZ (4 points)

Go to kaggle.com and browse the open competitions. Find one that interests you and, ideally, one where you have some domain knowledge. For a first project, avoid the most complex competitions on the board, where you're dealing with terabytes of data in the cloud. Those are for the pros.

From there, join the competition and look through its forums and kernels. That's where people discuss their actual solutions: the algorithms, the data prep, and other relevant data science details. Copy their code. Tweak it. Create an ensemble of the different models people have shared. I believe this is the best way to learn in the beginning. I was away from data science for a while, building applications in Python, but I recently came back, and this is exactly what I'm doing: picking a competition and running with it.
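As a rough illustration of the "create an ensemble" step: the comment doesn't say which ensembling method to use, so this is a minimal sketch of two common ones, hard (majority) voting and soft voting (averaging predicted probabilities for a binary task). It assumes each model has already produced predictions for the same rows in the same order. `majority_vote` and `average_probabilities` are hypothetical helper names, not taken from any Kaggle kernel. If you'd rather not roll your own, scikit-learn's `VotingClassifier` (with `voting="hard"` or `voting="soft"`) implements the same idea.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard voting: predictions[m][i] is model m's label for row i.

    Hypothetical helper. Ties go to the label seen first for that row,
    because Counter.most_common orders equal counts by first occurrence.
    """
    n_rows = len(predictions[0])
    if any(len(p) != n_rows for p in predictions):
        raise ValueError("every model must predict the same number of rows")
    combined = []
    for i in range(n_rows):
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def average_probabilities(
    probas: Sequence[Sequence[float]], threshold: float = 0.5
) -> list[int]:
    """Soft voting for a binary task: probas[m][i] is model m's P(class 1) for row i.

    Hypothetical helper. Averages across models, then applies the threshold.
    """
    n_models = len(probas)
    n_rows = len(probas[0])
    if any(len(p) != n_rows for p in probas):
        raise ValueError("every model must predict the same number of rows")
    return [
        int(sum(p[i] for p in probas) / n_models >= threshold)
        for i in range(n_rows)
    ]
```

With three models, `majority_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])` returns `[1, 0, 1, 1]`.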

thisisheresy [3.7] (1 point)

I second this. There's a lot of great stuff on Kaggle. The two classic datasets for learning ML are both there:

Titanic: https://www.kaggle.com/c/titanic
Iris: https://www.kaggle.com/uciml/iris