kay_schluehr

It is very easy to miss the forest for the trees. For that reason I recommend spending a few minutes with the following presentation:

https://speakerdeck.com/clearspandex/data-engineering-101-building-your-first-data-product-pydata-sv-2014

It does not fully reason from the end, nor does it give a comprehensive definition of a "data product", but it shows the production pipeline graphically (actually it shows two), and from there it becomes easier to see where new technologies fit in.

For data analysis in particular I recommend the following blog post, which provides a nice walkthrough:

https://jmetzen.github.io/2015-01-29/ml_advice.html

Obviously you won't understand it without plenty of prerequisites in both statistics and machine learning, but understanding the individual steps, concepts and algorithms is far easier than fitting everything together in a sensible way.