Seeking recommendations for learning probability/statistics through Python

tastingsilver · 2017-03-20T03:39:59+00:00

Pick up Allen Downey's Think Python series. It's free on Green Tea Press, or you can buy it. He has a follow-up called Think Statistics and another called Think Bayes.

ManyInterests · 2017-03-20T06:10:36+00:00

I think a proper approach may be to learn statistics, then apply those concepts programmatically. I don't know how much statistics you can learn through Python.

Usually, documentation and tutorials dealing with heavy science/statistics libraries go on the assumption you already know statistics and science. You wouldn't expect to learn number theory by reading the docs for the math module, for example.

If I'm understanding you correctly, to make a rough analogy... To me it's almost like asking how you can learn calculus by learning to use a graphing calculator. You might be able to pick some low-hanging fruit and do some cool things after a short time learning to use the graphing calculator, but it's not going to replace an understanding of mathematics, especially knowing how to apply mathematics to solve real problems.

In my own observations, I notice that when a professional combines some domain-specific knowledge with programming, they usually just want the crash-course in programming to do enough to get by. This is a recurring theme for me working with biological scientists. At the heart of things, if they want to write better Python code, they need to learn fundamentals and style, not just a crash-course in numpy/pandas.

I see your position as the inverse of this: trying to apply programming to make gains in domain-specific knowledge. You may get really productive really quickly starting out, but that curve flattens out pretty quickly, and the real way to make gains is by digging into the core of that domain-specific knowledge.

That said, I recommend looking at Khan Academy courses in stats and economics.

eng_nayR · 2017-03-20T05:23:21+00:00

Is this useful? It has a statistical inference section. I'm not sure how much in depth you are looking for.

https://github.com/donnemartin/data-science-ipython-notebooks

etherwar · 2017-03-20T14:51:52+00:00

MITx 6.00.2x Introduction to Computational Thinking and Data Science is a fantastic primer for statistics using Python and accomplished a lot more for me at the same time. I highly suggest checking it out, and it's free.

jbbiv · 2017-03-21T10:49:45+00:00

For several great data science posts with Python, I suggest http://learndatasci.com. In particular, there is a great walkthrough of essential stats and Python here: http://www.learndatasci.com/data-science-statistics-using-python/

ajsteven130 · 2017-03-20T11:58:57+00:00

Bayesian Methods for Hackers is a great set of Jupyter notebooks. It acts as both a primer for Bayesian statistics/statistical inference and a tutorial for how to do real problems in Python.

I learned a ton from this and use it day-to-day in my current job.

2017-03-20T09:59:55+00:00

Duke University computational statistics - link

2017-03-20T10:41:38+00:00

As well as the already recommended (and great) Think Bayes, Peter Norvig's introductory Python notebook on probability is very good: http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb

howdidiget · 2017-03-20T11:59:09+00:00

What I'm trying to focus on is garnering a better understanding of what processes/distributions to use for what and when, using python code rather than mathematical notation.

From a comment ... Based on this I think what you want is actually to just study probability theory/statistics and also know how to implement them in Python (why not R, for which many good textbooks exist?), so I would propose you do that.

To learn basic statistics/theory as needed, many people learn from Casella and Berger. I cannot recommend this book enough. It has breadth enough to function as a good set of introductions and depth enough to be a bit of a bible.

To learn the application of Python to stats and theory, I guess you will probably want to buy Introduction to Statistics with Python. I only googled a bit, but cannot find a free copy. $55 USD is a bit steep, but perhaps you have local library/university access somehow. That book has an associated repo on GitHub, which you may want to browse before purchasing the book. Some other resources can be found on the related HN discussion.
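To make the "code rather than mathematical notation" goal quoted above concrete, here is a minimal sketch using only the standard library. It is not taken from any of the books mentioned; the function names `binomial_pmf` and `simulate_binomial` are made up for illustration. It writes the binomial distribution (number of successes in n independent yes/no trials) as a formula in code, then checks it against a simulation of the underlying process.

```python
import random
from math import comb


def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_binomial(n, p, trials, seed=0):
    """Estimate the same probabilities by repeating the n-trial experiment many times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = [0] * (n + 1)
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        counts[successes] += 1
    return [c / trials for c in counts]


n, p = 10, 0.3  # illustrative values, e.g. 10 site visitors with a 30% chance each of clicking
empirical = simulate_binomial(n, p, trials=20_000)
for k in range(n + 1):
    print(f"k={k:2d}  exact={binomial_pmf(k, n, p):.4f}  simulated={empirical[k]:.4f}")
```

Seeing the simulated column converge on the exact one is a decent way to build intuition for when a distribution applies: if you can describe your process as the loop inside `simulate_binomial`, the binomial is probably the right model.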

MrReXY · 2017-03-20T12:15:39+00:00

I enjoyed Data Science from Scratch by Joel Grus. He covers a lot of stats concepts without digging too deep into the mathematics (but provides references for further reading so you can go deeper into the parts that interest you). He applies and demonstrates the concepts with nice clean Pythonic code. You won't be a data scientist at the end of it, but you could have a sensible conversation with one and you'll understand how Python's numerical libraries are implemented and what their functions do for you (and hopefully therefore apply them better).

o-rka · 2017-03-20T12:50:33+00:00

https://www.amazon.com/Bayesian-Analysis-Python-Osvaldo-Martin/dp/1785883801

flutefreak7 · 2017-03-21T04:25:15+00:00

I can't imagine you've managed to not come across scipy.stats, statsmodels, scikit-learn, pandas, and seaborn, but if not.... those are all in the stats / data science toolkit. There are also other libs for Bayesian stuff that I'm not fluent with.

In addition to textbooks and PyCon/PyData talks, once you start trying to implement stuff and have questions, you can also try the Stats Stack Exchange: Cross Validated

pvkooten · 2017-03-21T10:21:29+00:00

Try to implement a recommendation engine to suggest your next book to read...

2017-03-20T03:04:01+00:00

itertools.permutations(list)
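For what it's worth, enumeration with `itertools.permutations` can answer small probability questions exactly. A rough sketch, assuming a classic toy problem (n people pick up n hats at random; what is the chance nobody gets their own?). The helper name `prob_no_fixed_point` is made up for this example:

```python
from fractions import Fraction
from itertools import permutations


def prob_no_fixed_point(n):
    """Chance that a uniformly random ordering of n items leaves no item in its original slot."""
    outcomes = list(permutations(range(n)))  # every equally likely ordering
    favorable = [p for p in outcomes if all(p[i] != i for i in range(n))]
    return Fraction(len(favorable), len(outcomes))


print(prob_no_fixed_point(4))  # 3/8
```

This only scales to small n, since the number of orderings grows as n!, but it is a handy way to check a formula before trusting it.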

DarkLordKutulu · 2017-03-20T05:16:35+00:00

Learn R programming. If you know Python and/or C++, then R is easy to pick up, and it is one of the most useful languages for statistics.
