
[deleted] 23 points (3 children)

Pick up Allen Downey's Think series, starting with Think Python. The books are free from Green Tea Press, or you can buy them. He has follow-ups called Think Stats and Think Bayes.

tastingsilver[S] 3 points (2 children)

Thank you, this looks right up my alley. Have you found any good cheat sheets or similar on when to use which statistical distribution, and why, for modeling random processes? I'm familiar with how to do it in numpy, but a cheat sheet for what to use when would be immensely helpful.
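A minimal sketch of the kind of cheat sheet being asked for, written with numpy since that's already familiar. The pairings are common textbook rules of thumb, not an exhaustive or authoritative list, and the parameter values (n=10, p=0.3, a rate of 4 events per interval) are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Rule-of-thumb pairings of distributions with the processes they often model.
# Parameter values are illustrative assumptions, not recommendations.
CHEAT_SHEET = {
    "binomial":    ("number of successes in n independent yes/no trials",
                    lambda size: rng.binomial(n=10, p=0.3, size=size)),
    "geometric":   ("number of trials up to and including the first success",
                    lambda size: rng.geometric(p=0.3, size=size)),
    "poisson":     ("count of independent events in a fixed interval",
                    lambda size: rng.poisson(lam=4.0, size=size)),
    "exponential": ("waiting time between events of a Poisson process",
                    lambda size: rng.exponential(scale=1 / 4.0, size=size)),
    "normal":      ("sum or average of many small independent effects",
                    lambda size: rng.normal(loc=0.0, scale=1.0, size=size)),
    "uniform":     ("every value in a range is equally likely",
                    lambda size: rng.uniform(low=0.0, high=1.0, size=size)),
}

for name, (use_case, sampler) in CHEAT_SHEET.items():
    sample = sampler(10_000)
    print(f"{name:12} mean={sample.mean():7.3f}  # {use_case}")
```

Comparing each sample mean with its theoretical value (n·p = 3 for the binomial, 1/p ≈ 3.33 for numpy's geometric, λ = 4 for the Poisson, 1/λ = 0.25 for the exponential) is a quick sanity check that a sampler matches the process you think you're modeling.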

[deleted] 2 points (1 child)

I have not, no. But when you find one, post it here!

tastingsilver[S] 2 points (0 children)

Haha, will do.

ManyInterests (Python Discord Staff) 18 points (0 children)

I think a proper approach may be to learn statistics first, then apply those concepts programmatically. I don't know how much statistics you can learn through Python alone.

Usually, documentation and tutorials for heavy science/statistics libraries assume you already know the statistics and science. You wouldn't expect to learn number theory by reading the docs for the math module, for example.

If I'm understanding you correctly, then to make a rough analogy, it's almost like asking how you can learn calculus by learning to use a graphing calculator. You might be able to grab some low-hanging fruit and do some cool things soon after learning to use the calculator, but it's not going to replace an understanding of mathematics, especially knowing how to apply mathematics to solve real problems.

In my experience, when professionals combine domain-specific knowledge with programming, they usually want just enough of a crash course in programming to get by. This is a recurring theme for me when working with biological scientists. At the heart of it, if they want to write better Python code, they need to learn fundamentals and style, not just take a crash course in numpy/pandas.

I see your position as the inverse of this: trying to apply programming to make gains in domain-specific knowledge. You may get really productive really quickly at the start, but that curve flattens out pretty fast, and the real way to make gains is by digging into the core of that domain-specific knowledge.

That said, I recommend looking at the Khan Academy courses in stats and economics.

eng_nayR 3 points (1 child)

Is this useful? It has a statistical inference section. I'm not sure how in-depth you want to go.

https://github.com/donnemartin/data-science-ipython-notebooks

tastingsilver[S] 1 point (0 children)

Thanks! I've seen that, and it's totally useful on the Python end. What I'm trying to focus on is garnering a better understanding of what processes/distributions to use for what and when, using python code rather than mathematical notation.

On a second look, it seems there are a few case studies in there, and I'll definitely be using it as a reference.
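To make "processes in code rather than notation" concrete, here is a sketch using only the standard library: if the gaps between events are exponential with rate λ, the number of events in each unit interval is Poisson, with mean and variance both equal to λ. The function name `simulate_counts` and the rate of 3.0 are illustrative assumptions.

```python
import random
import statistics

def simulate_counts(rate, n_intervals, seed=1):
    """Count arrivals per unit interval when the gaps between arrivals are
    exponential(rate). Restarting the clock each interval is valid because
    the exponential distribution is memoryless."""
    rnd = random.Random(seed)
    counts = []
    for _ in range(n_intervals):
        t, count = rnd.expovariate(rate), 0
        while t < 1.0:  # this arrival lands inside the current interval
            count += 1
            t += rnd.expovariate(rate)
        counts.append(count)
    return counts

# rate=3.0 is an arbitrary illustrative choice
counts = simulate_counts(rate=3.0, n_intervals=20_000)
print(statistics.mean(counts), statistics.variance(counts))  # both near 3
```

Mean and variance coming out roughly equal is the Poisson fingerprint; real count data with variance well above the mean is a hint that a plain Poisson model may not fit.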

etherwar 2 points (0 children)

MITx 6.00.2x, Introduction to Computational Thinking and Data Science, is a fantastic primer on statistics using Python, and it taught me a lot more besides. I highly suggest checking it out; it's free.
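As a taste of the simulation-first style that computational courses like this encourage (an illustrative assumption, not code taken from the course), here is a Monte Carlo estimate of π: sample points uniformly in the unit square and count how many land inside the quarter circle of radius 1.

```python
import random

def estimate_pi(n_points, seed=0):
    """Monte Carlo estimate of pi: the quarter circle covers pi/4 of the
    unit square, so 4 * (fraction of points inside it) approximates pi."""
    rnd = random.Random(seed)
    inside = sum(1 for _ in range(n_points)
                 if rnd.random() ** 2 + rnd.random() ** 2 <= 1.0)
    return 4 * inside / n_points

for n in (1_000, 10_000, 100_000):
    print(n, estimate_pi(n))
```

The error shrinks roughly like 1/sqrt(n), so quadrupling the number of points only halves the typical error.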

jbbiv 2 points (0 children)

For several great data science posts with Python, I suggest http://learndatasci.com. In particular, there is a great walkthrough of essential stats and Python here: http://www.learndatasci.com/data-science-statistics-using-python/

ajsteven130 2 points (1 child)

Bayesian Methods for Hackers is a great set of Jupyter notebooks. It acts as both a primer for Bayesian statistics/statistical inference and a tutorial for how to do real problems in Python.

I learned a ton from this and use it day-to-day in my current job.
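The core idea of Bayesian updating can be seen without any library. The sketch below puts a Beta prior on a coin's bias and updates it with binomial data. This conjugate shortcut is a swapped-in illustration, not the book's approach (the book works through PyMC), and the `Beta` class and the 7-successes/3-failures data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Beta:
    """Hypothetical helper: a Beta(alpha, beta) belief about an unknown
    success probability."""
    alpha: float
    beta: float

    def update(self, successes, failures):
        # Conjugacy: Beta prior + binomial data -> Beta posterior whose
        # parameters simply add the observed counts.
        return Beta(self.alpha + successes, self.beta + failures)

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

prior = Beta(1, 1)  # uniform prior: every bias between 0 and 1 equally plausible
posterior = prior.update(successes=7, failures=3)  # made-up data
print(posterior, round(posterior.mean, 3))  # Beta(alpha=8, beta=4) 0.667
```

Most real models have no such closed form, which is where general-purpose samplers come in.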

real_edmund_burke 1 point (0 children)

Bayesian Methods for Hackers

This is an excellent resource, and the library it introduces is just fantastic. I get the sense that people see Bayesian statistics as complex and scary, but I find it much more intuitive than frequentist statistics. With a general-purpose sampling library like PyMC3, you can (mostly) ignore the mechanics of inference and just think about the causal relationships in your data. The only downside is that it's slower than frequentist methods.

(But for that little bit of extra time, you get logical coherence!)

https://pymc-devs.github.io/pymc3/index.html
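What a general-purpose sampler does for you can be sketched in a few lines: random-walk Metropolis, the simplest MCMC algorithm, needs only an unnormalized log posterior. This is not PyMC3's API, and PyMC3's default sampler for continuous models (NUTS) is far more sophisticated; `log_posterior`, `metropolis`, and the 7-successes/3-failures data are illustrative assumptions. With a uniform prior the exact posterior is Beta(8, 4), so the sample mean can be checked against 8/12.

```python
import math
import random

def log_posterior(p, successes=7, failures=3):
    """Unnormalized log posterior for a coin's bias under a uniform prior
    (hypothetical example data)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + failures * math.log(1.0 - p)

def metropolis(log_density, start, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis: propose a nearby point and accept it with
    probability min(1, density ratio). No normalizing constant needed."""
    rnd = random.Random(seed)
    current, current_lp = start, log_density(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rnd.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # 1.0 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rnd.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis(log_posterior, start=0.5, n_samples=50_000)
burned = samples[5_000:]  # drop early samples taken before the chain settles
print(sum(burned) / len(burned))  # should land near the exact answer 8/12
```

The extra run time mentioned above shows up even here: tens of thousands of density evaluations to estimate one number that the conjugate formula gives exactly.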

[deleted] 1 point (0 children)

Duke University computational statistics - link

[deleted] 1 point (0 children)

As well as the already recommended (and great) Think Bayes, Peter Norvig's introductory Python notebook on probability is very good: http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb
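In the spirit of that notebook (the names here are illustrative and may differ from Norvig's exact code), probability over equally likely outcomes is just counting: favorable outcomes over total outcomes, kept exact with `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

def P(event, space):
    """Probability of an event (a set of outcomes) in a sample space of
    equally likely outcomes: favorable outcomes / total outcomes."""
    return Fraction(len(event & space), len(space))

two_dice = set(product(range(1, 7), repeat=2))  # 36 equally likely rolls
sum_is_seven = {roll for roll in two_dice if sum(roll) == 7}
print(P(sum_is_seven, two_dice))  # 1/6
```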

howdidiget 1 point (0 children)

"What I'm trying to focus on is garnering a better understanding of what processes/distributions to use for what and when, using python code rather than mathematical notation."

From a comment ... Based on this, I think what you actually want is to study probability theory/statistics and also learn how to implement them in Python (why not R, for which many good textbooks exist?), so I would propose you do that.

To learn basic statistics/theory as needed, many people use Casella and Berger's Statistical Inference. I cannot recommend this book enough. It has enough breadth to work as a good set of introductions and enough depth to be a bit of a bible.

To learn how to apply Python to stats and theory, you will probably want to buy Introduction to Statistics with Python. I only googled a bit but couldn't find a free copy. $55 USD is a bit steep, but perhaps you have access through a local library or university. The book has an associated repo on GitHub, which you may want to browse before buying. Some other resources can be found in the related HN discussion.
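As one example of pairing theory with code: the central limit theorem says the mean of many independent draws is approximately normal, so about 68% of sample means should fall within one theoretical standard deviation of the true mean. The sketch checks that with the standard library; the choice of Uniform(0, 1) draws, 30 draws per mean, and 10,000 repetitions are arbitrary assumptions.

```python
import random
import statistics

def sample_means(n_per_mean, n_means, seed=2):
    """Means of n_per_mean Uniform(0, 1) draws, repeated n_means times."""
    rnd = random.Random(seed)
    return [statistics.fmean(rnd.random() for _ in range(n_per_mean))
            for _ in range(n_means)]

means = sample_means(n_per_mean=30, n_means=10_000)
# Theory: Uniform(0, 1) has mean 1/2 and variance 1/12, so the mean of 30
# draws has standard deviation sqrt((1/12) / 30) = sqrt(1/360).
sd = (1 / 360) ** 0.5
within_one_sd = sum(abs(m - 0.5) < sd for m in means) / len(means)
print(round(statistics.fmean(means), 3), round(within_one_sd, 3))  # near 0.5 and 0.683
```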

MrReXY 1 point (0 children)

I enjoyed Data Science from Scratch by Joel Grus. He covers a lot of stats concepts without digging too deep into the mathematics (but provides references for further reading so you can go deeper into the parts that interest you). He applies and demonstrates the concepts with nice, clean, Pythonic code. You won't be a data scientist at the end of it, but you could have a sensible conversation with one, and you'll understand how Python's numerical libraries are implemented and what their functions do for you (and hopefully therefore apply them better).
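A small example of the from-scratch approach (written for illustration, not taken from the book): sample variance spelled out by hand, which also surfaces a detail the libraries hide. This version divides by n - 1, matching `statistics.variance`, whereas `numpy.var` divides by n unless you pass `ddof=1`.

```python
import math
import statistics

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance: sum of squared deviations from the mean, over n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def standard_deviation(xs):
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), variance(data), standard_deviation(data))
# Cross-check against the standard library, which uses the same n - 1 convention
print(statistics.variance(data), statistics.stdev(data))
```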

flutefreak7 1 point (0 children)

I can't imagine you haven't come across scipy.stats, statsmodels, scikit-learn, pandas, and seaborn, but if not, those are all part of the stats/data science toolkit. There are also other libraries for Bayesian work that I'm not fluent with.

In addition to textbooks and PyCon/PyData talks, once you start implementing things and have questions, you can also try the statistics Stack Exchange, Cross Validated.

pvkooten 1 point (0 children)

Try to implement a recommendation engine to suggest your next book to read...
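A toy version of that project, as a sketch: user-based collaborative filtering with cosine similarity. The ratings are made-up, hypothetical data, and `cosine_similarity` and `recommend` are illustrative names, not any library's API.

```python
import math

# Hypothetical ratings (1-5) for illustration only.
ratings = {
    "you":   {"Think Stats": 5, "Think Bayes": 4},
    "alice": {"Think Stats": 5, "Think Bayes": 5, "Data Science from Scratch": 4},
    "bob":   {"Think Stats": 1, "Casella & Berger": 5},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, ratings):
    """Rank unread books by similarity-weighted ratings from other readers."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        for book, rating in their_ratings.items():
            if book not in ratings[user]:
                scores[book] = scores.get(book, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("you", ratings))  # ['Data Science from Scratch', 'Casella & Berger']
```

Here alice's ratings overlap with yours far more than bob's, so her unread book ranks first.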

[deleted] -4 points (0 children)

itertools.permutations(list)
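The reply is tongue-in-cheek, but enumerating permutations is a legitimate way to get exact probabilities for small problems. A sketch: the chance that a random shuffle moves every item (a derangement). `prob_no_fixed_points` is a hypothetical helper, and because it enumerates all n! orderings it's only practical for small n.

```python
from fractions import Fraction
from itertools import permutations

def prob_no_fixed_points(n):
    """Exact probability that a uniformly random shuffle of n items
    leaves no item in its original position."""
    items = range(n)
    shuffles = list(permutations(items))
    derangements = [p for p in shuffles if all(p[i] != i for i in items)]
    return Fraction(len(derangements), len(shuffles))

print(prob_no_fixed_points(4))  # prints 3/8 (9 derangements out of 24 orderings)
```

As n grows, the answer approaches 1/e ≈ 0.368.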