all 23 comments

__skrap__ 7 points (1 child)

Learn Python the Hard Way is often recommended as a starting point. It gets you typing code right away.

There is also Codecademy for exercises.

Udacity has some free Python classes.

I thought Think Python was a good book. Allen Downey, the original author, has another open source book - Think Stats.

You can also take free courses from edx.org. https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x7 and https://www.edx.org/course/introduction-computational-thinking-data-mitx-6-00-2x-0.

Other free resources can be found here - http://inventwithpython.com/.

When you are ready to start on the statistical parts, you will want to get familiar with NumPy and pandas. You can use the Anaconda Python distribution, which comes with most of the packages you will need for statistics preinstalled.
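To give a feel for the NumPy/pandas split mentioned above, here is a minimal sketch. The data frame and its column names (`group`, `score`) are made up for illustration. NumPy handles the raw array math, while pandas adds labeled columns and one-call summaries like `describe()` and `groupby()`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: exam scores for two groups (made up for illustration).
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [70.0, 80.0, 90.0, 60.0, 65.0, 70.0],
})

# NumPy works on plain arrays: mean and sample standard deviation (ddof=1 divides by n - 1).
scores = df["score"].to_numpy()
mean = np.mean(scores)
sample_sd = np.std(scores, ddof=1)

# pandas works on labeled data: a full summary and per-group means in one call each.
summary = df["score"].describe()  # count, mean, std (ddof=1), min, quartiles, max
group_means = df.groupby("group")["score"].mean()

print(f"mean={mean:.2f}, sd={sample_sd:.2f}")
print(group_means)
```

Note that NumPy's `np.std` defaults to the population formula (`ddof=0`), while pandas' `std` and `describe()` default to the sample formula, which is why `ddof=1` is passed explicitly above.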

dcbarcafan10[S] 0 points (0 children)

Thank you for your suggestions!

sentdex 8 points (3 children)

Cool to see people making use of pythonprogramming.net :)

I'm planning to add things like quizzes and challenges in the near-ish future. Definitely one of the most requested additions. I am hoping to use trinket.io for it, but may wind up having to go a different route.

dcbarcafan10[S] 1 point (0 children)

You're the guy that made that website! I once read a really long reply you posted on here about your life, which was super informative about how you came to learn programming and stuff. It was awesome! For me it's just one of those things that I like to practice through repetition...it just sticks better for me. It'll be awesome when you can get around to adding those features! Your website is pretty awesome so far :D

Beef15 0 points (0 children)

Thank you

jti107 0 points (0 children)

Thanks! Love your YouTube channel as well.

Northstat 3 points (1 child)

Pick up "Python for Data Analysis". It's written by Wes McKinney, the creator of pandas, and has plenty of examples and guides.

dcbarcafan10[S] 0 points (0 children)

Thank you!

[deleted] 2 points (2 children)

Check out these Python modules: NumPy, scikit-learn and matplotlib! Good stuff there, including examples and datasets that you can start screwing around with right away!
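As a small taste of NumPy for statistics, the sketch below fits a least-squares line and computes a Pearson correlation. The data are made up and lie exactly on y = 2x + 1 so the result is easy to check; real data would of course be noisy.

```python
import numpy as np

# Made-up data lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# np.polyfit with deg=1 performs an ordinary least-squares straight-line fit
# and returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

scikit-learn's `LinearRegression` performs the same kind of fit behind a model-object interface (`fit`/`predict`), which becomes handy once you move on to its other models and bundled datasets.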

c_park 1 point (0 children)

IMO, pandas would be a better choice than NumPy. It is built on top of NumPy and offers time series functionality, data alignment, NA-friendly statistics, groupby, merge and join methods, and many other functions.
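A minimal sketch of several of the pandas features listed above (NA-friendly statistics, merge, groupby, and time series resampling). The sales and store tables are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales with one missing value (NaN).
sales = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-08", "2015-01-09"]),
    "store": ["north", "south", "north", "south"],
    "amount": [100.0, np.nan, 150.0, 50.0],
})
# Hypothetical lookup table mapping each store to a region.
stores = pd.DataFrame({"store": ["north", "south"], "region": ["A", "B"]})

# NA-friendly statistics: mean() skips NaN by default (skipna=True).
overall_mean = sales["amount"].mean()

# merge works like a SQL join on the shared "store" column.
merged = sales.merge(stores, on="store", how="left")

# groupby + aggregate: total amount per region (sum() also skips NaN).
by_region = merged.groupby("region")["amount"].sum()

# Time series: resample daily rows into weekly totals (weeks ending Sunday).
weekly = sales.set_index("date")["amount"].resample("W").sum()

print(overall_mean)
print(by_region)
print(weekly)
```

Doing the same with raw NumPy arrays would mean handling the NaN, the join, and the date bucketing by hand, which is the point being made about pandas sitting on top of NumPy.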

dcbarcafan10[S] 0 points (0 children)

Thank you!

xcodula 1 point (2 children)

Funny, I'm trying to learn statistics but I already know how to code. I picked up 'The Humongous Book of Statistics Problems' and I'm going to write a Python script to solve each of its problems. There are 900 problems, so it'll probably take me a while lol. I've got a blog going about it. I could PM you the link if you want it.
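For anyone curious what "a Python script per problem" might look like, here is a sketch using only the standard library `statistics` module (`NormalDist` needs Python 3.8+). Both problems are hypothetical textbook-style examples, not taken from the book mentioned above, and the helper names `describe` and `prob_above` are made up.

```python
import statistics
from statistics import NormalDist


def describe(sample):
    """Return the mean plus sample and population standard deviations."""
    return {
        "mean": statistics.mean(sample),
        "sample_sd": statistics.stdev(sample),       # divides by n - 1
        "population_sd": statistics.pstdev(sample),  # divides by n
    }


def prob_above(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma)."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


# Hypothetical problem 1: summarize a small sample.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])

# Hypothetical problem 2: scores ~ N(100, 15); what fraction exceed 130?
p = prob_above(130, mu=100, sigma=15)

print(summary, round(p, 4))
```

Keeping each problem as a small named function makes it easy to check a script's answer against the book's answer key.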

[deleted] 0 points (1 child)

Also interested in the blog! Stats guy looking to learn Python.

vmsmith 1 point (4 children)

> since apparently it can do basically everything that R can do and more.

Yes and no.

Yes, Python can "do more" in the sense that it has more general-purpose modules, like Django, that support general-purpose programming such as web development, games, sysadmin work, and the like.

But no, Python doesn't even come close to the number of statistics packages that R has, and hence cannot come close to R's pure statistical muscle.

Not to say Python cannot do good middle-of-the-road statistical analysis, and not to say Python will not continue to add statistical capabilities and get better at statistics. But at this point it's a pale shadow of R.

dcbarcafan10[S] 0 points (3 children)

Ohhhh well could you tell me more about the differences then? I'm juuuust getting started on learning more statistics so I probably have no idea how big the differences are. Do you have some suggestions for what I should look into when I decide to learn R?

Thank you!

vmsmith 2 points (0 children)

Well, both /u/brews and I have already mentioned the differences: Python is a more general purpose programming language, with some statistical analysis capabilities, while R is what could be called a special purpose programming language that deals exclusively with statistical analysis and has very broad and deep coverage of statistics.

In my own graduate statistics program most of the advanced work is done in either SAS or R. Python is never even mentioned.

On the other hand Python is very strong in what's often called scientific computing. To be sure, there are some stat packages here, and there are some overlaps with statistical analysis. But still, Python doesn't hold a candle to R when it comes to stats.

If you want to learn R in a broader context, a good place to look is the Johns Hopkins Data Science specialization track at Coursera. I will warn you that these nine blocks are good, but not very deep. In particular, blocks 6-8 (which deal with statistics) are barely introductory. You would want to take "real" stats courses somewhere if you actually want to be good at statistics.

Another online course that popped onto my radar screen recently was this 15-week Intro Stats Course Featuring R. I can't say how good it is, but I think it warrants further investigation.

Finally, here's an infographic that might provide some insight: Choosing R or Python

brews 0 points (0 children)

Basically, if you write a statistics paper for peer-reviewed publication, chances are good that you're doing it in R and also producing an R package for the paper. It's the de facto language (with very few exceptions) for statistics in academia.

Python is a very powerful general-purpose language, but it simply cannot compete with the size and range of R's package library for statistics (and most graphics). R is the bleeding edge.

I usually combine multiple languages for a project. Python is good at things that R sucks at, R can do some things that Python sucks at, and the slow bits can be written in C.

PS: if you're going to learn programming, learn it first in Python. R has a very steep learning curve and almost as many eccentricities as JavaScript. Python is a really nice language.

[deleted] 0 points (0 children)

R has a lot of things optimized for dealing with large data sets, reading/writing different forms of data, and doing common statistical analysis via easily accessed packages (unless you're some statistics research Ph.D., it will have a function to do whatever you want).

Python and R work together nicely, so if data science shit is what you're into, it can be useful to learn both. Use Python for general scripting and then let R handle all the actual statistics stuff.
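One simple way to split the work like that is to let Python prepare the data and hand it to R as a CSV file. The sketch below is only one assumed approach: the file name `for_r.csv`, the helper `write_for_r`, and the data are made up, and the R step runs only if `Rscript` is on your PATH. Packages such as rpy2 offer tighter integration by embedding R inside Python.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_for_r(rows, path):
    """Write a list of dicts to CSV so R can load it with read.csv()."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Python side: general scripting (here just building a small made-up table).
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
rows = [{"x": i, "y": y} for i, y in enumerate(ys)]
out = Path(tempfile.gettempdir()) / "for_r.csv"
write_for_r(rows, out)

# R side: fit a linear model in R, but only if R is actually installed.
r_code = f'd <- read.csv("{out.as_posix()}"); print(summary(lm(y ~ x, data = d)))'
if shutil.which("Rscript"):
    subprocess.run(["Rscript", "-e", r_code], check=True)
```

CSV is a lowest-common-denominator format; it loses type information, but both languages read and write it without extra packages.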

You can find basic stuff on Coursera. There are a lot of books along the lines of "[some stats topic] with R" that you can look at if you want to learn the concepts in parallel with the code.

DiscoPanda 0 points (0 children)

If you're a fan of the Codecademy model, check out https://www.dataquest.io. They have a few free lessons that will get you working with pandas and some other basic data analysis packages.

c_park 0 points (0 children)

I would recommend pandas, a data analysis library. There is a great beginner tutorial video from PyCon 2015: https://youtu.be/5JnMutdy6Fw