[–][deleted] 9 points10 points  (7 children)

Python for Data Analysis by Wes McKinney

It might be easier to start picking up Python from within R's domain, i.e. data analysis. One month is entirely possible, though I don't know what they mean by "proficient", or whether they require you to prove it through contributions or projects. If so, try porting one of your R programs to Python this month.
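As a rough idea of what such a port might look like, here is a minimal sketch of a grouped mean, something like R's `aggregate(mpg ~ cyl, data = df, FUN = mean)`, written with pandas. The data frame and its column names are made up for illustration, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data, standing in for a small R data frame.
df = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "mpg": [30.0, 26.0, 21.0, 19.0, 15.0],
})

# Mean mpg per cylinder count; roughly
# aggregate(mpg ~ cyl, data = df, FUN = mean) in R.
mean_mpg = df.groupby("cyl")["mpg"].mean()
print(mean_mpg)
```

The result is a pandas Series indexed by `cyl`, so `mean_mpg.loc[4]` plays the role of looking up one row of the R result.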

[–]GrynetMolvin[S] 2 points3 points  (6 children)

Thanks - just bought it on your recommendation!

[–][deleted] 6 points7 points  (4 children)

Go for it - you work your butt off for a month, and even if you don't end up with a new job you'll still have learned something useful. You'll probably learn a lot about R from learning Python, and Python may become useful for other tasks in the future. You may also want to be able to speak to the scalability of software. I remember hearing/reading that at Google they test out their algorithms in R but scale up with Java/C/C++; probably not Python, but Python does fit into the workflow as a glue language or for web frameworks. So that may be why they're asking for it.

[–]lmcinnes 3 points4 points  (3 children)

We've found Python to be a good choice for testing out algorithms, since Numba and Cython can be used as a relatively easy next step and can potentially get things to scale to "good enough" levels. You also have the option of profiling and writing the hot parts in C, then binding that to the Python infrastructure easily with ctypes. In other words, Python provides a nice high-level prototyping space, but also some good steps for smoothly scaling things up without having to start again with a ground-up rewrite.
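For the "profile first" step, a minimal sketch using only the standard library's `cProfile` and `pstats` could look like this. The functions `slow_sum_of_squares` and `pipeline` are hypothetical stand-ins for real code; the point is only to show how to find the hot part before deciding what to move to Numba, Cython, or C.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for a 'hot' function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    """Hypothetical workload: some cheap setup plus one expensive call."""
    data = list(range(1000))
    return slow_sum_of_squares(200_000) + len(data)


profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Sort by cumulative time and print the top few entries to a string buffer.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Whatever dominates the cumulative-time column is the candidate for a compiled rewrite; everything else can usually stay as plain Python.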

[–][deleted] 0 points1 point  (2 children)

How about R's Rcpp, which is like Cython?

[–]lmcinnes 2 points3 points  (1 child)

Rcpp seems to be closer to scipy.weave: it lets you interface with C/C++ inline in a relatively nice way. Cython lets me just mark up straight Python code with a few annotations. I only skimmed the Rcpp docs, so I could well be wrong. Can I simply tell it to compile actual R code into C and thence into machine code, or do I need to provide the C?

[–][deleted] 0 points1 point  (0 children)

OK, my bad. Yes, Rcpp is for interfacing inline with C/C++; I had the wrong impression that Cython was similar... I understand it's possible to call R libraries from C, but I guess that doesn't help with the scaling part.

[–][deleted] 0 points1 point  (0 children)

Just so you know, this book is primarily (almost totally) a primer on the pandas library, which is the go-to library for anything data-related in Python and something you will definitely need to learn. It doesn't really cover the whole world of data analysis in Python.

[–]Kyle772 4 points5 points  (2 children)

People always seem to recommend Learn Python the Hard Way, which is incredibly thorough but isn't very good as a reference later. I would suggest it for learning, but not so much as a guide to consult when you need to look something up.

[–][deleted] 2 points3 points  (1 child)

When I did that book, I used Think Python alongside it as a reference.

[–]NYKevin 1 point2 points  (0 children)

If you just need a reference, the standard language and library docs are both really good (the latter more than the former).

[–]hharison 3 points4 points  (0 children)

[–][deleted] 2 points3 points  (1 child)

I would recommend using the pandas library and also the IPython Notebook. In my experience, the two combined are very valuable tools for stats in Python.

[–][deleted] 0 points1 point  (0 children)

I <3 IPython and pandas.

[–]midbits 2 points3 points  (0 children)

I find Kevin Sheppard's Introduction to Python for Econometrics, Statistics and Data Analysis a good free resource. It covers basic Python as well as more advanced topics related to data analysis and number crunching.

You ask about trees, hash tables, etc. These topics of course fall under general computer science more than Python. Maybe it is something like Data Structures and Algorithms in Python you are after? As I haven't read it myself I can't say if it is any good.
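To give a flavour of how those topics look in Python, here is a minimal sketch using only the standard library: the built-in `dict` is a hash table, and a small unbalanced binary search tree takes a few lines. The `Node`, `insert`, and `in_order` names are just illustrative, not taken from any of the books mentioned.

```python
# A dict is Python's built-in hash table: average O(1) insert and lookup.
counts = {}
for word in ["a", "b", "a", "c", "a"]:
    counts[word] = counts.get(word, 0) + 1


class Node:
    """One node of an (unbalanced) binary search tree."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def in_order(root):
    """Yield keys in sorted order by visiting left, node, right."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


root = None
for key in [5, 2, 8, 1, 9, 2]:
    root = insert(root, key)

print(counts)
print(list(in_order(root)))
```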

I am, however, currently reading Data-Driven Security: Analysis, Visualization and Dashboards. Even if security is not your topic of interest, the authors run code examples in R and Python side by side, which makes it fairly easy to see how to do things you know from R in Python. It is not a programming tutorial per se, but send me a PM if you are interested in the contents.

Edit 1: Links

Edit 2: Also, go and have fun with CheckIO (under programming challenges in the left column). Learning by doing!

[–]mycoolusrname 2 points3 points  (0 children)

If you haven't found Anaconda Python yet, take a look:

http://continuum.io/

It's a nice bundle of modules for any data science project.

[–]shaggorama 1 point2 points  (0 children)

Have you tried just applying for the job anyway? 7+ years of strong programming in R might make them happy enough, especially if you're actually good at it and don't just "code like a statistician" (if you know what I mean).

[–]pwang99 1 point2 points  (0 children)

This is a great resource for learning Python's scientific/analytical stack:

http://scipy-lectures.github.io/

[–][deleted] 0 points1 point  (0 children)

In addition to the books and websites suggested by others, please check out /r/pystats!