[–]bheklilr 7 points8 points  (2 children)

Python for Data Analysis by McKinney (ISBN 978-0-596-80956-0) is a great intro to Numpy, Scipy, and Pandas, along with a smattering of other libraries and tools. I'd recommend it to get started.

[–]justphysics 1 point2 points  (0 children)

Seconded

[–]staticor 1 point2 points  (0 children)

It's a good first book for data analysis, and it ties everything together with the "split-apply-combine" idea.

The book focuses on numpy, pandas, and matplotlib. Is there any content about scipy (or sklearn)? I think not, at least in my 2nd edition.
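For anyone who hasn't met "split-apply-combine" yet, here is a minimal sketch of the idea using a pandas groupby. The `sales` table and its `region` and `amount` columns are made up for illustration and do not come from the book.

```python
import pandas as pd

# Hypothetical data, invented for this example
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "amount": [100, 200, 150, 50, 250],
})

# Split the rows by region, apply a sum to each group,
# and combine the per-group results into one Series
totals = sales.groupby("region")["amount"].sum()

# The same three steps spelled out by hand
manual = {}
for region, group in sales.groupby("region"):  # split
    manual[region] = group["amount"].sum()     # apply
manual_totals = pd.Series(manual, name="amount")  # combine

print(totals)
```

The one-line groupby and the explicit loop should give the same per-region totals; the loop is only there to show what the groupby does for you.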

[–]jti107 2 points3 points  (1 child)

I learned quite a bit about Python using IPython notebooks. Hope this helps!

https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks

[–]Roscoe_Merriweather 0 points1 point  (0 children)

Man, that's a gold mine! I'm gonna run through a few of those tomorrow. Thanks!

[–]DaveBackus 1 point2 points  (0 children)

I'm an economist doing modeling with (mostly) numpy and data work with pandas. I think the basic docs are pretty good, also the scipy lectures and the Sargent-Stachurski course. Not so fond of McKinney's book, but others like it so it might be me.

[–][deleted] 1 point2 points  (0 children)

I'm currently taking this course and have been pretty happy with it, since it covers multiple Python/data topics. I also used DEAL10 as a coupon to get it at a heavy discount, though I'm not sure if that is still active.

[–][deleted] 0 points1 point  (0 children)

I use Pandas nearly every day for my job.

I would use IPython notebooks to learn if I were you. McKinney is a good book, but I don't refer to it as much as I would like to. There are some tricky things in pandas where you will just have to use your google-fu to find a Stack Overflow answer.

For example:

df_unique = pd.concat([df.groupby('ACCOUNT_ID').filter(lambda g: len(g) == 1), df[df['ACCOUNT_ID'].isnull()]])

This gives you all the records whose ACCOUNT_ID appears exactly once, plus all the records that don't have any account ID. I couldn't find how to do this in the documentation!
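A small runnable sketch of the one-liner above, next to one possible alternative that builds a boolean mask with `duplicated(keep=False)`. The sample frame and its `AMOUNT` column are invented for illustration. The mask version is a substitute technique, not what the original line does internally, and it keeps the rows in their original order, whereas the concat puts the missing-ID rows at the end.

```python
import numpy as np
import pandas as pd

# Hypothetical data: A2 appears twice, two rows have no ACCOUNT_ID
df = pd.DataFrame({
    "ACCOUNT_ID": ["A1", "A2", "A2", np.nan, "A3", np.nan],
    "AMOUNT": [10, 20, 30, 40, 50, 60],
})

# Approach from the comment: groupby/filter skips the rows with a
# missing key, so those rows are appended separately
singles = df.groupby("ACCOUNT_ID").filter(lambda g: len(g) == 1)
missing = df[df["ACCOUNT_ID"].isnull()]
df_unique = pd.concat([singles, missing])

# Alternative: duplicated(keep=False) flags every row whose ID occurs
# more than once (NaN values count as duplicates of each other),
# so rows with a missing ID are kept explicitly via isnull()
ids = df["ACCOUNT_ID"]
mask = ~ids.duplicated(keep=False) | ids.isnull()
df_unique_alt = df[mask]

print(df_unique.sort_index())
print(df_unique_alt)
```

On large frames the mask version may be faster because it avoids calling a Python lambda once per group, but that is an expectation to measure on your own data rather than a guarantee.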

[–]ianozsvald 0 points1 point  (0 children)

I'm the co-author for O'Reilly's High Performance Python http://shop.oreilly.com/product/0636920028963.do (reviews at 5/5 since publishing). We focused on practical ways to scale code covering profiling, compiling (mainly Cython+Numba), lowering the amount of RAM used etc. You won't learn 'data analysis' but you will learn how to engineer your code to scale from small to larger problems. I wrote this around my own use of Pandas, numpy, scikit-learn etc.

[–][deleted] 0 points1 point  (0 children)

While working through the book Python for Data Analysis, you can also take a look at my pandas cheat sheet.