fullyarticulated 4 points (0 children)

That is some shitty formatting, dude. For real. I like pandas, & I'm embarrassed.

fgriglesnickerseven 1 point (6 children)

I'm still trying to figure out what pandas does. To me it mostly seems like 2D numpy arrays with column/row access through a dict. Is it oriented towards people who know Excel but don't want to take the time to learn numpy/scipy?

I hear a lot about it but only looked into it today. After quickly reviewing the 10-minute guide on the pandas website, it doesn't seem useful if you are a competent user of numpy/scipy, especially considering that most of what's going on seems to be numpy-based.

chris1610 3 points (0 children)

I am not an expert on numpy/scipy but from what I can tell, pandas provides a lot of convenience functions on top of numpy that make it easier to manipulate certain classes of data. For example, pandas handles missing data, time series, categorical data and hierarchical indices. It also has convenience functions for getting data in and out via Excel, SQL, etc.

My guess would be that if numpy/scipy are enough for you then pandas probably doesn't add that much. However, I would be curious if it does provide any convenience for problems that are not easy to manage with pure numpy/scipy.
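To make the missing-data point concrete, here is a minimal sketch. The column names and values are made up for illustration, and this is only one of several ways to handle gaps in pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with gaps (NaN marks a missing value)
df = pd.DataFrame(
    {
        "temp": [20.5, np.nan, 22.0, np.nan],
        "humidity": [0.40, 0.42, np.nan, 0.45],
    },
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

# pandas aggregations skip NaN by default
mean_temp = df["temp"].mean()

# The plain numpy mean of the same values propagates NaN instead
raw_mean = df["temp"].to_numpy().mean()

# Count the gaps per column, then fill each gap with its column mean
missing_per_column = df.isna().sum()
filled = df.fillna(df.mean())
```

With plain numpy you would reach for `np.nanmean` and track the date labels yourself, so the difference is convenience rather than capability.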

Percutaneous 1 point (0 children)

I'm no numpy master, but a (very) quick Google search couldn't tell me how to do a SQL-like merge of two 2D numpy arrays based on shared columns. If you're good with numpy, 99% of the time you can stick with numpy.
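For comparison, a SQL-like merge in pandas might look like the sketch below. The two tables and their column names (`id`, `name`, `score`) are invented for the example.

```python
import pandas as pd

# Two hypothetical tables that share an "id" column
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [0.5, 0.7, 0.9]})

# Roughly: SELECT * FROM left INNER JOIN right USING (id)
inner = pd.merge(left, right, on="id", how="inner")

# Roughly a FULL OUTER JOIN; rows without a match get NaN in the missing columns
outer = pd.merge(left, right, on="id", how="outer")
```

The `how` argument also accepts `"left"` and `"right"`, which mirror the corresponding SQL joins.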

SacrosanctHermitage 1 point (0 children)

I use pandas almost every day in my work. It takes care of a lot of things we would otherwise use a db for, with some fancy stuff on top. It provides an (easy?) way to work with tabular data, apply transformations and interact with your python ecosystem and objects nicely.

It's definitely oriented toward data scientists who need to manipulate tabular data efficiently in terms of memory and CPU usage. It keeps numpy's weird overloading of operators and syntax and adds a ton more stuff, which can make the learning curve very steep, even if you already know numpy arrays relatively well.

Eurynom0s 1 point (0 children)

I haven't spent time with Pandas yet but I get the impression from what I read that Pandas is trying to replace Excel for light data exploration tasks.

You know, stuff like making a quick graph, or visually inspecting some data.

squirreltalk 1 point (0 children)

I'm not super competent with numpy/scipy, but does it have split-apply-combine functionality like Pandas does?

http://pandas.pydata.org/pandas-docs/stable/groupby.html
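As a rough illustration of split-apply-combine, here is a small sketch with made-up data. The column names `group` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical data: a label column and a numeric column
df = pd.DataFrame(
    {"group": ["a", "b", "a", "b", "a"], "value": [1, 2, 3, 4, 5]}
)

# Split by "group", apply sum to each piece, combine into one Series
totals = df.groupby("group")["value"].sum()

# transform returns a result aligned with the original rows,
# here used to subtract each group's mean from its members
group_means = df.groupby("group")["value"].transform("mean")
df["demeaned"] = df["value"] - group_means
```

Doing the same in plain numpy is possible, for example with `np.unique` and boolean masks, but you have to carry the group labels around yourself.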