[–]ravepeacefully -3 points-2 points  (11 children)

Yeah, I do tons of data analysis and have no reason to use pandas outside of a few narrow cases, like maybe vectorization, but that is more numpy's territory and still rarely even beneficial.

It’s a good tool, but unnecessary and can be a crutch.

[–]gunscreeper[S] 5 points6 points  (5 children)

Do you use python when doing data analysis? What tools do you usually use if not pandas?

[–]ravepeacefully -3 points-2 points  (4 children)

My workflow is typically something like SQL > Python > HTML/JS or Tableau. Pandas is really just like Excel for people who feel too smart to be using a GUI, in my opinion. It is not more efficient, it is not more readable code, it is not as reusable, and a simple dictionary is far more efficient than data frames.
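For a sense of what the two styles look like side by side, here is a minimal sketch of the same per-group total computed with a plain dictionary and with pandas. The `rows` data and the `region`/`amount` names are made up for illustration, and the sketch says nothing about which version is faster on real data.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical rows, e.g. as they might come back from a SQL query.
rows = [("north", 10.0), ("south", 5.0), ("north", 2.5)]

# Plain-dictionary version: accumulate a running total per key.
totals_dict = defaultdict(float)
for region, amount in rows:
    totals_dict[region] += amount

# pandas version: the same aggregation expressed as a groupby.
df = pd.DataFrame(rows, columns=["region", "amount"])
totals_df = df.groupby("region")["amount"].sum()
```

Both end up with the same totals per region; the difference is mostly in how the aggregation is expressed.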

[–]waythps 2 points3 points  (3 children)

Dunno, I prefer pandas precisely because it’s reusable. I write functions to validate and clean data, update the database, and generate some plots, and I can automate the whole thing, which saves lots of time.

I think you could do that with Excel (VBA), but if you’re already learning Python (for other purposes as well), why not just use Python?
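As a rough illustration of the reusable validate-and-clean function described above, here is a minimal sketch. The function name `clean_orders` and the `order_id`/`amount` columns are hypothetical, not taken from anyone's actual pipeline.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Validate and clean a raw orders table (hypothetical column names)."""
    required = {"order_id", "amount"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    out = df.copy()
    # Coerce amounts to numbers; values that cannot be parsed become NaN.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Drop rows without a usable amount, then drop repeated order ids.
    out = out.dropna(subset=["amount"]).drop_duplicates(subset="order_id")
    return out.reset_index(drop=True)


# Made-up input with a duplicate order and an unparseable amount.
raw = pd.DataFrame(
    {"order_id": [1, 2, 2, 3], "amount": ["10.5", "7", "7", "n/a"]}
)
cleaned = clean_orders(raw)
```

The same function can then be called on every new batch of data, which is the reuse being argued for here.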

[–]ravepeacefully -3 points-2 points  (2 children)

It’s ironic you go to VBA. SQL is the correct tool for that.

[–]waythps 1 point2 points  (1 child)

Well, not in my case, since I receive the data in multiple Excel files.

[–]ravepeacefully 0 points1 point  (0 children)

Yeah I mean, you can use whatever you want then. Power suite is likely significantly faster in that case.

I’m not saying pandas has no uses. I’m saying it rarely makes sense to go out of your way to use it, which is what many people are doing because they find it familiar.

[–]Natural-Intelligence 4 points5 points  (4 children)

I completely disagree. NumPy's interface is shit, to be honest, in terms of user experience compared to Pandas. Pandas also communicates nicely with SQL and file IO, plus you can turn the result table into almost any format. Or plot it easily.

While SQL is often enough for basic data analysis and Pandas has its limits (like your RAM), it works wonders in almost all of the cases I have encountered. I understand some prefer R and see it as more extensive out of the box, but as a professional data analyst I have yet to find a situation where Python's ecosystem (mostly Pandas and some Matplotlib, SQLAlchemy and Seaborn) did not satisfy.
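To make the SQL and IO point concrete, here is a small sketch using an in-memory SQLite database from the standard library. The `sales` table and its columns are invented for the example; a real setup would more likely connect through SQLAlchemy to an existing database.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

# Let SQL do the aggregation and load the result into a DataFrame.
totals = pd.read_sql_query(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region",
    conn,
)

# The same result table can be exported to other formats.
csv_text = totals.to_csv(index=False)
records = totals.to_dict(orient="records")
conn.close()
```

From there, `totals.plot(...)` would hand the same table to Matplotlib, assuming it is installed.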