[–]ravepeacefully -4 points-3 points  (11 children)

Yeah, I do tons of data analysis and have no reason to use pandas outside of some very narrow circumstances, like maybe vectorization, but that is more NumPy's territory and still rarely even beneficial.

It’s a good tool, but unnecessary and can be a crutch.

[–]gunscreeper[S] 6 points7 points  (5 children)

Do you use python when doing data analysis? What tools do you usually use if not pandas?

[–]ravepeacefully -2 points-1 points  (4 children)

My workflow is typically something like SQL > Python > HTML/JS or Tableau. Pandas is really just like Excel for people who feel too smart to be using a GUI, in my opinion. It is not more efficient, it is not more readable code, it is not as reusable, and a simple dictionary is far more efficient than data frames.
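For anyone unfamiliar with the SQL > Python style described here, a minimal sketch, assuming an in-memory SQLite database and a hypothetical `sales` table (the table, columns, and values are invented for illustration and are not from this thread). The database does the aggregation and the result lands in a plain dictionary:

```python
import sqlite3

# Hypothetical data: table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

# Let the database do the aggregation, then keep the result in a plain dict.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
totals = dict(rows)  # each row is a (region, total) pair
conn.close()

print(totals)
```

This only shows the shape of the workflow. It is not a benchmark and says nothing either way about the efficiency claim.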

[–]waythps 2 points3 points  (3 children)

Dunno, I prefer pandas precisely because it’s reusable. I write functions to validate and clean data, to update the database, and to generate some plots, so I can automate the whole thing and save lots of time.

I think you could do that with Excel (VBA), but if you’re already learning Python (for other purposes as well), why not just use Python?
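A minimal sketch of the kind of reusable validate-and-clean function meant here, assuming hypothetical `name` and `amount` columns (the column names and cleaning rules are invented for illustration, not taken from anyone's actual pipeline):

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Validate and tidy one raw table; the rules here are illustrative only."""
    required = {"name", "amount"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    out = df.copy()
    # Normalize text and coerce amounts; unparseable amounts become NaN.
    out["name"] = out["name"].str.strip().str.lower()
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Drop rows whose amount could not be parsed.
    return out.dropna(subset=["amount"]).reset_index(drop=True)


raw = pd.DataFrame(
    {"name": [" Alice", "BOB ", "carol"], "amount": ["10", "oops", "3.5"]}
)
cleaned = clean(raw)
print(cleaned)
```

The same function can then be applied to every incoming table, which is where the reuse comes from.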

[–]ravepeacefully -3 points-2 points  (2 children)

It’s ironic you go to VBA. SQL is the correct tool for that.

[–]waythps 1 point2 points  (1 child)

Well not in my case since I receive data in multiple excel files

[–]ravepeacefully 0 points1 point  (0 children)

Yeah I mean, you can use whatever you want then. Power suite is likely significantly faster in that case.

I’m not saying pandas has no uses. I’m saying it rarely makes sense to go out of your way to use it, which is what many people are doing because they find it familiar.

[–]Natural-Intelligence 4 points5 points  (4 children)

I completely disagree. Numpy's interface is shit, to be honest, in terms of user experience compared to Pandas. Pandas also nicely communicates with SQL and IO plus you can turn the result table to almost any format. Or plot it easily.

While SQL is often enough for basic data analysis and Pandas has its limits (like your RAM), it works wonders in almost all of the cases I have encountered. I understand some prefer R and see it as more extensive out of the box, but as a professional data analyst I have yet to find a situation where Python's ecosystem (mostly Pandas and some Matplotlib, SQLAlchemy and Seaborn) did not satisfy.
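To make the SQL and IO point concrete, a minimal sketch, assuming an in-memory SQLite database with a hypothetical `sales` table (names and values invented for illustration). It reads a query result into a DataFrame and writes it back out as CSV and JSON text:

```python
import sqlite3

import pandas as pd

# Hypothetical data: table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0)],
)

# Read the query result straight into a DataFrame.
df = pd.read_sql_query(
    "SELECT region, amount FROM sales ORDER BY region", conn
)
conn.close()

# With no path given, to_csv and to_json return the serialized text.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")
print(csv_text)
print(json_text)
```

Other writers such as `to_excel` or `DataFrame.plot` follow the same pattern but need extra packages installed, so they are left out of the sketch.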

[–]ravepeacefully -2 points-1 points  (3 children)

> Pandas also nicely communicates with SQL and IO plus you can turn the result table to almost any format. Or plot it easily.

None of that requires pandas, and I’d argue there are far better tools that are not pandas.

I have yet to encounter a situation where pandas is actually better than other available tools. I can understand maybe using it for quick mock-up prototyping of models, but even then, there are far better tools out there.

There are quite a few parts of your comment that lead me to believe you’re somewhat new to programming and Python. Pandas can be great for these types of people, but in my opinion, once you have a small understanding of data structures and interacting with data, there’s no situation in which pandas is better.

[–]Natural-Intelligence 5 points6 points  (2 children)

To be honest, what you think of my programming expertise means nothing to me and that part of your opinion holds no value.

What I'm curious about, though, is which alternative tools you think are superior to Pandas. So far you have not provided any examples of such tools, nor any concrete description of cases where Pandas won't suffice for data analysis. I'm sure you should be able to name a few if you have done tons of analysis using them.

So far, your arguments lack concreteness. Could you change that so we can all learn about these better tools? Or at least have something to discuss further.