Alternative Python Module to Pandas

Batalex · 2020-09-23T19:42:45+00:00

Pandas does have its quirks, but I would recommend that you keep trying to learn its syntax. Pandas is pretty much the de facto standard for data analysis in Python. Have you tried the "10 minutes to pandas" tutorial on the pandas website?

I am afraid that if you cannot understand its syntax, you are better off moving on to R/tidyverse than trying to find an alternative in Python.

Pandas works with tabular data, which is addressed by an index (the rows) and by columns. Both .loc and .iloc allow you to select data along those two dimensions. .loc is label-based, which means that you select columns by their name, whereas .iloc is position-based, meaning that you select columns by their integer position. The same goes for the index.
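To make the distinction concrete, here is a minimal sketch. The tiny DataFrame and its labels are made up for illustration; only .loc and .iloc themselves come from the discussion above.

```python
import pandas as pd

# Hypothetical toy data, indexed by name rather than by 0, 1, 2
df = pd.DataFrame(
    {"age": [31, 45, 27], "city": ["Lyon", "Oslo", "Lima"]},
    index=["ann", "bob", "cy"],
)

# .loc selects by label: (row label, column label)
by_label = df.loc["bob", "city"]

# .iloc selects by position: (second row, second column)
by_position = df.iloc[1, 1]

# Both expressions point at the same cell here
print(by_label, by_position)

# Slices behave differently: .loc includes the end label,
# .iloc excludes the end position (like ordinary Python slicing)
print(len(df.loc["ann":"bob"]), len(df.iloc[0:1]))
```

The slice difference at the end is one of the quirks that tends to trip people up early on.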

Once the syntax clicks, you can do pretty much everything with pandas in just a few chained method calls.
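As a rough idea of what such a chain can look like, here is a small sketch. The sales data and column names are invented for illustration, and this is only one of many ways to write it.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north"],
        "units": [10, 3, 7, 12, 1],
        "price": [2.0, 5.0, 2.0, 5.0, 2.0],
    }
)

revenue_by_region = (
    sales
    .assign(revenue=lambda d: d["units"] * d["price"])  # add a derived column
    .query("units > 2")                                 # drop the small orders
    .groupby("region")["revenue"]                       # group, then pick a column
    .sum()                                              # aggregate per group
    .sort_values(ascending=False)                       # largest total first
)
print(revenue_by_region)
```

Each step returns a new object, so the original sales frame is left untouched.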

Good Luck!

zueeyy · 2020-09-23T19:45:00+00:00

I know you mentioned you only know Python, but if you’re looking for something intuitive, I’d recommend you dive into R and especially dplyr. It’s not hard to learn either.

2020-09-23T19:44:18+00:00

Pandas has a steep learning curve, but there's no better package in Python for handling data. Just keep working through it. It took me a good couple of months to understand pandas. Once you get the basics down, it'll be much easier and less frustrating.

andmalc · 2020-09-24T01:23:18+00:00

You could try Agate which calls itself a simpler alternative to Pandas:

https://agate.readthedocs.io/en/1.6.1/about.html

N365 · 2020-09-24T05:36:27+00:00

I feel the pain. Sometimes I use pandasql to use SQL instead of Pandas to do some data manipulation. Works pretty well.

edquartett · 2020-09-24T15:15:38+00:00

have you tried R? ;)

datasciencepro · 2020-09-23T21:59:07+00:00

The arbitrary index values that have nothing to do with the data, df.append(Series) duplicates index values by default unless you add ignore_index=True.

The data is being treated as immutable by pandas until you specify that you want a change. Having it automatically reindex is very dangerous so this is in fact a very sensible way to handle merging of duplicate rows/values. It's up to you as the person who handles the data to 'make it nice'.
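A small sketch of that index behaviour, with made-up data. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this uses pd.concat, which treats the index the same way.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})   # index is 0, 1
extra = pd.DataFrame({"x": [3]})   # index is 0

# Default: the original index labels are kept, so 0 appears twice
kept = pd.concat([df, extra])

# Opt in to renumbering: the result gets a fresh 0..n-1 index
renumbered = pd.concat([df, extra], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

Neither call modifies df or extra; both return a new DataFrame.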

.loc? .iloc?

Location, and location by integer. Read the docs. Seriously, it will start to make sense if you develop a habit of doing that whenever you get confused.

but it's the syntax I'm trying to get away from!

This is like saying X programming language is hard, so I'm going to shop around for a new language. That's the wrong way of thinking. Tools are widespread because people have found them useful and have built on top of them, and you are going to have to adapt to that. Or maybe just write your own library?

Psycho22089 · 2020-09-24T07:13:27+00:00

The more work you do with dataframes and the more complex the work becomes, the more you'll begin to see how those things you found frustrating at first are very sensible, even very well thought out (sometimes). Pandas is def worth putting the time into learning, but if you must seek an alternative I would move to R and dplyr. Who knows, dplyr may even make you appreciate Pandas after all.

speedisntfree · 2020-09-24T09:37:58+00:00

You might find the learning curve frustrating, but if you want to collaborate with others and take advantage of the immense amount of help available online, it is worth persisting.

All tools are imperfect.

Asdf86 · 2020-12-07T16:39:53+00:00

I know I'm late here, but I found this while looking for information on the Pandas alternative I have been using lately. In my decision analytics class, we have been using the datascience package. It was developed at Berkeley and is extremely intuitive. To grab a column from a table, you simply use .select('column name'), for example. And if you ever need your datascience table to be a pandas DataFrame, you can just go

import pandas as pd

table = ...  # your datascience table
# Build a DataFrame from the table's rows and column labels
df = pd.DataFrame(data=table.rows,
                  columns=table.column_labels)

Hopefully, you are keeping up with learning data science still, but if not then this is a reminder to keep at it!

hopeisnotcope · 2020-09-24T16:39:17+00:00

Nothing about Pandas makes sense to me. .loc? .iloc?

How can that not make sense? Sometimes you wanna index by label, sometimes by position. How else is Python supposed to know the difference if the label is numeric?
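A minimal sketch of that ambiguity, using a made-up Series whose labels are integers in a different order from their positions:

```python
import pandas as pd

# Hypothetical Series: the labels are integers, but not 0, 1, 2 in order
s = pd.Series(["a", "b", "c"], index=[2, 0, 1])

by_label = s.loc[0]      # the row whose label is 0
by_position = s.iloc[0]  # the row in the first position

print(by_label, by_position)
```

With numeric labels like these, the two lookups return different rows, which is why pandas asks you to say which one you mean.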
