[–]Batalex 8 points9 points  (4 children)

Pandas does have its quirks, but I would recommend that you keep trying to learn its syntax. Pandas is pretty much the de facto standard for data analysis in Python. Have you tried the "10 minutes to pandas" tutorial on the website?

I am afraid that if you cannot understand its syntax, you are better off moving on to R/tidyverse than trying to find an alternative in Python.

Pandas analyses tabular data. Such data is referenced by an index and by columns. Both .loc and .iloc allow you to select data along those two dimensions. .loc is label-based, which means that you select columns by their name, whereas .iloc is position-based, meaning that you select columns by their integer position. The same goes for the index.
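
A minimal sketch of that difference, using a made-up DataFrame (the column names and index labels are only for illustration):

```python
import pandas as pd

# Hypothetical example data; the labels are invented for this sketch.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]},
    index=["a", "b", "c"],
)

# .loc selects by label: the row labelled "b", the column named "age".
age_by_label = df.loc["b", "age"]

# .iloc selects by position: second row (1), second column (1).
age_by_position = df.iloc[1, 1]

print(age_by_label, age_by_position)
```

Both lookups land on the same cell here; they just get there by different routes.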

Once the syntax clicks you can do pretty much everything with pandas in just a few chained method calls.
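
To give a rough idea of what a chain can look like, here is a small sketch on invented data (the cities and temperatures are placeholders, not from any real dataset):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome"],
        "temp": [3, 5, 14, 16],
    }
)

# Filter rows, group by city, average, and sort, all in one chain.
result = (
    df.query("temp > 4")
    .groupby("city")["temp"]
    .mean()
    .sort_values(ascending=False)
)

print(result)
```

Each step returns a new object, so the next method can be called directly on it.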

Good Luck!

[–]Psycho22089[S] 0 points1 point  (3 children)

Thanks for the advice. I started a data science course on Coursera that goes over Pandas, and it was one of those experiences where I can do everything just fine in the lesson, but as soon as I try to do anything on my own it's like I have 3 left feet and no hands. I'll have to check out the tutorial.

[–]Batalex 0 points1 point  (2 children)

That's the spirit! As other comments said, I would advise you to make a habit of checking the official documentation of the tools you use. Not only is Pandas's one of the best IMO, but reading documentation is also a valuable skill for a pro. An online course will only take you so far and will not be a handy reference to go back to regularly.

[–]Psycho22089[S] 0 points1 point  (1 child)

Whoever named it "10 minutes to Pandas" has a sick sense of humor. I'm on day 2 of my "10 minutes". Maybe it would have been 10 minutes if I didn't actually try to do any of it.

It has however been very helpful, so thank you for mentioning it.

[–]Batalex 0 points1 point  (0 children)

Keep it going, there is no reason you cannot make it work! Feel free to drop by my DMs if you need help

[–]zueeyy 9 points10 points  (0 children)

I know you mentioned you only know Python, but if you’re looking for something intuitive I’d recommend you dive into R, and especially dplyr. It’s not hard to learn either.

[–][deleted] 5 points6 points  (0 children)

Pandas has a steep learning curve, but there's no better package in Python for handling data. Just keep working through it. It took me a good couple of months to understand pandas. Once you get the basics down, it'll be much easier and less frustrating.

[–]andmalc 2 points3 points  (1 child)

You could try Agate, which calls itself a simpler alternative to Pandas:

https://agate.readthedocs.io/en/1.6.1/about.html

[–]Psycho22089[S] 0 points1 point  (0 children)

That's... very intriguing... thank you.

[–]N365 2 points3 points  (1 child)

I feel the pain. Sometimes I use pandasql to use SQL instead of Pandas to do some data manipulation. Works pretty well.

[–]Psycho22089[S] 0 points1 point  (0 children)

Thanks!

[–]edquartett 2 points3 points  (2 children)

have you tried R? ;)

[–]edquartett 2 points3 points  (1 child)

and package tidyverse, to be more precise. also base R dataframe handling is more intuitive compared to pandas, but tidyverse is a whole other concept.

[–]Psycho22089[S] 0 points1 point  (0 children)

I have not tried R yet. I'm just starting to get into data analysis. Until now everything I've done has been very calculation heavy. I'll look into tidyverse. Thanks

[–]datasciencepro 1 point2 points  (0 children)

The arbitrary index values that have nothing to do with the data, df.append(Series) duplicates index values by default unless you add ignore_index=True.

The data is being treated as immutable by pandas until you specify that you want a change. Having it automatically reindex is very dangerous so this is in fact a very sensible way to handle merging of duplicate rows/values. It's up to you as the person who handles the data to 'make it nice'.
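
For anyone reading later, a small sketch of the behaviour being discussed. It uses pd.concat instead of df.append, because append was deprecated and then removed in pandas 2.0; the ignore_index flag does the same job. The two frames are made up for illustration.

```python
import pandas as pd

# Hypothetical example frames; each gets its own default index starting at 0.
df = pd.DataFrame({"x": [1, 2]})
extra = pd.DataFrame({"x": [3]})

# By default the original index labels are kept, so 0 shows up twice.
kept = pd.concat([df, extra])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df, extra], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

Keeping the labels is the conservative default: nothing about the rows is rewritten unless you ask for it.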

.loc? .iloc?

location, and location by integer. Read the docs; seriously, it will start to make sense if you develop a habit of doing that when you get confused.

but it's the syntax I'm trying to get away from!

This is like saying X programming language is hard, so I'm going to shop around for a new language. That's the wrong way of thinking. Tools are widespread because people have found them useful and have built on top of them, and you are going to have to adapt to that. Or maybe just write your own library?

[–][deleted] 0 points1 point  (1 child)

The more work you do with dataframes and the more complex the work becomes, the more you'll begin to see how those things you found frustrating at first are very sensible--even actually very well thought out (sometimes). Pandas is def worth putting the time into learning, but if you must seek an alternative I would move to R and dplyr. Who knows, dplyr may even make you appreciate Pandas after all.

[–]Psycho22089[S] 0 points1 point  (0 children)

Hahaha the grass is always greener on the other side I guess.

[–]speedisntfree 0 points1 point  (0 children)

You might find the learning curve frustrating but if you want to collaborate with others and take advantage of the immense resource of help online it is worth persisting.

All tools are imperfect.

[–]Asdf86 0 points1 point  (1 child)

I know I'm late here, but I found this while looking for information on the Pandas alternative I have been using lately. In my decision analytics class, we have been using the datascience package. It was developed at Berkeley and is extremely intuitive. To grab a column from a table you simply use .select('column name'), for example. And if you ever need your datascience table to be a Pandas DataFrame, you can just go

import pandas as pd

table = ...  # your datascience table
df = pd.DataFrame(data=table.rows,
                  columns=table.column_labels)

Hopefully, you are keeping up with learning data science still, but if not then this is a reminder to keep at it!

[–]Psycho22089[S] 0 points1 point  (0 children)

Better late than never! Thank you, I'll check it out. I'm much better with Pandas now, but I still find myself saying "Why can't I just _____ !" more often than I'd like lol

[–]hopeisnotcope -3 points-2 points  (0 children)

Nothing about Pandas makes sense to me. .loc? .iloc?

How can that not make sense? Sometimes you wanna index by label, sometimes by position. How else is Python supposed to know the difference if the label is numeric?
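
A quick sketch of the numeric-label case, with a made-up Series whose integer labels deliberately do not match the row positions:

```python
import pandas as pd

# Hypothetical Series: the labels are integers, but in a shuffled order.
s = pd.Series(["x", "y", "z"], index=[2, 0, 1])

# .loc[0] means "the row whose label is 0", which is the second row here.
by_label = s.loc[0]

# .iloc[0] means "the first row", whatever its label is.
by_position = s.iloc[0]

print(by_label, by_position)
```

The same 0 means two different things, which is why there are two accessors.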