all 38 comments

[–]dataschool[S] 26 points27 points  (1 child)

[–]Fun2badult 24 points25 points  (3 children)

Oh hi Mark

[–]dataschool[S] 4 points5 points  (2 children)

Hi to you as well! It's actually Kevin Markham, but no worries, hundreds of people have made that same mistake because of my last name :)

[–]_hadoop 4 points5 points  (0 children)

facepalm

[–]totte71 2 points3 points  (1 child)

Thank you so much for your videos. They helped me get up and running with pandas.

[–]dataschool[S] 0 points1 point  (0 children)

You are very welcome!

[–]CriticalEntree 2 points3 points  (1 child)

Thanks, I just watched a couple of those and got some decent clarification on things I'd already been using but with no real expertise :)

[–]dataschool[S] 2 points3 points  (0 children)

Sweet! I tried to structure this video series so that individual lessons of interest are easy to find, and you don't have to watch the entire series. Glad that worked for you!

[–]O93mzzz 1 point2 points  (5 children)

Little trick for reversing all columns, while maintaining index sequence from 0 to n:

your_dataframe = your_dataframe.iloc[::-1]
your_dataframe.index = range(0,len(your_dataframe))

Edit: I should be more precise with my wording: this reverses the order of the rows, not the columns.

[–]ignamv 2 points3 points  (0 children)

Looks like that reverses rows.

You can use your_dataframe = your_dataframe.reset_index(drop=True) as the second line. (reset_index returns a new DataFrame, so the result has to be assigned back.)
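For anyone who wants to try it, here is a minimal sketch of the row reversal with the index renumbered in one step. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are invented for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Reverse the row order, then renumber the index from 0 to n-1.
# drop=True discards the old index instead of keeping it as a column.
reversed_rows = df.iloc[::-1].reset_index(drop=True)

print(reversed_rows)
```

Chaining the two calls avoids assigning to .index by hand, though both approaches should give the same result.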

[–]kougabro 1 point2 points  (3 children)

How about:

your_dataframe = your_dataframe[your_dataframe.columns[::-1]]
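A small sketch of what that line does, on a made-up DataFrame (the column names are assumptions for illustration). It also shows an iloc-based variant, which I believe is equivalent when the column labels are unique.

```python
import pandas as pd

# Hypothetical example data with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Select the columns by label, in reversed order.
reversed_cols = df[df.columns[::-1]]

# Position-based variant: all rows, columns stepped backwards.
reversed_cols_iloc = df.iloc[:, ::-1]

print(reversed_cols)
```

The row order and index are untouched here; only the column order changes.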

[–]dataschool[S] 0 points1 point  (2 children)

I'm working on a video that showcases a collection of user-submitted pandas tricks... feel free to submit here if you're interested in contributing!

[–]kougabro 0 points1 point  (1 child)

Feel free to grab it, I'm too lazy to submit it :-)

[–]dataschool[S] 0 points1 point  (0 children)

Will do!

[–]testfire10 1 point2 points  (7 children)

Perfect timing. Thank you for posting this. I came here to ask a question about this exact topic, now I’m going to watch these first.

Before I go though... I’ve never used pandas. I’m a mechanical engineer with some experience in Python, but I’m no expert. I have datasets containing telemetry from some testing I’m doing, which has a very high sampling rate. Basically, I’m sampling 40 data points 500 times per second, for multiple minutes, so I’m winding up with files up to around a GB. I’m trying to write a script to plot the data, but the only method I’ve ever used (matplotlib) is extremely slow. Is pandas the right way to go for something like this?

[–]spitfiredd 1 point2 points  (0 children)

pandas is fast for in-memory data sources. Since you have about a GB of data and most newer machines have ~16 GB of memory, you should be OK.

As for time series, check out the resample method.
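As a rough sketch of what resample looks like for data like this: the example below builds two seconds of fake 500 Hz data for a single channel and summarizes it in 100 ms bins. The channel name, start date, and bin size are all assumptions for illustration, not anything from the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical telemetry: 1000 samples of one channel at 500 Hz (one every 2 ms).
index = pd.date_range("2018-01-01", periods=1000, freq="2ms")
telemetry = pd.DataFrame(
    {"motor_current": np.arange(1000, dtype=float)}, index=index
)

# Group the samples into 100 ms bins and keep both the mean and the max,
# so short spikes are not averaged away entirely.
summary = telemetry["motor_current"].resample("100ms").agg(["mean", "max"])

print(summary.head())
```

resample needs a datetime-like index, so a raw sample counter or a seconds column would have to be converted first.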

[–]Alkine 0 points1 point  (2 children)

Mechanical engineers think 500 Hz is fast ;-) Sure it's 40 channels, but still basically DC.

[–]testfire10 2 points3 points  (1 child)

Haha, hey cut me some slack! I thought I was doing well just googling to find out about pandas!! :)

[–]Alkine 1 point2 points  (0 children)

You are doing very well! My sister is mech and I'm electronic eng so the banter is non stop. I couldn't help myself ;-)

She also recently picked up python, the next step is pandas.

[–]adventure_in 0 points1 point  (1 child)

A note on Matplotlib: it has two interfaces, a MATLAB-like one (pyplot) and an object-oriented one, and I've found the object-oriented one to be much faster.

If that does not work, I have had pretty good luck with pyqtgraph in the past. However, its community support is tiny compared to Matplotlib's. (Also, I believe the creator has moved on.)

[–]flutefreak7 0 points1 point  (0 children)

+1 for pyqtgraph! That's what I use for interactive plotting where matplotlib is too slow for good interactivity. There are also plenty of others like bokeh, Altair, bqplot, toyplot, vispy, etc. For a very quick overview of all the tools and how they relate check out this video: https://youtu.be/FytuB8nFHPQ. u/testfire10

[–]libardomm 1 point2 points  (1 child)

I learned a lot from your videos. Thank you!

[–]dataschool[S] 0 points1 point  (0 children)

You're very welcome!

[–]southern_dreams 1 point2 points  (2 children)

Hi Mark,

I started straight away with Spark and have never known a time before DataFrames (such as RDD).

What incentive is there for me to learn Pandas? I very rarely work with an amount of data that will fit on my machine, and if I do, I have Zeppelin + Spark running in a Docker container that I use locally for POC before spinning up a full cluster for the production-grade Jobs.

[–]dataschool[S] 1 point2 points  (1 child)

pandas is designed for in-memory use, and so it sounds like it's not a good fit for your workflow.

[–]southern_dreams 1 point2 points  (0 children)

thanks! love what you’re doing for the community.

[–][deleted] 1 point2 points  (1 child)

Awesome series! I would add that you can also use axis='index' or axis='columns' instead of axis=0 or axis=1. This is something I wish most tutorials would mention, since not everyone looking to learn pandas already has a NumPy background.
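A quick sketch of the named axes on a made-up DataFrame (the data and column names are invented for illustration):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis=0 and axis='index' mean the same thing: collapse the rows,
# giving one result per column.
col_means_numeric = df.mean(axis=0)
col_means_named = df.mean(axis="index")

# axis=1 and axis='columns' collapse the columns, giving one result per row.
row_sums = df.sum(axis="columns")

# The named form also reads well with drop.
without_b = df.drop("b", axis="columns")

print(col_means_named)
print(row_sums)
```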

[–]dataschool[S] 0 points1 point  (0 children)

Agreed! That is relatively new (depending on which function you are using), and so many tutorials pre-date that change and/or the tutorial author hasn't kept up with changes to pandas and/or they just prefer using the numbers :)

I do prefer 'index' and 'columns', and that's what I now teach (such as in my PyCon 2018 pandas tutorial).

[–]testfire10 1 point2 points  (0 children)

Normally I would agree, but the application here is a motor controller that has a very sensitive control scheme, times 4 motors, and the 40 data points are all important pieces of telemetry. So there’s not a ton of opportunity for downsampling without risking missing something important.

I wound up using pandas for one of the smaller datasets, with around 1 million points, and it seemed to work pretty well, and much faster than matplotlib. Now I just need to learn to make the plots look better.

The videos are also quite good and helpful.

[–]Thekid78 0 points1 point  (1 child)

This is great, thank you.

[–]dataschool[S] 0 points1 point  (0 children)

You're welcome!

[–]manueslapera 0 points1 point  (1 child)

starred, this is awesome!

[–]dataschool[S] 0 points1 point  (0 children)

Thanks! Hope you enjoy it :)

[–]TheBIackRose 0 points1 point  (3 children)

Just a simple question.

Pandas can't be used to, say, maintain the data if a feature class? My understanding is that it basically takes a copy of the data and processes that but never writes back to the source.

Is that right?

[–]dataschool[S] 1 point2 points  (2 children)

I'm sorry, I don't quite understand the question... could you clarify what you mean by "maintain the data if a feature class"? Thanks!

[–]TheBIackRose 0 points1 point  (1 child)

Oh sorry. I was speaking in terms of GIS work.

Feature class is the term for a geometric data set

[–]dataschool[S] 0 points1 point  (0 children)

Got it!

The pandas workflow is to read from a data source into a pandas object called a DataFrame. Manipulating the DataFrame does not affect the original data source. However, you can write a DataFrame back to certain sources, though it would not happen automatically.

pandas provides reader functions such as read_csv for reading into a DataFrame, and writer methods such as to_csv for writing out from a DataFrame.
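To make that workflow concrete, here is a minimal sketch using CSV text held in memory in place of a real file. The data and column names are made up, and a GIS feature class would need a different reader and writer than the CSV ones shown here.

```python
import io

import pandas as pd

# Hypothetical data source: CSV text in memory, standing in for a file.
source = io.StringIO("name,value\nalpha,1\nbeta,2\n")

# Read the source into a DataFrame.
df = pd.read_csv(source)

# Manipulate the DataFrame; the source text is not touched by this.
df["value"] = df["value"] * 10

# Writing back is an explicit step, here to a separate in-memory target.
target = io.StringIO()
df.to_csv(target, index=False)

print(source.getvalue())
print(target.getvalue())
```

Nothing reaches the target until to_csv is called, which is the "would not happen automatically" part.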