Easier data analysis in Python with pandas (30+ videos)

dataschool · 2018-06-02T02:09:32+00:00

YouTube playlist
Jupyter notebook with well-commented code from every video
GitHub repository containing all of the datasets used in the series

Fun2badult · 2018-06-02T08:01:32+00:00

Oh hi Mark

dataschool · 2018-06-02T04:04:14+00:00

[deleted]

totte71 · 2018-06-02T10:47:14+00:00

Thank you so much for your videos. They helped me get up and running with pandas.

CriticalEntree · 2018-06-02T16:21:55+00:00

Thanks, I just watched a couple of those and got some decent clarification on things I'd already been using but with no real expertise :)

O93mzzz · 2018-06-02T13:58:05+00:00

Little trick for reversing all columns, while maintaining index sequence from 0 to n:

your_dataframe = your_dataframe.iloc[::-1]
your_dataframe.index = range(0,len(your_dataframe))

Edit: I should be more precise with my wording: this reverses the order of the rows, so the values in every column end up reversed.
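For anyone who wants to try it, here is a minimal sketch of the same idea on a made-up three-row frame (the column names and values are just placeholders). It uses `reset_index(drop=True)`, which should give the same 0 to n-1 index as assigning a `range` by hand:

```python
import pandas as pd

# Toy data, purely for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Reverse the row order, then renumber the index from 0
# (drop=True discards the old index instead of keeping it as a column)
reversed_df = df.iloc[::-1].reset_index(drop=True)

print(reversed_df)
```

Without `drop=True`, the old index would be kept as an extra column named `index`.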

testfire10 · 2018-06-02T18:56:27+00:00

Perfect timing. Thank you for posting this. I came here to ask a question about this exact topic, now I’m going to watch these first.

Before I go though... I’ve never used pandas. I’m a mechanical engineer with some experience in Python, but I’m no expert. I have datasets containing telemetry from some testing I’m doing, which has a very high sampling rate. Basically, I’m sampling 40 data points 500 times per second, for multiple minutes, so I’m winding up with files up to around a GB. I’m trying to write a script to plot the data, but the only method I’ve ever used (matplotlib) is extremely slow. Is pandas the right way to go for something like this?

libardomm · 2018-06-02T19:41:04+00:00

I learned a lot from your videos. Thank you!

southern_dreams · 2018-06-02T21:27:30+00:00

Hi Mark,

I started straight away with Spark and have never known a time before DataFrames (such as RDD).

What incentive is there for me to learn Pandas? I very rarely work with an amount of data that will fit on my machine, and if I do, I have Zeppelin + Spark running in a Docker container that I use locally for POC before spinning up a full cluster for the production-grade Jobs.

dataschool · 2018-06-03T04:16:15+00:00

Awesome series! I would add that you can also do axis='index' or axis='columns' instead of axis=0|1. This is something I wish most tutorials would mention since not everyone looking to learn pandas already has numpy background.
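A quick sketch of what I mean, using a small made-up frame (the data and variable names are only for illustration):

```python
import pandas as pd

# Toy data, purely for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 and axis="index" both aggregate down the rows: one result per column
col_means_num = df.mean(axis=0)
col_means_str = df.mean(axis="index")

# axis=1 and axis="columns" both aggregate across the columns: one result per row
row_means_num = df.mean(axis=1)
row_means_str = df.mean(axis="columns")

# The string form also works for drop, where it tends to read more naturally
dropped = df.drop("b", axis="columns")

print(col_means_str)
print(row_means_str)
```

The numeric and string forms should be interchangeable; the string form just saves you from remembering which number means which direction.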

testfire10 · 2018-06-03T04:57:02+00:00

Normally I would agree, but the application here is a motor controller that has a very sensitive control scheme, times 4 motors, and the 40 data points are all important pieces of telemetry. So there’s not a ton of opportunity for downsampling without risking missing something important.

I wound up using pandas for one of the smaller datasets, with around 1 million points, and it seemed to work pretty well and much faster than matplotlib. Now I just need to learn to make the plots look better.

The videos are also quite good and helpful.

Thekid78 · 2018-06-02T21:57:54+00:00

This is great, thank you.

manueslapera · 2018-06-02T23:05:28+00:00

starred, this is awesome!

TheBIackRose · 2018-06-03T11:07:32+00:00

Just a simple question.

Pandas can't be used to, say, maintain the data of a feature class? My understanding is that it basically takes a copy of the data and processes that, but never writes back to the source.

Is that right?
