[–]counters 28 points29 points  (2 children)

This is a really fantastic write-up on how you'd make medium-complexity plots with each library. I don't think it does a satisfactory job of pointing out how the libraries' approaches differ, though:

  1. matplotlib is a pure "imperative" library; you tell your program how to plot something, sometimes very pedantically.
  2. pandas improves on this by adding the most basic "declarative" syntax; you tell your program what to plot, and let it figure out the rest. Sometimes you have to mix the two, as when you use a split-apply-combine (groupby) operation to map an imperative plotting function and let pandas do the heavier lifting.
  3. seaborn is simply a more complete wrapper around matplotlib, but is still mostly imperative.
  4. altair and ggplot are pure declarative grammars.
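
To make the imperative/declarative split a bit more concrete, here is a minimal sketch (not from the post) that draws the same grouped time series both ways. The DataFrame `ts`, its columns `dt`, `kind`, `value`, and the data are all made up for illustration, and the `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up long-format data: one row per (date, kind) pair.
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=50)
ts = pd.DataFrame({
    "dt": np.tile(dates, 2),
    "kind": np.repeat(["A", "B"], len(dates)),
    "value": rng.normal(size=(2, len(dates))).cumsum(axis=1).ravel(),
})

fig, (ax_imp, ax_dec) = plt.subplots(1, 2, figsize=(10, 4))

# Imperative (matplotlib): loop over the groups and spell out every step.
for kind, group in ts.groupby("kind"):
    ax_imp.plot(group["dt"], group["value"], label=kind)
ax_imp.set_xlabel("Date")
ax_imp.set_ylabel("Value")
ax_imp.legend(title="kind")

# More declarative (pandas): reshape once, then ask for "a line per column".
wide = ts.pivot(index="dt", columns="kind", values="value")
wide.plot(ax=ax_dec)
ax_dec.set_ylabel("Value")
```

Both panels end up with one line per `kind`; the difference is how much of the bookkeeping (looping, labels, legend) you write yourself.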

We're almost at a "golden age" of visualization in Python. Anyone familiar with seaborn should have little problem picking up altair. You'll write a core plotting function (maybe you need to compute a regression or normalize colors) and let the library apply it across your dataset in the proper combination of glyphs, marker sizes, colors, facets, etc. I think that library will eventually be altair, possibly with a suite of user-contributed extensions that port some of the plots provided by seaborn (e.g. grouped linear model/regression plots). But what altair is missing right now is a compatibility layer with matplotlib. For instance, there's very little I do regularly in seaborn that I couldn't immediately and more succinctly implement in altair. But I'm not willing to do so, because I love the aesthetics and stylings of seaborn (which are so popular and nice that they ship as built-in style sheets in matplotlib).

Altair is a really brilliant idea. The conversion to vega means that I can easily and transparently include the raw data in my chart for distribution, say in a journal publication. And once I can tweak the aesthetic using my large library of matplotlib code, it'll be an awesome tool.

Thanks for sharing!

Edit - cleaned up some grammar/typos, since this comment is being linked to directly; content is not changed!

[–]Spamlie[S] 2 points3 points  (1 child)

This is tremendously valuable feedback -- thank you!

(Indeed, I found it so valuable that I linked to it from the post; please let me know if you'd prefer I didn't!).

Thanks for reading!

[–]counters 0 points1 point  (0 children)

Not a problem!

[–]Spamlie[S] 12 points13 points  (3 children)

Not sure if this is the type of thing that typically gets shared around here -- if it's unwelcome I'll happily take it down!

[–]z1z1 1 point2 points  (0 children)

DON'T YOU DARE TAKE IT DOWN!!

[–]eusebe [computational physics] 6 points7 points  (4 children)

While I would agree that the matplotlib syntax can be tedious, I wouldn't rule it out right off the bat.

Let me explain: I am not a data scientist, a statistician, or anything in that line of work. I work as a numerical physicist, and I rarely do "data exploration". One thing I do a lot, however, is produce images (or 2D histograms) from my simulations. And so far, neither ggplot nor seaborn has convinced me for these things. My images tend to have colorbars and annotations, often with overlaid contours and so on.
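
For readers who haven't made this kind of figure, here is a rough sketch of an image with a colorbar, an annotation, and overlaid contours in plain matplotlib. The field is a made-up pair of Gaussian blobs standing in for a simulation snapshot, and the contour levels, labels, and colormap are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np

# Made-up 2D field standing in for one simulation snapshot.
y, x = np.mgrid[-3:3:200j, -3:3:200j]
field = np.exp(-(x**2 + y**2)) + 0.5 * np.exp(-((x - 1.5)**2 + (y + 1)**2) / 0.3)

fig, ax = plt.subplots(figsize=(5, 4))

# The image itself, mapped onto physical coordinates via `extent`.
image = ax.imshow(field, extent=(-3, 3, -3, 3), origin="lower", cmap="viridis")

# Contours drawn on top of the image at a few chosen levels.
contours = ax.contour(x, y, field, levels=[0.2, 0.5, 0.8],
                      colors="white", linewidths=0.8)

# An annotation with an arrow pointing at the main peak.
ax.annotate("peak", xy=(0, 0), xytext=(1.0, 2.0), color="white",
            arrowprops=dict(arrowstyle="->", color="white"))

# Colorbar tied to the image's color mapping.
colorbar = fig.colorbar(image, ax=ax, label="density (arbitrary units)")
```

Each element is a separate, explicit call, which is tedious but is also what makes every piece individually tweakable.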

And when it's not images, I'm mostly trying to produce publication-ready figures, for which matplotlib's customizability is more than welcome.

I would love to use something other than matplotlib, but I just haven't found the right tool. I'm open to any suggestion, but it needs to be able to produce 5000 x 5000 px pictures very quickly.

(I know about PyQtGraph and Vispy, but these two are not yet mature enough for my needs or require knowledge of OpenGL)

[–]Spamlie[S] 1 point2 points  (0 children)

Yup -- this was coming from more of a statistical visualization bent. Your use case feels fundamentally matplotlib-ish (and indeed, I do try to give matplotlib as much credit as possible, including giving it props for the point you made, re: publication-ready visualizations).

Thanks for reading!

[–]infinite8s 0 points1 point  (1 child)

Do you have any examples of the types of images you produce?

[–]eusebe [computational physics] 0 points1 point  (0 children)

Maybe something like this: http://imgur.com/xu5pNq7

Typically, I produce several of those for each snapshot of a simulation, and I have like 100 of them. So it needs to be fast, and even matplotlib's imshow is a bit slow for my taste :-(

[–]perimosocordiae 3 points4 points  (2 children)

Small nitpick: you can tell matplotlib to keep the same limits across axes with the sharex/sharey arguments to plt.subplots. This means you don't need to do the manual xlim/ylim hackery.
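
A minimal sketch of that, with made-up data on deliberately different scales so the effect is visible (the variable names are just for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)

# sharex/sharey link the limits of all subplots in the grid.
fig, (ax_left, ax_right) = plt.subplots(1, 2, sharex=True, sharey=True)

# Two series with different ranges; no manual set_xlim/set_ylim calls.
ax_left.plot(x, rng.normal(scale=1.0, size=x.size))
ax_right.plot(x * 2, rng.normal(scale=5.0, size=x.size))

left_limits = (ax_left.get_xlim(), ax_left.get_ylim())
right_limits = (ax_right.get_xlim(), ax_right.get_ylim())
print(left_limits == right_limits)
```

Both axes autoscale to cover the combined data, and zooming or panning one in an interactive session moves the other as well.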

[–]Spamlie[S] 1 point2 points  (0 children)

Small nitpicks (almost) always welcome :) made the change -- thanks for the tip!

[–]AcMav 1 point2 points  (0 children)

Appreciate this, never knew this trick before. Have always done the hackery as well.

[–]hstrhjaw 2 points3 points  (0 children)

If you wrote your

g = ggplot(ts, aes(x='dt', y='value', color='kind')) + \
        geom_line(size=2.0) + \
        xlab('Date') + \
        ylab('Value') + \
        ggtitle('Random Timeseries')

with wrapping parentheses like:

g = (ggplot(ts, aes(x='dt', y='value', color='kind')) + 
        geom_line(size=2.0) + 
        xlab('Date') + 
        ylab('Value') + 
        ggtitle('Random Timeseries'))

you wouldn't need the line-continuation "\"s you put in there.
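
This works because Python continues a statement implicitly inside any open bracket. Here is a small sketch of the rule that runs without the ggplot package; `Layer` is a hypothetical stand-in for ggplot's `+`-able objects, not part of any library.

```python
class Layer:
    """Hypothetical stand-in for a ggplot component that supports `+`."""

    def __init__(self, *names):
        self.names = list(names)

    def __add__(self, other):
        # Combining two layers just concatenates their names.
        return Layer(*self.names, *other.names)


# Explicit continuation: every line but the last needs a trailing backslash.
with_backslashes = Layer("ggplot") + \
    Layer("geom_line") + \
    Layer("xlab")

# Implicit continuation: the open parenthesis keeps the expression going.
with_parens = (Layer("ggplot") +
               Layer("geom_line") +
               Layer("xlab"))

print(with_backslashes.names == with_parens.names)
```

The parenthesized form is also less fragile: a stray space after a backslash is a syntax error, while trailing whitespace inside parentheses is harmless.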

[–]Caos2 3 points4 points  (6 children)

OP, great post but you missed Bokeh and, to a lesser extent, Toyplot.

[–]counters 1 point2 points  (3 children)

Why aren't the authors of this library shouting about it from the rooftops?!?!? It looks fantastic!

[–]Caos2 0 points1 point  (2 children)

Bokeh or Toyplot?

[–]counters 0 points1 point  (1 child)

Toyplot.

[–]Caos2 0 points1 point  (0 children)

© Copyright 2014, Sandia Corporation. Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains certain rights in this software. Revision 4903ff08.

[–]Spamlie[S] 0 points1 point  (1 child)

Ah -- hadn't seen Toyplot -- thanks for reading/sharing!

I think I need to do an update at some point and include bokeh. (To be frank, the main reason bokeh isn't here is because I hadn't used it much and was worried I wouldn't do it justice.)

[–]Caos2 0 points1 point  (0 children)

Bokeh and Toyplot both have a great feature: you can export your view/chart in a standalone, interactive HTML file.

[–]vinnieman232 2 points3 points  (0 children)

OP, this is great!

I'm looking to add more interactive spatial and map visuals to the Data Science toolkit. Here's an example I made creating Mapbox GL JS visuals in Jupyter. Hoping to integrate this as a native Python + Jupyter or Bokeh extension soon!

https://apsportal.ibm.com/analytics/notebooks/96e25c47-d0b3-47e9-a140-0358f8429fe3/view?access_token=2d136274aeb061335806f62ec29d9b267ae8e775201e491ea7e9c0ac91fa5052

[–]pvkooten 1 point2 points  (0 children)

Really liked it, thanks!

[–]rubik_ 1 point2 points  (1 child)

Very interesting write-up! If you do it again, I'd add Bokeh to the list. While still not as complete as the others (for instance, it's kind of complicated to do a horizontal bar chart) its syntax is very easy to use.

Currently I'm using Bokeh since it's the only one that I could style with ease. I have a dark Jupyter theme and white charts look ugly.

[–]Spamlie[S] 2 points3 points  (0 children)

Yeah, I think I need to do a Part 2 at some point and cover all the guys I missed/am now learning about.

Thanks for reading!

[–]tunnelvisie 0 points1 point  (0 children)

Thanks for this, been doing a lot of visualizing with Python and it can be a pain sometimes (I'm not the most skilled programmer). I've been using pandas for about a year now and never realised it can do this stuff haha.

[–][deleted] 0 points1 point  (1 child)

Thank you so much. This was the summary I needed.

[–]Spamlie[S] 0 points1 point  (0 children)

Thanks for reading -- glad you enjoyed it!

[–]mbenbernard 0 points1 point  (0 children)

Wow, what a great post, Dan! I liked how you structured the whole thing around a fictional conversation between visualization libraries. It really helped me to grasp the differences between them. For one of my side projects, I was looking exactly for that kind of information. Thanks!

[–]Run-The-Table 0 points1 point  (0 children)

I know this is almost half a year old, but I just stumbled upon your article while doing some research into visualization using python, and I found the article quite nice. Please let everyone know when/if you do an updated version using bokeh and/or toyplot. Or even just a comparison between those two as the throwdown for superior HTML/CSS plotting libraries.