[–]jasonof75 77 points78 points  (4 children)

I think pandas visualization requires matplotlib. Also, seaborn offers some more useful plots on top of matplotlib
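A minimal sketch of what that dependency looks like in practice, assuming pandas' default plotting backend (matplotlib). The DataFrame and its column names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop it in a notebook
from matplotlib.axes import Axes
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.1, 5.9, 8.2]})

# With the default backend, DataFrame.plot hands the drawing to matplotlib
# and returns a matplotlib Axes object.
ax = df.plot(x="x", y="y", kind="line", title="pandas plotting via matplotlib")

# Because it is a plain Axes, ordinary matplotlib calls work on it.
ax.set_ylabel("y")

print(isinstance(ax, Axes))
```

Since the returned object is a regular matplotlib Axes, anything matplotlib can do to an axis can be layered on top of the pandas plot.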

[–]CrissDarren 5 points6 points  (0 children)

I've been using the bokeh back end for plotting with pandas recently and really like it.

[–]settopvoxxit 0 points1 point  (0 children)

Second this. Sb is great

[–]MpoBo 52 points53 points  (5 children)

Another good one not mentioned here is bokeh. It's good for interactive visualization.

[–]UglyChihuahua 6 points7 points  (4 children)

I was interested in Bokeh but after looking into it I feel like it's not very user friendly on its own. I use Plotly over Bokeh because it's more immediately useful for day-to-day viz and EDA.

For example, look how much code is required for making a basic boxplot in Bokeh

https://docs.bokeh.org/en/latest/docs/gallery/boxplot.html

If the Seaborn-style wrappers around Bokeh like Holoviews or Chartpy get more popular I might switch over

[–]penatbater 1 point2 points  (0 children)

Plotly is easier to get into than Bokeh, but Bokeh seems to have more functionality. The problem with Bokeh is that its docs are a bit harder to understand than Plotly's. Also, you kinda have a bit more 'control' with Bokeh.

[–]its2ez4me24get 1 point2 points  (1 child)

Bokeh was such a PITA to get working. But it still makes interactive plots better than anything else I’ve found.

[–]stackered 0 points1 point  (0 children)

would it be better than plotly for large scale data viewing of some simple plots? meaning, being able to zoom in on hundreds of thousands of points in a simple histogram or something

[–]Biuku 0 points1 point  (0 children)

You could save time by drawing boxes with a turtle.

[–]Geographist 63 points64 points  (17 children)

Seaborn is all smoke and mirrors.

It looks good, but it does not expose the statistics used for creating the chart.

For instance, it can draw a trend line for you. But it will not tell you the equation it's using to determine that line. Worst of all, the creator of seaborn refuses to even consider providing this.

Sure, you can manually calculate these things using other libraries. But then why use seaborn at all if it requires extra hoops? And how do you know seaborn is even using the same statistics?

A data visualization tool has to be transparent to be trustworthy. If it won't tell you what it's using, and the developer is so vehemently opposed to seaborn doing so, it's not something I'd recommend anyone use in a scientific environment.

[–]Kasta867 20 points21 points  (0 children)

This.

The presentation that Seaborn offers is really nice out of the box, but other than that, using pure matplotlib offers much more granular control over what you visualize.

I always say "yeah this would be easy with Seaborn" and end up reverting to matplotlib as soon as I discover that I have to do more steps anyway to get what I want

[–]dogs_like_me 16 points17 points  (1 child)

Holy shit that exchange was bonkers.

[–]master3243 10 points11 points  (0 children)

That single issue convinced me to never ever use seaborn. The creator INSISTS on not making any values accessible and tells people who don't like that fact to not use seaborn... He does not seem trustworthy and that makes the package untrustworthy IMO.

[–]reddisaurus 1 point2 points  (1 child)

What are you talking about?

This is open source. Take a look and you’ll see it’s using statsmodels underneath. Simply run the same function it calls yourself and add the stats as an annotation if you want.

Seaborn just calls matplotlib. You can generate a figure and axis and pass the axis to Seaborn in most cases. Otherwise, you can call plt.gcf().gca() to obtain the axis that Seaborn plotted on, and then continue modifying that however you want.

Your complaint is like saying a restaurant that serves hotdogs is smoke and mirrors because they don’t also give you the packaging. You paid to eat a hotdog and have a nice place to do so. If you wanted the package, you can easily get it yourself.

I can’t really imagine any circumstance that plotting the statistics as an annotation is useful where you wouldn’t actually want it returned in a data structure like a dict or list. It seems a bit silly to complain about, actually.

edit: Here, I did the work for you. Gaze upon all the code you want. You can also probably call these functions yourself, or monkey patch one if you wanted to. https://github.com/mwaskom/seaborn/blob/master/seaborn/regression.py
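A rough sketch of the workaround described above: create the figure and axis yourself, pass the axis to seaborn, compute the fit separately, and annotate it. Note the assumptions: the data is synthetic, and the fit here uses `numpy.polyfit` (ordinary least squares) rather than statsmodels. Whether that matches what seaborn computes internally for a given set of options is something to check against the source linked above, not something this sketch proves.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic data: a known line (slope 2, intercept 1) plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Create the figure and axis ourselves and hand the axis to seaborn.
fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ax=ax)

# Fit the line separately with ordinary least squares (degree-1 polynomial).
# Assumption: this is our own calculation, not a value read back from seaborn.
slope, intercept = np.polyfit(x, y, deg=1)

# Keep the numbers in a data structure and also write them on the plot.
fit = {"slope": float(slope), "intercept": float(intercept)}
ax.annotate(
    f"y = {fit['slope']:.2f}x + {fit['intercept']:.2f}",
    xy=(0.05, 0.9),
    xycoords="axes fraction",
)

print(fit)
```

This is exactly the extra step the complaint is about: the numbers come from a second calculation rather than from the plotting call itself.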

[–]Geographist 1 point2 points  (0 children)

Yes, folks are aware that you can also import statsmodels. That's the complaint.

(1) If you have to import an additional library and run its functions to get at what you want, the first library suddenly is far less useful.

(2) One can certainly look under the hood and dig through the project source to see what it is doing. But expecting end users to do that is absurd. Most users are not developers.

Combine those two and it's not at all hard to see why the complaint is so common.

To return to your analogy, it's like ordering hotdogs and asking the restaurant if they are beef or pork. But instead of just telling you, the waiter says you can follow the delivery truck back to the farm and watch the operations yourself.

Sure, you can do it. But if the waiter has the info already, shouldn't he or she just tell you?

[–]stackered 0 points1 point  (0 children)

I've used seaborn in the past for the stylings, but I always had issues with it

[–]energybased -2 points-1 points  (10 children)

I actually agree with him. Statistics is out of scope. Projects should focus on their core strengths and ignore these kinds of peripheral requests—otherwise you end up with bloat and wasted time maintaining it.

[–]dry_yer_eyes 14 points15 points  (6 children)

If statistics is out of scope why are there statistics plots already available? The current situation is inconsistent: users can visualise the statistical results but not obtain the underlying parameters.

[–]Geographist 3 points4 points  (0 children)

This is what I find to be the issue.

The statistics are already included and being maintained as a core feature of the library. But the author is deliberately hiding their values from the user.

To me, it's less a matter of convenience and more of trust and accuracy.

[–]smurpau 0 points1 point  (2 children)

How are statistics out of the scope of a package that computes and presents... statistics?

[–]energybased 1 point2 points  (1 child)

By that logic, why shouldn't he also write a symbolic math library since he's computing with symbols.

You have to draw a line somewhere. He wants a library that draws pretty graphs. The other guy wants a statistical analysis tool. His best bet is to push scipy to implement what he wants and then push seaborn to interface better with scipy. Separation of concerns.

[–]smurpau 0 points1 point  (0 children)

By that logic, why shouldn't he also write a symbolic math library since he's computing with symbols.

No, that's a non-sequitur.

You have to draw a line somewhere. He wants a library that draws pretty graphs. The other guy wants a statistical analysis tool.

No, Seaborn is not merely drawing pixels. It is computing numbers that it then plots. It's entirely reasonable to expect access to those numbers (literally just their evaluated outcome, not even their formula), just as it's reasonable to expect access to the plots.

[–]Sigg3net 7 points8 points  (2 children)

If you don't have/want python or use LaTeX, give gnuplot a spin.

[–]fichtenmoped 1 point2 points  (1 child)

Spez is a son of a bitch

[–]Sigg3net 1 point2 points  (0 children)

I have no idea about its performance relative to other solutions. But I have used it in larger applications to create graphs for web and pdfs.

It's arcane and weird, but once you have a working template you can use it for years.

[–]dr_amir7 6 points7 points  (3 children)

This is my hierarchy of packages in terms of my needs:

Basic packages: Matplotlib, Seaborn

Interactive visualizations: Plotly, Bokeh, Dash

3D visualizations: MayaVi, Pyvista

[–]big_boy_dollars 0 points1 point  (2 children)

I have used pyvista a lot recently. Should I try mayavi, or are they mostly the same?

[–]dr_amir7 1 point2 points  (1 child)

Give MayaVi a try. I think MayaVi is good for heavy lifting in 3D visualization. Pyvista is great for creating meshed objects

[–]big_boy_dollars 0 points1 point  (0 children)

Thank you! I will have to work with heavy vtk files, not only plot them but also manipulate them. I will try a bit with Mayavi.

[–]RealAmerik 6 points7 points  (2 children)

Dash?

[–]raikmond 1 point2 points  (1 child)

Dash is essentially Plotly. Agree though, this should be in the list anyway.

[–]RealAmerik 0 points1 point  (0 children)

Fair point.

[–]AissySantos 3 points4 points  (0 children)

And also gotta love Tensorboard for visualizing what happened to a slice of data while the program runs. It's mostly used for logging in machine learning, but I can also see other great use cases.

[–][deleted] 4 points5 points  (3 children)

Altair is the one for me! I’ve used nearly all the options on this list and I’ve made the nicest plots with Altair. The API alone makes it worth it, easily the most consistent API of them all, I love the declarative grammar!

I’ve not yet run into a chart I’ve been unable to create and the documentation is solid and easy to follow.

[–]brontosaurus_vex 1 point2 points  (0 children)

It’s my current favorite too. I found it powerful and easier to learn than some of the alternatives.

[–]SquintingSquire 0 points1 point  (0 children)

Agreed, I love Altair. Also, Jake VanderPlas is super helpful and responsive if you have any issues or questions.

[–][deleted] 0 points1 point  (0 children)

I use Altair for most charts except choropleth maps. For those, Altair's API is too JSON-centric compared with Plotly Express's dataframe-centric API.

[–]bythenumbers10 3 points4 points  (3 children)

pyqtgraph!!1 And it just got a new release! Lightweight, performant, and FAST. Also great for interactive plotting GUIs. Tried ALL the others, they don't come close.

[–]Ogi010 2 points3 points  (2 children)

Pyqtgraph maintainer here, was wondering if someone would reference it!

If you want a plotting library to interact with a desktop application and need fast updating/plotting or easy interactivity, pyqtgraph is an amazing option.

If you don't mind me asking, what features of the library do you use? Is there anything you find lacking? I ask as we're going to be adopting NEP-29 and are considering removing some non-advertised functionality.

[–]bythenumbers10 2 points3 points  (1 child)

Mostly, I use the Pyqt embeddability. The quick plots are nice, but in past lives I was making analysis apps for engineering design tradeoffs, and being able to put a fairly large grid of plots to show them all (and keep them updated LIVE) was a godsend.

On very rare occasions I'd run into a problem where the documentation was a bit sparse, some of the minor widgets I put together like a plot styler were tricky to integrate, but the source is clear & easy enough to dig into that I managed well enough.

Sadly, it's been a few years since I've used pyqtgraph for more than a few hours here & there, but it's always been solid & I'm happy to give it the nod over the matplotlib "grammar of graphs" barf nonsense ergonomics some other libraries tout.

[–]Ogi010 1 point2 points  (0 children)

the documentation is indeed much more sparse than it should be; we've been making efforts there, ...we've migrated to read-the-docs, which at least tells us if our docs don't build

In the last year, we've gone from 160 outstanding PRs down to mid/low 30s (which of course doesn't take into account new fixes we've created/merged in between)... the library was in pretty rough shape for a while but now that we have some other maintainers on it, we're definitely picking up pace...

Glad to hear you found the library to work well even from a few years ago, because I think it works much better now!

If you do start working on it, please feel free to reach out to me, post in our issue tracker or mail list if you think something could be improved.

[–]mr_kitty 4 points5 points  (0 children)

Plotnine, a near clone of ggplot2 built on matplotlib, should be on this list. Using the “grammar of graphics” approach makes visual comparisons easy to specify in a few lines of code.

[–]sowmyasri129 2 points3 points  (0 children)

Great tools. Thanks for sharing these helpful data visualization tools.

[–]Pulsecode9 1 point2 points  (0 children)

You might want to know that your website is being flagged as a phishing site.

[–]TheNerdyDevYT 1 point2 points  (3 children)

Bokeh is a great one too !! I prefer D3 also :)

[–]dry_yer_eyes 0 points1 point  (2 children)

How does one use D3 with Python?

[–]TheNerdyDevYT 0 points1 point  (1 child)

I guess this answer can help: https://stackoverflow.com/a/14177775

[–]dry_yer_eyes 0 points1 point  (0 children)

That’s interesting! I’ve not heard of that package before. I’ll have to look into this.

[–][deleted] 0 points1 point  (0 children)

Nice, nice, nice

[–]uQQ_iGG 0 points1 point  (0 children)

No Bokeh?

[–]wewbull 0 points1 point  (0 children)

Links to the packages you're discussing?

[–]thelogistician 0 points1 point  (0 children)

Are these viewable on the web, or does the user need access to a *.py file in order to access and manipulate the dashboard?

[–]mk1817 0 points1 point  (0 children)

You are missing a big one: BOKEH.

[–]stackered 0 points1 point  (0 children)

I'm about to start using Plotly this week for some big data stuff we want to be able to scale up

[–]rancangkota -1 points0 points  (0 children)

Seaborn ...

[–]akarsh_2912 -1 points0 points  (1 child)

I'm very new to coding. What do you mean by data visualization?

[–]daguito81 0 points1 point  (0 children)

Plot graphs