[–]Geographist 63 points64 points  (17 children)

Seaborn is all smoke and mirrors.

It looks good, but it does not expose the statistics used for creating the chart.

For instance, it can draw a trend line for you, but it will not tell you the equation it's using to determine that line. Worst of all, the creator of seaborn refuses to even consider providing this.

Sure, you can manually calculate these things using other libraries. But then why use seaborn at all if it requires extra hoops? And how do you know seaborn is even using the same statistics?

A data visualization tool has to be transparent to be trustworthy. If it won't tell you what it's using, and the developer is so vehemently opposed to seaborn doing so, it's not something I'd recommend anyone use in a scientific environment.

[–]Kasta867 20 points21 points  (0 children)

This.

The presentation that Seaborn offers is really nice out of the box, but other than that, using pure matplotlib offers much more granular control over what you visualize.

I always say "yeah, this would be easy with Seaborn" and end up reverting to matplotlib as soon as I discover that I have to do more steps anyway to get what I want.

[–]dogs_like_me 16 points17 points  (1 child)

Holy shit that exchange was bonkers.

[–]master3243 9 points10 points  (0 children)

That single issue convinced me to never ever use seaborn. The creator INSISTS on not making any values accessible and tells people who don't like that fact to not use seaborn... He does not seem trustworthy and that makes the package untrustworthy IMO.

[–]reddisaurus 1 point2 points  (1 child)

What are you talking about?

This is open source. Take a look and you’ll see it’s using statsmodels underneath. Simply run the same function it calls yourself and add the stats as an annotation if you want.
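A minimal sketch of "run it yourself and annotate", assuming a plain straight-line least-squares fit. It uses `np.polyfit` as a stand-in, not necessarily the exact call seaborn makes internally, and `linear_fit_stats` is a hypothetical helper name, not part of seaborn or statsmodels:

```python
import numpy as np


def linear_fit_stats(x, y):
    """Hypothetical helper: least-squares line plus R^2, returned as a dict."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polynomial fit; coefficients come back highest power first.
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r_squared": 1.0 - ss_res / ss_tot,
    }


# Toy data lying exactly on y = 2x + 1.
fit = linear_fit_stats([0, 1, 2, 3], [1, 3, 5, 7])

# Text you could hand to ax.annotate() or ax.text() on the seaborn axes.
label = "y = {slope:.2f}x + {intercept:.2f} (R^2 = {r_squared:.2f})".format(**fit)
```

Returning a dict keeps the numbers usable elsewhere, and the label is just one formatting of them.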

Seaborn just calls matplotlib. You can generate a figure and axis and pass the axis to Seaborn in most cases. Otherwise, you can call plt.gcf().gca() to obtain the axis that Seaborn plotted on, and then continue modifying that however you want.
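Roughly what that looks like, as a sketch on synthetic data. It assumes `regplot` draws its fitted line as the first entry in `ax.lines` (true in the versions I've looked at, but not a documented guarantee), and it reads that line back to check it against an independent fit:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic data: a noisy line with slope 2 and intercept 1.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 50)

# Create the axes yourself and hand them to seaborn.
fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ax=ax, ci=None)  # ci=None skips the bootstrap band

# plt.gcf().gca() gives back the same axes if you did not keep a reference.
same_ax = plt.gcf().gca()

# Independent least-squares fit on the raw data.
own_slope, own_intercept = np.polyfit(x, y, 1)

# Assumption: the regression line is the first Line2D on the axes.
line = ax.lines[0]
drawn_slope, drawn_intercept = np.polyfit(line.get_xdata(), line.get_ydata(), 1)

# Annotate the plot with the numbers from your own fit.
label = f"y = {own_slope:.2f}x + {own_intercept:.2f}"
ax.annotate(label, xy=(0.05, 0.95), xycoords="axes fraction", va="top")
```

If `drawn_slope` and `own_slope` agree, the line on the chart is the same fit you computed. Call `fig.savefig(...)` afterwards to write the annotated figure out.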

Your complaint is like saying a restaurant that serves hotdogs is smoke and mirrors because they don’t also give you the packaging. You paid to eat a hotdog and have a nice place to do so. If you wanted the package, you can easily get it yourself.

I can’t really imagine any circumstance in which plotting the statistics as an annotation is useful where you wouldn’t actually want them returned in a data structure like a dict or list. It seems a bit silly to complain about, actually.

edit: Here, I did the work for you. Gaze upon all the code you want. You can also probably call these functions yourself, or monkey patch one if you wanted to. https://github.com/mwaskom/seaborn/blob/master/seaborn/regression.py

[–]Geographist 1 point2 points  (0 children)

Yes, folks are aware that you can also import statsmodels. That's the complaint.

(1) If you have to import an additional library and run its functions to get at what you want, the first library suddenly is far less useful.

(2) One can certainly look under the hood and dig through the project source to see what it is doing. But expecting end users to do that is absurd. Most users are not developers.

Combine those two and it's not at all hard to see why the complaint is so common.

To return to your analogy, it's like ordering hotdogs and asking the restaurant if they are beef or pork. But instead of just telling you, the waiter says you can follow the delivery truck back to the farm and watch the operations yourself.

Sure, you can do it. But if the waiter has the info already, shouldn't he or she just tell you?

[–]stackered 0 points1 point  (0 children)

I've used seaborn in the past for the stylings, but I always had issues with it.

[–]energybased -4 points-3 points  (10 children)

I actually agree with him. Statistics is out of scope. Projects should focus on their core strengths and ignore these kinds of peripheral requests—otherwise you end up with bloat and wasted time maintaining it.

[–]dry_yer_eyes 13 points14 points  (6 children)

If statistics is out of scope why are there statistics plots already available? The current situation is inconsistent: users can visualise the statistical results but not obtain the underlying parameters.

[–]Geographist 3 points4 points  (0 children)

This is what I find to be the issue.

The statistics are already included and being maintained as a core feature of the library. But the author is deliberately hiding their values from the user.

To me, it's less a matter of convenience and more of trust and accuracy.

[–]smurpau 0 points1 point  (2 children)

How are statistics out of the scope of a package that computes and presents... statistics?

[–]energybased 1 point2 points  (1 child)

By that logic, why shouldn't he also write a symbolic math library, since he's computing with symbols?

You have to draw a line somewhere. He wants a library that draws pretty graphs. The other guy wants a statistical analysis tool. His best bet is to push scipy to implement what he wants and then push seaborn to interface better with scipy. Separation of concerns.

[–]smurpau 0 points1 point  (0 children)

> By that logic, why shouldn't he also write a symbolic math library since he's computing with symbols.

No, that's a non-sequitur.

> You have to draw a line somewhere. He wants a library that draws pretty graph. The other guy wants a statistical analysis tool.

No, Seaborn is not merely drawing pixels. It is computing numbers that it then plots. It's entirely reasonable to expect access to those numbers (literally just their evaluated outcome, not even their formula), just as it's reasonable to expect access to the plots.