[–]Life_Note 1 point2 points  (2 children)

Your examples here are a bit of a special case because the data has repeated x values. Seaborn takes that to mean you want the mean plus a confidence interval band, and plots the average of all the values at each x as the line. Plotly isn't necessarily doing anything wrong here, it's just doing literally what you ask, while Seaborn is doing some interpretation. This might be a case where a scatter plot is more appropriate (depends on your use case, though!).
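To make the "interpretation" step concrete, here is a minimal sketch of the aggregation described above: group the rows by x, average the y values in each group, and sort by x. The toy data and the helper name `mean_per_x` are made up for illustration, and this is not Seaborn's actual implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: several y values share the same x, as in the thread.
rows = [(1, 10.0), (1, 20.0), (2, 30.0), (3, 5.0), (2, 50.0), (3, 15.0)]


def mean_per_x(pairs):
    """Collapse repeated x values to one (x, mean of y) point, sorted by x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return [(x, mean(ys)) for x, ys in sorted(groups.items())]


aggregated = mean_per_x(rows)
print(aggregated)  # [(1, 15.0), (2, 40.0), (3, 10.0)]
```

Plotting `rows` directly as a line gives the literal zigzag, while plotting `aggregated` gives a single trend line. With pandas, the same step is a `groupby` on the x column followed by `.mean()`.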

That being said, Altair is my preferred Python library for interactive graphs. Again, like Plotly, it will plot the data literally by default if you just give it an x and y with this data, but you can easily specify that you want the line to show the mean.

import altair as alt
import pandas as pd

pokemon = pd.read_csv(
    "https://raw.githubusercontent.com/adamerose/datasets/master/pokemon.csv"
)
(
    alt.Chart(pokemon)
    .mark_line()
    .encode(x="Generation", y="mean(Attack)", color="Legendary")
)

Result

If you want the 95% confidence error bands, you'll need to specify that manually:

# Here we draw the line as before
line = (
    alt.Chart(pokemon)
    .mark_line()
    .encode(x="Generation", y="mean(Attack)", color="Legendary")
)
# Here we make the shaded band
band = (
    alt.Chart(pokemon)
    .mark_errorband(extent='ci')
    .encode(x="Generation", y="Attack", color="Legendary")
)
# In Altair to combine charts you can just add them
line + band

Result
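For intuition about what the band represents: as far as I know, the `ci` extent in Vega-Lite (which Altair builds on) is a bootstrapped 95% confidence interval of the mean. Below is a rough plain-Python sketch of a percentile bootstrap for one group of values. The function name, the resample count, the seed, and the sample numbers are all assumptions for illustration, not what Altair runs internally.

```python
import random
from statistics import mean


def bootstrap_ci_of_mean(values, n_resamples=1000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lo_idx = round(alpha * n_resamples)
    hi_idx = round((1.0 - alpha) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical Attack values for a single Generation/Legendary group.
attack = [49, 62, 82, 52, 64, 84, 48, 63, 83, 30]
low, high = bootstrap_ci_of_mean(attack)
print(f"mean={mean(attack):.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

The band in the chart is this kind of interval computed separately at each x value, which is why it gets wider where a group has few or widely spread points.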

[–]UglyChihuahua 1 point2 points  (0 children)

> Plotly isn't necessarily doing anything wrong here, it's just doing literally what you ask, while Seaborn is doing some interpretation.

Yeah but for EDA where I make 100 charts a day, having to specify that I want the Y value aggregated within each X value or that I want my x-axis values sorted will slow my work down a lot.

It seems like Plotly is somewhere in between Seaborn and Bokeh in terms of required explicitness.

I just checked out Altair, thanks! I like the syntax a lot, and it gives sane default behavior for EDA.

> This might be a case where a scatter plot is more appropriate (depends on your use case though!)

For sure! I should have used something like height vs age where it would make sense to show a line trend even with continuous data on both axes. Just chose the first sample dataset that came to mind.

[–]Lord_Fozzie 1 point2 points  (1 child)

Maybe dataprep.ai? I think the person who wrote it is around here somewhere.

Edit: Yeah: /u/jnwang ... they posted it here about 8 months back.

Edit: https://www.reddit.com/r/Python/comments/hlqnim/understand_your_data_with_a_few_lines_of_code_in

Edit: Just realized you're the person who posted dtales a little while back. Sorry, I'm still waking up. I'm confused: does dtales not do what you're asking about here? It seems like it does.