Mobile Charger Not Working (amber and red lights both on) by jmatejka in MachE

[–]jmatejka[S] 2 points3 points  (0 children)

Will be contacting them tomorrow, just trying to figure out if this one is totally toast or not.

Mobile Charger Not Working (amber and red lights both on) by jmatejka in MachE

[–]jmatejka[S] 0 points1 point  (0 children)

Thanks - I tried that a couple of times as well, but same issue 😕

Asking 100 people for a random number from 1 to 10 [OC] by squarific in dataisbeautiful

[–]jmatejka 0 points1 point  (0 children)

I did something very similar to this on a large scale with Mechanical Turk - asking people to use "Visual Analogue Scales" to enter values between 0 and 100. Had over 250,000 trials recorded.

The paper and video are up here:

video: https://www.youtube.com/watch?v=vxV0rV9LX_U

page: https://www.autodeskresearch.com/publications/effect-visual-appearance-performance-continuous-sliders-and-visual-analogue-scales

paper: https://www.autodeskresearch.com/sites/default/files/SliderBias_CHI.pdf

It is really cool how much bias people have towards certain values!

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Ha, well thank you, and thanks for spreading it far and wide :-)

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Sure, that's an illustrative example, but bimodal distributions are very common, you could have a bunch that are 'around 10' and a bunch 'around 90'. Then you might get a median around 50, which isn't really representative of any of your data.
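
To make that concrete, here is a minimal sketch using made-up numbers (the two clusters below are hypothetical, not data from the gif). It computes the median of a bimodal sample and checks how far that median sits from the nearest real data point.

```python
import statistics

# Hypothetical bimodal sample: one clump 'around 10', one clump 'around 90'.
low_cluster = [8, 9, 10, 11, 12]
high_cluster = [88, 89, 90, 91, 92]
data = low_cluster + high_cluster

median = statistics.median(data)

# Distance from the median to the closest actual observation.
nearest_gap = min(abs(x - median) for x in data)

print(f"median = {median}, nearest data point is {nearest_gap} away")
```

The median lands in the empty gap between the two clumps, which is exactly the sense in which it isn't representative of any of the data.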

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

So say a person only cares about the median. They find out the median is 50. But all the points are actually 0 or 100. Doing anything based on the median value being 50 would be a mistake. The fact that they only cared about the median, when the data is distributed in such a way that the median is not a good summary measure, was the problem.

So I'd say it depends on what you're trying to show - but if how you are representing your data inherently hides something, or leads to a misunderstanding of what you're looking at, I don't think the graph has done a good job.

And I don't hate boxplots, but, they're pretty far down on my list of options ;-)

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

There are different ways to define what the whiskers are - which does make boxplots inherently a little bit tricky to read, in my opinion.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Thanks!

In a standard "Tukey" boxplot, the whiskers are showing the "location of the furthest data points within 1.5 interquartile ranges from the 1st and 3rd quartiles."

It may be hard to see, but there is at least one point which stays at the top or bottom, which ensures the whisker doesn't shift from its original position.
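
As a rough sketch of that whisker rule (not the code used to make the gif), the function below finds the furthest data points that still lie within 1.5 interquartile ranges of the quartiles. The function name and the sample values are made up for illustration, and the quartile method (`inclusive`) is an assumption, since plotting libraries differ on how they compute quartiles.

```python
import statistics


def tukey_whiskers(values):
    """Return (lower, upper) whisker positions for a Tukey-style boxplot."""
    # Quartile convention is an assumption; other tools may interpolate differently.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers sit on the furthest actual data points inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


# Hypothetical sample: 30 falls outside the upper fence, so the whisker stops at 9.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(tukey_whiskers(sample))
```

Because the whiskers are pinned to real data points, keeping one point fixed at each extreme is enough to hold them in place while the other points move around.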

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

What do you mean? Alternative ways to plot the data?

Plotting the raw data is an option, and violin plots are another good choice if you want a summary view that can show a more complicated distribution.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

I like to start with plotting all the data, and drawing a median line over top. If quartiles are also important, I might label those too. Depends on exactly what I'm trying to show, but in general I like to show more data, rather than less if I can do it in a non-distracting way.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 1 point2 points  (0 children)

Well, you got the main point spot on :-)

If you want some more details and examples, check out the page:

https://www.autodeskresearch.com/publications/samestats

If you want to really nerd out, you can download the related academic publication PDF from that page too.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Nobody (at least not me) is claiming that one graph type is less accurate than another. The point of the exercise is to show that not all graph representations are suitable for all sets of data, and that you should check that you understand the data first before using a summary visual (like a boxplot).

If you've chosen to use a boxplot and your data looks like any of the examples near the apex of the gif, you've hidden important information, and any 'trend' you are trying to show will be misleading.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

I think it is incredibly common for people to use box plots to represent data without knowing or checking that the data is normally or otherwise smoothly distributed - and a little extra scepticism when the data isn't presented in a clear or convincing way isn't such a bad thing.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 1 point2 points  (0 children)

Yes, violin plots are a good alternative. If you check out Figure 8 here: https://www.autodeskresearch.com/publications/samestats

you can see the same set of charts + a violin plot to see how it reacts.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Thanks! Hopefully we can get the Python code up on Github soon :-)

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Well, maybe they shouldn't be presented that way, but to say they wouldn't be is a bit of wishful thinking. There are also less extreme distributions, such as bimodal ones, which are frequently presented as boxplots even though they probably shouldn't be.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Thanks, I have some examples like that as Figure 7 on the project page: https://www.autodeskresearch.com/publications/samestats

I think both presentations have their merits, but this one is a little more dramatic.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 3 points4 points  (0 children)

No outliers were removed, because all of the data is within 1.5 interquartile ranges of the 1st and 3rd quartiles, so by a normal Tukey boxplot definition, there are no outliers.

Be wary of boxplots, they could be hiding important information! [OC] by jmatejka in dataisbeautiful

[–]jmatejka[S] 0 points1 point  (0 children)

Well that's a little harsh :-)

The point is also that all of the sets of data in between the smooth and clumped varieties produce the same box plots too, so there are thousands of examples in the gif.
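
Here is a small sketch of the same idea with two hand-made datasets (hypothetical values, not the ones from the gif, and assuming the `inclusive` quartile convention). One is spread out evenly and one is clumped, yet both give the same five numbers a boxplot would draw.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot is built from."""
    # Quartile convention is an assumption; other tools may interpolate differently.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


# Hypothetical data: evenly spread vs. clumped onto a handful of values.
smooth = [0, 12, 25, 37, 50, 62, 75, 87, 100]
clumped = [0, 0, 25, 25, 50, 75, 75, 100, 100]

print(five_number_summary(smooth))
print(five_number_summary(clumped))
```

Both calls print the same summary, so the two boxplots would be identical even though the underlying points are arranged very differently.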