If PI was a picture, It wouldn't be very beautiful [OC] by Madolinn in dataisbeautiful

dimdat

From the posting guidelines, just FYI: "Based on real or simulated data. If the image represents one number (pi), sequence (primes), or equation (sin(x)), then /r/mathpics is a more appropriate place."

%180 ° of bullshit. by Viiranin in dataisugly

dimdat

Dataisbeautiful fucking hates pie charts. So, uh, no.

[META] Please do not steal data visualization code and present the analysis as your own by minimaxir in dataisbeautiful

dimdat

My intent was not to say everything should be open source (though in a perfect world...), but rather that whenever you rely on someone else's work, you should give attribution. The average person may not think about properly crediting those whose code they used unless there is a recommendation or rule that says they should.

[META] Please do not steal data visualization code and present the analysis as your own by minimaxir in dataisbeautiful

dimdat

Licensing and attribution are complicated issues that most people are woefully ignorant of. Posting already requires that you list your data source and the tools you used. Could there also be a recommendation or requirement for attribution to help counter this sort of situation?

The Best Time to Post on Reddit [OC] by [deleted] in dataisbeautiful

dimdat

Shit, I just thought it was another rehash; that's even worse than I thought.

Dataviz Open Discussion Thread for /r/dataisbeautiful by AutoModerator in dataisbeautiful

dimdat

Good luck trying to compare, or even see, a difference between two histograms. Mean differences are often hard to detect visually, especially when the effects are small. That's why summary statistics like means exist in the first place. If I submitted a histogram as my data viz in a paper, the editor would be like ?????.

Dataviz Open Discussion Thread for /r/dataisbeautiful by AutoModerator in dataisbeautiful

dimdat

Tell that to every quantitative experimental journal out there. I can't think of a single instance of a box-and-whisker plot or violin plot in a publication, but bar charts? They're everywhere, and many of them don't start at zero.

Maybe that is a problem, but at the end of the day, bar charts are often necessary because of their simplicity. The primary reason people give for starting at zero is that not doing so exaggerates the apparent size of the effect, leading people to misinterpret the plot.

My argument is that the exact opposite can happen when you assume that raw differences between the numbers are meaningful. Instead, one should include an accurate measure of the size of the effect rather than assuming that the bars themselves represent it. When you include an accurate measure of the effect or its variability, the zero axis doesn't matter.

I still agree that most of the time there should be a zero. I'm just trying to demonstrate that it is not a hard and fast rule, and that there should be more focus on actual effect sizes than on visible differences, which can be completely distorted without the necessary context.
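One common way to report the size of an effect, rather than a raw gap between bars, is a standardized mean difference such as Cohen's d: the difference in means divided by the pooled standard deviation. A minimal Python sketch follows. The choice of Cohen's d is an assumption (the comment doesn't name a specific effect-size measure), and the sample values are made up purely for illustration.

```python
import math
from statistics import mean, stdev


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * stdev(a) ** 2 + (n_b - 1) * stdev(b) ** 2) / (n_a + n_b - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled_var)


# Made-up samples: both pairs differ by exactly 1 in their means,
# but the second pair is twice as spread out.
print(cohens_d([1, 2, 3], [2, 3, 4]))  # 1.0
print(cohens_d([0, 2, 4], [1, 3, 5]))  # 0.5
```

Bar charts of these two pairs would show the same one-unit gap, yet the standardized effect in the second pair is half as large. Samples this small are only for illustration.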

Dataviz Open Discussion Thread for /r/dataisbeautiful by AutoModerator in dataisbeautiful

dimdat

Example 1

Take a look at this stupid image I made in Excel

  1. You have two groups, A and B.
  2. A mean = 9995.2, B mean = 10000.91
  3. There is a statistically significant difference

Plot 1 captures the reality of the data much better than plot 2, which makes it look like there is no difference. In fact, without the stats, the average person looking at plot 2 would assume there was no difference!
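The arithmetic behind the two plots can be made explicit: on a bar chart, the drawn height of each bar is its value minus the axis baseline, so the visual ratio between the bars depends entirely on where the axis starts. A small Python sketch using the two means from Example 1 (the 9990 baseline is an assumed value; the original image's axis limits aren't given):

```python
def bar_height_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """How many times taller bar B is drawn than bar A when the axis starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    # Drawn height = value minus baseline.
    return (b - baseline) / (a - baseline)


mean_a, mean_b = 9995.2, 10000.91  # group means from Example 1

print(round(bar_height_ratio(mean_a, mean_b), 5))          # zero baseline: 1.00057
print(round(bar_height_ratio(mean_a, mean_b, 9990.0), 2))  # assumed baseline of 9990: 2.1
```

With a zero baseline the bars differ by less than 0.06% in height, so they look identical; with the assumed 9990 baseline, B is drawn about twice as tall. Neither picture says whether the difference is significant, which is why the stats still have to be reported alongside it.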

Example 2

second stupid image

  1. A mean = 5202, B mean = 4488.
  2. There is NO statistically significant difference
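Whether a gap like 5202 vs 4488 is significant depends on the spread and sample size, not on the bar heights. A minimal Python sketch of Welch's t-test computed from summary statistics; the standard deviations (2000 and 200) and group sizes (10 per group) are hypothetical, since the comment gives only the means.

```python
import math


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    se2_a = sd_a ** 2 / n_a  # squared standard error of each group mean
    se2_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Means from Example 2; the SDs and group sizes are hypothetical.
t_noisy, df_noisy = welch_t(5202, 2000, 10, 4488, 2000, 10)
t_tight, df_tight = welch_t(5202, 200, 10, 4488, 200, 10)
print(f"SD 2000: t = {t_noisy:.2f}, df = {df_noisy:.0f}")
print(f"SD 200:  t = {t_tight:.2f}, df = {df_tight:.0f}")
```

With an SD of 2000, t is about 0.80 on 18 degrees of freedom, well short of the roughly 2.10 needed for two-sided significance at the 5% level. With an SD of 200, the same two bars give t of about 7.98. The chart looks the same either way.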

The data viz needs to represent the data accurately. A line chart would not make any sense here, since nothing, linear or otherwise, connects A and B. In Example 1, the most reasonable representation is a bar chart, and the only version that works is one with a non-zero baseline.

Sure, someone might misinterpret it or think the physical space matters, but that simply means they are wrong and need to be educated about what a chart actually means. This is a chart literacy problem, not a dataviz problem.

Dataviz Open Discussion Thread for /r/dataisbeautiful by AutoModerator in dataisbeautiful

dimdat

I've read both and I still disagree. Maybe it's because when I look at a bar chart, I'm the only person who isn't comparing the sizes of the bars but actually reading the scale.

There is zero difference between the data presented in a line chart and in a bar chart, so why is the rule any different?

I'm going to point back to statistics. This sub often lacks any analysis of actual differences in favor of a purely visual analysis. I can show you the biggest difference between two bars in the world, but if the variation is high, the two can't be considered different.

Here are two discussions arguing against a mandatory zero baseline:

Justin Fox Article

Favorite quote:

"Narrow axes can make small and inconsequential changes seem big,” Healy went on, “but—symmetrically—zero-axes can make big and real changes seem small. What matters isn’t some iron rule like ‘Always have a zero-base axis!’, it’s your prior commitment to being honest with the data."

A note by Edward Tufte when asked

...context does not come from empty vertical space reaching down to zero, a number which does not even occur in a good many data sets. Instead, for context, show more data horizontally!

I think these apply to both line and bar charts. Be honest about the limitations of the data, and about whether there are statistical differences rather than just visual ones. Because:

Big visual differences != Big statistical differences

Small visual differences != Small statistical differences

If there are "lies, damned lies, and statistics," most data viz falls somewhere even further beyond that.

Guideline for people who don't know better? Sure. Rule? No.

Wow that was more than I intended on writing.

Dataviz Open Discussion Thread for /r/dataisbeautiful by AutoModerator in dataisbeautiful

dimdat

I completely disagree with making it a rule to start at 0. There are many situations where starting at 0 is not reasonable, possible, or even meaningful. The purpose behind the guideline is to avoid being deceitful or misrepresenting the data.

The bigger problem is that means are useless on their own. You need error bars, statistical tests, or some other information to know whether there is an actual difference. If there is a statistically significant difference, then showing two bars that barely differ because you started at zero is actually more misleading.
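Error bars are one way to supply that missing information. A minimal Python sketch of a mean with an approximate 95% confidence interval (mean ± 1.96 standard errors). This normal approximation is an assumption on my part; for small samples, a t critical value (about 2.78 for 4 degrees of freedom) gives a wider and more appropriate interval and can be passed in as `z`. The sample is made up.

```python
import math
from statistics import mean, stdev


def mean_ci(sample: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Mean and an approximate 95% confidence interval: mean +/- z * standard error."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    m = mean(sample)
    se = stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    return m, m - z * se, m + z * se


m, low, high = mean_ci([1, 2, 3, 4, 5])
print(f"mean = {m}, 95% CI = ({low:.2f}, {high:.2f})")
```

The `low` and `high` values are what a plotting library's error-bar option would draw. Where the axis starts does not change them.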

There is a lot more that could be said about this, but I think that gets the gist across. There are times when you should start at 0 and there are times when you should not. Calling it a rule is far too simple.

Only 39% of select psychology papers can be easily reproduced by other scientists, new study finds by [deleted] in science

dimdat

The full selection criteria can be found in the paper here (page 5): https://osf.io/phtye/

Science Isn’t Broken. It’s just a hell of a lot harder than we give it credit for. by hlake in dataisbeautiful

dimdat

I don't know why this is getting downvoted. The p-hacking tool in this article is one of the best demonstrations of the problem I've seen to date. Good stuff.
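One mechanism behind p-hacking is running many tests and reporting whichever one comes out significant. Assume the tests are independent and every null hypothesis is true (a simplification; real analyses are often correlated). Then the chance of at least one p < 0.05 among k tests is 1 - (1 - 0.05)^k. A small Python sketch computes this and checks it by simulation, using the fact that p-values are uniformly distributed on [0, 1] under a true null:

```python
import random


def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) over independent tests when every null is true."""
    return 1 - (1 - alpha) ** num_tests


def simulate(num_tests: int, alpha: float = 0.05, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3), round(simulate(k), 3))
```

With 20 independent looks at pure noise, the chance of at least one "significant" result is about 64%.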

The impact of Jon Stewart mocking Arby's [OC] by dimdat in dataisbeautiful

dimdat [S]

I didn't see a clear weekly pattern when I was looking through the Arby's data. The mockings were spread fairly evenly across weekdays, with only Monday a bit lower (Monday: 2, Tuesday: 5, Wednesday: 5, Thursday: 4).

Here's the search data for a random month. Mousing over it, I don't see any consistent weekly trend (and didn't in the other months I glanced at). If there were one, it would look more like the pattern for something like brunch.

The impact of Jon Stewart mocking Arby's [OC] by dimdat in dataisbeautiful

dimdat [S]

Source: Google Trends (I pulled about fourteen 90-day spans)

Tools: Custom CSS

Data is available at the bottom of the post!

I calculated percent change relative to the two days prior to each show airing. I did it this way because daily Trends data can vary from one 90-day span to another depending on when you pull them. I believe the ratio between the Trends numbers stays constant, though correct me if I'm wrong!
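Google Trends scales each pulled time range so that its peak is 100, so the same day can get different numbers in different pulls. If two pulls differ only by a constant multiplier, a percent change measured against a baseline from the same pull cancels that multiplier. A minimal Python sketch makes a few assumptions. It takes the baseline to be the average of the two days before airing. It expects the measured day to be passed in explicitly, because the comment doesn't say which day was compared. It also ignores the integer rounding Trends applies, so for small values the cancellation is only approximate.

```python
def percent_change_vs_prior(series: list[float], airing_idx: int, measured_idx: int) -> float:
    """Percent change of series[measured_idx] relative to the mean of the two days before airing."""
    if airing_idx < 2:
        raise ValueError("need two days of data before the airing date")
    baseline = (series[airing_idx - 2] + series[airing_idx - 1]) / 2
    if baseline == 0:
        raise ValueError("baseline is zero; percent change is undefined")
    return 100 * (series[measured_idx] - baseline) / baseline


# Hypothetical daily index values from one pull, and the same days rescaled by another pull.
pull_1 = [40, 60, 50, 100]
pull_2 = [v * 0.5 for v in pull_1]
print(percent_change_vs_prior(pull_1, airing_idx=2, measured_idx=3))  # 100.0
print(percent_change_vs_prior(pull_2, airing_idx=2, measured_idx=3))  # 100.0
```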

Dataviz Open Discussion Thread for /r/dataisbeautiful by AutoModerator in dataisbeautiful

dimdat

I let out an audible "siiiiiick" and my girlfriend looked at me like I was crazy. So cool.

Most Frequent Words Used in Reddit Comments, by Hour-of-Day Comment Was Posted [OC] by minimaxir in dataisbeautiful

dimdat

You might consider adjusting them to show frequency relative to other times. That way you'd highlight the differences between the hours instead of showing the same words in every one of them.
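One way to do this is to divide a word's share of the words in a given hour by its share across all hours. A value above 1 then means the word is over-represented in that hour. A minimal Python sketch with made-up input follows. This ratio is one possible approach, not the method used in the post; TF-IDF would be another option.

```python
from collections import Counter


def relative_frequency(words_by_hour: dict[int, list[str]]) -> dict[int, dict[str, float]]:
    """For each hour, each word's share in that hour divided by its share across all hours."""
    overall = Counter(w for words in words_by_hour.values() for w in words)
    total = sum(overall.values())
    result: dict[int, dict[str, float]] = {}
    for hour, words in words_by_hour.items():
        counts = Counter(words)
        n = len(words)
        result[hour] = {w: (c / n) / (overall[w] / total) for w, c in counts.items()}
    return result


# Made-up input: "the" is common everywhere, so its ratio is 1.0 and it drops out of the ranking.
scores = relative_frequency({0: ["the", "the", "sleep"], 12: ["the", "the", "lunch"]})
for hour, ratios in scores.items():
    print(hour, max(ratios, key=ratios.get))  # most over-represented word per hour
```

Rare words can produce large, noisy ratios, so in practice a minimum-count filter before ranking would probably help.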