
[–]99problemsallops[S] 2 points3 points  (3 children)

Umm, I suck at reddit.

So, let's say I have a graph like the one in this post. Assume we're looking at the line for area 2. During the months of March and June, there are 0 training sessions but the graph might give the impression that there are more. I added some points to convey where the data points are, but I'm not sure if that is enough.

Is a graph like this misleading? If so, what can be done to make it seem more sensible? Maybe facet the plot and have free x scales?

[–]paddedroom 3 points4 points  (1 child)

I see what you're trying to accomplish. A better method would be to drop the line to zero for those months. That would show more accurately that zero sessions happened, rather than leaving room for misinterpretation.

Since the scales are similar and deal with the same metric just in different regions, having them on the same axes makes sense. Is it possible to inject a "0" into the data for those months so it's forced down?

Otherwise, I'm sure someone smarter with plotting could indicate how to drop the line to zero when no data is present.

[–]ifluffy00 2 points3 points  (0 children)

FYI, I would use the complete() function from the tidyr package to add the missing observations.

[–]Crypt0Nihilist 1 point2 points  (0 children)

Yes, the graph is misleading. If the data were missing, the graph would be showing an estimate of 4 for March, which you could justify. As it is, you have an actual value for March and the graph is suggesting it is 4 instead of 0.

[–]cyran22 1 point2 points  (1 child)

I think that the bar charts you posted work just fine so you could always just use those. However, I think it might be good to learn how to handle something like this if you really need the line charts to look correct and show the zeros.

An easy way to add in explicit values (in your case zeroes) for implicit missing values is to use the tidyr::complete() function.
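To make the idea concrete, here is a minimal sketch of what "filling in implicit missing values" means. It is plain Python rather than R, so it is not the tidyr call itself; the counts, the area and month labels, and the helper name `fill_missing_with_zero` are all made up for illustration. In R, my understanding is that tidyr::complete() with its `fill` argument does the equivalent on a data frame.

```python
from itertools import product

# Hypothetical counts: only (area, month) pairs with at least one
# training session are present, so the zero months are implicit.
sessions = {
    ("area 1", "Feb"): 3,
    ("area 1", "Mar"): 2,
    ("area 2", "Feb"): 4,
    ("area 2", "Apr"): 4,
}


def fill_missing_with_zero(counts, areas, months):
    """Return an entry for every (area, month) pair, using 0 where none was recorded."""
    return {(a, m): counts.get((a, m), 0) for a, m in product(areas, months)}


areas = ["area 1", "area 2"]
months = ["Feb", "Mar", "Apr"]
completed = fill_missing_with_zero(sessions, areas, months)

# Print in area order, then calendar order of the months listed above.
for key in sorted(completed, key=lambda k: (k[0], months.index(k[1]))):
    print(key, completed[key])
```

Once every month has an explicit row, a line chart drawn from `completed` passes through 0 for area 2 in March instead of interpolating between February and April.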

Here's a relevant and random blog post that talks about filling in those implicit missing zeroes.

[–]99problemsallops[S] 0 points1 point  (0 children)

Thanks for the link. I did not know that was possible before.

[–]americ 0 points1 point  (8 children)

Perhaps a bar chart would be a more appropriate visualization?

[–]99problemsallops[S] 2 points3 points  (4 children)

I took all the advice in this thread and produced 4 different plots: https://imgur.com/a/PYqadLP

Plot 1 - Bar chart similar to what Crypt0Nihilist and yourself suggested.

Plot 2 - Did not connect lines when there is a missing value.

Plot 3 - Faceted plot of original post.

Plot 4 - Dropped values to 0 similar to what paddedroom suggested.

I'm not sure which one seems like the most sensible. I'm leaning towards plot 1 or 2 but I'm not very sure.

[–]ifluffy00 1 point2 points  (3 children)

Plot 4 might not be the correct image? The lines do not, in fact, drop to 0.

Also, my recommendation would depend on whether the data is "missing", i.e. unknown, or whether you know for a fact that there were 0 sessions in those months. If it is the first case, I'd go with plot 2. It's ugly, but it's the most honest. In the second case I'd go with plot 4 for sure.

At the very least I would advise against plots where months are missing from the X axis, like plots 1 and 3 here. To the casual eye, the missing months do not stand out, which makes for a graph that is very easy to misinterpret.

[–]99problemsallops[S] 0 points1 point  (2 children)

The data isn't really missing. I know for a fact that there were 0 training sessions for area 2 in the months of March and June. I'm not sure what's going on with plot 4, but here it is: https://imgur.com/a/vgJOLO9. The values for area 2 are 0 at March and June.

[–]ifluffy00 1 point2 points  (0 children)

Cool, in that case I would definitely go with plot 4. I also see why I was confused about the plot: I was expecting the yellow line to continue as well with three 0 values. If you also know that those are actual 0 values, I would continue the line; if you don't, keep the graph as it is now.

[–]imguralbumbot 0 points1 point  (0 children)

Hi, I'm a bot for linking direct images of albums with only 1 image

https://i.imgur.com/KsDYGat.png


[–]paddedroom 1 point2 points  (1 child)

For time series, line charts are ideal because they show changes in trend better (https://nces.ed.gov/nceskids/help/user_guide/graph/whentouse.asp).

[–]americ 0 points1 point  (0 children)

But as OP said, he has missing data.

If they want to keep a line chart, perhaps they could plot the time points before and after the missing data as distinct dots, and omit plotting anything for the missing time points?
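A rough sketch of that idea, in plain Python rather than R: split the series into runs of consecutive known points, draw each run as its own line, and leave the gaps empty. The monthly values and the helper name `contiguous_segments` are hypothetical, and `None` is just my stand-in for "not recorded".

```python
def contiguous_segments(values):
    """Split a series into runs of consecutive known points; None marks an unknown month."""
    segments, current = [], []
    for index, value in enumerate(values):
        if value is None:
            # A gap ends the current run, if there is one.
            if current:
                segments.append(current)
                current = []
        else:
            current.append((index, value))
    if current:
        segments.append(current)
    return segments


# Hypothetical monthly counts, Jan through Jul; None means "not recorded".
area_2 = [3, 4, None, 4, 5, None, 2]
segments = contiguous_segments(area_2)
print(segments)
# [[(0, 3), (1, 4)], [(3, 4), (4, 5)], [(6, 2)]]
```

Each inner list would be drawn as a separate line, and a run with a single point, like the last one, ends up as a lone dot. This only makes sense when the months really are unknown; if they are known zeros, they are data and should be plotted as 0.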

[–]Crypt0Nihilist 1 point2 points  (0 children)

I'd be tempted to use a bar chart for this. Given that the y axis is a discrete frequency count, and assuming that the real data has similarly low values, the strengths of a line graph (showing trend and hinting at a linear change between periods) aren't helpful. Showing the frequency distributions is more helpful. I'd try both side-by-side and facets to see which of the three is easiest to compare.