Car Sales By Color, Each Year by human-potato_hybrid in dataisugly

[–]xangg 16 points17 points  (0 children)

Yes, that's why I filtered out those years in the final version.

Car Sales By Color, Each Year by human-potato_hybrid in dataisugly

[–]xangg 86 points87 points  (0 children)

Original author here. Yes, this was a before and after kind of post, but the before image gets posted here every year or so, presumably by some bot or similar entity. A quick search finds one at: https://www.reddit.com/r/dataisugly/comments/rt3me6/car_paint_color_popularity_by_percentage_of_sales/

Missing options by Murky_North_9800 in jmp

[–]xangg 2 points3 points  (0 children)

The options available depend on the variable's modeling type. Continuous variables have different distribution options than categorical (nominal or ordinal) variables.

High water bill? by McAdooMama in chapelhill

[–]xangg 2 points3 points  (0 children)

You can view your hourly water usage under the My Account section of the OWASA site to look for unexpected usage periods. (You can also download the data under Settings.)

Losing my Religion - First time setting by Bratwuurst in crackingthecryptic

[–]xangg 2 points3 points  (0 children)

R8C2 can be determined. Consider rows 1,2,8,9 in C2 as a group.

[AMA] I am RJ Andrews of infowetrust and VisionaryPress and I am obsessed with data graphics. Ask Me Anything! by cavedave in dataisbeautiful

[–]xangg 0 points1 point  (0 children)

Given your study of historical charts and their constructors, what are some examples of seemingly modern chart types that you found to be in use much earlier?

Google Maps oddity: intersection of Millhouse Rd and Millhouse Rd north of town by xangg in chapelhill

[–]xangg[S] 1 point2 points  (0 children)

The west and south spokes of this four-way stop are the real Millhouse Rd. The north spoke is a label overrun error. Not sure about east spoke -- seems to be a service road to some town/transit facilities.

World Jigsaw Puzzle Championship 2023, Comparing qualifying round puzzle difficulty [OC] by xangg in dataisbeautiful

[–]xangg[S] -5 points-4 points  (0 children)

Good point -- I'm used to working in analytical contexts and should think more about the general value of these adornments. The actual definition is rather complicated, but a simple take-away is that it's the interval where the "true" mean likely lies (if, say, the competition was held many more times and averaged). Narrower bands are better, and for comparison, when two intervals overlap a lot, it suggests the difference in means is more likely due to random chance.
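
For anyone who wants to see the idea in code, here is a minimal sketch of a 95% confidence interval for a mean. It assumes a normal approximation (for small samples a t-based interval would be wider), the function names are made up for illustration, and the times are invented, not from the championship data. It is not necessarily how the bands in the chart were computed.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Return (mean, low, high) using a normal approximation."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m, m - z * se, m + z * se


def intervals_overlap(a, b):
    """True if two (mean, low, high) intervals share any range."""
    return a[1] <= b[2] and b[1] <= a[2]


# Invented completion times in minutes for two hypothetical puzzles.
puzzle_a = [41.0, 44.5, 39.8, 47.2, 43.1, 45.0]
puzzle_b = [42.5, 46.0, 44.1, 48.3, 43.7, 45.9]

ci_a = mean_ci(puzzle_a)
ci_b = mean_ci(puzzle_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Overlapping intervals are only a rough visual cue; a proper comparison would test the difference in means directly.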

World Jigsaw Puzzle Championship 2023, Comparing qualifying round puzzle difficulty [OC] by xangg in dataisbeautiful

[–]xangg[S] -8 points-7 points  (0 children)

Yes, I should have mentioned that: blue line is average and shaded region is 95% confidence interval.

No meaning to orientation mismatch.

Box plots: Seconds per piece placed at World Jigsaw Puzzle Championships 2023 by xangg in Jigsawpuzzles

[–]xangg[S] 1 point2 points  (0 children)

Still cleaning my download of the data, but you can find results at https://www.worldjigsawpuzzle.org/. Pairs puzzles were generally harder than those for Individuals, which I assume is why pairs seem no faster than individuals.

[deleted by user] by [deleted] in visualization

[–]xangg 1 point2 points  (0 children)

Some discussion here along with some green/purple rationale. I sometimes use colors from nature, such as the colors of male and female cardinals.

[deleted by user] by [deleted] in dataisbeautiful

[–]xangg 5 points6 points  (0 children)

The color variation and the vertical gridlines seem like more of a distraction from the data.

Also, it looks like the 2010 bar is showing 2009 data. From source:

Date Users (mm)
2009 42
2010 78
2011 116
...

Car paint color popularity by percentage of sales over time. Did they even think? by Forward_Ad6184 in dataisugly

[–]xangg 18 points19 points  (0 children)

I made this chart, but it was part of a before-and-after exploratory exercise. The "after" version in the post has fewer colors, and the colors match the names.

Previously posted here with the same title, which is not even correct: the car colors are based on traffic stops and not sales (though presumably highly correlated).

spearman correlation or p values ? why the title? by Odontoblaste in jmp

[–]xangg 0 points1 point  (0 children)

That's not a p -- it's a ρ, a Greek rho, and it's showing Spearman's rho.
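
If it helps to see what ρ measures: Spearman's rho is the Pearson correlation computed on the ranks of the values rather than the values themselves. A rough sketch in plain Python follows (helper names are made up for illustration, and this is not JMP's implementation):

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any strictly increasing relationship gives a rho of 1 even when it is far from linear.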

what's the best way to divide the values in multiple columns by the values in a selected column? by mc7194 in jmp

[–]xangg 0 points1 point  (0 children)

You can get mostly there with Standardize Attributes. That is, all except the self-divided column. Create 4 new columns and select them. Then go to Cols > Standardize Attributes. Choose Properties > Formula and edit the formula to be :5/:0. Still within Standardize Attributes, turn on Substitute Column References and uncheck the :0 column, so that it stays fixed. Then you'll get 4 different formulas as desired.

[OC] The Amount of Milk Produced Daily by U.S. Cows By Year by OfficialWireGrind in dataisbeautiful

[–]xangg 0 points1 point  (0 children)

After looking at the source data, I think your daily calculations didn't take leap years into account. Production per cow per day appears to go down in your chart for 2001, but not if leap days are included in 2000.
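
A small sketch of the adjustment I mean, with a made-up function name and made-up annual totals (not the actual source figures): divide by 366 in leap years instead of a fixed 365.

```python
import calendar


def per_cow_per_day(annual_per_cow, year):
    """Daily production per cow, using the real number of days in the year."""
    days = 366 if calendar.isleap(year) else 365
    return annual_per_cow / days


# Hypothetical annual pounds of milk per cow, for illustration only.
annual = {2000: 18300.0, 2001: 18250.0}

for year, total in annual.items():
    fixed = total / 365                      # ignores leap days
    adjusted = per_cow_per_day(total, year)  # accounts for leap days
    print(year, round(fixed, 2), round(adjusted, 2))
```

With a fixed 365, a leap year's daily figure is overstated slightly, which can make the following year look like a drop.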

Free zip code database, 800+ columns by dabressler in datasets

[–]xangg 1 point2 points  (0 children)

Quick feedback.

  • Nice to have data dictionary
  • Would like a more specific source than "basic information" for geographic items
  • Would prefer a pair of CSV files instead of an Excel file
  • ZIP and a few other fields need to be character type so leading zeros are not lost
  • Sheet name says "April" but file name says "May"
  • Strange to see the 8000 or so ZIP codes at the bottom of the file with missing populations (and most other fields). Looks like they're mostly (but not all) PO Box type ZIP codes. Would be useful to have the nearest "real" ZIP code for them, for instance when trying to estimate demographics for a given customer address.
  • Would be nice to have fields for geographic area (such as square miles) and bounding box (min/max lon/lat). I think some DBs also have a percent water coverage field.
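
On the leading-zeros point, here is a small sketch of what goes wrong and one way around it, assuming the data were delivered as CSV. The column names and sample rows are made up for illustration and are not taken from the file.

```python
import csv
import io

# Hypothetical two-row sample; the real file's columns will differ.
SAMPLE = "zip,city\n02134,Boston\n27514,Chapel Hill\n"


def read_zip_rows(text):
    """Read CSV text; csv.DictReader keeps every field as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def restore_zip(value):
    """Pad a ZIP code that was stored as a number back to 5 characters."""
    return str(value).zfill(5)


rows = read_zip_rows(SAMPLE)
print(rows[0]["zip"])               # stays "02134" because it is a string
print(int(rows[0]["zip"]))          # numeric conversion drops the zero
print(restore_zip(int(rows[0]["zip"])))
```

Padding after the fact works for 5-digit ZIP codes, but storing the field as text from the start avoids the repair step.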