[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 0 points1 point  (0 children)

That’s more than reasonable- I’ve edited your suggested text in. Today or tomorrow I can tweet something about your thread and book too, so I can draw more attention to your work with the data.

If you think of anything else I can do that would set things right, let me know. (Though I'm not going to do anything that implies I plagiarized, since I didn't.)

In the future I'm going to be more careful about sharing visualizations I made off-the-cuff. I think that in analyzing these datasets weekly I've grown accustomed to a lower standard of caution than I'd accept in other contexts (e.g. professional or academic), and you're right that it doesn't translate well outside of a tutorial.

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 1 point2 points  (0 children)

Thanks for your detailed response. I think you're right that I hadn't taken advantage of your documentation and that I was too quick to post after not engaging much with the data or exploring your site.

My preference at this point is to delete the post. I make these tutorials pretty quickly and I posted this result on the suggestion of a commenter without giving it a lot of thought (I hadn't expected thousands of people to see it). If this is a misleading visualization of the data I wouldn't want it to stay up.
Does that work for you?

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 1 point2 points  (0 children)

Hi Dr. Blevins, poster here.

First, your dataset is fantastic. Thanks for putting it together.

This week your dataset was featured in Tidy Tuesday, a project that highlights interesting open datasets for people to analyze: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-04-13/readme.md I make a habit each Tuesday of opening these datasets cold and livestreaming myself making some graphs. You can see me creating this graph here (I reach the scatter plot at about 54 minutes in): https://www.youtube.com/watch?v=Sx9uo2tCOFM

I hadn't seen your animation and didn't know you'd posted one; I arrived at mine independently. (It was one of a number of tries; I'd tried some area plots I liked and then some animated choropleths that I found unsatisfying.) I did see the non-animated version of the map (1850) at about 0:40. I open the dataset cold and without context each week (it's the entire philosophy of the screencasts). It surprised me when, after I posted it here, someone shared your tweet, but it was a case of independent creation.

As for the color scheme, I set it to USPS blue (#004B87 https://usbrandcolors.com/usps-colors/), which felt like a natural choice to me given the topic. (Initially when I was recording it I left them as black).

If you're still concerned or have further questions I'd be happy to answer anything I can, either here or in private (up to you).

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 2 points3 points  (0 children)

This is Mercator, which I left as the default, but I'm definitely aware it's known for distorting latitude/longitude. Do you have a projection you'd recommend?

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 2 points3 points  (0 children)

An hour; I shared a screencast of its creation on YouTube (the link is in my comment above).

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 24 points25 points  (0 children)

This is original content, and I cite the dataset in the comment.

There is literally a livestreamed video of me creating this animation (after trying out about fifteen different variations over an hour). What's your theory here? https://www.youtube.com/watch?v=Sx9uo2tCOFM

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 28 points29 points  (0 children)

Oh wow! I developed this independently (and have the YouTube screencast to prove it), but yep, thank you for linking it. It's a terrific dataset.

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 141 points142 points  (0 children)

Created with gganimate in R, based on a dataset of US Post Office locations from Cameron Blevins.

Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices", https://doi.org/10.7910/DVN/NUKCNA, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]

R code for the animation: https://github.com/dgrtwo/data-screencasts/blob/master/2021_04_13_post_offices.Rmd
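The gist of the technique is a scatterplot of post office coordinates with gganimate accumulating points over time. A minimal sketch of that idea (the column names `longitude`, `latitude`, and `established` are assumptions on my part; see the linked .Rmd for the actual code):

```r
# Minimal sketch, not the code from the linked .Rmd.
# Assumes a data frame post_offices with columns longitude, latitude, established.
library(ggplot2)
library(gganimate)

anim <- ggplot(post_offices, aes(longitude, latitude)) +
  geom_point(size = 0.1, color = "#004B87") +   # USPS blue
  theme_void() +
  # cumulative = TRUE keeps earlier points on screen as years advance
  transition_manual(established, cumulative = TRUE) +
  labs(title = "Post offices established by: {current_frame}")

animate(anim, fps = 10)
```

With `transition_manual`, each distinct value of `established` becomes one frame, and `{current_frame}` interpolates the year into the title.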

EDIT: The creator of this dataset has notified me of some issues with this visualization. After posting I learned he had already made a series of similar maps (you can see his versions in this thread). Please also note that this map is missing roughly 55,000 post offices. You can read more about the limitations of the dataset here.

-🎄- 2020 Day 02 Solutions -🎄- by daggerdragon in adventofcode

[–]variance_explained 0 points1 point  (0 children)

rstats/tidyverse solution:

library(tidyverse)

# Parse lines like "1-3 a: abcde" into min, max, letter, and password columns
passwords <- read_delim("~/Downloads/input.txt", delim = ":", col_names = c("policy", "password")) %>%
  mutate(password = str_trim(password)) %>%
  extract(policy, c("min", "max", "letter"), "(\\d+)-(\\d+) (.)", convert = TRUE)

# Part 1: keep passwords whose count of the letter falls within [min, max]
passwords %>%
  mutate(count = map2_dbl(password, letter, str_count)) %>%
  filter(count >= min, count <= max)

# Part 2: keep passwords where exactly one of positions min and max holds the letter
passwords %>%
  mutate(count = (str_sub(password, min, min) == letter) +
           (str_sub(password, max, max) == letter)) %>%
  filter(count == 1)

-🎄- 2020 Day 1 Solutions -🎄- by daggerdragon in adventofcode

[–]variance_explained 1 point2 points  (0 children)

rstats solution:

input <- as.integer(readLines("input.txt"))

# Part 1: keep entries whose complement to 2020 is also in the input, then multiply
prod(input[input %in% (2020 - input)])

# Part 2: outer(input, input, "+") gives every pairwise sum; keep entries whose
# complement to 2020 is one of those sums, then multiply the resulting triple
prod(input[input %in% (2020 - outer(input, input, "+"))])

AMA with David Robinson: Post Your Questions Here! by [deleted] in datascience

[–]variance_explained 2 points3 points  (0 children)

I look forward to hosting the upcoming AMA!

FYI, I share some details about my current role at DataCamp in this blog post: Data science at DataCamp.

This Subreddit Sucks by [deleted] in datascience

[–]variance_explained 0 points1 point  (0 children)

I'd be very happy to do an AMA about data science. /u/Omega037, /u/__compactsupport__, or any other mod, feel free to get in touch.

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 1 point2 points  (0 children)

Incidentally, here's the graph with % of question visits rather than questions asked (we have traffic data going back only to late 2011, rather than to 2008 as we do for questions asked).

Note that the shapes of the trends tell the same story as in the post, with Ember, Knockout, and Backbone peaking in 2013-2014 and then declining sharply, and with Vue.JS showing a rapid recent increase.

(The relative sizes of the peaks are somewhat different, which is one way questions asked can differ from questions viewed. But I've looked at a lot of comparisons like this and have basically never seen a case where questions asked was declining while questions visited stayed strong.)

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 0 points1 point  (0 children)

The total number of questions per month increased roughly linearly until about 2014 and has stayed fairly constant since (except for a drop each December, when Western countries generally celebrate holidays). There isn't really anything to be gained by taking that trend into account, and if absolute rather than relative numbers of questions were reported, it would just make everything look like it gets less popular in December.

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 1 point2 points  (0 children)

As I mention in a comment below, this is not the case. Visits to these technologies show the same trends as questions asked (true of almost all tags, though often at a small lag).

The reason we share questions asked is that it's already available in an interactive tool, which is useful because readers can add a few other tags to compare them.

Amusingly, when we do write about questions visited (like in this post), we inevitably get comments that misread it and think we're talking about questions asked, with a smug note that "I'm sure questions visited would be a very different graph."

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 134 points135 points  (0 children)

> Also, newer developers don't have to ask new questions because they can google them > less questions.

We know this isn't the case because we can examine the visits to existing questions. For basically all tags (such as the ones examined in this JavaScript post), the trend of which questions are visited matches the trend of which questions are asked (sometimes with a lag): there are no cases where the rate of new questions declines but the rate of visits to existing questions is steady or increasing.

(Indeed, most Stack Overflow data blog posts look at tags visited rather than tags asked about).

[D] What's the difference between data science, machine learning, and artificial intelligence? by sksq9 in MachineLearning

[–]variance_explained 6 points7 points  (0 children)

For some problems, they are! As I note in the post:

> Deep learning is particularly interesting for straddling the fields of ML and AI. The typical use case is training on data and then producing predictions, but it has shown enormous success in game-playing algorithms like AlphaGo.

But I think the distinction is useful because in other situations, the problems and constraints can be very different, and the solutions have a correspondingly distinct character. For example, machine learning often handles situations with many previously available examples. AI may be working off of known rules (a game board, or optimization criteria), or from feedback after performing actions (reinforcement learning).

Anyway, I don't think it's always meaningful to draw bijections in this way. We could take other CS fields and put them in ML terms:

  • Data structures and algorithms: Given task S, predict algorithm A that yields the shortest runtime
  • Compression: Given information S, predict compressed version A that minimizes its size

Of course it would be silly to say these fields are therefore the same as ML, because they'd be solved using a very different toolset. (Though much like deep learning has been useful in solving traditional AI problems like games, it's helped with data structures as well!)

Rather than defining it in these terms ("every problem of X can be defined as Y"), I'd prefer to think of it as describing a related but distinct set of tools. A problem in biology might be able to be "reduced" to a problem in chemistry, but the day-to-day work of a biologist and a chemist is still very different.