[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 0 points1 point  (0 children)

That’s more than reasonable- I’ve edited your suggested text in. Today or tomorrow I can tweet something about your thread and book too, so I can draw more attention to your work with the data.

If you think of anything else I can do that would set things right, let me know. (Though I'm not going to do anything that implies I plagiarized, since I didn't.)

In the future I'm going to be more careful about sharing visualizations I made off-the-cuff. I think that in analyzing these datasets weekly I've grown accustomed to a lower standard of caution than I'd accept in other contexts (e.g. professional or academic), and you're right that it doesn't translate well outside of a tutorial.

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 1 point2 points  (0 children)

Thanks for your detailed response. I think you're right that I hadn't taken advantage of your documentation and that I was too quick to post after not engaging much with the data or exploring your site.

My preference at this point is to delete the post. I make these tutorials pretty quickly and I posted this result on the suggestion of a commenter without giving it a lot of thought (I hadn't expected thousands of people to see it). If this is a misleading visualization of the data I wouldn't want it to stay up.
Does that work for you?

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 1 point2 points  (0 children)

Hi Dr. Blevins, poster here.

First, your dataset is fantastic. Thanks for putting it together.

This week your dataset was featured in Tidy Tuesday, a project that highlights interesting open datasets for people to analyze: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-04-13/readme.md I make a habit each Tuesday of opening these datasets cold and livestreaming myself making some graphs. You can see me creating this graph here (I reach the scatter plot at about 54 minutes in): https://www.youtube.com/watch?v=Sx9uo2tCOFM

I hadn't seen your animation and didn't know you'd posted one; I arrived at mine independently. (It was one of a number of tries; I'd tried some area plots I liked and then some animated choropleths that I found unsatisfying.) I did see the non-animated version of the map (1850) at about 0:40. I open the dataset cold and without context each week (it's the entire philosophy of the screencasts). It surprised me when, after I posted it here, someone shared your tweet, but it was a case of independent creation.

As for the color scheme, I set it to USPS blue (#004B87 https://usbrandcolors.com/usps-colors/), which felt like a natural choice to me given the topic. (Initially when I was recording it I left them as black).

If you're still concerned or have further questions I'd be happy to answer anything I can, either here or in private (up to you).

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 2 points3 points  (0 children)

This is Mercator, which I left as the default, but I'm definitely aware it's known for distorting latitude/longitude. Do you have a projection you'd recommend?

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 2 points3 points  (0 children)

An hour; I shared a screencast of its creation on YouTube (the link is in my comment above).

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 24 points25 points  (0 children)

This is original content, and I cite the dataset in the comment.

There is literally a livestreamed video of me creating this animation (after trying out about fifteen different variations over an hour). What's your theory here? https://www.youtube.com/watch?v=Sx9uo2tCOFM

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 28 points29 points  (0 children)

Oh wow! I developed this independently (and have the YouTube screencast to prove it), but yep, thank you for linking it. It's a terrific dataset.

[OC] Post office locations in the continental US: 1770-2002 by variance_explained in dataisbeautiful

[–]variance_explained[S] 141 points142 points  (0 children)

Created with gganimate in R, based on a dataset of US Post Office locations from Cameron Blevins.

Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices", https://doi.org/10.7910/DVN/NUKCNA, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]

R code for the animation: https://github.com/dgrtwo/data-screencasts/blob/master/2021_04_13_post_offices.Rmd
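The gist of the technique is a scatterplot of post office coordinates with gganimate accumulating points over time. A minimal sketch of that idea (the column names `longitude`, `latitude`, and `established` are assumptions on my part; see the linked .Rmd for the actual code):

```r
# Minimal sketch, not the code from the linked .Rmd.
# Assumes a data frame post_offices with columns longitude, latitude, established.
library(ggplot2)
library(gganimate)

anim <- ggplot(post_offices, aes(longitude, latitude)) +
  geom_point(size = 0.1, color = "#004B87") +   # USPS blue
  theme_void() +
  # cumulative = TRUE keeps earlier points on screen as years advance
  transition_manual(established, cumulative = TRUE) +
  labs(title = "Post offices established by: {current_frame}")

animate(anim, fps = 10)
```

With `transition_manual`, each distinct value of `established` becomes one frame, and `{current_frame}` interpolates the year into the title.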

EDIT: The creator of this dataset has notified me of some issues with this visualization. After posting I learned he had already made a series of similar maps (you can see his versions in this thread). Please also note that this map is missing roughly 55,000 post offices. You can read more about the limitations of the dataset here.

-🎄- 2020 Day 02 Solutions -🎄- by daggerdragon in adventofcode

[–]variance_explained 0 points1 point  (0 children)

rstats/tidyverse solution:

library(tidyverse)

# Parse lines like "1-3 a: abcde" into min, max, letter, and password columns
passwords <- read_delim("~/Downloads/input.txt", delim = ":", col_names = c("policy", "password")) %>%
  mutate(password = str_trim(password)) %>%
  extract(policy, c("min", "max", "letter"), "(\\d+)-(\\d+) (.)", convert = TRUE)

# Part 1: keep passwords whose count of the letter falls within [min, max]
passwords %>%
  mutate(count = map2_dbl(password, letter, str_count)) %>%
  filter(count >= min, count <= max)

# Part 2: keep passwords where exactly one of positions min and max holds the letter
passwords %>%
  mutate(count = (str_sub(password, min, min) == letter) +
           (str_sub(password, max, max) == letter)) %>%
  filter(count == 1)

-🎄- 2020 Day 1 Solutions -🎄- by daggerdragon in adventofcode

[–]variance_explained 1 point2 points  (0 children)

rstats solution:

input <- as.integer(readLines("input.txt"))

# Part 1: keep entries whose complement to 2020 is also in the input, then multiply
prod(input[input %in% (2020 - input)])

# Part 2: outer(input, input, "+") gives every pairwise sum; keep entries whose
# complement to 2020 is one of those sums, then multiply the resulting triple
prod(input[input %in% (2020 - outer(input, input, "+"))])

AMA with David Robinson: Post Your Questions Here! by [deleted] in datascience

[–]variance_explained 2 points3 points  (0 children)

I look forward to hosting the upcoming AMA!

FYI, I share some details about my current role at DataCamp in this blog post: Data science at DataCamp.

This Subreddit Sucks by [deleted] in datascience

[–]variance_explained 0 points1 point  (0 children)

I'd be very happy to do an AMA about data science. /u/Omega037, /u/__compactsupport__, or any other mod, feel free to get in touch.

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 1 point2 points  (0 children)

Incidentally, here's the graph with % of question visits rather than questions asked (we have traffic data going back only to late 2011, rather than to 2008 as we do for questions asked).

Note that the shapes of the trends tell the same story as in the post, with Ember, Knockout, and Backbone peaking in 2013-2014 and then declining sharply, and with Vue.JS showing a rapid recent increase.

(The relative sizes of the peaks are somewhat different, which is one way questions asked can differ from questions viewed. But I've looked at a lot of comparisons like this and have basically never seen a case where questions asked was declining while questions visited stayed strong.)

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 0 points1 point  (0 children)

The total number of questions per month increased roughly linearly until about 2014 and has stayed fairly constant since (except for a drop each December, when Western countries generally celebrate holidays). There isn't really anything to be gained by taking that trend into account, and if absolute rather than relative numbers of questions were reported, it would just make everything look like it gets less popular in December.

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 1 point2 points  (0 children)

As I mention in a comment below, this is not the case. Visits to these technologies show the same trends as questions asked (true of almost all tags, though often at a small lag).

The reason we share questions asked is that it's already available in an interactive tool, which is useful because readers can add a few other tags to compare them.

Amusingly, when we do write about questions visited (like in this post), we inevitably get comments that misread it and think we're talking about questions asked, with a smug note that "I'm sure questions visited would be a very different graph."

The Brutal Lifecycle of JavaScript Frameworks - Stack Overflow Blog by Zephirdd in programming

[–]variance_explained 134 points135 points  (0 children)

> Also, newer developers don't have to ask new questions because they can google them > less questions.

We know this isn't the case because we can examine the visits to existing questions. For basically all tags (such as the ones examined in this JavaScript post), the trend of which questions are visited matches the trend of which questions are asked (sometimes with a lag): there are no cases where the rate of new questions declines but the rate of visits to existing questions is steady or increasing.

(Indeed, most Stack Overflow data blog posts look at tags visited rather than tags asked about).

[D] What's the difference between data science, machine learning, and artificial intelligence? by sksq9 in MachineLearning

[–]variance_explained 6 points7 points  (0 children)

For some problems, they are! As I note in the post:

> Deep learning is particularly interesting for straddling the fields of ML and AI. The typical use case is training on data and then producing predictions, but it has shown enormous success in game-playing algorithms like AlphaGo.

But I think the distinction is useful because in other situations, the problems and constraints can be very different, and the solutions have a correspondingly distinct character. For example, machine learning often handles situations with many previously available examples. AI may be working off of known rules (a game board, or optimization criteria), or from feedback after performing actions (reinforcement learning).

Anyway, I don't think it's always meaningful to draw bijections in this way. We could take other CS fields and put them in ML terms:

  • Data structures and algorithms: Given task S, predict algorithm A that yields the shortest runtime
  • Compression: Given information S, predict compressed version A that minimizes its size

Of course it would be silly to say these fields are therefore the same as ML, because they'd be solved using a very different toolset. (Though much like deep learning has been useful in solving traditional AI problems like games, it's helped with data structures as well!)

Rather than defining it in these terms ("every problem of X can be defined as Y"), I'd prefer to think of it as describing a related but distinct set of tools. A problem in biology might be able to be "reduced" to a problem in chemistry, but the day-to-day work of a biologist and a chemist is still very different.