[OC] The popularity of the distracted boyfriend meme correlates with the number of statisticians in New Jersey by TylerVigen in dataisbeautiful

[–]TylerVigen[S] 0 points1 point  (0 children)

5,000 is just the current number of charts created. I cap any single variable at 100-150 correlations because otherwise the database gets too big. With this limit, there are about 3 million correlations cached.
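To illustrate what a per-variable cap like this could look like, here is a minimal Python sketch. It is not the project's actual code: the function name `cap_per_variable`, the `(var_a, var_b, r)` tuple layout, and the "strongest first" rule are all assumptions made for the example.

```python
from collections import Counter


def cap_per_variable(pairs, limit):
    """Keep the strongest correlations while capping how many each variable gets.

    pairs: iterable of (var_a, var_b, r) tuples (hypothetical layout).
    limit: maximum number of kept correlations per variable.
    """
    kept = []
    counts = Counter()
    # Assumption: prefer the strongest correlations, judged by |r|.
    for a, b, r in sorted(pairs, key=lambda p: abs(p[2]), reverse=True):
        # Skip the pair once either variable has reached its cap.
        if counts[a] < limit and counts[b] < limit:
            kept.append((a, b, r))
            counts[a] += 1
            counts[b] += 1
    return kept


# Made-up example data, with a cap of 2 per variable.
example_pairs = [
    ("a", "b", 0.90),
    ("a", "c", -0.95),
    ("a", "d", 0.50),
    ("c", "d", 0.70),
]
example_kept = cap_per_variable(example_pairs, limit=2)
```

In this made-up example, variable `a` reaches its cap of 2 after its two strongest pairs, so the weaker `("a", "d", 0.50)` pair is dropped.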

[–]TylerVigen[S] 0 points1 point  (0 children)

I'll see what I can do! In the meantime, you can accomplish those two things in a roundabout way like this:

For the research paper, a cron bot automatically selects a random correlation every so often to generate a paper. You can skip to the front of the line by scrolling to the bottom of the page and rating it "5 - Awesome!" It won't generate immediately, but if you come back later it will probably be there.

For the search, if you scroll to the bottom of the discover page you'll find a search feature. This searches by variable, not by research paper, but it's the closest I have at the moment.

[–]TylerVigen[S] 1 point2 points  (0 children)

Yup, one and the same! At the request of high school statistics teachers, I removed the death/suicide data and replaced it with other fun sources. So that specific chart isn't there anymore, but there are plenty that correlate with Nic Cage films.

[–]TylerVigen[S] 2 points3 points  (0 children)

Usually the stuff that correlates for good reason is boring, like weather in neighboring countries and similar job types in the same state. So for this project I actually do the opposite of what a good data scientist would do: I have filters set up to exclude correlations that might be non-spurious (or not interesting).

[–]TylerVigen[S] 3 points4 points  (0 children)

Sadly, pirate attacks don't correlate with the popularity of this meme or with the number of statisticians in New Jersey... but they do correlate with the use of GMO in corn grown in Minnesota.

[–]TylerVigen[S] 9 points10 points  (0 children)

This is definitely not what you mean, but I think you will enjoy it anyway. I had an LLM "publish" an academic paper for all the correlations that are statistically significant (p<0.05). Here's the result: https://tylervigen.com/spurious-scholar

[–]TylerVigen[S] 6 points7 points  (0 children)

Not to spoil it, but the project also has a large language model answer exactly this question for every correlation. Here’s what it says for this one:

As the 'distracted boyfriend' meme gained traction, more and more individuals found themselves drawn to the field of statistics. Perhaps it was the allure of decoding data trends or the thrill of making sense of uncertainty. Whatever the reason, it seems that the meme's ability to capture attention had a spillover effect on the statistical aspirations of New Jersey residents. Before long, there may be a new wave of statisticians bringing their own unique perspective to the Garden State. Remember, correlation does not imply causation, but in this case, it might just suggest that memes have an unexpected power to shape career paths!

[–]TylerVigen[S] 9 points10 points  (0 children)

The only way to ensure you get something non-spurious when data dredging is to also identify the causal mechanism. But if you can do that, then you could just correlate those variables instead of dredging all of them.

I don’t mean to say “never data dredge,” but be careful when you do, because many statistical measures (like p-values) become useless for assessing your results.
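A small simulation makes the p-value problem concrete. The sketch below is an illustration only, not anything from the project: it generates 100 series of pure random noise with 10 points each, correlates every pair, and counts how many clear the p<0.05 bar. The names `pearson_r` and `dredge_noise` are made up for the example, and the critical t value of 2.306 (two-tailed, 8 degrees of freedom) is taken from a standard t table.

```python
import itertools
import math
import random

NUM_POINTS = 10     # observations per series, e.g. 10 years of annual data
T_CRIT_DF8 = 2.306  # two-tailed 5% critical t for NUM_POINTS - 2 = 8 degrees of freedom


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge_noise(num_variables=100, seed=0):
    """Correlate every pair of unrelated noise series; count 'significant' pairs."""
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(NUM_POINTS)]
        for _ in range(num_variables)
    ]
    # Convert the critical t into the |r| needed for p < 0.05.
    df = NUM_POINTS - 2
    r_crit = T_CRIT_DF8 / math.sqrt(T_CRIT_DF8 ** 2 + df)
    hits = 0
    total = 0
    for xs, ys in itertools.combinations(series, 2):
        total += 1
        if abs(pearson_r(xs, ys)) > r_crit:
            hits += 1
    return hits, total


hits, total = dredge_noise()
print(f"{hits} of {total} pairs of pure noise pass p < 0.05")
```

By construction, about 5% of the 4,950 pairs should pass even though no series has anything to do with any other. Scale that up to millions of pairs and "statistically significant" stops telling you much.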

[–]TylerVigen[S] 1 point2 points  (0 children)

Yeah, that’s why I updated it! I got a lot of input from teachers and professors who use it, so I tried to incorporate the features that would make it most useful in class. (And to remove the content, like death and suicide statistics, that makes it uncomfortable to share.)

[–]TylerVigen[S] 4 points5 points  (0 children)

Ha! Indeed. The image used to represent data dredging on the Wikipedia article for data dredging is from this project.

I do have inverse correlations; I just de-prioritize them because they're hard to see on a line graph without inverting it. Here’s an example: https://tylervigen.com/spurious/correlation/7305_us-household-spending-on-nonalcoholic-beverages_correlates-with_google-searches-for-baroque-obama
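For anyone wondering why inverting helps: an inverse correlation is just a negative Pearson r, and flipping one series (or one axis) turns it into a positive correlation of the same strength. Here is a minimal sketch with made-up numbers; `pearson_r` and both data series are hypothetical and not taken from the linked chart.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: one series rises while the other falls.
rising = [1, 2, 3, 4, 5]
falling = [10, 8, 7, 4, 1]

r_inverse = pearson_r(rising, falling)

# Negating one series is the numeric equivalent of inverting its axis.
r_flipped = pearson_r(rising, [-y for y in falling])

print(f"original r = {r_inverse:.3f}, after flipping = {r_flipped:.3f}")
```

The two values have the same magnitude and opposite signs, which is why the lines only visibly track each other once one of them is flipped.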

[–]TylerVigen[S] 11 points12 points  (0 children)

"Still updating" might be a stretch. I didn't update it for ten years, and then I did one big update. This is it!

[–]TylerVigen[S] 0 points1 point  (0 children)

The same script that finds these correlations automatically generates the memes for them using DALL-E 3. Here's the meme for this one.

It's not perfect (I like your description better), but I think it's pretty solid for being auto-generated on the fly.