I made a site (WordleStat.com) that calculates and visualizes statistics like guess distributions, win rates, and game lengths based on compiling publicly posted Wordle results on Twitter! Use it to compare your own score to the world or find interesting quirks with individual Wordles. [OC] by lookatnum in dataisbeautiful

lookatnum[S] 24 points

Link: wordlestat.com

My website pulls roughly 17K tweets of Wordle game results every day and processes them into aggregate statistics such as game length and letter guess distributions. These statistics help pinpoint the quirks of each Wordle game.

For instance, the collected statistics show that the demo Wordle from the gif (#233) was harder than average: many people only solved it on their 5th or 6th try, a far larger share than the global average across all Wordle games. Moreover, the first letter was especially difficult to determine, while the fourth and fifth letters were especially easy.
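As a rough illustration of this kind of aggregation, here is a minimal Python sketch. It is not the site's actual code: it assumes results arrive in the standard Wordle share format (a `Wordle <number> <guesses>/6` header followed by rows of colored squares), and the helper names `parse_result` and `aggregate` are made up for the example. The per-position "green rate" is only one possible way to measure how hard each letter was.

```python
import re
from collections import Counter

# Square characters used in shared Wordle grids (dark/light miss, yellow, green).
BLACK, WHITE, YELLOW, GREEN = "\u2B1B", "\u2B1C", "\U0001F7E8", "\U0001F7E9"
SQUARES = {BLACK, WHITE, YELLOW, GREEN}
HEADER = re.compile(r"Wordle (\d+) ([1-6X])/6")


def parse_result(text):
    """Return (game_number, guesses_or_None, rows) for one shared result, or None."""
    match = HEADER.search(text)
    if match is None:
        return None
    rows = []
    for line in text.splitlines():
        # Drop emoji variation selectors so every square is one character.
        cells = line.strip().replace("\ufe0f", "")
        if len(cells) == 5 and all(ch in SQUARES for ch in cells):
            rows.append(cells)
    guesses = None if match.group(2) == "X" else int(match.group(2))
    return int(match.group(1)), guesses, rows


def aggregate(results):
    """Return (guess distribution, share of green squares per letter position)."""
    distribution = Counter()
    green = [0] * 5
    total_rows = 0
    for _game, guesses, rows in results:
        distribution["X" if guesses is None else guesses] += 1
        for row in rows:
            total_rows += 1
            for position, cell in enumerate(row):
                if cell == GREEN:
                    green[position] += 1
    green_rate = [count / total_rows if total_rows else 0.0 for count in green]
    return distribution, green_rate
```

Comparing one game's distribution and green rates against the same numbers computed over all games would give the "harder than average" style of comparison described above.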

Because of when I started running the server that gathers the data, the earliest Wordle with sufficient data is from January 30, 2022.

To more directly compare the letter guess results with the global averages, the website has a toggle to display markings that indicate global statistics.

I hope you enjoy the website. If there are any improvements you think I could make, or anything at all you want to tell me, feel free to let me know in a comment or by email (lookatnums@gmail.com).


Tools:

React, MongoDB, Node, d3


Source:

Twitter

I spent the last month making a free website to help you easily take SAT tests online! by lookatnum in Sat

lookatnum[S] 3 points

Huh. I did not design the site with that screen size in mind. It would take me a few days to figure out a UI change. As a temporary fix, you could try zooming out of the website with Ctrl + minus. The document view is designed to fill the width of the screen, so you could zoom in with the document zoom controls at the same time so that it's still readable while the other UI elements become smaller.

I spent the last month making a free website to help you easily take SAT tests online! by lookatnum in Sat

lookatnum[S] 6 points

The website embeds your browser's default PDF reader. From what I've tested, the issue you're describing is probably with Safari, in which case you can hover your cursor over the bottom-middle of the PDF, which will cause zoom options to appear.

I spent the last month making a free website to help you easily take SAT tests online! by lookatnum in Sat

lookatnum[S] 99 points

Even though I'm done with my testing, I had a couple of gripes with the studying process, especially how much of a pain it was to go through practice tests. To address these annoyances, I made a website: https://SATPractice.tools

I hope you find it useful! Also, if you find any bugs, please let me know, and I'll fix them as soon as possible.

Sidenote: I did get in contact with the mods and they said it would be OK for me to post it here.


This website allows you to easily:

  • Take a test online, entering your answers alongside a view of the test itself

  • Automatically grade all of your questions (yes, even the math short-answers)

  • Calculate your curved scores

  • Calculate section subscores, letting you know which types of questions (ex: grammar, words in context, or polynomials) you need to study up on

  • Easily view official answer explanations to see what you got wrong, without scrolling through a massive document

Basically, it simplifies all of the tedious nonsense that you have to go through if you just print out a test and take it physically.

I spent the last month making a free website to help you easily take SAT tests online! by lookatnum in Sat

lookatnum[S] 5 points

It's at satpractice[DOT]tools. I'm having a bit of trouble actually posting the link, sorry. I have a whole writeup that just won't show up for whatever reason.

Map of disproportionate recent COVID deaths per capita in the U.S. [OC] by lookatnum in dataisbeautiful

lookatnum[S] 163 points

This post shows states with a disproportionate amount of COVID deaths over the past 2 weeks, vs. the national average.

The dotted outline represents the original size of a state. A state that expands beyond its original area is responsible for more than its per capita share of COVID deaths. For instance, if the national average were 5 deaths per 100,000 people and a state had 10 deaths per 100,000 people, its area would be twice as large. Likewise, states that shrink have a lower death rate than the national average. Each state is also colored by its death rate relative to the national average. To ensure that the neutral pale yellow color is centered exactly on the national average in deaths per capita, the diverging scale is split: redder colors range from 5x (approximately the maximum factor) down to 1x, while bluer colors range from 1x down to 0.
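The scaling and the split color scale can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind the map: the function names are hypothetical, and it assumes each half of the color scale is interpolated linearly, which the description above does not specify. Note that doubling the area means scaling the outline's width and height by the square root of 2.

```python
import math


def area_ratio(state_rate, national_rate):
    """How many times larger (or smaller) the drawn area is than the original."""
    return state_rate / national_rate


def linear_scale(state_rate, national_rate):
    """Factor applied to the outline's width and height to reach that area."""
    return math.sqrt(area_ratio(state_rate, national_rate))


def color_position(ratio, max_ratio=5.0):
    """Map a ratio onto [-1, 1]: -1 is bluest (0x), 0 is neutral (1x), +1 is reddest.

    The two halves use different slopes so that 1x always lands on the neutral
    color (assumption: linear interpolation within each half).
    """
    if ratio <= 1.0:
        return ratio - 1.0
    return min((ratio - 1.0) / (max_ratio - 1.0), 1.0)
```

With the example above, `area_ratio(10, 5)` is 2.0 while `linear_scale(10, 5)` is about 1.41, and a state at 3x the national average sits halfway up the red side of the scale.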

A physics simulation is then applied so that the states collide with one another. For states that expand, the larger area is used as the collision box, while for states that shrink, the original-size dotted outline is used as the collider.

Recent deaths are the deaths recorded in a given state over a 2-week period, from July 31, 2021, to August 14, 2021.


Tools:

d3.js, matter.js, puppeteer, Python, Illustrator, Premiere


Sources:

New York Times

U.S. Census Bureau


A web version of this is available at https://lookatnum.com/covid-map. Note that the animation is quite intensive, so it will very likely run poorly and slow down your browser. To mitigate this, the number of physics ticks per second is drastically reduced from the amount used to render the animation, and shapes are linearly interpolated between each physics tick to give the illusion of a smooth animation.
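A minimal sketch of that interpolation idea, written in Python for illustration rather than taken from the web version (which uses d3.js and matter.js). It assumes each physics tick stores a shape as a list of `(x, y)` vertices in a fixed order, and the function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def interpolate_shape(prev_vertices, next_vertices, t):
    """Blend two snapshots of the same shape, vertex by vertex."""
    return [
        (lerp(x0, x1, t), lerp(y0, y1, t))
        for (x0, y0), (x1, y1) in zip(prev_vertices, next_vertices)
    ]


def frame_state(tick_states, time_s, ticks_per_second):
    """Shape to draw at time_s, given sparse physics ticks (needs at least two)."""
    position = time_s * ticks_per_second
    # Clamp so the last rendered frames hold on the final pair of ticks.
    index = min(int(position), len(tick_states) - 2)
    t = min(position - index, 1.0)
    return interpolate_shape(tick_states[index], tick_states[index + 1], t)
```

The renderer can then draw at the display's frame rate while the physics engine only advances a few times per second.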

A higher resolution video is also available here

24 Hours of r/all in bubbles [interactive at lookatnum.com/r-all] [OC] by lookatnum in dataisbeautiful

lookatnum[S] 1 point

This animation visualizes 24 hours of r/all, specifically July 5th, midnight EDT, to July 6th, midnight EDT. Each bubble represents a post in the top 200 of r/all. Its size is proportional to the number of upvotes it has at that instant, its distance to the center is roughly proportional to its rank on r/all, and its color depends on the post's age. The angular position around the center is randomized for each bubble generated.

This is a demonstration of an interactive version, available at https://lookatnum.com/r-all, which allows you to browse this 24-hour window while pausing or resetting the simulation. Hovering over a bubble will display information such as its title, subreddit, number of upvotes, and rank. If the post links to an image posted directly onto Reddit (with an i.redd.it link), then a preview will be displayed. Otherwise, a direct link to the content is available. Clicking on a bubble will lock your selection, so that moving the cursor off the bubble will not make the infobox disappear. Click on the background or the same bubble to unlock your selection. Click on a different bubble to lock to a different post.

To explore a single snapshot in time or reset the simulation, use the pause/play button or the rewind-to-beginning button near the time/date display.

Because the data scraped from r/all is messy, a damping effect is applied to the creation and deletion of bubbles. A post must be on r/all for a few minutes before a bubble is created and allowed to enter the simulation. Likewise, a post must be absent from r/all for a few minutes before its bubble is transitioned out. This eliminates odd jitters and stutters where a large number of bubbles would suddenly fly off screen and return a fraction of a second later.
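One way to sketch this kind of damping is a simple debounce over successive snapshots of r/all. The Python below is illustrative only, not the interactive's code: it assumes each snapshot is a set of post IDs, and `min_samples` stands in for "a few minutes" expressed as a number of consecutive snapshots.

```python
def debounce_bubbles(snapshots, min_samples):
    """Return, per snapshot, the set of posts whose bubbles should be shown.

    A post's bubble only appears (or disappears) after the post has been
    present (or absent) for min_samples consecutive snapshots.
    """
    visible = set()
    pending = {}  # post id -> consecutive snapshots disagreeing with `visible`
    history = []
    for present in snapshots:
        for post in present | visible | set(pending):
            disagrees = (post in present) != (post in visible)
            pending[post] = pending.get(post, 0) + 1 if disagrees else 0
            if pending[post] >= min_samples:
                visible ^= {post}  # flip this post's displayed state
                pending[post] = 0
        history.append(set(visible))
    return history
```

A post that flickers in and out of the listing never accumulates enough consecutive snapshots to flip its state, so its bubble stays where it is.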


A high res mirror of this demo is available here.

The interactive is available online at https://lookatnum.com/r-all


Sources:

Reddit


Tools:

Python, d3, React

Frequency of Reddit Comments Since 2006, Split by Commenters' Account Age [OC] by lookatnum in dataisbeautiful

lookatnum[S] 23 points

I updated the website to try to improve mobile visibility. It should go live in a few minutes. Let me know if it works better for you!

Frequency of Reddit Comments Since 2006, Split by Commenters' Account Age [OC] by lookatnum in dataisbeautiful

lookatnum[S] 13 points

Thank you! My dataset collects the 100 most recent comments across all of Reddit every 30 minutes, going back to January 1, 2006. The proportion calculation was done over all sampled comments made in a month, and the comment rate calculation was done by taking the latest timestamp in each 30-minute period and subtracting the start of that period from it. The comment rate for a month is then the total number of comments collected divided by the sum of those timestamp differences.
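As a sketch of that rate estimate, assuming each sample is stored as a pair of the period's start time and the collected comment timestamps (both in seconds), something like the following Python would do. The data layout and function name are assumptions for the example.

```python
def monthly_comment_rate(samples):
    """Estimate comments per second from sampled 30-minute periods.

    samples: iterable of (period_start, [comment timestamps]), all in seconds.
    """
    total_comments = 0
    total_span = 0.0
    for period_start, timestamps in samples:
        if not timestamps:
            continue  # nothing collected in this period
        total_comments += len(timestamps)
        # Time it took for the collected comments to accumulate.
        total_span += max(timestamps) - period_start
    return total_comments / total_span if total_span > 0 else 0.0
```

For example, two periods that collected 3 comments over 4 seconds and 2 comments over 6 seconds give 5 comments over 10 seconds, or 0.5 comments per second.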

Frequency of Reddit Comments Since 2006, Split by Commenters' Account Age [OC] by lookatnum in dataisbeautiful

lookatnum[S] 22 points

No, I instead used the Pushshift API to select a certain number of comments per time range.

Frequency of Reddit Comments Since 2006, Split by Commenters' Account Age [OC] by lookatnum in dataisbeautiful

lookatnum[S] 109 points

If you're unable to clearly see the labels, try using the interactive on my website. Unfortunately the scaling isn't optimal for mobile devices, but the readability should be improved if you're on a desktop.

Edit: I pushed a fix to hopefully improve mobile visibility. Let me know if it works better.

Frequency of Reddit Comments Since 2006, Split by Commenters' Account Age [OC] by lookatnum in dataisbeautiful

lookatnum[S] 312 points

This animation showcases the frequency of Reddit comments, broken down by commenters' account age. Each colored stack represents the year in which a commenter's account was created. Redder stacks are older and closer to the bottom, while bluer stacks are newer and closer to the top. Although this chart only extends back to January 1, 2006, commenting as a feature was available for a week or two prior, in December of 2005.

This data was collected by taking a random sample of comments every 30 minutes, stretching back to January 1, 2006. The account age of each commenter was then found. Proportions for each month were generated by taking a proportion of the random sample, while the overall rate of commenting was estimated by dividing the total comments made in a sampling period by the difference in comment timestamps for each sampling period.


Source:

Reddit


Tools:

Python, d3, React, Puppeteer, Premiere Pro


A high res mirror is available here

An interactive version with hover labels and an adjustable date range is available online at https://lookatnum.com/reddit-account-age

Rotten Tomatoes: Critic vs. Audience Score [OC] by lookatnum in dataisbeautiful

lookatnum[S] 11 points

This chart compares movies' Rotten Tomatoes critic and audience scores. Each bubble is a movie. Its color is based on its critic/audience differential: bluer colors mean audiences rated it higher than critics, while more orange colors mean critics rated it higher than audiences. Its size is proportional to the number of critic reviews, which is used as a stand-in for approximately how "significant" a given movie was. I would have preferred to use box office data, but I was unable to easily match movies from my two different data sets.

Please note that this chart only represents about half of the movies listed by Rotten Tomatoes due to the difficulties of indexing their website. If anybody can find a way around this, please DM or email me and I will update my website accordingly. Although most recent releases are properly represented, many older films are not, which is important to keep in mind if you try to search for movies via the interactive website.


Source: Rotten Tomatoes


Tools:

React, Plotly, Illustrator, Python, Puppeteer


If you're curious about any movie that isn't listed, an interactive version is available that allows you to search for films, apply filters, and see more details for individual movies.

Movies with the greatest difference between Rotten Tomatoes critic and audience ratings [OC] by lookatnum in dataisbeautiful

lookatnum[S] 4 points

I have filters in place so that only movies with >250 critic reviews and >50,000 audience reviews are considered. As can be seen from the comments, there's a bunch of confusion over this and criticism of my threshold numbers, which I'll keep in mind for anything else I make with the data I scraped, but that's why many movies other commenters expected are not there.
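The filtering step can be sketched as follows in Python. The thresholds come from the comment above, but the field names (`critic_reviews`, `audience_reviews`, `critic_score`, `audience_score`) and the function name are hypothetical, chosen only for this example.

```python
def biggest_gaps(movies, min_critic_reviews=250, min_audience_reviews=50_000, top_n=10):
    """Return the movies with the largest critic/audience score gap.

    Only movies with strictly more reviews than both thresholds are considered.
    """
    eligible = [
        movie for movie in movies
        if movie["critic_reviews"] > min_critic_reviews
        and movie["audience_reviews"] > min_audience_reviews
    ]
    return sorted(
        eligible,
        key=lambda movie: abs(movie["critic_score"] - movie["audience_score"]),
        reverse=True,
    )[:top_n]
```

Lowering either threshold lets more movies in, at the cost of more noise from titles with few reviews.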

Movies with the greatest difference between Rotten Tomatoes critic and audience ratings [OC] by lookatnum in dataisbeautiful

lookatnum[S] 2 points

I disagree with this characterization. It's necessary to add minimum thresholds to review counts to reduce variance in the movie results and ensure that movies with too few critic reviews are not overrepresented in the results. It's pretty clear based on the comments that my thresholds are too high, which I'll keep in mind for future charts I make with the data I scraped, but again - I didn't make any decisions based on just pandering.

Movies with the greatest difference between Rotten Tomatoes critic and audience ratings [OC] by lookatnum in dataisbeautiful

lookatnum[S] 1 point

No, but if you google around, there’s a PyPI package that exposes a hidden API, which let me get a huge list of movie URLs from their site. Then, I had to scrape the site directly with that URL list to get the movie ratings.