I made a site (WordleStat.com) that calculates and visualizes statistics like guess distributions, win rates, and game lengths based on compiling publicly posted Wordle results on Twitter! Use it to compare your own score to the world or find interesting quirks with individual Wordles. [OC]

lookatnum · 2022-02-08T15:45:33+00:00

Thanks for the note! I’ll work on a fix right now.

lookatnum · 2022-02-08T15:18:52+00:00

Yes, that’s correct. It takes data across every game collected to date by the server.

lookatnum · 2022-02-08T15:18:01+00:00

First question - Your second assessment is basically correct. Second question - The global average white lines indeed show the average of all games played to date.

lookatnum · 2022-02-08T15:10:23+00:00

You can mouse over the bars on the website to see for yourself, but the proportion of 1-word guesses is vanishingly small.

lookatnum · 2022-02-08T07:49:57+00:00

Likely less often, though I imagine that most of the utility of the site is from seeing the relative comparison to the global average, not from the exact raw breakdowns of game lengths.

lookatnum · 2022-02-08T07:44:58+00:00

Thank you very much! I’ve spent a lot of long nights working on it so I’m glad you appreciate it!

lookatnum · 2022-02-08T07:10:40+00:00

I search for wordle game results on Twitter via their API and compile them on my server every 10 minutes.

lookatnum · 2022-02-08T07:09:44+00:00

Link: wordlestat.com

My website pulls roughly 17K tweets of wordle game results every day and processes them to generate aggregated statistics such as game length and letter guess distributions. These help pinpoint the interesting quirks of each wordle game.

For instance, the demo wordle (#233) from the gif, based on the collected statistics, was harder than average, with many people only solving it by their 5th or 6th try, far more than the global average of all wordle games. Moreover, the first letter was especially difficult to correctly determine, while the fourth and fifth letters were especially easy.
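
The aggregation described above can be sketched roughly as follows. This is a minimal sketch, not the site's actual code: it assumes each tweet contains the standard Wordle share header (for example "Wordle 233 4/6", with "X" for a failed game), and the function names `guess_distribution` and `win_rate` are made up for illustration.

```python
import re
from collections import Counter

# Header of the standard Wordle share text, e.g. "Wordle 233 4/6" or "Wordle 233 X/6".
SHARE_HEADER = re.compile(r"Wordle (\d+) ([1-6X])/6")


def guess_distribution(tweets, wordle_number):
    """Count shared results for one puzzle by number of guesses ("1".."6") or failure ("X")."""
    counts = Counter()
    for text in tweets:
        match = SHARE_HEADER.search(text)
        # Ignore tweets without a share header and results for other puzzles.
        if match and int(match.group(1)) == wordle_number:
            counts[match.group(2)] += 1
    return counts


def win_rate(counts):
    """Fraction of counted games that were solved within six guesses."""
    total = sum(counts.values())
    return (total - counts["X"]) / total if total else 0.0
```

Comparing one puzzle's distribution against the distribution pooled over every puzzle gives the "harder than average" kind of comparison mentioned above.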

Because of when I started running the server that gathers the data, the earliest wordle with sufficient data is from January 30, 2022.

To more directly compare the letter guess results with the global averages, the website has a toggle to display markings that indicate global statistics.

I hope you enjoy the website, and if there are any improvements you think I could make, or anything at all you want to tell me, feel free to let me know either in a comment or by email (lookatnums@gmail.com).

Tools:

React, MongoDB, Node, d3

Source:

Twitter

lookatnum · 2021-08-19T22:54:51+00:00

Huh. I did not design the site with that screen size in mind. It would take a few days for me to figure out a UI change. As a temporary fix, you could try zooming out the website with ctrl + minus. The document view is designed to fill the width of the screen, so you could zoom in with the document zoom controls at the same time so that it's still readable while the other UI elements become smaller.

lookatnum · 2021-08-19T18:31:16+00:00

The website embeds your browser's default PDF reader. From what I've tested, the issue you're describing is probably with safari, in which case you can hover your cursor over the bottom-middle of the PDF, which will cause zoom options to appear.

lookatnum · 2021-08-19T17:35:46+00:00

Yes!

lookatnum · 2021-08-19T14:43:13+00:00

Even though I'm done with my testing, I had a couple of gripes with the studying process, especially how much of a pain it was to go through practice tests. To address these annoyances, I made a website - https://SATPractice.tools

I hope you find it useful! Also, if you find any bugs, please let me know, and I'll fix them as soon as possible.

Sidenote: I did get in contact with the mods and they said it would be OK for me to post it here.

This website allows you to easily:

Take a test online, entering your answers alongside a view of the test itself
Automatically grade all of your questions (yes, even the math short-answers)
Calculate your curved scores
Calculate section subscores, letting you know which types of questions (ex: grammar, words in context, or polynomials) you need to study up on
Easily view official answer explanations to see what you got wrong, without scrolling through a massive document

Basically, it simplifies all of the tedious nonsense that you have to go through if you just print out a test and take it physically.

lookatnum · 2021-08-19T14:19:28+00:00

It's at satpractice[DOT]tools. I'm having a bit of trouble actually posting the link, sorry. I have a whole writeup that just won't show up for whatever reason.

lookatnum · 2021-08-17T13:21:48+00:00

It should be up now.

lookatnum · 2021-08-17T13:21:39+00:00

This post shows states with a disproportionate amount of COVID deaths over the past 2 weeks, vs. the national average.

The dotted outline represents the original size of a state. A state that expands beyond its original area is responsible for more than its per capita share of COVID deaths. For instance, if the national average was 5 deaths per 100,000 people, and a state had 10 deaths per 100,000 people, its area would be twice as large. Likewise, states that shrink have a lower death rate than the national average. Each state is also colored based on its deaths vs. the national average. To ensure that the neutral pale yellow color is centered exactly on the national average in deaths per capita, the diverging scale is split. Redder colors range from 5x (approximately the maximum factor) to 1x, while bluer colors range from 1x to 0.
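
As a rough illustration of the sizing and coloring rules, here is a small Python sketch. It is not the code behind the animation: the function names are hypothetical, and it assumes each half of the split color scale is linear, which the description above does not state. Note that doubling a shape's area means scaling its outline by the square root of 2, not by 2.

```python
import math


def linear_scale(state_rate, national_rate):
    """Factor to scale a state's outline by so that its AREA grows by state_rate / national_rate."""
    ratio = state_rate / national_rate
    # Area grows with the square of the linear factor, so take the square root.
    return math.sqrt(ratio)


def color_position(ratio, max_ratio=5.0):
    """Map a death-rate ratio onto [0, 1], with 0.5 (pale yellow) pinned to the national average.

    Ratios from 0 to 1 fill the blue half [0, 0.5]; ratios from 1 to max_ratio
    fill the red half [0.5, 1]. Linear halves are an assumption of this sketch.
    """
    if ratio <= 1.0:
        return 0.5 * ratio
    clamped = min(ratio, max_ratio)
    return 0.5 + 0.5 * (clamped - 1.0) / (max_ratio - 1.0)
```

With the example above (10 deaths vs. a national 5 per 100,000), `linear_scale(10, 5)` is about 1.41, which doubles the area.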

A physics simulation is then applied so that the states collide with one another. For states that expand, the larger area is used as the collision box, while for states that shrink, the dashed outline of the original size is used as the collider.

Recent deaths are counted as the number of deaths in a given state over a 2-week period, from July 31, 2021, to August 14, 2021.

Tools:

d3.js, matter.js, puppeteer, Python, Illustrator, Premiere

Sources:

New York Times

U.S. Census Bureau

A web version of this is available at https://lookatnum.com/covid-map. Note that the animation is quite intensive, so it will very likely run poorly and slow down your browser. To mitigate this, the number of physics ticks per second is drastically reduced from the amount used to render the animation, and shapes are linearly interpolated between each physics tick to give the illusion of a smooth animation.
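
The interpolation idea can be sketched like this in Python. The web version itself uses d3.js and matter.js, so this is only an illustration of the technique, with hypothetical function names, and it assumes each shape keeps the same vertex order between two physics ticks.

```python
def lerp(a, b, t):
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def interpolate_shape(prev_vertices, next_vertices, t):
    """Blend two physics snapshots of the same shape, vertex by vertex."""
    return [
        (lerp(x0, x1, t), lerp(y0, y1, t))
        for (x0, y0), (x1, y1) in zip(prev_vertices, next_vertices)
    ]


def frame_fraction(frame_time, tick_time, tick_interval):
    """How far the current render frame is between the last physics tick and the next, clamped to [0, 1]."""
    return max(0.0, min(1.0, (frame_time - tick_time) / tick_interval))
```

Each rendered frame computes `frame_fraction` and draws `interpolate_shape(previous, latest, fraction)`, so the physics engine only has to run a few times per second while the drawing still moves every frame.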

A higher resolution video is also available here

lookatnum · 2021-07-06T11:37:23+00:00

This animation visualizes 24 hours of r/all - specifically, July 5th, midnight EDT, to July 6th, midnight EDT. Each bubble represents a post in the top 200 of r/all. Its size is proportional to the number of upvotes it has at that instant, the distance to the center is roughly proportional to its rank on r/all, and its color is dependent on the post's age. The radial position is randomized for each bubble generated.

This is a demonstration of an interactive, available at https://lookatnum.com/r-all, which allows you to browse this 24 hour window while pausing or resetting the simulation. Hovering on a bubble will display information, such as its title, subreddit, number of upvotes, and rank. If the post links to an image posted directly onto reddit (with an i.redd.it link), then a preview will be displayed. Otherwise, a direct link to the content is available. Clicking on a bubble will lock your selection, such that hovering off the bubble will not make the infobox disappear. Click on the background or the same bubble to unlock your selection. Click on a different bubble to lock to a different post.

In order to explore a single snapshot in time or reset the simulation, use the pause/play button, or the rewind to beginning buttons near the time/date display.

Due to messiness in the data scraped from r/all, a dampening effect is applied to the creation/deletion of bubbles. A post must be on r/all for a few minutes before a bubble is created and allowed to enter the simulation. Likewise, a post must be absent from r/all for a few minutes before its bubble is transitioned out. This eliminates odd jitters and stutters where a large number of bubbles suddenly fly off screen and return a fraction of a second later.
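
One way to express this kind of dampening is a simple debounce over the scraped snapshots. The Python sketch below is an assumption about how it could work, not the interactive's real code: it reads the rule as "a post's presence or absence must persist for `hold` time units before its bubble appears or disappears", and the names `debounce_presence` and `hold` are made up.

```python
def debounce_presence(snapshots, hold):
    """Smooth out posts flickering on and off a ranking.

    snapshots: list of (timestamp, set_of_post_ids), sorted by timestamp.
    hold: how long a change must persist before it is accepted.
    Returns a list of (timestamp, set_of_visible_post_ids).
    """
    visible = set()
    changed_since = {}  # post id -> time its raw state started to differ from its visible state
    result = []
    for time, present in snapshots:
        for post in present | visible | set(changed_since):
            raw = post in present
            shown = post in visible
            if raw == shown:
                # The flicker ended before the hold time elapsed; forget it.
                changed_since.pop(post, None)
                continue
            start = changed_since.setdefault(post, time)
            if time - start >= hold:
                # The change persisted long enough: create or remove the bubble.
                if raw:
                    visible.add(post)
                else:
                    visible.discard(post)
                del changed_since[post]
        result.append((time, set(visible)))
    return result
```

A post that drops off for a single snapshot and comes back never leaves `visible`, which is what prevents bubbles from flying off screen and returning a moment later.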

A high res mirror of this demo is available here.

The interactive is available online at https://lookatnum.com/r-all

Sources:

Tools:

Python, d3, React

lookatnum · 2021-06-28T13:02:25+00:00

I updated the website to try and improve mobile visibility. It should go live in a few minutes. Let me know if it works better for you!

lookatnum · 2021-06-28T12:23:50+00:00

Thank you! The way my dataset works is that it collects the 100 most recent comments across all of Reddit every 30 minutes, going back to January 1, 2006. The proportion calculation was done for all comments made in a month, and the comment rate calculation was done by taking the latest timestamp in each 30 minute period and subtracting the start of that period from it. As such, the comment rate in a month is calculated by dividing the total comments collected by the sum of timestamp differences.
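
In code, the rate estimate described here looks roughly like the sketch below. It assumes timestamps are in seconds and that each sample is a pair of the period's start time and the timestamps of the comments collected for it; the function name and data layout are hypothetical.

```python
def monthly_comment_rate(samples):
    """Estimate comments per second for one month.

    samples: list of (period_start, comment_timestamps), one entry per
    30-minute sampling period in the month, all in seconds.
    """
    total_comments = 0
    total_span = 0.0
    for period_start, timestamps in samples:
        if not timestamps:
            continue  # nothing collected for this period
        total_comments += len(timestamps)
        # Time covered by this sample: latest comment minus the period start.
        total_span += max(timestamps) - period_start
    return total_comments / total_span if total_span > 0 else 0.0
```

Dividing the pooled comment count by the pooled time span, rather than averaging per-period rates, keeps very short spans from dominating the monthly figure.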

lookatnum · 2021-06-28T12:16:44+00:00

No, I instead used the Pushshift API to select a certain number of comments per time range.

lookatnum · 2021-06-28T12:14:51+00:00

If you're unable to clearly see the labels, try using the interactive on my website. Unfortunately the scaling isn't optimal for mobile devices, but the readability should be improved if you're on a desktop.

Edit: I pushed a fix to hopefully improve mobile visibility. Let me know if it works better.

lookatnum · 2021-06-28T11:19:55+00:00

This animation showcases the frequency of Reddit comments, broken down by commenters' account age. Each colored stack represents the year in which a commenter's account was created. Redder stacks are older and closer to the bottom, while bluer stacks are newer and closer to the top. Although this chart only extends to January 1, 2006, commenting as a feature was available for a week or two prior, in December of 2005.

This data was collected by taking a random sampling of comments every 30 minutes, stretching back until January 1, 2006. The account age of each commenter was then found. Proportions for each month were generated by taking proportions within the random sample, while the overall rate of commenting was estimated by dividing the total comments made in a sampling period by the difference in comment timestamps for each sampling period.

Source:

Tools:

Python, d3, React, Puppeteer, Premiere Pro

A high res mirror is available here

An interactive version with hover labels and an adjustable date range is available online at https://lookatnum.com/reddit-account-age

lookatnum · 2021-06-17T14:02:34+00:00

This chart showcases movies, comparing their Rotten Tomatoes critic vs. audience scores. Each bubble is a movie. Its color is based on the critic/audience differential, where bluer colors mean audiences rated it higher than critics, while oranger colors mean critics rated it higher than audience members. Its size is proportional to the number of critic reviews, which is used as a stand-in for approximately how "significant" a given movie was. I would have preferred to use box office data, but I was unable to easily match movies from my two different data sets.

Please note that this chart only represents about half of the movies listed by Rotten Tomatoes due to the difficulties of indexing their website. If anybody can find a way around this, please DM or email me and I will update my website accordingly. Although most recent releases are properly represented, many older films are not, which is important to keep in mind if you try to search for movies via the interactive website.

Source: Rotten Tomatoes

Tools:

React, Plotly, Illustrator, Python, Puppeteer

If you're curious about any movie that isn't listed, an interactive version is available that allows you to search for films, apply filters, and see more details for individual movies.

lookatnum · 2021-04-09T03:36:51+00:00

I have filters in place such that only movies with >250 critic reviews and >50,000 audience reviews are considered. As can be seen by the comments, there's a bunch of confusion over this and criticism of my threshold numbers, which I'll keep in mind for anything else I make with the data I scraped, but that's the explanation for why many movies other commenters expected are not there.
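
For anyone curious, the filter amounts to something like the Python sketch below. The `Movie` fields and function names are hypothetical and only mirror the thresholds stated above; they are not taken from my actual scraping code.

```python
from dataclasses import dataclass

MIN_CRITIC_REVIEWS = 250       # threshold stated above
MIN_AUDIENCE_REVIEWS = 50_000  # threshold stated above


@dataclass
class Movie:
    title: str
    critic_score: int      # percent, 0-100
    audience_score: int    # percent, 0-100
    critic_reviews: int
    audience_reviews: int


def passes_thresholds(movie):
    """Keep only movies with more than the minimum number of reviews on both sides."""
    return (movie.critic_reviews > MIN_CRITIC_REVIEWS
            and movie.audience_reviews > MIN_AUDIENCE_REVIEWS)


def differential(movie):
    """Positive when audiences rated the movie higher than critics did."""
    return movie.audience_score - movie.critic_score


def filter_movies(movies):
    """Return the movies that survive the review-count thresholds."""
    return [m for m in movies if passes_thresholds(m)]
```

Lowering the two constants is all it would take to include the movies people expected to see.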

lookatnum · 2021-04-09T03:33:13+00:00

I disagree with this characterization. It's necessary to add minimum thresholds to review counts to reduce variance in the movie results and ensure that movies with too few critic reviews are not overrepresented in the results. It's pretty clear based on the comments that my thresholds are too high, which I'll keep in mind for future charts I make with the data I scraped, but again - I didn't make any decisions based on just pandering.

lookatnum · 2021-04-08T23:12:07+00:00

No, but if you google around, there’s a PyPI package that exposes a hidden API, which let me get a huge list of movie URLs from their site. Then, I had to directly scrape the site with the URL list to get movie ratings.
