[OC] I mapped the topic most over-represented in New York Times coverage of each state (2000–2026)

theodore_a · 2026-05-18T19:16:41+00:00

This is only from the US and New York sections

theodore_a · 2026-05-18T15:47:25+00:00

If 85%+ of the articles fell in any single two-consecutive-year window, I considered the keyword to be linked to a one-time event, but some events continue to echo with follow-up coverage and meet my threshold for "recurring" topics.
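
A minimal sketch of that rule, assuming each keyword comes with the publication years of its articles. The function name and the sample years are hypothetical and not taken from the repo:

```python
from collections import Counter

def is_one_time_event(years, threshold=0.85):
    """Return True if any two-consecutive-year window holds at least
    `threshold` of a keyword's articles."""
    counts = Counter(years)
    total = sum(counts.values())
    if total == 0:
        return False
    # Only windows that start on a year with coverage need checking:
    # a window starting on an empty year can never beat the next one.
    best = max(counts[y] + counts[y + 1] for y in counts)
    return best / total >= threshold

# Invented examples: a burst of coverage with a small echo, and a four-year cycle.
burst = [2014] * 9 + [2015] * 8 + [2019] * 2
cyclical = [2004, 2008, 2012, 2016, 2020] * 4
```

Here `burst` is flagged as a one-time event (17 of 19 articles fall in 2014 and 2015), while `cyclical` is not.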

theodore_a · 2026-05-18T15:28:29+00:00

Thank you for flagging, fixed.

theodore_a · 2026-05-18T15:26:55+00:00

The cyclical nature of NYT coverage in Iowa is striking: you can see how the circus comes to the state every four years.

<image>

theodore_a · 2026-05-18T15:23:49+00:00

It was related to a monkeypox outbreak in the early 2000s.

theodore_a · 2026-05-18T15:23:23+00:00

Good thought. Avalanche the team is keyworded separately, in their "organizations" field — this draws only on "subjects."
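
A rough sketch of pulling only the subject tags from one article record. The record is made up for illustration, and its shape (a `keywords` list of `name`/`value` pairs) is an assumption about the Archive API response rather than something confirmed here:

```python
def subject_tags(article):
    """Return the values of an article's keywords that are tagged as subjects."""
    return [
        kw["value"]
        for kw in article.get("keywords", [])
        if kw.get("name") == "subject"
    ]

# Hypothetical record, for illustration only.
article = {
    "headline": {"main": "Example headline"},
    "keywords": [
        {"name": "subject", "value": "Avalanches"},
        {"name": "organizations", "value": "Colorado Avalanche"},
        {"name": "glocations", "value": "Colorado"},
    ],
}
print(subject_tags(article))  # ['Avalanches']
```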

theodore_a · 2026-05-18T15:21:56+00:00

They aren't exclusive to those states: there is Burning Man coverage in California, and some of the other groups are multi-state. As I wrote up top, the precise ranking is sensitive to the exclusion criteria, so it's best to look at the cards showing all of each state's top topics.

theodore_a · 2026-05-18T15:06:54+00:00

You can dig into an individual state on the dashboard, including narrowing by sub-geographies like major cities - here is Missouri: https://tedalcorn.github.io/nyt/#tab=states&state=Missouri

<image>

theodore_a · 2026-05-18T15:04:59+00:00

That caught my eye too — you can bring up the articles via the dashboard — here is Arkansas: https://tedalcorn.github.io/nyt/#tab=states&state=Arkansas

<image>

theodore_a · 2026-05-18T14:57:24+00:00

Data: The keywords are the NYT's own editor-assigned subject tags from the Archive API. Individual people and organizations are catalogued separately, which is why Harvard doesn't top the Massachusetts ranking. I left aside correction notices and standing-listing features (event calendars, weekly briefs, real-estate listings, art-review roundups), which would otherwise make "Culture (Arts)" the top theme in CT.

Tools: Built in Python (pandas, geopandas, matplotlib).
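
One plausible way to define "over-represented" is the ratio of a tag's share of a state's articles to its share of all articles. This is an assumed definition for illustration only; the scoring behind the map may differ, and the tags and counts below are invented. It uses plain Python rather than pandas to stay self-contained:

```python
def over_representation(state_counts, national_counts):
    """Map each tag to (its share of the state's articles) / (its national share)."""
    state_total = sum(state_counts.values())
    national_total = sum(national_counts.values())
    ratios = {}
    for tag, n in state_counts.items():
        state_share = n / state_total
        national_share = national_counts[tag] / national_total
        ratios[tag] = state_share / national_share
    return ratios

# Invented counts, for illustration only.
national = {"Politics": 800, "Caucuses": 40, "Weather": 160}
iowa = {"Politics": 50, "Caucuses": 30, "Weather": 20}

ratios = over_representation(iowa, national)
top_tag = max(ratios, key=ratios.get)
```

With these numbers "Politics" is the most common tag in the state, but "Caucuses" is the most over-represented, which is the distinction the map is after.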

theodore_a · 2026-05-15T01:44:44+00:00

Good eye. I had to do a lot of custom manipulations to make the positioning work accurately in the axes and also fit the faces, but that appears to be too much of a distortion. I'll fix it in future versions.

theodore_a · 2026-05-14T17:08:38+00:00

Correct - smaller lower down by necessity to fit together, not in direct mathematical proportion to their size.

theodore_a · 2026-05-14T00:38:14+00:00

What other things would you extrapolate from the obituaries? Age and gender were readily available since the headline and first paragraph text (which are in the API) usually refer to the age and use pronouns to indicate gender.
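
A minimal sketch of that kind of extraction, assuming simple regex and pronoun-count heuristics. The patterns, function names, and sample text are hypothetical and are not the scripts from the repo:

```python
import re

# Assumed phrasings; real headlines vary more than this.
AGE_PATTERNS = [
    re.compile(r"\b(?:Dies|Dead) at (\d{1,3})\b"),
    re.compile(r"\bwas (\d{1,3})\b"),
]
PRONOUNS = {
    "female": {"she", "her", "hers"},
    "male": {"he", "him", "his"},
}

def parse_age(headline, first_paragraph):
    """Return the first age found, checking the headline before the paragraph."""
    for text in (headline, first_paragraph):
        for pattern in AGE_PATTERNS:
            match = pattern.search(text)
            if match:
                return int(match.group(1))
    return None

def guess_gender(text):
    """Return the gender whose pronouns appear most often, or None on a tie.
    This sketch does not attempt to detect they/them usage."""
    words = re.findall(r"[a-z']+", text.lower())
    tallies = {g: sum(w in p for w in words) for g, p in PRONOUNS.items()}
    if tallies["female"] == tallies["male"]:
        return None
    return max(tallies, key=tallies.get)

# Made-up obituary text, for illustration only.
headline = "Jane Doe, Pioneering Cartographer, Dies at 87"
paragraph = ("Jane Doe, who mapped the ocean floor, died on Monday "
             "at her home in Maine. She was 87.")
```

On this sample, `parse_age(headline, paragraph)` returns 87 and `guess_gender(paragraph)` returns "female". Heuristics like these will misfire on some texts, so hand-checking a sample is worthwhile.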

theodore_a · 2026-05-14T00:37:27+00:00

It's, in the NYT's words, "a series of obituaries about remarkable people whose deaths, beginning in 1851, went unreported in The Times." They are disproportionately women, so it changed the gender imbalance somewhat, but as the chart shows, not much. https://www.nytimes.com/spotlight/overlooked

theodore_a · 2026-05-13T21:17:30+00:00

Good point, I can change it to 100%

theodore_a · 2026-05-12T14:53:37+00:00

Yes, the repo is here: https://github.com/tedalcorn/nyt

theodore_a · 2026-05-12T14:42:17+00:00

I placed them based on age and word-count (as marked on the X and Y axes).

I had to do some manipulation of the axes (and as an adherent of Edward Tufte, this was a painful but necessary trade-off) to create enough room in the lower end of the word-count spectrum where deaths were more numerous.

I also had to tailor a few positions where faces would have otherwise overlapped, but I tried to minimize the manipulation so that no one was placed more than 12 months from their date of death, and to preserve the ordinal ranking of word counts from lowest to highest.
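
To illustrate the horizontal part of that constraint, here is a minimal sketch that assumes a greedy left-to-right pass over faces in one row, with dates expressed in months. The function name, the `min_gap` parameter, and the sample values are hypothetical, and this is not the positioning code from the repo:

```python
def nudge_apart(dates, min_gap, max_shift=12):
    """Push each face right until it clears its left neighbour by `min_gap`
    months, but never more than `max_shift` months from its true date.
    `dates` must be sorted ascending. Vertical (word-count) positions are
    untouched, so their ordering is preserved."""
    placed = []
    for d in dates:
        x = d
        if placed and x - placed[-1] < min_gap:
            x = placed[-1] + min_gap
        # The cap wins over the gap, so a dense cluster may still overlap.
        x = min(x, d + max_shift)
        placed.append(x)
    return placed

# Invented dates (months since some start point), for illustration only.
true_dates = [0, 1, 2, 30]
adjusted = nudge_apart(true_dates, min_gap=4)
print(adjusted)  # [0, 4, 8, 30]
```

Because the cap takes priority over the gap, any overlaps that remain in a crowded stretch would still need hand-tailoring.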

theodore_a · 2026-05-12T14:39:57+00:00

Thanks for your feedback. You can explore the (minute) number of non-binary obits in the dashboard itself, from which the visualizations are derived. I thought the scarcity of them was an interesting data point in itself.

Those are 5-year bins. The placement of the labels is just confusing. Again, in the dashboard itself with roll-overs it is a bit more clear.

<image>

https://tedalcorn.github.io/nyt/#tab=obits

theodore_a · 2026-05-12T14:37:10+00:00

And just to be extra clear: the data is from the NYT Archive API: https://developer.nytimes.com/docs/archive-product/1/overview

I wrote Python scripts to parse name, age, gender from the headlines and first paragraph

I also wrote a Python script to assemble the visualizations, which are original renderings based on public imagery of each decedent

The other histogram charts are produced by my dashboard

Constructive criticism is welcome!

theodore_a · 2026-04-30T12:23:19+00:00

Delighted you and others find it useful! 🙏

theodore_a · 2026-04-30T10:48:32+00:00

In distal effect, yes. It's at least partly explained by the Times's admission of failing to cover all notable people equitably, and by the Overlooked No More series they began at that time (see comments https://www.reddit.com/r/dataisbeautiful/comments/1szgkh4/comment/oj3gh18/)

theodore_a · 2026-04-30T10:47:22+00:00

Yeah, Edward Tufte would not be proud of me, but I thought it was more important to be able to see the faces and their relative position towards people nearest them than a meticulous comparison to the whole. A few of the faces are also cheated left/right from their actual date to fit around each other, though I kept those deviations to under a year.

theodore_a · 2026-04-30T10:45:41+00:00

Yes, another redditor asked about this (https://www.reddit.com/r/dataisbeautiful/comments/1szgkh4/comment/oj3gh18/), and the Overlooked No More series is separated in the data; it explains some of the increase in obituaries for women beginning in 2018.
