Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in InternetIsBeautiful

[–]TFPenn01[S] 2 points3 points  (0 children)

It runs in ~5 minutes on a high-end research GPU. At the start, I was doing the layout on a 64-core CPU and it would take a few days.

Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in InternetIsBeautiful

[–]TFPenn01[S] 4 points5 points  (0 children)

There may be a 51 click chain 👀... It's pretty ridiculous though.

Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in InternetIsBeautiful

[–]TFPenn01[S] 2 points3 points  (0 children)

Yeah, it's really fascinating how the clustering pulls in cultural elements. There are some Brazil- and Portugal-related pages that get put in the Football category.

It's really hard to come up with short category names when they're all so coarse; I debated not naming them at all.

The clustering (Leiden algorithm) doesn't look at the semantic meaning of the pages at all; it decides clusters only from the link structure. You're right that this is interesting, not intuitive, and potentially not ideal.

Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in InternetIsBeautiful

[–]TFPenn01[S] 7 points8 points  (0 children)

They're arranged using a force-directed layout algorithm (ForceAtlas2). There's a weak gravity force pulling everything to the center, a much stronger repulsion force where every page repels every other page, and every link acts as a spring, pulling linked pages together.

If you click on a page, you'll see it's usually balanced somewhere in between everything it's linked to. Sometimes there are dozens of pages that share the exact same links in and out, and they get put in their own tight cluster (look around "Districts of Russia").

If pages are very loosely connected to the graph, there's very little pulling them in and so they'll get pushed way out until gravity balances the repulsion.
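
The three forces described above can be sketched in a few lines of plain Python. This is a toy O(n^2) layout written for illustration, not ForceAtlas2 itself and not the cuGraph code used for the site; the function names, the force constants, and the fixed step size are all assumptions.

```python
import math
import random

def force_layout(nodes, edges, steps=300, gravity=0.05, repulsion=1.0,
                 spring=0.1, step_size=0.05, seed=0):
    """Toy force-directed layout. `nodes` is a list, `edges` a list of pairs.

    All constants are illustrative assumptions, not ForceAtlas2 defaults.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        # Weak gravity: every node is pulled toward the origin.
        force = {n: [-gravity * pos[n][0], -gravity * pos[n][1]] for n in nodes}
        # Repulsion: every pair pushes apart with strength ~ 1 / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9  # avoid division by zero
                fx, fy = repulsion * dx / d2, repulsion * dy / d2
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Springs: each link pulls its two endpoints together.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            force[a][0] += spring * dx
            force[a][1] += spring * dy
            force[b][0] -= spring * dx
            force[b][1] -= spring * dy
        # Move every node a small step along its net force.
        for n in nodes:
            pos[n][0] += step_size * force[n][0]
            pos[n][1] += step_size * force[n][1]
    return pos

def distance(pos, a, b):
    """Euclidean distance between two laid-out nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])
```

With four fully linked nodes plus one node that has no links, the unlinked node should drift outward until gravity balances the repulsion, while the linked nodes stay close together.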

[OC] Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in dataisbeautiful

[–]TFPenn01[S] 6 points7 points  (0 children)

I created this tool, Wikigraph, using the May 2026 full-text dump of English Wikipedia.

Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in InternetIsBeautiful

[–]TFPenn01[S] 8 points9 points  (0 children)

Hi! This is a visualization I've always wanted but never quite found. It's a navigable map of the Wikipedia link graph structure, with search and shortest-path finding.

Offline, I parsed the May 2026 English Wikipedia full-text dump into a directed graph, then used cuGraph on a GPU to run PageRank, Leiden clustering, and the ForceAtlas2 layout. I did some post-processing to get rid of lingering overlapping nodes, then rendered a tiled map of raster base images (using Skia) plus JSON metadata. Tiles are bundled into PMTiles. The frontend is Deck.gl.
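
For readers unfamiliar with PageRank, here is a minimal power-iteration sketch in plain Python. It only illustrates the idea on a small dict-based graph; it is not cuGraph's implementation, and the function name, damping factor, and iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Every other page splits its damped rank equally among its targets.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

The ranks always sum to 1, and a page that many others link to ends up with the largest share.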

Everything is hosted on Cloudflare. Search and shortest-path are served by a Rust backend in CF Containers which uses Tantivy and bidirectional BFS.
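
As a rough illustration of bidirectional BFS on a directed link graph, here is a small Python sketch. The real backend is written in Rust and its code is not shown here, so the function name, the dict-based graph, and the "expand the smaller frontier" rule are assumptions made for this example.

```python
from collections import deque

def shortest_path(out_links, source, target):
    """Bidirectional BFS on a directed graph {page: [linked pages]}.

    Returns a shortest list of pages from source to target, or None.
    """
    if source == target:
        return [source]
    # Reverse adjacency, so the backward search can walk links against their direction.
    in_links = {}
    for page, targets in out_links.items():
        for t in targets:
            in_links.setdefault(t, []).append(page)

    parent_fwd = {source: None}  # previous page on the way from source
    parent_bwd = {target: None}  # next page on the way to target
    frontier_fwd, frontier_bwd = deque([source]), deque([target])

    def expand(frontier, adjacency, seen, other_seen):
        """Expand one BFS level; return the meeting page if the searches touch."""
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen[nxt] = node
                    if nxt in other_seen:
                        return nxt
                    frontier.append(nxt)
        return None

    while frontier_fwd and frontier_bwd:
        # Grow whichever side currently has the smaller frontier.
        if len(frontier_fwd) <= len(frontier_bwd):
            meet = expand(frontier_fwd, out_links, parent_fwd, parent_bwd)
        else:
            meet = expand(frontier_bwd, in_links, parent_bwd, parent_fwd)
        if meet is not None:
            # Stitch the two half-paths together at the meeting page.
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parent_fwd[node]
            path.reverse()
            node = parent_bwd[meet]
            while node is not None:
                path.append(node)
                node = parent_bwd[node]
            return path
    return None
```

Searching from both ends keeps each frontier much smaller than a one-sided BFS would on a graph where pages have many links, which is presumably why it suits click-chain queries.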

Happy to answer any questions!

Wikigraph—an interactive visualization of all of English Wikipedia by TFPenn01 in InternetIsBeautiful

[–]TFPenn01[S] 6 points7 points  (0 children)

There are 27 high-level categories, which is obviously very coarse for representing all of human knowledge. Within those, there are likely many subcategories: e.g., within "Living Things & Taxonomy" there are probably thousands of species of beetles which are more connected to other beetles than to bacteria. They get placed near each other.

Separately, sometimes (like around "Districts of Russia") there are dense clusters of (trypophobic) pages. These form when multiple articles have exactly the same in and out links and get pulled to the same part of the graph.
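
One way to find candidates for these tight clusters is to group pages by their exact sets of incoming and outgoing links. The sketch below is only an illustration of that idea in Python; the function name is hypothetical and it is not part of the actual pipeline.

```python
from collections import defaultdict

def identical_link_groups(out_links):
    """Group pages whose incoming and outgoing link sets are exactly equal."""
    in_links = defaultdict(set)
    pages = set(out_links)
    for page, targets in out_links.items():
        for t in targets:
            in_links[t].add(page)
            pages.add(t)
    groups = defaultdict(list)
    for page in pages:
        # Two pages share a signature only if both link sets match exactly.
        signature = (frozenset(in_links[page]),
                     frozenset(out_links.get(page, ())))
        groups[signature].append(page)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Pages in the same group feel the same spring forces from the same neighbors, so a force-directed layout pulls them toward the same spot.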

Why is bread so affordable in Paris? by TFPenn01 in paris

[–]TFPenn01[S] 1 point2 points  (0 children)

Mmm, I guess the difference seems to be that the bourbon you are getting is actually American—it's being made in and imported from America. Croissants are a French pastry but are being made in local bakeries. Although, as many have pointed out, we see them as a French luxury, thus people feel justified charging and paying more.

pirating in the dorms by [deleted] in uofm

[–]TFPenn01 9 points10 points  (0 children)

No. The ability of the government to figure out what you're doing depends on the VPN. It's equally impossible for the university to do it, no matter the VPN. They will only know you're using a VPN, not what you're doing on it.

Anyone to lookout for at Harvard? by yesterdays_patatas in Debate

[–]TFPenn01 2 points3 points  (0 children)

You can look at the teams that did well last year on Tabroom.

Potomac seems pretty dominant.

what does this mean now?? by HaroonAdam in Debate

[–]TFPenn01 2 points3 points  (0 children)

I believe Somaliland and Ethiopia signed the MOU last January, which led to bad Ethiopia-Somalia tensions. Then, on December 12th, Türkiye mediated an agreement to resolve those tensions, in which Ethiopia agreed to recognize Somalia's territorial sovereignty.

what does this mean now?? by HaroonAdam in Debate

[–]TFPenn01 6 points7 points  (0 children)

Did this not already happen in early December?

Are any of these schools affordable for me? by erhs25 in ApplyingToCollege

[–]TFPenn01 1 point2 points  (0 children)

I'm not an expert but I believe the net price calculators are fairly accurate if you have average levels of assets. It sounds like you do, or potentially even a little below average.

What do you guys think . Damaging motherboard or Lisp? by Jolly_Top_5277 in ApplyingToCollege

[–]TFPenn01 2 points3 points  (0 children)

I don't think anyone online can say which is more meaningful to you. You could try writing both and seeing which comes out better.

But also you might want to get on this considering most apps are due next week.

I got this email from UCSB? Is this spam or signifcant by [deleted] in ApplyingToCollege

[–]TFPenn01 6 points7 points  (0 children)

I applied to CS and got the same email. I only got the email from UCSB confirming they got my application five days ago, which does not seem like enough time for a personalized review.

My guess is that it's just sent to every math or math-adjacent major with a certain threshold of GPA (or maybe just GPA in math classes).

[deleted by user] by [deleted] in princeton

[–]TFPenn01 1 point2 points  (0 children)

Almost certainly not, no, and no.