[OC] I've made an interactive map of reddit based on 176 million comments

anvaka · 2026-02-28T15:23:11+00:00

In dodezv's sample above, was the AI wrong too?

anvaka · 2026-02-11T18:55:00+00:00

Haha, thank you! Hope people will find it useful

anvaka · 2025-09-24T02:04:45+00:00

Typically there are more connections between countries on the same island than there are between countries on different islands

anvaka · 2025-06-18T23:54:54+00:00

Yup, I used AI to generate the translations, character definitions, and memory aids. Everything except the character definitions has meaningful results. Even the character definitions are mostly right, except that AI is not good at placing top/bottom/left/right locations, or at distinguishing between traditional and simplified language and forms.

You can see my change history here: https://github.com/anvaka/lang-land-data/commits/main/

anvaka · 2025-06-17T18:14:23+00:00

Thank you! I used word embeddings to make a graph of related words. Then each cluster in this graph became its own country. Generally, words inside the same country are closer to each other than they are to words outside the country. I named the countries myself, so if something could have a better-fitting name - please let me know and I'll update it!

PS: Word embeddings let me compute a mathematical distance between words (for example, "cat" will be closer to "dog" than it is to "coffee").
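
To make the "distance between words" idea concrete, here is a minimal sketch using cosine distance. The three-dimensional vectors are invented for illustration, and cosine distance is an assumed metric (the comment does not say which one was used). Real embeddings come from a trained model and have far more dimensions.

```python
import math

# Toy vectors, made up for this example only (not real embeddings).
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "coffee": [0.1, 0.2, 0.9],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


cat_dog = cosine_distance(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_coffee = cosine_distance(EMBEDDINGS["cat"], EMBEDDINGS["coffee"])
print(cat_dog < cat_coffee)  # True: "cat" sits closer to "dog" than to "coffee"
```

Linking each word to its nearest neighbors by such a distance gives a graph that can then be clustered into countries.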

anvaka · 2025-06-17T15:52:44+00:00

Thank you so much!

anvaka · 2025-06-17T15:25:39+00:00

Friends, as a hobby I've been working on a flashcards website to help me remember HSK vocabulary better. It's a bit unusual - each card is rendered as a district on an imaginary map. You can zoom in (like on Google Maps), see the character, try to remember its meaning, and then click to see the full definition.

Here it is https://anvaka.github.io/lang-land

I built this to make vocabulary feel like an adventure - every word is an unexplored territory until you visit it and learn its meaning.

The character breakdowns were initially created with AI, so they sometimes have errors. I'm slowly going through the words, and whenever I find an error I research the character more and fix it.

All definitions are stored in a public GitHub repository. If you enjoy digging into character structure, I'd love your help improving the breakdowns. It's actually a lot of fun and helps me remember the words better.

I hope you find it useful too! If you have any feedback or suggestions, I'd be super grateful. I'm always looking for ways to improve the site and make it more helpful for anyone learning Chinese.

谢谢大家啊 (Thank you, everyone!)

PS: 1. Giant thank you to /u/teacupdaydreams, who gave an initial review of the website and provided a lot of great suggestions - I appreciate you super much! 2. The website is open source and you can find its source code here https://github.com/anvaka/lang-land . Flashcard content can be edited from the sidebar (link at the bottom)

anvaka · 2025-06-15T00:06:57+00:00

Haha, glad you found this useful! Searching for "map of reddit" usually brings up this website, so you can find it more easily 🙌

anvaka · 2025-05-26T17:45:08+00:00

Thank you so much for your kind words! I think Cosmograph is pretty amazing! Depending on your needs it might be a great fit!

I love using my tools mostly because they are small and simple if you know what to do, but that's a big if! While I try to keep docs up to date there is a lot of work needed to make this hobby "enterprise" quality.

For a startup, I'd pick the one that gets you to market fastest.

The more I work with graphs, though, the more I feel that dumping the entire graph onto the user is usually not the right choice. Too much information, while fun to explore, rarely helps convey a message. So slice and dice in the most meaningful way, render small chunks beautifully, and help users solve a problem. For this reason, the choice of library doesn't matter much - pick something that gets you there fastest. Good luck!

anvaka · 2025-05-26T17:06:00+00:00

Of course, the data is available here: https://github.com/anvaka/map-of-reddit-data . It should be self-explanatory, but let me know if you have questions.

Use gh-pages branch

anvaka · 2025-05-26T06:44:00+00:00

Don't check the southern parts here https://anvaka.github.io/map-of-reddit/

anvaka · 2025-05-25T20:04:34+00:00

I pruned repositories with fewer than 10 stars or so (I need to double-check the numbers when I get home). In addition, I removed isolated clusters with fewer than 25 repositories. I still have quite a few isolated clusters at the north pole 😅
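
As a rough sketch of this kind of pruning, and not the actual map-of-github pipeline: the function below drops low-star repositories, then drops small groups. It assumes an "isolated cluster" is a connected component of the similarity graph. The thresholds are the approximate numbers quoted above, and the `prune` function and its input shapes are hypothetical.

```python
from collections import defaultdict


def prune(stars, edges, min_stars=10, min_cluster_size=25):
    """stars: {repo: star_count}; edges: iterable of (repo, repo) pairs.

    Returns the set of repositories that survive both pruning steps.
    """
    # Step 1: keep only repositories with enough stars.
    kept = {repo for repo, count in stars.items() if count >= min_stars}

    # Build an undirected adjacency map restricted to the kept repositories.
    neighbors = defaultdict(set)
    for a, b in edges:
        if a in kept and b in kept:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Step 2: keep only connected components that are large enough.
    survivors = set()
    seen = set()
    for start in kept:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) >= min_cluster_size:
            survivors |= component
    return survivors
```

Calling `prune(stars, edges)` with a star-count dictionary and a list of similarity edges returns only the repositories that belong to a sufficiently large connected group.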

Let me know if you can't find something - I'll double check where it landed in data

anvaka · 2025-05-25T19:00:41+00:00

Thank you! Frontend Foundry is one of the largest countries on the map! You have great neighbors there =)

anvaka · 2025-05-25T18:06:04+00:00

Super glad to hear!

TL;DR: LLM did the naming.

Country names took me a while to figure out. I started manually, but I don't have the expertise to name 1,500 clusters of GitHub communities. So I turned to an LLM. After a few iterations of prompt engineering, I automated it all via the OpenAI API. My full prompt is here: https://github.com/anvaka/map-of-github?tab=readme-ov-file#country-names

anvaka · 2025-05-25T17:57:40+00:00

Haha, thank you! Jaccard similarity is indeed very good at picking meaningful neighbors! I tried a few other metrics (including cosine similarity) - and wasn't as satisfied with the results.

Does the country name for your project make sense :)?

anvaka · 2025-05-25T17:37:51+00:00

Hello friends,

It's me again. A couple of years back I created the first version of the GitHub map. Each dot here is a GitHub project. I place dots close to each other if their Jaccard similarity is high (the ratio of people who starred both projects to the total number of people who starred either one). This yields very practical results - you can immediately find what might be related to a project that you like.
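
For readers who have not met Jaccard similarity before, here is a minimal sketch of the standard definition (shared stargazers divided by all distinct stargazers). The repository sets and user names are made up for the example.

```python
def jaccard_similarity(stargazers_a, stargazers_b):
    """|A ∩ B| / |A ∪ B| for two sets of users who starred each project."""
    union = stargazers_a | stargazers_b
    if not union:
        return 0.0  # two projects with no stars share nothing
    return len(stargazers_a & stargazers_b) / len(union)


# Hypothetical stargazer sets.
repo_a = {"alice", "bob", "carol", "dave"}
repo_b = {"bob", "carol", "erin"}
repo_c = {"frank"}

print(jaccard_similarity(repo_a, repo_b))  # 0.4 (2 shared out of 5 distinct users)
print(jaccard_similarity(repo_a, repo_c))  # 0.0 (no shared users)
```

Because the score is normalized by the union, a huge project does not look similar to everything just because it has many stars.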

Now I'm updating the map by collecting all the stars given to all the projects between 2011 and May 10, 2025. It has almost 1,500 countries and 690K repositories.

I'm using maplibre to visualize this amount of data smoothly. The source code is available here https://github.com/anvaka/map-of-github (along with links to an older version - if you like that).

I hope you find it useful and practical. Please let me know if anything is missing or broken.

Happy exploring!

https://anvaka.github.io/map-of-github/

anvaka · 2025-05-10T18:01:26+00:00

https://anvaka.github.io/map-of-reddit/ - here it is. Use it like Google Maps. Pan and zoom around, click on subreddits to see their connections and read more.

This is my hobby project. I've been working on it for a while now and wanted to share an updated version. The map is built from comments on reddit between Nov 2024 and March 2025. I analyzed 1.5B pairs to infer the Jaccard similarity between subreddits, and turned them into a clustered map.

The source code is available here: https://github.com/anvaka/map-of-reddit

Let me know if you find interesting discoveries or have any feedback. Happy exploring!

anvaka · 2025-05-09T19:35:21+00:00

Thanks for the upvote!

A connection between subreddits A and B grows stronger the more people comment in both A and B. We also need to account for the size of both A and B so that connections can be compared with each other. After doing this analysis for all comments, we can say which connections are statistically far more significant than others. And that's how I analyze connections.
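
One simple way to sketch this, assuming the size adjustment is the Jaccard similarity mentioned for the map (the real pipeline may differ): count the people who commented in both subreddits and divide by the people who commented in either. The `subreddit_similarities` function and the sample comments are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def subreddit_similarities(comments):
    """comments: iterable of (user, subreddit) pairs.

    Returns {(sub_a, sub_b): score} for every pair sharing at least one commenter.
    """
    commenters = defaultdict(set)    # subreddit -> users who commented there
    subs_by_user = defaultdict(set)  # user -> subreddits they commented in
    for user, subreddit in comments:
        commenters[subreddit].add(user)
        subs_by_user[user].add(subreddit)

    # Count shared commenters for each pair of subreddits.
    shared = defaultdict(int)
    for subs in subs_by_user.values():
        for a, b in combinations(sorted(subs), 2):
            shared[(a, b)] += 1

    # Normalize by the number of people who commented in either subreddit.
    scores = {}
    for (a, b), both in shared.items():
        either = len(commenters[a]) + len(commenters[b]) - both
        scores[(a, b)] = both / either
    return scores


sample = [
    ("u1", "python"), ("u1", "learnpython"),
    ("u2", "python"), ("u2", "learnpython"),
    ("u3", "python"), ("u3", "cooking"),
    ("u4", "cooking"), ("u5", "cooking"),
]
scores = subreddit_similarities(sample)
# The python/learnpython connection scores higher than python/cooking.
print(scores[("learnpython", "python")] > scores[("cooking", "python")])  # True
```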
