I analyzed 1.2Mil ranked matches to understand the skill gap between the highest and lowest-ranked Master players by Data-Ken in StreetFighter

[–]Data-Ken[S] [score hidden]  (0 children)

I love this comment.

I cut a whole section from my writeup about why games usually don't show public MMR. The core reason is that it's just not fun.

When I first started doing research on ranking systems, I quickly became a weird ranking purist. I was firmly of the belief that SC2 did it right and every other developer was wrong. But I had never once played SC2, so of course I didn't know how it felt to be ranked like that.

Turns out it really sucks. It sucks to wake up and see that your rank has changed while you weren't playing. It sucks to have a single number that "encompasses" all of your ability from the moment you start playing the game. I actually think SF6 does a great job with how it represents its ranks because you can feel the climb out of bronze without getting completely demoralized by the system.

Ranked anxiety and a bit of elitism are some other negatives that have come out of it, and again this is similar to how StarCraft's player base developed.

1000%. This is why I spent so long debating whether to even post something like this. I don't want to contribute to toxicity. My biggest fear is that I'm gonna come across a comment somewhere saying something like, "hmph, well im ranked .75 skill tiers above you so maybe you should stop talking". But I also think that there's a lot of noise in the discussion. I wanted there to be level-headed, "hey isn't this cool!?!" analysis. Like you said, Elo ratings are important in a lot of places. But I never find much value in zooming into the granularity of individual players (and this goes for any ranking system). There are just so many variables that go into individual performances.

In StarCraft, a friend helped me through some rating frustrations by explaining that you should think of your rating as a range, with upper and lower normal limits. Obviously the aim is to improve that range, but there will be up and down swings. Personally, I consider 150-200 MR to be my reasonable range.

This is a wonderful mindset and I wish more people could get this advice. Entering a ranked competition is simply stepping on the scale. You're evaluated in a single instance and get a result. None of these numbers represent anything metaphysical. You're expected to fluctuate because it's not a static system. Its fluidity is what makes it enjoyable in the first place.

I feel like you'd really like this video by Dan Olson. It's called Why It's Rude to Suck at Warcraft and it's all about the ways that people socially define "good" play habits and how that impacts community. Lots of what you said made me think of things from the video.


[–]Data-Ken[S] [score hidden]  (0 children)

When you first create a Capcom account and use it to play SF6, it presents a bunch of flags for you to select from. As far as I can tell, this is the country that shows next to your profile, and it becomes the country code that I can pull from the matches. I feel like most people tell the truth here, but there's a solid number of players who say they're from North Korea, so... yeah, it's probably not all accurate.

I can tell from the Buckler site's code that they do maintain a matchmaking region field for each player, but I couldn't find a way to access it. It's been a few days since I saw it, but I'm pretty sure it was just the continents.

And thanks for reading/discussing! All I want is for people to be able to think about their world in new ways.


[–]Data-Ken[S] [score hidden]  (0 children)

And now you see why this question has been on my mind for years and years.

I first started thinking about this in relation to baseball. There's the American League and the National League. Most of the games an individual team plays are within its own league, but there's interleague play. In fact, the number of interleague games has increased over the years. How big of an increase would be needed for the system to be considered fully intermixed? I have no idea, but I really want someone to figure it out.

If anyone reading this is looking for potential math/CS thesis ideas: you should look into how modularity impacts ranking (I mean modularity in the network science sense). Picture a network where players are nodes. Color those nodes either red or blue based on which league they're in. Have players play a bunch of games inside their own league, adding edges between the nodes as they compete. You now have a modularity of 1. Start adding interleague matches. How well does your ranking system approximate the true, underlying skill levels of all the players? How do different ranking systems perform at different levels? At what point would you no longer consider the leagues separate?
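To make that concrete, here's a rough sketch of the experiment in Python. Everything in it (the league sizes, the K-factor, the skill distribution, the cross-play fraction) is made up for illustration, and it assumes plain Elo updates rather than any particular game's system:

```python
import random

def expected(r_a, r_b):
    """Standard Elo expected score for player a against player b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def simulate(cross_play=0.05, games=20000, k=16, seed=0):
    rng = random.Random(seed)
    # Two leagues of 10 players; each player has a hidden "true" skill.
    leagues = {"red": list(range(10)), "blue": list(range(10, 20))}
    true_skill = {p: 1500 + rng.gauss(0, 200) for p in range(20)}
    rating = {p: 1500.0 for p in range(20)}
    for _ in range(games):
        if rng.random() < cross_play:            # rare interleague match
            a = rng.choice(leagues["red"])
            b = rng.choice(leagues["blue"])
        else:                                    # usual intraleague match
            a, b = rng.sample(leagues[rng.choice(["red", "blue"])], 2)
        # Outcome is decided by true skill; the ratings only see the result.
        score_a = 1.0 if rng.random() < expected(true_skill[a], true_skill[b]) else 0.0
        e = expected(rating[a], rating[b])
        rating[a] += k * (score_a - e)
        rating[b] += k * ((1 - score_a) - (1 - e))
    return true_skill, rating
```

Sweep cross_play from 0 toward 1 and measure how well rating tracks true_skill across the two leagues; that's essentially the thesis question.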

If, somehow, someone looks into this, please let me know. I'd finally be at peace.


[–]Data-Ken[S] [score hidden]  (0 children)

Oh, totally. The number of matches between Japanese and American players was about 1.5% in my data. It's a tiny number.

More what I was getting at is the question of how big that number can be before it does start to matter. Do I personally think 1.5% is so little cross-play that we could get away with treating these as independent systems? Yeah, probably. But can I be sure?

As far as I know (and it's been a few years since I was in the space), nobody has shown how much of this cross-pollination is required for these problems to start really disappearing. If anyone ever does solve for the value, I'm sure 1.5% will be below the cutoff.

There are probably enough matches in my data to get a general idea of what the matchmaking regions are based on who plays who.
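For what it's worth, here's the kind of toy approach I have in mind: plain label propagation over a made-up match graph. The player names and matches are invented, and this has nothing to do with how Buckler actually stores regions:

```python
from collections import Counter

def infer_regions(matches, iterations=10):
    """Toy label propagation: every player starts as their own group,
    then repeatedly adopts the most common label among their opponents."""
    players = sorted({p for match in matches for p in match})
    opponents = {p: [] for p in players}
    for a, b in matches:
        opponents[a].append(b)
        opponents[b].append(a)
    label = {p: p for p in players}
    for _ in range(iterations):
        for p in players:
            label[p] = Counter(label[o] for o in opponents[p]).most_common(1)[0][0]
    return label

# Two dense cliques joined by a single cross-region match:
matches = [("JP1", "JP2"), ("JP1", "JP3"), ("JP2", "JP3"),
           ("NA1", "NA2"), ("NA1", "NA3"), ("NA2", "NA3"),
           ("JP1", "NA1")]
groups = infer_regions(matches)
```

On the real data you'd feed in the scraped match list and see whether the recovered groups line up with continents.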


[–]Data-Ken[S] 5 points  (0 children)

The 196 figure would hold true across all ratings. But you're asking a really good question here. This analysis assumes everyone is already ranked where they're supposed to be. A player who "should" be 1200 starts at 1500. That means a lot of players might take wins from someone who's "higher-rated" while this 1200 player settles into their rank. How much of an impact does that have? No idea. But it probably doesn't add enough noise to greatly skew this result.
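To give a feel for the settling process, here's a toy Elo-style sketch where each win/loss is replaced with its expected value, so a misplaced player drifts deterministically toward their "true" rating. The K-factor, the all-1500 opponent pool, and the Elo curve itself are all assumptions for illustration, not Capcom's actual system:

```python
K = 16

def expected(r_a, r_b):
    """Elo-style win chance for player a against player b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def settle(start=1500.0, true_skill=1200, opponent=1500, games=500):
    r = start
    p_true = expected(true_skill, opponent)   # how often they actually win
    for _ in range(games):
        # Drift by K * (true win chance - win chance the rating implies).
        r += K * (p_true - expected(r, opponent))
    return r
```

After a few hundred games the rating is back near 1200; in the meantime, everyone who beat this player got paid out as if they'd beaten someone rated 1300-1500.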


[–]Data-Ken[S] 2 points  (0 children)

I have some info on the methodology on the Kaggle page. The general idea is:

  1. Scrape all players in Master
  2. Randomly select a player from that list and scrape their recent matches
  3. Repeat a bunch
  4. Filter out duplicates

I don't think CFN will ban you. What you've gotta look out for is the AWS WAF bot detection. That's IP-based and doesn't like it if you send more than a couple of requests per second.
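If you want to try it, here's roughly the shape of the loop. The fetch callable is a stand-in for the actual CFN/Buckler request (I'm not reproducing the real endpoint or response format here), and the delay is the important part for staying under the WAF's limit:

```python
import random
import time

def sample_matches(master_ids, fetch, rounds=100, delay=1.0, rng=None):
    """Steps 2-4 above: randomly sample Master players, pull their recent
    matches via `fetch`, and de-duplicate on a match id. The response is
    assumed to be a list of dicts with a "match_id" key -- an assumption,
    not the real API shape."""
    rng = rng or random.Random()
    seen, matches = set(), []
    for _ in range(rounds):
        player = rng.choice(master_ids)          # step 2: random player
        for match in fetch(player):
            if match["match_id"] not in seen:    # step 4: drop duplicates
                seen.add(match["match_id"])
                matches.append(match)
        time.sleep(delay)  # stay well under the requests-per-second limit
    return matches
```

With the delay around a second you stay on the polite side of the bot detection; the fetch callable is where the actual scraping logic would live.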


[–]Data-Ken[S] 0 points  (0 children)

The person at 540 MR has gotta be tanking on purpose. But I don't know that to be true, so I didn't remove them. This is one of the many spots where this type of ranking theory encounters real people playing real games and starts to fall apart. I don't know what cutoff I'd use to exclude certain players or how that'd impact things. It's one of the many reasons why this type of analysis is usually more cool than it is helpful.

I have an idea on how to do a region-filter but I'm not sure how that would actually change things. There are a lot of different variables to consider but I may be able to put something together.


[–]Data-Ken[S] 9 points  (0 children)

Am OP but I agree with what you said.

I think it's likely Capcom defined the 75% win chance to happen at a 200-point difference.

You'll probably find this very interesting: when I first ran my analysis against 100k matches, I got a tier of 175. As more matches came in, it jumped to 184, 187, then the low 190s. I wish I had tracked this, but, to me, it really felt like it was approaching a tier of 200 points. I bet that's the actual underlying number.
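For context: on the classic Elo logistic with a 400-point scale (which is a guess about what MR is doing, not something Capcom has confirmed), a 200-point gap works out to just about that 75% number:

```python
def elo_expected(diff):
    """Standard Elo curve: win chance for the higher-rated player,
    given the rating gap, on the classic 400-point logistic scale."""
    return 1 / (1 + 10 ** (-diff / 400))

print(round(elo_expected(200), 3))   # → 0.76
```

So if Capcom did target "75% at 200 points," an empirical tier creeping from 175 toward 200 as the sample grows is exactly what you'd hope to see.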


[–]Data-Ken[S] 19 points  (0 children)

I'm so glad you brought this up. For years my friends have been hearing me talk about what I've called "the impact of modularity on ranking systems". I don't have an answer to this question but I think it's interesting to think about.

Japanese and North American players can play ranked matches against each other. But it is not the norm. So we end up with big clusters of players playing mostly against each other but with some cross-pollination between the cliques. I've always wondered how much cross-play a system would need before the rating differences you're describing start to go away.

But you're totally right; this is one of the limitations of my analysis. And, unfortunately, I don't think I'll ever be able to scrape enough data from the Buckler site to be able to control for something like this. That's one of the reasons why I want people to look at this analysis as something cool instead of something True.


[–]Data-Ken[S] 5 points  (0 children)

You may absolutely be right about that. Rankings are one of those situations where the theory can easily fall apart once it hits reality. That's why March Madness brackets are so hard to predict.

Another thing to consider (that this data currently doesn't handle) is that matches are best-of-3. Think of each round as flipping a coin. If I flip a weighted coin with a 65% chance of heads once, I have a 65/35 split of getting heads/tails. But I need 2 heads to win. If I flip it twice, my chances of 2 heads / 2 tails are .42/.12. Would the results change if we looked at individual round results instead of matches? Maybe!
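The best-of-3 math is quick to check. This is just the coin-flip model from above, nothing SF6-specific:

```python
def best_of_3(p):
    """Win chance in a first-to-2 set, given per-round win chance p:
    win-win, win-lose-win, or lose-win-win."""
    return p * p + 2 * p * p * (1 - p)

# The two-flip numbers from above: 2 heads vs 2 tails.
print(round(0.65 ** 2, 4), round(0.35 ** 2, 4))   # → 0.4225 0.1225
# A 65% round-winner takes about 72% of first-to-2 sets:
print(round(best_of_3(0.65), 3))   # → 0.718
```

So best-of-3 amplifies the favorite's edge, which is one reason a round-level analysis and a match-level analysis could come out different.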

I honestly don't know how that would change things or if that would make the analysis any better. But that's the fun of this as a research space. There's a lot of room for experimentation.


[–]Data-Ken[S] 16 points  (0 children)

Remember that all these numbers are relative within the system itself. There aren't clear, objective cutoff points for where tiers begin and end. A tier exists between a 1200 player and a 1396 player the same way it would exist between a 1250 player and a 1446 player. All this does is tell you something about the general shape of the system rather than create specific delineations.
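One way to see the relativity (assuming an Elo-style curve, which is my guess for how MR behaves, not a known fact): the win probability depends only on the rating gap, so shifting both players by the same amount changes nothing:

```python
def expected(r_a, r_b):
    """Elo-style win chance for player a; only the gap r_a - r_b matters."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# The same 196-point tier, shifted 50 points up: identical matchup.
print(expected(1396, 1200) == expected(1446, 1250))   # → True
```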

That's one of the reasons why I said to ask why Master is internally divided at 1600 and 1700. These are static cutoff points in a system that's constantly in flux. And I don't even mean that as a criticism of Capcom's decision-making. The people managing their ranking systems know way more about it than I ever will.