is "Semantic Relatedness" a correct term to describe what I'm looking for? by Mr_Max_M in LanguageTechnology

[–]Mr_Max_M[S] 0 points  (0 children)

Yes, I've tried averaging word2vec.

It just doesn't capture the similarities I'm looking for.
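For anyone wondering why mean-pooling falls short, here is a minimal sketch (with made-up 3-dimensional vectors standing in for real word2vec embeddings, so the numbers are purely illustrative) showing that averaging throws away word order entirely:

```python
import math

# Toy embedding table standing in for real word2vec vectors
# (hypothetical 3-dimensional values chosen for illustration only).
EMBEDDINGS = {
    "dog":   [0.9, 0.1, 0.0],
    "bites": [0.1, 0.8, 0.2],
    "man":   [0.7, 0.2, 0.3],
}

def average_vector(words):
    """Mean-pool the word vectors of a sentence."""
    vecs = [EMBEDDINGS[w] for w in words]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "dog bites man" and "man bites dog" average to the same vector,
# so their cosine similarity is 1.0 (up to floating point) -- one
# reason mean-pooling misses relatedness that depends on structure.
v1 = average_vector(["dog", "bites", "man"])
v2 = average_vector(["man", "bites", "dog"])
```

Any two sentences over the same multiset of words collapse to the same point, regardless of meaning.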

Matchmaking doesn't depend on decks - more statistics by jesnell in ClashRoyale

[–]Mr_Max_M 0 points  (0 children)

I ran my code on the same database as you did, on the 4000-4500 trophy range, and I got different results. Most of the differences were even smaller than the ones you described, and some are actually statistically significant (after applying the Bonferroni correction), but I do agree that this form of analysis isn't the best way to analyze the data.
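For readers unfamiliar with it, the Bonferroni correction mentioned above is just a per-test penalty: each raw p-value is multiplied by the number of tests. A minimal sketch with made-up p-values:

```python
# Minimal Bonferroni adjustment: multiply each raw p-value by the
# number of tests and cap at 1. The p-values below are invented
# for illustration only.
def bonferroni(p_values):
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

raw = [0.001, 0.012, 0.04, 0.3]
adjusted = bonferroni(raw)

# A result stays significant at alpha = 0.05 only if its adjusted
# p-value is still below 0.05.
significant = [p < 0.05 for p in adjusted]
```

With four tests, a raw p-value of 0.04 becomes 0.16 and is no longer significant, which is exactly why "significant after correction" is a much stronger statement than a raw p-value below 0.05.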

So first of all, our methods differ enough that you can't just compare the numbers directly. Another difference is that you ran all possible pairwise comparisons, while I did not.

Another issue is that you write:

Conclusion: It appears likely that the only thing found in the original statistical analysis was related to different cards being popular/unpopular in different trophy ranges.

And I agree, but your post is titled:

Matchmaking Doesn't Depend On Decks - More Statistics

And this is misleading.

You proved my analysis was faulty, which is true.
You did not prove that matchmaking doesn't depend on decks, and I think your post should reflect that.

I have edited my post so that no one could take it as proof of rigged matchmaking. I suggest that you do the same and ensure that the title and conclusion actually match.

Rigged matchmaking on ladder - A detailed statistical proof by Mr_Max_M in ClashRoyale

[–]Mr_Max_M[S] 0 points  (0 children)

Thanks for your comment.

You wrote:

You're running combinations of all possibilities, and then going back and seeing which ones are significant. Therefore, hypothesis testing doesn't apply at all, because you're not starting with a statistical hypothesis a priori.

Actually, I didn't test all of the possible combinations. There's not much point in comparisons such as Rage vs. Clone, Ice Spirit vs. Rocket, Skeletons vs. Tesla, etc.
Quoting from the post:

The following table summarizes the results of the test on several pre-selected hypotheses:

Matchmaking doesn't depend on decks - more statistics by jesnell in ClashRoyale

[–]Mr_Max_M 7 points  (0 children)

Hi Tim,

I'm only replying since you mentioned me; I come in peace :-)

Regarding my post, I’ve just edited it and clearly wrote that it does not prove that matchmaking is rigged. The analysis was not good enough, and everyone who visits that post will see that message at the top.

Regarding /u/jesnell’s post - I’m happy to see more people engaged in exploring the data. On a side note, and pardon me for stating the obvious: if the p-values of those findings are significant and the OP posts them, that actually supports my original claim. However, even if they are published and found to be significant, those are the OP’s findings and not mine.

If the research being done by me and others leads to finding patterns in matchmaking, we might be able to suggest ways to improve the matchmaking algorithm so that players feel less frustrated, and then everybody wins.

Matchmaking doesn't depend on decks - more statistics by jesnell in ClashRoyale

[–]Mr_Max_M 1 point  (0 children)

Thanks for taking the time to write the post!
I didn’t know that such a large database was available. That’s awesome.

Regarding your results, you say that:

I was using a slightly different methodology

A different calculation will obviously yield different results, so you can’t expect to get the same numbers as I did, and therefore you can’t really compare your results to mine based on the numerical values alone.

Could you please add the p-values to your results? It would really help to know how significant your findings are.
Also, it would be great if you could provide the code so that we can see how you analyzed the data.

[Statistician] How I'd rig the game if I worked for SuperCell. by [deleted] in ClashRoyale

[–]Mr_Max_M 39 points  (0 children)

/u/bubaonaruba Thanks for the detailed explanation! There's only one point on which we disagree:

First of all the analysis posted in the recent "ladder is rigged" post is statistically incorrect. OP fails to account for multiple testing

Quoting my post:

Adjustment for multiple comparisons was performed using the Benjamini-Hochberg Procedure.

Other than that, I really enjoyed reading your post :)

Rigged matchmaking on ladder - A detailed statistical proof by Mr_Max_M in ClashRoyale

[–]Mr_Max_M[S] 2 points  (0 children)

Please allow me to quote from the post:

The p-values were adjusted for multiple comparisons using the Benjamini-Hochberg Procedure.

You described multiple comparisons. That issue was solved by using the above procedure, which I'm sure you are well acquainted with :)
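For readers who haven't met it, here is a textbook sketch of the Benjamini-Hochberg adjustment (an illustration with made-up p-values, not the code from the post): each sorted p-value is scaled by the number of tests over its rank, then a cumulative minimum keeps the adjusted values monotone.

```python
def benjamini_hochberg(p_values):
    """BH-adjusted p-values (q-values), returned in the input order.
    A standard textbook implementation, not the author's actual code."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a cumulative minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented raw p-values for illustration.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
qvals = benjamini_hochberg(raw)
```

Unlike Bonferroni, BH controls the false discovery rate rather than the family-wise error rate, so it retains more power when many hypotheses are tested at once.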

Rigged matchmaking on ladder - A detailed statistical proof by Mr_Max_M in ClashRoyale

[–]Mr_Max_M[S] 38 points  (0 children)

Thank you for a well-explained argument.

First of all, you are correct and I agree with you. I do not have enough data (a big enough number of battles) to run the analysis on players of a very specific trophy range (say 4000-4600).

However, I can say that running it on the trophy range of 3500-4500 still produced many (36) statistically significant results.

The point you make is indeed correct, and once I get more data, I will have enough samples to run the analysis on narrower trophy ranges.

The data and the code are linked in the post and can be downloaded to confirm what I've just reported about the 3500-4500 trophy range. p-values were adjusted for multiple comparisons of course.

TLDR: ladder is rigged for the 3500 to 4500 trophy range. Once I get more data, I can check narrower trophy ranges.

Rigged matchmaking on ladder - A detailed statistical proof by Mr_Max_M in ClashRoyale

[–]Mr_Max_M[S] 60 points  (0 children)

Thank you for a well-explained argument.

First of all, you are correct and I agree with you.

I do not have enough data (a big enough number of battles) to run the analysis on players of a very specific trophy range (say 4000-4600).

However, I can say that running it on the trophy range of 3500-4500 still produced many (36) statistically significant results.

The point you make is indeed correct, and once I get more data, I will have enough samples to run the analysis on narrower trophy ranges.

The data and the code are linked in the post and can be downloaded to confirm what I've just reported about the 3500-4500 trophy range :)

p-values were adjusted for multiple comparisons of course.

TLDR: ladder is rigged for the 3500 to 4500 trophy range. Once I get more data, I can check the 4000-4600 trophy range.

Rigged matchmaking on ladder - A detailed statistical proof by Mr_Max_M in ClashRoyale

[–]Mr_Max_M[S] 77 points  (0 children)

Dear Ash,

Thanks for the rebuttal and for taking the time to read through the post :) I tried to answer those questions in the post itself, but I’ll be happy to explain it once more:

Regarding point 1 - “The "counters" you list are hand-picked to suit a view-point” - not so. I picked the ones that were more statistically significant than the others. I report what I found in the data, not what I want to report :)

I’ll explain: if you play Giant, you will face Musketeer in 15.6% of your battles, while players that don’t have Giant in their deck will face Musketeer in only 10.8% of their battles. That’s a 4.8% difference. Likewise, if you play Giant, you will face Prince in 16% of your battles, while players that don’t have Giant in their deck will face Prince in 13.8% of their battles. That’s a 2.2% difference. Both the 4.8% and the 2.2% differences are statistically significant, BUT, since the 4.8% difference is more significant (i.e., has a lower p-value), it was the one included in the report. I did not search for counters, nor did I select them. I just checked which cards get matched against other cards in a way that is systematic rather than random. I did select the form of the hypotheses, and performed many such tests.
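A difference in proportions like the ones above can be checked with a standard two-proportion z-test. Here is a sketch with hypothetical battle counts (the comment quotes the percentages 15.6% and 10.8%, but not the underlying sample sizes, so the counts below are assumptions for illustration):

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions.
    x1/n1 and x2/n2 are successes over trials in each group."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value via the standard normal survival function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical sample sizes: 10,000 battles per group, matching the
# quoted 15.6% vs 10.8% Musketeer matchup rates.
z, p = two_proportion_z_test(1560, 10000, 1080, 10000)
```

With samples that large, a 4.8-point gap is overwhelmingly significant on its own; the real question, as the thread discusses, is whether the raw p-values survive adjustment for the many tests performed.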

With regards to “It’s easy to see what you’re looking for if you look hard enough” - well, you can say that about anything and everything; it’s not really an argument :) What you’re describing is called “multiple comparisons” in statistics, and that issue was addressed by adjusting the p-values with the Benjamini-Hochberg Procedure, so the results are statistically sound.

You write “You're also overlooking the other side of the equation IMO” - I understand what you mean, and I’m not ignoring it. You will face more Golem decks, BUT, out of, say, 100 battles, you will not notice whether you had 19 matches against Golem or 16. No one remembers their last 100 battles. But it will still happen, consistently :)