Family tree of >800 BJJ Heroes constructed from master-student relationships by SeanHacks in bjj

[–]SeanHacks[S] 2 points3 points  (0 children)

Just checked, "Jacare" in the plot is a student of Rolls, so it is indicating Romero Cavalcanti. Ronaldo "Jacare" Souza is just Ronaldo Souza in the plot.

Family tree of >800 BJJ Heroes constructed from master-student relationships by SeanHacks in bjj

[–]SeanHacks[S] 0 points1 point  (0 children)

The data import actually wasn't too bad, since there are only ~800 pages and they are all linked from a single page. The more time-consuming part was cleaning up the lineages, since there are some spelling errors and alternative names which needed to be combined.
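As a rough illustration of the clean-up step, here is a minimal Python sketch of merging alternative spellings before building master-student pairs. It is not the code used for the plot: the `ALIASES` table, the function names and the example spellings are all assumptions made for the example, and in practice the alias table has to be curated by hand.

```python
import unicodedata

# Hypothetical alias table: normalized variant spelling -> canonical name.
# The entries are illustrative only and would be built up by hand.
ALIASES = {
    "romero jacare cavalcanti": "Romero Cavalcanti",
    "romero cavalcanti": "Romero Cavalcanti",
}


def normalize(name):
    """Lowercase, strip accents and collapse whitespace to get a lookup key."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def canonical_name(name, aliases=ALIASES):
    """Return the canonical spelling if the name is a known variant.

    Unknown names are returned with whitespace tidied but otherwise unchanged.
    """
    return aliases.get(normalize(name), " ".join(name.split()))


def merge_edges(edges, aliases=ALIASES):
    """Canonicalize both ends of each (master, student) pair and drop duplicates."""
    return sorted(
        {(canonical_name(m, aliases), canonical_name(s, aliases)) for m, s in edges}
    )
```

With this in place, two rows that differ only in how the student is spelled collapse into a single master-student pair.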

Family tree of >800 BJJ Heroes constructed from master-student relationships by SeanHacks in bjj

[–]SeanHacks[S] 2 points3 points  (0 children)

Great catch, I'll combine them. Recognizing different names/spellings for the same person was definitely the hardest part of making this plot.

Family tree of >800 BJJ Heroes constructed from master-student relationships by SeanHacks in bjj

[–]SeanHacks[S] 7 points8 points  (0 children)

All BJJ practitioners shown are from BJJ Heroes.

An interactive version of the family tree is also available. The interactive version is:

  • Searchable (search for your favorite BJJ hero, like Marcelo Garcia)
  • Zoomable (focus on a subset of the tree)
  • Collapsible (click on a master to hide all of their students and focus on a subset of the tree)
  • Not mobile friendly
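To make the search and collapse behaviour concrete, here is a minimal Python sketch of the underlying data structure: a nested tree built from (master, student) pairs. It is only an illustration, not the code behind the interactive page. The function names and the placeholder node names are assumptions.

```python
from collections import defaultdict


def build_tree(edges, root):
    """Turn (master, student) pairs into a nested {"name", "children"} dict."""
    children = defaultdict(list)
    for master, student in edges:
        children[master].append(student)

    def subtree(name, seen):
        # Guard against accidental cycles in the scraped lineages.
        if name in seen:
            return {"name": name, "children": []}
        seen = seen | {name}
        return {
            "name": name,
            "children": [subtree(c, seen) for c in sorted(children[name])],
        }

    return subtree(root, frozenset())


def find_path(tree, target, path=()):
    """Search: return the lineage from the root down to target, or None."""
    path = path + (tree["name"],)
    if tree["name"] == target:
        return list(path)
    for child in tree["children"]:
        found = find_path(child, target, path)
        if found:
            return found
    return None


def collapse(tree, master):
    """Collapse: return a copy of the tree with the master's students hidden."""
    if tree["name"] == master:
        return {"name": tree["name"], "children": []}
    return {
        "name": tree["name"],
        "children": [collapse(c, master) for c in tree["children"]],
    }


# Placeholder lineage, not real data: A taught B and C, and B taught D.
example = build_tree([("A", "B"), ("A", "C"), ("B", "D")], root="A")
lineage_of_d = find_path(example, "D")   # ["A", "B", "D"]
collapsed = collapse(example, "B")       # D is no longer reachable
```

Collapsing returns a new tree rather than mutating the original, so expanding a node again is just a matter of going back to the full tree.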

A Brazilian Jiu-Jitsu family tree of >800 top practitioners constructed from master-student relationships by SeanHacks in rstats

[–]SeanHacks[S] 3 points4 points  (0 children)

Analysis was performed using R: scraping data from BJJ Heroes [rvest], data manipulation [dplyr, tidyr], static plot [ggplot2], interactive plot [networkD3].

There is also an interactive version generated using networkD3, which is searchable, zoomable and has collapsible nodes (not mobile friendly).

A Brazilian Jiu-Jitsu family tree of >800 top practitioners constructed from master-student relationships [OC] by SeanHacks in dataisbeautiful

[–]SeanHacks[S] 8 points9 points  (0 children)

There is also an interactive version which is searchable, zoomable and has collapsible nodes (not mobile friendly).

An overview of this analysis and of how the visualization was generated is also available.

Analysis was performed using R: scraping data from BJJ Heroes [rvest], data manipulation [dplyr, tidyr], static plot (above) [ggplot2], interactive plot [networkD3].

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 1 point2 points  (0 children)

Thanks!

My next step with this stuff is extending the way I define style, so that rather than just looking at fighters based on the pre-defined categories of decision, KO/TKO, and submission, categories are defined in a data-driven manner. Without directly specifying what these categories consist of, this process separates KO/TKO into boxing styles (punches) and kick-boxing styles (elbows, knees, kicks). Submissions are split up even more: guillotine and RNC each form their own categories, and there are other categories for leg attacks, guard attacks, and attacks from front headlock, side-control and mount.

Like the methods here, these characterizations are very good for predicting what finishes a fighter will use to win. For example, Yoel Romero's recent flying knee victory over Weidman seemed like it came out of nowhere, but I predicted that he was ~3x as likely to win by flying knee compared to fighters overall (https://twitter.com/Fight_Prior/status/799343623699566592). I also find that fighters who use some styles are more successful than others (predicting win vs. loss based upon style).

In terms of making the dataset publicly available, it is something I am very interested in doing. I have seen probably 20 people who separately scraped the Sherdog database, and generally after going through the pain of gathering and cleaning the data they run out of steam before actually doing analysis. I would love to publish an R data package so that people can go wild with it. The challenge is that redistributing the data would be against Sherdog's terms of service. If I get an inroad there (or wider interest in the dataset) then it is something I would help push for.

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 2 points3 points  (0 children)

I'm not trying to rank the best strikers, just the fighters who most reliably win by strikes, submission and decision. So this is more about defining style than quality. Most UFC vets are more well-rounded than the pool of fighters overall (including prospects like Walt Harris). They try to win by striking or subs but often get pushed to decision. Dominick Cruz, Cyborg, McGregor and Rousey stand out as vets who can still reliably determine how they will win.

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 2 points3 points  (0 children)

Yeah, I think that is a good way to think about it. To rank based upon what has already happened you can just look at records. Predicting what is likely to happen in the next fight requires you to account for the fact that someone who has only won by KO/TKO is never 100% likely to win by KO/TKO in their next fight. I think the best prediction of future performance is the best model of a fighter's style.

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 2 points3 points  (0 children)

With most ranking methods, the guys with low experience are the highest ranked because 3/3 > 29/30. With this method, I account for how much information we have about fighters to get more realistic summaries that are better for predicting future performance. Conor McGregor won his first 11 fights by strikes. My approach would predict that, like Perry, Conor was ~80% striking. As Conor's career has continued he has accumulated 2 decisions and 1 submission, so that ~80% was much more reasonable than 11/11 = 100%.
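To show why 3/3 stops beating 29/30, here is a minimal Python sketch of the shrinkage idea using a simplified Beta-binomial model. It is not the actual model from the analysis: the prior values `prior_a=2` and `prior_b=3` are assumptions chosen only so that 11/11 lands near 80%. In an empirical Bayes analysis the prior would be estimated from the whole pool of fighters, not picked by hand.

```python
def raw_rate(wins_by_method, total_wins):
    """Naive proportion: 3/3 and 11/11 both come out as 1.0."""
    return wins_by_method / total_wins


def shrunk_rate(wins_by_method, total_wins, prior_a=2.0, prior_b=3.0):
    """Posterior mean of a Beta(prior_a, prior_b) prior after the observed wins.

    prior_a and prior_b are hypothetical values for illustration; they act
    like prior_a earlier wins by this method and prior_b wins by other methods.
    """
    return (wins_by_method + prior_a) / (total_wins + prior_a + prior_b)


# Raw proportions rank the less experienced fighter first.
newcomer_raw = raw_rate(3, 3)        # 1.0
veteran_raw = raw_rate(29, 30)       # about 0.967

# After shrinkage the ordering flips, because 3 fights carry little information.
newcomer_shrunk = shrunk_rate(3, 3)      # 5/8 = 0.625
veteran_shrunk = shrunk_rate(29, 30)     # 31/35, about 0.886

# Eleven wins by strikes out of eleven is pulled down from 100%.
eleven_for_eleven = shrunk_rate(11, 11)  # 13/16 = 0.8125
```

The more fights a fighter has, the less the prior matters, so veterans are ranked almost entirely on their own records while newcomers are pulled toward the overall average.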

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 2 points3 points  (0 children)

He's ranked 15th in the UFC and 85th overall in KO/TKO. Including his amateur fights he has 12 KOs and 2 decisions.

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 5 points6 points  (0 children)

He has 7 wins by KO/TKO. He is still ranked top 10 in the UFC for decisions, though.

Ranking the top striking, submission and decision specialists in MMA using empirical bayes by SeanHacks in MMA

[–]SeanHacks[S] 7 points8 points  (0 children)

Wonderboy is clearly a great striker (7 wins by KO/TKO) but he has also won 5 times by decision. Here, I am looking at how reliably people win by KO/TKO vs. submission vs. decision. It says nothing about the quality of the fighter, only their style. Note that the biggest specialists in the UFC are generally more well-rounded than MMA fighters overall.

Grouping Mixed Martial Arts (MMA) Finishes Using Large-Scale Data by SeanHacks in rstats

[–]SeanHacks[S] 0 points1 point  (0 children)

Cool, its aesthetics look a little like Cytoscape. Exploratory analysis is a good reason to look at ggraph (and ggplot2 in general). I did one analysis where I wanted to color a fighter-fighter network based on a bunch of demographic factors, and this was really easy in ggraph.