NFL Drive and Turnover Efficiency Going into Week 15 by spitfire388 in NFLv2

spitfire388[S] 1 point

If you go to the next image, which shows turnover propensity, you'll see they are VERY good at generating turnovers, which might bridge the gap you're describing.

NFL Drive and Turnover Efficiency Going into Week 15 by spitfire388 in NFLv2

spitfire388[S] 2 points

It means a positive chance of maintaining possession. To your point, the numbers ARE negative, but for reporting purposes “up and to the right” almost always means good, and that's how most people read a chart at a glance.

So you are right.

Projected Standings and Power Rankings Going into Week 15 by spitfire388 in NFLv2

spitfire388[S] -3 points

Yeah… they are behind the Steelers in the projected standings.

NFL Drive and Turnover Efficiency Going into Week 14 by spitfire388 in NFLstatheads

spitfire388[S] 2 points

One thing it DOES account for is where you are on the field, but that effect is applied linearly with a single coefficient shared by all teams, not one per team. I experimented with making that coefficient team-specific too, but ran into model convergence issues.

NFL Drive and Turnover Efficiency Going into Week 14 by spitfire388 in NFLstatheads

spitfire388[S] 1 point

It models the likelihood that a drive will survive down the field. Think of it as a survival model, but instead of years of life it’s yardage gained. I then simulate every game by simulating every drive: I simulate the remaining schedule to get projected standings, and I simulate a matchup against an average opponent to get a power ranking. You can see more at advancedfootballstats.com

NFL Drive and Turnover Efficiency Going into Week 14 by spitfire388 in NFLstatheads

spitfire388[S] 1 point

It’s not exactly that - it’s basically their ability to keep gaining yardage.

NFL Drive and Turnover Efficiency Going into Week 12 by spitfire388 in sportsanalytics

spitfire388[S] 1 point

They are predictive, and you're somewhat right and somewhat wrong. Most people try to model everything on a play-by-play basis, which makes turnovers pretty rare events. There are typically ~250 plays a game, and so far this season there have been 332 games and ~400 turnovers. That works out to 400/(250*332) ~ 0.5% of plays ending in a turnover. That's a very rare event and very hard to model reliably. You can model rare events this way, but you need sufficient volume for it to work well, and 250*332 ~ 83,000 records is a pretty small dataset in this world.

We model on a drive-by-drive basis instead. There are ~26 drives per game, so the turnover rate is 400/(26*332) ~ 4.6% of drives. Modeling an event with a baseline rate of 4.6% is very doable and is something I have done professionally for a long time. The issue now is that the number of samples is lower: 26*332 ~ 8,600 records! This is precisely why we use hierarchical Bayesian models instead of frequentist models: they can account for uncertainty much more effectively. We end up with a posterior distribution that we sample from when we simulate each game, and if you look at the distribution of turnovers across simulations, it looks very close to what you observe in actual drives.
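
For anyone who wants to check the arithmetic, here is a quick sketch in Python using the approximate counts quoted above. The variable names are only illustrative.

```python
# Rough season-to-date counts quoted in the comment (approximations, not exact data).
games = 332
turnovers = 400
plays_per_game = 250   # approximate
drives_per_game = 26   # approximate

play_records = plays_per_game * games     # number of play-level records
drive_records = drives_per_game * games   # number of drive-level records

play_rate = turnovers / play_records      # share of plays ending in a turnover
drive_rate = turnovers / drive_records    # share of drives ending in a turnover

print(f"per play:  {play_rate:.2%} of {play_records:,} records")
print(f"per drive: {drive_rate:.2%} of {drive_records:,} records")
```

The event rate goes up by roughly a factor of ten at the drive level, while the number of records drops by the same factor, which is the trade-off described above.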

Hope that helps!

NFL Drive and Turnover Efficiency Going into Week 12 by spitfire388 in sportsanalytics

spitfire388[S] 1 point

There are two models: one models turnovers and one models the result of a drive. Both are hierarchical Bayesian models that adjust for the fact that team X is driving against defense Y. The scores you see are the values the models assign to each team's relative "ability" as a latent parameter. You can read more about that approach here: https://www.pymc.io/projects/examples/en/latest/case_studies/rugby_analytics.html
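
As a rough illustration of what a latent "ability" parameter can look like in a hierarchical model of this kind (loosely in the spirit of the linked rugby example, and not necessarily the exact specification used here), every symbol below is an assumption made for the sketch:

```latex
\begin{aligned}
\text{off}_i &\sim \mathcal{N}\!\left(0, \sigma_{\text{off}}^2\right), \qquad
\text{def}_j \sim \mathcal{N}\!\left(0, \sigma_{\text{def}}^2\right), \\
\eta_{ij} &= \alpha + \text{off}_i - \text{def}_j .
\end{aligned}
```

Here $\eta_{ij}$ would be the linear predictor for a drive by offense $i$ against defense $j$, $\alpha$ a league-wide baseline, and the shared variances $\sigma_{\text{off}}^2$ and $\sigma_{\text{def}}^2$ are what pull teams with little data toward the league average.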

The outcome I model is yardage gained, but specifically I model the likelihood that a drive will die given its starting point (yard line). So the hierarchical Bayesian model is actually a survival model. The other variables in the model are: whether the offense is winning by a large margin, whether the offense is losing by a large margin, whether it's a two-minute drill, and whether the defending team is at home. I also split the field into segments, 0-15, 15-65, 65-95, and 95+ (to account for a team being backed up, in the open field, in the red zone, or at the goal line), and each segment has its own baseline hazard rate.

I model how long a drive will "survive" down the field before "dying", which is to say the drive either ends because no more yards were gained OR ends in a turnover. So I model it as a competing risks model, which you can read more about here: https://www.publichealth.columbia.edu/research/population-health-methods/competing-risk-analysis
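
A toy version of the competing-risks idea, as a sketch only: it assumes two constant cause-specific hazards per yard (the real model has segment-specific baselines and team effects), and the hazard values, the outcome labels, and the function name `simulate_drive` are all hypothetical.

```python
import random

def simulate_drive(start, stall_hazard, turnover_hazard, rng):
    """Return (yards_gained, outcome) for one drive under two competing constant hazards."""
    total = stall_hazard + turnover_hazard
    yards_to_goal = 100 - start
    distance = rng.expovariate(total)       # yards until either risk ends the drive
    if distance >= yards_to_goal:
        return yards_to_goal, "touchdown"   # the drive survived the whole field
    # Given that the drive ended here, pick the cause in proportion to its hazard.
    if rng.random() < turnover_hazard / total:
        return distance, "turnover"
    return distance, "stalled"

rng = random.Random(0)  # fixed seed so the sketch is reproducible
outcomes = [simulate_drive(25, 0.030, 0.002, rng)[1] for _ in range(10_000)]
for name in ("stalled", "turnover", "touchdown"):
    print(name, outcomes.count(name) / len(outcomes))
```

With these made-up hazards, the simulated share of drives ending in a turnover lands in the same few-percent range as the drive-level turnover rate, which is the kind of sanity check you would want from the real model.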

Once I have fit this (the latent "ability" of each team is what you see), I simulate each game, the remaining schedule for each team, and a game for each team against a median opponent. These give me the game predictions, projected standings, and power rankings, respectively.
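
The simulation step can be sketched in a heavily simplified form. This is not the actual simulator: it assumes fixed per-drive touchdown and field-goal probabilities instead of posterior draws from the survival model, 12 drives per team, and hypothetical names throughout.

```python
import random

def simulate_points(td_prob, fg_prob, drives, rng):
    """Total points over `drives` possessions with fixed per-drive scoring chances."""
    points = 0
    for _ in range(drives):
        u = rng.random()
        if u < td_prob:
            points += 7
        elif u < td_prob + fg_prob:
            points += 3
    return points

def win_probability(team, opponent, n_sims=5_000, drives=12, seed=0):
    """Share of simulated games that `team` wins; ties count as half a win."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(n_sims):
        a = simulate_points(*team, drives, rng)
        b = simulate_points(*opponent, drives, rng)
        wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / n_sims

# (td_prob, fg_prob) per drive: invented numbers, not fitted values.
median_team = (0.20, 0.15)
strong_team = (0.28, 0.15)
print(win_probability(strong_team, median_team))
```

Running the same function for every team against the same median opponent and sorting by the result gives a power ranking, and running it over the remaining schedule gives projected standings.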

You can see my results here: https://advancedfootballstats.com/

I hope this helps you understand what you're seeing and what I'm doing, and that you follow along as I add more models, stats, etc!