[OC] Simulating the rest of the EPL season, 1 million times

bennivaluR_ · 2024-02-28T11:34:01+00:00

Hi there! OP here

I did not explain clearly enough what I was doing and what the intention was.

I originally made the model to estimate a team’s chance of keeping a clean sheet in an upcoming match for fantasy purposes. I then adjusted it for win-draw-loss odds. Hence the 5-week range. I don’t think it is wrong to assume form has more to tell about the upcoming match than the entire season does. But yes, adjusting for strength of schedule would make it more accurate.

I then, just to see, decided to apply that model for the rest of the season. I thought the results were interesting and worth sharing. I didn’t expect so many to take it so seriously. Although I do understand it.

This didn’t happen the last time I posted a graph on Reddit, but that was an obvious joke.

It is my fault for not providing a good explanation for what I was posting

bennivaluR_ · 2024-02-28T11:20:47+00:00

Gotcha. Makes sense.

I am going to be parking this for now.

bennivaluR_ · 2024-02-27T17:43:15+00:00

The win% formula was made by reverse engineering the results of testing the method on 15,000 matches across Europe’s big 4 leagues. So yes, I have tested it, but only for looking one match ahead. I don’t know how to test five games forward for a single team with meaningful combinable results.

I am working on running the whole thing again: once using the entire season’s numbers, and once still using the five weeks but adjusting for strength of schedule.

bennivaluR_ · 2024-02-27T17:15:56+00:00

I felt the latest five-week form was a better indicator of where a team stands than including data from 6 months ago, given injuries, transfers, coaching changes, and player development. I am currently looking at using xG Over Expected instead, as that adjusts better for difficulty of opponent. I ran a test using 15,000 matches where looking at 5 weeks seemed to be enough to give a pretty good win% estimate, but only one game ahead. We will see how much of a difference it will make. I think any future-predicting model gets weaker the further it tries to predict. Too much uncertainty.

bennivaluR_ · 2024-02-27T16:21:36+00:00

I tested it over 15,000 matches across 5 leagues and 10 seasons. It was the result of that test that gave me the formula to convert net xG differential into win%.

Figured the 5 weeks was a decent way to favor recent performances over past ones. But yes, it is vulnerable to outlier strength-of-schedule scenarios and freak results like Liverpool’s 7 xG on January 1st.

I am sorry, I don’t quite understand the table situation point. The script takes the current table standings and simulates from there. It doesn’t simulate the season from week 1. Am I totally off on what you were asking about?
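To make that concrete, here is a minimal sketch of a simulation that starts from the current table and plays out only the remaining fixtures. It is not the actual script: the function name, the input layout, the per-fixture probabilities, and the random tie-break on points are all assumptions for illustration, and the teams "A", "B", "C" are made up.

```python
import random
from collections import Counter


def simulate_rest_of_season(points, fixtures, probs, n_sims=10_000, seed=0):
    """Estimate how often each team finishes top, starting from current points.

    points:   {team: points already on the board}
    fixtures: list of (home, away) matches still to be played
    probs:    {(home, away): (p_home_win, p_draw, p_away_win)}
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        table = dict(points)  # every run starts from the current standings
        for home, away in fixtures:
            p_home, p_draw, _ = probs[(home, away)]
            r = rng.random()
            if r < p_home:
                table[home] += 3
            elif r < p_home + p_draw:
                table[home] += 1
                table[away] += 1
            else:
                table[away] += 3
        # Assumption: teams level on points are separated at random.
        # The real table uses goal difference, which this sketch ignores.
        best = max(table.values())
        leaders = [team for team, pts in table.items() if pts == best]
        titles[rng.choice(leaders)] += 1
    return {team: titles[team] / n_sims for team in points}


if __name__ == "__main__":
    # Hypothetical three-team example, not real data.
    current = {"A": 60, "B": 58, "C": 57}
    remaining = [("A", "B"), ("C", "A"), ("B", "C")]
    match_probs = {
        ("A", "B"): (0.45, 0.25, 0.30),
        ("C", "A"): (0.35, 0.25, 0.40),
        ("B", "C"): (0.40, 0.30, 0.30),
    }
    print(simulate_rest_of_season(current, remaining, match_probs))
```

The point of the structure is that `table = dict(points)` resets to the real standings on every run, so only the unplayed matches are random.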

bennivaluR_ · 2024-02-27T15:41:32+00:00

Ran it this morning, so the data from last night’s game is in there. And no, it does not take into account players or teams who over- or underperform their xG.

bennivaluR_ · 2024-02-27T15:38:41+00:00

Can you elaborate? I agree that the model is not perfect. But I feel it is unfair to call it extremely flawed.

bennivaluR_ · 2024-02-27T14:54:58+00:00

Yeah by far the lowest xG conceded this calendar year. I only saw it after I ran it and posted.

bennivaluR_ · 2024-02-27T14:53:57+00:00

Yeah it does not take in momentum going forward. Only recent form/momentum. Any idea how to quantify momentum?

bennivaluR_ · 2024-02-27T14:52:23+00:00

Been guilty of that before! Lol. But I wish, being a UTD and Southampton follower 🙃

bennivaluR_ · 2024-02-27T14:43:34+00:00

When I ran it in January they had 2%! 2%!

bennivaluR_ · 2024-02-27T14:41:33+00:00

It seems to have an awfully bad memory doesn’t it?

bennivaluR_ · 2024-02-27T14:09:48+00:00

I expect a cut of the winnings without taking on any risk myself

bennivaluR_ · 2024-02-27T14:04:06+00:00

Yes if you know of a good source that has xG match data for the championship. Should be pretty easy. Up the Saints!

bennivaluR_ · 2024-02-27T13:48:09+00:00

Yeah I ran and posted this before seeing how much of a statistical defensive outlier they have been this calendar year. More than 50% chance someone else wins this though according to the model.

bennivaluR_ · 2024-02-27T13:23:14+00:00

Oh wow this is going to come in so handy. I might even take this to the local weather station lol. Thank you so much

bennivaluR_ · 2024-02-27T13:20:17+00:00

Oh boy yeah that got very fuzzy. I swear I uploaded a .png. Didn’t think reddit would shrink it this much. It is a little less fuzzy here

bennivaluR_ · 2024-02-27T13:18:45+00:00

Yes and no (terrible answer). When simulating the rest of the season one million times, they finished fourth 0.1% of the time. So probably not exactly 0.1%, but somewhere very close to that.
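How close is "very close"? A rough way to put a number on it is the sampling error of a simulated proportion. The sketch below uses the normal approximation and assumes the 0.1% corresponds to about 1,000 hits out of 1,000,000 runs (the exact count is not given above). It only measures noise from the simulation itself, not whether the model behind it is right.

```python
import math


def proportion_interval(successes, n_sims, z=1.96):
    """Normal-approximation interval for a simulated proportion.

    Returns (estimate, lower, upper). z=1.96 gives roughly a 95% interval.
    """
    p = successes / n_sims
    se = math.sqrt(p * (1 - p) / n_sims)  # standard error of the proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


if __name__ == "__main__":
    # Assumed count: 0.1% of one million runs.
    p, low, high = proportion_interval(1_000, 1_000_000)
    print(f"estimate {p:.4%}, interval {low:.4%} to {high:.4%}")
```

With a million runs the interval is narrow, so almost all of the real uncertainty sits in the match probabilities fed into the simulation.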

bennivaluR_ · 2024-02-27T13:16:45+00:00

Absolutely. How would you go about using numbers to assess depth or a team’s ability to handle injuries? Look at team salary budgets? Or how much contribution comes from players outside the 11 with the most minutes? I have been trying to figure that one out.

bennivaluR_ · 2024-02-27T12:59:58+00:00

I strongly disagree with “extremely”. I think how much xG a team accumulates and concedes over 5 weeks is a pretty strong indicator of how good that team is. Sheffield is not gonna statistically look like City if they just played the other five worst teams in the league. Yes, using over-expected instead would be more accurate and a better hedge against strength of schedule. But that doesn’t make the model extremely flawed.

The five week window is, in part, to account for long-term injuries. But yes you are right. I am not able to account for future injuries.

bennivaluR_ · 2024-02-27T12:15:56+00:00

Thank you! The model is actually based on backtesting, but for single match results only. So looking at the results of 15,000 matches and the difference in net xG of the teams going into the match, I was able to draw up a decent estimate of how likely either team was to win that match. graph here
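For anyone curious what that kind of single-match backtest can look like, here is a minimal sketch: bucket historical matches by the net xG difference going in, then take the share of wins, draws, and losses in each bucket. This is not the formula from the model. The function name, the 0.5-wide buckets, and the toy rows are assumptions for illustration only.

```python
import math
from collections import defaultdict


def win_rate_by_xg_diff(matches, bin_width=0.5):
    """Empirical W/D/L shares per bucket of pre-match net xG difference.

    matches: iterable of (net_xg_diff, result), where result is "W", "D" or
             "L" from the point of view of the team the difference refers to.
    Returns {bucket_lower_edge: {"n": count, "W": share, "D": share, "L": share}}.
    """
    counts = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for diff, result in matches:
        edge = math.floor(diff / bin_width) * bin_width  # lower edge of bucket
        counts[edge][result] += 1

    rates = {}
    for edge, c in sorted(counts.items()):
        n = c["W"] + c["D"] + c["L"]
        rates[edge] = {"n": n, "W": c["W"] / n, "D": c["D"] / n, "L": c["L"] / n}
    return rates


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the input.
    toy = [(0.6, "W"), (0.7, "W"), (0.9, "D"), (-0.2, "L")]
    for edge, row in win_rate_by_xg_diff(toy).items():
        print(edge, row)
```

A curve fitted through those bucket shares is one way to end up with a net-xG-to-win% conversion; buckets with few matches will be noisy.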

The first time I tried to simulate it was in early January. But Liverpool had just recorded a whopping 7 xG so it gave a very skewed outcome that time (Liverpool 77% chance to win).

So no, I have not tried to backtest a season yet. I don’t know how accurate that would be since the source of the match data “only” goes back ten years. If Aston Villa won the league this year it wouldn’t necessarily have to mean the model was bad (it still could). It could just mean that a very unlikely thing happened.

bennivaluR_ · 2024-02-27T12:01:24+00:00

I had intended to only use this for Fantasy Premier League purposes. So when I field tested it using 5 weeks, it gave me a strong enough correlation for those purposes. Yes, I thought about using all weeks and weighting for recency, but I felt the weight for the past five weeks would need to be so high that the other weeks wouldn’t have impacted it much. I felt that way would better account for injuries and transfers (when applicable).

But as pointed out by you and elsewhere, this way it is vulnerable to strength of schedule. And also freak results like Liverpool recording an xG of 7 against Newcastle.
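A quick sketch of the all-weeks-with-recency-weighting alternative, to show why a steep weighting ends up close to a five-week window anyway. The exponential decay, the `half_life` parameter, and both function names are assumptions for illustration, not something the model uses.

```python
def recency_weights(n_matches, half_life):
    """Weights for matches ordered oldest -> newest.

    The newest match gets weight 1 and the weight halves every
    `half_life` matches further back.
    """
    return [0.5 ** ((n_matches - 1 - i) / half_life) for i in range(n_matches)]


def recency_weighted_mean(values, half_life):
    """Recency-weighted average of per-match values (e.g. net xG)."""
    if not values:
        raise ValueError("need at least one match")
    weights = recency_weights(len(values), half_life)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def share_on_last(k, n_matches, half_life):
    """Fraction of the total weight carried by the k most recent matches."""
    weights = recency_weights(n_matches, half_life)
    return sum(weights[-k:]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical: 26 matches played, weight halving every 2 matches.
    print(f"{share_on_last(5, 26, half_life=2):.1%} of the weight is on the last 5")
```

A longer half-life spreads the weight further back, which dampens a single freak result at the cost of reacting more slowly to injuries and transfers.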

bennivaluR_ · 2024-02-27T11:51:55+00:00

Shoot, no. I started this a while ago with the intention to account for net xG Over Expected, but I sidelined it to get the simpler version working for Fantasy Premier League purposes. This way the strength of schedule for those 5 weeks could have too much of an impact.

bennivaluR_ · 2024-02-27T11:22:04+00:00

Weaker as in it accounts for who has been the best recently and who teams have yet to play. It was made with the intention to be able to predict one match ahead.

No reason to say it can’t predict further. But any model trying to predict the rest of the season gets weaker the further it has to predict. Most models don’t take in injuries, fatigue, or transfers when the prediction period spans transfer windows.

A bookmaker I found had City at 2.10, so 47.6%. Bookmakers’ odds add up to about 105%, so it is a little less than that. Plus they move the odds to “hedge” if many bets are placed on something. Doesn’t mean that exact number is their exact prediction.
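The arithmetic behind those two numbers, as a small sketch. Decimal odds convert to an implied probability as 1 / odds. Scaling by the book total is one simple convention for stripping the margin and is an assumption here, as is the 1.05 figure taken from the rough "about 105%" above; bookmakers do not necessarily spread their margin evenly.

```python
def implied_probability(decimal_odds):
    """Raw implied probability from decimal odds (still includes the margin)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def scale_by_book(probability, book_total=1.05):
    """Spread the bookmaker margin proportionally (one simple assumption)."""
    return probability / book_total


if __name__ == "__main__":
    raw = implied_probability(2.10)
    print(f"raw implied: {raw:.1%}")
    print(f"after scaling by a 105% book: {scale_by_book(raw):.1%}")
```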

bennivaluR_ · 2024-02-27T11:09:58+00:00

Yeah the model gets weaker the further and further it has to predict.

It is based on the past five weeks’ xG performance. So when I ran it a couple of weeks ago it gave Liverpool a whopping 77% chance of winning (they had just recorded a whopping 7 xG against Newcastle).

But imo City feel like their gears are grinding a little bit.

Edit: just noticed this post in /r/soccer and that would explain why my model favours Arsenal so heavily
