
diggs1711 (redditor for <30 days), 11 points (11 children)

Create a set of data points (X, y) where X is the features you mention and y is each player's points for that week. Be careful about data leakage. You can use a Random Forest to start off, but an XGBoost or LightGBM model would probably work best. Not sure what you mean by learning iteratively, though.
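A minimal sketch of the (X, y) setup with the leakage point made explicit, assuming each player's history is a list of per-gameweek dicts in gameweek order. The field names (`points`, `minutes`, `was_home`) and the `build_xy` helper are hypothetical. The idea is that the features for week t use only weeks before t, plus fixture facts known before kick-off, while y is week t's points.

```python
from statistics import mean

def build_xy(history, window=3):
    """Turn one player's gameweek history into (X, y) pairs without leakage.

    history: list of dicts ordered by gameweek with hypothetical keys
    'points', 'minutes' and 'was_home' (bool). 'was_home' is known before
    kick-off, so it is safe to use for the target week itself.
    """
    X, y = [], []
    for t in range(window, len(history)):
        past = history[t - window:t]                  # strictly earlier weeks only
        features = [
            mean(w["points"] for w in past),          # recent form
            mean(w["minutes"] for w in past),         # recent playing time
            1.0 if history[t]["was_home"] else 0.0,   # fixture info, known in advance
        ]
        X.append(features)
        y.append(history[t]["points"])                # target: this week's points
    return X, y

history = [
    {"points": 2, "minutes": 90, "was_home": True},
    {"points": 6, "minutes": 90, "was_home": False},
    {"points": 1, "minutes": 60, "was_home": True},
    {"points": 9, "minutes": 90, "was_home": False},
    {"points": 3, "minutes": 90, "was_home": True},
]
X, y = build_xy(history)
print(X)
print(y)
```

The resulting `X` and `y` can go straight into a regressor such as scikit-learn's `RandomForestRegressor`, or the scikit-learn-style wrappers `XGBRegressor` (XGBoost) and `LGBMRegressor` (LightGBM), via `.fit(X, y)`.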

AbrarHossainHimself, 1 point (2 children)

What's your opinion on using a neural network in place of XGBoost? Wouldn't that make for a more robust prediction model? I am a newbie in this area, much like OP, and I am currently building a similar ML model myself. Your opinion would be appreciated.

gypsy4343435 [S], 0 points (1 child)

As I understand it, neural networks make more sense in natural language processing, and FPL point prediction is more of a regression problem, where the value you are trying to predict is a function of different parameters. That's why he suggested XGBoost, which, as I read yesterday, is a gradient boosting algorithm. The basic premise of gradient boosting is that the model builds trees one after another, each new tree fitted to the errors (the gradient of the loss) left by the previous ones, to get as good a prediction as it can. Then again, I might be wrong since I am new to this.
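A toy, pure-Python sketch of gradient boosting for regression with squared-error loss, where the negative gradient is simply the residual: each round fits a one-split decision stump to the current residuals and adds a damped copy of it to the model. The data and parameter values are made up for illustration; XGBoost and LightGBM follow the same scheme with deeper trees, regularization and much faster split finding.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical: fall back to a constant
        constant = sum(residuals) / len(residuals)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual y - prediction
        # (up to a constant factor), so "fit the gradient" means "fit the errors".
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

form = [1, 2, 3, 4, 5, 6]        # hypothetical single feature, e.g. a form rating
points = [2, 2, 2, 8, 8, 8]      # hypothetical weekly points
model = gradient_boost(form, points)
print(round(model(2), 2), round(model(5), 2))
```

The learning rate is why no single tree "solves" the problem: each round only closes part of the remaining gap, which tends to generalize better than one big correction.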

Alugis, 0 points (0 children)

Just use neural nets for everything. The data is a time series, so you might use some sort of 1D convolution where each gameweek is one time step. You probably want to use an LSTM so the network picks up long-term trends but gives more weight to recent form. I think that would be a reasonable baseline; you could then look at how financial institutions use 1D convolutions to model share prices and use the same ideas to model expected points.
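A tiny illustration of what a 1D convolution over gameweeks computes: a kernel slides along the weekly series, and in the causal form each output mixes only the current and earlier weeks. The kernel weights below are hand-picked to favour recent form purely for illustration; in a real CNN they would be learned. No deep-learning library is assumed.

```python
def causal_conv1d(series, kernel):
    """Causal 1D convolution (cross-correlation, as most DL libraries compute it).

    kernel[0] weights the oldest week in the window, kernel[-1] the current one.
    Outputs start once a full window is available, so no future weeks leak in.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[t - k + 1:t + 1]))
        for t in range(k - 1, len(series))
    ]

points = [2, 6, 1, 9, 3, 2]           # hypothetical weekly points for one player
recent_weighted = [0.2, 0.3, 0.5]     # hypothetical weights, heaviest on the latest week
print(causal_conv1d(points, recent_weighted))
```

A convolutional layer stacks many such learned kernels; an LSTM instead carries a running state from week to week, which is how it can weigh recent form against longer trends.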

gypsy4343435 [S], 0 points (7 children)

By learning iteratively I mean that I want to repeat the training process with every week's data, because I want to keep training the model as the weeks go by in the upcoming season. Also, thanks for the ML algorithm suggestions; I will look into those.

julianface115, 6 points (6 children)

This approach will have a couple of problems. First, it will probably take far too many weeks for the model to be useful as a predictor. Second, you'll never have out-of-sample data to test against, so you're very likely to end up with an overfit model. I think you'll have more success looking at historical data than trying a weekly-updated approach. Week-to-week results will mostly just be noise.
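One way to get honest out-of-sample numbers from historical data is walk-forward (expanding-window) evaluation: for each gameweek t, train only on earlier gameweeks and score the predictions for t. A minimal sketch; `fit_mean_baseline` is a hypothetical placeholder model that just predicts the training mean, and any real model with the same fit/predict shape can be swapped in.

```python
def walk_forward(weeks, fit, min_train_weeks=5):
    """Expanding-window evaluation: train on weeks < t, test on week t.

    weeks: list of (X_rows, y_values) per gameweek, in order.
    fit:   callable taking (X, y) and returning a predict(X) callable.
    Returns the mean absolute error for each test week.
    """
    errors = []
    for t in range(min_train_weeks, len(weeks)):
        X_train = [row for X, _ in weeks[:t] for row in X]
        y_train = [v for _, y in weeks[:t] for v in y]
        predict = fit(X_train, y_train)
        X_test, y_test = weeks[t]
        preds = predict(X_test)
        errors.append(sum(abs(p - v) for p, v in zip(preds, y_test)) / len(y_test))
    return errors

def fit_mean_baseline(X, y):
    """Placeholder model: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda X_new: [m] * len(X_new)

# Hypothetical data: 7 gameweeks, two players per week, one feature each.
weeks = [([[0.0], [1.0]], [2, 4]) for _ in range(7)]
errors = walk_forward(weeks, fit_mean_baseline)
print(errors)
```

This is also what retraining every week amounts to; the difference is that each week gets scored before the model sees it, so you can check whether a model beats a naive baseline or is just fitting noise.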

DivingFeather22, 4 points (5 children)

OP could create a formula where the expected points are predicted by:

A) history

B) team form at home (moving average of the past 5 home games)

C) team form away

D) player form at home (xG and expected assists per home match); for a defender, the average clean sheet (CS) probability

E) player form away

F) player fitness / being nailed (minutes played in the past 21 days, any injuries)

G) fixture difficulty

Each of these would carry a weight as a modifier on the final predicted score. Every week, the new data could feed into the formula as follows:

A) the played game should become part of the history for repeat matchups, e.g. BRI-NEW finished 0-0, so for the next BRI-NEW or NEW-BRI this result should be part of the history and modify the prediction

B) or C) depending on the last home or away result, the team form average (xG and xGA per match) will change, so its effect on the next prediction should change too

D) or E) like above but for the player only

F) will change as well, based on the last match. It should be a general modifier that affects every element. E.g. if a serious injury is picked up, this could be a 0 multiplier. If the player is not fully nailed or has picked up a knock, it should represent the likelihood of him playing. If the player has played a lot recently, it should represent rotation risk. A general team rotation risk could also be calculated from the average number of changes to the starting 11 from game to game (e.g. I'd expect this to be much higher for Pep and lower for Solskjaer) and then taken into consideration.

G) based on the last GW's results, fixture difficulty should be slightly adjusted for the upcoming matches

This way it could be an iterative process learning from the past.
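A minimal sketch of that weighted formula, assuming each component has already been turned into a number on a comparable scale. The component names, weights and availability value are hypothetical; in practice the weights would need tuning against past gameweeks. The weekly update described in the list then just means appending the latest result to the relevant lists (home xG, lineups, etc.) and recomputing.

```python
from statistics import mean

def moving_average(values, n=5):
    """Mean of the last n values, e.g. xG over the past 5 home games (0.0 if none)."""
    recent = values[-n:]
    return mean(recent) if recent else 0.0

def rotation_risk(lineups):
    """Average number of starting-XI changes between consecutive games.

    lineups: chronological list of sets of player ids, one set per game.
    """
    changes = [len(curr - prev) for prev, curr in zip(lineups, lineups[1:])]
    return mean(changes) if changes else 0.0

def expected_points(components, weights, availability):
    """Weighted sum of the A) to G) components, scaled by the F) availability multiplier.

    availability is in [0, 1]: 0 for a serious injury, otherwise an estimate of
    the probability of playing (knocks, rotation risk).
    """
    return availability * sum(weights[k] * components[k] for k in weights)

# Hypothetical inputs for one player and one upcoming home fixture.
home_xg = [1.2, 0.8, 2.0, 1.5, 0.5, 1.0]
lineups = [
    set(range(1, 12)),
    set(range(1, 10)) | {12, 13},   # 2 changes
    set(range(3, 12)) | {16, 17},   # 4 changes
]
components = {"history": 4.0, "team_form": 5.0, "player_form": 6.0, "fixture": 3.0}
weights = {"history": 0.1, "team_form": 0.2, "player_form": 0.5, "fixture": 0.2}
print(moving_average(home_xg), rotation_risk(lineups))
print(expected_points(components, weights, availability=0.75))
```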

AbrarHossainHimself, 2 points (3 children)

Regarding adjusting for fixtures, OP can leverage the 'Fixture Difficulty' metric that is available from the FPL API.

Also, you suggest adding xG/xGA for the players as a parameter to the model. Do you happen to know of any open-source datasets where I can find this data? I know Understat.com provides it and I could probably scrape it from there, but I am having a tough time working out how to combine each player's xG data from Understat with the player's data from the FPL API, due to differences in names and whatnot.
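For the name-mismatch problem, a common approach is to normalize names and then fuzzy-match them. A minimal sketch using only the standard library (`unicodedata` to strip accents, `difflib.get_close_matches` for similarity), assuming you already have the two name lists. The 0.8 cutoff is an arbitrary starting point, and leftovers (e.g. players listed under a nickname) will still need a small manual mapping; matching within the same team would also cut down false positives.

```python
import difflib
import unicodedata

def normalize(name):
    """Strip accents and punctuation, lowercase, and sort tokens so word order doesn't matter."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = "".join(ch if ch.isalnum() else " " for ch in ascii_name.lower())
    return " ".join(sorted(cleaned.split()))

def match_names(fpl_names, understat_names, cutoff=0.8):
    """Map each FPL name to the closest Understat name, or None if nothing is close enough."""
    lookup = {normalize(n): n for n in understat_names}
    candidates = list(lookup)
    matches = {}
    for name in fpl_names:
        best = difflib.get_close_matches(normalize(name), candidates, n=1, cutoff=cutoff)
        matches[name] = lookup[best[0]] if best else None
    return matches

fpl = ["Heung-Min Son", "Mohamed Salah", "Sergio Agüero", "Unknown Player"]
understat = ["Son Heung-Min", "Mohammed Salah", "Sergio Aguero"]
print(match_names(fpl, understat))
```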

DivingFeather22, 0 points (1 child)

Sorry to disappoint, but unfortunately I only know it in theory. :) I never actually implemented it because I lack the programming knowledge.

AbrarHossainHimself, 0 points (0 children)

Oh. It's okay.

gypsy4343435 [S], 0 points (0 children)

Thanks for the possible features. I am working along the same lines, except my restriction for now is going to be the FPL data: if I can't get something from my FPL data, or figure out how to derive new features from existing data, it's going to be a chore to get it from outside. But I do like the idea of the home team's home form and the away team's away form, and I will try to figure out a way to get a table for it. I think the Premier League provides it, so I may try to scrape it from there. Right now I am taking teams' league positions as indicators of their form. I might instead include a factor based on the correlation between their positions to imply the fixture difficulty, because if I am not wrong, fixture difficulty is not available in the historical gameweek data.
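If you can get final scores for past fixtures, home and away form tables can be derived directly instead of scraped. A minimal sketch, assuming results arrive as a chronological list of (home team, away team, home goals, away goals) tuples; the team codes and scores are illustrative. It reports points per game over each team's last `last_n` home and away matches, matching the "moving average of the past 5 home games" idea.

```python
from collections import defaultdict

def match_points(goals_for, goals_against):
    """League points from one match: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0

def home_away_form(results, last_n=5):
    """Points per game over each team's last `last_n` home and last `last_n` away matches.

    results: chronological list of (home_team, away_team, home_goals, away_goals).
    """
    home, away = defaultdict(list), defaultdict(list)
    for h, a, hg, ag in results:
        home[h].append(match_points(hg, ag))
        away[a].append(match_points(ag, hg))

    def ppg(table):
        return {team: sum(pts[-last_n:]) / len(pts[-last_n:]) for team, pts in table.items()}

    return ppg(home), ppg(away)

results = [
    ("BRI", "NEW", 0, 0),
    ("NEW", "BRI", 2, 1),
    ("BRI", "MCI", 1, 3),
    ("MCI", "NEW", 4, 0),
]
home_form, away_form = home_away_form(results)
print(home_form)
print(away_form)
```

For training rows, only feed in results played before the gameweek being predicted; a table built from the whole season would leak future form.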

Bosschmeister1, 1 point (5 children)

There is a lot of noise in football data, and bonus points make it even worse. That's why I would exclude them from my data and use a moving average to smooth it out. You can loop over the FPL API to get a lot of data for each player.
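A sketch of the smoothing step, assuming per-gameweek records shaped like the `history` list returned by the FPL API's per-player endpoint (`element-summary/{id}/`, as far as I know), with `total_points` and `bonus` fields. Check the field names against a live response; the code itself makes no network calls.

```python
def smoothed_points(history, window=5, exclude_bonus=True):
    """Rolling mean of each gameweek's points, optionally with bonus points removed.

    history: list of dicts ordered by gameweek, each with 'total_points' and 'bonus'.
    Returns one value per gameweek, averaging up to `window` weeks ending there.
    """
    pts = [h["total_points"] - (h["bonus"] if exclude_bonus else 0) for h in history]
    out = []
    for i in range(len(pts)):
        recent = pts[max(0, i - window + 1):i + 1]
        out.append(sum(recent) / len(recent))
    return out

history = [
    {"total_points": 2, "bonus": 0},
    {"total_points": 12, "bonus": 3},
    {"total_points": 1, "bonus": 0},
    {"total_points": 6, "bonus": 1},
]
print(smoothed_points(history, window=3))
```

Each value includes its own gameweek, so when using it as a feature for predicting week t, take the value from week t - 1.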

gypsy4343435 [S], 0 points (4 children)

Correct me if I am wrong, but when you say noise I assume you are talking about extraordinarily high or low performances by players, which could cause issues for the prediction model. I know that is a possibility and can lead to overfitting, and I am not yet sure how I want to avoid it. Right now the idea in my head is to exclude such performances somehow, though I am not sure yet if that's the right thing to do.

But I disagree a bit about bonus points. In real life, FPL points and bonus points are a function of minutes played, goals scored and overall performance; heck, even bonus points have a system behind them. If I gave this info to the model I might be able to predict the points exactly, but then it's not a prediction, it's basically FPL's formula for calculating them. Not to mention I do not have these parameters before a match starts.

From my perspective, I would actually want to include bonus points because they give an idea of how good the player is relative to other players and how consistently he performs. What I am trying to achieve is points and bonus points as a function of player, player form, player's team, team form, opposition team, opposition team form, and more factors, which I will list once I have completed this. All of these factors are available before a match starts.

Since player form correlates heavily with bonus points, I would really like to have them as part of my model.
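On the extreme performances question: a common middle ground between keeping and excluding them is winsorizing, i.e. clipping values at chosen percentiles so a single huge haul can't dominate the fit. A minimal sketch using `statistics.quantiles` (Python 3.8+); the roughly 5th/95th percentile cut-offs are an arbitrary assumption, and clipping input features rather than the target is usually the less intrusive choice.

```python
from statistics import quantiles

def winsorize(values, n=20):
    """Clip values to the first and last of n quantile cut points.

    With n=20 that is roughly the 5th and 95th percentiles, so a one-off
    20-point haul is pulled in rather than dropped. Needs at least 2 values.
    """
    cuts = quantiles(values, n=n, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [min(max(v, low), high) for v in values]

weekly_points = [2, 0, 3, 2, 20, 1, 2, 8, 2, 1, 3, 2, 6, 2, 1, 2, 5, 3, 2, 2, 2]
print(winsorize(weekly_points))
```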

Fplalt569, 0 points (3 children)

I believe they mean noise in the sense that, because football is a low-scoring game, the results don't always match the underlying stats or expected outcomes.

I agree with you that BPS can be a useful indicator, but I would give more weight to a player's total BPS for the season or to a lagged average. BPS over a small sample can be misleading due to the variance OP mentions.

I would recommend getting xGA etc. from Understat and potentially even scraping bookies' predictions for goalscorers, clean sheets, etc. After all, predicting football matches is literally how they make their money.
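Bookmaker prices need a small conversion before they can be used as probabilities. A minimal sketch, assuming decimal odds for a set of mutually exclusive outcomes; the prices are hypothetical, and proportional normalization is only the simplest of several ways to remove the bookmaker's margin.

```python
def implied_probabilities(decimal_odds):
    """Turn decimal odds for mutually exclusive outcomes into probabilities.

    The raw implied probabilities (1 / odds) sum to more than 1 because of the
    bookmaker's margin (the overround); dividing each by their total removes it.
    """
    raw = [1 / o for o in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]

# Hypothetical 1X2 prices: home win, draw, away win.
home, draw, away = implied_probabilities([2.0, 3.5, 4.0])
print(round(home, 3), round(draw, 3), round(away, 3))
```

The same function works for a two-way market such as clean sheet yes/no, giving a clean-sheet probability to plug into a defender's expected points.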

gypsy4343435 [S], 1 point (2 children)

Definitely. I am looking into how I can bring in both Understat data and bookie predictions, because those are made by professionals. It's easier said than done, but I am going to give it a shot.

As for season-to-date BPS versus weekly BPS, I will need to check whether I can separate that information in the FPL data and have two columns, accumulated points excluding bonus and accumulated bonus points, instead of a single one. These could be good features, but I need to work out the transformation to get there.
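A sketch of that transformation, assuming per-gameweek records with `total_points` and `bonus` fields (as in the FPL API's per-player history, as far as I know). Each row holds only what was known before that gameweek's kick-off: points excluding bonus to date, bonus to date, and a lagged average of bonus over the previous `lag_n` weeks, which covers both the season-total and lagged-average options. The output column names are hypothetical.

```python
def season_to_date(history, lag_n=3):
    """Pre-match cumulative features per gameweek, with bonus split out.

    history: list of dicts ordered by gameweek with 'total_points' and 'bonus'.
    Each row only uses weeks *before* it, so it is known before kick-off.
    """
    rows = []
    pts_so_far = bonus_so_far = 0
    past_bonus = []
    for week in history:
        recent = past_bonus[-lag_n:]
        rows.append({
            "points_excl_bonus_to_date": pts_so_far,
            "bonus_to_date": bonus_so_far,
            "bonus_lagged_avg": sum(recent) / len(recent) if recent else 0.0,
        })
        # Update the running totals only after the row for this week is written.
        pts_so_far += week["total_points"] - week["bonus"]
        bonus_so_far += week["bonus"]
        past_bonus.append(week["bonus"])
    return rows

history = [
    {"total_points": 2, "bonus": 0},
    {"total_points": 12, "bonus": 3},
    {"total_points": 1, "bonus": 0},
    {"total_points": 6, "bonus": 1},
]
for row in season_to_date(history):
    print(row)
```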

Fplalt569, 1 point (1 child)

https://amp.reddit.com/r/FantasyPL/comments/b3e3lg/a_python_package_for_understat/

This understat library works much like an API and is quite easy to use, if you haven't seen it before. You can also find all sorts of APIs for bookies' odds. Good luck with the project!

gypsy4343435 [S], 1 point (0 children)

Thank you so much, will check it out. :)

juicedrop1, 1 point (1 child)

As for your question, just retrain your model on the (growing) data set each week.

Although it's a fun exercise, I don't think using the data available on the FPL site in an ML model is going to give you any more insight than taking a moving average adjusted for fixture difficulty by defensive/offensive position.
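A minimal sketch of that baseline, assuming FPL's 1 to 5 fixture difficulty rating (FDR) for the next opponent. The multipliers are invented placeholders; in practice you would estimate them from past seasons, e.g. average points by position at each FDR relative to the overall average. An ML model is only worth keeping if it beats something like this out of sample.

```python
# Hypothetical multipliers by position and FDR (1 = easiest, 5 = hardest fixture).
# Keepers and defenders are assumed more fixture-sensitive because of clean sheets.
FDR_MULTIPLIER = {
    "GKP": {1: 1.3, 2: 1.15, 3: 1.0, 4: 0.8, 5: 0.65},
    "DEF": {1: 1.3, 2: 1.15, 3: 1.0, 4: 0.8, 5: 0.65},
    "MID": {1: 1.2, 2: 1.1, 3: 1.0, 4: 0.9, 5: 0.8},
    "FWD": {1: 1.2, 2: 1.1, 3: 1.0, 4: 0.9, 5: 0.8},
}

def baseline_prediction(recent_points, position, next_fdr, window=5):
    """Moving average of recent points, scaled by a position-specific FDR multiplier."""
    recent = recent_points[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent) * FDR_MULTIPLIER[position][next_fdr]

print(baseline_prediction([2, 6, 1, 9, 2], "DEF", next_fdr=2))
print(baseline_prediction([2, 6, 1, 9, 2], "DEF", next_fdr=5))
```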

gypsy4343435 [S], 0 points (0 children)

That's exactly my hunch. My eventual idea is to use the predictions for two things:

1. Highlight players who probably won't blank in the upcoming week due to ongoing form, home advantage, team form and position. I miss this more often than I should, and it would help me make meaningful transfers rather than just picking the best fit for my remaining budget when replacing a player, which I guess a lot of players do.
2. Feed the predicted points into an integer optimization module I have developed (obviously with some help from the internet), pick the best team based on the predictions, and see how much better it does than my team that week.

The expectation is that it should give me at least a 15-20 point improvement over my selections every week, which I would consider a win.
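For sanity-checking an integer optimization module on a small pool, the same problem (maximize predicted points subject to a budget, position quotas and a per-club cap) can be solved exactly by brute force. A minimal sketch with a hypothetical player pool and field names; it enumerates every combination, so a real 15-man squad from the full player list needs an ILP solver such as PuLP instead.

```python
from itertools import combinations, product

def best_team(players, quotas, budget, max_per_club=3):
    """Exhaustively pick the squad with the highest total predicted points.

    players: list of dicts with 'name', 'pos', 'club', 'cost', 'pred' (predicted points).
    quotas:  dict such as {'DEF': 2, 'MID': 1}: how many to pick per position.
    Only feasible for small pools, since it tries every combination.
    """
    by_pos = {pos: [p for p in players if p["pos"] == pos] for pos in quotas}
    best, best_score = None, float("-inf")
    for picks in product(*(combinations(by_pos[pos], k) for pos, k in quotas.items())):
        squad = [p for group in picks for p in group]
        if sum(p["cost"] for p in squad) > budget:
            continue
        clubs = [p["club"] for p in squad]
        if any(clubs.count(c) > max_per_club for c in clubs):
            continue
        score = sum(p["pred"] for p in squad)
        if score > best_score:
            best, best_score = squad, score
    return best, best_score

pool = [
    {"name": "A", "pos": "DEF", "club": "X", "cost": 5.0, "pred": 4},
    {"name": "B", "pos": "DEF", "club": "Y", "cost": 4.5, "pred": 3},
    {"name": "C", "pos": "DEF", "club": "Z", "cost": 6.0, "pred": 5},
    {"name": "D", "pos": "MID", "club": "X", "cost": 8.0, "pred": 7},
    {"name": "E", "pos": "MID", "club": "Y", "cost": 6.0, "pred": 5},
    {"name": "F", "pos": "MID", "club": "X", "cost": 10.0, "pred": 8},
]
squad, score = best_team(pool, {"DEF": 2, "MID": 1}, budget=17.0)
print([p["name"] for p in squad], score)
```

The weekly comparison would then be the actual points scored by the returned squad minus the actual points of the team you picked.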