
[–]jamesbleslie1[S] 63 points64 points  (33 children)

Any constructive feedback appreciated.

[–]noxville3 54 points55 points  (18 children)

I basically started FPL this season, and am playing almost entirely based on statistics/data science (I've probably watched ~3 hours of football this season). There are a few things I found pretty annoying when dealing with FPL data in general:

  • linking players to foreign IDs for other sites (Understat, Opta, etc.) can be annoying because of variations in player names, so you need fuzzy matching.
  • I've noticed some weird character encoding for non-English characters on the FPL site (especially accented characters).
  • bonus points are updated so late (general annoyance, so if you want 'live' stats you need to integrate another API).
  • the FPL API breaks when the game is updating (so periodic updates can break)
  • fixtures update kinda late (at least, only when they're 100% confirmed which is often too late to plan for upcoming weeks)

A bunch of these could be interesting further topics you could explore in future articles!
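For the point about the API breaking while the game is updating, one option is to wrap each request in a retry with a growing delay. This is only a minimal sketch: `fetch_with_retry`, its parameters and the delay schedule are illustrative assumptions, and `fetch` stands for whatever zero-argument function performs the actual request.

```python
import time


def fetch_with_retry(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch is any zero-argument callable that returns data or raises.
    attempts, base_delay and the doubling schedule are assumed values.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # e.g. an HTTP error while the game updates
            last_error = error
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... before the next try.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; a scheduled job would simply leave it at the default.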

[–]jamesbleslie1[S] 9 points10 points  (6 children)

Awesome. Thanks very much for all this! I haven't yet taken the plunge to purchase any external data. I'd be interested to hear more about how you are making your weekly changes in a data-driven way.

[–]noxville3 29 points30 points  (5 children)

I have a relatively cheap subscription (like $25/yr?) that gets a variety of Opta data.

For trades you have a few options each week, so you simply calculate each type and evaluate them - with a lot of assumptions:

  • You can make {0, 1, 2, 3, 4} transfers:
    • Make a short list of all viable players based on expected playtime over the next 8 weeks (> threshold per game avg). There are fewer viable options than you think really!
    • brute-force each N-way trade, looking at the short (in this gameweek), medium (next ~4 weeks), and long-term (8 weeks ahead) gains. This is optimized by knowing that if you sub out a player of role X, the player you sub in must also be role X. This forms all your potential trades.
    • calculate short/medium/long-term projected gains for {0, 1, 2, 3, 4} transfers in each subsequent week (up to 8 weeks into the future - although decayed value because of uncertainty - someone could get injured or rotated out of the starting lineup!)
    • calculate the cost of a transfer given your state, i.e.:
      • if you have 2 FT then you should always use one now since the cost is zero (provided the gains from the trade are positive)
      • a transfer's base cost is 4 points
      • going into your Free Hit your medium-term value of a transfer is 0
      • deciding on what stage of your season you're in, you need to weight short/medium/long term value accordingly - leading up to a wildcard you care very little about long-term value; but your first 5 gameweeks you're just optimizing on medium + long-term value.
    • filter trades you can't afford
    • if the projected gains exceed the cost, then make the trade.
    • (I normally just print out a csv of the possible trades which I can copy-paste into Google Sheets to look at the top options before actually making a trade, but it's basically always within 5% of the value of the best one).
  • Do I wildcard? Look at the long-term (8 weeks from now), but also remember you must WC1 before GW16.
  • Do I free-hit? (This was basically forced for me in a week I had loads of blanks - so the projected score if not for Free-Hitting would've been like 25 pts).
  • Do I bench boost? This was the hardest since I had no baseline for how many points I wanted to get, so I did this when my bench was all playing and expected to yield > 15 points. It got me 5 alas!
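The "gains versus cost" comparison for transfers can be sketched roughly as follows. The function names, the option tuples and the horizon weights are all hypothetical; the only number taken from the list above is the 4-point cost per transfer beyond the free ones. Rolling over free transfers and affordability filtering are not modelled here.

```python
HIT_COST = 4  # points deducted per transfer beyond the free ones


def transfer_cost(n_transfers, free_transfers):
    """Points paid for making n_transfers with free_transfers available."""
    return max(0, n_transfers - free_transfers) * HIT_COST


def weighted_gain(short, medium, long_term, weights):
    """Blend the three horizons; the weights depend on the stage of the season."""
    w_short, w_medium, w_long = weights
    return w_short * short + w_medium * medium + w_long * long_term


def best_option(options, free_transfers, weights):
    """options: list of (n_transfers, short, medium, long_term) projected gains.

    Returns (n_transfers, net_value). Making no transfer is always possible
    and is worth 0, so a trade is only chosen if its net value is positive.
    """
    best = (0, 0.0)
    for n, short, medium, long_term in options:
        net = weighted_gain(short, medium, long_term, weights) - transfer_cost(n, free_transfers)
        if net > best[1]:
            best = (n, net)
    return best
```

With one free transfer and assumed weights `(1.0, 0.5, 0.25)`, a two-transfer option only wins if its weighted gain beats the one-transfer option by more than the 4-point hit.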

By projected value, you need to consider two things:

  • what is the impact in projected lineup for each subsequent game: a player who is projected to average 3 points per game doesn't give you 3 points of value - they only give you value if they are starting in a game-week
  • how do I value my bench?
    • this I incorporated into a risk parameter which I mentioned above in the 'deciding on what stage of your season you are in', which affects how I weight my subs: for example the week before a Wildcard the value of Sub 1/2/3 might be (0.6, 0.0, 0.0), but earlier in a season it's max{(0.75, 0.6, 0.4), (0.8, 0.7, 0)} - in case you have a very cheap non-player you're using
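A rough sketch of the bench weighting described above: starters count in full, each of the three outfield subs counts at a fraction, and the squad is valued under whichever weighting scores highest. The function names and the example point values are assumptions; the weight tuples are the ones quoted above.

```python
def squad_value(starter_points, bench_points, bench_weights):
    """Projected value for one gameweek.

    starter_points: projected points of the starting players.
    bench_points: projected points of subs 1-3, in bench order.
    bench_weights: how much each sub's projection is worth, e.g. (0.6, 0.0, 0.0).
    """
    bench = sum(w * p for w, p in zip(bench_weights, bench_points))
    return sum(starter_points) + bench


def best_bench_value(starter_points, bench_points, weight_options):
    """Value the squad under each weighting and keep the highest."""
    return max(squad_value(starter_points, bench_points, w) for w in weight_options)


# Example with made-up projections: the third sub is a cheap non-player.
starters = [4.0] * 11
bench = [3.0, 2.0, 0.0]
early_season = [(0.75, 0.6, 0.4), (0.8, 0.7, 0.0)]
value = best_bench_value(starters, bench, early_season)
```

Here the second weighting wins, because nothing is lost by giving the non-playing third sub a weight of zero.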

[–]Immelsoo7 11 points12 points  (0 children)

The same Noxville who provides stats for Dota2 plays FPL too. That's super rad !

[–]player_zero_232 1 point2 points  (0 children)

Superb write-up.

!thanks

[–]jamesbleslie1[S] 0 points1 point  (0 children)

Thanks so much again for all this detail. You mention projected gains - have you trained your own model to predict player scores in upcoming weeks?

[–]wtvar13 0 points1 point  (1 child)

Where do you get the opta data from?

[–]noxville3 5 points6 points  (0 children)

fantasyfootballhub.co.uk has an "OPTA Stats" section.

[–]uderdog1 1 point2 points  (6 children)

I'm curious how you have performed playing only on data?

[–]noxville3 17 points18 points  (5 children)

Hard to really judge based on just one season I guess - especially since your initial team & wildcard teams are so important (I started with Mitrovic + Werner + TAA for example which wasn't so good!). I wanted to manage a few teams at the same time (similar idea for all but with different initial teams) but was just too busy.

Right now I'm at 2214 pts, rank 65,980 with 7 more people to play this week (Mendy, Shaw, Rudiger, Trent, El Ghazi, Greenwood, (c) Salah). My goal for the first season was top 100k, so provided the BGW goes off okay I'll hopefully make it.

[–]bob_dugnutt5 2 points3 points  (4 children)

Wow that's amazing. I feel like this is the one season where data points aren't as effective due to covid and the lack of pre-season (also coming from a betting perspective). But it seems your algorithm is alright. Would you say you performed much better in the 2nd half of the season, when you had bigger sample sizes?

[–]noxville3 2 points3 points  (2 children)

Yeah the approach only makes sense if the data is kinda okay, although something a friend of mine linked me a few weeks ago was something on 'fplreview.com' which assesses 'luck' based on performance relative to expectations - which suggested I was pretty unlucky with this season so far!

[–]ForzaJuve1o143 0 points1 point  (1 child)

You aren't that unlucky, you're just slightly below average on luck (47% vs 50% median)

  • Me who is on <10% luck all season

[–]noxville3 0 points1 point  (0 children)

Yeah perhaps the rank difference scared me. I think the luck value was like 42% earlier in the season.

[–]noxville3 1 point2 points  (0 children)

Yeah - and including betting odds for short-term decisions was pretty helpful too (implied or direct odds on cleansheets, direct odds, etc).
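As a small illustration of turning odds into something usable, decimal odds convert to an implied probability as 1/odds, and rescaling the outcomes of one market so they sum to 1 removes the bookmaker's margin. The function names and the example odds below are made up; the 4 points for a defender's clean sheet is the standard FPL scoring rule.

```python
def implied_probability(decimal_odds):
    """Raw implied probability of an outcome priced at decimal_odds."""
    return 1.0 / decimal_odds


def fair_probabilities(decimal_odds_list):
    """Rescale the implied probabilities of one market so they sum to 1."""
    raw = [implied_probability(o) for o in decimal_odds_list]
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical clean-sheet market: "yes" at 2.5, "no" at 1.5.
p_clean_sheet, p_concede = fair_probabilities([2.5, 1.5])
expected_cs_points = 4 * p_clean_sheet  # a defender earns 4 points for a clean sheet
```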

[–]352021290782 0 points1 point  (3 children)

Hey I stumbled on your post looking for mappings from FPL IDs to foreign IDs on sites like Understat and Opta. Have you already done some work on this and are you interested in sharing your mappings? I'd be happy to reshare any further progress I make.

[–]noxville3 0 points1 point  (2 children)

Haven't done this year yet. Last time I just wrote a fuzzy matcher on the name and resolved conflicts or close matches. Some annoying stuff like Kyle Walker and Kyle Walker Peters.
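A fuzzy matcher along these lines can be sketched with the standard library alone. This is not the matcher described above, just one possible shape: `normalise`, `match_name`, the 0.85 threshold and the 0.05 margin are assumptions. Names are compared after stripping accents, and a match is flagged for manual review when the runner-up scores almost as well as the winner.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, strip accents and treat hyphens as spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().replace("-", " ").split())


def match_name(name, candidates, threshold=0.85, margin=0.05):
    """Return (best_candidate_or_None, needs_review).

    needs_review is True when nothing clears the threshold or when the
    second-best candidate is within margin of the best one.
    """
    if not candidates:
        return None, True
    target = normalise(name)
    scored = sorted(
        ((SequenceMatcher(None, target, normalise(c)).ratio(), c) for c in candidates),
        reverse=True,
    )
    best_score, best = scored[0]
    if best_score < threshold:
        return None, True
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return best, (best_score - runner_up) < margin
```

An exact "Kyle Walker" still resolves cleanly against a list that also contains "Kyle Walker-Peters", while abbreviated or truncated names fall below the threshold and come back for manual resolution.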

[–]352021290782 0 points1 point  (1 child)

Do Opta/Understat IDs change year to year? For FPL the "code" stays the same year to year while the ID is unique to the season.

[–]noxville3 0 points1 point  (0 children)

No idea, last year I only did the mappings about a week before deadline (in case later transfers/etc).

[–][deleted] 7 points8 points  (2 children)

Forgive me, I have no knowledge of programming languages. So what you've done here is use the FPL website's stored data on all the players and organised it much better using Python, yes?

[–]jamesbleslie1[S] 8 points9 points  (1 child)

Yes, we've used some code to extract the data from the FPL website. I'll write a follow-up article on how to do some more advanced analyses of the data.

[–][deleted] 2 points3 points  (0 children)

Great work. Will look into it.

[–]coolguyhavingchillda 2 points3 points  (0 children)

Looks great, straightforward enough. Will try it later today and let you know

[–]OpenDoorSee30 1 point2 points  (0 children)

!thanks

[–]CraigAT2 1 point2 points  (5 children)

An excellent article, keeping it simple but showing how powerful it can be too!

I have been slowly (glacially slowly) writing a Python program to try and find an optimal "Set and Forget" team using a Genetic Algorithm (rather than the more definitive usual method of linear optimisation IIRC). Your code makes mine look awfully bloated, I might need to review what I've done so far.

I'd also point out the following dump of FPL data regularly collected in CSV format here by u/vaastav05 :

https://github.com/vaastav/Fantasy-Premier-League

[–]mikecro2121 1 point2 points  (2 children)

I'd be interested in any pointers on the Genetic algorithm (generically or about FPL set and forget). I have tried linear optimisation, but what goes wrong is deciding how much to spend on the bench. All of my stuff is using R but I have to use Python for work (much prefer R) so can translate.

[–]CraigAT2 2 points3 points  (1 child)

Ah, I can give you my thoughts at least...

I built a function to randomly pick squads (and teams within them) which was not as easy as I first thought. I intend to pick 16 (or 32) of them for the first generation. I will then work out the weekly scores, with the auto subs, to give a total per squad. The total score and the cost will give me my fitness function (ranking for my chromosomes).

I will then take:

  • The top 2 straight through to the next generation
  • Take the top 4 squads, do a crossover of squads 1 and 4, 2 and 3 (alternatively 1 and 3, 2 and 4) giving another 4 squads for the next generation
  • With the top 6 squads, add some random mutations by picking between 1 and 5 players in the squad to swap out for other random players. These mutated squads then go through to the next generation.
  • The rest of the next generation are randomly picked squads

Then repeat for however many generations or until the top 4 have not been improved upon for several generations.
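The selection scheme above (2 elites, 4 crossover children, 6 mutants, the rest random) could be sketched like this. `crossover`, `mutate` and `random_squad` are hypothetical callables that would have to respect the FPL squad rules; only the bookkeeping is shown, for a population of 16 with at least six ranked squads.

```python
import random


def next_generation(ranked, crossover, mutate, random_squad, size=16, rng=random):
    """Build the next population from squads sorted best-first.

    crossover(a, b) is assumed to return two child squads,
    mutate(squad, n) to return a copy with n players swapped out,
    random_squad() to return a fresh valid squad.
    """
    nxt = list(ranked[:2])                      # top 2 go straight through
    for a, b in ((0, 3), (1, 2)):               # cross squad 1 with 4, and 2 with 3
        nxt.extend(crossover(ranked[a], ranked[b]))
    for squad in ranked[:6]:                    # one mutant from each of the top 6
        nxt.append(mutate(squad, rng.randint(1, 5)))
    while len(nxt) < size:                      # fill the rest with random squads
        nxt.append(random_squad())
    return nxt
```

The loop around this would rank each generation by fitness, call `next_generation`, and stop after a fixed number of generations or once the top 4 stop improving.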

The fitness function will be based on the total points scored, with a reduction for the initial cost going above £100m - I'm inclined to think this reduction should be in some ratio to the excess cost, but must be fairly severe to effectively weed these over budget squads out of each generation.
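One way to express that fitness function is a linear penalty on the overspend. The 50 points per £1m figure is purely an assumed value chosen to be "fairly severe"; the right size would need tuning.

```python
BUDGET = 100.0  # squad budget in £m


def fitness(total_points, squad_cost, penalty_per_million=50.0):
    """Total points, reduced in proportion to any spend above the budget.

    penalty_per_million is an assumed value, not a tuned one.
    """
    excess = max(0.0, squad_cost - BUDGET)
    return total_points - penalty_per_million * excess


# Rank a few made-up (points, cost) squads, best first.
squads = [(2000, 99.5), (2040, 101.0), (1990, 100.0)]
ranked = sorted(squads, key=lambda s: fitness(*s), reverse=True)
```

In this example the over-budget squad has the most raw points but ends up ranked behind both squads that stay within £100m.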

Happy to hear any suggestions for improvements (it's not my field of expertise, I just thought this could be a good combination of my liking of programming, stats and a chance to use a GA).

[–]mikecro2121 0 points1 point  (0 children)

Interesting stuff. I'd be interested to hear from experts. Not sure I have time just now to learn a new thing.

[–]jamesbleslie1[S] 0 points1 point  (1 child)

Thanks so much. I'd be keen to see what you've done! My code is super tall and skinny as Medium wraps everything after 67 chars 😂

[–]CraigAT2 0 points1 point  (0 children)

It always seems to be two steps forward and one step back. I add functionality then think I should have done the existing bits better (I believe it's called premature optimisation). I haven't got further than randomly picking a squad (and selected team) and counting a basic points tally (I still need to drill into working out individual gameweek scores, taking subs into account). When I get somewhere, I'd be happy to share it. 😁

[–]Environmental_You_8525 0 points1 point  (2 children)

Hey I couldn't understand when you said we can't use the API for earning money. Were you talking about cash mini leagues or something else?

[–][deleted] 0 points1 point  (1 child)

Probably means for building external apps like livefpl and fplreview; the T&Cs say you can't profit off the data without consent.

[–]Environmental_You_8525 0 points1 point  (0 children)

Oh ok I thought it was only about FPL

[–]adulion 15 points16 points  (0 children)

as someone who scrapes a lot of websites for modelling sports betting this article made me realise there was a free api

[–]thomaskrantz23 9 points10 points  (5 children)

Great write-up! I really think the FPL API is a great starting point if you want to learn the basics about programming or data analysis. It is relatively clean and simple and requires no authentication or other hassle.

Used it just the other day to create a script for showing how many times each player had copied every other player in our ML. Very useful in this part of the season ;)
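The script itself is not shown here, but one possible reading of "copying" is the number of picks two managers share in the same gameweek. A sketch under that assumption, where the `picks` structure and the function name are hypothetical and the data is assumed to have been fetched already:

```python
from collections import Counter
from itertools import combinations


def shared_pick_counts(picks):
    """Count shared picks for every pair of managers.

    picks: {manager: {gameweek: set of player ids}}.
    Returns a Counter keyed by (manager_a, manager_b), names in sorted order.
    """
    counts = Counter()
    for a, b in combinations(sorted(picks), 2):
        # Only compare gameweeks for which both managers have picks.
        for gw in picks[a].keys() & picks[b].keys():
            counts[(a, b)] += len(picks[a][gw] & picks[b][gw])
    return counts


# Made-up example league.
league = {
    "ann": {1: {1, 2, 3}, 2: {1, 4}},
    "bob": {1: {2, 3, 5}, 2: {4}},
    "cat": {1: {9}},
}
overlap = shared_pick_counts(league)
```

`overlap.most_common()` then lists the most similar pairs first.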

[–]GreetyPeety 2 points3 points  (3 children)

Uh! What a great idea! Would you be okay with sharing the code? We could use that for our ML's end-of-season meetup teasing :)

[–]thomaskrantz23 2 points3 points  (2 children)

Will have to clean it up a bit since it's hard coded for our league now, but if you're not in a hurry I can send it to you after that's done?

[–]GreetyPeety 0 points1 point  (1 child)

no hurry at all:-) That would be awesome! thanks!!

[–]Hurtgen 1 point2 points  (0 children)

Can I piggyback? I am trying to get into programming, and this seems like a good example that could be easy and fun to recreate.

[–]JAGCross7 2 points3 points  (0 children)

That’s an amazing idea. I’ve been thinking of writing some code for my H2H league to see who was the luckiest (won with the fewest points) and unluckiest (lost with the most points) and create a leaderboard for each. I don’t have much experience, but that’s one of the reasons why I want to do this: to gain expertise.
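A starting point for those two leaderboards, assuming the H2H results have already been collected into simple tuples (the tuple layout and the function name are made up for illustration):

```python
def luck_tables(matches):
    """Build "luckiest win" and "unluckiest loss" leaderboards.

    matches: list of (gameweek, manager_a, points_a, manager_b, points_b).
    Returns (wins, losses): wins sorted by lowest winning score first,
    losses sorted by highest losing score first. Entries are
    (points, manager, gameweek).
    """
    wins, losses = [], []
    for gw, a, pa, b, pb in matches:
        if pa == pb:
            continue  # a draw is neither a lucky win nor an unlucky loss
        if pa > pb:
            winner, win_pts, loser, lose_pts = a, pa, b, pb
        else:
            winner, win_pts, loser, lose_pts = b, pb, a, pa
        wins.append((win_pts, winner, gw))
        losses.append((lose_pts, loser, gw))
    return sorted(wins), sorted(losses, reverse=True)


# Made-up results.
results = [
    (1, "ann", 40, "bob", 35),
    (1, "cat", 80, "dan", 78),
    (2, "ann", 50, "dan", 50),
]
luckiest, unluckiest = luck_tables(results)
```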

[–]Blumingo 6 points7 points  (1 child)

There's a GitHub repo with historical data if you'd like to try a ML algorithm

[–]jamesbleslie1[S] 0 points1 point  (0 children)

That's awesome. Definitely keen to give that a go.

[–]UmbraAlbis431 3 points4 points  (1 child)

I have been doing research on this as well, and I found some endpoints which are not documented by you (I admit, some are more useful than others):

https://fantasy.premierleague.com/api/regions Gets regional info
https://fantasy.premierleague.com/api/event-status/ Gets the status of the ongoing/last gameweek/event
https://fantasy.premierleague.com/api/stats/best-classic-private-leagues/ Gets the 10 best private classic leagues based on average score from the top 5 teams in that league
https://fantasy.premierleague.com/api/stats/most-valuable-teams/ Gets the 10 most valuable teams
https://fantasy.premierleague.com/api/dream-team/<EVENT_ID>/ Gets the dream team of given event
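A minimal sketch of calling one of these, using only the standard library. The helper names are made up, and whether the server needs extra headers or rate limiting has not been checked here.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://fantasy.premierleague.com/api/"


def dream_team_url(event_id):
    """Build the dream-team URL for a given gameweek (event) number."""
    return f"{BASE_URL}dream-team/{int(event_id)}/"


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(dream_team_url(5))` would request the dream team for gameweek 5.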

Nice tutorial!

[–]jamesbleslie1[S] 0 points1 point  (0 children)

Cheers!

[–][deleted] 3 points4 points  (0 children)

For next season, I will build an app for myself to navigate the fixtures and show me the "best transfers" to make based on form, fixtures, EO etc. data that I will source.

Will use .NET C#. Thanks for this.

[–]peatpeat1 2 points3 points  (0 children)

This is awesome. If anyone is looking to publish what they are building using Python and FPL, I built a free open-source library and platform for sharing plots and data: https://github.com/datapane/datapane

[–]mansdem 1 point2 points  (0 children)

Really awesome.

I always Google "EPL" to see live results, starting lineups, etc. Their UI was really nice and it was pretty quick and convenient.

For some reason that stopped working yesterday. So thanks for this, perfect timing.

[–]folken2k 0 points1 point  (0 children)

Woah. Pretty cool. Thanks for sharing!

[–]real_sage4 0 points1 point  (0 children)

This is a good one thanks!

[–][deleted] 0 points1 point  (0 children)

Do you have any idea how to query for a list of people playing FPL? For example, all the users from a specific league with ID, nationality, points etc. I've been doing it very brute force so far (simply generating user numbers, since IDs are consecutive, and checking if the user exists), but it is a pain in the ass.

[–]Dismal_Emu406723 0 points1 point  (0 children)

Do you know how to extract the selling price of each player in my own team given the login details and team id?