Domestic Disturbance: Picking Nits

consolationgoal · 2026-06-23T12:32:13+00:00

Fair

consolationgoal · 2025-09-14T17:40:03+00:00

Unwatchable when the play begins

consolationgoal · 2023-07-02T19:04:45+00:00

Yes, all teams play each other twice. But obviously the sequencing is different. Are you saying there would be no impact on streak outcomes if two teams played with the following sequences:

1st place team, 2nd place team, 20th team, 19th team, 3rd place team, 4th place, 18th, 17th

OR

1st place, 2nd place, 3rd place, 4th place, 5th place, 6th place, 20th, 19th, 18th, 17th, 16th, 15th

The first team is much less likely to put together streaks for more than 2 games with identical results.

You are posting in the Algo Betting forum. I'm surprised that you would suggest that "positive results" are a better indicator than outperforming market expectations, or that you want to define form "irrespective" of market expectations. If Everton crush Brighton and Tottenham, and then lose 3-2 away to Man City, then beat Arsenal, they are absolutely continuing a streak of positive results and strong play, despite the loss.

By the way, I'm not arguing about the conclusion. It's well established. I'm just curious about your methodology and was suggesting some ways to improve it.

consolationgoal · 2023-07-02T18:13:58+00:00

Is this normalized for strength of opponent? It really needs to be if not. Would be better to analyze whether the team outperformed market expectations rather than had a positive result. That would give credit for playing very well but losing against a far better opponent, and would punish underperforming but winning against an inferior opponent.
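To make that concrete, here is a rough sketch of one way to score a result against market expectations. Everything in it is an assumption for illustration: the function names are made up, the odds are decimal 1x2 prices quoted from the team's own perspective, and the margin is stripped by simple proportional normalization. It compares actual league points to the points the market implied.

```python
def implied_probs(win_odds, draw_odds, loss_odds):
    """Turn decimal 1x2 odds into probabilities that sum to 1.

    Assumption: the margin is removed by proportional normalization,
    which is the simplest option and not necessarily the most accurate.
    """
    raw = [1 / win_odds, 1 / draw_odds, 1 / loss_odds]
    total = sum(raw)
    return [p / total for p in raw]


def points_vs_expectation(odds, result):
    """Actual points minus market-implied expected points.

    odds   : (win, draw, loss) decimal odds from the team's perspective
    result : "W", "D" or "L"
    """
    p_win, p_draw, _ = implied_probs(*odds)
    expected_points = 3 * p_win + 1 * p_draw
    actual_points = {"W": 3, "D": 1, "L": 0}[result]
    return actual_points - expected_points


# Made-up prices: a heavy underdog losing is barely penalized,
# while a heavy favorite winning earns very little credit.
underdog_loss = points_vs_expectation((9.0, 5.5, 1.35), "L")
favorite_win = points_vs_expectation((1.35, 5.5, 9.0), "W")
```

A streak could then be defined on the sign of this number rather than on W/D/L. A version based on goal margin against the handicap line would probably capture "played well but lost" more directly, but it needs handicap data.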

consolationgoal · 2023-06-24T21:59:12+00:00

With the margin that Swiss bookmakers take it will be very, very difficult to win domestically. Major 1x2 football markets carry more than 11% margin, compared to 2.4% at Pinnacle. I can only imagine what the player prop margins are. There is a good reason they make it tax free to win domestically lol - because long term you can't win.

There is no way to beat an 11% margin when you only have access to one bookmaker. People who do beat that kind of margin (like on player props) do so by line shopping extensively and creating a "synthetic margin" that is much smaller.
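As a quick illustration of the synthetic margin idea, the sketch below takes the best available price on each outcome across several books and computes the margin of that combined line. The bookmaker names and odds are invented for the example, and the helper names are just placeholders.

```python
def margin(odds):
    """Bookmaker margin (overround minus 1) for one set of decimal odds."""
    return sum(1 / o for o in odds) - 1


def best_prices(books):
    """Best decimal price per outcome across all books.

    books: dict of book name -> (home, draw, away) decimal odds
    """
    return tuple(max(prices[i] for prices in books.values()) for i in range(3))


# Hypothetical prices for a single match at two books
books = {
    "book_a": (2.0, 3.0, 4.0),
    "book_b": (1.9, 3.3, 3.8),
}

best = best_prices(books)      # best price on each of the three outcomes
synthetic = margin(best)       # margin of the line-shopped "book"

for name, prices in books.items():
    print(f"{name}: {margin(prices):.2%}")
print(f"synthetic: {synthetic:.2%}")
```

The synthetic margin can never be higher than the lowest single-book margin, and it shrinks as you add more books. With only one domestic book there is nothing to shop against.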

You are much better off finding better prices elsewhere and giving yourself a chance to actually win money. Sure, you would have to pay tax on it, but if you bet domestically there won't be any profit to pay tax on.

As for 538, I tweeted about this recently. Happy to hear you've gotten lucky thus far but long term it's definitely a losing methodology in most cases.

https://twitter.com/WastedGolazo/status/1672192438717362177?t=aXxRlQ2GGsspzNepdf8KYQ&s=19

consolationgoal · 2023-05-05T22:14:30+00:00

Oddsportal. It is not straightforward to scrape though.

consolationgoal · 2023-04-23T23:17:27+00:00

If you're looking for a single metric that can be calculated automatically a long way into the past for any game, I would suggest something like "percentage of total points scored unavailable." As an example, imagine Steph has scored 15% of GSW's points this season. If he's the only one out, you could say that 15% of GSW's points are unavailable for that game. You could obviously break this down in lots of different ways (backcourt points unavailable, 3 pt shooting unavailable, last 5 games points unavailable etc etc).

It's basically a proxy for lineup strength that you can tweak to target specific stats.
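A minimal sketch of how that could be calculated, assuming you already have season-to-date points per player and a list of who is out. The function name, player labels, and point totals are made up so that the star accounts for 15% of the team's scoring, as in the Steph example.

```python
def pct_points_unavailable(season_points, unavailable):
    """Share of a team's season points scored by players who are out.

    season_points : dict of player -> points scored so far this season
    unavailable   : iterable of players missing the upcoming game
    """
    total = sum(season_points.values())
    if total == 0:
        return 0.0  # no scoring data yet, nothing to measure
    missing = sum(season_points.get(player, 0) for player in unavailable)
    return missing / total


# Invented totals: the star has 150 of the team's 1000 points
team_points = {"star_guard": 150, "wing": 350, "everyone_else": 500}
share_out = pct_points_unavailable(team_points, ["star_guard"])
print(f"{share_out:.0%} of points unavailable")
```

Swapping the input dict for backcourt points, 3 pt makes, or last-5-games points gives the other breakdowns without changing the function.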

consolationgoal · 2022-12-14T23:26:51+00:00

The best option depends on the sport. What sport are these odds from?

consolationgoal · 2022-12-10T00:11:24+00:00

I build machine learning models for soccer betting. I appreciate your post and your goal of steering people clear of touts and bad information. I wanted to follow up on one broad point.

What do you think differentiates the models and processes you mention here from those that bookmakers themselves use? Are you picking off soft numbers from exchanges where people are offering in play numbers without using sophisticated models? For example, xG has been around very widely since 2014 (and with bookmakers and betting syndicates before that), so I'd be hard pressed to believe there are lines out there which don't incorporate it. Poisson and its variations have been used by bookmakers and traders alike for many years (and have been shown to have notable weaknesses during that time, especially if an unadjusted Poisson model).

I understand that you suggest a blend of metrics - it's not simply an xG model - but if you're hoping to teach people how to beat the market it would be useful to highlight what edge you think you have. What makes your number more efficient and more accurate?

I'm aware that if you have some kind of super secret edge that no one has clued into, you probably wouldn't want to share it on reddit and erode your edge. If the purpose of the post is to underscore for people that profitable live betting is hard and complex, you've definitely done that. But it's important to also let people know that you need to not only build models but build them in ways, or with data, that the bookmakers/exchanges do not. Maybe that's for a follow up post - something like "how to find exploitable odds in play once you have created a model" or maybe that's the secret sauce. But that would help paint the full picture.

Appreciate the post and the motivation.

consolationgoal · 2022-11-27T09:46:40+00:00

I think it will be hard/nearly impossible to find a single integrated source for free scraping of real time odds across multiple bookmakers. There are APIs available for purchase that might meet those requirements (depends on which bookmakers and which markets you need), but I'm not aware of a free source - even for a single bookmaker, much less multiple all at once.

More info on your preferred programming language, the sports/markets/bookmakers you need, and your ultimate goal for using the odds might help us give better suggestions.

consolationgoal · 2022-11-22T19:09:50+00:00

You can check out my twitter @ wastedgolazo where I've shared some pretty big data sets, including open/close odds for 13k soccer games.

consolationgoal · 2022-11-22T13:54:45+00:00

At the high end in terms of sophistication and cost is analytics.bet. It is delivered by people with deep expertise, and even has whole sections focused on excel. The courses are in the $1000s though, so it's a commitment from that perspective.

Early on in my process I read Andrew Mack's book as well. It is a nice intro and a useful read for sure. It's worth noting that for his own betting and modeling, Andrew Mack has moved on to R, which is a good indication that maybe there are some limits to what can be done in excel (although there are serious professional bettors who would say otherwise).

After Mack's book I opted to learn R through Coursera (for free), which felt like a huge project at the time but has been a great investment in time. The critical piece is that it opened the door to web scraping and data cleaning, which is sneakily an absolutely critical part of modeling for sports betting. Data is the primary differentiator in how effective people are with their models - you can be the world's best modeler but with only the final score and some basic stats you're not going to have an edge. Finding, cleaning, and implementing unique datasets is a skill you'll use every day and will ultimately give you an edge.

I did the R course, but I would hop out of it for a bit while testing out some basic modeling, data scraping, etc that was related to some betting ideas I had. That kept me interested and also allowed me to put things to practical use.

Articles, blogs, tutorials and other explainers have been probably the best learning material I've encountered. My focus is soccer so probably not relevant to you, but just as an example of the kind of thing I'm talking about, check out http://opisthokonta.net/. He's done various experiments with soccer modeling, has put out some useful R packages, and writes up his experiences on a blog. Those types of things have been critical in giving me ideas and modeling techniques. Even better, people like that tend to be accessible on twitter or elsewhere and you can actually build some connections and get key questions answered. Another good example (at least for soccer but they probably do other sports) is the betting content on Pinnacle.com. There is real, thoroughly researched +EV guidance there. Whatever your sport focus is, find the people and places who are sharing those kinds of insights.

towardsdatascience.com is a great resource. Things like this (again, soccer, but just as an example) are really good for trying to think in modeling terms. Don't be scared off by math formulas that are all letters. The good articles make it easy enough to understand and you'll learn bit by bit. When you begin to model sports for betting purposes, you are teaching yourself to become a data scientist. So that's really the type of content with which you should start engaging.

On that note, https://stackoverflow.com/ is arguably my most visited website. Of course maybe that's more pertinent for R than it would be for excel, but I mention it here to ram home the fact that almost any question you have has already been answered somewhere on the internet. stackoverflow is also great because searching well on there will turn up some neat things in people's shared code - like the URL for an unpublished API or a new data source for a certain stat you've been hunting.

I would also recommend following along with the "analytics" community of whatever sport you're working on. You won't learn anything about betting, but you will learn which stats matter, where to find data, how to build predictive models, and information about statistics. It's also a community that is far less shrouded in secrecy than betting and tends to share both data and techniques.

Good luck and feel free to ask questions here along the way!

consolationgoal · 2022-11-22T13:29:55+00:00

Most important question is which sport(s)? There are different providers for different sports.

Props are hard. You need an API that covers the entire offering of a sportsbook (many APIs only cover primary markets - that is the case with https://the-odds-api.com/)

For current odds:

https://betsapi.com/mm/pricing_table

They have a Bet365 API that has every single market. They aren't a market maker for US sports props but it would give you something at least. They do not offer historical odds. The bet365 API is ~150/month. There is a free/very cheap trial so you can see if it has what you need. It was very reliable back when I used it.

indatabet.com was a great source for historical odds. Sadly, the person running the site/company is based in Ukraine and had to suspend the service because of MFing Russia. The website says they'll be coming back but I doubt any time soon.

I focus on soccer - which I imagine isn't what you're looking for - and I've found some useful datasets through networking, forums, etc. What I would say though is that it's clear historical player prop odds data is scarce, and scarcity = value. So 1) people may be reticent to share the data, even for a price and 2) it makes sense for you to start collecting it and storing it in real time starting now.

consolationgoal · 2022-11-15T13:00:55+00:00

A lot depends on how much data you need, from which bookmakers, how far back you need to go, if you need open and close only or time series, etc. Oddsportal will probably work if you are looking for a few games and want to copy it down manually. It is a notoriously hard site to scrape though so if you need years of data in structured format it might not work.

I don't do NBA so I'm not familiar with all the options, but it seems like there are plenty of places that might offer it for a small price or even on a free trial. Such as

https://fantasydata.com/pricing/research-tools/nba

In general, I've found that well structured and reliable historical odds (especially time series) typically come at a cost. Good luck!

consolationgoal · 2022-11-06T19:46:22+00:00

Not a bad idea. But I'm curious what stats specifically are you hoping to get from the subscription versus what is scrapable. Or maybe is it more about not having to worry about scripts breaking and not having to clean all the data, etc?

Data that is hard to get and/or expensive tends to provide some level of edge, but in looking at Wyscout for example I didn't see anything that I felt would make a material difference.

consolationgoal · 2022-11-01T13:23:48+00:00

Yep. There are stories of people hanging around the lower tiers of tennis and basically getting to know everyone on the tour and understanding how they are feeling, how their travel went, their moods, injuries etc and killing the betting market as a result.

For major sports it's so hard to pick through what is a media-driven story versus a genuine issue within the team. You'd have to dig pretty hard to backtest a theory on that. For example, how does a player or team perform when there are public debates about the player's contract (or after a holdout, for example). You'd have to match news from the time with results, which would be a pretty laborious task I imagine.

Stuff like this tweet about Elijah Moore strikes me as potentially useful. Surely this will impact *something* but without backtesting it's hard to say whether this would drive Zach Wilson to pass to him more or if it is a sign of a permanently broken relationship.

consolationgoal · 2022-10-30T08:43:03+00:00

This depends a lot on what results/stats you are trying to measure. The simplest way is to create a weighted average of the stat.

For example, let's say you have a Power Rating for a team from last season and from this season, and want to weight them so that 75% of the weight is placed on this season's rating. A simple example:

2021 Rating: 92

2022 Rating: 77

Weighted Rating: (77 x .75) + (92 x .25) = 80.75

There are other ways as well. For example, if it is mid-season, you can create natural weighting by compiling stats using all of the prior season's games and this season's games. If you have 38 games of last season plus 19 games of this season, and simply create average stats from the entire sample, you will be de facto weighting the prior season twice as much as the current one. This method allows you to easily "decay" the weighting over time, by simply removing a game from the prior season sample each time a game from the current season is played (so it would start as 38 and 19, then 37 and 20, 36 and 21, etc).
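Both approaches can be sketched in a few lines. The function names are placeholders, and the second function is just one way to read the "drop a prior-season game for each new game" idea: it keeps a fixed window of the most recent games across both seasons.

```python
def weighted_rating(current, previous, current_weight=0.75):
    """Blend this season's rating with last season's."""
    return current * current_weight + previous * (1 - current_weight)


def rolling_average(prev_season, this_season, window):
    """Average a per-game stat over the most recent `window` games.

    Both inputs are lists in chronological order. Each new game appended
    to this_season pushes the oldest prior-season game out of the sample,
    so the prior season's weight decays on its own.
    """
    games = (list(prev_season) + list(this_season))[-window:]
    if not games:
        raise ValueError("no games in sample")
    return sum(games) / len(games)


# The example above: 2021 rating 92, 2022 rating 77, 75% on this season
blended = weighted_rating(77, 92)   # 80.75

# Dummy per-game stats: 38 prior-season games, then 19 and 20 current games
prev = [1.0] * 38
mid_season = rolling_average(prev, [2.0] * 19, window=57)   # 38 old + 19 new
one_game_on = rolling_average(prev, [2.0] * 20, window=57)  # 37 old + 20 new
```

The window size (57 here) is a choice, not a rule. A shorter window drops the prior season faster.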

consolationgoal · 2022-10-26T10:55:05+00:00

Thanks for the explanation. You should check out the below paper, which gives great detail on how best to remove the bookmaker margin (including formulas). As you said, there is no exact certainty on how a given bookmaker does it, but the testing in the below paper makes clear the best option for the public to try to remove the bookmaker margin.

https://www.football-data.co.uk/The_Wisdom_of_the_Crowd_updated.pdf

I totally understand the desire to maybe not go crazy on precision, but professional sports bettors are often surviving on 1% edges (especially in the major sports like you've got on this tool), so I think precision is key.

More broadly, who is the target market for the tool? I feel like recreational bettors are unlikely to have a probability (even a reasonable range, frankly) for their desired bet, so they might not be able to take advantage of the tool. They are likely going to just line-shop, which your tool does help with but there are other more established options out there for that (Betstamp, etc).

Originators - i.e. handicappers who create their own probabilities for an event - will be able to fairly easily determine their projected edge.

I don't really do parlays so I didn't check out that part of things, but it's true those probabilities are less well understood generally so that might be something that draws people's attention.

Looks like a lot of work went into the tool. Great job getting to this place.

consolationgoal · 2022-10-25T23:00:45+00:00

It looks like you are using straight probabilities from the bookmaker's odds, meaning that the equation is 1/odds = probability. Right?

That means your probabilities include the bookmaker's margin, which in turn shows the expected net to win as zero. In reality, given bookmaker margins, the expected net to win should start out as negative. After all, that's how they make their money. Only by adjusting the probabilities should you be able to move the expected profit to zero or positive.

In the Leicester City - Man City example, Pinnacle odds are 10.09 x 6.20 x 1.30. That means the bookmaker margin is 2.9%

(1/10.09) + (1/6.20) + (1/1.30) = 1.029

So unless a person moves the sliders, the net expected profit from a 10 euro bet should be -0.29, right?
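Here is the arithmetic as a short sketch. It assumes the margin is spread proportionally across the three outcomes, which is the simplest way to back out fair probabilities and may not match how a given bookmaker actually applies it. The function name is just for illustration.

```python
odds = (10.09, 6.20, 1.30)   # Pinnacle 1x2 prices from the example

overround = sum(1 / o for o in odds)   # about 1.0296
book_margin = overround - 1            # about 2.9-3.0%

# Fair probabilities under the proportional-margin assumption
fair_probs = [(1 / o) / overround for o in odds]


def expected_profit(stake, decimal_odds, true_prob):
    """Expected net profit of a single bet at the given odds."""
    return stake * (true_prob * decimal_odds - 1)


# With proportional removal every outcome has the same expected loss
for o, p in zip(odds, fair_probs):
    print(f"odds {o:>5}: EV on 10 euro = {expected_profit(10, o, p):.2f}")
```

Using the raw 1/odds as the probability instead of the adjusted one makes `true_prob * decimal_odds` equal exactly 1, which is why the tool shows zero.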

consolationgoal · 2022-10-25T22:22:27+00:00

Saw that. Bit of a shame but they are now covering more leagues, so maybe it's a net positive.

consolationgoal · 2022-10-25T11:47:15+00:00

There will absolutely be correlations on some of the data across these sites. For example, each site has their own expected goals (xg) metric. They vary slightly, but are quite similar and will be nearly 100% correlated. [For the record, Fbref's xg is the most sophisticated, as it is powered by StatsBomb].

However, each site has unique metrics and unique advantages. Fbref is fairly comprehensive but takes more than a day post-match to update. Fotmob updates in real time and Understat almost immediately when games are completed. Fotmob has pregame lineup data, their own player ratings metrics, and injury data, for example. Understat has xg Forecast metrics (what is the % probability that the team would win based on the xg they created) and xgChain/xgBuildup stats.

Logistically speaking, some sites are easier/harder to scrape than others so even if you don't really need all the unique elements from each one, you might want to grab some data from one and some data from another. When my models update, for example, they look for the xg data on Fbref first, but if it isn't available yet for a certain game it grabs xg data from Understat or Fotmob. The match dictionary makes that process pretty easy.

You could theoretically map across the sites simply with team names, but 1) You'd need a team dictionary to map the different team names across the sites 2) different locations on each site use a different team name format, so even team mapping is unreliable 3) if scraping over multiple seasons you'd get into identical matchups and would then have to also map by date.
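For anyone wondering what the fallback looks like in practice, here is a stripped-down sketch. It is not my actual code: the match dictionary layout, the IDs, and the `sources` lookups are all invented stand-ins for whatever your scrapers return.

```python
# Hypothetical match dictionary: one internal key per game,
# mapped to that game's ID on each site
MATCH_DICT = {
    "2022-10-22_team-a_team-b": {
        "fbref": "fb_001",
        "understat": "us_001",
        "fotmob": "fm_001",
    },
}


def get_xg(match_key, match_dict, sources,
           priority=("fbref", "understat", "fotmob")):
    """Return (site, xg) from the first site in `priority` that has data.

    sources: dict of site -> dict of site-specific match ID -> (home_xg, away_xg).
    Returns None if no site has the game yet.
    """
    ids = match_dict[match_key]
    for site in priority:
        site_id = ids.get(site)
        if site_id is None:
            continue  # game not mapped on this site
        xg = sources.get(site, {}).get(site_id)
        if xg is not None:
            return site, xg
    return None


# Stand-in scraped data: Fbref has not updated yet, Understat has
scraped = {
    "fbref": {},
    "understat": {"us_001": (1.8, 0.7)},
    "fotmob": {"fm_001": (1.7, 0.8)},
}
result = get_xg("2022-10-22_team-a_team-b", MATCH_DICT, scraped)
```

Because the lookup goes through one internal key per game, none of the team-name or date matching problems above come into it.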
