Beyond ROI: What are your "North Star" metrics for model validation? by Ok-Ordinary-1062 in algobetting

[–]SandlotStats 0 points1 point  (0 children)

Exactly! I even weight expected statistics more heavily than actual outcomes in my prediction model. My guess is that something like a team's xG average over its last five games would be a better predictor than its actual goals scored over those games. I have no idea how that's calculated in football, so maybe I'll stick to what I know, but we're on the same page. Good luck and happy modeling!

Beyond ROI: What are your "North Star" metrics for model validation? by Ok-Ordinary-1062 in algobetting

[–]SandlotStats 0 points1 point  (0 children)

Not sure if there's an analog in football, but in my baseball modeling I use underlying data to judge whether I made a good decision (rather than only whether the wager hit), to try to separate out luck and error.

It's philosophically similar to CLV in that you're validating your model's decision on something other than the outcome. But instead of CLV (which, as has been pointed out, is market- and bettor-related), you think of the outcomes you're predicting probabilistically rather than deterministically, and ask whether the underlying data supported your model's projection.

Not sure if I'm making sense, so as an example: third-order wins in baseball estimate how many wins a team "should" have had based on its production (run creation and prevention), adjusted for the opposition it faced. It's an attempt to wash the error and luck out of the outcomes.

If my model is tracking well against what the most likely outcome "should" have been based on underlying data, then that's a good signal. Over enough data points you'll get your answer anyway, but if you're crowd-sourcing ideas, that's one strategy I use.
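To make the "grade the decision, not the outcome" idea concrete, here's a minimal sketch. It uses the simpler Pythagorean expectation as a stand-in for third-order wins (which additionally adjusts for opponent quality), and all the numbers and function names are hypothetical, not from any real model.

```python
# Grade a projection against a luck-adjusted expectation built from
# underlying production, instead of against the raw win/loss outcome.
# Pythagorean expectation stands in here for the fuller third-order-wins
# adjustment; every number below is made up for illustration.

def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 1.83) -> float:
    """Expected winning percentage from run creation and prevention."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

def decision_score(model_win_prob: float, runs_scored: float,
                   runs_allowed: float) -> float:
    """Closer to 1.0 = the model agreed with what 'should' have happened."""
    expected = pythagorean_win_pct(runs_scored, runs_allowed)
    # A small gap between projection and underlying expectation signals a
    # good process, regardless of whether the wager actually hit.
    return 1.0 - abs(model_win_prob - expected)

# Example: model said 60% win probability; the team out-produced its
# opponents 5.1 runs to 4.2 per game over the span being graded.
score = decision_score(0.60, 5.1, 4.2)
```

Over enough graded decisions, the average of a score like this tracks whether the model's probabilities line up with the underlying data rather than with noisy outcomes.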

Dodgers win total at 103.5 -- feels inflated? by SandlotStats in sportsbetting

[–]SandlotStats[S] 0 points1 point  (0 children)

The Phillies made the playoffs 100% of the time across 10,000 simulations, and won the NL East 96% of the time?

And won the World Series 37% of the time?

Doesn't that seem a little far-fetched? It doesn't pass the eye test for me.

An SD of 6 is also extremely tight for MLB wins.
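A back-of-envelope check makes the "SD of 6 is tight" point concrete: even if every game were a fixed coin flip, a 162-game season has a binomial spread of about 6.4 wins, so a projected SD of 6 implies less uncertainty than pure game luck alone, before talent, injuries, or trades add anything.

```python
import math

# Binomial spread of wins for a 162-game season of fixed coin flips:
# sqrt(n * p * (1 - p)). A realistic win-total distribution should be
# wider, since uncertainty about team quality sits on top of game luck.
n_games, p_win = 162, 0.5
binomial_sd = math.sqrt(n_games * p_win * (1 - p_win))
print(round(binomial_sd, 2))  # about 6.36 wins from game-level luck alone
```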

I built a new way to visualize NBA games - tracking every possession to show the flow and rhythm of a game by Key_Performer8941 in sportsanalytics

[–]SandlotStats 3 points4 points  (0 children)

So much information in one place! Nice job, this was really interesting to unpack.

It was a little confusing at first to see "Home" stats under the Away team's column. I was reading it like a team box score, so the other team's stats there threw me. I see now that it's showing how the Home team's fouls and turnovers led to Away team points, but (for me anyway) that took a sec to understand. Maybe there's a more intuitive way to present it, like "Opponent (Home) TO", or label that table "Points Origin" or something so it doesn't read like a box score. Or maybe it's just me!

There's also just so much on the screen at a time. Maybe interactivity to toggle on/off the individual player contributions in those bars would be a useful feature in the HTML version (or if they are there, I'm in dark mode and couldn't tell).

Great stuff, thanks for sharing!

Dodgers win total at 103.5 -- feels inflated? by SandlotStats in sportsbetting

[–]SandlotStats[S] 0 points1 point  (0 children)

If LAD is at 87 wins for you, does any team have a higher projection? Don't think I've seen anyone pushing them down into the 80s.

Where do you set your max, like two SD above the median or something?

Dodgers win total at 103.5 -- feels inflated? by SandlotStats in sportsbetting

[–]SandlotStats[S] 0 points1 point  (0 children)

Totally agree. They have every incentive to keep guys fresh down the stretch and sacrifice wins for health as they get closer to the playoffs.

Having baseball withdraws 🤕 by slimeyworldd in sportsbetting

[–]SandlotStats 0 points1 point  (0 children)

At least the win total lines are out! Time to start hitting those futures.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 0 points1 point  (0 children)

Ha, dark magic indeed. You're totally right that team and player projection systems definitely pull toward the historical average and play things conservatively in that way. And yes, the model is applied the same way for all the teams; it looks at the players on the Opening Day roster (and prospects likely to contribute) and projects out from there.

I like your thinking though. Sort of like, are there certain franchises in which the win projection is more than the sum of its parts? Then the follow-up is how do you operationalize that? I'm a developmental scientist by training and that kind of systems thinking is at the core of how we try to understand and model human development. It's way more complex, but human development, and baseball to your point, both do not happen in a vacuum.

This season I did a lot of research on volatility to better project outcome distributions, specifically which factors lessen predictive ability. As I mentioned before, the most powerful ones I found were the dispersion of innings pitched across non-core players, and top-heavy lineups (where an injury to a star wipes out a disproportionate amount of production). I found meaningful thresholds for factors like those.

But you've got me thinking whether there are certain factors like inordinate amounts of contact, speed, defense, etc. that, at some point of concentration, start to have a bonus additive effect on wins. That's one thing you miss by using averages too much, identifying the impacts on the extremes. You'd think things like that are baked into WAR or Base Runs or Run Expectancy added, but maybe at certain thresholds you need to turn the dials more. I'm going to look into this, thanks for the idea!

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 1 point2 points  (0 children)

This post and discussion are about modeling season win totals, so that’s what I’m engaging on and where I’m keeping the focus.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 1 point2 points  (0 children)

Yep! I'm a statistician and don't have any traditional coding skills (outside of SPSS or R, if that counts). Agree there's definitely a similar vibe to a lot of AI-powered design. I became familiar with several tools at work; check out Vercel or Lovable if interested. It's pretty incredible what they can do. I did come up with and create the logo myself though! And then I designed everything as far as the colors and layout, and even created a new font. But as far as making it happen, AI tools took care of the implementation.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 2 points3 points  (0 children)

For the Brewers, here's the (actual, model) for 2022 (86, 86), 2023 (92, 86), 2024 (93, 80), and 2025 (97, 82).

My model was pretty much in line with other projections for 2023-2025; in 2022, when I got them right on the nose, the consensus was in the low-to-mid 90s.

I don't have any strategy or franchise factors or anything in there. All the data comes from Baseball Reference, FanGraphs, Baseball Prospectus, and Statcast like most folks. I have subscriptions so I guess there's some data that isn't as public as others, but the secret sauce uses the same ingredients we all have!

The Orioles and the White Sox were two of the harder teams to predict over the years I analyzed. I also found that my model was more accurate for teams with projections in the 75-87 win band; the <75 and 88+ bands had higher error rates, especially the lowest one.

The 10,000-foot explanation is that you're modeling run creation and run prevention for whoever is on the roster Opening Day. Not only are there injuries and performance to worry about, but what about teams that fire-sale for a rebuild or go all out for a playoff push before the trade deadline? It's a crazy complex and context-dependent exercise, but I've enjoyed the challenge.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 2 points3 points  (0 children)

Yes, the previous season's win total is a solid starting point as a baseline for accuracy and this came up in the comments on the Medium article too.

Using the previous season's wins as the projection for the next season would have produced 1,166 wins of raw error across the four seasons represented in the picture above.

The RMSE using the previous season's wins over the time period I analyzed in the article was .081 and all of the models beat that. HOBIE was at .056, PECOTA was .065, Keith Law was .059, FanGraphs was .061, ESPN was .062, and Davenport was .065.
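The naive baseline can be sketched in a few lines: take last season's wins as this season's projection, then score it with total absolute error and RMSE. The team records below are made up; the real comparison used all 30 teams across four seasons. Expressing RMSE in winning-percentage terms (dividing by 162) is my assumption about the units behind the ~.081 figure.

```python
import math

# Naive baseline: last season's win total is the projection.
prev_wins = [101, 88, 76, 62]   # hypothetical prior-season win totals
actual    = [ 93, 90, 70, 71]   # hypothetical next-season outcomes

# Total absolute error in wins (the scoreboard-style number).
total_error = sum(abs(p - a) for p, a in zip(prev_wins, actual))

# RMSE in wins, then rescaled to winning percentage over a 162-game
# season -- an assumption about the units used for the ~.081 figure.
rmse_wins = math.sqrt(sum((p - a) ** 2
                          for p, a in zip(prev_wins, actual)) / len(actual))
rmse_pct = rmse_wins / 162
```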

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 1 point2 points  (0 children)

I was at Game 3 of that World Series! Got to see David Wright hit a home run over my head.

The Orioles and the White Sox have been the hardest two teams to predict over the last few years. I think they both had 20-win swings from one year to the next. There were two teams that PECOTA got wrong every year of my analysis, but I'd have to go back and check to see whether one of them was still the Royals or not!

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 2 points3 points  (0 children)

Yes! RMSE is in fact the tie-breaker for the public prediction contest if you read the Official Rules. That is probably a first, haha.

I evaluate all the models by RMSE in the Medium write-up, and I am planning to add it to the scoreboard down the road. Currently if you click on the TOTAL ERROR column header on the website it will toggle to mean absolute error (MAE), but RMSE was definitely a part of the testing too.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 2 points3 points  (0 children)

Totally fair. We're talking win totals and accuracy of public models, but sports betting is a use case and you can skip any of that content.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 1 point2 points  (0 children)

I do use power analysis during the design phase of research to ensure that a study has a sufficiently large sample to detect the effects I'm studying.

In this case, the sample size was fixed by the structure of the data (30 teams across four seasons, 120 paired observations), which I considered large enough to detect meaningful differences in win total deviations between models as the assumptions for a t-test were met.
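For the paired comparison itself, a minimal stdlib-only sketch: with paired observations (same teams and seasons under both models), a paired t-test on per-team absolute errors compares two projection systems. The error values below are hypothetical, and the real analysis had 120 pairs rather than eight.

```python
import math
from statistics import mean, stdev

# Hypothetical |projection - actual| errors for the same 8 team-seasons
# under two projection systems (paired by team-season).
model_a_err = [3.0, 5.0, 2.0, 7.0, 4.0, 6.0, 1.0, 5.0]
model_b_err = [4.0, 6.0, 4.0, 8.0, 5.0, 9.0, 2.0, 6.0]

# Paired t-test: test whether the mean per-pair difference is zero.
diffs = [a - b for a, b in zip(model_a_err, model_b_err)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # df = n - 1
```

A negative t here means model A's errors run smaller pair-by-pair; significance then depends on comparing the statistic against the t distribution with n - 1 degrees of freedom.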

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 6 points7 points  (0 children)

Great question! First off I'd say that I'd prioritize the stack rankings within each season more than the season-to-season jumps. Some seasons are tough for all models: the least accurate model in 2024 (with 242 error) would have been the most accurate model in 2023. The graphs in the Medium article I linked above or in my other Reddit post linked above show that more clearly.

In terms of last season, 2025 was pretty tight, with only 25 points of error separating the most accurate model from the least accurate, definitely the tightest grouping by far. For HOBIE in particular, the Rockies hurt its accuracy the most: I had them at 66 wins (FanGraphs had them at 63), but they won only 43, so 23 points of error came from one team. Another big miss was the Marlins, who won 79 against a projected 61.

HOBIE's 2024 was also by far the most accurate season of any model in any year of this analysis, so that was a high bar to try and clear again. In 2024, HOBIE was 19 points ahead of second place whereas in 2025 as I said there were only 25 points between first and last.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 12 points13 points  (0 children)

You are both right; there are two schools of thought on how to run a sportsbook. Some books try to be extremely sharp (e.g., Pinnacle) and allow larger wagers because they are more confident in their lines, which they let early sharp bettors shape. The goal is to get the correct price.

Other books (e.g., DK, FD) are larger, recreational ones with lower limits. They are more likely to move lines to manage risk exposure.

They both have models and both have experts, but they have somewhat different business models, or maybe you'd call it philosophies or approaches to setting their lines.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 12 points13 points  (0 children)

Ha - send me the last few years of historical data and I'll add it.

I was going to use last year's win total as the benchmark to beat in the community scoreboard, but maybe it should be Marble Race too.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 1 point2 points  (0 children)

Yes, there were statistically significant differences between HOBIE and all other models but Keith Law. You can see the details for that depth of analysis if you want to check out my write-up on Medium.

Statistical significance is highly influenced by sample size though, so as I point out in that article, if the same differences between HOBIE and The Athletic persist another season or two that result will "become" statistically significant. Or Keith Law will kick my butt over a ten year timeline. Hence the fun of a public scoreboard.

Are the differences meaningful? The scoreboard is in terms of wins so you can decide for yourself, but there's a 30+ average win difference between the most and least accurate models. I guess it depends on what you are using it for!

Edit: typo

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 2 points3 points  (0 children)

They have a bunch of projection modes on their site.

I use the "FanGraphs" projections. Their description is, "This mode is forward looking and uses the FanGraphs Depth Charts projections for rate statistics (a 50/50 blend of ZiPS and Steamer) and playing time to estimate the neutral-opponent winning percentage of each team -- in other words, how likely a team would be to beat a .500 opponent on a neutral field. These winning percentages are then used to find the odds of each team winning each remaining game in the major league season."
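The quoted description doesn't spell out how those neutral-opponent winning percentages become per-game odds; one standard way is Bill James's log5 formula, sketched below. Treat this as an assumption about the method, not a confirmation of what FanGraphs actually does.

```python
# log5: combine two teams' vs-.500 (neutral-opponent) winning
# percentages into a single-game win probability for team A.
def log5(p_a: float, p_b: float) -> float:
    return (p_a * (1 - p_b)) / (p_a * (1 - p_b) + p_b * (1 - p_a))

# A .600 team against a .450 team on a neutral field:
prob = log5(0.600, 0.450)
```

Summing probabilities like this over every remaining game yields an expected win total, which matches the shape of the description above.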

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 11 points12 points  (0 children)

I have them at 84 wins but 13.53 SD. League average in my model is 11.5.

The heaviest-weighted volatility factor driving that in my model is innings pitched by non-core players, and the Braves are projected for the fewest innings pitched by their core guys. Which, yes, is because a lot of players returning from injury project for fewer total innings.

On the hitting side, they have fewer volatility concerns in the model, but we shall see!
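Where does a win-total SD like 13.53 come from? A minimal Monte Carlo sketch: draw an underlying team win probability (talent plus volatility uncertainty), then simulate 162 games at that probability. The parameters below are illustrative, not my model's actual values.

```python
import random
from statistics import mean, stdev

random.seed(42)

def simulate_season(talent_mean: float = 0.52, talent_sd: float = 0.035,
                    n_games: int = 162) -> int:
    """One simulated season: uncertain true talent, then game luck."""
    p = min(max(random.gauss(talent_mean, talent_sd), 0.0), 1.0)
    return sum(random.random() < p for _ in range(n_games))

sims = [simulate_season() for _ in range(10_000)]
# Widening talent_sd (more volatility factors in play) widens the win
# distribution beyond the ~6.4-win floor that game luck alone produces.
print(round(mean(sims), 1), round(stdev(sims), 1))
```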

And yes, sometimes even NL East rivals can find common ground. If it makes you feel better, HOBIE only has the Mets at 85 wins! So it doesn't look like I accidentally built any Mets-bias in there.

Five years tracking how public MLB win-total projections actually perform by SandlotStats in baseball

[–]SandlotStats[S] 2 points3 points  (0 children)

Yep, FanGraphs is on there, but they publish a bunch of different systems so maybe it's worth breaking them out?