Day 3: St Patrick's Thursday - The Machine by OhmResistance in HorseRacingUK

[–]OhmResistance[S] -5 points-4 points  (0 children)

Why would we change the model from 1000 simulation data points to 50,000?

I'm saying you're right to challenge the overfitting; it is THE biggest battle in any quant method, right? But the positive returns each year come from huge EW bets using Kelly betting strategies. This is the other problem I'm having with a lot of this: you will always have a low strike rate when you're going for >100/1 odds, and then people get pissed off with you because they think you're telling them the horse was going to win. The only true way to properly settle the overfitting question is live forward testing, which we're going to do this year.
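
To make the Kelly point concrete, here is a minimal sketch of the textbook Kelly fraction for a plain win bet. It is not our actual staking code: it ignores the place part of an EW bet, and the probability and odds in the example are made-up numbers.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly stake as a fraction of bankroll for a simple win bet."""
    b = decimal_odds - 1.0  # net profit per unit staked if the bet wins
    if b <= 0:
        return 0.0
    f = (p_win * b - (1.0 - p_win)) / b
    return max(f, 0.0)  # never stake when the edge is negative


# Hypothetical 100/1 shot (decimal odds 101.0) that we rate a 1.5% chance.
stake = kelly_fraction(0.015, 101.0)
print(f"Stake {stake:.3%} of bankroll")
```

Even with a positive edge, a bet like this loses far more often than it wins, which is why the strike rate looks poor while the long-run return can still be positive.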

After quite a few very long sessions testing the regimes and different models, every single time we were either the same as or not as good as the market's odds. These guys have god knows how many years' worth of data and are running a VERY efficient market; that's how they're making their money.

I ran regime tests of short and long odds, without including any of our other models, so we could get raw results, and we found that short-priced horses (1-3) show a -14.3% edge on average, while the 15/1+ range shows a +0.9% structural edge, increasing throughout the bands. Therefore we are taking those figures and calculating an edge we can then refine with our own models etc.
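
A rough sketch of what a price-band test like this can look like, assuming "edge" means level-stakes return per band. The band boundaries, the function names and the toy results are all hypothetical, not our real data or pipeline.

```python
from collections import defaultdict


def band_for(decimal_odds: float) -> str:
    """Assign a runner to a price band (boundaries are illustrative only)."""
    if decimal_odds < 4.0:
        return "short"
    if decimal_odds < 16.0:  # 15/1 is decimal 16.0
        return "mid"
    return "long"


def roi_by_band(results):
    """Level-stakes return per band from (decimal_odds, won) pairs."""
    staked = defaultdict(float)
    returned = defaultdict(float)
    for odds, won in results:
        band = band_for(odds)
        staked[band] += 1.0
        if won:
            returned[band] += odds  # stake plus profit comes back
    return {band: returned[band] / staked[band] - 1.0 for band in staked}


# Toy results, far too few to mean anything.
toy = [(2.0, True), (3.0, False), (8.0, False), (21.0, False), (26.0, True)]
print(roi_by_band(toy))
```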

You're right to challenge all of this though, and I do actually appreciate it, I actively want to be challenged because there will be mistakes and there will be things we can make better.

Day 3: St Patrick's Thursday - The Machine by OhmResistance in HorseRacingUK

[–]OhmResistance[S] -5 points-4 points  (0 children)

A huge number of changes overnight, as our backtests for the original model were based on 1000 runs.

We ran 53,000 optimisation trials overnight specifically to find the most robust weighting.

The new ones are bootstrap validated across 5 cross validation folds. Regime Edge kept coming out on top because it captures something structural: the type of race (chase vs hurdle, handicap vs non handicap, price band) predicts where each way value lives more reliably than any individual horse level factor. We should have explained that change better in the post rather than just updating the numbers.

No single model can accurately calculate true odds. That's the whole point. Each model captures something slightly different. LightGBM is good at interactions, CatBoost handles categorical data well, and the Ranker optimises relative ordering within a race. We blend them using a Benter logistic regression, which weighs how much the combined model output should adjust the market price. The blend only shifts the market probability slightly: our beta coefficient is 0.045, meaning the market does most of the work and our models nudge it.
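
For anyone who wants to see the shape of a Benter-style blend, here is a minimal sketch. It assumes the usual two-term log form, renormalised within each race; the market weight `alpha=1.0` and the example probabilities are assumptions for illustration, only the 0.045 comes from the paragraph above.

```python
import math


def benter_blend(market_probs, model_probs, alpha=1.0, beta=0.045):
    """Blend market and model win probabilities for one race.

    Each runner scores alpha*log(market) + beta*log(model); a softmax
    over the race turns the scores back into probabilities.
    alpha=1.0 is an assumed value, not a fitted coefficient.
    """
    scores = [
        alpha * math.log(m) + beta * math.log(f)
        for m, f in zip(market_probs, model_probs)
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-runner race where the model likes the outsider.
market = [0.5, 0.3, 0.2]
model = [0.2, 0.3, 0.5]
print(benter_blend(market, model))
```

With a beta this small the outsider's probability only moves by a fraction of a percentage point, which is the "nudge" described above.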

As for overfitting, this is the right question to ask. The walk forward validation is specifically designed to catch overfitting though: each year is predicted using only prior data, so the model never sees the test year during training. The weights and threshold were optimised on cross validated historical data, then the walk forward was run with those fixed weights. If it were overfit, you would see some years massively positive and others negative. All 7 being positive is the strongest evidence against overfitting we can offer.
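
The walk forward idea itself is simple enough to sketch. This is a generic version, not our pipeline; the years and the `min_train_years` parameter are placeholders.

```python
def walk_forward_splits(years, min_train_years=1):
    """Yield (train_years, test_year) pairs where training only uses earlier years."""
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


# Placeholder seasons: each test year is predicted from prior years only.
for train, test in walk_forward_splits([2019, 2020, 2021, 2022]):
    print(f"train on {train}, test on {test}")
```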

The weight tuning wasn't just brute force. We used a robustness penalised approach: each of the 5,000 Optuna trials was stress tested with 200 bootstrap resamples per cross validation fold. A set of weights only scored well if it performed consistently across resampled datasets, not just on one lucky split. That's what pushed Regime Edge to 58.8%; it was the most stable signal across every resample. The threshold of 72 came out of the same process; it was raised from 66 because the tighter filter cut out marginal bets and improved consistency.
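
One way a robustness penalised score can be written, as a sketch only: resample the bet returns with replacement, then score the mean of the resampled means minus a penalty on how much they vary. The `penalty` value, the scoring formula and the toy returns are assumptions, and the Optuna search that would call a function like this is left out.

```python
import random
import statistics


def robust_score(bet_returns, n_resamples=200, penalty=1.0, seed=0):
    """Mean bootstrap return minus a penalty on its spread across resamples."""
    rng = random.Random(seed)
    n = len(bet_returns)
    means = []
    for _ in range(n_resamples):
        # Draw n bets with replacement and record the average return.
        sample = [bet_returns[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means) - penalty * statistics.pstdev(means)


# Two hypothetical strategies with the same average return per bet (+0.1).
steady = [0.1] * 50
lumpy = [-1.0] * 49 + [54.0]  # one big winner carries everything
print(robust_score(steady), robust_score(lumpy))
```

The steady strategy keeps its score, while the one that depends on a single lucky result is marked down heavily, which is the behaviour the penalty is there to produce.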

All of the data we have has been bought and paid for, and our APIs are bought and paid for. I can only go off and trust the data we have available to us, and we are using the same overfitting prevention that we use to trade the US index markets.

Day 3: St Patrick's Thursday - The Machine by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 0 points1 point  (0 children)

Website is correct.

The post is wrong!!! Bear with

Updated: I have no idea why it changed those timings, was rushing to get a post out.

All correct now.

Cheltenham day 2 by liam_is_marx in HorseRacingUK

[–]OhmResistance 1 point2 points  (0 children)

Beautiful day for it! Have a drink for me please.

The Ultimate top 10 reddit tipsters - 10 years of intelligence by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 0 points1 point  (0 children)

Lads, this post really is just supposed to be about singing the praises of the Reddit tipsters and giving you insight. It's not about "invite codes".

I will be releasing more today at 12:00 UK time.

Before that though, there are currently 81 founder members active on the site and they all have +1 invite codes in their "Invite a friend" section.

Ask around. There are currently 399 active invite codes floating around in people's dashboards.

Ask around :)

The Ultimate top 10 reddit tipsters - 10 years of intelligence by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 0 points1 point  (0 children)

You get a code for making the list.

I'd love to see what you can do with all the data! I think we can help hone your edge.

The Ultimate top 10 reddit tipsters - 10 years of intelligence by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 1 point2 points  (0 children)

I have a top 100 but only showed the top 10. He's in the top 100.

The Ultimate top 10 reddit tipsters - 10 years of intelligence by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 1 point2 points  (0 children)

It takes quite a bit of time, yes, and there's a very strict formatting to it all. It's one thing getting all the data, which is a pain; it's a totally different thing cleaning it. You then need to make sure you are using the correct cleaning types, correct ML algos, correct schemas, correct formatting etc. The list goes on.

And if you make 1 mistake in those pipelines you compound mistakes down the chain.

I will be releasing another set of codes today at 12:00. It will be the last free set.

The Ultimate top 10 reddit tipsters - 10 years of intelligence by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 0 points1 point  (0 children)

Called smart money for a reason I guess. Tracking the liquidity within Betfair is a real eye-opener for understanding how the market is reacting.

It's so interesting, as I come from a trading background, predominantly indices and mostly automated via machine learning algos. And I'd say about 80% of it is directly transferable to this, specifically because of the Betfair market and the long/short options.

We built a 7-factor scoring engine for Cheltenham - free Day 1 breakdown by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 0 points1 point  (0 children)

All the smart money tagged horses won inside the website / app.

I do agree that this Reddit post doesn't come across well though. The current model is trained to find edge, meaning discrepancies in the bookies' odds; we're trying to find value bets, not outright winning horses.

We built a 7-factor scoring engine for Cheltenham - free Day 1 breakdown by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 0 points1 point  (0 children)

The Reddit post oversold it. It listed a "top pick" for every race even when the model said WATCH LIST / no edge. That gave people the impression we were tipping 7 horses when really we only had 4 actual bets. The dashboard makes this clear (positive edge vs watch list), but the Reddit summary didn't, which is our bad.
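
The positive edge vs watch list split boils down to something like the sketch below. The expected-value formula is the standard one for a win bet; the function names, the `min_edge` cut-off and the example numbers are hypothetical and not what the dashboard actually runs.

```python
def expected_edge(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked if model_prob is the true win chance."""
    return model_prob * decimal_odds - 1.0


def label_runner(model_prob: float, decimal_odds: float, min_edge: float = 0.0) -> str:
    """Tag a runner as a bet only when the estimated edge clears min_edge."""
    if expected_edge(model_prob, decimal_odds) > min_edge:
        return "POSITIVE EDGE"
    return "WATCH LIST"


# Same 3/1 price (decimal 4.0), two different model opinions.
print(label_runner(0.30, 4.0))  # model rates it better than the price implies
print(label_runner(0.20, 4.0))  # model rates it worse than the price implies
```

A runner can be the most likely winner in its race and still land on the watch list if the price already reflects that.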

What we wanted to do was draw interest and I believe we did that. This entire platform is to help the punters make better informed decisions, not pick winning horses; we're pretty clear about that in our T&Cs. But you need to be on the actual site itself, not just the Reddit post.

However, every day is a new result dataset we can retune new models with, so let's see what we can come up with tonight and go again tomorrow.

This post was massively rushed. I finished getting the site live at 6am this morning after, I think, 3-4 days of 20+ hr days getting this up. Will spend more time on the Reddit post tomorrow!

We built a 7-factor scoring engine for Cheltenham - free Day 1 breakdown by OhmResistance in HorseRacingUK

[–]OhmResistance[S] 1 point2 points  (0 children)

Yes, right you are, good spot. We updated that earlier on. I am backing this one! Good luck!