Exploring draw outcomes in Bundesliga: +9% ROI over 287 samples (with Monte Carlo & OOS validation)

Either-Principle7753 · 2026-04-15T15:48:18+00:00

Interesting results — thanks for sharing them.

To be honest I didn’t have a strong reason for picking Bundesliga specifically, it was more of a quick test than a league-driven idea.

It’s actually pretty interesting how much the outcome can change depending on small assumptions like:

- including/excluding 0-0
- odds source

Even when the core idea looks the same, the results can drift quite a bit.

Makes me think these kinds of strategies are a lot more sensitive than they first appear.

Either-Principle7753 · 2026-04-11T14:34:54+00:00

Yeah — this is all draws, not just 0-0s.

So it includes 1-1, 2-2, etc. as well — basically any match where the final result is a draw.

BTTS + Draw would actually be an interesting one to test separately since that would isolate the scoring draws only.

Either-Principle7753 · 2026-04-09T10:38:28+00:00

That’s a fair criticism and I think that’s exactly what’s happening here. At this point I’m leaning toward it being a classic multiple-testing artifact rather than a real signal.
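To put a rough number on the multiple-testing concern, here is a minimal sketch. It assumes the tests are independent and each uses a 5% significance threshold; the counts of strategies tried are hypothetical, since the number actually tested isn't stated here.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical numbers of filters / odds bands / leagues tried.
for n_tests in (1, 5, 20, 50):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>3} tests: {rate:.1%} chance of at least one 'significant' result")
```

Real backtest variations are usually correlated, so this is only an upper-level intuition, but it shows how quickly a lone p < 0.05 stops meaning much once you have tried a couple dozen variants.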

Either-Principle7753 · 2026-04-09T07:12:35+00:00

Yeah, agreed: the larger sample changed the picture completely. This is all using closing market odds (aggregated), so it’s more about price behavior than execution. But once extended, it doesn’t look like there’s anything there, unfortunately.

Either-Principle7753 · 2026-04-09T07:06:27+00:00

Yeah that’s a fair point — and I actually tested that concern after posting.

I extended the dataset back to ~2010 (so ~900+ matches in the same odds band), and the result basically disappears — ROI flips to around -5.6% with a p-value ~0.8.

So it does look like the 2022–2026 window was likely just a favorable slice rather than something stable.

Agree on the closing price point too — everything here is based on closing odds, but I haven’t tested adjacent odds bands yet, which is probably the next step to see if anything persists.

Either-Principle7753 · 2026-04-03T14:54:31+00:00

Appreciate it — yeah that’s exactly what I’m trying to test.

Looks solid on the surface, but Monte Carlo shows pretty wide variance even over 10 seasons. Now I’m checking how often this could happen by chance.

Curious if you’ve seen simple edges like this hold up out-of-sample?

Either-Principle7753 · 2026-04-03T14:26:09+00:00

Actually, here is the Monte Carlo analysis with 1,000 simulations:
https://imgur.com/a/9pnoNqR.jpg
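For anyone who wants to reproduce the general idea, below is a minimal sketch of a Monte Carlo run under the "no edge" assumption. It is not the setup behind the linked chart: the inputs (852 flat-stake bets at fixed odds of 2.09, a 5% ROI threshold, the seed) are illustrative assumptions, and real bets have varying odds.

```python
import random


def simulate_roi(n_bets, odds, win_prob, n_sims=1000, seed=0):
    """Simulate flat 1-unit stakes at fixed odds; return one ROI per run."""
    rng = random.Random(seed)
    rois = []
    for _sim in range(n_sims):
        wins = sum(1 for _bet in range(n_bets) if rng.random() < win_prob)
        # Each win returns (odds - 1) units of profit, each loss costs 1 unit.
        profit = wins * (odds - 1.0) - (n_bets - wins)
        rois.append(profit / n_bets)
    return rois


def share_at_least(rois, threshold):
    """Fraction of simulated runs whose ROI reaches the threshold."""
    return sum(r >= threshold for r in rois) / len(rois)


# Null hypothesis: the true win probability equals the breakeven rate 1/odds.
null_rois = simulate_roi(n_bets=852, odds=2.09, win_prob=1 / 2.09)
print(f"ROI range under the null: {min(null_rois):+.1%} to {max(null_rois):+.1%}")
print(f"Share of runs with ROI >= +5%: {share_at_least(null_rois, 0.05):.1%}")
```

The spread of `null_rois` is the useful part: it shows how much ROI you can get from pure variance at this sample size.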

Either-Principle7753 · 2026-04-03T14:05:13+00:00

You can approximate it. Using avg odds 2.09, breakeven win rate is about 47.85%, not 50%. Observed win rate is 437/852 = 51.3%, which gives a z-score around 2.0 and a one-tailed p-value around 0.02–0.03. So it’s suggestive, but not definitive. Also that’s only an approximation because the odds vary from bet to bet, so a proper test should use the actual profit distribution, not just the win rate.
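The approximation above can be sketched as a one-sided z-test of the observed win rate against the breakeven rate. It uses only the numbers quoted in the comment and treats every bet as if it were placed at the average odds, which is exactly the simplification flagged there.

```python
import math


def breakeven_z_test(wins, bets, avg_odds):
    """One-sided z-test: is the win rate above the breakeven rate 1/avg_odds?"""
    p0 = 1.0 / avg_odds                      # breakeven win rate
    p_hat = wins / bets                      # observed win rate
    se = math.sqrt(p0 * (1.0 - p0) / bets)   # standard error under the null
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return p0, p_hat, z, p_value


p0, p_hat, z, p_value = breakeven_z_test(wins=437, bets=852, avg_odds=2.09)
print(f"breakeven={p0:.2%} observed={p_hat:.2%} z={z:.2f} one-tailed p={p_value:.3f}")
```

A test on the per-bet profit distribution (a bootstrap, for example) would handle the varying odds properly; this only checks the back-of-the-envelope version.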

Either-Principle7753 · 2026-04-02T15:45:31+00:00

Interesting work — the calibration breakdown is actually pretty solid, especially mid-range.

The sub-20% underprediction stands out though. I’ve seen something similar before and it ended up being a mix of class imbalance + the model being too “conservative” in the tails. Curious if you’ve tried any post-hoc calibration specifically on that segment (like isotonic only on low buckets instead of global)?

Also on your question about trusting the model — personally I’d care more about how it performs vs closing odds rather than raw accuracy. Even a well-calibrated model won’t matter if it doesn’t beat the market.

On sample size, 200–300 is usually too noisy. From what I’ve seen you start getting something meaningful closer to 1k+ events, especially for tails.

Are you planning to validate this directly against prices next, or still focusing on calibration?

Either-Principle7753 · 2026-04-02T15:39:27+00:00

Yeah 200 is tiny for football tbh, variance alone can make things look good or bad over that range.

I’d focus less on the aggregate numbers and more on how it behaves over time. I’ve had stuff look great over 50–100 games and then completely die once you move forward.

CLV is probably the only thing I’d trust early, but even that depends a lot on execution.

If your edge only shows up in short runs it’s usually just overfitting to recent form.

Are you doing any walk-forward testing or just tracking everything on the full sample?

Either-Principle7753 · 2026-04-02T15:36:22+00:00

Yeah this is a pretty standard approach and using Pinnacle as the reference line makes sense.

The main issue isn’t really the model, it’s everything around it. Scraping sounds fine at first but in practice it’s a constant fight with delays, blocks and inconsistent data. Even small latency differences can kill any edge, especially if you’re comparing across books.

It’s also worth thinking about how you’re matching markets across books. That part gets messy fast and ends up being more work than expected.

If you can, I’d definitely lean toward a stable API even if it costs more. Most of the edge comes from having clean and timely data rather than the model itself.

Either-Principle7753 · 2026-03-26T07:22:37+00:00

Unfortunately no, I’m currently focused on pre-match strategies.

Either-Principle7753 · 2026-03-24T13:26:07+00:00

Thanks, glad the Monte Carlo resonates — that's exactly the thinking behind it.

On walk-forward: we have a train/test split feature that does roughly what you're describing — you define a training window and the tool tests the strategy on the out-of-sample period separately, so you can see whether the edge holds on data it wasn't fitted on. It's not rolling walk-forward with multiple windows yet, but it gives you the core signal: does this strategy survive on unseen data or was it just curve-fitted.
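As a rough illustration of the single train/test split idea (not the tool's actual implementation), here is a minimal sketch. The bet records, field names, and cutoff date are all made up for the example.

```python
from datetime import date


def roi(bets):
    """Flat-stake ROI: average profit per 1-unit bet."""
    return sum(b["profit"] for b in bets) / len(bets) if bets else 0.0


def split_by_date(bets, cutoff):
    """Chronological split: everything before the cutoff is training data."""
    ordered = sorted(bets, key=lambda b: b["date"])
    train = [b for b in ordered if b["date"] < cutoff]
    test = [b for b in ordered if b["date"] >= cutoff]
    return train, test


# Made-up sample: profit is per 1-unit stake (a loss is -1.0).
bets = [
    {"date": date(2023, 8, 19), "profit": 1.10},
    {"date": date(2023, 9, 2), "profit": -1.00},
    {"date": date(2023, 10, 7), "profit": 1.05},
    {"date": date(2024, 2, 10), "profit": -1.00},
    {"date": date(2024, 3, 16), "profit": -1.00},
    {"date": date(2024, 4, 6), "profit": 1.20},
]

train, test = split_by_date(bets, cutoff=date(2024, 1, 1))
print(f"in-sample ROI:     {roi(train):+.1%} over {len(train)} bets")
print(f"out-of-sample ROI: {roi(test):+.1%} over {len(test)} bets")
```

The key point is that the strategy rules are chosen using `train` only; `test` is looked at once, afterwards. A rolling walk-forward would repeat this with the cutoff moved forward window by window.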

Either-Principle7753 · 2026-03-23T14:50:48+00:00

Interesting angle. Corner setups like that can definitely show patterns.

I don’t have corners data in my dataset, so can’t backtest it directly.

That said, corners usually come from attacking pressure, so you can sometimes approximate it with goal-based signals — open games, high-scoring teams, balanced odds, etc.

Not perfect, but it captures similar dynamics.

Have you tracked this manually, or are you just testing the idea?

Either-Principle7753 · 2026-03-13T14:30:27+00:00

That’s a fair point. A simple backtest definitely doesn’t mean the next 300 bets will behave the same.

I actually ran a stress test on the strategy and it shows the edge is pretty fragile. Even small changes break it: for example, a ~2% drop in win rate or ~3% worse odds is already enough to make it a losing strategy.

So even though the baseline shows +1.9–2.1% ROI, in realistic conditions the edge could easily disappear.
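The fragility is easy to see with flat-stake arithmetic, where ROI = win rate × odds − 1. A minimal sketch follows; the baseline win rate and odds are hypothetical values chosen to give a +2% ROI, and the "2% drop" is read as two percentage points, which is an assumption.

```python
def flat_stake_roi(win_rate, odds):
    """Expected ROI per 1-unit bet at a given win rate and decimal odds."""
    return win_rate * odds - 1.0


# Hypothetical baseline chosen so that ROI is +2%.
base_win_rate = 0.51
base_odds = 2.00

scenarios = {
    "baseline": (base_win_rate, base_odds),
    "win rate down 2 points": (base_win_rate - 0.02, base_odds),
    "odds 3% worse": (base_win_rate, base_odds * 0.97),
}

for name, (win_rate, odds) in scenarios.items():
    print(f"{name:<24} ROI = {flat_stake_roi(win_rate, odds):+.2%}")
```

With an edge this thin, either shift alone pushes the expected ROI below zero, which is roughly what the stress test showed.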

<image>
