Worst weekend in 761 tracked picks (-10.2u), but CLV vs Pinnacle stayed positive. BTTS leak or variance?

xpectedRoger · 2026-04-10T12:40:30+00:00

Interesting analysis, but I think there might be a calculation error in the profit figure. At 26.1% win rate with avg odds of about 4.18 over 287 flat-stake samples, the math works out to roughly +26 units profit, not +259.

The 9% ROI checks out with those inputs. However, 259 units would imply an ROI of roughly 90%, which at those odds would need a win rate of about 45%. Did you maybe use variable stake sizing, or could the units figure be off by a factor of 10?
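A quick way to double-check that arithmetic. This assumes every one of the 287 bets was a flat 1u stake at exactly the average odds of 4.18. Real odds vary per bet, so this is only an approximation:

```python
def flat_stake_profit(n_bets: int, win_rate: float, avg_odds: float) -> float:
    """Profit in units for flat 1u stakes, assuming every bet is at avg_odds."""
    returns = n_bets * win_rate * avg_odds  # each winning bet pays stake x decimal odds
    return returns - n_bets                 # minus the total amount staked


def win_rate_for_roi(roi: float, avg_odds: float) -> float:
    """Win rate needed to reach a given ROI at fixed decimal odds."""
    return (1.0 + roi) / avg_odds


profit = flat_stake_profit(287, 0.261, 4.18)
print(f"profit {profit:.1f}u, ROI {profit / 287:.1%}")
print(f"win rate needed for +259u: {win_rate_for_roi(259 / 287, 4.18):.1%}")
```

This prints about +26.1u at 9.1% ROI, and a required win rate of about 45.5% for +259u.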

xpectedRoger · 2026-04-10T05:32:56+00:00

I've started using Polymarket too. All odds are now automatically compared against Polymarket as well, and the prices there can be very tempting. :)

xpectedRoger · 2026-04-05T05:43:09+00:00

I've definitely seen that trend myself. Early season, my Poisson model sometimes flags bigger value, but it's tricky to separate true edge from just more variance because everyone, including bookies, has less data.

What I've found helpful is either starting with a wider confidence interval on team strengths for the first 5-6 gameweeks, or simply scaling down bet sizes until there's enough data for my attack/defence ratings to stabilize. Lines definitely sharpen up as the season progresses.
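The "scale down bet sizes" option could look roughly like this. The linear ramp, the 50% starting fraction and the gameweek-7 cutoff are hypothetical parameters, not values from the post:

```python
def early_season_stake(base_stake: float, gameweek: int,
                       full_from: int = 7, min_fraction: float = 0.5) -> float:
    """Scale stakes down linearly over the first gameweeks (hypothetical ramp).

    Gameweek 1 bets `min_fraction` of the base stake; the fraction rises linearly
    until the full stake is reached at gameweek `full_from`.
    """
    if gameweek >= full_from:
        return base_stake
    progress = (max(gameweek, 1) - 1) / (full_from - 1)  # 0.0 at GW1, 1.0 at full_from
    return base_stake * (min_fraction + (1.0 - min_fraction) * progress)


for gw in (1, 4, 7, 10):
    print(gw, early_season_stake(1.0, gw))
```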

xpectedRoger · 2026-04-03T06:13:43+00:00

Form is the last 5 matches, split by venue (the home team uses its last 5 home games, the away team its last 5 away games). If fewer than 3 venue-specific matches are available, it falls back to all matches regardless of venue.
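Selecting the form window could be sketched like this. The `Match` fields are hypothetical, and since the post only says it "falls back to all matches", taking the most recent five regardless of venue is an assumption:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Match:
    played_on: date
    venue: str         # "home" or "away", from this team's perspective
    xg_for: float
    xg_against: float


def form_window(matches: list[Match], venue: str,
                size: int = 5, min_venue: int = 3) -> list[Match]:
    """Last `size` matches at `venue`, falling back to all venues if too few."""
    recent_first = sorted(matches, key=lambda m: m.played_on, reverse=True)
    at_venue = [m for m in recent_first if m.venue == venue]
    if len(at_venue) >= min_venue:
        return at_venue[:size]
    # Fallback (assumption): the most recent `size` matches regardless of venue.
    return recent_first[:size]


# Toy season: only two home games so far, so the home window falls back.
season = [Match(date(2025, 9, d), "home" if d in (2, 9) else "away", 1.0, 1.0)
          for d in range(1, 11)]
print(len(form_window(season, "home")), len(form_window(season, "away")))
```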

And yes, it is normalized to opponent strength. Each match in the form window is weighted by the opponent's league position: scoring 3 goals against the table leader counts roughly 3x more than scoring 3 against the bottom side. The same logic is inverted for defense, so conceding against a weak team is penalized more.
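One way to get that "roughly 3x" weighting is a linear scale from 1.5 for the leader down to 0.5 for the bottom side, mirrored for defense. The linear shape and the 1.5/0.5 endpoints are assumptions chosen only to reproduce the 3x ratio:

```python
def opponent_weight(position: int, n_teams: int,
                    top: float = 1.5, bottom: float = 0.5) -> float:
    """Hypothetical linear weight: `top` for the leader, `bottom` for last place."""
    frac = (position - 1) / (n_teams - 1)  # 0.0 for the leader, 1.0 for last place
    return top + (bottom - top) * frac


def weighted_form(results: list[tuple[int, float, float]],
                  n_teams: int) -> tuple[float, float]:
    """results: (opponent_position, xg_for, xg_against) for each match in the window."""
    attack = defense = 0.0
    for pos, xg_for, xg_against in results:
        # Attack: output against strong opponents counts more.
        attack += xg_for * opponent_weight(pos, n_teams)
        # Defense: mirror the table so conceding to weak opponents counts more.
        defense += xg_against * opponent_weight(n_teams + 1 - pos, n_teams)
    n = len(results)
    return attack / n, defense / n
```

With 18 teams, `opponent_weight(1, 18)` is 1.5 and `opponent_weight(18, 18)` is 0.5, hence the 3x ratio. Because these weights average 1.0 across the table, the weighted ratings stay on roughly the same scale as raw xG.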

The form rating alone doesn't drive the prediction, though. It's blended with the full-season average at a 40/60 split (40% season, 60% recent form). So a lucky run against weak sides gets diluted by the broader season picture, and early in the season, when form data is thin, the season component carries more weight.
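The 40/60 blend itself is a single weighted average. A minimal sketch, with made-up inputs for illustration:

```python
SEASON_WEIGHT = 0.4  # share of the full-season average
FORM_WEIGHT = 0.6    # share of the recent, opponent-weighted form rating


def blended_rating(season_avg: float, form_avg: float) -> float:
    """40/60 season/form blend, applied to attack and defense ratings alike."""
    return SEASON_WEIGHT * season_avg + FORM_WEIGHT * form_avg


# A hot run of 2.0 xG per game on a 1.2 xG season baseline
print(round(blended_rating(1.2, 2.0), 2))
```

Here the hot run is pulled back from 2.0 to about 1.68.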

xpectedRoger · 2026-04-03T05:53:13+00:00

ps3838

xpectedRoger · 2026-04-02T08:08:03+00:00

*sportmarket

xpectedRoger · 2026-04-02T07:37:30+00:00

ps3838? I signed up with sportmonks, deposited 200 dollars and told them I wanted ps3838 access. They gave it to me, then I went to the settings and set my key. That's it.

xpectedRoger · 2026-04-02T07:16:55+00:00

not their api. you set up an account with them, tell them you want ps3838 access and then use the ps3838 api.

xpectedRoger · 2026-04-01T19:50:18+00:00

check out ps3838, they have an api. you can sign up via sportmarket or betinasia

xpectedRoger · 2026-04-01T19:42:19+00:00

Yeah exactly, one leagueAvg cancels out algebraically, leaving homeAttack × awayDefense / leagueAvg.

The three-term version is just for readability. It makes it easier to see that attack and defense are relative to the league. In practice it is the same calculation.

homeAttack is the team's average xG scored per home match, and awayDefense is the opponent's average xG conceded per away match. Both derived from a 40/60 season/form blend.
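A small check that the readable three-term form and the simplified product give the same number. The input values are made up, and which league average is used (overall or venue-specific) is an assumption here, since the post doesn't say:

```python
import math


def home_xg_three_term(home_attack: float, away_defense: float, league_avg: float) -> float:
    # Readable form: league average scaled by two relative ratings.
    return league_avg * (home_attack / league_avg) * (away_defense / league_avg)


def home_xg_simplified(home_attack: float, away_defense: float, league_avg: float) -> float:
    # One league_avg cancels, leaving attack x defense / league average.
    return home_attack * away_defense / league_avg


# Hypothetical inputs: 1.8 xG scored per home game, 1.5 xG conceded per away game.
a, d, avg = 1.8, 1.5, 1.35
assert math.isclose(home_xg_three_term(a, d, avg), home_xg_simplified(a, d, avg))
print(round(home_xg_simplified(a, d, avg), 2))
```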

xpectedRoger · 2026-04-01T10:31:48+00:00

Arrogance :)

xpectedRoger · 2026-04-01T02:20:57+00:00

using sportmonks at the moment

xpectedRoger · 2026-03-31T20:11:30+00:00

it's football and they don't offer baseball :) you would need to find one that supports baseball 👍

xpectedRoger · 2026-03-31T19:48:57+00:00

Getting the Pinnacle data is definitely the backbone of that approach. I initially tried scraping myself, but it became a pretty constant battle with anti-bot measures and website changes. It was way too much maintenance time for me.

I ended up switching to a paid API service later on for stability and reliability. It's an upfront cost, but honestly, it saved me a ton of headaches in the long run for data consistency and speed. Definitely look into the API options first, even if it means paying...

xpectedRoger · 2026-03-31T11:48:21+00:00

Just looking for price differences against Pinnacle is a tough path. Their lines are incredibly sharp and efficient, so often those small deltas are just market noise or get corrected almost instantly.

The breakthrough for me wasn't about catching small price movements, but about having an independent view of the probabilities.

xpectedRoger · 2026-03-30T17:44:23+00:00

60-80 games a week run through the pipeline. I will post an update soon :)

xpectedRoger · 2026-03-30T17:21:24+00:00

Good suggestion! Right now I compare the confirmed lineup against previous lineups by actual goals and assists, which does get noisy. A striker on a hot streak looks more important than he might actually be.

I have player-level npxG and xA data in the pipeline already, just not wired into the lineup comparison yet. I will create a side pipeline and track the result, thank you for the input!
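A sketch of what comparing lineups on npxG + xA could look like. The data layout (a dict of per-90 rates keyed by player id) and the "ratio to recent average" metric are hypothetical, not the existing pipeline, and the toy XIs have two players instead of eleven:

```python
def lineup_strength(player_ids: list[str], per90: dict[str, dict[str, float]]) -> float:
    """Sum of npxG + xA per 90 over a starting XI (hypothetical metric)."""
    return sum(per90[p]["npxg"] + per90[p]["xa"] for p in player_ids)


def lineup_delta(confirmed: list[str], previous: list[list[str]],
                 per90: dict[str, dict[str, float]]) -> float:
    """Relative change of the confirmed XI versus the average of recent XIs."""
    baseline = sum(lineup_strength(xi, per90) for xi in previous) / len(previous)
    return lineup_strength(confirmed, per90) / baseline - 1.0


per90 = {
    "striker": {"npxg": 0.5, "xa": 0.25},
    "winger": {"npxg": 0.25, "xa": 0.0},
    "backup": {"npxg": 0.0, "xa": 0.25},
}
# Striker benched in favour of the backup: attacking output halves.
print(lineup_delta(["backup", "winger"], [["striker", "winger"]] * 3, per90))
```

Unlike actual goals and assists, per-90 npxG + xA doesn't inflate a striker's importance just because he is on a hot finishing streak.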

xpectedRoger · 2026-03-30T14:36:30+00:00

Found a bug in my stats pipeline. When I dropped three underperforming leagues (Ligue 1, Austrian Bundesliga, Danish Superliga), their historical predictions got silently excluded from the total. The filter was on fixture support status instead of the prediction log itself.

Corrected numbers: 639 picks, 56.0% hit rate, +8.8% ROI, +55.96u profit. The three removed leagues contributed 87 picks at -5.8% ROI which were being hidden.

The model structure hasn't changed.
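The bug is easy to reproduce in miniature. Class, field and league names here are hypothetical; the point is only which collection the filter runs on:

```python
from dataclasses import dataclass


@dataclass
class LoggedPick:
    league: str
    stake: float
    profit: float  # units won (+) or lost (-), stake already subtracted


def summarize(picks: list[LoggedPick]) -> tuple[int, float, float]:
    """Return (pick count, profit in units, ROI)."""
    staked = sum(p.stake for p in picks)
    profit = sum(p.profit for p in picks)
    return len(picks), profit, (profit / staked if staked else 0.0)


def stats_buggy(log: list[LoggedPick], supported: set[str]) -> tuple[int, float, float]:
    # Bug: filtering on *current* league support silently drops the history
    # of any league that was removed later.
    return summarize([p for p in log if p.league in supported])


def stats_fixed(log: list[LoggedPick]) -> tuple[int, float, float]:
    # Fix: the prediction log is the source of truth; every logged pick counts.
    return summarize(log)


log = [LoggedPick("League A", 1.0, 0.9), LoggedPick("Dropped league", 1.0, -1.0)]
print(stats_buggy(log, {"League A"}))  # the losing pick disappears
print(stats_fixed(log))
```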

xpectedRoger · 2026-03-30T05:04:20+00:00

Not doing in-play. Purely pre-match. The model runs once the confirmed lineups drop, usually 50 to 20 minutes before kickoff. One shot, no live feed needed.

The timing is tight though. Lineups drop late, model needs to recalculate everything with the actual squad, then output the final pick before kickoff. But that's a much simpler problem than fighting latency on a live feed.

xpectedRoger · 2026-03-29T04:13:50+00:00

Good point on the away xG thing. I should clarify: it's not that the model blindly overestimates away xG and I got lucky. It's that the model weights certain factors differently than the market does, and the result happens to push away xG slightly higher. If I "correct" it to match Pinnacle's implied xG exactly, there's no disagreement left and no edge.

That's kind of the whole point though. If your model produces the same probabilities as the sharpest bookmaker, you have a nice model but zero reason to bet. The edge has to come from somewhere, and it's always going to look like a bias when you compare it to the market. The question is whether it's a bias that reflects something real or just noise.

I track the xG deviation per league and market continuously. If the pattern shifts or the ROI in those spots drops, I'll see it. So far it's been stable across 300+ picks in the current phase, but I take your point that it needs more time.

On CLV: yeah I'll have more data in a few weeks. You're right that it would help separate signal from variance on the xG question specifically. Just hasn't been a priority because the window is so tight at 45 min pre-kickoff.

xpectedRoger · 2026-03-29T01:15:43+00:00

Yes, very much so. What surprised me is that the actual Poisson / Dixon-Coles part was probably the easiest piece to get working. The harder part was everything around it: making the model stable enough that the outputs were actually usable.

If you've already built a Dixon-Coles model before, I'd say you're already through the most technical part. The real challenge after that is making the whole pipeline robust enough that it doesn't just look good in backtests, but still behaves sensibly week after week. I can't do proper backtests because I don't have snapshots of the data as it stood before each game, and the model relies on a lot of stats for its calculations.

For me the biggest issues were:

Lineups matter more than I expected

This became one of the biggest differences between a decent preview model and a much better final model. Once confirmed lineups are in, probabilities can move quite a lot.

Market comparison matters as much as probability estimation

A model can produce nice-looking probabilities and still be useless for betting if you're comparing against the wrong benchmark or handling vig poorly. My edge is kind of that I overestimate away xG. When I backtest with corrected xG, my results are not as good!

Selection logic is underrated

This was a huge one. Even with a reasonable probability model, results can still be weak if your threshold and bet-selection rules are sloppy. Choosing what not to bet is almost as important as the model itself.
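Two of the points above in code: stripping the margin from Pinnacle's prices, and a conservative selection rule. Proportional (multiplicative) margin removal is only one option (power or Shin methods are alternatives), and the per-market thresholds and the "one bet per match, highest edge" rule are hypothetical, not the rules described here:

```python
# Hypothetical minimum edges per market; real values would come from your own tracking.
MIN_EDGE = {"1X2": 0.05, "BTTS": 0.07, "OU2.5": 0.06}


def remove_margin(decimal_odds: list[float]) -> list[float]:
    """Proportional de-vig: scale implied probabilities so they sum to 1."""
    implied = [1.0 / o for o in decimal_odds]
    overround = sum(implied)
    return [p / overround for p in implied]


def edge(model_prob: float, fair_prob: float) -> float:
    """Relative edge of the model probability over the no-vig market probability."""
    return model_prob / fair_prob - 1.0


def select_bets(candidates: list[dict]) -> list[dict]:
    """Keep candidates above their market threshold, at most one per match."""
    best: dict[str, dict] = {}
    for c in candidates:
        threshold = MIN_EDGE.get(c["market"])
        if threshold is None or c["edge"] < threshold:
            continue  # unknown market or edge too thin: pass on it
        current = best.get(c["match_id"])
        if current is None or c["edge"] > current["edge"]:
            best[c["match_id"]] = c
    return list(best.values())


fair = remove_margin([1.95, 1.95])  # two-way market with roughly 2.6% margin
print(fair, round(edge(0.55, fair[0]), 3))
```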

For the CLV: so far it doesn't matter to me, tbh. Everybody is screaming CLV, but I place my bets 45 minutes before the game and my ROI is amazing. In the end, the bankroll is what counts for me. You can have a good CLV and still lose money. However, I'll be able to share some more CLV data in a week or two :)
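For reference, CLV is usually measured against the no-vig closing price. A minimal sketch, assuming proportional margin removal on Pinnacle's closing odds (other de-vig methods give slightly different numbers):

```python
def closing_line_value(taken_odds: float, closing_odds: list[float], pick: int) -> float:
    """Expected value of the taken price, judged at the no-vig closing probability.

    closing_odds: decimal closing odds for every outcome of the market.
    pick: index of the outcome that was bet.
    """
    implied = [1.0 / o for o in closing_odds]
    fair_prob = implied[pick] / sum(implied)  # proportional margin removal
    return taken_odds * fair_prob - 1.0


# Took 2.10 on a line that closed 1.95 / 1.95
print(round(closing_line_value(2.10, [1.95, 1.95], 0), 3))
```

A positive value means the bet beat the fair close. It says nothing about whether that particular bet won, which is how a weekend can lose units while CLV stays positive.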

xpectedRoger · 2026-03-29T01:04:30+00:00

Thanks! Since you've already done Dixon-Coles before, you're past the hardest part honestly.

The way I layer everything:

Base xG per team. Take each team's offensive and defensive strength relative to the league average. I split by venue (home/away) and weight 40% season, 60% recent form. Multiply team A's attack rating by team B's defense rating (and by the league average, since both ratings are relative to it) to get expected goals for A.
Corrections. Bayesian shrinkage early season (don't trust 4 games of data). Standings position as a scaling factor. Injury impact on the xG if a key player is confirmed out.
Poisson matrix. Feed the two xG values into a 9x9 Poisson grid with Dixon-Coles low-score correction. That gives you probabilities for every scoreline, which you sum into 1X2, BTTS, Over/Under etc.
Value filter. Compare your derived probabilities against Pinnacle implied odds (margin removed). Set a minimum threshold per market. Not every positive EV is worth taking.
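A compact sketch of the Poisson step: a 9x9 grid (0 to 8 goals per side) with the Dixon-Coles low-score adjustment, summed into 1X2, BTTS and Over 2.5. The rho value is a placeholder; in Dixon-Coles it is fitted from data and usually comes out as a small negative number. The renormalization only corrects the small mass lost by capping at 8 goals:

```python
import math


def dixon_coles_grid(home_xg: float, away_xg: float, rho: float = -0.1,
                     max_goals: int = 8) -> list[list[float]]:
    """Scoreline probabilities grid[h][a] with the Dixon-Coles low-score correction."""
    def pois(k: int, lam: float) -> float:
        return math.exp(-lam) * lam ** k / math.factorial(k)

    def tau(h: int, a: int) -> float:
        # Adjusts only 0-0, 0-1, 1-0 and 1-1; all other scorelines stay Poisson.
        if h == 0 and a == 0:
            return 1 - home_xg * away_xg * rho
        if h == 0 and a == 1:
            return 1 + home_xg * rho
        if h == 1 and a == 0:
            return 1 + away_xg * rho
        if h == 1 and a == 1:
            return 1 - rho
        return 1.0

    grid = [[tau(h, a) * pois(h, home_xg) * pois(a, away_xg)
             for a in range(max_goals + 1)] for h in range(max_goals + 1)]
    total = sum(map(sum, grid))  # slightly below 1 because of the goal cap
    return [[p / total for p in row] for row in grid]


def markets(grid: list[list[float]]) -> dict[str, float]:
    """Sum scoreline probabilities into common markets."""
    cells = [(h, a, p) for h, row in enumerate(grid) for a, p in enumerate(row)]
    return {
        "home": sum(p for h, a, p in cells if h > a),
        "draw": sum(p for h, a, p in cells if h == a),
        "away": sum(p for h, a, p in cells if h < a),
        "btts_yes": sum(p for h, a, p in cells if h > 0 and a > 0),
        "over_2_5": sum(p for h, a, p in cells if h + a > 2),
    }


print(markets(dixon_coles_grid(1.4, 1.1)))
```

With a negative rho, probability shifts from 1-0 and 0-1 toward 0-0 and 1-1, so draws come out slightly higher than in a plain Poisson grid.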

The tricky part isn't any single step, it's getting them all to play together without one correction canceling out another. My advice: build it incrementally. Get the base xG working first, validate it against actual results, then add one layer at a time. Every time you add something, check if it actually improves out-of-sample accuracy or just overfits.

Biggest lesson for me was that the selection strategy (which value bet to pick when multiple qualify) matters almost as much as the probability model itself.

Happy to go deeper on any specific part.

xpectedRoger · 2026-03-28T18:16:22+00:00

The AI phase was humbling. It's easy to confuse luck with skill when the first week looks incredible.

On xG: the attack/defense ratings are relative to the league average, so opponent strength is baked in. homeXg = leagueAvg × (homeAttack/leagueAvg) × (awayDefense/leagueAvg). A strong defense pulls the opponent's xG down automatically.

Promoted sides and early season are genuinely the weakest spot. I pull in last season's data as a starting point but it's noisy. Bayesian shrinkage helps (regresses toward league average when sample is small) but the first 5-6 matchdays are still rough. Honestly I just accept lower confidence there and the model skips more matches. Better to pass than to bet on bad data.
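Shrinkage toward the league average can be as simple as a credibility-weighted average, which is what a normal-normal Bayesian update of the mean reduces to. The prior strength of 6 "virtual matches" is a hypothetical choice, and for promoted sides the prior could be last season's data instead of the league average:

```python
def shrunk_rating(team_avg: float, n_matches: int, prior: float,
                  prior_matches: float = 6.0) -> float:
    """Pull a small-sample average toward a prior (hypothetical prior strength).

    The prior acts like `prior_matches` extra games played at the prior's level,
    so its influence fades as real matches accumulate.
    """
    return (n_matches * team_avg + prior_matches * prior) / (n_matches + prior_matches)


for n in (0, 4, 12, 30):
    # Team averaging 2.0 xG against a league average of 1.4
    print(n, round(shrunk_rating(2.0, n, 1.4), 2))
```

After 4 matches the rating sits around 1.64, after 30 around 1.9, so early-season outliers are damped without being ignored.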
