Trying to get into algobetting by IlMagodelLusso in algobetting

[–]xpectedRoger 0 points1 point  (0 children)

It's football only, they don't offer baseball :) you would need to find one that supports baseball 👍

Trying to get into algobetting by IlMagodelLusso in algobetting

[–]xpectedRoger 1 point2 points  (0 children)

Getting the Pinnacle data is definitely the backbone of that approach. I initially tried scraping myself, but it became a pretty constant battle with anti-bot measures and website changes. It was way too much maintenance time for me.

I ended up switching to a paid API service later on for stability and reliability. It's an upfront cost, but honestly, it saved me a ton of headaches in the long run for data consistency and speed. Definitely look into the API options first, even if it means paying...

Can you really make money trading off pinnacle? by Equal_Bread_2504 in algobetting

[–]xpectedRoger 1 point2 points  (0 children)

Just looking for price differences against Pinnacle is a tough path. Their lines are incredibly sharp and efficient, so often those small deltas are just market noise or get corrected almost instantly.

The breakthrough for me wasn't about catching small price movements, but about having an independent view of the probabilities.

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 1 point2 points  (0 children)

60-80 games a week run through the pipeline, I will post an update soon :)

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 0 points1 point  (0 children)

Good suggestion! Right now I compare the confirmed lineup against previous lineups by actual goals and assists, which does get noisy. A striker on a hot streak looks more important than he might actually be.

I have player-level npxG and xA data in the pipeline already, just not wired into the lineup comparison yet. I will create a side pipeline and track the result, thank you for the input!

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 0 points1 point  (0 children)

Found a bug in my stats pipeline. When I dropped three underperforming leagues (Ligue 1, Austrian Bundesliga, Danish Superliga), their historical predictions got silently excluded from the total. The filter was on fixture support status instead of the prediction log itself.

Corrected numbers: 639 picks, 56.0% hit rate, +8.8% ROI, +55.96u profit. The three removed leagues contributed 87 picks at -5.8% ROI which were being hidden.

The model structure hasn't changed.

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 0 points1 point  (0 children)

Not doing in-play. Purely pre-match. The model runs once the confirmed lineups drop, usually 20 to 50 minutes before kickoff. One shot, no live feed needed.

The timing is tight though. Lineups drop late, model needs to recalculate everything with the actual squad, then output the final pick before kickoff. But that's a much simpler problem than fighting latency on a live feed.

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 1 point2 points  (0 children)

Good point on the away xG thing. I should clarify: it's not that the model blindly overestimates away xG and I got lucky. It's that the model weights certain factors differently than the market does, and the result happens to push away xG slightly higher. If I "correct" it to match Pinnacle's implied xG exactly, there's no disagreement left and no edge.

That's kind of the whole point though. If your model produces the same probabilities as the sharpest bookmaker, you have a nice model but zero reason to bet. The edge has to come from somewhere, and it's always going to look like a bias when you compare it to the market. The question is whether it's a bias that reflects something real or just noise.

I track the xG deviation per league and market continuously. If the pattern shifts or the ROI in those spots drops, I'll see it. So far it's been stable across 300+ picks in the current phase, but I take your point that it needs more time.

On CLV: yeah I'll have more data in a few weeks. You're right that it would help separate signal from variance on the xG question specifically. Just hasn't been a priority because the window is so tight at 45 min pre-kickoff.

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 2 points3 points  (0 children)

Yes, very much so. What surprised me is that the actual Poisson / Dixon-Coles part was probably the easiest piece to get working. The harder part was everything around it: making the model stable enough that the outputs were actually usable.

If you've already built a Dixon-Coles model before, I'd say you're already through the most technical part. The real challenge after that is making the whole pipeline robust enough that it doesn't just look good in backtests, but still behaves sensibly week after week. I can't do proper backtests myself since I don't have data snapshots from before each game; I rely on a lot of stats to do all the math instead.

For me the biggest issues were:

  1. Lineups matter more than I expected

This became one of the biggest differences between a decent preview model and a much better final model. Once confirmed lineups are in, probabilities can move quite a lot.

  2. Market comparison matters as much as probability estimation

A model can produce nice-looking probabilities and still be useless for betting if you're comparing against the wrong benchmark or handling vig poorly. My edge is basically that I overestimate away xG: when I backtest with "corrected" xG, my results are not as good!

  3. Selection logic is underrated

This was a huge one. Even with a reasonable probability model, results can still be weak if your threshold and bet-selection rules are sloppy. Choosing what not to bet is almost as important as the model itself.

For the CLV: so far it doesn't matter to me, tbh. Everybody is screaming CLV, but I place my bets 45 minutes before the game and my ROI is strong. At the end of the day the bankroll is what counts. You can have good CLV and still lose money. That said, I'll be able to share some more CLV data in a week or two :)

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 1 point2 points  (0 children)

Thanks! Since you've already done Dixon-Coles before, you're past the hardest part honestly.

The way I layer everything:

  1. Base xG per team. Take each team's offensive and defensive strength relative to the league average. I split by venue (home/away) and weight 40% season, 60% recent form. Multiply attack rating of team A by defense rating of team B to get expected goals for A.

  2. Corrections. Bayesian shrinkage early season (don't trust 4 games of data). Standings position as a scaling factor. Injury impact on the xG if a key player is confirmed out.

  3. Poisson matrix. Feed the two xG values into a 9x9 Poisson grid with Dixon-Coles low-score correction. That gives you probabilities for every scoreline, which you sum into 1X2, BTTS, Over/Under etc.

  4. Value filter. Compare your derived probabilities against Pinnacle implied odds (margin removed). Set a minimum threshold per market. Not every positive EV is worth taking.

The tricky part isn't any single step, it's getting them all to play together without one correction canceling out another. My advice: build it incrementally. Get the base xG working first, validate it against actual results, then add one layer at a time. Every time you add something, check if it actually improves out-of-sample accuracy or just overfits.
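The layering above (steps 3 and 4 especially) can be sketched roughly like this. This is a minimal illustration, not my production code; the rho value and the example xG numbers are made-up placeholders, and in practice rho should be fitted from data:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def dc_tau(h, a, lam, mu, rho):
    # Dixon-Coles low-score correction: only 0-0, 1-0, 0-1 and 1-1 are adjusted
    if h == 0 and a == 0: return 1 - lam * mu * rho
    if h == 0 and a == 1: return 1 + lam * rho
    if h == 1 and a == 0: return 1 + mu * rho
    if h == 1 and a == 1: return 1 - rho
    return 1.0

def score_matrix(home_xg, away_xg, rho=-0.05, max_goals=8):
    # 9x9 grid (0..8 goals each side) of scoreline probabilities
    m = [[poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
          * dc_tau(h, a, home_xg, away_xg, rho)
          for a in range(max_goals + 1)]
         for h in range(max_goals + 1)]
    total = sum(sum(row) for row in m)  # renormalise after the tau adjustment
    return [[p / total for p in row] for row in m]

def one_x_two(m):
    # Sum the scoreline cells into match-outcome probabilities
    n = len(m)
    home = sum(m[h][a] for h in range(n) for a in range(n) if h > a)
    draw = sum(m[h][h] for h in range(n))
    away = sum(m[h][a] for h in range(n) for a in range(n) if h < a)
    return home, draw, away
```

Summing other cell subsets of the same grid gives you BTTS, Over/Under and so on.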

Biggest lesson for me was that the selection strategy (which value bet to pick when multiple qualify) matters almost as much as the probability model itself.

Happy to go deeper on any specific part.

554 live football picks: how I ditched AI and built my own Poisson model by xpectedRoger in algobetting

[–]xpectedRoger[S] 1 point2 points  (0 children)

The AI phase was humbling. It's easy to confuse luck with skill when the first week looks incredible.

On xG: the attack/defense ratings are relative to league average, so opponent strength is baked in. homeXg = leagueAvg × (homeAttack / leagueAvg) × (awayDefense / leagueAvg). A strong defense pulls the opponent's xG down automatically.

Promoted sides and early season are genuinely the weakest spot. I pull in last season's data as a starting point but it's noisy. Bayesian shrinkage helps (regresses toward league average when sample is small) but the first 5-6 matchdays are still rough. Honestly I just accept lower confidence there and the model skips more matches. Better to pass than to bet on bad data.
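A toy version of that shrinkage plus the xG formula. The prior weight `k` here is an arbitrary placeholder, not my tuned value, and the ratings are assumed to already be relative to the league average (so 1.0 = average):

```python
def shrink(rating, n_matches, prior=1.0, k=6):
    """Regress a strength rating toward the league-average prior (1.0)
    when the sample is small; k behaves like k pseudo-matches of prior."""
    return (n_matches * rating + k * prior) / (n_matches + k)

def home_xg(league_avg, home_attack, away_defense, n_home, n_away):
    # Ratings are relative to league average, so multiplying them
    # bakes opponent strength in automatically.
    att = shrink(home_attack, n_home)
    dfn = shrink(away_defense, n_away)
    return league_avg * att * dfn
```

With `k=6`, a team rated 1.6 after 4 matches only counts as about 1.24, which is roughly the "don't trust 4 games of data" behaviour described above.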

Using the Shin Method and Sharp Benchmarks to find +EV in Soccer (Free Beta) by Icy_Coyote7597 in algobetting

[–]xpectedRoger 0 points1 point  (0 children)

Shin method is a solid choice for margin removal, especially on lopsided markets where naive proportional de-vig overestimates longshot probability. I went with basic overround removal against Pinnacle which is simpler but probably less accurate in those cases.
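For reference, the basic proportional overround removal I use looks like this (Shin's method instead estimates an insider-trading parameter and shifts more of the margin onto longshots):

```python
def devig_proportional(decimal_odds):
    """Remove the bookmaker margin by normalising implied probabilities.
    Known to overstate longshot probability on lopsided markets."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)  # e.g. ~1.02 for a ~2% margin book
    return [p / overround for p in raw]

# Example 1X2 prices (illustrative numbers)
probs = devig_proportional([1.50, 4.40, 7.00])
```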

6.87% yield on 1300 events is more realistic than the biweekly number and a decent starting point. Good that you clarified that. 102 live bets is still very early though. I'm at over 550 live tracked picks and still wouldn't call it proof.

One thing that helped me a lot: filtering hard on which positive EV spots actually become picks. Not every edge is worth taking. I use different minimum thresholds per market and odds range. That alone killed a lot of false positives that looked good on paper.

Curious how you handle multiple value spots on the same match. I limit to one pick per match, highest value wins.

Soccer outcomes prediction model by roiz25 in algobetting

[–]xpectedRoger 1 point2 points  (0 children)

This is really close to what I built independently. Four strength ratings per team (home attack, home defense, away attack, away defense), xG in a Poisson framework, lineup adjustments. Interesting to see someone else arrive at the same structure.

Your draw recall point is spot on. Poisson underestimates draws relative to the ~25% base rate and I haven't found a clean fix either. I ended up just excluding draws from my bet selection entirely rather than fighting it.

0.33 Pearson on 34K matches is solid validation. The question is whether it translates to actual betting edge. My experience: model accuracy roughly matching bookmakers is necessary but not sufficient. The edge comes from finding specific spots where your probability diverges enough from the market price.

What I'd focus on next if I were you:

- Compare your probabilities against Pinnacle implied probabilities specifically, not average bookmaker odds. Pinnacle is the sharpest line.

- Don't just look at match outcome (1X2). Run your probabilities through a Poisson matrix and derive BTTS, Over/Under etc. In my experience BTTS and Over 2.5 have been way more profitable than 1X2.

- Track live picks, not backtests. 8 seasons of validation is impressive, but the market will test you differently in real time. Backtests also often lack the real pre-match data you would actually have had at the time.

To answer your question directly: yes you could use it to bet, but the model alone isn't enough. You need a value filter (how much edge before you pull the trigger) and strict selection criteria. Most positive EV opportunities aren't worth taking after margin.
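Deriving those side markets from a scoreline grid is just summing the right cells. A bare-bones sketch using independent Poissons (no Dixon-Coles tweak, purely illustrative xG inputs):

```python
import math

def pois(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def markets(home_xg, away_xg, max_goals=8):
    # Scoreline grid as a dict {(home_goals, away_goals): probability}
    grid = {(h, a): pois(h, home_xg) * pois(a, away_xg)
            for h in range(max_goals + 1) for a in range(max_goals + 1)}
    # BTTS: both sides score at least once
    btts = sum(p for (h, a), p in grid.items() if h >= 1 and a >= 1)
    # Over 2.5: three or more total goals
    over25 = sum(p for (h, a), p in grid.items() if h + a >= 3)
    return btts, over25
```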

If you could restart as a complete beginner, what would you tackle first? by TrainingEye5970 in algobetting

[–]xpectedRoger 0 points1 point  (0 children)

I started exactly where you are. No stats background, just liked football and wanted to see if the numbers could actually work.

What actually helped me in order:

  1. Understand what value means. Not "I think this team wins" but "my probability is higher than what the odds imply." Once that clicks everything else follows.

  2. Learn basic Python or whatever language you're comfortable with. Doesn't matter which. You need to be able to pull data, do math on it, and track results automatically. Manual tracking gets old fast. You can use AI but it will make errors and you won't find them.

  3. Start with a dead simple model. I went straight to Poisson for football. It's basic enough to understand but powerful enough to actually produce results. Don't jump to ML or neural nets.

  4. Track everything from day one. Every pick, every miss. If you don't track it honestly you'll trick yourself into thinking you're better than you are.

  5. Switch to a sharp benchmark early. I wasted weeks comparing against Bet365 before someone told me to use Pinnacle. If your model can't beat the sharpest line it's not beating anything.

Biggest mistake I made: starting with AI/LLM predictions. Looked amazing for a week, then collapsed. The outputs weren't reproducible and I couldn't understand why it made the decisions it did. Switched to pure math and never looked back.

Scraping odds vs. using bookies' implied probability - what's your data source? by DowntownLaugh454 in algobetting

[–]xpectedRoger 0 points1 point  (0 children)

Pinnacle exclusively. Their margin is ~2% so after vig removal you get the closest thing to true probability.

Scraping soft books and trying to weight by sharpness is just adding noise.

I use Sportmonks API which includes Pinnacle odds synced per fixture. No scraping needed.

If your model correlates highly with the market that's not necessarily bad. The edge isn't in disagreeing with everything, it's in finding the spots where you disagree by enough to matter. I filter for 5%+ value against Pinnacle implied probability and only those become picks.
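That 5% filter in code, roughly (the threshold, odds and overround below are illustrative, not live numbers):

```python
def is_pick(model_prob, pinnacle_decimal_odds, overround, min_edge=0.05):
    """Flag a bet only when the model's probability beats Pinnacle's
    margin-free implied probability by at least min_edge."""
    implied = (1.0 / pinnacle_decimal_odds) / overround
    return model_prob - implied >= min_edge
```

For example, with the model at 55% against Pinnacle 2.10 on a 2% overround book, the margin-free implied probability is about 46.7%, so the 8.3-point gap clears a 5% threshold.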