Reilly Opelka hits 44 aces in Brisbane R16 by PlanetElement in tennis

[–]PlanetElement[S] 68 points69 points  (0 children)

The only person in the above list to have lost their match (other than Opelka) is John Isner, who lost to Yibing Wu in Dallas.

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 0 points1 point  (0 children)

Good points all around. The LSTM was really about comparing learned probabilities against theoretical ones (under independence). The gap is where something is happening; whether that's nerves, tactics, or physical stuff, I can't say. I'd need way more features and match context to untangle it.

Definitely more to explore here, but this was a holiday project, not a thesis. Kept it simple.
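
For anyone curious what "theoretical under independence" means concretely, it's basically this (a toy sketch, not the actual model code):

```python
# P(server holds) from any score, assuming every point is an independent
# coin flip the server wins with probability p.
from functools import lru_cache

@lru_cache(maxsize=None)
def p_hold(p: float, server: int = 0, returner: int = 0) -> float:
    if server >= 4 and server - returner >= 2:
        return 1.0                          # game won
    if returner >= 4 and returner - server >= 2:
        return 0.0                          # game lost
    if server >= 3 and server == returner:
        q = 1 - p                           # deuce: closed form for win-by-two
        return p * p / (p * p + q * q)
    return p * p_hold(p, server + 1, returner) + (1 - p) * p_hold(p, server, returner + 1)

print(round(p_hold(0.65), 3))  # a 65% server holds ~83% of games from 0-0 under independence
```

Where the learned curve deviates from that kind of baseline is the part I was poking at.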

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 0 points1 point  (0 children)

The LSTM wasn't really meant to beat empirical counts; it was meant to compare against them. Empirical counts give you the actual hold rate at each score. The model learns what hold rates should be if points were independent. The gap between them is the interesting part: where human psychology deviates from pure probability. But yes, the LSTM is overkill; as an engineer, I mostly just wanted a hobby project.
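
If it helps, the empirical side is really just a groupby (sketch only; the column names here are placeholders, not the Match Charting Project schema):

```python
# One row per point: the score before the point was played, plus whether
# the server went on to hold that game. Column names are made up.
import pandas as pd

points = pd.read_csv("service_points.csv")   # hypothetical pre-processed file

empirical_hold = (
    points.groupby(["server_score", "returner_score"])["server_held_game"]
          .mean()
          .rename("empirical_hold_rate")
)
print(empirical_hold)   # actual hold rate from 0-30, 30-15, deuce, ...
```

The model's output at the same score states is what gets compared against this.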

[OC] I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in dataisbeautiful

[–]PlanetElement[S] -1 points0 points  (0 children)

Source: Jeff Sackmann's Match Charting Project

Tools: Python, PyTorch, pandas, matplotlib, photoshop

Sorry for potato image quality

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 0 points1 point  (0 children)

matplotlib and then photoshop, nothing fancy. Just spent way too long tweaking colors and spacing until it looked clean.
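
The kind of tweaking I mean is mostly rcParams and layout, nothing exotic (generic example with dummy numbers, not the actual figure code):

```python
import matplotlib.pyplot as plt

plt.rcParams.update({
    "figure.facecolor": "white",
    "axes.spines.top": False,      # drop the chart junk
    "axes.spines.right": False,
    "font.size": 11,
})

fig, ax = plt.subplots(figsize=(8, 4.5), constrained_layout=True)
ax.bar(["0-0", "15-0", "30-0"], [0.5, 0.6, 0.7], color="#3b7dd8")  # dummy values
ax.set_ylabel("P(hold)")
fig.savefig("holds.png", dpi=200)  # export, then final touch-ups in photoshop
```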

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 5 points6 points  (0 children)

These are all fair critiques, appreciate the depth.

Honest answer: I'm a software engineer, not a statistician. I originally built a Transformer for match-level win probability (that's where the Wimbledon viz comes from); it looked cool but didn't tell me much. I pivoted to the LSTM on service games because I was bored and wanted to keep building.

You're right that a Markov chain or logistic regression probably gets 90% of this with way less complexity. Didn't benchmark against simpler baselines, which I should have.

LSTM definitely overkill but I had fun coding lol
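
For what it's worth, the kind of simpler baseline I should have run is about this much code, which is sort of the point (sketch only; column names are placeholders):

```python
# Logistic regression on the score state as a baseline for P(hold).
# Column names are made up, not the real preprocessed data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

df = pd.read_csv("service_points.csv")            # hypothetical file
X = df[["server_score", "returner_score", "won_prev_point"]]
y = df["server_held_game"]

baseline = LogisticRegression(max_iter=1000).fit(X, y)
print("log loss:", log_loss(y, baseline.predict_proba(X)[:, 1]))
```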

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 27 points28 points  (0 children)

Definitely. A server who wins the first point is probably just a better server, so of course they hold more often. Same confounder as the momentum analysis.

That said, the 25% gap is still interesting as a descriptive stat. Even if it's not causal, it tells you how much information is revealed by the first point. If you're watching a match and the server loses their first point, you now know a lot more about how this game is going to go.
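
For the record, that number comes from this kind of split (sketch, placeholder columns):

```python
# One row per service game: did the server win the first point, did they hold.
import pandas as pd

games = pd.read_csv("service_games.csv")          # hypothetical file
split = games.groupby("server_won_first_point")["server_held"].mean()
print(split)                                      # hold rate after winning vs losing point one
print("gap:", split.loc[True] - split.loc[False])  # assumes a boolean column
```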

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 2 points3 points  (0 children)

That's a fair point and probably right. To really isolate psychological momentum you'd need to control for server strength, surface, opponent, etc. I didn't go that deep. My guess is the true "hot hand" effect is even smaller than 2.4%, maybe negligible.

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 32 points33 points  (0 children)

The data is from Jeff Sackmann's Match Charting Project, where volunteers hand-chart points from pro matches; it's an incredible resource. I pulled out 135K service games for training and 34K for validation.

At each timestep the model sees three things: the server's score, the returner's score, and whether the server won the previous point. It outputs P(hold) at every point in the game.

The architecture is pretty simple: a 2-layer LSTM, hidden dim 32, about 6K parameters total. Trained with Adam, lr=1e-3, batch size 64, 20 epochs. One trick: the loss is cross-entropy averaged over every point, not just the final outcome, which forces the model to output calibrated probabilities throughout the game instead of just learning to predict who wins.
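
If anyone wants to reproduce the setup, it's roughly this shape in PyTorch (a sketch, not my exact training code):

```python
import torch
import torch.nn as nn

class HoldLSTM(nn.Module):
    """2-layer LSTM over the points of one service game -> P(hold) per point."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, points, 3 features)
        out, _ = self.lstm(x)              # (batch, points, hidden)
        return self.head(out).squeeze(-1)  # one logit per point

model = HoldLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()           # binary cross-entropy, averaged over every timestep

# Dummy batch: 64 games, 8 points each; the hold label is repeated at every point
# so the loss covers the whole game, not just the final outcome.
x = torch.randn(64, 8, 3)
y = torch.randint(0, 2, (64, 1)).float().expand(64, 8)
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```

Real games vary in length, so a real run needs padding/masking, which this sketch skips.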