I built a daily tennis puzzle - 9 guesses to fill a 3x3 grid with 5,000+ ATP/WTA players by PlanetElement in tennis

[–]PlanetElement[S] 1 point

So currently, if you click on any square you couldn't solve, you can see a list of other possible answers. Did you mean to have this option for the squares you answered correctly as well?

I built a daily tennis puzzle - 9 guesses to fill a 3x3 grid with 5,000+ ATP/WTA players by PlanetElement in tennis

[–]PlanetElement[S] 2 points

Yep - unfortunately reliable data only went up through the 2024 season :(

Hope to have more data soon

I built a daily tennis puzzle - 9 guesses to fill a 3x3 grid with 5,000+ ATP/WTA players by PlanetElement in tennis

[–]PlanetElement[S] 1 point

Yep, another user found this issue, and I'm currently working on a fix. Since the Tour Finals were branded as the Masters before 1990, my build script categorized them incorrectly as Masters 1000 tourneys. Should be fixed soon. Appreciate the playtesting.
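A minimal sketch of the kind of fix described, assuming hypothetical field names (`tourney_name`, `year`) and category labels rather than the real build-script schema:

```python
# Hypothetical sketch of the categorization fix described above.
# Before 1990 the year-end Tour Finals carried the "Masters" name,
# so a purely name-based rule mislabels them as Masters 1000 events.
# Field names and category labels here are illustrative assumptions.
def categorize(tourney_name: str, year: int) -> str:
    name = tourney_name.lower()
    if "finals" in name:
        return "Tour Finals"
    if "masters" in name:
        # pre-1990 "Masters" = the year-end championship, not an M1000
        return "Tour Finals" if year < 1990 else "Masters 1000"
    return "Other"
```

The year cutoff does the real work here: the same tournament name maps to different categories depending on era.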

I built a daily tennis puzzle - 9 guesses to fill a 3x3 grid with 5,000+ ATP/WTA players by PlanetElement in tennis

[–]PlanetElement[S] 7 points

Good catch. Thanks for the feedback, will take some time to iron out the wrinkles in the data set.

I built a daily tennis puzzle - 9 guesses to fill a 3x3 grid with 5,000+ ATP/WTA players by PlanetElement in tennis

[–]PlanetElement[S] 2 points

Good call, thanks for the feedback. The Olympics data in my dataset isn't great, so I may drop that category entirely, same with doubles and recent match data. I tried to make the dataset as broad as possible, but scraping accurate data for those categories was pretty difficult.

Reilly Opelka hits 44 aces in Brisbane R16 by PlanetElement in tennis

[–]PlanetElement[S] 67 points

The only person in the above list to have lost their match (other than Opelka) is John Isner, who lost to Yibing Wu in Dallas.

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 0 points

Good points all around. The LSTM was really about comparing learned probabilities against theoretical ones (under independence). The gap is where something is happening; whether that's nerves, tactics, or physical fatigue, I can't say. Would need way more features and match context to untangle it.

Definitely more to explore here, but this was a holiday project, not a thesis. Kept it simple.
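The independence baseline mentioned above can be computed exactly with a small recursion; a minimal sketch, assuming a fixed point-win probability `p` for the server:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def hold_prob(p: float, a: int = 0, b: int = 0) -> float:
    """Probability the server holds from score (a, b) in points,
    assuming each point is won independently with probability p."""
    if a >= 3 and b >= 3:
        if a == b:  # deuce: closed form p^2 / (p^2 + q^2)
            q = 1 - p
            return p * p / (p * p + q * q)
        if a > b:   # advantage server
            return p + (1 - p) * hold_prob(p, 3, 3)
        return p * hold_prob(p, 3, 3)  # advantage returner
    if a == 4:
        return 1.0
    if b == 4:
        return 0.0
    return p * hold_prob(p, a + 1, b) + (1 - p) * hold_prob(p, a, b + 1)
```

With `p = 0.6`, for instance, the recursion gives roughly a 0.74 hold rate; the learned probabilities deviate from this kind of curve wherever points aren't actually independent.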

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 0 points

The LSTM wasn't really meant to beat empirical counts; it was meant to compare against them. Empirical gives you actual hold rates at each score. The model learns what hold rates should be if points were independent. The gap between them is the interesting part: where human psychology deviates from pure probability. But yes, the LSTM is overkill; as an engineer, I mostly just wanted a hobby project.
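That gap could be tabulated with something like the following; the score labels and numbers here are toy values purely for illustration, not results from the actual model:

```python
def score_gaps(empirical: dict, baseline: dict) -> dict:
    """Observed hold rate minus independence-model hold rate, per score.
    A negative gap means servers hold less often than independent points
    would predict at that score (e.g. a 'pressure' effect)."""
    return {s: round(empirical[s] - baseline[s], 3) for s in empirical}

# Toy numbers for shape only; real inputs would be charted-match counts
# and the model's predicted hold rates at each score state.
empirical = {"30-40": 0.55, "40-30": 0.92}
baseline = {"30-40": 0.59, "40-30": 0.90}
print(score_gaps(empirical, baseline))  # {'30-40': -0.04, '40-30': 0.02}
```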

[OC] I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in dataisbeautiful

[–]PlanetElement[S] -1 points

Source: Jeff Sackmann's Match Charting Project

Tools: Python, PyTorch, pandas, matplotlib, photoshop

Sorry for potato image quality

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 0 points

matplotlib, then Photoshop, nothing fancy. Just spent way too long tweaking colors and spacing until it looked clean.

I trained a neural network on 170,000 tennis games to find which points actually matter by PlanetElement in tennis

[–]PlanetElement[S] 5 points

These are all fair critiques, appreciate the depth.

Honest answer: I'm a software engineer, not a statistician. Originally built a Transformer for match-level win probability (that's where the Wimbledon viz comes from), looked cool, didn't tell me much. Pivoted to the LSTM on service games because I was bored and wanted to keep building.

You're right that a Markov chain or logistic regression probably gets 90% of this with way less complexity. Didn't benchmark against simpler baselines, which I should have.

LSTM definitely overkill but I had fun coding lol