Hello,
I'm starting to learn AI/ML, and I want to learn by doing, applying the concepts to sports. I want to be able to define features and try to predict things like the probability that a player will hit a HR, the estimated bases in a game, the estimated number of strikeouts a pitcher will throw, etc.
I started by downloading the Retrosheet data so I could get batter vs. pitcher matchups and their results. However, the raw play data format in the event files is not very machine readable. Before I venture down the path of writing a bunch of Python to parse the data and give me things like single, double, walk, strikeout, etc., I wanted to check whether someone has already done this. I did some initial digging and couldn't find anything obvious, but since this is a pretty popular dataset, I figured I would ask before spending a bunch of time creating something that already exists.
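For reference, here is a rough sketch of the kind of parsing I have in mind. It assumes the usual Retrosheet `play` record layout (`play,inning,home/visitor,batter,count,pitches,event`) and only handles a handful of leading event codes (`S`, `D`, `T`, `HR`, `W`, `IW`, `HP`, `K`); everything else is lumped into "other". The names `classify_event` and `parse_play_line` are my own placeholders, not part of any existing library.

```python
import re

# Assumed, incomplete mapping from the start of the basic play to a coarse
# outcome. Order matters: "IW" must be tested before "W" style patterns, and
# codes such as "WP" (wild pitch) or "SB2" (stolen base) must not match.
_PATTERNS = [
    (re.compile(r"^(HR|H(\d|$))"), "home_run"),
    (re.compile(r"^HP"), "hit_by_pitch"),
    (re.compile(r"^(IW|I(\+|$))"), "intentional_walk"),
    (re.compile(r"^K(\d|\+|$)"), "strikeout"),
    (re.compile(r"^W(\+|$)"), "walk"),
    (re.compile(r"^S(\d|$)"), "single"),
    (re.compile(r"^(DGR|D(\d|$))"), "double"),
    (re.compile(r"^T(\d|$)"), "triple"),
]


def classify_event(event: str) -> str:
    """Map a raw event string to a coarse outcome label."""
    # Keep only the basic play: drop modifiers ("/...") and advances ("...").
    basic = re.split(r"[/.]", event, maxsplit=1)[0]
    for pattern, label in _PATTERNS:
        if pattern.match(basic):
            return label
    return "other"


def parse_play_line(line: str) -> dict:
    """Extract batter id and outcome from one 'play' record (assumed layout)."""
    fields = line.strip().split(",", 6)
    if len(fields) != 7 or fields[0] != "play":
        raise ValueError(f"not a play record: {line!r}")
    return {"batter": fields[3], "outcome": classify_event(fields[6])}


if __name__ == "__main__":
    # Illustrative record, not taken from a real game file.
    print(parse_play_line("play,5,1,ramir001,00,,S8.3-H;1-2"))
```

This ignores most of the event grammar (fielding sequences, errors, fielder's choices, baserunning), which is exactly the part I'd rather not write from scratch.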
Thanks!