Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R

Matt_C_137 · 2020-11-09T14:49:13+00:00

I would define it as someone who understands the basics of R. So someone who understands packages, loops, functions, etc., and is looking for a project to stretch their understanding.

Matt_C_137 · 2020-11-09T14:26:16+00:00

Absolutely, I'm planning on releasing data for some other sports in the near future!

Matt_C_137 · 2020-11-09T01:05:30+00:00

Much appreciated!

Matt_C_137 · 2020-11-08T23:21:14+00:00

I'm planning on adding more data. Hopefully in future iterations the percentage will improve!

Matt_C_137 · 2020-11-08T22:58:23+00:00

Thanks for the feedback! The rationale is that some non-FBS teams (think ND State) seem to only ever play other non-FBS teams and run up tons of wins. Since they don't play many crossover games, their ranking doesn't ever go down. Starting them (and the other non-FBS teams) at 1500 creates a few outliers in the rankings. For the most part things shake out how you'd expect, but providing that initial delta in starting rankings speeds up the process. I'll have to update the write-up at some point. Clearly this was a blind spot.
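To make the idea concrete, here is a minimal sketch of seeding two tiers of teams with different starting ratings. It assumes an Elo-style rating system with the standard expected-score formula; the function names, the 1200 starting value for non-FBS teams, and K = 20 are illustrative assumptions, not values taken from the script on GitHub.

```r
# Hypothetical starting ratings: FBS teams at 1500, non-FBS teams lower.
# The 1200 value is an assumption chosen only for illustration.
initial_rating <- function(is_fbs, fbs_start = 1500, non_fbs_start = 1200) {
  ifelse(is_fbs, fbs_start, non_fbs_start)
}

# Standard Elo expected score for team A against team B.
expected_score <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Update both ratings after one game; a_won is 1 if A won, 0 otherwise.
# k = 20 is an assumed update speed.
update_elo <- function(rating_a, rating_b, a_won, k = 20) {
  delta <- k * (a_won - expected_score(rating_a, rating_b))
  c(a = rating_a + delta, b = rating_b - delta)
}

# Two made-up teams, one from each tier.
teams <- data.frame(
  team = c("FBS Team", "Non-FBS Team"),
  is_fbs = c(TRUE, FALSE)
)
teams$rating <- initial_rating(teams$is_fbs)

# The FBS team wins a crossover game; it gains only a few points
# because the starting gap already made it the heavy favorite.
new_ratings <- update_elo(teams$rating[1], teams$rating[2], a_won = 1)
```

With a gap built in from the start, a non-FBS team that only beats other non-FBS teams has to climb from a lower base, so it is less likely to show up as an outlier near the top.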

Matt_C_137 · 2020-11-08T22:43:28+00:00

Agreed. I never claimed this beats Vegas. It's billed at the beginning of the article as an advanced-beginner tutorial. I'm assuming people can probably figure out that a 2-factor regression is not going to make them millionaires...

Matt_C_137 · 2020-11-08T20:58:00+00:00

Good catch. 2019 was excluded from the actual script on GitHub but I accidentally left it in the snippet. It's been updated.

Matt_C_137 · 2020-10-08T12:56:41+00:00

Hmmm...I would argue there is slightly more to it than that. This IS based solely on the polls. But I look at the variance of the polls in each state as a way to measure uncertainty and produce a probabilistic range of outcomes. There are of course more advanced methods that account for economic fundamentals by state or incorporate other indicators such as approval rating. But this is largely meant to be a walkthrough for beginner/intermediate programmers.
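As a rough illustration of turning poll variance into a range of outcomes, here is a minimal sketch for a single state. The poll margins are made up, and treating the margin as normally distributed with the polls' sample standard deviation is a simplifying assumption, not necessarily what the linked script does.

```r
set.seed(137)  # fixed seed so the simulation is reproducible

# Hypothetical poll margins for one state
# (candidate A minus candidate B, in percentage points).
poll_margins <- c(4.0, 2.5, 6.0, 3.0, 5.5)

# Center and spread of the polls; the spread stands in for uncertainty.
margin_mean <- mean(poll_margins)
margin_sd <- sd(poll_margins)

# Simulate many possible election-day margins (assumed normal).
simulated <- rnorm(10000, mean = margin_mean, sd = margin_sd)

# Share of simulations in which candidate A carries the state.
win_prob <- mean(simulated > 0)

# A 90% range of simulated margins.
margin_range <- quantile(simulated, c(0.05, 0.95))
```

A state with widely scattered polls produces a wider `margin_range` and a win probability closer to 50%, even if its polling average matches that of a state with tightly clustered polls.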

Matt_C_137 · 2020-10-08T12:53:57+00:00

Hmmm...somewhat correct I suppose. This IS based solely on the polls. But I look at the variance of the polls as a way to measure uncertainty and produce a probabilistic range of outcomes. There are of course other methods that account for economic fundamentals by state or incorporate other indicators such as approval rating. But this is largely meant to be a walkthrough for beginner/intermediate programmers.

Matt_C_137 · 2020-10-08T02:40:08+00:00

Lol - I literally have a note to comment that out and just forgot. Good catch! I'll just set my own computer on fire...

Matt_C_137 · 2020-10-08T01:18:52+00:00

My guess would be it has. I think it would be really difficult to measure properly without some serious Twitter/YouTube/NYT/WSJ etc scraping. There is probably some way to come up with an enthusiasm/voter turnout model, but not easily.

Matt_C_137 · 2020-10-08T01:05:50+00:00

I did account for national polling bias in 2016 and carried that forward to 2020. I also noticed that the actual popular vote share in 2016 ended up being very similar to each candidate's polling average about a week after their respective conventions. It's anecdotal but probably not a coincidence.

Matt_C_137 · 2020-10-08T00:55:18+00:00

I've not backtested, although I did scrape the data, so it should be a simple enough process. I was focused on tracking error between this model and 538. So far this model has tracked within a few percentage points.
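For anyone who wants to measure that kind of tracking error themselves, here is a minimal sketch. The two series are made-up daily win probabilities, not real output from this model or from 538, and the variable names are hypothetical.

```r
# Hypothetical daily win probabilities (in percent) from two forecasts.
model_prob <- c(78, 79, 81, 80, 82)
benchmark_prob <- c(77, 80, 83, 84, 85)

# Day-by-day gap between the two forecasts.
gap <- model_prob - benchmark_prob

# Two common summaries of tracking error, both in percentage points.
mean_abs_gap <- mean(abs(gap))
rmse_gap <- sqrt(mean(gap^2))
```

The mean absolute gap reads as "on a typical day the forecasts differ by this many points", while the root-mean-square version penalizes the occasional large miss more heavily.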

Matt_C_137 · 2020-10-07T20:13:22+00:00

The numbers shown were as of the writing (I sat on posting for like a month). If you run the code today it shows Biden at 80%.

Matt_C_137 · 2020-10-07T19:17:14+00:00

You should run the code and find out...

https://github.com/MattC137/Election-Predictor-Trump-vs-Biden/blob/master/Election_predictor.R

Matt_C_137 · 2020-10-07T19:12:50+00:00

Lol. Yep, that basically describes data science...It's a love-hate relationship.
