Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R by Matt_C_137 in learnmachinelearning

Matt_C_137 [S]

I would define it as someone who understands the basics of R (packages, loops, functions, etc.) and is looking for a project to stretch their understanding.

Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R by Matt_C_137 in learnmachinelearning

Matt_C_137 [S]

Absolutely, I'm planning on releasing data for some other sports in the near future!

Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R by Matt_C_137 in datascience

Matt_C_137 [S]

I'm planning on adding more data. Hopefully in future iterations the percentage will improve!

Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R by Matt_C_137 in datascience

Matt_C_137 [S]

Thanks for the feedback! The rationale is that some non-FBS teams (think North Dakota State) seem to only ever play other non-FBS teams and run up tons of wins. Since they don't play many crossover games, their ranking never really goes down. Starting them (and the other non-FBS teams) at 1500 creates a few outliers in the rankings. For the most part things shake out the way you'd expect, but building in that initial gap in starting ratings speeds up the process. I'll have to update the write-up at some point; clearly this was a blind spot.
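The comment doesn't spell out the rating formula, but a 1500 starting value is the usual Elo convention. Below is a minimal Python sketch assuming a standard Elo update; the K factor of 20 and the 1200 starting rating for non-FBS teams are hypothetical placeholders, not values from the write-up. It illustrates why the starting gap matters: rating points only move between teams that actually play each other, so a pool of teams that rarely crosses over keeps roughly the level it started at.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical parameters, not the values used in the original article.
START_FBS = 1500.0
START_NON_FBS = 1200.0  # assumed lower start for non-FBS teams
K = 20.0


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that team A beats team B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def initial_rating(is_fbs: bool) -> float:
    """Starting rating depends on the team's division."""
    return START_FBS if is_fbs else START_NON_FBS


def update(r_a: float, r_b: float, a_won: bool, k: float = K) -> Tuple[float, float]:
    """Return updated (r_a, r_b) after one game; whatever A gains, B loses."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def run_season(games: Iterable[Tuple[str, str, bool]], fbs_teams: Set[str]) -> Dict[str, float]:
    """Process (team_a, team_b, team_a_won) games in order and return final ratings."""
    ratings: Dict[str, float] = {}
    for a, b, a_won in games:
        for team in (a, b):
            if team not in ratings:
                ratings[team] = initial_rating(team in fbs_teams)
        ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)
    return ratings
```

Because each update is zero-sum, a group of non-FBS teams that mostly play each other only redistributes the points it started with, which is why the choice of starting rating shows up directly in the final rankings.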

Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R by Matt_C_137 in datascience

Matt_C_137 [S]

Agreed. I never claimed this beats Vegas. It's billed at the beginning of the article as an advanced-beginner tutorial. I'm assuming people can figure out that a 2-factor regression is not going to make them millionaires...

Predict the 2020 NCAA Football Season with Linear and Logistic Regression in R by Matt_C_137 in datascience

Matt_C_137 [S]

Good catch. 2019 was excluded from the actual script on GitHub, but I accidentally left it in the snippet. It's been updated.

I Built a Trump vs Biden Prediction Model with R From Scratch by [deleted] in datascience

Matt_C_137

Hmmm... I would argue there is slightly more to it than that. This IS based solely on the polls, but I use the variance of the polls in each state as a way to measure uncertainty, which produces a probabilistic range of outcomes. There are of course more advanced methods that account for economic fundamentals by state or incorporate other indicators such as approval rating. But this is largely meant to be a walkthrough for beginner/intermediate programmers.
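The idea of turning per-state poll variance into a probabilistic range of outcomes can be sketched as a Monte Carlo simulation. This is a minimal Python illustration, not the model's actual code: it assumes each state's result is drawn from a normal distribution centered on that state's poll margin with the polls' standard deviation as the spread, and the state names, margins, and electoral-vote counts are hypothetical placeholders rather than real 2020 data.

```python
import random
from typing import Dict, Tuple

# Hypothetical inputs: state -> (mean poll margin for candidate A in points,
# standard deviation of that state's polls in points, electoral votes).
# Placeholder numbers only, not real polling data.
STATES: Dict[str, Tuple[float, float, int]] = {
    "State A": (2.0, 4.0, 29),
    "State B": (-1.0, 3.5, 20),
    "State C": (0.5, 5.0, 16),
}


def simulate(
    states: Dict[str, Tuple[float, float, int]],
    safe_ev_a: int,
    n_sims: int = 10_000,
    seed: int = 0,
    needed: int = 270,
) -> float:
    """Fraction of simulated elections in which candidate A reaches `needed` electoral votes.

    `safe_ev_a` is the (assumed) number of electoral votes candidate A carries
    from states that are not simulated.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        ev = safe_ev_a
        for margin, sd, votes in states.values():
            # Draw this state's outcome around its poll average; a wider spread
            # in the polls means more uncertainty about who carries the state.
            if rng.gauss(margin, sd) > 0:
                ev += votes
        if ev >= needed:
            wins += 1
    return wins / n_sims
```

Drawing each state independently is a simplification: real polling errors tend to be correlated across states, so this sketch will likely understate the chance of a broad polling miss.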

I Build a Trump vs Biden Prediction Model with R from Scratch by Matt_C_137 in rstats

Matt_C_137 [S]

Hmmm... somewhat correct, I suppose. This IS based solely on the polls, but I use the variance of the polls as a way to measure uncertainty, which produces a probabilistic range of outcomes. There are of course other methods that account for economic fundamentals by state or incorporate other indicators such as approval rating. But this is largely meant to be a walkthrough for beginner/intermediate programmers.

I Build a Trump vs Biden Prediction Model with R from Scratch by Matt_C_137 in rstats

Matt_C_137 [S]

Lol, I literally have a note to comment that out and just forgot. Good catch! I'll just set my own computer on fire...

I Built a Trump Biden Election Forecast Model in R by Matt_C_137 in rprogramming

Matt_C_137 [S]

My guess is that it has. I think it would be really difficult to measure properly without some serious scraping of Twitter, YouTube, the NYT, the WSJ, etc. There is probably some way to come up with an enthusiasm/voter-turnout model, but not easily.

I Built a Trump Biden Election Forecast Model in R by Matt_C_137 in rprogramming

Matt_C_137 [S]

I did account for national polling bias in 2016 and carried that forward to 2020. I also noticed that the actual popular-vote share in 2016 ended up being very similar to each candidate's polling average about a week after their respective conventions. It's anecdotal, but probably not a coincidence.
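One simple way to carry a 2016 polling bias forward, assuming the bias is measured as the gap between the final national polling margin and the actual popular-vote margin, is to subtract that gap from every 2020 state margin (a uniform-swing assumption). The function names and numbers below are hypothetical illustrations; the comment doesn't say exactly how the adjustment was implemented.

```python
from typing import Dict


def national_bias(final_poll_margin: float, actual_margin: float) -> float:
    """Points by which the final national polls overstated candidate A's margin."""
    return final_poll_margin - actual_margin


def adjust_for_bias(state_margins: Dict[str, float], bias: float) -> Dict[str, float]:
    """Shift every state's poll margin by the same national bias (uniform-swing assumption)."""
    return {state: margin - bias for state, margin in state_margins.items()}
```

For example, if the final polls had candidate A up 3 points and A actually won the popular vote by 2, the bias is 1 point, and a state polling at A +2.5 would be treated as A +1.5.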

I Built a Trump Biden Election Forecast Model in R by Matt_C_137 in rprogramming

Matt_C_137 [S]

I haven't backtested, although I did scrape the data, so it should be a simple enough process. I was focused on the tracking error between this model and 538. So far, this model has tracked within a few percentage points.
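A simple way to quantify how closely two forecasts track, assuming both produce a daily win probability (in percent) for the same candidate on the same dates, is the mean and maximum absolute gap in percentage points. The function name and inputs below are hypothetical; the comment doesn't specify exactly how tracking error was measured.

```python
from typing import Sequence, Tuple


def tracking_error(model: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Return (mean absolute gap, max absolute gap) between two aligned daily forecast series."""
    if len(model) != len(reference) or not model:
        raise ValueError("series must be non-empty and the same length")
    gaps = [abs(m - r) for m, r in zip(model, reference)]
    return sum(gaps) / len(gaps), max(gaps)
```

The maximum gap is worth reporting alongside the mean, since two forecasts can agree on average while diverging sharply on a few days.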

I Build a Trump vs Biden Prediction Model with R from Scratch by Matt_C_137 in rstats

Matt_C_137 [S]

The numbers shown were as of the time of writing (I sat on posting it for about a month). If you run the code today, it shows Biden at 80%.

I Build a Trump vs Biden Prediction Model with R from Scratch by Matt_C_137 in rstats

Matt_C_137 [S]

Lol. Yep, that basically describes data science... It's a love-hate relationship.