Playoffs and Probabilities: Predicting the College Football Bowl Games with DataRobot

This post takes a machine learning approach to predicting the college football bowl games and compares the results with ESPN's win probabilities.

 

With the NCAA college football post-season in full swing, many die-hard fans are either preparing to root for their alma mater or have already invested their energy in cheering for them. With many big games still on the table (e.g., Alabama vs. Michigan, Georgia vs. Baylor, and of course LSU vs. Clemson), many wonder who will end up taking a bowl win back home.

In today’s world where analytics are becoming ubiquitous in sports, one can consider what the numbers say in terms of whether their team will be victorious, either for solidarity purposes or to compete with their friends in a fun betting pool. One such measure that people can use is ESPN’s Football Power Index (FPI), which is a measure of a team’s strength compared to the “average” team. Using this metric, ESPN releases win probabilities for various match-ups throughout the regular and post-season. Many people, including myself, enjoy looking at these to get a better understanding of who’s likely to win from a statistical perspective.  

Having been interested in blending machine learning and college football in the past, I was curious to see how similar ESPN’s prediction would be to one generated from a machine learning model. After collecting some data around the Vegas betting lines, team season statistics (wins, losses, point differentials, etc.), and home-field advantage for almost 10,000 games dating back to 2007, I leveraged DataRobot and its automated machine learning capabilities to determine the likelihood that the favorite team (according to Vegas) would win for some of the remaining bowl games.

With the help of DataRobot’s R API to generate hundreds of candidate models, I chose the most accurate model (pictured above) and made predictions. The table below presents each match-up and the corresponding win probabilities for the favorite, ranked from most likely to least likely to win according to DataRobot.

Favorite        DataRobot   ESPN FPI   Underdog
ULLafayette     0.843       0.811      MiamiOhio
Cincinnati      0.698       0.663      BostonCollege
Ohio            0.693       0.705      Nevada
#12 Auburn      0.692       0.721      #18 Minnesota
#13 Alabama     0.683       0.693      #14 Michigan
Tulane          0.682       0.632      SouthernMiss
#5 Georgia      0.677       0.710      #7 Baylor
#1 LSU          0.639       0.445      #3 Clemson
#8 Wisconsin    0.567       0.461      #6 Oregon
Tennessee       0.541       0.522      Indiana

ESPN and DataRobot largely agree: the Louisiana Ragin’ Cajuns should win big, and the Indiana Hoosiers should put up quite a fight against the Tennessee Volunteers. However, there are two games where ESPN and DataRobot predict opposite outcomes: LSU vs. Clemson and Wisconsin vs. Oregon. According to ESPN, Clemson and Oregon have the better shot at winning their respective bowl games. Given Clemson’s impressive come-from-behind win over Ohio State and Oregon’s dominant win over Utah in the Pac-12 Championship game, this seems reasonable.
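To make the comparison concrete, here is a minimal Python sketch that flags the games where the two forecasts pick different winners. The probabilities are copied from the table above. The 0.5 cutoff for "picking" a team and the helper names (`pick`, `find_disagreements`) are my own assumptions for illustration; they are not part of DataRobot or ESPN's FPI.

```python
# Win probabilities for the Vegas favorite, copied from the table in this post.
# Each row: (favorite, DataRobot probability, ESPN FPI probability, underdog)
games = [
    ("ULLafayette", 0.843, 0.811, "MiamiOhio"),
    ("Cincinnati", 0.698, 0.663, "BostonCollege"),
    ("Ohio", 0.693, 0.705, "Nevada"),
    ("#12 Auburn", 0.692, 0.721, "#18 Minnesota"),
    ("#13 Alabama", 0.683, 0.693, "#14 Michigan"),
    ("Tulane", 0.682, 0.632, "SouthernMiss"),
    ("#5 Georgia", 0.677, 0.710, "#7 Baylor"),
    ("#1 LSU", 0.639, 0.445, "#3 Clemson"),
    ("#8 Wisconsin", 0.567, 0.461, "#6 Oregon"),
    ("Tennessee", 0.541, 0.522, "Indiana"),
]


def pick(favorite, underdog, p_favorite):
    """Return the predicted winner, assuming a 0.5 cutoff (an illustrative choice)."""
    return favorite if p_favorite >= 0.5 else underdog


def find_disagreements(rows):
    """Return the games where the two forecasts pick different winners."""
    result = []
    for favorite, p_datarobot, p_espn, underdog in rows:
        datarobot_pick = pick(favorite, underdog, p_datarobot)
        espn_pick = pick(favorite, underdog, p_espn)
        if datarobot_pick != espn_pick:
            result.append((favorite, underdog, datarobot_pick, espn_pick))
    return result


disagreements = find_disagreements(games)
for favorite, underdog, datarobot_pick, espn_pick in disagreements:
    print(f"{favorite} vs. {underdog}: DataRobot picks {datarobot_pick}, "
          f"ESPN picks {espn_pick}")

# The game with the largest absolute gap between the two probabilities.
largest_gap = max(games, key=lambda g: abs(g[1] - g[2]))
print(f"Largest gap: {largest_gap[0]} vs. {largest_gap[3]}")
```

Run against the table, this flags only the LSU vs. Clemson and Wisconsin vs. Oregon games, and LSU vs. Clemson is also where the two probabilities are furthest apart.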

But what drove DataRobot to respond differently? Thanks to DataRobot’s interpretability suite, we can actually understand what the model values most when making a prediction. Below is a plot of feature impact, which tells us which variables are most important.

The first five factors all revolve around Vegas information (opening, current, and money lines), followed by who the team is, how well they play at home, and point differential throughout the season. According to the initial estimates from Vegas, LSU is a six-point favorite while Wisconsin is a three-point favorite. DataRobot uses this insight along with the other information to produce a win probability… one that’s different from ESPN’s.
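For readers less familiar with money lines, the sketch below shows the standard arithmetic for turning an American money line into an implied win probability. This is only meant to illustrate why Vegas lines carry so much signal; it is not how the DataRobot model uses these features, since the model learns its own relationship from the raw lines. The odds in the example are hypothetical and are not the actual lines for any game mentioned here.

```python
def implied_probability(moneyline):
    """Convert an American money line to an implied win probability.

    Negative lines (favorites) state how much must be staked to win 100;
    positive lines (underdogs) state how much is won on a stake of 100.
    The result still includes the bookmaker's margin.
    """
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def remove_margin(p_favorite, p_underdog):
    """Rescale two implied probabilities so that they sum to 1."""
    total = p_favorite + p_underdog
    return p_favorite / total, p_underdog / total


# Hypothetical odds for illustration only.
favorite_line, underdog_line = -200, 170

p_fav = implied_probability(favorite_line)   # 200 / 300
p_dog = implied_probability(underdog_line)   # 100 / 270
fair_fav, fair_dog = remove_margin(p_fav, p_dog)

print(f"Raw implied: favorite {p_fav:.3f}, underdog {p_dog:.3f}")
print(f"Margin removed: favorite {fair_fav:.3f}, underdog {fair_dog:.3f}")
```

The raw implied probabilities add up to more than 1 because of the bookmaker's margin, which is why the second step rescales them before they are read as win probabilities.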

Something to keep in mind is that there’s a difference between who should win and who will actually win. Because so much can happen within a game, even teams like UL Lafayette could be upset, despite having a greater than 80% chance of winning. I know I’ll be tuning into these games, especially to watch Alabama beat Michigan (Roll Tide!).
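A quick back-of-the-envelope calculation shows how likely at least one upset is across the ten games in the table. The sketch below uses the DataRobot probabilities and assumes the games are independent of one another, which is a simplifying assumption of mine rather than something the model guarantees.

```python
import math

# DataRobot win probabilities for the ten favorites, copied from the table.
favorite_win_probs = [
    0.843, 0.698, 0.693, 0.692, 0.683,
    0.682, 0.677, 0.639, 0.567, 0.541,
]

# Expected number of favorites that win: the sum of the probabilities.
expected_favorite_wins = sum(favorite_win_probs)

# Probability that every favorite wins, ASSUMING independent games.
p_all_favorites_win = math.prod(favorite_win_probs)

# Probability that at least one underdog pulls off an upset.
p_at_least_one_upset = 1 - p_all_favorites_win

print(f"Expected favorite wins: {expected_favorite_wins:.2f} of 10")
print(f"P(all favorites win): {p_all_favorites_win:.4f}")
print(f"P(at least one upset): {p_at_least_one_upset:.4f}")
```

Under these assumptions the model expects roughly 6.7 of the 10 favorites to win, and the chance of every favorite winning is under 2%. In other words, a few upsets are what the model itself predicts, not a sign that it is wrong.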

 


 

About the Author:

Taylor Larkin is a data scientist at DataRobot. Based out of Atlanta, he's currently responsible for executing data science projects as well as enabling customers to do data science work. He has worked on machine learning projects and research articles in a variety of realms including geomagnetic storm prediction, healthcare, renewable energy, sports analytics, and wine preference. Prior to joining DataRobot, Taylor graduated from The University of Alabama with a PhD in Business Analytics and an MS in Applied Statistics.