The 2019 NCAA March Madness tournament has arrived! This is one of the most famous annual sports events in the United States, bringing together the best Division I men’s and women’s college basketball teams from 68 schools to compete against each other for the NCAA Champion title.
Perhaps the most exciting aspect of March Madness is being able to build your very own tournament bracket to see how well you can predict every game of the tournament. Whether you’re building a bracket for fun or competing for a prize, this ritual brings fans closer to the game, while also adding an element of risk and excitement as each match-up comes and goes. Families, friends, and coworkers gather to compete against each other while rooting for their top team to become the next NCAA Champion.
So, how do you build a bracket? Brackets are constructed based on team rankings, personal loyalties and, of course, data. This tournament was created in 1939 by the National Association of Basketball Coaches, which means that there’s plenty of data to dive into when constructing a bracket. Using this wealth of information, we can identify the teams most likely to win the tournament. For example, in the Men’s Tournament, Michigan State and Tennessee both have higher probabilities of winning as No. 2 seeds than North Carolina does as a No. 1 seed.
Who Will Win in 2019?
I am a huge sports fan and have built my own March Madness brackets over the years. I built a historical dataset of team ratings, betting odds, and tournament seeds and put it into the DataRobot platform to determine the best model for predicting tournament outcomes based on this information. I then used this model to rank the 68 competing teams by their likelihood to win the championship title, and I also built a full bracket for the top team.
Below is a table that ranks the top 16 men’s tournament teams starting with the team that’s most likely to win the 2019 NCAA Championship, which is Duke with a 24.5% chance of winning.
Below is a table that ranks the top 16 women’s tournament teams starting with the team that’s most likely to win the 2019 NCAA Championship, which is Baylor with a 28.3% chance of winning.
Data Collection and Prediction Methodology
Based on the understanding that more data can lead to better predictions, I collected data for my bracket from several different sources: the tournament seeds, the betting line for each game, Ken Pomeroy’s ratings, regular season box scores, and previous tournament matchups and outcomes.
For the predictions, I used the regular season data to build my own power ratings for every team. I then used the historic tournament data to build a logistic regression model with four inputs:
- The difference in the teams’ seeds
- The betting line for the game
- The difference in the teams’ ratings from Ken Pomeroy
- The difference in the teams’ power ratings from my own analysis
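To make the shape of that model concrete, here is a minimal Python sketch of how a logistic regression turns the four inputs above into a win probability. The coefficient values, the feature names, and the `win_probability` helper are all hypothetical, chosen only for illustration; the fitted values from my model are not reproduced here.

```python
import math

# Hypothetical coefficients for illustration only (not the fitted values).
# Every feature is measured as (team A) minus (team B).
COEFS = {
    "seed_diff": -0.05,     # a lower seed number is better, so the sign is negative
    "betting_line": -0.10,  # point spread from team A's side; negative means A is favored
    "kenpom_diff": 0.03,    # gap in Ken Pomeroy ratings
    "power_diff": 0.03,     # gap in the home-grown power ratings
}
INTERCEPT = 0.0


def win_probability(features, coefs=COEFS, intercept=INTERCEPT):
    """Return P(team A beats team B) under a logistic regression model."""
    # Linear score: intercept plus the weighted sum of the four differences.
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    # The logistic (sigmoid) function maps the score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # A made-up matchup between a heavy favorite (team A) and an underdog.
    matchup = {
        "seed_diff": -15,
        "betting_line": -20,
        "kenpom_diff": 30,
        "power_diff": 28,
    }
    print(f"P(team A wins) = {win_probability(matchup):.3f}")
```

With an intercept of zero, swapping the two teams flips the sign of every feature and the two probabilities sum to one, which is a useful sanity check for any matchup model of this kind.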
For the men’s tournament, I constructed a full bracket using a Monte Carlo simulation based on the logistic regression model. During this process, I simulated 10,000 possible tournament outcomes and came up with Duke as the overall most likely winner.
For the women’s tournament, I constructed a full bracket using a Monte Carlo simulation based on the logistic regression model. During this process, I simulated 10,000 possible tournament outcomes and came up with Baylor as the overall most likely winner.
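The simulation step can be sketched in a few lines of Python. This is a simplified illustration, not my actual pipeline: it assumes a hypothetical eight-team bracket with made-up ratings, uses a stand-in `win_prob` function in place of the fitted logistic regression, and requires a power-of-two field, so it ignores play-in games.

```python
import math
import random
from collections import Counter

# Hypothetical teams and ratings, listed in bracket order
# (adjacent teams meet in the first round).
BRACKET = ["A", "B", "C", "D", "E", "F", "G", "H"]
RATINGS = {"A": 30, "B": 10, "C": 12, "D": 8, "E": 11, "F": 9, "G": 12, "H": 7}


def win_prob(rating_a, rating_b, scale=0.15):
    """Stand-in for the fitted model: logistic function of the rating gap."""
    return 1.0 / (1.0 + math.exp(-scale * (rating_a - rating_b)))


def simulate_tournament(bracket, ratings, rng):
    """Play one single-elimination tournament and return the champion."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        # Pair up neighbors: positions 0 vs 1, 2 vs 3, and so on.
        for a, b in zip(alive[0::2], alive[1::2]):
            a_wins = rng.random() < win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        alive = next_round
    return alive[0]


def title_odds(bracket, ratings, n_sims=10_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    wins = Counter(
        simulate_tournament(bracket, ratings, rng) for _ in range(n_sims)
    )
    return {team: wins[team] / n_sims for team in bracket}


if __name__ == "__main__":
    odds = title_odds(BRACKET, RATINGS)
    for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.1%}")
```

Each simulated tournament draws a random winner for every game according to the model's probability, and the fraction of simulations a team wins is its estimated chance at the title.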
The Unlikely Upset
Upsets are hard to predict. For example, last year Virginia became the first No. 1 seed in the men’s tournament to lose a first-round game. However, it would be unreasonable to predict that Virginia will lose in the first round again this year: there’s only a 1% chance of this happening again! Instead of picking upsets, we trusted the data and our model. Most statistical models for forecasting the tournament will also tend to predict few upsets.
Now that the initial rush of creating my bracket is over, I can sit back and watch the tournament unfold (face inches from the TV while biting my nails). Good luck to all the teams and brackets this year! (We’re looking at you, Duke and Baylor).
Interested in learning more about sports analytics and the NBA? Here are some additional resources:
- Download this on-demand webinar: Automated Machine Learning: A Game-Changer for Sports Analytics
- Read this blog: Using Automated Machine Learning to Predict NBA Player Performance
- Read this blog: 4 Insights in 4 Minutes about Kobe Bryant’s Career
About the Author:
Zachary Deane-Mayer is the Director of Data Science at DataRobot, where he runs the Core Modeling Team that’s responsible for all of DataRobot’s algorithms and meta-algorithms. Zach studied Ecology at Dartmouth College and has been doing data science for over a decade. Zach is passionate about his 10-month-old daughter, data-driven decision making, and automating boring tasks that no one wants to do.