With the American football season fast upon us, there are plenty of folks here at DataRobot who are busy gathering historical football stats and brushing up on their models in anticipation of fantasy football. In this post, we asked four of our hardcore fantasy experts to share how they geek out by applying data science to football. Their approaches cover both NCAA (college) and NFL (professional) football.
Predicting Player Performance
Ben Miller (Customer Facing Data Science)
Ben is a data scientist and a former professional basketball player. Ben has written on the topic of evaluating player performance in the NBA (see his blog post here). And, when it comes to football, Ben takes a similar approach:
It all begins with gathering historical football data. Ben uses fantasydata NFL as one source of historical player statistics. If he is playing DraftKings, he would use the data from fantasydata NFL to predict the number of DraftKings points each player will score by setting that as the target in his DataRobot project. This idea here is to build a predictive model that understands the relationships between our historical statistics and DraftKings fantasy points.
Once you have your historical dataset and prediction target (DraftKings fantasy points), lean on DataRobot’s modeling engine, which automatically engineers hundreds of features that are derived from the historical data. The features created include a set of simple historical lags like means, standard deviations, medians, minimums, and maximums. These features are created over many different time periods for each statistic.
The ability to make predictions for the future is how you can utilize the top model to help predict how players will perform next week.
DataRobot also automatically engineers more complicated time series features, like Bollinger bands and rolling entropy. DataRobot will then build and test many models and sort each model by predictive accuracy using out-of-sample validation data. Once modeling is complete, each model is immediately available to make predictions on new data. The ability to make predictions for the future is how you can utilize the top model to help predict how players will perform next week.
Ben uses a little secret sauce when he builds models by adding text data. DataRobot includes a text mining engine, which automatically extracts predictive information from unstructured text. Adding some player fantasy news to your data will add valuable information. For example, if your QB is “nursing a grade 3 hamstring pull,” the text mining engine will extract that information and regress it against the target. If this text is associated with degraded performance, DataRobot’s models will utilize the information to downgrade the prediction for points scored.
Finally, Ben also includes other sophisticated data in the form of Las Vegas and Fantasy Sports site predictions. These types of data rely on highly sophisticated analytics and other modeling techniques. Adding this data to your model will essentially “ensemble” their models with your model and create highly accurate models. Try experimenting with these and other data to quickly see if your models improve.
Picking College Football Winners
Taylor Larkin (Data Science Evangelist)
Taylor is a data scientist with DataRobot University. Having graduated from The University of Alabama (Roll Tide!), Taylor is a huge college football fan, and enjoys rooting for his alma mater as well as participating in fantasy competitions like ESPN College Pick’em.
The idea for College Pick’em is pretty simple: each week you’re given 10 college football games and you pick who you think is going to win. And, depending on the format, you may need to rank how confident you are about your decision. As the weeks progress, you earn points based on how many games you picked correctly and at what confidence level.
The trick to gaining an advantage is to mitigate your risk better than everyone else.
Using publicly available information about the point spread for a given game, Taylor manually builds models to determine how to best rank these games in a risk-averse way. With the opportunity to use DataRobot this past year, he turned his more manual process into an automated machine learning workflow. Over the weeks during last year’s competition, Taylor tried out several different strategies for predicting how the games should be ranked using DataRobot. He found the most success in using DataRobot’s frequency-severity models.
Typically used for insurance pricing, these models operate in two stages, estimating both operational risk and loss in the same model. For Pick'ems, he used these models to (1) predict the likelihood that a team will win (2) given their chance to win, by how many points will they win by? This output signified a predicted number of points a team would win by, weighted by that team’s chance to win a particular game.
Below is an entry that Taylor made based purely on ranked predictions from DataRobot, which placed in the 99th percentile.
Taylor wanted to see if he could improve his predictions by using his domain knowledge about college football. By using his understanding of the game, Taylor was able to boost his performance into the 100th percentile, proving just how important subject matter expertise is in data science. A simple example of this is when Taylor saw Alabama ranked by DataRobot as a 7 point game, his domain knowledge (or some would say blind allegiance to Alabama) led him to change it to a 10 point game. With his added insights, Taylor improved his picks to the 100th percentile.
While predicting the outcomes in sports is challenging due to the amount of variance, it does not mean that it’s a frivolous venture. Whether you’re competing against your friends or against Vegas, we all experience the same uncertainty when watching a game. The trick to gaining an advantage is to mitigate your risk better than everyone else.
Identifying Historical Trends in Data
Gareth Goh (Customer Marketing Manager)
While Taylor is busy analyzing data, Gareth -- a Customer Marketing Manager by day and Fantasy Football semi-professional by night -- takes another approach, stepping back in time to consider historical trends. As we have seen with Taylor, being able to add your own insights can give you an edge.
American football is an exercise in variance-filled small sample size theater. An NFL season consists of just 16 regular season games (compared to 82 in the NBA and 162 in the MLB), and a typical NFL team will run only about 70 offensive plays per game on average (compared to over 100 possessions per NBA team). Coupled with the generally random and unpredictable nature of the sport, it’s difficult to get enough of a sample size in order to draw conclusive results.
Yet, despite the relatively small samples of data, analyzing historical data can reveal some interesting trends and findings that can be applied by savvy fantasy football players to gain an edge over the rest of their league. The key is to take your findings with a grain of salt; are there enough data points in a particular sample size to separate the signal from the noise?
Take your findings with a grain of salt, throw out any data points that might unfairly skew your trend, and go beyond the surface level analysis to dive deeper, and you just might get the edge you need to win your fantasy football league.
For example, professional athletes across all sports tend to perform better with home field advantage. Whether it’s due to the support of the home fans, the comforts of routine, or avoiding the physical toil of travel, there is enough data to suggest that players at home will play at a higher level than they do on the road.
Ben Roethlisberger, the quarterback of the Pittsburgh Steelers, takes home/road splits to the extreme. Over the past four NFL seasons, Roethlisberger has played 28 home games and averaged 26.2 fantasy points per game. During that same time period, he has played 32 road games and averaged only 15.69 fantasy points per game. This is a massive difference, one that has born out over a large enough sample size to represent statistical significance. On weeks when the Steelers are heading on the road, you might want to sub in your backup QB for big Ben.
The team that owns Ben Roethlisberger in your fantasy league will be interested in this little nugget, but other owners can benefit as well. It’s always tough to bench your fantasy studs, but if you have a better option, it might be wise to sit wide receiver Antonio Brown too. After all, if Ben is struggling to throw well, Antonio might find fewer opportunities to accrue receptions and touchdowns. If you own a defense that is facing off against the Steelers on the road, that might be a great opportunity as Roethlisberger tends to throw more interceptions in road games.
Just because sample sizes are small, doesn’t mean they can’t still be revelatory. Take your findings with a grain of salt, throw out any data points that might unfairly skew your trend, and go beyond the surface level analysis to dive deeper, and you just might get the edge you need to win your fantasy football league. The way to really take advantage of these historical trends is to then combine your own insights with those created by models. Once you start using models, then you can find these trends across multiple players by using insights like Feature Impact to quantify what are the most useful statistics.
Better Manage Your Fantasy Football Team
Rajiv Shah (Data Scientist)
Rajiv is a customer facing data scientist who works directly with DataRobot’s customers to make them successful. In the past, Rajiv has used his data science knowledge and techniques to help people better manage their fantasy football teams. Many people participate in fantasy football competitions on sites like FanDuel where they have to put together the best performing team using a salary cap. While you can use analytics to assess player performance as Ben does, you can also use analytics to optimize the best team for a given salary cap. This type of optimization problem is known as the “knapsack problem” or an assignment problem.
As a starting point, each football player has a price and there is a salary cap limit. The challenge is to optimize your team to produce the highest total points while staying within a salary cap limit. A simple optimization strategy will put the highest overall number of points within a salary cap on a team. A slightly sophisticated approach will start to consider a strategy called stacking, where you ensure your QB and WR are on the same team.
There are even some cutting edge optimization approaches that try and predict the player composition of your fantasy opponent in a head-to-head matchup league. After all, there are some players that are much more popular. Using this knowledge, you can predict the likely teams that will oppose your team. Here are some links to more info.
The Final Whistle
There are many factors to consider when integrating advanced analytics analytics with your fantasy football efforts, and we hope that this blog gives you insights into how to choose a winning lineup for your fantasy team, or to pick the teams that will win with confidence. Building predictive models and combining them with your own knowledge provides an edge over your competitors. Playing fantasy sports is fun, but winning makes it even better!
About the Author:
Rajiv Shah is a data scientist at DataRobot, where he works with customers to make and implement predictions. Previously, Rajiv has been part of data science teams at Caterpillar and State Farm. He enjoys data science and spends time mentoring data scientists, speaking at events, and having fun with blog posts. He has a PhD from the University of Illinois at Urbana Champaign.