This blog post is meant to be a fun and unique take on predicting the Outstanding Drama Series and Outstanding Comedy Series winners at the 2019 Primetime Emmy Awards.
The 71st Primetime Emmy Awards ceremony is taking place on Sunday, September 22nd in Los Angeles. This awards ceremony celebrates excellence within various areas of television and emerging media. This year’s nominations are led by several fan favorites including Game of Thrones, When They See Us, The Marvelous Mrs. Maisel, and more. We’ve tackled interesting and fun use cases before (ex: the Grammy 2019 “Song of the Year” prediction and Wimbledon 2019 prediction blog posts) and are excited to apply these methods to the Emmy Awards as well. The goal is to predict which nominees are most likely to win in the Outstanding Drama Series and Outstanding Comedy Series categories.
How We Made the Predictions
To predict which show is most likely to win Outstanding Drama Series and Outstanding Comedy Series, I pulled historical data from the OMDb API (The Open Movie Database) on the past nominees in both categories dating back to 1966.
Some of the fields I collected for each nomination include:
- The network the show aired on
- Main Actors
- Latest IMDB Rating
- Previous Nominations & Previous Wins
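As a rough illustration of the collection step, the sketch below builds an OMDb lookup URL for a show title and pulls a few of the fields listed above out of a response. The `t`, `type`, and `apikey` query parameters and the `Writer`, `Actors`, and `imdbRating` response keys follow OMDb's public documentation. The helper names and the sample record are hypothetical, and network and nomination history are left out on the assumption that they would be assembled separately. This is not the code used for this post.

```python
from urllib.parse import urlencode

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_url(title, api_key):
    """Build a by-title lookup URL restricted to TV series."""
    query = urlencode({"t": title, "type": "series", "apikey": api_key})
    return f"{OMDB_URL}?{query}"


def extract_features(record):
    """Pull modeling fields out of one OMDb JSON record (already parsed to a dict)."""

    def clean(value):
        # OMDb uses the string "N/A" for missing values.
        return None if value in (None, "N/A") else value

    rating = clean(record.get("imdbRating"))
    actors = clean(record.get("Actors"))
    return {
        "title": record.get("Title"),
        "writer": clean(record.get("Writer")),
        "main_actors": actors.split(", ") if actors else [],
        "latest_imdb_rating": float(rating) if rating else None,
    }


# Hand-written sample record for illustration; not real API output.
sample = {
    "Title": "Example Show",
    "Writer": "N/A",
    "Actors": "Actor One, Actor Two",
    "imdbRating": "8.4",
}
features = extract_features(sample)
url = build_omdb_url("Example Show", "YOUR_KEY")
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `extract_features`, once per nominated show.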
A major step in DataRobot’s automated model building process is performing a univariate analysis on each field or “feature”, with the target being whether or not the show won in the year it was nominated. For example, this analysis suggests that the feature “has_won_before” is likely to be important in helping us determine whether a particular show will win.
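To make the idea concrete, here is a minimal sketch of a univariate check: it compares the win rate across the values of a single feature. The function name and the six rows of data are made up for illustration, and this is a simplified stand-in rather than what DataRobot computes internally.

```python
from collections import defaultdict


def win_rate_by_value(rows, feature, target="won"):
    """Return {feature value: share of rows where the target is true}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [wins, total]
    for row in rows:
        tally = counts[row[feature]]
        tally[0] += int(row[target])
        tally[1] += 1
    return {value: wins / total for value, (wins, total) in counts.items()}


# Made-up nominations, purely for illustration.
nominations = [
    {"has_won_before": True, "won": True},
    {"has_won_before": True, "won": False},
    {"has_won_before": False, "won": False},
    {"has_won_before": False, "won": False},
    {"has_won_before": False, "won": True},
    {"has_won_before": False, "won": False},
]
rates = win_rate_by_value(nominations, "has_won_before")
```

A large gap between the groups' win rates is a hint that the feature carries signal on its own, before any model is trained.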
To decide which modeling approach to use, I leveraged DataRobot's out-of-time validation framework. This allows us to see which models are most robust through time, pitting them against one another to see which predict future Emmy winners the best. After setting this up and pressing the start button, DataRobot automatically generated and trained 48 different modeling approaches, which finished in less than 10 minutes.
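The core idea behind out-of-time validation can be sketched in a few lines: each fold trains only on ceremonies before a given year and is scored on that year's nominees, so a model never sees the future it is asked to predict. The function name, the `year` field, and the placeholder rows below are assumptions for illustration. DataRobot's own partitioning options are richer than this.

```python
def rolling_backtests(rows, first_test_year, last_test_year, year_key="year"):
    """Yield (test_year, train, test): train on all earlier years, test on one year."""
    for test_year in range(first_test_year, last_test_year + 1):
        train = [r for r in rows if r[year_key] < test_year]
        test = [r for r in rows if r[year_key] == test_year]
        yield test_year, train, test


# Placeholder rows: two nominees per year from 2014 through 2018.
rows = [
    {"year": year, "show": f"show_{year}_{i}"}
    for year in range(2014, 2019)
    for i in range(2)
]
folds = list(rolling_backtests(rows, 2017, 2018))
```

Comparing models on their average score across such folds favors approaches that hold up over time rather than ones that happen to fit a single year well.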
For each blueprint in DataRobot, we can see what features ended up being most impactful in making our predictions. When we go to rank this group of nominees, variables such as Writer, Latest_IMDB_Rating, Has_Won_Before, Network, etc. will be the most important in determining the outcome for this year’s Emmy nominations.
Understanding a DataRobot Model
Now that DataRobot has produced a leaderboard of models for me to choose from, I typically want to know how and why these models are coming up with their predictions. DataRobot provides various tools for interpreting models, and the two insights I found most interesting for this particular project were the Partial Dependence plot and the Word Cloud.
Looking at the Partial Dependence plot, we can see how the chance of winning varies depending on the network a show aired on. Shows on CBS and ABC have historically been less likely to win in these two categories, while shows on TNT, USA, PBS, and Netflix have had a higher chance of winning.
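For readers curious how a partial dependence value is computed, the sketch below forces one feature to each candidate value in every row, leaves the other features unchanged, and averages the model's predictions. The `toy_predict` function is a hypothetical stand-in for a fitted model, and the network names and numbers are invented, so the output says nothing about real networks.

```python
def partial_dependence(predict, rows, feature, values):
    """Average prediction when `feature` is forced to each candidate value."""
    result = {}
    for value in values:
        preds = [predict({**row, feature: value}) for row in rows]
        result[value] = sum(preds) / len(preds)
    return result


# Hypothetical stand-in for a fitted model; the numbers are invented.
def toy_predict(row):
    base = 0.10 + (0.20 if row["has_won_before"] else 0.0)
    bump = {"NetworkA": 0.05, "NetworkB": -0.05}.get(row["network"], 0.0)
    return base + bump


rows = [
    {"network": "NetworkA", "has_won_before": True},
    {"network": "NetworkB", "has_won_before": False},
]
pd_network = partial_dependence(
    toy_predict, rows, "network", ["NetworkA", "NetworkB"]
)
```

Because every other feature keeps its observed value, the difference between two networks' averages reflects only the effect the model attributes to the network itself.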
For all free-form text features, DataRobot creates a text model that can be used to determine whether certain words are predictive. In this case, we saw that the writer of a show was the most impactful feature in determining the likelihood of winning an Emmy. By looking at the word cloud for this particular feature, we can see that having writers such as James Brooks and Aaron Sorkin (names in dark red) increases your chances of taking home an Emmy. Larry David (whose name appears in dark blue), on the other hand, has not been so lucky.
One thing to note here is that the show and the writer are attached at the hip, so if a writer has a show that wins multiple years in a row, that writer will be seen as a good indicator. But in reality, the writer may simply have been lucky and made one good show. In Larry David’s case, he has written multiple shows that have been nominated but has only won once, in 1993 for Seinfeld. Does that mean he’s a bad writer? This is why with model building it is always good to look at multiple models and compare how different features produce different outcomes.
You might also wonder why “nan” is so front and center. That just means that no writer was recorded for some shows back in the mid-1900s, so DataRobot automatically filled in the blanks with “nan”, and somebody had to win.
And the winners are…
Game of Thrones is most likely to win the Outstanding Drama Series category:
|Game of Thrones||1|
|This Is Us||3|
|Better Call Saul||4|
Veep is most likely to win the Outstanding Comedy Series category:
|The Marvelous Mrs. Maisel||2|
|The Good Place||4|
With this blog post, we’re demonstrating that machine learning can not only be fun but can also have applications well beyond the traditional ones we are used to seeing in fields such as banking or insurance.
About the Author:
Miles Adkins is a Customer-Facing Data Scientist at DataRobot and initially joined as part of the Applied Data Science Associate program. Miles has five years of experience in quantitative investment management and holds a Bachelor of Science in Finance from Illinois State University.