This guest post from our partners at Datatechnology is a fun, unique take on predicting which of Disney's summer 2019 movies will be the most popular. It also showcases the Qlik and DataRobot integration.
Summer is almost here, and the school year is winding down. On days when the heat is just too much, the movie theater is the place to be. Summer movies bring in millions of viewers, a lot of money, and plenty of data. So what makes a movie a blockbuster hit? And for the summer of 2019, which film will come out on top? In this blog post, I use movie data to predict the success of three highly anticipated Disney films: Toy Story 4, Aladdin, and The Lion King.
All three films build on Disney classics, so the race for the number one spot will be a close one. Toy Story 4 is the newest installment in an already famous and successful franchise: all three previous Toy Story movies made the top 20 list of Pixar's most successful movies at the box office (Business Insider). The other two contenders are live-action remakes of two classics. Aladdin and The Lion King feature some of today's biggest stars (e.g., Will Smith as the Genie in Aladdin and Beyoncé as Nala in The Lion King), which will surely boost their popularity. Disney is bringing back major classics in new ways, so these films are bound to draw huge crowds of fans, old and new.
I started this project by extracting, cleaning, visualizing, and modeling the data in Qlik. I dove into four data sources (IMDb, TMDb, MovieLens, and YouTube) for information such as film revenue, budget, popularity, trailer views, ratings, and reviews. This became the training data for my model.
The following screenshots show the visualizations currently available in the application. The goal was to visualize the data so that we could better understand patterns and correlations among different features.
Discover your dataset: the screenshots below show the data collected in Qlik for this movie prediction project, covering information such as film crew, genre, actors, and release year.
Next, I used the Qlik connector to upload the data to DataRobot. It let me seamlessly and securely select the movie data I had collected in Qlik Sense® and send it to DataRobot directly from the application, without any manual exports.
DataRobot then automatically built more than 50 different models, and I selected the highest-performing one. With the model selected, I used DataRobot's interpretability features to better understand the predictions coming out of it. In the figure below, the prediction explanations cover the movies with the highest and lowest box office earnings in our training data. Some patterns stand out, with budget and vote count being two especially influential factors. For example, the most important factor for several of the highest-earning movies was a budget of over $250 million. While a high budget doesn't guarantee a box office win, it is a very useful indicator.
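To make the "build many models, then pick the best one" step a little more concrete, here is a minimal Python sketch of leaderboard-style model selection: fit several candidate models on a training split, score each on a held-out split, and rank them by error. This is not DataRobot's implementation or API. The data is entirely synthetic, produced by a hypothetical `make_synthetic_movies` helper in which revenue is driven mostly by budget. The three toy candidates (`mean_baseline`, `budget_only`, `votes_only`) are hypothetical stand-ins for the dozens of models DataRobot builds.

```python
import random
import statistics


def make_synthetic_movies(n, seed=0):
    """Generate made-up (budget, vote_count, revenue) rows. Synthetic data only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        budget = rng.uniform(10, 300)    # hypothetical budget, millions USD
        votes = rng.uniform(100, 20000)  # hypothetical vote count
        # Assumed relationship: budget dominates, votes matter a little, plus noise
        revenue = 2.5 * budget + 0.01 * votes + rng.gauss(0, 20)
        rows.append((budget, votes, revenue))
    return rows


def fit_mean(train):
    """Baseline model: always predict the mean training revenue."""
    mu = statistics.fmean(r[2] for r in train)
    return lambda row: mu


def fit_single_feature(train, idx):
    """Ordinary least squares on a single feature (column idx)."""
    xs = [r[idx] for r in train]
    ys = [r[2] for r in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda row: my + slope * (row[idx] - mx)


def rmse(model, rows):
    """Root mean squared error of a model's revenue predictions."""
    return statistics.fmean((model(r) - r[2]) ** 2 for r in rows) ** 0.5


def leaderboard(rows, holdout_frac=0.2):
    """Fit every candidate on the training split, score on the holdout split."""
    cut = int(len(rows) * (1 - holdout_frac))
    train, holdout = rows[:cut], rows[cut:]  # rows are i.i.d., so a plain slice works
    candidates = {
        "mean_baseline": fit_mean(train),
        "budget_only": fit_single_feature(train, 0),
        "votes_only": fit_single_feature(train, 1),
    }
    scores = {name: rmse(model, holdout) for name, model in candidates.items()}
    # Lowest holdout error first: the top entry is the "highest-performing" model
    return sorted(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    for name, score in leaderboard(make_synthetic_movies(500)):
        print(f"{name:15s} holdout RMSE = {score:.1f}")
```

Ranking on held-out data rewards the model that generalizes best, not the one that fits its training rows most closely. Because budget was built into this synthetic data as the dominant driver of revenue, the budget-only model should land at the top of the leaderboard.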
And the results are in...
The Lion King has the greatest chance of being the most popular Disney movie of summer 2019! Toy Story 4 takes the second spot, and Aladdin comes in third.
Whether you're celebrating the circle of life, going on adventures with Buzz Lightyear, or freeing a genie from a golden lamp, these blockbusters are sure to make a huge splash this summer. It's going to be a close race. Which movie do you think will take the top spot?
Learn more about the DataRobot and Qlik partnership here.
About the Author
Suky Dhak is a senior enterprise solutions architect at Datatechnology with over 15 years of experience in integration technology and business intelligence. He has worked on more than 100 successful project implementations across a range of domains. Suky is skilled in project leadership, requirements analysis, agile methodologies, training, consulting, presales, and software evaluation.