Like many people around the world, I enjoy watching the Olympics. And some of the most memorable moments involve intense rivalries driving competitors to set new world records.
“Competition is always a good thing. It forces us to do our best. A monopoly renders people complacent and satisfied with mediocrity.” Nancy Pearcey (American author)
In the data science community there’s often a lot of hype surrounding the latest algorithm, whether it be “deep learning” or a “decision jungle.” All of this hype and attention can trigger human bias without anyone noticing. The people building AI solutions are human, and just like everyone else, they are subject to biases such as the availability heuristic and the bandwagon effect. On top of that, companies typically want to roll out projects quickly, which leads to tight timeframes, and all too often people end up trying only one algorithm.
The lack of diversity in model building usually leads to substandard results. A recent benchmarking exercise on a wide range of business use cases concluded “The diversity of algorithms earning top accuracy rankings demonstrates the need to test as many different algorithms as possible to find the best one for your data.”
You may have your favorites, but step aside and let competition choose the best algorithm for your needs, just as the Olympics bring the best athletes in the world together to compete for gold in various sporting events. Your own “Algorithm Olympics” should look like this:
Heats: Start with a short list of 10 to 40 algorithms that look the most promising for your specific needs. Train them on a subset of your data, say one in every six rows. Don’t waste time training every algorithm on all of your data, because some of those algorithms will quickly reveal themselves to be more promising than others.
Semi-Finals: Take the algorithms that ranked in the top half of the heats and give them the chance to improve on their performances by training them on twice as much data. This will help weed out the ones that were just lucky the first time through.
Finals: At this point, all that you are left with is the best of the best. Let these top algorithms retrain on twice as much data as they had for the semi-finals. This is their final chance to prove their worth, improve on their semi-final performances, and earn the top ranking.
In this way you are comparing the algorithms like for like: in every round, each competitor gets the same amount of data. It is a fair and efficient way to find the best algorithm to power the artificial intelligence for your business. You probably already follow an interview process just like this when you hire new staff!
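The three rounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product implements it: the five toy "algorithms", the synthetic data, the helper names (`algorithm_olympics`, `holdout_error`, `make_rows`), and the choice to send the top half of the semi-finalists to the finals are all made up for the example. Every competitor in a round trains on the same rows and is scored on the same holdout rows.

```python
import random
import statistics

# Toy "algorithms" (assumptions for this sketch). Each fitter takes
# training rows of (feature, target) and returns a predict function.
def fit_mean(train):
    guess = statistics.fmean(y for _, y in train)
    return lambda x: guess

def fit_median(train):
    guess = statistics.median(y for _, y in train)
    return lambda x: guess

def fit_line(train):
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_nearest(k):
    def fit(train):
        def predict(x):
            closest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
            return statistics.fmean(y for _, y in closest)
        return predict
    return fit

CANDIDATES = {
    "mean": fit_mean,
    "median": fit_median,
    "line": fit_line,
    "nearest_1": fit_nearest(1),
    "nearest_5": fit_nearest(5),
}

def holdout_error(model, holdout):
    """Mean absolute error on the holdout rows (lower is better)."""
    return statistics.fmean(abs(model(x) - y) for x, y in holdout)

def algorithm_olympics(candidates, train, holdout, heat_fraction=1 / 6):
    """Run heats, semi-finals, and finals; return one ranked list per round."""
    survivors = list(candidates)
    n_rows = max(2, round(len(train) * heat_fraction))
    rounds = ("heats", "semi-finals", "finals")
    results = []
    for i, round_name in enumerate(rounds):
        sample = train[:n_rows]  # assumes the rows are already shuffled
        errors = {name: holdout_error(candidates[name](sample), holdout)
                  for name in survivors}
        ranked = sorted(survivors, key=errors.get)
        results.append((round_name, n_rows,
                        [(name, errors[name]) for name in ranked]))
        if i < len(rounds) - 1:
            survivors = ranked[:(len(ranked) + 1) // 2]  # keep the top half
            n_rows = min(len(train), n_rows * 2)         # double the data
    return results

def make_rows(n, rng):
    """Synthetic data for the demo: a noisy straight line."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)) for x in xs]

rng = random.Random(0)
rows = make_rows(800, rng)
for round_name, n_rows, ranking in algorithm_olympics(
        CANDIDATES, rows[:600], rows[600:]):
    print(f"{round_name} ({n_rows} training rows)")
    for name, error in ranking:
        print(f"  {name:<10} holdout error {error:.3f}")
```

With 600 training rows, the heats use 100 rows (one in six), the semi-finals 200, and the finals 400, so most of the training time is spent on the algorithms that have already shown promise.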
So, why isn’t this always the process? In the past, it was complex manual work to assemble and train machine learning algorithms. There simply wasn’t enough time to try out so many different algorithms. Then, there were the system compatibility problems, as many of the relevant open source machine learning libraries have different system requirements. It was all too much trouble!
This is where automated machine learning steps in as a game-changer. Instead of requiring users to have deep knowledge of algorithms and manual coding, automated machine learning will do all of the following for you:
choose a short list of 10 to 40 machine learning algorithms that look promising,
choose only algorithms that make sense for your data,
train dozens of algorithms, through heats, semi-finals, and finals,
compare the results in a competition leaderboard and rank the best algorithm for your needs, and
compare algorithms for both accuracy and speed.
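As a rough illustration of the last two points, a leaderboard only needs to record an error and a training time for each competitor and then sort. The sketch below is an assumption-laden toy, not any product's API: `Entry`, `build_leaderboard`, and the two tiny models are hypothetical names invented for the example, and ties on error are broken by speed.

```python
import time
from typing import Callable, Dict, List, NamedTuple

class Entry(NamedTuple):
    name: str
    error: float    # holdout error, lower is better
    seconds: float  # wall-clock time to train and score

def build_leaderboard(candidates: Dict[str, Callable[[], float]]) -> List[Entry]:
    """Time each candidate, then rank by error, breaking ties by speed."""
    entries = []
    for name, train_and_score in candidates.items():
        start = time.perf_counter()
        error = train_and_score()
        entries.append(Entry(name, error, time.perf_counter() - start))
    return sorted(entries, key=lambda entry: (entry.error, entry.seconds))

# Toy data for the demo: y = 2x, split into training and holdout rows.
points = [(x, 2.0 * x) for x in range(1, 21)]
train, holdout = points[::2], points[1::2]

def mean_model() -> float:
    # Always predicts the average training target.
    guess = sum(y for _, y in train) / len(train)
    return sum(abs(guess - y) for _, y in holdout) / len(holdout)

def slope_model() -> float:
    # Fits a line through the origin by least squares.
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return sum(abs(slope * x - y) for x, y in holdout) / len(holdout)

board = build_leaderboard({"mean": mean_model, "slope": slope_model})
for rank, entry in enumerate(board, start=1):
    print(f"{rank}. {entry.name:<6} error={entry.error:.2f} time={entry.seconds:.6f}s")
```

Sorting on the pair (error, seconds) is one simple design choice; a real leaderboard might instead show both columns and let the user decide how much accuracy to trade for speed.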
Automated Machine Learning
Running head-to-head model competitions is one of the ten key steps in automated machine learning. With automated machine learning, an expert system automates the data science workflow for you. DataRobot is the pioneer of automated machine learning, and it not only provides head-to-head model competitions but also automates all ten steps of building models.
About the Author:
Colin Priest is the Director of Product Marketing for DataRobot, where he advises businesses on how to build business cases and successfully manage data science projects. Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. Colin is a firm believer in data-based decision making and applying automation to improve customer experience. He is passionate about the science of healthcare and does pro-bono work to support cancer research.