I like reading, and I’m always looking for new books and new authors that match my preferred styles. But with a new book being published every 5 minutes, I can’t keep up. That’s where artificial intelligence comes in handy. I can ask my favorite online bookseller to recommend books that are likely to be good matches for me. This is done via an algorithm that tracks what I have read in the past and my reviews of the books I liked and disliked, and then searches through the millions of available books to show me a short list of recommendations that people like me are most likely to enjoy.
So, when I need artificial intelligence to automate my business, why can’t I get recommendations for which machine learning algorithm best suits my individual needs? Every business is unique, and there are hundreds of algorithms available, each one with individual strengths and weaknesses. Just like I don’t look at every individual book when choosing which one to read, I don’t have the time, resources, or knowledge to try out each and every algorithm. I want artificial intelligence to recommend a short list of algorithms for me to try on my data.
Below are the top 5 things an artificial intelligence should know when recommending algorithms for me.
1: Which Algorithms Won’t Work on My Data
Some algorithms specialize in classification (estimating the probability of an outcome, e.g. the probability that a borrower will default on a loan), while others specialize in regression (estimating an amount, e.g. how long before your flight arrives at your destination). Some algorithms require lots of data, while others don’t work on large volumes of data at all. Some algorithms specialize in unstructured data types like text, while others only support numeric data.
I want my artificial intelligence to look at the characteristics of my data and then automatically filter out algorithms that it knows won’t work on my data. Those unused algorithms can be saved for use on another day and a different dataset.
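As a rough illustration of that kind of filtering, here is a minimal Python sketch. The catalog, the algorithm names, and the row limits are all made up for the example; a real system would hold far richer metadata about each algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlgorithmSpec:
    """Hypothetical description of what one algorithm can handle."""
    name: str
    tasks: frozenset          # e.g. {"classification", "regression"}
    feature_types: frozenset  # e.g. {"numeric", "categorical", "text"}
    min_rows: int = 0
    max_rows: Optional[int] = None  # None means no upper limit


# Illustrative catalog; the names and limits are assumptions for this sketch.
CATALOG = [
    AlgorithmSpec("linear_model", frozenset({"classification", "regression"}),
                  frozenset({"numeric"})),
    AlgorithmSpec("text_classifier", frozenset({"classification"}),
                  frozenset({"numeric", "text"}), min_rows=1_000),
    AlgorithmSpec("small_data_kernel", frozenset({"classification", "regression"}),
                  frozenset({"numeric"}), max_rows=50_000),
    AlgorithmSpec("tree_ensemble", frozenset({"classification", "regression"}),
                  frozenset({"numeric", "categorical"}), min_rows=500),
]


def filter_algorithms(catalog, task, n_rows, feature_types):
    """Keep only the algorithms that can run on a dataset with these traits."""
    needed = frozenset(feature_types)
    kept = []
    for spec in catalog:
        if task not in spec.tasks:
            continue  # wrong kind of prediction
        if n_rows < spec.min_rows:
            continue  # not enough data for this algorithm
        if spec.max_rows is not None and n_rows > spec.max_rows:
            continue  # too much data for this algorithm
        if not needed <= spec.feature_types:
            continue  # dataset has a feature type the algorithm cannot read
        kept.append(spec.name)
    return kept


shortlist = filter_algorithms(CATALOG, "regression", n_rows=200_000,
                              feature_types={"numeric", "categorical"})
print(shortlist)
```

The algorithms that are filtered out stay in the catalog, so they remain available for a different dataset on another day.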
2: Which Algorithms Require My Data to be Specially Prepared
Some algorithms automatically deal with missing values (e.g. when a person’s age is missing), while others don’t. Some algorithms allow for categorical data (values that come from a fixed list, such as gender), while others do not. Some algorithms work best when your numeric data is normalized (rescaled to have the same averages and ranges), while other algorithms give the same answers regardless of normalization.
I want my artificial intelligence to know which algorithms require which special data preparation and automatically add that data pre-processing to the processing pipeline.
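One way to picture this is a lookup from what an algorithm handles natively to the preparation steps it still needs. The sketch below is only an assumption about how such a planner might look; the capability flags and step names are hypothetical and do not describe any particular product or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical flags describing what an algorithm copes with natively."""
    handles_missing: bool
    handles_categorical: bool
    scale_sensitive: bool


def plan_preprocessing(caps, has_missing, has_categorical):
    """Return the ordered preprocessing steps this algorithm needs."""
    steps = []
    if has_missing and not caps.handles_missing:
        steps.append("impute_missing_values")
    if has_categorical and not caps.handles_categorical:
        steps.append("one_hot_encode_categoricals")
    if caps.scale_sensitive:
        steps.append("standardize_numeric_features")
    return steps


# Illustrative profiles only; real implementations differ in what they support.
LINEAR = Capabilities(handles_missing=False, handles_categorical=False,
                      scale_sensitive=True)
TREES = Capabilities(handles_missing=True, handles_categorical=True,
                     scale_sensitive=False)

for name, caps in [("linear", LINEAR), ("trees", TREES)]:
    print(name, plan_preprocessing(caps, has_missing=True, has_categorical=True))
```

With these made-up profiles, the linear model gets all three steps while the tree-based model gets none, which is the kind of per-algorithm decision I would like made for me.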
3: Which Accuracy Metric Makes the Most Sense for My Data
I want the ability to compare the accuracy of each and every algorithm, but this is not a “one size fits all” problem. Different types of data require different accuracy metrics. If I am predicting a probability, I will use a different metric than when I am predicting an amount. If I am predicting a rare event, then I don’t want an accuracy metric that rewards always guessing the most likely outcome, because the best-scoring model would simply predict that nothing ever happens!
I want my artificial intelligence to look at the characteristics of my data, particularly the set of values that I am predicting, and choose the most appropriate accuracy metric for me.
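To make the rare-event problem concrete, here is a small Python sketch. The selection rules, the 5% threshold, and the metric names are assumptions chosen for illustration, not a definitive recipe.

```python
def choose_metric(target_values, rare_threshold=0.05):
    """Pick a metric name from the target column (illustrative rules only)."""
    if not target_values:
        raise ValueError("target column is empty")
    if set(target_values) <= {0, 1}:
        positive_rate = sum(target_values) / len(target_values)
        if min(positive_rate, 1 - positive_rate) < rare_threshold:
            # Rare event: avoid metrics that reward always guessing "no".
            return "area_under_precision_recall_curve"
        return "log_loss"  # scores the predicted probabilities
    return "rmse"  # predicting an amount


def accuracy(actual, predicted):
    """Fraction of predictions that exactly match the actual values."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# 10 events in 1,000 rows: a model that never predicts the event looks great.
rare_target = [1] * 10 + [0] * 990
always_no = [0] * len(rare_target)

print(accuracy(rare_target, always_no))   # 0.99, yet it finds no events
print(choose_metric(rare_target))
print(choose_metric([0, 1, 1, 0]))
print(choose_metric([12.5, 30.0, 7.25]))
```

The "always no" model scores 99% on plain accuracy while being useless, which is why the choice of metric has to depend on the values being predicted.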
4: Which Algorithms Are Most Likely to be Accurate on My Data
While we don’t know in advance which algorithm will be the best for any dataset, and benchmarking shows that no single algorithm is always the best, we often have a good idea of which algorithms are more likely than others to do better on a particular dataset. That is because we have seen how different algorithms perform on datasets with many rows or few rows, many columns of data or few columns of data, and which tend to perform best with numeric features, categorical features or text features.
I want my artificial intelligence to use knowledge of the historical performance of a diverse set of algorithms across many datasets, and to select the algorithms that are the best match for the individual characteristics of my dataset.
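One simple way to use that history is to find past datasets that resemble mine and see which algorithms did well on them. The Python sketch below assumes a nearest-neighbour lookup over three summary features; the benchmark scores are invented so that the example runs, and a real system would normalize the features and use far more history.

```python
import math
from collections import defaultdict


def meta_features(n_rows, n_cols, frac_categorical):
    """Summarise a dataset as a small numeric vector (log scale for sizes)."""
    return (math.log10(n_rows), math.log10(n_cols), frac_categorical)


# Made-up benchmark history: (dataset meta-features, {algorithm: score}).
# Scores are "higher is better" and exist only to make the sketch runnable.
HISTORY = [
    (meta_features(500, 10, 0.0),
     {"linear_model": 0.82, "tree_ensemble": 0.78, "neural_net": 0.70}),
    (meta_features(800, 20, 0.1),
     {"linear_model": 0.80, "tree_ensemble": 0.79, "neural_net": 0.72}),
    (meta_features(200_000, 50, 0.6),
     {"linear_model": 0.71, "tree_ensemble": 0.88, "neural_net": 0.84}),
    (meta_features(1_000_000, 80, 0.5),
     {"linear_model": 0.69, "tree_ensemble": 0.90, "neural_net": 0.89}),
]


def recommend(new_dataset, history, k=2, top_n=2):
    """Rank algorithms by mean score on the k most similar past datasets."""
    nearest = sorted(history,
                     key=lambda item: math.dist(item[0], new_dataset))[:k]
    totals = defaultdict(float)
    for _, scores in nearest:
        for algorithm, score in scores.items():
            totals[algorithm] += score
    ranked = sorted(totals, key=lambda a: totals[a] / len(nearest),
                    reverse=True)
    return ranked[:top_n]


print(recommend(meta_features(300_000, 60, 0.55), HISTORY))
print(recommend(meta_features(600, 12, 0.0), HISTORY))
```

The output is only a short list to try first, not a guarantee; the recommended algorithms still have to be tested on the actual data.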
5: Which Algorithms Support Special Requirements
Some specialist business problems have special requirements. For example, insurance pricing models need to allow for “exposure” by enforcing pro-rata scaling of the predictions against the length of the insurance policy. Other times we want to include preexisting knowledge or business rules, like enforcing an extra mortality risk for smokers. Some algorithms support these special requirements while others do not.
I want my artificial intelligence to know which algorithms support these special requirements and which do not.
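A minimal Python sketch of both ideas follows: matching algorithms against a list of required capabilities, and the pro-rata exposure scaling described above. The capability tags and algorithm names are hypothetical, and the exposure function is a simplification of how a pricing model would treat exposure.

```python
# Hypothetical capability tags; which algorithm supports what is assumed here.
SPECIAL_FEATURES = {
    "generalized_linear_model": {"exposure", "fixed_coefficients"},
    "tree_ensemble": {"monotonic_constraints"},
    "nearest_neighbours": set(),
}


def algorithms_supporting(required):
    """Names of algorithms whose capability tags cover every requirement."""
    required = set(required)
    return sorted(name for name, features in SPECIAL_FEATURES.items()
                  if required <= features)


def prorata_claims(annual_claim_rate, exposure_years):
    """Scale an annual expected claim count by the length of the policy."""
    if exposure_years < 0:
        raise ValueError("exposure cannot be negative")
    return annual_claim_rate * exposure_years


print(algorithms_supporting({"exposure"}))
# A six-month policy should carry half the expected claims of a full year.
print(prorata_claims(annual_claim_rate=0.2, exposure_years=0.5))
```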
Automated Machine Learning
Algorithm recommendation is one of the ten key steps in automated machine learning. With automated machine learning, an expert system automates the data science workflow for you. DataRobot is the pioneer of automated machine learning, and not only provides algorithm recommendations, but automates all ten steps of building machine learning models.
About the Author:
Colin Priest is the Director of Product Marketing for DataRobot, where he advises businesses on how to build business cases and successfully manage data science projects. Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. Colin is a firm believer in data-based decision making and applying automation to improve customer experience. He is passionate about the science of healthcare and does pro-bono work to support cancer research.