Give Me One Good Reason To Trust Artificial Intelligence

As reported in a Wall Street Journal article, DARPA has announced an international effort to overcome what many say is the biggest obstacle to widespread adoption of artificial intelligence: teaching algorithms to explain their decision-making to humans.

 

For the last couple of years here at DataRobot, we’ve been building machine learning models that can be explained to humans, without sacrificing accuracy. 

 

Albert Einstein is often quoted as saying, “If you can't explain it simply, you don't understand it well enough.” Over the past few years, machine learning and artificial intelligence have made massive strides forward in predictive power, but at the price of complexity. When asked how a machine learning model makes its decisions, data scientists have been known to start describing the mathematics and equations to an already dumbfounded audience of business managers. While the data scientist’s answer is mathematically correct, it doesn’t answer the real question, which is whether we can trust the model. It’s also not how we would answer if a human were making the decisions and we were asked to describe how that human made them!

Consider a company that lends money to consumers. Let’s say it is called ManualLoanCo. It could be a bank or a FinTech company, but the process is the same. ManualLoanCo. has a group of specialist staff, called underwriters, who decide which people are good risks and can get loans, and which people are bad risks and don’t get loans. If you were asked to explain how an underwriter made their decisions, would you take them to a hospital, get their brain scanned, and describe in detail how a ManualLoanCo. underwriter’s brain is wired?

You wouldn’t. Instead you would ask the underwriter three simple questions:

  1. Which information on the loan application form is most important to your decision?
  2. How do you use this information: what values indicate good or bad risks?
  3. How did you decide to accept or reject some specific examples of loan applications?

The underwriter would then give you simple explanations that any person could understand. For example, they may say that the most important step is to compare the cost of the loan repayments with the applicant’s income. If the loan repayments are more than a third of the income, then they won’t accept that loan application. Based upon such explanations, ManualLoanCo.’s managers could then decide whether they trust the underwriter’s way of thinking.
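To show how simple such a human explanation is, the underwriter's example rule can be written in a few lines. This is only a sketch: the function name, the parameter names, and the choice to compare annual figures are assumptions made for illustration, and only the one-third threshold comes from the example above.

```python
def underwriter_decision(annual_income, annual_repayments, max_ratio=1 / 3):
    """Hypothetical underwriter rule: reject when repayments exceed a third of income."""
    if annual_income <= 0:
        # No income to repay from, so the illustrative rule rejects outright.
        return "reject"
    ratio = annual_repayments / annual_income
    return "reject" if ratio > max_ratio else "accept"


# Two made-up applications with the same repayments but different incomes.
decision_high_income = underwriter_decision(60_000, 12_000)
decision_low_income = underwriter_decision(30_000, 12_000)
```

A manager can read this rule and decide whether it is sensible, which is exactly the kind of transparency we would like from an algorithm.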

So, why don’t we ask the same questions of the algorithms making similar decisions? Let’s apply this thinking to data from Lending Club, a peer-to-peer lender.

Which information on the loan application form is most important to your decision?

[Figure: Feature Impact chart for the ENET Blender model (64 Sample Size)]

Here, using a technique called feature impact, we can see that the revolving utility balance, the applicant’s income, and the purpose of the loan are the three most important pieces of information to my algorithm.
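One common way to estimate feature impact is permutation importance: shuffle one column at a time and measure how much the model's error grows. The sketch below assumes that approach, which is not necessarily the calculation behind the chart above. The toy model, the column names, and the data are all hypothetical.

```python
import random


def toy_risk_model(row):
    """Hypothetical stand-in for a trained model; returns a risk score."""
    revol_util, annual_inc, _unused_noise = row
    income_effect = 1.0 - min(annual_inc / 100_000, 1.0)
    return 0.6 * revol_util + 0.4 * income_effect


def mean_abs_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_impact(model, rows, targets, n_features, seed=0):
    """Increase in error after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    impacts = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        impacts.append(mean_abs_error(model, shuffled, targets) - baseline)
    return impacts


# Made-up data: (revol_util, annual_inc, noise). Targets come from the toy model
# itself, so the baseline error is zero and any increase is due to shuffling.
data_rng = random.Random(42)
rows = [
    (data_rng.random(), data_rng.uniform(20_000, 120_000), data_rng.random())
    for _ in range(200)
]
targets = [toy_risk_model(r) for r in rows]
impacts = permutation_impact(toy_risk_model, rows, targets, n_features=3)
```

Features the model relies on show a positive impact, while the noise column, which the toy model ignores, shows none.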

How do you use this information: what values indicate good or bad risks?

[Figure: Model X-Ray for annual_inc, Gradient Boosted Greedy Trees Classifier with Early Stopping (Validation; Omitted Data Percent: 0; Data is Capped)]

As shown above, using a technique known as partial dependence, we see that the algorithm scores loan applicants with higher incomes as having a much lower risk. In fact, we can see a structural break at annual incomes around $45,000, caused by historical underwriting rules that treated applicants above and below $45,000 quite differently.
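Partial dependence can be sketched in a few lines: hold every other column as it is, force the feature of interest to each value on a grid, and average the model's predictions. The toy model below is hypothetical; its step at $45,000 is hard-coded only to mimic the kind of break described above, and the rows are made up.

```python
def toy_default_risk(row):
    """Hypothetical model with a step at $45,000 annual income."""
    annual_inc, revol_util = row
    base = 0.30 if annual_inc < 45_000 else 0.15
    return base + 0.2 * revol_util


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [
            model(r[:feature_index] + (value,) + r[feature_index + 1:])
            for r in rows
        ]
        curve.append(sum(preds) / len(preds))
    return curve


# Made-up applicants: (annual_inc, revol_util).
rows = [(30_000, 0.2), (50_000, 0.8), (80_000, 0.5)]
grid = [30_000, 40_000, 50_000, 60_000]
curve = partial_dependence(toy_default_risk, rows, feature_index=0, grid=grid)
```

Plotting `curve` against `grid` would show average risk dropping between $40,000 and $50,000, the same shape of break a reader would look for in a real partial dependence plot.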

How did you decide to accept or reject some specific examples of loan applications?

[Figure: reason codes for individual loan applications]

Using a technique known as reason codes, we can see the most important factors behind the estimate for each loan applicant. For example, for loan ID 6257, the top reason is the applicant’s description of why they want the loan. It turns out that they want money to fund a new small business venture to film documentaries, and that’s the top reason why the algorithm says that there is a 50.2% probability that this loan will go bad!
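The idea behind reason codes can be illustrated with a simple substitution approach: replace one feature at a time with a typical baseline value and see how far the prediction moves. This is a sketch under assumptions, not the method used to produce the figure above. The logistic model, its weights, the feature names, and the applicant are all hypothetical and are not taken from the Lending Club data.

```python
import math

# Hypothetical weights for a tiny logistic model (illustration only).
WEIGHTS = {"purpose_small_business": 1.2, "revol_util": 0.8, "annual_inc_100k": -1.0}
BIAS = -1.0


def predict_default_probability(applicant):
    z = BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(applicant, baseline, top_n=3):
    """Rank features by how much the prediction changes when each one
    is replaced by its baseline value."""
    full = predict_default_probability(applicant)
    effects = []
    for name in WEIGHTS:
        altered = dict(applicant)
        altered[name] = baseline[name]
        effects.append((name, full - predict_default_probability(altered)))
    effects.sort(key=lambda item: abs(item[1]), reverse=True)
    return effects[:top_n]


# Made-up applicant and made-up "typical" baseline values.
applicant = {"purpose_small_business": 1.0, "revol_util": 0.5, "annual_inc_100k": 0.6}
baseline = {"purpose_small_business": 0.0, "revol_util": 0.4, "annual_inc_100k": 0.6}
codes = reason_codes(applicant, baseline)
```

Each entry in `codes` pairs a feature with the amount it pushes this applicant's predicted risk up (positive) or down (negative), so the first entry plays the role of the "top reason".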

 

With consumer activism on the rise and more and more regulatory requirements giving consumers the right to human explanations of algorithmic decisions, it isn’t just DARPA that is looking for explainable models. At DataRobot, our customers can get these explanations with just one click of a button…

 


About the Author:

Colin Priest is the Director of Product Marketing for DataRobot, where he advises businesses on how to build business cases and successfully manage data science projects. Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. Colin is a firm believer in data-based decision making and applying automation to improve customer experience. He is passionate about the science of healthcare and does pro-bono work to support cancer research.