A recent survey of US consumers found that 75% prefer to use credit cards and debit cards to make payments, compared to only 11% of consumers who prefer cash. This quickly equates to billions of credit card transactions every day, which is precisely the type of big data that banks and fintechs are starting to use to better understand their customers.
What did you purchase in the past week? What does it say about you? My own personal credit card transactions for the past week included payments for:
- Hotel and flight bookings
- Supermarket groceries
- Japanese restaurant dinner
- Chinese food home delivery
- Dry cleaning home delivery
- Thai restaurant lunch
This is pretty reflective of my usual monthly credit card statement, which routinely includes all of these items. Just from these broad descriptions, an astute marketer can already start to understand me and what products I am likely to be interested in. For example, I travel a lot on business and I am busy as evidenced by my use of home delivery services, so it is clear that I will like new convenience services that save me time. I like Asian food and eat out frequently, so I will like to know about new Asian restaurants located near me.
But the information in credit card data tells more than just my consumer spending preferences. It can be a powerful predictor of my creditworthiness – and banks and fintechs are starting to pay attention to these details and use them to predict which loans will result in default. In a new predictive model I developed using transaction descriptions, which have become a powerful predictor in our customers’ credit risk models, I have been able to exceed the accuracy of traditionally designed credit risk models by up to 100%.
Consider the word cloud above, which visualizes the transaction descriptions in a credit risk model. The predictive algorithm finds words and phrases that are correlated with bad loans. Bright red words and phrases are associated with higher risk of loan default. Dark blue is associated with lower risk of loan default. The larger the words and phrases, the more often they appear in the transaction descriptions of customers. For this customer base, healthcare, airfare, and fashion purchases indicate lower credit risk. Interest charges, fast food, and cash advances indicate higher credit risk. Armed with this information, a lender is able to better determine which customers are safer borrowers. Two customers may be the same age, have similar jobs and incomes, and live in the same neighborhood, but if their spending behavior is different then the lender will have more information about the differing levels of risk.
Until recently, most banks and fintechs didn’t use transaction descriptions for scoring credit risk. It was too difficult: the traditional statistical techniques that banks used were not able to read text, and there weren’t enough data scientists to do the work. But things have changed in recent years. Computing power has become cheaper and easier to access, and open source machine learning algorithms have been published that can automatically identify words and phrases. And now there is automated machine learning, expert software that automatically builds complex machine learning algorithms from historical data, enabling credit risk modeling staff to quickly ramp up their techniques.