What to Do With the Wealth of Information in Credit Card Transaction Descriptions

October 30, 2017

by

· 3 min read

A recent survey of US consumers found that 75% prefer to use credit cards and debit cards to make payments, compared to only 11% of consumers who prefer cash. This quickly equates to billions of credit card transactions every day, which is precisely the type of big data that banks and fintechs are starting to use to better understand their customers.

What did you purchase in the past week? What does it say about you? My own personal credit card transactions for the past week included payments for:

Taxis
Hotel and flight bookings
Supermarket groceries
Japanese restaurant dinner
Chinese food home delivery
Dry cleaning home delivery
Thai restaurant lunch
eBooks

This is pretty reflective of my usual monthly credit card statement, which routinely includes all of these items. Just from these broad descriptions, an astute marketer can already start to understand me and what products I am likely to be interested in. For example, I travel a lot on business and I am busy as evidenced by my use of home delivery services, so it is clear that I will like new convenience services that save me time. I like Asian food and eat out frequently, so I will like to know about new Asian restaurants located near me.

But the information in credit card data tells more than just my consumer spending preferences. It can be a powerful predictor of my creditworthiness – and banks and fintechs are starting to pay attention to these details and use them to predict which loans will result in default. In a new predictive model I developed using transaction descriptions, which have become a powerful predictor in our customers’ credit risk models, I have been able to exceed the accuracy of traditionally designed credit risk models by up to 100%.

Consider the word cloud above, which visualizes the transaction descriptions in a credit risk model. The predictive algorithm finds words and phrases that are correlated with bad loans. Bright red words and phrases are associated with higher risk of loan default. Dark blue is associated with lower risk of loan default. The larger the words and phrases, the more often they appear in the transaction descriptions of customers. For this customer base, healthcare, airfare, and fashion purchases indicate lower credit risk. Interest charges, fast food, and cash advances indicate higher credit risk. Armed with this information, a lender is able to better determine which customers are safer borrowers. Two customers may be the same age, have similar jobs and incomes, and live in the same neighborhood, but if their spending behavior is different then the lender will have more information about the differing levels of risk.

Until recently, most banks and fintechs didn’t use transaction descriptions for scoring credit risk. It was too difficult: the traditional statistical techniques that banks used were not able to read text, and there weren’t enough data scientists to do the work. But things have changed in recent years. Computing power has become cheaper and easier to access, and open source machine learning algorithms have been published that can automatically identify words and phrases. And now there is automated machine learning, expert software that automatically builds complex machine learning algorithms from historical data, enabling credit risk modeling staff to quickly ramp up their techniques.

About the author

Colin Priest

VP, AI Strategy, DataRobot

Colin Priest is the VP of AI Strategy for DataRobot, where he advises businesses on how to build business cases and successfully manage data science projects. Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. Colin is a firm believer in data-based decision making and applying automation to improve customer experience. He is passionate about the science of healthcare and does pro-bono work to support cancer research.

Meet Colin Priest

Share this post

Subscribe to DataRobot Blog

First Name

Last Name

Email

Country

State

Yes! Please email me news and offers for DataRobot products and services.

DataRobot is committed to protecting your privacy. You can find full details of how we use your information, and directions on opting out from our marketing emails, in our Privacy Policy.

Share this post

Subscribe to DataRobot Blog

First Name

Last Name

Email

Country

State

Yes! Please email me news and offers for DataRobot products and services.

DataRobot is committed to protecting your privacy. You can find full details of how we use your information, and directions on opting out from our marketing emails, in our Privacy Policy.

What to Do With the Wealth of Information in Credit Card Transaction Descriptions

How to Choose the Right LLM for Your Use Case

Belong @ DataRobot: Celebrating 2024 Women’s History Month with DataRobot AI Legends

Choosing the Right Vector Embedding Model for Your Generative AI Use Case

Related Posts

Thanks! Check your inbox to confirm your subscription.