The second hardest part of becoming an AI-driven enterprise is deploying models to production. The hardest part is actually developing the models, but fortunately, advanced tools, like DataRobot, are creating ways to automate the process of building machine learning models. That means that moving to production is becoming the primary roadblock to getting value out of AI.
There are a number of reasons that moving to production is hard:
- Legacy systems have been developed to support a single (often suboptimal) approach.
- Complex change control and production processes make speed-to-market very, very slow.
- Existing systems often have no mechanism for supporting predictions, so companies frequently have to develop new applications and pathways to support new workflows.
I once had a conversation with a business head at a large financial institution, who said something along the lines of, “I can build it internally for $5,000,000; I can hire Deloitte to do it for $500,000; or I can watch a startup do it for $50,000 in a tenth of the time.” It isn’t new for large organizations to have to scramble to keep up with smaller, more nimble companies, but I think both the scale and the intensity of innovation at startups are new.
The purpose of this blog is to lay out the different approaches to deploying AI solutions. Just like AI is forcing organizations to think differently about how they make decisions and run their business, these same solutions will force IT organizations and technical staff to think differently about how to deploy them. Their toolkit needs to be both broad and flexible, which is no small challenge, if they want to be able to support the onslaught of innovation that the influx of data will produce in the next several years.
Before we talk about how to deploy AI solutions, I think it’s important to agree on some basic terminology. The sheer volume of hype in the AI space means that terms are used ambiguously, so the following definitions – while not strictly accurate – are necessary for the conversation.
Data Science is the science and art of solving business problems with data. Data science is a two-stage process. In the first stage, machine learning is used to build models from historical data. These models turn the various data inputs into a prediction; e.g., the details of a transaction might be used to predict the likelihood that that transaction is fraudulent. In the second stage, the prediction from the model is used to make or influence a decision or a process. In the fraud example, a transaction with a high probability of being fraudulent might be blocked at the point of sale.
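The two stages can be sketched in a few lines of Python. This is only an illustration of the fraud example above, not a real model: the weights, the intercept, the feature names, and the 0.5 blocking threshold are all invented assumptions, standing in for whatever machine learning would actually learn from historical data.

```python
import math

# Stage 1 (stand-in): in practice these weights would be learned from
# historical data. The values below are invented for illustration only.
HYPOTHETICAL_WEIGHTS = {"amount_usd": 0.004, "is_foreign": 1.5, "night_time": 0.8}
HYPOTHETICAL_INTERCEPT = -6.0
BLOCK_THRESHOLD = 0.5  # assumed business rule, not taken from the article


def predict_fraud_probability(transaction: dict) -> float:
    """Turn transaction details into a fraud probability (logistic form)."""
    score = HYPOTHETICAL_INTERCEPT + sum(
        weight * transaction.get(name, 0.0)
        for name, weight in HYPOTHETICAL_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def decide(transaction: dict) -> str:
    """Stage 2: use the prediction to drive the point-of-sale decision."""
    probability = predict_fraud_probability(transaction)
    return "block" if probability >= BLOCK_THRESHOLD else "approve"
```

The point of separating the two functions is that the prediction and the decision change for different reasons: the model is retrained when the data shifts, while the threshold moves when the business changes its appetite for risk.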
Machine learning, used in stage 1, is the means by which historical data is used to train or teach a model how to make predictions.
Artificial intelligence (AI) is the system that combines data, business logic, and machine learning models in order to influence or automate a process. While this isn’t, strictly speaking, the academic definition of AI, it is a useful one for the purposes of this conversation.
Production is a term that carries with it a ton of baggage, particularly in IT shops, but for the purposes of this conversation, a production system is simply a system that is being used in real life. It’s not a POC to test whether or not something will work or an experiment performed on sample data. It’s real-life data being used to solve a real-life problem. Typically in an enterprise there are all kinds of safeguards and SLAs that are enforced when a system is “moved to production,” but in reality there are hundreds of Excel workbooks and (gasp) Access databases that are driving “production” use cases every day. They may not be production-grade applications supported by IT, but they are in production in the sense that they are impacting real business cases and impacting earnings.
Production means different things for different projects
I could name dozens of organizations where, apart from building the models, the chief task was figuring out how to fit their new models into their old process for deploying models. Other organizations have come asking for “the way” to deploy machine learning models into production. My answer to them all is that there is no single method for deploying models that will work in every situation. All use cases are different, and the deployment methodologies need to be different in order to accommodate them all.
For instance, suppose a credit card provider wants to block fraudulent transactions in real time at the point of sale. That AI system needs to be able to handle a large quantity of transactions in real time from a variety of different sources and score those transactions very quickly in order to return a go/no-go response. That system will also likely need close monitoring to identify new types of fraud, customers that are adversely impacted, and so on.
Compare that with a traditional marketing firm with a large database of potential prospects. For a particular marketing campaign, the firm needs to select the right prospects to contact in order to optimize the ROI of that campaign. In this case, the prospects aren’t scored in real time and the speed of scoring isn’t particularly relevant. Instead, the firm needs to be able to score large quantities of data all at once and take action based on the results. The deployment might only occur a single time per marketing campaign, rather than continuously as in the fraud detection use case.
In order to deploy the variety of models that an organization needs in order to remain competitive, tech teams need to have the flexibility to use a variety of approaches and technology stacks.
Elements of a good production system
Despite the fact that every use case is different, there are some basic qualities of a use case and a production deployment that are necessary in order to evaluate a solution. The following are the primary drivers of a good solution.
Perhaps the most important aspect of a production AI system is availability. The output is being used with real data to solve real problems. That means the results need to be available when needed. Highly available systems have built-in redundancies to ensure that they will be up and running with great reliability.
Not every production AI system requires the same level of availability. Fraud detection systems simply must be up and running all of the time. If they go down, losses due to fraud start to pile up. On the other hand, if I’m preparing for a marketing campaign, I usually have a lot of advance notice and plenty of time to finalize the prospect list. I may even have time to evaluate the list and iterate a few times.
Even though the availability requirements are different, it’s important to get them right. Insisting on strict availability requirements for projects that don’t need them will slow down the deployment process, whereas availability requirements that are too loose translate to flaky solutions that don’t meet their SLAs.
Throughput and Latency
Throughput is all about volume, and latency is all about speed. Even though different use cases will utilize different deployment methods, they all have one thing in common: predictions. The number and rate of predictions are key questions for any deployment.
For a real-time use case – e.g., transaction fraud – the process will be scoring individual transactions throughout the day. The quantity of transactions will likely be highly variable, with peaks on weekends and around meal and shopping times. The system needs to be able to handle the peak rate of transactions without breaking.
For batch-style use cases where many predictions are made in bulk, throughput and latency are frequently less important, though there are exceptions to this rule. I worked with a large financial institution that required the ability to score 200,000 predictions per second in batch because of the sheer number of observations to be scored on a nightly basis. Developing an understanding of volume and rate of predictions is a key consideration for any solution.
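A back-of-the-envelope calculation is often enough to get started on volume and rate. The sketch below is one way to frame it; the per-worker throughput, the 25% headroom, and the nightly row count in the usage note are hypothetical numbers chosen for illustration, and only the 200,000 predictions per second figure comes from the example above.

```python
import math


def workers_needed(peak_predictions_per_second: float,
                   predictions_per_second_per_worker: float,
                   headroom: float = 0.25) -> int:
    """Workers required to serve the peak rate with some spare capacity."""
    required_rate = peak_predictions_per_second * (1.0 + headroom)
    return math.ceil(required_rate / predictions_per_second_per_worker)


def batch_duration_seconds(total_rows: int,
                           predictions_per_second: float) -> float:
    """Time needed to score a batch at a sustained prediction rate."""
    return total_rows / predictions_per_second
```

For example, if a single worker could sustain 5,000 predictions per second (an assumption), `workers_needed(200_000, 5_000)` returns 50, and a hypothetical nightly file of 720 million rows scored at 200,000 predictions per second would take `batch_duration_seconds(720_000_000, 200_000)`, or 3,600 seconds.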
One final consideration in developing any production system is risk. Suppose a small startup is faced with the prospect of deploying a new AI system. They could probably build the system internally from scratch, but there are some risks associated with that approach. For instance, what if it breaks? Does the small startup have the resources to maintain and support it? Another risk is new code. Brand-new implementations tend to be buggy, and implementation errors represent a non-trivial risk when deploying models. The custom-made solution might be more flexible and relevant, but the wise organization will weigh the risks and benefits of any solution before making a decision.
Common Paths to Production
There are four main approaches to deploying models today that any organization should consider: manual scoring, API-based, batch/distributed methods, and scoring code.
1. Manual Scoring
What it is: Most of the software packages available today provide some manual means of making predictions. This might be a drag-and-drop process or manual scoring using some analytics software. In short, somebody takes a file, calculates predictions, and then does something with them.
Pros: Non-technical and “fast.”
Cons: Manual, error-prone, and not really fast. If you use this method, be sure to have careful reconciliation and validation in place.
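The reconciliation and validation step mentioned above can be quite simple. Here is a minimal sketch of the kind of checks that catch the most common manual-scoring mistakes. The row layout (an `id` and a `prediction` per scored row) and the assumption that predictions are probabilities between 0 and 1 are hypothetical; adapt both to your own files.

```python
def reconcile(input_ids: list, scored_rows: list) -> list:
    """Compare a scored file to its input; return a list of problems found.

    An empty list means every check passed.
    """
    problems = []
    scored_ids = [row["id"] for row in scored_rows]

    # Each record should be scored exactly once.
    if len(scored_ids) != len(set(scored_ids)):
        problems.append("duplicate ids in scored file")

    missing = set(input_ids) - set(scored_ids)
    if missing:
        problems.append(f"{len(missing)} input rows were never scored")

    unexpected = set(scored_ids) - set(input_ids)
    if unexpected:
        problems.append(f"{len(unexpected)} scored rows are not in the input")

    # Assumes the model outputs probabilities; drop this check otherwise.
    out_of_range = [
        row["id"] for row in scored_rows
        if not 0.0 <= row["prediction"] <= 1.0
    ]
    if out_of_range:
        problems.append(f"{len(out_of_range)} predictions fall outside [0, 1]")

    return problems
```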
2. API-based approaches
What it is: API-based scoring methods started out with cloud companies that provided black-box models that could be used to make predictions using simple code-based requests. These “pay per prediction” solutions have been upgraded over the past several years. Now on-premises as well as cloud solutions frequently provide a reliable means to send data to a server in order to generate predictions.
Pros: Cheap, simple implementation, and low risk. Because API-based solutions abstract away scoring code, there is a much lower chance that scoring errors get introduced in the implementation process.
Cons: May be hard for IT to adopt. This is a newer approach, compared to traditional deployment methodologies, so it may be difficult to convince tech teams to go down this route.
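To give a feel for how little code the calling side of an API-based deployment needs, here is a rough sketch using only the Python standard library. The URL, the bearer-token authentication, and the request and response shapes (`rows` in, `predictions` out) are all hypothetical; a real prediction service defines its own contract, so treat every name below as a placeholder.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the address of your own scoring service.
SCORING_URL = "https://scoring.example.com/v1/predict"


def build_scoring_request(rows: list, api_token: str,
                          url: str = SCORING_URL) -> urllib.request.Request:
    """Package rows of input data as a JSON POST request."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


def score(rows: list, api_token: str, timeout_seconds: float = 2.0) -> list:
    """Send rows to the service and return one prediction per row.

    Assumes the service replies with JSON like {"predictions": [...]}.
    """
    request = build_scoring_request(rows, api_token)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        payload = json.load(response)
    return payload["predictions"]
```

Notice that nothing in the caller knows how the model works. That is the source of the low implementation risk described above: swapping in a new model changes the server, not this code.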
3. Batch/distributed methods
What it is: Distributed computing has exploded in popularity, utilizing both commodity hardware and open-source implementations. Scoring is an inherently parallelizable task, and so distributed methods tend to be very fast on large quantities of data.
Pros: Fast for big data and batch scoring and can take advantage of streaming technologies.
Cons: Less mature and more complex. The Hadoop ecosystem is very new and is changing rapidly. Finding experts in this field to build your solution may be difficult, and keeping up with the pace of change may be a challenge.
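The reason scoring parallelizes so well is that each row can be scored without knowing anything about the others. The sketch below shows the split-score-recombine pattern on a single machine, with a thread pool standing in for a cluster of workers. It is not a distributed implementation, and the `score_row` function is an invented placeholder for a real model. In CPython, threads will not speed up CPU-bound scoring; the point here is the shape of the work, which a distributed framework would spread across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def score_row(row: dict) -> float:
    """Hypothetical stand-in for a real model's scoring function."""
    return min(1.0, row["amount_usd"] / 10_000)


def chunks(rows: list, size: int):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def score_chunk(chunk: list) -> list:
    """Score one chunk independently of every other chunk."""
    return [score_row(row) for row in chunk]


def score_batch(rows: list, chunk_size: int = 1000, workers: int = 4) -> list:
    """Split a batch into chunks, score them concurrently, and recombine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so predictions stay
        # aligned with the rows they were computed from.
        scored_chunks = pool.map(score_chunk, chunks(rows, chunk_size))
        return [prediction for chunk in scored_chunks for prediction in chunk]
```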
4. Scoring Code implementations
What it is: Many machine learning packages can export some serialization of the models that have been built. PMML was an early example, and POJO has become more common today. Either way, these approaches allow machine learning tools to output production-grade code that can be implemented in a larger system.
Pros: Flexible and fast. Scoring code requires a custom implementation, so you can build whatever system you want to support it. POJO is a Java-based implementation, so it tends to be faster for real-time use cases than, say, an API-based method.
Cons: Tends to be error-prone, time-consuming, and expensive. Implementing scoring code is harder than any of the other methods we have discussed, and so it is more error-prone. There is also more rework when you want to update your models, which means more time and higher costs.
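To make the idea concrete, here is roughly what exported scoring code looks like, rendered in Python rather than Java for brevity. It is a hand-written illustration of a small decision tree, not the output of any particular tool; the feature names, split points, and leaf probabilities are invented.

```python
# Hypothetical exported model: a small decision tree flattened into plain
# conditionals. It has no dependencies, so it can be embedded anywhere.
MODEL_VERSION = "example-tree-v1"  # invented label; bump on every re-export


def score_transaction(amount_usd: float, is_foreign: bool,
                      hour_of_day: int) -> float:
    """Return a fraud probability using hard-coded splits and leaf values."""
    if amount_usd <= 250.0:
        if is_foreign:
            return 0.08
        return 0.01
    if hour_of_day < 6:
        return 0.62
    if is_foreign:
        return 0.35
    return 0.05
```

Both the speed and the cost are visible here. Scoring is just a handful of comparisons with no network call, but every retrained model means regenerating, re-testing, and redeploying this code inside whatever system hosts it.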
I have seen firsthand the potential of AI to generate hundreds of millions of dollars in revenue, even for a single use case. The sad truth, though, is that most machine learning models never get implemented. In order to maximize your organization’s AI ROI, think about implementation early and consider all of the available options when you design your solution.
About the Author:
Greg Michaelson is the Director of DataRobot Labs, an R&D group focused on developing data science applications in a business context. Prior to that he led the data science practice at DataRobot, working with clients across the world to ensure their success using the DataRobot platform to solve their business problems. Prior to joining DataRobot, Greg led modeling teams at Travelers and Regions Financial, focusing on pricing and risk modeling. He earned his Ph.D. in applied statistics from the Culverhouse College of Business Administration at the University of Alabama. Greg lives in Charlotte, NC with his wife and four children and their pet tarantula.