Enabling Data Science Teams to Meet the Challenges of Machine Learning with DataRobot and Amazon SageMaker

Why Machine Learning Is So Hard

The machine learning process can be challenging for developers, data engineers, and data scientists — building, training, and then deploying models into production is complicated and time-consuming. 

  1. First, you need to collect and prepare historical training data for which the outcome you are trying to predict is already known. This data is used to discover which elements of the dataset are important.

  2. Next, you need to select the best algorithm to fit the data and the outcome and deliver an accurate model.

  3. After deciding on an approach, you need to train the algorithm to make predictions using the historical data, which requires a lot of computing time and effort.

  4. Then, you need to tune the chosen model so it delivers the best possible predictions. This is often a tedious and complex process that requires a great deal of manual effort.

  5. After you’ve developed a fully trained model, you need to integrate the model with your application and deploy this application on an infrastructure that will scale.
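To make the five steps above concrete, here is a minimal sketch in plain Python. It is an illustration only, not how DataRobot or SageMaker work internally: the data is synthetic, the two candidate "algorithms" are deliberately trivial, and every function name (make_data, fit_mean, fit_ridge_line, run_pipeline) is hypothetical.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Step 1: gather historical rows with a known outcome (synthetic here)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)  # assumed relationship plus noise
        rows.append((x, y))
    return rows


def fit_mean(train, _param=None):
    """Baseline candidate: always predict the mean training outcome."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_ridge_line(train, penalty=0.0):
    """Candidate: one-feature linear fit with an L2 penalty on the slope."""
    mx = statistics.fmean(x for x, _ in train)
    my = statistics.fmean(y for _, y in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / (sxx + penalty)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, rows):
    """Mean squared error of a fitted model on held-out rows."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in rows)


def run_pipeline():
    data = make_data()
    train, valid = data[:150], data[150:]  # hold out rows for validation

    # Steps 2-4: try each algorithm, train it, and tune its one setting.
    candidates = [
        ("mean", fit_mean, [None]),
        ("ridge_line", fit_ridge_line, [0.0, 1.0, 10.0, 100.0]),
    ]
    best = None
    for name, fit, grid in candidates:
        for param in grid:
            model = fit(train, param)
            score = mse(model, valid)
            if best is None or score < best[0]:
                best = (score, name, param, model)

    # Step 5: the winning model is a plain callable an application could use.
    return best
```

Calling run_pipeline() returns the validation error, the name of the winning candidate, its tuned setting, and the fitted model. Even in this toy form, the nested loop over algorithms and settings shows where the compute time and manual effort described above come from.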

DataRobot and Amazon Web Services (AWS) recognize that these steps require specialized expertise, access to large amounts of compute power and storage, and time to experiment and optimize every step of the process. In the end, it's clear why machine learning is so time-consuming and difficult for most developer and data science teams.

To ensure machine learning success, DataRobot and Amazon SageMaker offer a robust combination of tools that empower developers and data science teams to address the complexities of the machine learning process.


Building Models Just Got Easier

Amazon SageMaker focuses on scale, cost-effectiveness, automation, and security. It aims to improve the machine learning process and offers:

  • Extreme flexibility to build and train models. Preconfigured deep learning frameworks, bring-your-own (BYO) support, and built-in algorithms bring value to both data scientists and developers.

  • Scalable model training. SageMaker can handle large-scale machine learning workloads through a full range of AWS instance types and a highly scalable distributed training environment.

  • Highly elastic and scalable hosting environment with high availability and low latency, which is critically important for models deployed to production.

DataRobot is an AWS Machine Learning Competency partner and complements SageMaker by bringing the power of automated machine learning to users. With DataRobot, users can quickly build, train, validate, and tune thousands of combinations of machine learning models in just a few minutes or hours, and quickly choose the best model based on their own data and what they are trying to accomplish.

Depending on their environment and how they want to work, users have options for making predictions or deploying the models. DataRobot offers on-demand scoring, portable predictions with export code in Java or Python, and a REST API that allows DataRobot to serve as the prediction engine with very little coding. In addition, DataRobot features Spark Scoring for installations in Hadoop environments.

SageMaker provides its own deployment capabilities, and offers a coding environment where data science teams can work together to further train and tune the model. Once a model is trained on DataRobot, it can be exported into a file, which contains the model as well as the code to invoke the model. This model can then be deployed in Amazon SageMaker using the “Bring your own container” method. This simple method involves packaging this Python model into the base container provided by AWS. An example of this is shown here.
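As a rough illustration of the “Bring your own container” approach, a SageMaker hosting container is expected to answer health checks on GET /ping and prediction requests on POST /invocations, listening on port 8080. The sketch below shows one way to do that with only the Python standard library. The predict function is a hypothetical stand-in for the scoring code exported from DataRobot, and the JSON request shape with a "rows" key is an assumption made for this example, not a DataRobot or AWS format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Hypothetical stand-in for the exported model's scoring function."""
    return [sum(row) for row in rows]


def handle_invocation(body):
    """Score a JSON body such as {"rows": [[1, 2], [3, 4]]} (assumed format).

    Returns a (status_code, response_bytes) pair.
    """
    try:
        rows = json.loads(body)["rows"]
        predictions = predict(rows)
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with a 'rows' list of numeric lists"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"predictions": predictions}).encode()


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, payload=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # SageMaker polls /ping to check that the container is healthy.
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        # SageMaker forwards prediction requests to /invocations.
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_invocation(self.rfile.read(length))
        self._send(status, payload)


def serve(port=8080):
    """Start the server; the container's entry point would call this."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

In a real container, the image's entry point would call serve(), and predict would be replaced by a call into the exported model code. A production setup would typically use a more robust web server than http.server.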

With DataRobot, SageMaker users have access to the leading automated machine learning platform to optimize and accelerate the development and deployment of machine learning models. When combined with the SageMaker environment, enterprises of all sizes have the ability to create a high-efficiency data science solution that covers data preparation, collaboration, model building, deployment, and long-term model maintenance. All on the rock-solid AWS platform!

DataRobot has developed sample notebooks to show you how to use the DataRobot automated machine learning platform with Amazon SageMaker to build and evaluate custom machine learning models in the quickest, most efficient manner possible.


About the Author:

Dan Ganancial leads Partner Marketing at DataRobot, and he is responsible for driving joint marketing initiatives with technology alliance and channel partners. Dan is a marketing professional with more than 10 years of experience in partner, product, and strategic marketing. He has held several roles in his career related to sales, business development, and marketing where he has produced a strong record in driving both customer and revenue growth.