Data scientists and machine learning engineers spend most of their time on data collection, data cleaning and preprocessing, modeling, and hyperparameter tuning. However, many of them never get the opportunity to take their ML models all the way to deployment.
This article covers forecasting flight delays for US domestic flights using a built-in SageMaker algorithm, and then deploying the model behind a SageMaker endpoint to get real-time inferences.
What is deploying an ML model?
Once you have a robust ML model that your data science team has agreed to rely on, you need to deploy it to a production environment. This can be as simple as running the model on batch data, perhaps with a scheduled cron job that reruns it periodically. Sometimes, however, you need your software application to use the model for predictions on streaming data; in that case, you might expose the model through a RESTful or gRPC API.

Over time, you will likely need to retrain the model on updated data to keep it from going stale, which means updated versions of the model will run in production. It then becomes necessary to handle model versioning, transition smoothly from one model to the next, plan a rollback to the previous model in case of failures, and perhaps run multiple versions in parallel for A/B or canary experiments.
How do you deploy machine learning models into production?
Based on your use case, you can choose among several deployment approaches. For instance, AWS SageMaker provides real-time inference for workloads with interactive, low-latency requirements. You can use SageMaker batch transform to get predictions for an entire dataset. There are also further deployment types, such as Serverless Inference, Asynchronous Inference, and SageMaker Edge Manager for edge deployments.
How should you maintain a deployed model?
Monitoring is fundamental to maintaining a deployed ML model. SageMaker's real-time monitoring helps ensure the quality of your machine learning models in production: you can set alerts that fire when anomalies or data drift are detected. SageMaker Model Monitor lets you take quick action to maintain and improve the quality of your running model.
Introducing the use case for deploying a model
In this project, we use Amazon SageMaker to build a machine learning model that forecasts delays for US domestic flights, training it on flight data from the US Department of Transportation.
We will go through the process of preparing raw data for use with machine learning algorithms, then use a built-in SageMaker algorithm to train a model on the prepared data. Finally, we will use SageMaker to host the trained model and see how to make real-time predictions with it.
First, navigate to Amazon SageMaker (https://aws.amazon.com/sagemaker/) from the AWS console.
From the left list, choose Notebook → Notebook instances.
Choose the default configurations and hit Create notebook instance. The SageMaker notebook environment is similar to a Jupyter notebook, so it should be easy to follow along.
Train the model to learn from the data
I uploaded the US flight delay data to an S3 bucket and loaded it with pandas, converting the CSV data into a DataFrame (df) that is easy to manipulate. When reading the file, you should provide pandas with the data type (dtype) of each feature.
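The loading step might look like the following sketch. The column names and dtypes here are hypothetical stand-ins for the DOT dataset's actual schema, and the inline sample replaces the S3 read so the snippet is self-contained; in the notebook you would point `read_csv` at your S3 path instead.

```python
import io
import pandas as pd

# Hypothetical subset of the flight-delay columns; adjust to your actual file.
dtypes = {
    "FL_DATE": "str",
    "OP_CARRIER": "category",
    "ORIGIN": "category",
    "DEST": "category",
    "DEP_DELAY": "float64",
    "ARR_DELAY": "float64",
}

# In the notebook you would read directly from S3, e.g.:
# df = pd.read_csv("s3://your-bucket/flight-delays.csv", dtype=dtypes)
# Here we parse a small inline sample so the snippet runs anywhere.
sample = io.StringIO(
    "FL_DATE,OP_CARRIER,ORIGIN,DEST,DEP_DELAY,ARR_DELAY\n"
    "2018-01-01,AA,JFK,LAX,5.0,12.0\n"
    "2018-01-01,DL,ATL,ORD,-2.0,-8.0\n"
)
df = pd.read_csv(sample, dtype=dtypes)
print(df.dtypes)
```

Declaring dtypes up front avoids pandas' type inference, which both speeds up loading of large files and prevents mixed-type columns.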
Now that the training data is available on S3 in a format that SageMaker can use, you are ready to train the model. The Amazon SageMaker linear learner algorithm handles both classification and regression problems. It scales to large datasets and is sufficient for demonstrating the use of built-in SageMaker algorithms from Python.
SageMaker algorithms are available via container images. Each region that supports SageMaker has its own copy of the images. You will begin by retrieving the URI of the container image.
From the image URI you can see the images are stored in Elastic Container Registry (ECR). You can also make your own images to use with SageMaker and store them in ECR.
Pipe mode streams data directly from S3 for faster training, as opposed to File mode, which downloads the entire dataset before training begins.
You must now configure the hyperparameters for the linear-learner algorithm. You will only configure the required hyperparameters.
The training job can now begin training the model. You can monitor its progress in the output of the following cell; blue log lines come from the training container and summarize the progress of the training process.
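Putting the training steps together, the configuration might look like the sketch below. The bucket, prefix, and instance type are placeholders, and running it requires a SageMaker notebook (or equivalent AWS credentials and role), so treat it as an illustrative configuration rather than copy-paste-runnable code.

```python
import boto3
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # valid inside a SageMaker notebook
region = boto3.Session().region_name

linear = Estimator(
    image_uri=image_uris.retrieve("linear-learner", region),
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",          # placeholder training instance
    input_mode="Pipe",                    # stream training data from S3
    output_path="s3://your-bucket/flight-delays/output",  # placeholder bucket
    sagemaker_session=session,
)

# linear-learner's required hyperparameter is the predictor type; here the
# problem is framed as binary classification (delayed vs. on time).
linear.set_hyperparameters(predictor_type="binary_classifier")

# Point the training channel at the prepared CSV data on S3 (placeholder path).
train_input = TrainingInput(
    "s3://your-bucket/flight-delays/train/", content_type="text/csv"
)
linear.fit({"train": train_input})  # logs stream below the cell while it runs
```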
Deploy the model
SageMaker can host models through its hosting services, making the model accessible to clients through a SageMaker endpoint. The hosted model, the endpoint configuration resource, and the endpoint itself are all created with a single call to deploy(). The process takes around five minutes, including the time to launch an instance to host the model.
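As a sketch, assuming `linear` is the trained `Estimator` from the training step (the name is mine, not fixed by the SDK), deployment is a single call; like the training sketch, this needs an AWS environment to actually run.

```python
# Create the model, endpoint configuration, and endpoint in one call.
# Returns a Predictor bound to the new endpoint; blocks until it is in service.
predictor = linear.deploy(
    initial_instance_count=1,
    instance_type="ml.t3.medium",  # the instance type used in this lab
)
```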
SageMaker Studio notebooks are one-click Jupyter notebooks that can be spun up quickly. The underlying compute resources are fully elastic, and notebooks can easily be shared with others, enabling seamless collaboration. For this lab, an ml.t3.medium instance, with 2 vCPUs and 4 GiB of memory, is sufficient.
The endpoint is accessible over HTTPS, but when using the SageMaker Python SDK, the predict function abstracts away the complexity of HTTPS requests. You first configure how the predictor serializes requests and deserializes responses: here, requests are serialized as CSV and responses deserialized from JSON.
Then, use the model endpoint to make predictions using a sample vector from the test data.
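A sketch of the inference call, assuming `predictor` is the object returned by deploy(); the feature vector below is a hypothetical stand-in for an encoded row of the test data, and the response shape shown is what a binary linear-learner model typically returns.

```python
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

# Send requests as CSV, parse responses as JSON.
predictor.serializer = CSVSerializer()
predictor.deserializer = JSONDeserializer()

# `sample` stands in for one feature vector drawn from the test data.
sample = [5.0, 1, 0, 0, 1, 0]  # hypothetical encoded features
result = predictor.predict(sample)
print(result)
# A binary linear-learner typically returns something like:
# {"predictions": [{"score": 0.87, "predicted_label": 1}]}
```

When you are done experimenting, remember that the endpoint is billed while it runs; `predictor.delete_endpoint()` tears it down.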
In this article, we used Python to prepare data, trained and deployed a model with SageMaker on the AWS cloud, and made predictions with the model hosted behind a SageMaker endpoint.