This is part five in a series on low-code machine learning with Azure ML.
Where We Are
In the prior post, we trained a model using the Azure ML designer.

In this post, we’re going to make that model available for the whole world to use, because what the world needs is a way of classifying penguins by characteristics like culmen length.
Create an Inference Pipeline
The first thing we need to do is create an inference pipeline. Inference pipelines differ from training pipelines in that they won’t use the training dataset, but they will accept user input and provide a scored response. There are two types of inference pipeline: real-time and batch. Real-time inference pipelines are intended for small-set work. We’ll host a service on some compute resource in Azure and people will make REST API calls to our service, sending in a request with a few items to score and getting back classification results.
By contrast, a batch pipeline is what you’d use if you have a nightly job with tens of millions of items to score. In that case, the typical pattern is to have a service listening for changes in a storage account and, some time after people drop new files into the proper folder, the batch inference process will pick up those files, score them, and write the results out to a destination location.
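To make that batch pattern a little more concrete, here’s a rough sketch of the hand-off side of it: dropping a file of rows to score into a storage container that the nightly batch job watches. Everything here — the container name, blob path, and connection string — is a placeholder for illustration, not something Azure ML dictates.

```python
# Sketch of the batch-scoring hand-off: upload a file of rows to score into a
# storage container that a scheduled batch inference job watches. The container
# name, blob path, and connection string are placeholders.
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob_client = blob_service.get_blob_client(container="to-score", blob="incoming/penguins.csv")

with open("penguins_to_score.csv", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)

# Some time later, the batch inference process picks the file up, scores each
# row, and writes the results to a destination location.
```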
Because it’s a lot easier to show, we’ll use a real-time inference pipeline. From the Create inference pipeline menu, select Real-time inference pipeline.

In doing so, the designer changes a bit. We have our penguin-data dataset but can also see a Web Service Input. We also have a Web Service Output, which vies with our Evaluate Model component from training. Let’s remove the Evaluate Model component, as it is no longer necessary.

Once that’s done, we have a few input components for cleaned and normalized data. The TD- datasets are File outputs which include the instructions on how to transform our data. The MD- dataset contains our model, including a score.py file for scoring results.
Next, we’re going to need to change the compute target. Our compute instance was good enough for training, but this is inference time and we’ll need a compute cluster for that. Select the Apply Transformation component, navigate to Run settings, and change the compute target to a proper compute cluster. We don’t need a GPU for this work, so I’ll stick with a CPU-based cluster and save some cash.

Do this for the two Apply Transformation components as well as the Score Model component. Or, go back in time and choose a compute cluster as your default training location. That also works.
Now that we have everything in place, select Submit and let’s get this dog and pony show on the road. If you get an error that the pipeline compute target is invalid, change the Default compute target to a compute cluster as well. Once you’re done, you should see something like the following:

Select Submit one more time and wait for the inference pipeline to build. This should take a relatively short amount of time, as you aren’t training a new model here, but it can still take a few minutes for the compute to become available.
Once your pipeline has completed, it’s time to Deploy that sucker.
Deploy that Sucker
The deployment screen has a few options available to us. First, we choose a name, which may contain only lower-case letters, numbers, and hyphens. We can also choose a compute type. There are two options here: Azure Kubernetes Service and Azure Container Instance. The general guideline is that you want to use Azure Kubernetes Service for production. We talked about the reasoning behind this in an earlier post in the series, but the short version is that AKS will have higher uptime guarantees and will also allow you to scale out your service. ACI is good for dev/test scenarios or infrequently-used production services, as it’s a lot less expensive and requires very little maintenance. I’m going to use ACI for this demo for exactly those reasons.
If you do choose to use Azure Container Instance, you can choose the amount of CPU and memory capacity to reserve. I’m going to set CPU reserve capacity to 0.5 and Memory reserve capacity to 0.5. Why those numbers? Well, it’s not going to be a heavily-used service and the algorithm we chose is pretty simple, so we won’t even need a full CPU core. The default of 0.1 may work, but I tend to nudge it up a bit. Depending on the model you’ve created, how many operations you’re running, and how much load you expect this service to take, you might need to kick these values up further. Finally, I’ll select Deploy to get things rolling.
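For reference, if you were doing this through the v1 Python SDK rather than the designer UI, the same CPU and memory reservations would look roughly like the sketch below. The service name, model, and inference configuration are left as placeholders — they’d come from your own workspace.

```python
# Rough SDK (azureml-core, v1) equivalent of the ACI settings chosen in the UI.
from azureml.core.webservice import AciWebservice

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=0.5,   # fraction of a CPU core to reserve
    memory_gb=0.5    # memory reservation in GB
)

# This configuration object would then be passed to Model.deploy(...), along with
# the workspace, a service name such as "penguin-classifier", the registered
# model, and an InferenceConfig pointing at the scoring script.
```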

From here, you’ll see an info box at the top of the screen which reads “Deploy: Waiting real-time endpoint creation.” You can see the process yourself by selecting Endpoints from the Assets menu and ensuring you are in the Real-time endpoints tab. Then, choose the endpoint you want to view.

You’ll see some details on the deployment. If you see an Unhealthy state, don’t panic!

This is a UI change for the worse from Azure ML. A while ago, there was a transitional state which would indicate that your service was being configured, and then it would change to either Healthy or Unhealthy. Now, during transition, the deployment state may come up as Unhealthy or it may show Transitioning. Give it about 10-15 minutes to set everything up and either you’ll see it switch to Healthy or you’ll see information in the deployment logs which will tell you why it’s unhealthy.

We have a healthy endpoint, so let’s give it a go. The REST endpoint field shows what URL we need to call and it even provides an OpenAPI definition. In case you aren’t familiar with it, OpenAPI (née Swagger) is a standard for describing REST APIs, something like a much lighter-weight version of SOAP’s WSDL.
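If you’d rather pull these details from code than from the portal — including the deployment state and logs mentioned above — the v1 Python SDK exposes them on the Webservice object. A minimal sketch, assuming the service was named penguin-classifier at deployment time:

```python
# Inspect a deployed real-time endpoint from code (azureml-core, v1).
# "penguin-classifier" is an assumed service name; use whatever you chose when deploying.
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()                        # reads config.json for your workspace
service = Webservice(ws, name="penguin-classifier")

print(service.state)        # e.g. Transitioning, Healthy, Unhealthy
print(service.scoring_uri)  # the REST endpoint to call
print(service.swagger_uri)  # the OpenAPI (Swagger) definition
print(service.get_logs())   # deployment logs, handy when the state stays Unhealthy
```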
Testing the Endpoint
We can make cURL or Postman calls to this service, but there’s also a Test tab built into the endpoint page, so we can put in values and get back our scored responses.
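If you do go the cURL or Postman route, the request body for a designer-built endpoint generally looks something like the sketch below. Treat the input name and column names as illustrative — they have to match your penguin-data schema exactly.

```python
# Approximate shape of a scoring request for a designer-built real-time endpoint.
# "WebServiceInput0" and the column names are illustrative placeholders.
payload = {
    "Inputs": {
        "WebServiceInput0": [
            {
                "Species": 0,                  # only required because we haven't removed the column yet
                "Culmen Length (mm)": 39.1,
                "Culmen Depth (mm)": 18.7,
                "Flipper Length (mm)": 181,
                "Body Mass (g)": 3750
            }
        ]
    },
    "GlobalParameters": {}
}
```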

Note that it does take Species as an input, as we didn’t change the inference pipeline to remove that column. Let’s change that now.
Removing the Species Input
Back in the designer, I want to select the inference pipeline and add a new Select Columns component from the Data Transformation menu. Have it connect to the penguin-data dataset and act as a go-between from penguin-data to Apply Transformation.

In the properties for this component, edit the list of columns and include everything aside from Species.

Now submit the run again and, when it succeeds, deploy that pipeline. The good news is that you can now see how important it is to get this right before the first deployment. The bad news is that it’s going to take some time to re-run and re-deploy this service. When it does come time to re-deploy, make sure to replace your existing endpoint; after all, there’s no use in having two of these around.

While this is deploying, you will not be able to access the endpoint if you’re using ACI. This is another reason why we tend not to recommend using Container Instances for production services, especially ones which can’t have downtime.
But once it is done, if we head back to the Test tab, we can see that the Species column is no longer marked as an input.

Which means all of this was worth it.
Consuming the Service
Azure ML also includes some helpful code for calling this service in C#, Python, and R. You can, of course, also call it from any other language (like the vastly superior F#) so long as you’re able to access a REST API in that language and send along a bearer token.
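As a rough illustration of what that generated code boils down to, here’s a minimal Python sketch using requests. The URL, API key, and column names are placeholders — copy the real values from the endpoint’s Consume tab — and an ACI deployment without key-based authentication enabled won’t need the Authorization header at all.

```python
# Minimal sketch of calling the real-time endpoint. The URL, key, and column
# names are placeholders; take the real ones from the endpoint's Consume tab.
import requests

scoring_uri = "http://<your-endpoint>.azurecontainer.io/score"
api_key = "<your-api-key>"  # only needed if key-based auth is enabled

payload = {
    "Inputs": {
        "WebServiceInput0": [
            {
                "Culmen Length (mm)": 39.1,   # illustrative column names --
                "Culmen Depth (mm)": 18.7,    # they must match your dataset's schema;
                "Flipper Length (mm)": 181,   # note that Species is no longer needed
                "Body Mass (g)": 3750
            }
        ]
    },
    "GlobalParameters": {}
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

response = requests.post(scoring_uri, json=payload, headers=headers)
print(response.json())  # scored results, including the predicted species
```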
Conclusion
In today’s post, we looked at deploying a service in Azure Machine Learning. There’s a whole other method available in batch service prediction, and I think we’ll cover that in the next post. In the meantime, having an endpoint available will cost you money, so you might want to delete the endpoint you created in this post once you’re done. That is, unless you’re all-in on penguin classification.
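Assuming you aren’t all-in, here’s a quick way to do that cleanup from code rather than the portal — a sketch using the v1 SDK and the assumed service name penguin-classifier from earlier.

```python
# Delete the real-time endpoint so it stops accruing charges (azureml-core, v1).
# "penguin-classifier" is the assumed service name; substitute your own.
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
Webservice(ws, name="penguin-classifier").delete()
```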