This is part six in a series on getting beyond the basics with Azure ML.
A Better Way to Deploy Code
One development we’ve seen in software engineering over the past couple of decades has been the automation of code deployment, which leads to more frequent deployments of smaller sets of code changes. This in turn reduces the risk of catastrophic failure as a result of a release (although, paradoxically, it does seem to increase the number of bugs which escape). Throughout this series, we’ve seen a natural progression in deployment capabilities.
We ended the prior series with model deployment via the Azure ML Studio UI. This is entirely manual and UI-driven. Then, we looked at model deployment via manually-run notebooks. This is still manual but at least offers the possibility of automation, as we control the code that runs.
From there, we moved to model deployment via the Azure CLI and Python SDK. Now we have the capability to run, train, register, and deploy models via scripts. This leads to the next phase in the process, in which we can perform continuous integration and continuous deployment of models using a tool like Azure DevOps or GitHub Actions. This is where MLOps starts to shine.
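As a rough illustration, here is a minimal sketch of what that scripted register-and-deploy step can look like with the Azure ML Python SDK v2 (the azure-ai-ml package). The subscription, workspace, model, and endpoint names below are placeholders, and I’m assuming a model saved in MLflow format so that no separate scoring script or environment is needed.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model, ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.ai.ml.constants import AssetTypes

# Connect to the workspace (placeholder identifiers).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Register a trained model stored locally in MLflow format.
model = ml_client.models.create_or_update(
    Model(name="demo-model", path="./model", type=AssetTypes.MLFLOW_MODEL)
)

# Create (or update) a managed online endpoint and a deployment behind it.
endpoint = ManagedOnlineEndpoint(name="demo-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="demo-endpoint",
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Because every step is an ordinary function call, the same script can be run by hand today and from a CI/CD pipeline tomorrow.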
MLOps and Software Maturity
Machine Learning Operations (MLOps) builds on the principles of DevOps but tailors them to a world in which data and artifacts are just as important as code. With classic software development, code is the critical asset. We figure out how to store the code (using source control repositories), how to modify the code, and how to deploy the code where we need it to go.
More recently, we’ve heard of DataOps. One of the key tenets of DataOps is that it’s not just code which is important but also data. We might have reference data in databases which needs to go out alongside the code; otherwise, that code isn’t likely to work correctly. Keeping databases in source control is a rather tricky proposition, and so there are more considerations than with classic DevOps.
MLOps picks up where DataOps left off. We definitely need the code to train, register, deploy, and use machine learning models. We also need the data to train these models. Typically, we store this data in very large files rather than in databases. Regardless of how we store this data, however, we need to keep track of how the data changes over time. If we train a model today, we are going to use a particular dataset. In the future, if we need to understand why the model ended up the way that it did, we are going to need the same data that existed at the time we trained this model, not the data as it looks today. This leads us to capturing potentially rather large datasets and tracking that data over time, something Git and other source control repos are terrible at. Instead, we capture those changes in the data lake.
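One low-ceremony way to get that tracking in Azure ML is to register each training snapshot in the data lake as a versioned data asset, so a training run can record exactly which version of the data it consumed. Here is a minimal sketch with the Python SDK v2; the datastore path, asset name, and version string are assumptions for illustration.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Register today's snapshot in the data lake as an immutable, versioned data asset.
snapshot = Data(
    name="training-data",
    version="2024.01.15",
    path="azureml://datastores/datalake/paths/training/2024-01-15/",
    type=AssetTypes.URI_FOLDER,
    description="Training data as it existed when this model was trained.",
)
ml_client.data.create_or_update(snapshot)
```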
In addition to data, we also need to keep track of the models themselves, as well as other model artifacts like weights. We need the actual models to deploy, but these models can also be rather large: we might have multi-gigabyte neural networks, and the last thing I want to subject teams to is trying to pull different multi-gig binary files out of Git, especially if we retrain frequently. This means we’ll need specialized tooling to track these models as well, and that’s where services like MLflow come into play.
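Here is a minimal sketch of what that looks like with MLflow: log the trained model as a run artifact and then register it by name, so the binary lives in the model registry rather than in Git. The tiny dataset and the model name demo-model are stand-ins for illustration.

```python
import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in training data; in practice this comes from the versioned data asset above.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.1, 3.9, 6.2, 7.8])

# When running against an Azure ML workspace, point MLflow at the workspace tracking URI first:
# mlflow.set_tracking_uri("<workspace-mlflow-tracking-uri>")

with mlflow.start_run() as run:
    model = LinearRegression().fit(X_train, y_train)
    mlflow.log_metric("training_r2", model.score(X_train, y_train))
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the logged artifact as a new version of a named model in the registry.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-model")
```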
Ultimately, the biggest problem in MLOps isn’t really deployment; it’s automated re-training and re-deploying. You could re-train and re-deploy manually, but as your organization matures, you’ll want to be more selective about when and why you re-train, as well as when and why you re-deploy. You want to make sure that you only deploy a new model if it is superior to the current model. You also want to automate the entire process to minimize human error.
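That gate can be as simple as comparing one metric between the candidate model and the model currently in production, and only deploying when the candidate wins. The sketch below uses the MLflow client for the comparison; the model name, version numbers, and the choice of RMSE as the metric are assumptions, and it presumes both versions logged that metric on their training runs.

```python
from mlflow.tracking import MlflowClient


def candidate_beats_production(model_name: str, candidate_version: str,
                               production_version: str, metric: str = "rmse") -> bool:
    """Compare a metric logged on each model version's training run (lower is better here)."""
    client = MlflowClient()

    def metric_for(version: str) -> float:
        model_version = client.get_model_version(model_name, version)
        run = client.get_run(model_version.run_id)
        return run.data.metrics[metric]

    return metric_for(candidate_version) < metric_for(production_version)


# In a release pipeline, the deployment stage runs only when this check passes.
if candidate_beats_production("demo-model", candidate_version="7", production_version="6"):
    print("Candidate is better; promote and deploy it.")
else:
    print("Keep the current production model.")
```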
Two Models for MLOps
Google and Microsoft both have MLOps maturity level models. There’s a lot of agreement between the two and I recommend reading both papers. The Microsoft version is a bit more fine-grained and I think it gives us a slightly better understanding of the topic.
Building MLOps Maturity
When working with Azure ML, we can use both Azure DevOps and GitHub Actions to incorporate capabilities allowing us to perform model CI/CD. Microsoft has a public template for MLOps with Azure ML and it serves as a really good starting point. They also have a reference architecture explaining how it works. Inside the docs for this repo, they have a getting started guide.
For a more detailed dive into MLOps, my colleagues at Solliance and I have put together a public repo for Azure AI in a Day labs which includes work on MLOps; Lab 2 pertains specifically to MLOps. I will note that there’s a lot to this story and it’s not the type of thing you can do with a couple of clicks of the mouse. You will probably need to fight with Azure DevOps or GitHub Actions to get it all working, but once it’s in place for a project, you won’t need to make frequent changes to keep it working. Most of the process is also scriptable, so get it done the first time and you can repeat the process for later projects.
Conclusion
In today’s post, we took a very brief look at MLOps, getting an idea of what it is, why it is important, and what kinds of capabilities we have available to us for performing MLOps via Azure DevOps or GitHub Actions.
The next post will wrap up this series and provide some additional resources on where to learn more.