This is sort of related to a series on getting beyond the basics with Azure ML but is not officially part of that series. In this post, we are going to cover MLflow, a platform for managing the machine learning lifecycle.

What is MLflow?

MLflow is a set of products which allow data scientists to train models, register those models, deploy the models to a server, and manage model updates. MLflow itself came out of Databricks and is in very common use there. That said, its utility goes well beyond Databricks—you do not need to be using Databricks to take advantage of MLflow.

There are four products which make up MLflow:

  • MLflow Tracking
  • MLflow Projects
  • MLflow Models
  • MLflow Model Registry

Let’s take a look at each one in turn, understand what they do, and see how they work together.

MLflow Tracking

MLflow Tracking allows data scientists to work with experiments. For each run in an experiment, a data scientist may log parameters, versions of libraries used, evaluation metrics, and generated output files when training machine learning models.

If we use MLflow Tracking, we can review and audit prior executions of a model training process.
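As a rough sketch, logging a run from Python might look like the following; the experiment name, parameter values, metric, and file name are all placeholders.

```python
import mlflow

# Group runs under a named experiment (name is a placeholder)
mlflow.set_experiment("churn-model")

with mlflow.start_run():
    # Log hyperparameters for this training run
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)

    # ... train the model here ...

    # Log evaluation metrics so we can compare runs later
    mlflow.log_metric("rmse", 0.27)

    # Log a generated output file (e.g., a plot) as an artifact
    mlflow.log_artifact("feature_importance.png")
```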

MLflow Projects

An MLflow project is a way of packaging up code in a manner which allows for consistent deployment and reproducibility. We can run an MLflow project in a Conda environment, in a Docker container, or directly on the local system.
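A project can also be launched programmatically. Here is a minimal sketch using mlflow.projects.run(); the repository URL and the alpha parameter are hypothetical.

```python
import mlflow

# Run a project straight from a Git repository (URL and parameters are hypothetical);
# MLflow builds the environment declared in the project's MLproject file
submitted_run = mlflow.projects.run(
    uri="https://github.com/example/mlflow-example",
    parameters={"alpha": 0.5},
)
print(submitted_run.run_id)
```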

MLflow Models

MLflow offers a standardized format for packaging models for distribution with MLflow Models. We can train models using a broad variety of libraries, including scikit-learn, PyTorch, TensorFlow, R (via crate), statsmodels, and many more. Regardless of the library we use, MLflow Models creates a consistent serialized form of the model, so we can restore the model later no matter which mechanism we chose to create it.
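Here is a small sketch using scikit-learn: we log a model in the MLflow Models format and load it back through the library-agnostic pyfunc interface. The model and data are only for illustration.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run() as run:
    model = LogisticRegression(max_iter=200).fit(X, y)
    # Serialize the trained model in the MLflow Models format
    mlflow.sklearn.log_model(model, artifact_path="model")

# Load it back through the generic pyfunc flavor and make predictions
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```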

MLflow Model Registry

The MLflow Model Registry allows data scientists to register models. Once we have registered a model, operations staff can deploy models from the registry, either by serving them through a REST API or via a batch inference process.
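As a sketch, registering a logged model and promoting it so that others can consume it might look like this; the model name and run ID are made up.

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register a model logged during a previous run (run ID and name are hypothetical)
result = mlflow.register_model("runs:/<run_id>/model", "ChurnClassifier")

# Promote that version to Staging so operations staff can pick it up
client = MlflowClient()
client.transition_model_version_stage(
    name="ChurnClassifier",
    version=result.version,
    stage="Staging",
)

# Consumers can then load the model by name and stage for serving or batch scoring
model = mlflow.pyfunc.load_model("models:/ChurnClassifier/Staging")
```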

Learn More about MLflow in Databricks

I helped put together some training on Databricks, and the module I worked on covered its MLflow implementation. If you’re willing to piece things together, the GitHub repo is available, including lab notebooks.
