
MLflow

MLflow is an open-source platform that manages the end-to-end machine learning lifecycle, from experimentation and testing to deployment. It is one of the first open-source platforms of its kind and rivals proprietary in-house systems such as Facebook’s FBLearner and Uber’s Michelangelo.

In the early days, Facebook encountered several challenges in the ML process, from experimentation to testing and deployment. The process was complex and required highly skilled engineers and data scientists to conduct experiments. Even when the same data was fed into the same models, the results could differ.

FBLearner solved these problems. Today, non-data scientists can conduct thousands of experiments and reproduce the same results when the same dataset and tools are used. However, FBLearner is proprietary to Facebook. Databricks developed MLflow as an open-source alternative that eases the process of testing, experimentation, training, and deployment.

Components

This workflow system consists of the following modules:

Tracking

It tracks the results of an ML model by logging parameters and metrics. It also provides a UI for easy visualization of those results.

Projects

It allows users to organize and pack their ML code into easily transportable projects, ensuring reproducibility on other platforms.

Models

It offers a standardized format for implementing and managing ML models in an efficient and organized way.

Model Registry

It provides a centralized storage tool for sharing and collaborating on model versions.

Together, these modules provide a streamlined and integrated experience for the machine learning life cycle, enabling users to track experiments, package code into reproducible runs, and share and reuse models.

Check out this tool for a comprehensive comparison between MLflow and leading workflow systems.

Quick Start Guide: MLflow

This section shows how to install MLflow and launch its UI. The commands are written for a Jupyter Notebook, where a leading “!” runs a shell command; in a terminal, omit the “!”. If you use a different environment, refer to the official documentation. Install MLflow with the following command:

!pip install mlflow

To check the installed version, run:

!mlflow --version

Once MLflow is installed, the next step is to launch the UI. To access the dashboard and compare experiments, run “!mlflow ui” in a notebook cell (or “mlflow ui” at a command prompt). This starts the server at a URL such as “http://127.0.0.1:5000/”, and the command keeps running until it is interrupted. Opening that URL in a browser displays information such as Start Time, User, Source, Parameters, Metrics, and more. To view additional details, click on the experiment’s name.

!mlflow ui

The next goal is to run experiments, in the form of model training, and track the runs in the UI.

Getting started with MLflow for ML lifecycle

MLflow Use Cases

Understanding specific use cases highlights MLflow’s practical applications and helps justify the added complexity of managing it. To improve how you track machine learning experiments with MLflow, consider the following steps:

  • Use MLflow to keep all experiment tracking data in one place for better organization and accessibility.
  • Automate logging of experiment parameters, results, and code to make the tracking process more efficient and consistent.
  • Use MLflow’s built-in visualizations to easily compare results from different experiments and understand the relationships between various parameters and outcomes.
  • Share experiment results and insights with your team through MLflow’s collaboration features, such as shared experiment runs and artifacts.
  • Integrate it with other tools in your machine learning pipeline, such as model training frameworks, data storage systems, and version control systems, to create a seamless and streamlined machine learning workflow.

Highlights

Project Background

  • Project: MLflow
  • Author: Databricks
  • Initial Release: June 2018
  • Type: ML lifecycle platform
  • License: Apache 2.0
  • Language: Python, JavaScript, TypeScript, Java, R
  • Supports: Multiple frameworks and tools, including TensorFlow, PyTorch, and XGBoost
  • Runs On: Anywhere / Cloud and on-prem
  • Hardware: Supports CPUs and GPUs
  • Twitter: MLflow

Main Features

  • Track and manage experiments
  • Reproducible runs
  • Lightweight API
  • CI/CD workflow integration
  • Artifact store
  • Versioning

Prior Knowledge Requirements

  • Machine learning algorithms, statistics, and predictive models.
  • Programming languages such as Python.
  • Cloud infrastructure and databases if users want to use a workflow system in a distributed or production environment.
  • Workflow concepts, their structure, and how they can be used to automate and streamline processes.

Community Benchmarks

  • 13,500 Stars
  • 3,200 Forks
  • 539+ Code contributors
  • 61+ Releases
  • Source: MLflow GitHub

Releases

  • MLflow 2.1.1 (12-26-2022): Bug fixes, e.g., a fix for the mlflow.pyfunc.spark_udf() type casting error on models with a ColSpec input schema.
  • MLflow 2.1.0 (12-21-2022): Major features and improvements, e.g., support for multi-class classification.
  • MLflow 2.0.1 (11-15-2022): Major milestone release, e.g., MLflow Pipelines is now MLflow Recipes, a framework that enables data scientists to quickly develop high-quality models and deploy them to production.
  • MLflow 1.30.0 (10-20-2022): Major features and improvements, e.g., hyperparameter tuning support in MLflow Pipelines.
  • MLflow 1.20.0 (8-15-2021): Major features and improvements, e.g., autologging for scikit-learn now records post-training metrics when scikit-learn evaluation APIs are called.

References

[1] Documentation: https://mlflow.org/
