MLflow

Posted September 25, 2022

Updated February 6, 2023

By Ernie


MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, from experimentation through testing and deployment. It was one of the first open-source platforms of its kind, and it is comparable to in-house systems such as Facebook’s FBLearner and Uber’s Michelangelo.

In the early days, Facebook encountered several challenges in its ML process, from experimentation to testing and deployment. The process was complex and required highly skilled engineers and data scientists to run experiments. Results were also hard to reproduce: feeding the same data into the same models could yield different results.

FBLearner solved these problems. Today, non-data scientists can conduct thousands of experiments and reproduce the same results when the same dataset and tools are used. However, FBLearner is internal to Facebook. Databricks developed MLflow as an open-source alternative that eases the process of testing, experimentation, training, and deployment.

Components

MLflow consists of the following modules:

Tracking

It tracks the results of an ML model by logging parameters and metrics. It also provides a UI for easy visualization of those results.

Projects

It allows users to organize and package their ML code into portable projects, so that runs can be reproduced on other platforms.
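In practice, a project is a directory containing an MLproject file that declares its entry points and parameters. The sketch below writes a minimal one using only the standard library. The project name, the "alpha" parameter, and the train.py script are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical project: the name, the "alpha" parameter, and train.py are illustrative.
MLPROJECT = """\
name: demo_project

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""

TRAIN_SCRIPT = """\
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--alpha", type=float, default=0.5)
args = parser.parse_args()
print(f"training with alpha={args.alpha}")
"""


def write_demo_project(root: Path) -> Path:
    """Create a minimal project directory with an MLproject file and a script."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "MLproject").write_text(MLPROJECT)
    (root / "train.py").write_text(TRAIN_SCRIPT)
    return root


project_dir = write_demo_project(Path(tempfile.mkdtemp()) / "demo_project")
print(sorted(p.name for p in project_dir.iterdir()))
```

A directory like this should be launchable with a command along the lines of "mlflow run <project_dir> -P alpha=0.3 --env-manager local", which runs the main entry point in the current Python environment.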

Models

It offers a standard format for packaging ML models so that they can be loaded and used by a variety of downstream tools, such as batch inference jobs or real-time serving.

Model Registry

It provides a centralized model store for versioning, sharing, and collaborating on models.

Together, these modules provide a streamlined and integrated experience for the machine learning life cycle, enabling users to track experiments, package code into reproducible runs, and share and reuse models.


Quick Start Guide: MLflow

This section shows how to install MLflow and import it into your code. The commands below are written for Jupyter Notebook, where a leading “!” runs a shell command from a notebook cell; in a regular terminal, omit the “!”. If you use a different environment, refer to the official documentation. Install MLflow with the following command:

!pip install mlflow

To check the installed version, run:

!mlflow --version

Once MLflow is installed, the next step is to launch the UI. To access the dashboard and compare experiments, run “!mlflow ui” in a notebook cell (or “mlflow ui” in a terminal), which starts a local server at a URL such as “http://127.0.0.1:5000/”. In a notebook, the cell keeps running until you interrupt it. Open the URL in a browser to see a table of runs with information such as Start Time, User, Source, Parameters, Metrics, and more. To view additional details, click on the experiment’s name.

!mlflow ui

Now, the goal is to run experiments by training models and tracking the runs in the UI.

Getting started with MLflow for ML lifecycle

MLflow Use Cases

Understanding specific use cases is crucial, as it highlights MLflow’s practical applications and justifies the added complexity of managing it. To get more out of experiment tracking with MLflow, consider the following practices:

Use MLflow to keep all experiment tracking data in one place for better organization and accessibility.
Automate logging of experiment parameters, results, and code to make the tracking process more efficient and consistent.
Use MLflow’s built-in visualizations to easily compare results from different experiments and understand the relationships between various parameters and outcomes.
Share experiment runs, results, and artifacts with others on your team so that insights are available to everyone.
Integrate it with other tools in your machine learning pipeline, such as model training frameworks, data storage systems, and version control systems, to create a seamless and streamlined machine learning workflow.

Highlights

Project Background

Project: MLflow
Author: Databricks
Initial Release: June 2018
Type: ML lifecycle platform
License: Apache 2.0
Languages: Python, JavaScript, TypeScript, Java, R
Supports: Multiple frameworks and tools, including TensorFlow, PyTorch, and XGBoost
Runs On: Anywhere / Cloud and on-prem
Hardware: Supports CPUs and GPUs
Twitter: MLflow

Main Features

Track and manage experiments
Reproducible runs
Lightweight API
CI/CD workflow integration
Artifact store
Versioning

Prior Knowledge Requirements

Machine learning algorithms, statistics, and predictive models.
Programming languages such as Python.
Cloud infrastructure and databases if users want to use a workflow system in a distributed or production environment.
Workflow concepts, their structure, and how they can be used to automate and streamline processes.

Community Benchmarks

13,500 Stars
3,200 Forks
539+ Code contributors
61+ Releases
Source: MLflow GitHub

Releases

MLflow 2.1.1 (12-26-2022): Bug fixes, e.g., fix mlflow.pyfunc.spark_udf() type casting error on model with ColSpec input schema.
MLflow 2.1.0 (12-21-2022): Major features and improvements, e.g., introduce support for multi-class classification.
MLflow 2.0.1 (11-15-2022): Major milestone release, e.g., MLflow Pipelines is now MLflow Recipes, a framework that enables data scientists to quickly develop high-quality models and deploy them to production.
MLflow 1.30.0 (10-20-2022): Major features and improvements, e.g., introduce hyperparameter tuning support to MLflow Pipelines.
MLflow 1.20.0 (8-15-2021): Major features and improvements, e.g., autologging for scikit-learn now records post-training metrics when scikit-learn evaluation APIs are called.

References

[1] Documentation: https://mlflow.org/
