LinkedIn Open Sources Feathr for ML Feature Management


Why integrate LinkedIn's Feathr into your stack? It can help you automate and standardize Machine Learning (ML) feature management.

As explained in a previous article, Feature Engineering is “the preparation of raw data that is to be used in the machine learning model.” The success of a model depends on the quality of the data that feeds it, and data scientists spend most of their working time on data preparation. Doing this efficiently poses several challenges: feeding quality data, cleaning and organizing files, picking the right algorithm and model, creating the best features, generating outputs, and more.

Various Feature Engineering techniques can help with this. Combined with the right tools, they can make a big difference in any ML project. One such tool is Feathr, which LinkedIn open-sourced and describes as “the feature store we built to simplify machine learning (ML) feature management and improve developer productivity.”

What is Feathr?

For LinkedIn data scientists, preparing and managing features was one of the most time-consuming tasks, and it made ML applications difficult to scale. Because teams used different processes, frameworks, and ways of developing projects, sharing or reusing work was hard.

With these limitations in mind, the team built Feathr in 2017 and used it internally before open-sourcing it. According to LinkedIn, it’s “an abstraction layer that provides a common feature namespace for defining features and a common platform for computing, serving, and accessing them ‘by the name’ from within ML workflows.” Feathr is also categorized as a “feature store”, a recent term “to describe systems that manage and serve ML feature data.”

A “producer” can define and register features in Feathr, and “consumers” can access those features and incorporate them into their ML model workflows. This reduces the time and resources needed and standardizes the way features are created. “For model training, features are computed and joined to input labels in a point-in-time correct way, and for model inferencing, features are pre-materialized and deployed to online data stores for low-latency online serving. Features defined by different teams and projects can easily be used together, enabling collaboration and reuse,” the company explains.
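To make the producer/consumer idea and the “point-in-time correct” join more concrete, here is a minimal sketch in plain Python. It is not Feathr's API: `FeatureRegistry`, `register`, `get_as_of`, `join_labels`, and the `purchase_count` feature are hypothetical names invented for illustration. The point it shows is that each label row only receives the feature value that was already known at the label's timestamp, so no future data leaks into training.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Toy in-memory stand-in for a feature store; NOT Feathr's API.
History = List[Tuple[int, float]]  # (timestamp, value), sorted by timestamp


class FeatureRegistry:
    def __init__(self) -> None:
        # feature name -> entity key -> value history
        self._features: Dict[str, Dict[str, History]] = {}

    def register(self, name: str, rows: List[Tuple[str, int, float]]) -> None:
        """Producer side: publish (key, timestamp, value) rows under a name."""
        by_key: Dict[str, History] = {}
        for key, ts, value in rows:
            by_key.setdefault(key, []).append((ts, value))
        for history in by_key.values():
            history.sort()
        self._features[name] = by_key

    def get_as_of(self, name: str, key: str, ts: int) -> Optional[float]:
        """Consumer side: latest value recorded at or before ts, else None."""
        history = self._features[name].get(key, [])
        # Index of the first entry whose timestamp is strictly after ts.
        idx = bisect_right(history, (ts, float("inf")))
        return history[idx - 1][1] if idx else None


def join_labels(
    registry: FeatureRegistry,
    feature_names: List[str],
    labels: List[Tuple[str, int, int]],
) -> List[Dict[str, object]]:
    """Attach to each (key, ts, label) row the feature values known at ts."""
    joined = []
    for key, ts, label in labels:
        row: Dict[str, object] = {"key": key, "ts": ts, "label": label}
        for name in feature_names:
            row[name] = registry.get_as_of(name, key, ts)
        joined.append(row)
    return joined


# A producer registers a feature; a consumer joins it to labels by name.
registry = FeatureRegistry()
registry.register(
    "purchase_count",
    [("u1", 10, 1.0), ("u1", 20, 2.0), ("u2", 15, 5.0)],
)
training_rows = join_labels(
    registry,
    ["purchase_count"],
    [("u1", 15, 0), ("u1", 25, 1), ("u2", 10, 0)],
)
for row in training_rows:
    print(row)
```

In this toy data, the label for `u2` at time 10 gets no feature value, because the only recorded value for `u2` arrives later, at time 15. A real feature store handles this at scale and across storage systems, but the correctness rule is the same.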

This Datanami blog post adds: “Instead of manually working with features as part of an individual data pipeline, Feathr automates and standardizes the interaction with the data type, which is used in both the training and inference stages of machine learning.”

Today, LinkedIn uses it in a range of large ML projects and in dozens of applications involving core functionality such as Search, Feed, and Ads.

Feathr characteristics

With Feathr, you can:

  • Define features based on raw data sources.
  • Create features for different scenarios. For example: model training or model serving.
  • Use APIs based on Python.
  • Share the features with your entire team to reuse them.
  • Use features created by other team members in your pipelines and projects.
  • Connect offline data sources to transform them into features.
  • Integrate other tools, such as Databricks and Azure Synapse, in a native way.

Feathr capabilities

This article in GitHub highlights some of the main capabilities of Feathr:

  • Feathr UI “provides an intuitive UI so you can search and explore all the available features and their corresponding lineages.”
  • Rich UDF Support: “Feathr has highly customizable UDFs with native PySpark and Spark SQL integration to lower the learning curve for data scientists.”
  • Determine Window Aggregation Features with Point-in-time correctness.
  • Define features on top of other features – Derived Features.
  • Define Streaming Features.
  • Point in Time Joins.
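As a rough illustration of two of the capabilities listed above, window aggregation with point-in-time correctness and derived features, the following plain-Python sketch may help. It does not use Feathr; the function names, the event layout, and the window convention (events strictly after `as_of - window` and up to `as_of`) are assumptions chosen for this example.

```python
from typing import List, Tuple

Event = Tuple[int, float]  # (timestamp in seconds, purchase amount)


def window_sum(events: List[Event], as_of: int, window: int) -> float:
    """Sum of amounts with as_of - window < timestamp <= as_of."""
    return sum(amount for ts, amount in events if as_of - window < ts <= as_of)


def window_count(events: List[Event], as_of: int, window: int) -> int:
    """Number of events with as_of - window < timestamp <= as_of."""
    return sum(1 for ts, _ in events if as_of - window < ts <= as_of)


def avg_amount(events: List[Event], as_of: int, window: int) -> float:
    """Derived feature: computed from two other features, not from raw data."""
    count = window_count(events, as_of, window)
    return window_sum(events, as_of, window) / count if count else 0.0


events = [(100, 10.0), (200, 30.0), (400, 50.0)]

# Observed at t=300, the event at t=400 is in the future and is excluded.
sum_at_300 = window_sum(events, as_of=300, window=250)
avg_at_300 = avg_amount(events, as_of=300, window=250)

# Observed at t=400, the event at t=100 has fallen out of the window.
avg_at_400 = avg_amount(events, as_of=400, window=250)

print(sum_at_300, avg_at_300, avg_at_400)
```

The upper bound `ts <= as_of` is what makes the aggregation point-in-time correct: the value computed for a given observation time never depends on events that happened afterwards.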

Additionally, it shows how cloud integrations and architecture work with Feathr.

As explained before, Feathr can be integrated with other tools, including the Azure catalog. Its components can work with those tools across the different tasks and phases of your projects: object store, streaming, governance, compute engine, credentials, and more.

An Open Source Tool

One of the main advantages of Feathr is that it’s now open source. Since April 2022, Feathr has been available to developers, data scientists, and any ML professional. The project is also hosted by the LF AI & Data Foundation. “We aim to support Feathr to expand its user base, grow its community of developers, become a leader within its category, and enable collaboration and integration opportunities with other projects. We look forward to the project’s continued growth and success as part of LF AI & Data,” said Dr. Ibrahim Haddad, the Foundation’s General Manager.

How to install Feathr

To install the Feathr client in a Python environment, run:

pip install feathr

Or use the latest code from GitHub:

pip install git+https://github.com/feathr-ai/feathr.git#subdirectory=feathr_project
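After running either command, you may want to confirm that the package is visible to the interpreter you intend to use. The sketch below uses only the Python standard library (`importlib.metadata`); the helper name `installed_version` is made up for this example, and it assumes the distribution is published under the name `feathr`, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("feathr")
if version:
    print(f"feathr {version} is installed")
else:
    print("feathr is not installed in this environment")
```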

For more information and details, visit the Feathr repository on GitHub, or watch this tutorial: Build Product Recommendation Machine Learning Model with Feathr Feature Store.
