LakeFS is an open-source storage layer that brings a Git-like version control interface to object storage and other data stores. Treeverse, the company behind LakeFS, describes it as “bringing source control to the world of big data.” It works with both structured and unstructured data.
Object storage is known for its performance and low cost. However, Treeverse explains that it leaves some functionality gaps: 1) identifying and fixing data errors instantly, 2) developing new data assets in isolation, 3) reproducing jobs and pipelines easily, and 4) updating datasets atomically. LakeFS is designed to close these gaps. Its features include version control, testing in isolation, rollbacks, parallel pipelines for experimentation, and more.
- Project: LakeFS
- Author: Treeverse
- Released: 2020
- Type: Provides Git-like features to object storage
- License: Apache License 2.0
- Language: Go
- GitHub: treeverse/lakeFS
- Runs on: Multi-platform
- GitHub Stars: 2.3k
- GitHub Contributors: 58
- Full reproducibility of data and code
- Pre-commit/merge hooks for data CI/CD
- Instantly revert changes to data
- Petabyte-scale version control
- Git-like operations
- Zero-copy branching for frictionless experiments
- Each LakeFS commit is represented as a set of contiguous, non-overlapping SSTables that together make up the entire keyspace of the repository at that commit.
- This metadata is encoded in a format known as “Graveler”, a standardized way to encode content-addressable key-value pairs.
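
To make the idea of contiguous, non-overlapping ranges more concrete, here is a minimal Python sketch. It is a toy model, not LakeFS code: `KeyRange`, `build_commit`, `lookup`, and the `range_size` parameter are hypothetical names invented for illustration, the SHA-256 content address is an assumption, and nothing here reproduces the real SSTable or Graveler encoding.

```python
import bisect
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRange:
    """Toy stand-in for one SSTable: a sorted run of (key, value) pairs."""
    entries: tuple  # (key, value) pairs, sorted by key

    @property
    def min_key(self):
        return self.entries[0][0]

    @property
    def max_key(self):
        return self.entries[-1][0]

    @property
    def range_id(self):
        # Content address: a hash of the range's contents.
        # Illustrative only; not the real Graveler encoding.
        h = hashlib.sha256()
        for key, value in self.entries:
            h.update(key.encode() + b"\0" + value.encode() + b"\0")
        return h.hexdigest()


def build_commit(objects, range_size=2):
    """Split a key -> value mapping into contiguous, non-overlapping ranges."""
    items = sorted(objects.items())
    return [
        KeyRange(tuple(items[i:i + range_size]))
        for i in range(0, len(items), range_size)
    ]


def lookup(commit, key):
    """Binary-search for the one range that could hold `key`, then read it."""
    max_keys = [r.max_key for r in commit]
    i = bisect.bisect_left(max_keys, key)  # first range with max_key >= key
    if i == len(commit):
        return None  # key sorts after every range
    return dict(commit[i].entries).get(key)


# Two toy commits that differ in a single object.
main = build_commit(
    {"a.csv": "v1", "b.csv": "v1", "c.csv": "v1", "d.csv": "v1"}
)
changed = build_commit(
    {"a.csv": "v1", "b.csv": "v1", "c.csv": "v2", "d.csv": "v1"}
)
```

Because the ranges never overlap and together cover the whole keyspace, a lookup only has to open one range. In this toy model, the first range of `main` and `changed` has the same `range_id` while the second differs, which hints at how content addressing could let commits share unchanged ranges instead of copying them.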