Delta Lake
Delta Lake is an open-source storage framework that brings ACID transactions and schema enforcement to data lakes built on Apache Spark. It lets users build a lakehouse architecture over structured, semi-structured, and unstructured data, so data integrity is maintained while reading from and writing to storage systems such as HDFS.
In addition, Delta Lake supports 1) batch and stream processing, 2) Delta Sharing for secure data sharing with other organizations, 3) strong isolation levels, and 4) audits, rollbacks, and snapshots, among other features. It also integrates with engines such as PrestoDB, Flink, Trino, and Hive.
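Schema enforcement means a write is rejected when its rows do not match the table's declared schema. Below is a minimal sketch of that idea in plain Python; it is a toy illustration, not the Delta Lake API, and the `ToyTable` and `SchemaMismatchError` names are hypothetical:

```python
class SchemaMismatchError(Exception):
    """Raised when an appended row does not match the table schema."""

class ToyTable:
    """Toy table that enforces a fixed schema on every write,
    loosely mimicking Delta Lake's schema enforcement (not the real API)."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> expected Python type
        self.rows = []

    def append(self, row):
        # Reject rows with missing or extra columns.
        if set(row) != set(self.schema):
            raise SchemaMismatchError(
                f"columns {sorted(row)} != {sorted(self.schema)}")
        # Reject rows whose values have the wrong type.
        for col, typ in self.schema.items():
            if not isinstance(row[col], typ):
                raise SchemaMismatchError(
                    f"column {col!r} expects {typ.__name__}")
        self.rows.append(dict(row))

events = ToyTable({"id": int, "name": str})
events.append({"id": 1, "name": "alpha"})        # accepted
try:
    events.append({"id": "2", "name": "beta"})   # wrong type -> rejected
except SchemaMismatchError as e:
    print("rejected:", e)
```

In the real system, Spark performs the equivalent check against the Delta table's schema at write time and fails the job instead of silently corrupting the table.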
- Platform: Apache Delta Lake
- Author: Databricks
- Released: 2019
- Type: Open-source
- License: Apache License 2.0
- Language: Scala, Java, Python
- GitHub: delta-io/delta
- Runs on: Microsoft Windows, macOS, Linux
- GitHub Stars: 4.2k
- GitHub Contributors: 145
- Delta sharing
- ACID transactions
- Scalable metadata handling
- Batch data processing
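The ACID guarantee above comes from Delta Lake's ordered transaction log (the `_delta_log` directory of numbered JSON commit files): a write becomes visible only once its commit file is published, and readers rebuild table state by replaying commits in order. A rough sketch of that idea in plain Python, with file names and the commit format heavily simplified:

```python
import json
import os
import tempfile

def commit(table_dir, rows):
    """Atomically publish a batch of rows as the next numbered commit."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    version = len(os.listdir(log_dir))            # next commit number
    path = os.path.join(log_dir, f"{version:020d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(rows, f)                        # write off to the side...
    os.rename(tmp, path)                          # ...then publish atomically
    return version

def read_state(table_dir):
    """Replay every commit in order to rebuild the current rows."""
    log_dir = os.path.join(table_dir, "_delta_log")
    rows = []
    for name in sorted(os.listdir(log_dir)):      # zero-padded names sort by version
        with open(os.path.join(log_dir, name)) as f:
            rows.extend(json.load(f))
    return rows

table = tempfile.mkdtemp()
commit(table, [{"id": 1}])
commit(table, [{"id": 2}, {"id": 3}])
print(read_state(table))   # [{'id': 1}, {'id': 2}, {'id': 3}]
```

The atomic rename is the key trick: readers never observe a half-written commit, which is what makes concurrent reads and writes safe. The real protocol adds checkpoints and optimistic concurrency control on top of this.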
- Developers can use Delta Lake with existing data pipelines as it is fully compatible with Spark.
- It builds on an existing big data processing engine (Apache Spark) rather than providing its own compute layer.
- It brings ACID transactions to your data lakes and provides serializability, the strongest isolation level.
- Provides snapshots of data, enabling developers to access and revert to earlier versions of the data for audits and rollbacks.
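Because every commit is numbered, any historical snapshot can be rebuilt by replaying only the first N commits; this is the mechanism behind Delta Lake's time travel (exposed as `versionAsOf` in the Spark reader). A toy sketch of version-as-of reads over an in-memory commit list; the `ToyLog` class is hypothetical and only illustrates the principle:

```python
class ToyLog:
    """In-memory stand-in for a Delta-style commit log with time travel."""

    def __init__(self):
        self.commits = []           # commits[i] = rows added in version i

    def commit(self, rows):
        self.commits.append(list(rows))
        return len(self.commits) - 1            # version number of this commit

    def snapshot(self, version_as_of=None):
        """Rows visible at a given version (default: the latest version)."""
        if version_as_of is None:
            version_as_of = len(self.commits) - 1
        out = []
        for batch in self.commits[: version_as_of + 1]:
            out.extend(batch)                   # replay commits 0..version
        return out

log = ToyLog()
log.commit([{"id": 1}])                         # version 0
log.commit([{"id": 2}])                         # version 1
print(log.snapshot(version_as_of=0))            # [{'id': 1}]  -- the audit view
print(log.snapshot())                           # [{'id': 1}, {'id': 2}]
```

A rollback is then just re-committing an earlier snapshot as the new latest version, so no history is ever destroyed.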