Delta Lake

Delta Lake is an open-source storage framework that brings ACID transactions and schema enforcement to Apache Spark-based data lakes. It allows users to build a lakehouse architecture over structured, semi-structured, and unstructured data, maintaining data integrity while reading from and writing to storage systems like HDFS.
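
As a minimal sketch of that workflow, the snippet below creates, writes, and reads back a Delta table with PySpark. It assumes the delta-spark package is installed (pip install delta-spark); the /tmp/events path and column names are illustrative, not from the project docs.

    # Minimal sketch: create, write, and read a Delta table with PySpark.
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = (
        SparkSession.builder.appName("delta-demo")
        # Register Delta Lake's SQL extensions and catalog with Spark.
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # Every write is an atomic commit to the table's transaction log,
    # so concurrent readers always see a consistent snapshot (ACID).
    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df.write.format("delta").mode("overwrite").save("/tmp/events")

    # Schema enforcement: appending data whose schema does not match the
    # table's would fail instead of silently corrupting it.
    spark.read.format("delta").load("/tmp/events").show()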

In addition, Delta Lake supports 1) unified batch and stream processing, 2) Delta Sharing for secure data sharing with other organizations, 3) strong isolation levels, and 4) audits, rollbacks, and snapshots, among other features. It also works with engines such as PrestoDB, Flink, Trino, and Hive.
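
A sketch of the batch-and-streaming point: the same Delta table can serve as a streaming sink and a batch source at once. This reuses the spark session from the example above; the paths are again illustrative.

    # Continuously append rows from Spark's built-in "rate" test source
    # into a Delta table.
    query = (
        spark.readStream.format("rate")
        .option("rowsPerSecond", 5)
        .load()
        .writeStream.format("delta")
        .option("checkpointLocation", "/tmp/rate_events_ckpt")
        .start("/tmp/rate_events")
    )

    import time
    time.sleep(10)  # let the stream commit a few versions before querying

    # Batch readers see a consistent snapshot of the same table while the
    # stream keeps committing new versions.
    spark.read.format("delta").load("/tmp/rate_events").count()

    # Time travel: read the table as of an earlier version for audits.
    spark.read.format("delta").option("versionAsOf", 0) \
        .load("/tmp/rate_events").show()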

Project Background

  • Platform: Delta Lake
  • Author: Databricks
  • Released: 2019
  • Type: Open-source
  • License: Apache License 2.0
  • Language: Scala, Java, Python
  • GitHub: delta-io/delta
  • Runs on: Microsoft Windows, macOS, Linux
  • GitHub Stars: 4.2k
  • GitHub Contributors: 145

Applications

  • Delta Sharing
  • ACID transactions
  • Scalable metadata handling
  • Batch data processing

Summary

  • Developers can use Delta Lake with existing data pipelines, as it is fully compatible with Apache Spark APIs.
  • It runs on top of a big data processing engine, most commonly Apache Spark.
  • It brings ACID transactions to your data lakes and provides serializability, the strongest isolation level.
  • It provides snapshots of data, enabling developers to access and revert to earlier versions for audits and rollbacks; a sketch follows below.
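
A short sketch of that last point, continuing with the illustrative /tmp/events table from the first example: the DeltaTable API exposes the commit history and can restore an earlier version (restoreToVersion is available in Delta Lake 1.2 and later).

    # Audit a table's history and roll back to an earlier version.
    from delta.tables import DeltaTable

    table = DeltaTable.forPath(spark, "/tmp/events")  # illustrative path

    # Each committed write appears as one row in the table history.
    table.history().select("version", "timestamp", "operation").show()

    # Revert the table to version 0 (Delta Lake 1.2+).
    table.restoreToVersion(0)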