Apache Beam
Beam is an open source tool for defining and executing batch and streaming data pipelines. Pipelines run on data processing engines such as Apache Spark, Apache Flink, and Apache Samza. A Beam pipeline is a directed acyclic graph (DAG) of the data and the computation steps applied to it.
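To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use the Beam SDK. It relies only on the standard library's graphlib module, and the step names (read, parse, count, filter, write) are hypothetical, chosen only for illustration. It shows that once steps and their dependencies are written down as a graph, an engine can work out a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names (not Beam API); each step maps to the steps it depends on.
dag = {
    "read": set(),
    "parse": {"read"},
    "count": {"parse"},
    "filter": {"parse"},
    "write": {"count", "filter"},
}


def execution_order(graph):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
print(order)
```

Because count and filter depend only on parse, they are independent of each other, which is what lets an engine run such branches in parallel.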
Project Background
- Tool: Apache Beam
- Author: Google
- Released: June 2016
- Type: Open source tool for defining and executing data processing workflows
- License: Apache License 2.0
- Support: Apache Flink, Apache Spark, and Google Cloud Dataflow
- GitHub: apache/beam
Applications
- Batch-data parallel-processing pipelines
- Streaming-data parallel-processing pipelines
- Large-scale data
- Distributed processing
- Parallel processing
- Abstraction
Summary
- Use a single programming model for both batch and streaming use cases.
- Run the same pipeline on different runners.
- Choose your language: Python, Java, or Go.
- Build on three core abstractions: Pipeline, PCollection, and PTransform.
- Explore and share new SDKs; ten connectors and libraries are available.
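The relationship between Pipeline, PCollection, and PTransform can be sketched with a toy model in plain Python. The classes below are simplified stand-ins written for illustration, not the apache_beam API. They borrow the names and the `|` chaining style of Beam's Python SDK, but they run eagerly on in-memory data, whereas a real Beam pipeline is built first and then handed to a runner.

```python
class PTransform:
    """Toy stand-in: a processing step wrapping a function over a sequence of elements."""

    def __init__(self, fn):
        self.fn = fn


class PCollection:
    """Toy stand-in: an immutable dataset flowing between steps."""

    def __init__(self, elements):
        self.elements = tuple(elements)

    def __or__(self, transform):
        # Applying a transform yields a new PCollection; the input is left unchanged.
        return PCollection(transform.fn(self.elements))


class Pipeline:
    """Toy stand-in: the entry point that creates the first PCollection."""

    def create(self, elements):
        return PCollection(elements)


def Map(fn):
    """Build a transform that applies fn to every element."""
    return PTransform(lambda xs: [fn(x) for x in xs])


def Filter(predicate):
    """Build a transform that keeps only elements matching predicate."""
    return PTransform(lambda xs: [x for x in xs if predicate(x)])


pipeline = Pipeline()
words = pipeline.create(["beam", "spark", "flink", "samza"])
result = words | Filter(lambda w: len(w) == 5) | Map(str.upper)
print(result.elements)  # ('SPARK', 'FLINK', 'SAMZA')
```

Each `|` produces a new collection rather than modifying its input, so the original words collection can still feed other branches of the pipeline.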