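Apache Beam is an open source tool for defining and executing batch and streaming data pipelines with a single programming model. Pipelines run on distributed processing engines such as Apache Spark, Apache Flink, and Apache Samza. A Beam pipeline is a directed acyclic graph (DAG) whose nodes are collections of data and the computations (transforms) applied to them.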
- Tool: Apache Beam
- Author: Google (later donated to the Apache Software Foundation)
- Released: June 2016
- Type: Open source tool for defining and executing batch and streaming data pipelines
- License: Apache License 2.0
- Supported runners: Apache Flink, Apache Spark, Apache Samza, and Google Cloud Dataflow
- GitHub: apache/beam
- Batch-data parallel-processing pipelines
- Streaming-data parallel-processing pipelines
- Large-scale data
- Distributed processing
- Parallel processing
- Use a single programming model for both batch and streaming use cases.
- The same pipeline can be executed on different runners.
- Choose your language: Python, Java, or Go.
- The programming model is built on three core abstractions: Pipeline, PCollection, and PTransform.
- Explore and share new SDKs, I/O connectors, and transform libraries; ten connectors and libraries are available.