Apache Samza is an open-source, distributed stream processing framework that allows users to build stateful applications. It can process data in real-time from multiple sources, including Apache Kafka. Also, the platform supports a rich ecosystem of products like YARN. Samza was designed to run at very low latency, and it can analyze data in real-time. It integrates with Kafka, HDFS, AWS Kinesis, and ElasticSearch.
- Framework: Apache Samza
- Author: LinkedIn
- Released: June 11, 2019
- Type: Distributed stream processing framework
- License: Apache License 2.0
- Language: Scala, Java
- GitHub: apache/samza
- Runs on: Cross-platform
- GitHub Stars: 728
- GitHub Contributors: 135
- Supports snapshots
- Can restore a stream processing state
- Easily migrate tasks to several machines
- Processor isolation and efficient resource management
- It provides extremely low latencies and high throughput to analyze the data instantly.
- Scales to several terabytes with features like incremental checkpoints and host-affinity.
- It can be operated with YARN, Kubernetes, and standalone.
- Integrates with various sources like Kafka, HDFS, AWS Kinesis, Azure Eventhubs, and more.