Apache Samza

Apache Samza is an open-source, distributed stream processing framework that allows users to build stateful applications. It can process data in real-time from multiple sources, including Apache Kafka. Also, the platform supports a rich ecosystem of products like YARN. Samza was designed to run at very low latency, and it can analyze data in real-time. It integrates with Kafka, HDFS, AWS Kinesis, and ElasticSearch.

Project Background

  • Framework: Apache Samza
  • Author: LinkedIn
  • Released: June 11, 2019
  • Type: Distributed stream processing framework
  • License: Apache License 2.0
  • Language: Scala, Java
  • GitHub: apache/samza
  • Runs on: Cross-platform
  • GitHub Stars: 728
  • GitHub Contributors: 135

Applications

  • Supports snapshots
  • Can restore a stream processing state
  • Easily migrate tasks to several machines
  • Processor isolation and efficient resource management

Summary

  • It provides extremely low latencies and high throughput to analyze the data instantly.
  • Scales to several terabytes with features like incremental checkpoints and host-affinity.
  • It can be operated with YARN, Kubernetes, and standalone.
  • Integrates with various sources like Kafka, HDFS, AWS Kinesis, Azure Eventhubs, and more.
Scroll to Top