Apache Spark
Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, and supports the general computation graphs required for data analysis. Nearly 80% of Fortune 500 companies around the globe use Apache Spark for scalable computing, and the project has more than 2,000 contributors from industry and academia.
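As a quick illustration of that high-level API, here is a minimal Scala sketch. It assumes the spark-sql dependency is on the classpath; the object name and the in-memory data are made up for the example. It builds a small DataFrame and runs a simple transformation in local mode:

```scala
import org.apache.spark.sql.SparkSession

object WordLengths {
  def main(args: Array[String]): Unit = {
    // Local mode ("local[*]") uses all cores on this machine; on a cluster the
    // master would instead point at standalone, YARN, Mesos, or Kubernetes.
    val spark = SparkSession.builder()
      .appName("WordLengths")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical in-memory data; a real job would read from HDFS, S3, JDBC, etc.
    val words = Seq("spark", "unified", "analytics", "engine").toDF("word")

    // A simple transformation expressed through the DataFrame API.
    words.selectExpr("word", "length(word) AS len")
      .orderBy($"len".desc)
      .show()

    spark.stop()
  }
}
```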
Project Background
- Framework: Apache Spark
- Author: Matei Zaharia
- Released: May 26, 2014
- Type: Open-source unified analytics engine for large-scale data processing
- License: Apache License 2.0
- Components: Spark Core, Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph processing)
- Languages: Scala, Java, SQL, Python, R, C#, F#
- GitHub: apache/spark
Applications
- Big data
- Machine learning
Summary
- Unifies the processing of data in batches and in real-time streams, using your preferred programming language (see the streaming sketch after this list).
- Enables Exploratory Data Analysis (EDA) on petabyte-scale data without downsampling.
- Executes fast, distributed ANSI SQL queries for dashboarding and ad-hoc reporting (an example query follows below).
- Trains machine learning algorithms on a laptop and scales the same code to fault-tolerant clusters of thousands of machines (see the MLlib sketch below).
- Supports a pseudo-distributed local mode for development and testing.
- Runs standalone or on Hadoop YARN, Apache Mesos, or Kubernetes.
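To make the batch/streaming bullet concrete, the sketch below (illustrative only; the object name is made up) uses Structured Streaming's built-in "rate" source, which generates timestamped rows, and applies the same DataFrame-style aggregation you would write for a batch job:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The "rate" source emits (timestamp, value) rows at a fixed rate;
    // it is handy for demos because it needs no external system.
    val stream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // Ordinary DataFrame code: the same expression would work on a static
    // DataFrame loaded with spark.read.
    val counts = stream
      .groupBy(window($"timestamp", "10 seconds"))
      .count()

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```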
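The distributed ANSI SQL bullet can be illustrated with spark.sql over a temporary view; the table name and rows here are invented for the example:

```scala
import org.apache.spark.sql.SparkSession

object SqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sales data; a real deployment would query tables in a
    // metastore or files on distributed storage.
    Seq(("US", 120.0), ("DE", 80.5), ("US", 42.0))
      .toDF("country", "amount")
      .createOrReplaceTempView("sales")

    // Standard SQL, executed as a distributed query plan.
    spark.sql(
      """SELECT country, SUM(amount) AS total
        |FROM sales
        |GROUP BY country
        |ORDER BY total DESC""".stripMargin
    ).show()

    spark.stop()
  }
}
```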
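For the machine learning bullet, here is a minimal MLlib sketch with a tiny hand-made dataset (purely illustrative); the same code would run unchanged against a cluster:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MLlibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MLlibSketch")
      .master("local[*]")
      .getOrCreate()

    // Toy labeled data: (label, features). Real pipelines would load and
    // transform data with the DataFrame API or spark.ml feature stages.
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    // Fit a logistic regression model; fitting is distributed across executors.
    val model = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)
      .fit(training)

    println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")

    spark.stop()
  }
}
```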