Apache Impala

Apache Impala is an open-source SQL query engine designed to work within the Hadoop ecosystem. It was developed by Marcel Kornacker while he was working at Cloudera. Impala gives users a low-latency, high-concurrency SQL query environment for data stored in HDFS.
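
As a rough illustration of what querying data through a SQL engine such as Impala can look like from a client, the sketch below runs an aggregate query through a Python DB-API 2.0 cursor. Everything in it is an assumption made for illustration: the table `web_logs`, its columns, and the helpers `top_pages` and `demo_cursor` are hypothetical. An in-memory SQLite database stands in for a cluster so that the snippet runs anywhere.

```python
import sqlite3


def top_pages(cursor, limit=3):
    """Return the most-requested pages as (page, hits) tuples.

    `cursor` is any DB-API 2.0 cursor. The table `web_logs` and its
    `page` column are hypothetical names used for illustration.
    """
    # int() guards the only interpolated value; no free text enters the SQL.
    query = (
        "SELECT page, COUNT(*) AS hits "
        "FROM web_logs "
        "GROUP BY page "
        "ORDER BY hits DESC, page "
        f"LIMIT {int(limit)}"
    )
    cursor.execute(query)
    return [tuple(row) for row in cursor.fetchall()]


def demo_cursor():
    """Build an in-memory SQLite stand-in for a table stored on a cluster."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE web_logs (page TEXT, user_id INTEGER)")
    cur.executemany(
        "INSERT INTO web_logs VALUES (?, ?)",
        [
            ("/home", 1), ("/home", 2), ("/docs", 1),
            ("/home", 3), ("/docs", 4), ("/blog", 2),
        ],
    )
    return cur


if __name__ == "__main__":
    for page, hits in top_pages(demo_cursor(), limit=2):
        print(page, hits)
```

Against a real deployment, a DB-API client such as impyla would supply the cursor in place of `demo_cursor()`; the host, port, and authentication settings depend on the cluster and are not shown here.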

Project Background

  • Platform: Apache Impala
  • Author: Cloudera
  • Released: April 2013
  • Type: Relational Hadoop-analytics
  • License: Apache License 2.0
  • Language: C++, Java
  • GitHub: apache/impala
  • Runs on: Cross-platform
  • GitHub Stars: 770
  • GitHub Contributors: 165

Applications

  • Interactive, low-latency SQL queries over data stored in HDFS
  • High-concurrency access for many simultaneous users
  • Reads common Hadoop file formats such as Parquet, Avro, and text
  • Shares table metadata with Apache Hive through the Hive metastore
  • Connects to BI and reporting tools through JDBC and ODBC drivers
  • Massively parallel processing (MPP) query execution
  • Security through Kerberos and LDAP authentication