Apache Hive
Apache Hive is an open-source query engine and data warehouse framework that provides an SQL-like interface called HQL (Hive Query Language). It was developed by Facebook to extend the functionality of the Hadoop ecosystem by providing an SQL abstraction layer over MapReduce.
The problem with MapReduce is that it requires programmers to write complex Java code for queries and analysis. Hive gets around this by introducing an SQL-like interface, so users who know SQL but not Java can work with data stored in HDFS.
Hive does not replace MapReduce but extends its functionality: it acts as an abstraction layer on top of MapReduce. Apache Hive is not a storage system; it uses HDFS for storage. Its metadata is kept separately in a relational database, so Hive must be set up to work with an RDBMS such as MySQL or PostgreSQL. Hive organizes data into tables and partitions, which map to directories in HDFS. Organizations of all sizes use Apache Hive for their data warehouse needs.
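To make the mapping from tables and partitions to HDFS directories concrete, here is a minimal Python sketch. It assumes Hive's commonly used default warehouse root, /user/hive/warehouse, and the usual column=value naming for partition directories. The table name page_views and the partition columns dt and country are hypothetical, and the helper partition_path is made up for illustration; it is not part of any Hive API.

```python
from posixpath import join

# Assumption: Hive's default warehouse root; a real cluster may override it
# through the hive.metastore.warehouse.dir setting.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def partition_path(table, partition_spec, database="default"):
    """Return the HDFS directory that would hold one partition of a managed table.

    partition_spec is an ordered list of (column, value) pairs.
    """
    parts = [WAREHOUSE_ROOT]
    # Tables in the default database sit directly under the warehouse root;
    # other databases get their own "<name>.db" directory.
    if database != "default":
        parts.append(f"{database}.db")
    parts.append(table)
    # Each partition column adds one "column=value" directory level.
    parts.extend(f"{col}={val}" for col, val in partition_spec)
    return join(*parts)


# Hypothetical table partitioned by date and country.
print(partition_path("page_views", [("dt", "2024-01-01"), ("country", "US")]))
# prints /user/hive/warehouse/page_views/dt=2024-01-01/country=US
```

Because each partition is its own directory, a query that filters on a partition column only has to read the matching directories instead of scanning the whole table.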
Project Background
- Project: Apache Hive
- Author: Facebook, Inc.
- Released: October 1, 2010
- Type: Open Source Project
- License: Apache License 2.0
- Language: Java
- GitHub: apache/hive
- Runs on: Microsoft Windows, macOS, Linux
- GitHub Stars: 4.2k
- GitHub Contributors: 300
Summary
- It can impose structure on a variety of data formats.
- It enables data warehousing tasks such as extract/transform/load (ETL).
- Hive is best used for traditional data warehousing tasks.
- It is designed to maximize scalability, performance, extensibility, fault tolerance, and loose coupling with its input formats.
- Hive is a good choice for batch queries over large datasets.
- You can use Hive to write feature-rich, fault-tolerant batch transformation or ETL jobs in a pluggable SQL engine.
- It enables SQL developers to write Hive Query Language statements.
