Lake and Warehouse = Lakehouse
The open-source ecosystem for data lake and data warehouse technology continues to grow as startups and established companies develop new software stacks for a variety of use cases. This growth is driven by exponentially increasing data volumes: organizations accumulate data from images, audio files, websites, emails, videos, other media content, social media, transactions, and more. Some experts estimate that 80% of the data within an organization is unstructured.
Data warehouse tools are ideal for working with structured data, while data lakes are designed to work with unstructured data. The lakehouse brings data warehouse capabilities to data lakes, giving users the best of both worlds.
Hadoop is among the most widely adopted big data frameworks. Although it has been around for a while, it is still widely used as a core building block of big data infrastructure at many large organizations. The Hadoop ecosystem comprises three core components (HDFS for distributed storage, MapReduce for batch processing, and YARN for resource management) and dozens of supporting tools such as Hive, Pig, Impala, and Kudu.
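To make the MapReduce programming model more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop's API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration only. Distributed storage (HDFS) and cluster scheduling (YARN) are not modeled.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence within a single process."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["data lake", "data warehouse", "lake house"])
print(counts)
```

In a real cluster, the map and reduce functions would run in parallel on many machines, with the framework handling the shuffle, data locality, and retries. The sketch only shows how the three phases divide the work.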