Apache Airflow

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It was created at Airbnb in 2014, became an Apache Incubator project in March 2016, and became a top-level Apache Software Foundation project in January 2019. It is popular among data engineers for managing workflows and data pipelines. Workflows are represented as Directed Acyclic Graphs (DAGs), and Airflow lets users view pipeline workflows, troubleshoot problems, and evaluate data from numerous sources.

Project Background

  • Platform: Apache Airflow 
  • Author: Airbnb
  • Released: October 2014
  • Type: Data engineering and workflow tool
  • License: Apache License 2.0
  • Language: Python 
  • GitHub: apache/airflow
  • Runs on: Linux and macOS (POSIX-compliant systems); Microsoft Windows via WSL2 or Linux containers

Applications

  • Manage multiple data pipelines
  • Extensible model
  • Alerting via email or Slack
  • Simple interface with logs for each task
  • Pipelines are defined in Python
  • Uses DAGs for setting and managing various workflows
  • Security features such as role-based access control in the web interface
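
To illustrate what it means for a workflow to be a DAG, here is a minimal sketch in plain Python. It does not use Airflow's own API; it only uses the standard library's graphlib module (Python 3.9+) to show how task dependencies determine a valid run order. The task names and the execution_order helper are hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why the graph must be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the graph has no cycles, every task can be placed after all of its upstream tasks. A scheduler such as Airflow relies on the same property to decide which tasks are ready to run.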

Summary

  • It is scalable: it uses a message queue to orchestrate an arbitrary number of workers.
  • Supports multiple Python and Kubernetes versions.
  • You can define your own operators and extend the libraries to suit the task.
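
The idea of defining your own operator can be sketched as follows. In Airflow, a custom operator subclasses BaseOperator and overrides its execute(context) method. To keep the example runnable without Airflow installed, the BaseOperator below is a simplified, hypothetical stand-in and not the real class; GreetOperator and the "run_date" context key are likewise assumptions made for illustration.

```python
class BaseOperator:
    """Simplified stand-in for an operator base class (NOT Airflow's real one)."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self, context):
        # Subclasses put the actual work of the task here.
        raise NotImplementedError("subclasses define the work in execute()")


class GreetOperator(BaseOperator):
    """Hypothetical custom operator that builds a greeting string."""

    def __init__(self, task_id, name):
        super().__init__(task_id)
        self.name = name

    def execute(self, context):
        # `context` carries run-time information; this sketch reads only
        # a hypothetical "run_date" key from it.
        return f"Hello {self.name}, run date {context['run_date']}"


task = GreetOperator(task_id="greet", name="Airflow")
result = task.execute({"run_date": "2024-01-01"})
print(result)
```

The design point is that each operator wraps one unit of work behind a common execute interface, so new task types can be added without changing the scheduler.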