Apache Airflow is an open-source workflow management platform for building data pipelines. Airflow uses directed acyclic graphs (DAGs) to orchestrate workflows: tasks and their dependencies are defined in Python, and Airflow manages the scheduling and execution. In this article, we will walk through the procedure for running Apache Airflow in a Docker container (Community Edition):
- Install the prerequisites
- Run the service
- Check http://localhost:8080
To run Airflow in Docker, the following prerequisites must be met:
- Docker Community Edition (CE). If we don’t have Docker installed on the system yet, we have to install it first. We can follow the article about Docker CE installation.
- Docker Compose v1.29.1 or newer on our workstation. We can follow the article about installing Docker Compose.
Older versions of docker-compose do not support all the features required by the docker-compose.yaml file, so double-check that your version meets the minimum requirement.
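To confirm which version is installed, you can print it from the command line (assuming docker-compose is on your PATH):

```shell
# Print the installed Docker Compose version; it should be 1.29.1 or newer
docker-compose --version
```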
To deploy Airflow on Docker Compose, you should first fetch the docker-compose.yaml file:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.0/docker-compose.yaml'
Before starting Airflow for the first time, you need to prepare your environment, i.e. create the necessary files and directories and initialize the database.
1. Setting the right Airflow user
On Linux, the volumes mounted into the container use the native Linux filesystem user/group permissions, so you have to make sure the container and the host computer have matching file permissions.
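A common way to do this, following the official Airflow Docker quick start, is to create the directories the containers mount and an .env file that tells the containers which host user ID to run as (the directory names assume the default volume mappings in the fetched docker-compose.yaml):

```shell
# Create the directories that docker-compose.yaml mounts into the containers
mkdir -p ./dags ./logs ./plugins
# Record the host user's UID so files written by the container are owned by us
echo "AIRFLOW_UID=$(id -u)" > .env
```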
2. Initialize the database
On all operating systems, you need to run database migrations and create the first user account. To do so, run:
docker-compose up airflow-init
To check whether all services are running, use the following command to list the running containers.
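For example, with the Docker CLI (assuming the Docker daemon is running on your machine):

```shell
# List all running containers; the Airflow webserver, scheduler, worker,
# and database containers should appear here once the services are up
docker ps
```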
Now you can start all services:
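Using the docker-compose.yaml fetched above, starting everything is a single command; the -d flag runs the services in the background:

```shell
# Start all Airflow services defined in docker-compose.yaml in detached mode
docker-compose up -d
```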
Now go to http://localhost:8080 and you will be presented with the Airflow sign-in page. The default account created by the airflow-init step has the login airflow and the password airflow.
Read the Apache Airflow documentation to learn more.
To read more tech blogs, visit Knoldus Blogs.