How to Set Up an Airflow Environment Using Docker

In this blog we will learn how to set up an Airflow environment using Docker.

Why we need Docker

Apache Airflow is an open source project that is growing at an overwhelming pace. At the time of writing, the Airflow GitHub repository shows 632 contributors, 98 releases, and more than 5000 commits, and the last commit was 4 hours ago. In other words, Airflow gets new commits every day and frequent releases.

Managing and maintaining different versions of Airflow is therefore already a challenge.

Airflow is also built to integrate with a wide range of databases, systems, cloud environments, and more, which brings further challenges:

  • Managing and maintaining all of the dependency changes is really difficult.
  • Setting up and configuring an Airflow environment takes a lot of time.
  • Development and production environments are hard to share consistently among all developers.

Moreover, if you miss a single installation step, you have to clear everything and start over. These challenges are what motivate us to use Docker.

Docker

In simple words, Docker is a software containerization platform: you build your application, package it along with its dependencies into a container, and then ship that container to run on other machines.

Okay, but what is Containerization anyway?

Containerization, also called container-based virtualization or application containerization, is an OS-level virtualization method for deploying and running distributed applications without launching an entire VM for each application. Instead, many isolated systems, called containers, run on a single control host and share a single kernel.

Benefits of using Docker:

  • Docker frees us from managing and maintaining all of the Airflow dependencies and deployment steps.
  • It is easy to share and deploy different versions and environments.
  • Versions can be tracked through GitHub tags and releases.
  • Deployment from testing to production is easier.

Getting Started

Prerequisite

To deploy Airflow with Docker Compose, first fetch docker-compose.yaml:

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'

The docker-compose.yaml file is now in the current directory.
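
Before going further, it can help to confirm that the download actually produced a usable file. The following is a minimal sketch, not part of the official quick start: check_compose_file is a hypothetical helper name, and the check only assumes that the file is non-empty and declares a top-level services key.

```shell
# check_compose_file is a hypothetical helper, not part of Docker or Airflow.
check_compose_file() {
  file="${1:-docker-compose.yaml}"
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # A Compose file declares its containers under a top-level "services:" key.
  if ! grep -q '^services:' "$file"; then
    echo "no top-level services key in: $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Calling check_compose_file with no argument checks docker-compose.yaml in the current directory. A failed or truncated download shows up as a non-zero exit status.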

Setting the right Airflow user

On Linux, the quick start needs to know your host user ID and needs the group ID set to 0. Otherwise, the files created in dags, logs, and plugins will be owned by the root user. Make sure to configure them for Docker Compose:

mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env
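
The same two steps can be wrapped in a small function so they are easy to re-run or to point at another directory. This is only a sketch: prepare_airflow_dir is a hypothetical name, and printf is used in place of echo -e purely for portability across shells.

```shell
# prepare_airflow_dir is a hypothetical helper name used for illustration.
prepare_airflow_dir() {
  dir="${1:-.}"
  # Folders that the compose file mounts into the containers.
  mkdir -p "$dir/dags" "$dir/logs" "$dir/plugins"
  # printf behaves the same in every POSIX shell, unlike echo -e.
  printf 'AIRFLOW_UID=%s\n' "$(id -u)" > "$dir/.env"
  # Print the result so the recorded user ID can be checked by eye.
  cat "$dir/.env"
}
```

Running prepare_airflow_dir with no argument works on the current directory and prints the AIRFLOW_UID line it wrote to .env.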

Initialize the Database

On all operating systems, you need to run database migrations and create the first user account. To do so, run:

docker-compose up airflow-init

After initialization is complete, the output should report that the admin user airflow was created and that the airflow-init container exited with code 0.

Running Airflow

Now you can start all the services:

docker-compose up

Now go to http://localhost:8080 and you will be presented with the Airflow login screen.

Log in with your credentials. If you are logging in for the first time, use the default account:

Username: airflow

Password: airflow

If the Airflow UI loads when you access localhost on port 8080, Airflow is up and running on your system.
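
The webserver can take a little while to come up after the services start, so readiness can also be checked from a terminal instead of refreshing the browser. The sketch below is a generic retry loop. wait_until is a hypothetical helper, and the commented usage line assumes that curl is installed and that the webserver exposes its /health endpoint on port 8080.

```shell
# wait_until is a hypothetical retry helper: wait_until TRIES DELAY COMMAND...
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # Run the probe command; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}

# Assumed usage: probe the webserver's /health endpoint every 5 seconds.
# wait_until 30 5 curl -fsS http://localhost:8080/health
```

Passing the probe as a command keeps the loop independent of curl, so any other check can be substituted.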

That’s it, folks. I hope you liked the blog. Thanks.

References

https://airflow.apache.org/

https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html

Written by 

Gaurav Srivastav is a Software Consultant working in the Java domain.