In this blog, we will learn how to set up an Airflow environment using Docker.
Why we need Docker
Apache Airflow is an open-source project that grows at an overwhelming pace. Looking at the Airflow GitHub repository, there are 632 contributors, 98 releases, and more than 5,000 commits, and the last commit was 4 hours ago. That means Airflow gets new commits every day and has constant releases.
Managing and maintaining different versions of Airflow is therefore already a challenge.
On top of that, Airflow is built to integrate with all kinds of databases, systems, and cloud environments, which means:
- Managing and maintaining all of the dependency changes is really difficult.
- It takes a lot of time to set up and configure an Airflow environment.
- It is hard to share consistent development and production environments across all developers.
Moreover, if you miss one installation step, you have to clear everything and start over again. With all these challenges in mind, those problems give us the motivation to use Docker.
Docker:
In simple words, Docker is a software containerization platform: you can build your applications, package them along with their dependencies into containers, and then ship these containers to run on other machines.
Okay, but what is Containerization anyway?
Containerization, also called container-based virtualization or application containerization, is an OS-level virtualization method for deploying and running distributed applications without launching an entire VM for each application. Instead, many isolated systems, called containers, run on a single control host and access a single kernel.
Benefits of using Docker:
- Docker frees us from the task of managing and maintaining all of the Airflow dependencies and deployments.
- Easy to share and deploy different versions and environments.
- Keep track of versions through GitHub tags and releases.
- Ease of deployment from testing to production environment.
Getting Started
- Install the prerequisites
- Run the service
- Check http://localhost:8080
- Done!
Prerequisites
- Install Docker
- Install Docker Compose v1.29.1 or newer on your workstation (a quick version check is shown below).
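Before moving on, you can confirm that both tools are available and check their versions:
docker --version
docker-compose --version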
To deploy Airflow on Docker Compose, you should fetch the docker-compose.yaml file:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'
Here we can see we have got the docker-compose.yaml file.
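If you want to see what this file sets up before running anything, you can ask Docker Compose to list the services it defines. For Airflow 2.2.3 this typically includes a Postgres database, a Redis broker, and several Airflow components (webserver, scheduler, worker, triggerer, and an init job); the exact list may differ between Airflow versions.
docker-compose config --services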
Setting the right Airflow user
On Linux, the quick-start needs to know your host user ID and needs to have the group ID set to 0. Otherwise, the files created in dags, logs, and plugins will be owned by the root user. You have to make sure to configure them for docker-compose:
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env
Initialize the Database
On all operating systems, you need to run database migrations and create the first user account. To do it, run:
docker-compose up airflow-init
After initialization is complete, you should see a message like the one below.
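The exact lines vary between versions, but for Airflow 2.2.3 the output should end with something like:
airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.2.3
start_airflow-init_1 exited with code 0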
Running Airflow:
Now you can start all the services:
docker-compose up
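If you would rather keep your terminal free, you can also start the services in detached mode and then check that the containers come up healthy:
docker-compose up -d
docker ps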
Now go to http://localhost:8080 and you will be presented with the Airflow login screen.
Log in with your credentials. If you are logging in for the first time, the default credentials are:
Username: airflow
Password: airflow
If you see the Airflow UI when accessing localhost on port 8080, that means Airflow has been installed on your system.
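As a quick sanity check, you can also run Airflow CLI commands inside the containers. Assuming the default service names from the fetched docker-compose.yaml, the following prints the Airflow version and environment info from a worker container:
docker-compose run airflow-worker airflow info
And when you are done experimenting, the whole environment (containers, volumes, and downloaded images) can be cleaned up with:
docker-compose down --volumes --rmi all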
That’s it, folks. I hope you liked the blog. Thanks.
References
https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html
To read more tech blogs, visit Knoldus Blogs.