In this article, we’ll see how to write a basic “Hello World” DAG in Apache Airflow. We will go through all the files that we have to create in Apache Airflow to successfully write and execute our first DAG.
Create a Python file
First, we will create a Python file inside the “airflow/dags” directory. Since we are creating a basic Hello World script, we will keep the file name simple and call it “HelloWorld_dag.py”. If this is your first time writing a DAG in Airflow, you will need to create the “dags” folder first.
Importing important modules
To create a working pipeline in Airflow, we need to import the “DAG” class and an operator class (here, the “PythonOperator”) into our code. We will also import “datetime”, which is used to define the start date of the DAG.
Create a DAG Object
In this step, we will create a DAG object that will contain the tasks of the pipeline. We pass it a “dag_id”, which is the DAG’s unique identifier.
As a best practice, keep the “dag_id” the same as the name of the Python file. Therefore, we will set the “dag_id” to “HelloWorld_dag”.
Next, we will define the “start_date” parameter. This is the point from which the scheduler starts filling in the dates.
For the Apache Airflow scheduler, we also have to specify the interval at which it will execute the DAG. We define the interval as a “cron expression”. Apache Airflow also provides some predefined presets such as “@yearly”, “@hourly”, and “@daily”. For this example, we will use “@hourly”.
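As a rough illustration, each preset is shorthand for an ordinary five-field cron expression. The lookup table below lists the equivalents for the three presets mentioned above; the name “CRON_PRESETS” is only an illustrative helper and is not part of Airflow.

```python
# Illustrative lookup table (not an Airflow API): the cron expression
# that each schedule preset mentioned above stands for.
CRON_PRESETS = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@yearly": "0 0 1 1 *",   # midnight on January 1st
}

for preset, expression in CRON_PRESETS.items():
    print(f"{preset:8} -> {expression}")
```

Either form can be given as the schedule of a DAG; the presets are simply easier to read.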
The scheduler starts filling in the dates from the specified “start_date” on an hourly basis, and it keeps creating runs until it reaches the current hour. This is called “catchup”. We can turn it off by setting the “catchup” parameter to “False”.
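Put together, the parameters described above might look like the following sketch. It assumes Airflow 2.x; the start date is an arbitrary placeholder, and releases from 2.4 onward prefer the parameter name “schedule” over “schedule_interval”.

```python
from datetime import datetime

from airflow import DAG

# A sketch of the DAG object, assuming Airflow 2.x.
with DAG(
    dag_id="HelloWorld_dag",            # same as the file name, minus ".py"
    start_date=datetime(2021, 1, 1),    # placeholder date (assumption)
    schedule_interval="@hourly",        # run once an hour
    catchup=False,                      # do not backfill runs since start_date
) as dag:
    # Tasks defined inside this block are attached to the DAG.
    pass
```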
Create a Task
Now we will define a PythonOperator. A PythonOperator is used to invoke a Python function from within your DAG. We will create a function that will return “Hello World” when it is invoked.
Just as a DAG has a “dag_id”, a task has a “task_id”.
It also has a “python_callable” parameter, which takes the function to be called. We pass the function itself, without parentheses, so that Airflow can call it when the task runs.
Creating a callable function
Now we will create the function that the “PythonOperator” will call. In the DAG file, it has to be defined before the operator that references it.
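A minimal sketch of such a function is shown below. The name “hello_world” is an assumption chosen for this example; any valid function name works.

```python
def hello_world():
    """Return the greeting; a PythonOperator would invoke this function."""
    return "Hello World"


# Calling it directly shows what the task will produce.
print(hello_world())
```

Because it is a plain Python function, it can be tried out on its own before it is wired into a DAG.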
Setting Dependencies in DAG
We don’t need to indicate the flow because we only have one task here; we can just write the task name. But if we had multiple tasks, we could set their dependencies with the “>>” and “<<” operators: “a >> b” means that a runs before b, and “a << b” means that a runs after b.
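To illustrate the multi-task case, here is a sketch with three hypothetical tasks. It assumes Airflow 2.x import paths; the DAG id, the task names, and the “say” helper are invented for this example and are not part of the Hello World DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say(message):
    """Hypothetical helper: return the message it is given."""
    return message


with DAG(
    dag_id="dependency_example",        # hypothetical DAG, for illustration
    start_date=datetime(2021, 1, 1),    # placeholder date (assumption)
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    first = PythonOperator(task_id="first", python_callable=say, op_args=["first"])
    second = PythonOperator(task_id="second", python_callable=say, op_args=["second"])
    third = PythonOperator(task_id="third", python_callable=say, op_args=["third"])

    # "first" runs before "second" ...
    first >> second
    # ... and "third" runs after "second" (equivalent to second >> third).
    third << second
```

Both operators describe the same kind of relationship; which one to use is mostly a matter of which reads more naturally.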
Our complete DAG file should look like this:
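The following is a minimal sketch of the whole “HelloWorld_dag.py” file, assembled from the steps above. It assumes Airflow 2.x import paths (in Airflow 1.10 the operator is imported from “airflow.operators.python_operator”); the start date, the function name “hello_world”, and the task id are assumptions made for this example.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def hello_world():
    """Callable invoked by the PythonOperator."""
    return "Hello World"


with DAG(
    dag_id="HelloWorld_dag",
    start_date=datetime(2021, 1, 1),    # placeholder date (assumption)
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    task1 = PythonOperator(
        task_id="hello_world",          # assumed task name
        python_callable=hello_world,    # pass the function, do not call it
    )

# Only one task, so there is no dependency to declare.
task1
```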
To run our DAG file
To execute our DAG file, we need to start the Airflow webserver and the Airflow scheduler, and then open the web UI in a browser. We can do that with the following commands and URL:
1) airflow webserver -p 8081
2) airflow scheduler
3) http://localhost:8081/
We will be able to see our DAG running in the Airflow web UI once we log in successfully.
In this blog, we saw how to write our first DAG and execute it. We saw how to instantiate a DAG object and how to create a task and a callable function.
Stay tuned for more blogs on: https://blog.knoldus.com/