If you are wondering how to start working with Apache Airflow for small development or academic purposes, this article shows you how. Deploying Airflow on GCP Compute Engine (a self-managed deployment) can cost less than you might think, and you still get the advantages of Google Cloud services such as BigQuery or Dataflow.
Table of Contents
- What is Apache Airflow
- Cloud Composer overview
- Google Cloud Composer benefits
- Composer environment
- Set up a Cloud Composer environment
What is Apache Airflow
Apache Airflow is an open-source tool to programmatically author, schedule, and monitor workflows. It is a workflow orchestration tool commonly used for ETL (Extract, Transform, Load) and other data transformation pipelines. You can easily visualize your data pipelines’ dependencies, progress, logs, and code, trigger tasks, and check their success status.
With Airflow, users can author workflows as Directed Acyclic Graphs (DAGs) of tasks. Airflow’s rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. It connects with multiple data sources and can send an alert via email or Slack when a task completes or fails. Airflow is distributed, scalable, and flexible, making it well suited to handle the orchestration of complex business logic.
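To make the idea of a DAG of tasks concrete, here is a minimal sketch that uses only the Python standard library rather than Airflow itself. The task names are hypothetical, and `graphlib` merely stands in for the dependency resolution that Airflow's scheduler performs; it is not Airflow's API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because no task can depend on itself, directly or indirectly, a run order always exists. That is the property the "acyclic" in DAG guarantees.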
Cloud Composer Overview
Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor, and manage workflows.
Cloud Composer is built on the popular Apache Airflow open-source project, and its workflows are written in Python.
As part of Google Cloud Platform, Cloud Composer integrates with tools such as BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub and Cloud ML Engine, giving users the ability to orchestrate end-to-end GCP workloads.
By using Cloud Composer instead of a local instance of Apache Airflow, you get the best of Airflow with no installation or management overhead. It also helps you create Airflow environments quickly and use Airflow-native tools, such as the Airflow web interface and command-line tools, so you can focus on your workflows and not your infrastructure.
Google Cloud Composer benefits
- You focus on your workflows (writing code) and let Cloud Composer manage the infrastructure.
- Cloud Composer provides easy access to the Airflow web user interface, and you can create a new Airflow environment with just a few clicks.
- It integrates with Google Cloud services for big data, machine learning, and more.
- It is built on an open-source orchestration tool, which allows for frequent updates and upgrades.
Setting up Airflow through Cloud Composer is much simpler than installing it natively or running it via Docker, which gave us permission and volume headaches.
If you have a Google account, it’s really just a few clicks away.
Cloud Composer supports both Airflow 1 and Airflow 2.
Composer environment
To run workflows, you first need to create an environment. Airflow depends on many microservices to run, so Cloud Composer provisions Google Cloud components to run your workflows. These components are collectively known as a Cloud Composer environment.
Cloud Composer environments are based on Cloud Composer images. When you create an environment, you can select an image with a specific Airflow version.
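As a rough illustration of how an image ties a Composer version to an Airflow version, the sketch below parses an image version string. It assumes the `composer-X.Y.Z-airflow-A.B.C` naming pattern; the sample value is illustrative only, and aliases or newer naming schemes would need a looser pattern.

```python
import re

# Assumed naming pattern: composer-<composer version>-airflow-<airflow version>
IMAGE_VERSION = re.compile(
    r"^composer-(?P<composer>\d+\.\d+\.\d+)-airflow-(?P<airflow>\d+\.\d+\.\d+)$"
)


def parse_image_version(image):
    """Split an image version string into (composer_version, airflow_version)."""
    match = IMAGE_VERSION.match(image)
    if match is None:
        raise ValueError(f"unrecognized image version: {image!r}")
    return match.group("composer"), match.group("airflow")


def airflow_major(image):
    """Return the Airflow major version (1 or 2) encoded in the image name."""
    return int(parse_image_version(image)[1].split(".")[0])


sample = "composer-1.17.0-airflow-2.1.2"  # illustrative value
print(parse_image_version(sample), airflow_major(sample))
```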
Environments are self-contained Airflow deployments based on Google Kubernetes Engine. They work with other Google Cloud services using connectors built into Airflow. You can create multiple environments within a project. Each environment is a separate cluster with multiple nodes, so environments are isolated from each other.
Set up a Cloud Composer environment
Step 1. Basic setup
- In the Google Cloud Console, search for Composer environments and click Create environment.
- In the name field, enter a name for your environment.
Step 2. Node Configuration
- Enter the node count.
- Choose the machine type for the nodes.
- Enter the disk size.
- Choose the number of schedulers.
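The same choices from Steps 1 and 2 can also be expressed on the command line. The sketch below only assembles a `gcloud composer environments create` command and prints it; it does not run anything. The flag names are given as I understand them for node-based environments, and the environment name, location, and sizes are hypothetical, so check `gcloud composer environments create --help` before relying on them.

```python
import shlex


def build_create_command(name, location, node_count, machine_type,
                         disk_size_gb, schedulers):
    """Assemble (but do not run) a gcloud command mirroring Steps 1 and 2."""
    return [
        "gcloud", "composer", "environments", "create", name,
        f"--location={location}",          # assumption: region is not in the steps above
        f"--node-count={node_count}",
        f"--machine-type={machine_type}",
        f"--disk-size={disk_size_gb}GB",
        f"--scheduler-count={schedulers}",
    ]


command = build_create_command(
    name="example-environment",   # hypothetical name
    location="us-central1",       # hypothetical region
    node_count=3,
    machine_type="n1-standard-1",
    disk_size_gb=50,
    schedulers=1,
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, which avoids quoting problems if you later pass it to `subprocess.run`.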
Step 3. Network configuration
Networking parameters depend on the type of environment that you want to create, for example a Public IP or a Private IP environment.
Step 4. Web server configuration
The Airflow web server access parameters do not depend on the type of your environment. Instead, you can configure web server access separately. For example, a Private IP environment can still have the Airflow UI accessible from the internet.
Step 5. Maintenance windows
Maintenance windows give you fine-grained control over when automatic maintenance can occur on your Composer environment. You can configure a maintenance window for a new or existing environment. If you do not specify one, the maintenance time is selected automatically without considering the schedule of your DAG runs.
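When planning a window, it can help to work out how many hours per week it covers. The sketch below assumes a window is described by a start time, an end time, and a weekly recurrence in RFC 5545 `RRULE` syntax. The 12-hour weekly minimum is my understanding of the documented requirement and should be verified for your Composer version; the dates and days are hypothetical.

```python
from datetime import datetime, timedelta

MIN_WEEKLY_HOURS = 12  # assumption: verify the minimum for your Composer version


def weekly_hours(start, end, days):
    """Total maintenance hours per week for a window repeated on the given days."""
    duration = end - start
    if duration <= timedelta(0):
        raise ValueError("window end must be after window start")
    return duration.total_seconds() / 3600 * len(days)


def recurrence_rule(days):
    """Weekly recurrence in RFC 5545 RRULE syntax, e.g. FREQ=WEEKLY;BYDAY=SA,SU."""
    return "FREQ=WEEKLY;BYDAY=" + ",".join(days)


start = datetime(2024, 1, 6, 1, 0)  # hypothetical: a Saturday at 01:00
end = datetime(2024, 1, 6, 7, 0)    # six hours later
days = ["SA", "SU"]

hours = weekly_hours(start, end, days)
print(recurrence_rule(days), hours, hours >= MIN_WEEKLY_HOURS)
```

Choosing days and hours when few DAG runs are scheduled keeps maintenance from interrupting your pipelines.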
Step 6. Airflow configuration overrides and environment variables
You can set up Airflow configuration overrides and environment variables when you create an environment. As an alternative, you can do it later, after your environment is created.
You can assign labels to your environments to break down billing costs based on these labels.
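As a sketch of what these settings look like in practice, the snippet below formats overrides, environment variables, and labels as `KEY=VALUE` lists, and shows how DAG code might read an environment variable. The `section-option` key format and the flag names follow my understanding of the gcloud CLI and should be checked against its help output. The variable name, bucket, and label values are hypothetical.

```python
import os


def format_pairs(pairs):
    """Render a dict as a comma-separated KEY=VALUE list, sorted by key."""
    return ",".join(f"{key}={value}" for key, value in sorted(pairs.items()))


# Assumed key format for overrides: "<section>-<option>"
airflow_overrides = {"core-dags_are_paused_at_creation": "True"}
env_variables = {"REPORT_BUCKET": "example-bucket"}  # hypothetical variable
labels = {"team": "data", "env": "dev"}              # hypothetical labels

flags = [
    f"--airflow-configs={format_pairs(airflow_overrides)}",
    f"--env-variables={format_pairs(env_variables)}",
    f"--labels={format_pairs(labels)}",
]
print(" ".join(flags))


def read_setting(name, default):
    """Read an environment variable inside DAG code, with a fallback value."""
    return os.environ.get(name, default)


print(read_setting("REPORT_BUCKET", "fallback-bucket"))
```

Reading settings from environment variables with a fallback lets the same DAG file run in several environments without code changes.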
Step 7. Data encryption
By default, data in your environment is encrypted with a key provided by Google.
To use customer-managed encryption keys (CMEK) to encrypt data in your environment, follow the instructions outlined in Using customer-managed encryption keys.
Step 8. Beta API
The Composer Beta API provides functionality that is in preview and is not covered by any SLA or deprecation policy.
Wait about 10 minutes, and you will have a complete Composer environment up and running.
When your environment is up and running, the Google Cloud UI is clean and hassle-free: it just links to the DAG folder and to your Airflow webserver, which is where you’ll be spending most of your time.
A complete Composer environment
If you have any questions about this article, let me know in the comment section.