Search Results for: kubernetes

Running Apache Airflow DAG with Docker

Reading Time: 3 minutes In this blog, we are going to run a sample dynamic DAG using Docker. Before that, let’s get a quick idea of Airflow and some of its terms. What is Airflow? Airflow is a workflow engine that is responsible for managing and scheduling running jobs and data pipelines. It ensures that the jobs are ordered correctly based on dependencies and also manages the allocation Continue Reading

Introduction to a Modern Reverse-Proxy: Traefik

Reading Time: 3 minutes Traefik is an open source API gateway, written in Golang and developed in a Unix-centric way. It is designed to simplify the complexity of microservices operations. Traefik performs auto-configuration of services, which means that the developer only needs to worry about developing and deploying applications. Traefik auto-configures with sensible defaults and sends a request to the said service. With changing requirements and needs of Continue Reading

Introduction to Apache Airflow

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule and monitor workflows. These functions are achieved with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the incubator stage. It was started in 2014 at Airbnb, and since then it has earned an excellent reputation, with approximately 800 contributors and 13,000 stars on GitHub. The main functions of Apache Airflow are to schedule workflow, monitor Continue Reading

Apache Pulsar: A Quick Overview

Reading Time: 3 minutes What is Apache Pulsar? Yahoo developed Pulsar, and it is now open source under the Apache License. Apache Pulsar is a distributed messaging system based on the publisher-subscriber model, and unlike other pub-sub models, Apache Pulsar decouples producers from consumers. Pulsar is the middleware that accepts information from producers; consumers then source that data from Pulsar. Why Apache Pulsar? Continue Reading

How To Deploy Fargate Using AWS CDK

Reading Time: 5 minutes Introduction What is AWS Fargate? In this blog, we’ll see how to deploy Fargate using AWS CDK. As per the official AWS documentation, “AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers.” In simple words, Fargate allows you to run containers on ECS in AWS without managing the servers. Using this makes it easier to Continue Reading

Apache Beam Vs Apache Spark

Reading Time: 4 minutes Before going through the comparison of Apache Beam and Apache Spark, we should have a glimpse of what these two exactly are. Apache Beam is a unified programming model. It implements batch and streaming data processing jobs that run on any execution engine. It executes pipelines in multiple execution environments. Apache Spark is defined as a fast and general engine for large-scale data processing. Spark is a fast Continue Reading

What, Why, and How Cloudstate?

Reading Time: 4 minutes This article talks about Lightbend’s Cloudstate, which is used for serverless computing.

Securing Your Containers with Encryption of Containerized Data

Reading Time: 6 minutes Most business applications today are enabled by the cloud, with a lot of them residing as containerized workloads. Digital transformation is being powered by concepts encompassing containers, Kubernetes, and microservices, which have become indispensable parts of how applications are developed & deployed. If we consider containers in particular, they are modernizing applications like never before and helping in creating scalable & agile Continue Reading

Airflow on Google Cloud Composer

Reading Time: 4 minutes If you are wondering how to start working with Apache Airflow for small development or academic purposes, here you will learn how. Deploying Airflow on GCP Compute Engine (self-managed deployment) could cost less than you think, with all the advantages of using its services like BigQuery or Dataflow. Table of Contents: What is Apache Airflow, Cloud Composer overview, Google Cloud Composer benefit, Composer Continue Reading

Airbyte OSS Metrics in Prometheus

Reading Time: 4 minutes Airbyte is a fast-growing ELT tool that helps acquire data from multiple sources. It is particularly useful in building data lakes. Airbyte offers pre-built connectors to over 300 sources and tens of destinations and also allows custom connectors to be built quickly using language SDKs. Airbyte recently released OpenTelemetry-based metrics; however, the documentation has been spotty and incomplete. You can check it out here. In this blog, Continue Reading

Core Concepts of Apache Airflow

Reading Time: 4 minutes In this blog, we will go over the basic core concepts you must understand if you want to use Apache Airflow. In this article, you will learn: What is Airflow, Architecture Overview, DAG, Task, Operator, DAG Run, Execution Date. Airflow: Airflow was started in October 2014 and developed by Maxime Beauchemin at Airbnb. It is a platform for programmatically authoring, scheduling, and monitoring workflows. It Continue Reading

Apache Kafka for beginners

Reading Time: 4 minutes Introduction One of the biggest challenges associated with big data is analyzing the data. But before we get to that part, the data first has to be collected, and for a system to process it impeccably, the system should be able to grasp the data and make it available to users. This is where Apache Kafka comes in handy. Let’s briefly understand how Kafka came into existence. Continue Reading

Apache Airflow – A Workflow Manager

Reading Time: 4 minutes As the industry is becoming more data driven, we need to look for a couple of solutions that would be able to process a large amount of data that is required. A workflow management system provides an infrastructure for the set-up, performance and monitoring of a defined sequence of tasks, arranged as a workflow application. Workflow management has become such a common need that most Continue Reading