Big Data and Fast Data

Stateful Streaming in Spark

Reading Time: 4 minutes Apache Spark is a fast and general-purpose cluster computing system. In Spark, we can do the batch processing and stream processing as well. It does near real-time processing. It means that it processes the data in micro-batches. I have discussed more Spark Streaming in my previous blog. Now in this blog, I’ll discuss Stateful Streaming in Spark. So let’s start !! What is Stateful Streaming? Continue Reading

Spark Structured Streaming (Part 2) – The Internals

Reading Time: 2 minutes Welcome back folks to this blog series of Spark Structured Streaming. This blog is the continuation of the earlier blog “Introduction to Structured Streaming“. So I’ll exactly start from the point where I left in the previous blog. Structure of Streaming Query When we call start() API, Spark internally translates this code into a Logical Plan (an abstract representation of what the code does), then Continue Reading

Spark Structured Streaming (Part 1) – Introduction

Reading Time: 5 minutes In this Spark Structured Streaming series of blogs, we will have a deep look into what structured streaming is in a very layman language. So let’s get started. Introduction Structured streaming is a stream processing engine built on top of the Spark SQL engine and uses the Spark SQL APIs. It is fast, scalable and fault-tolerant. It provides rich, unified and high-level APIs in the Continue Reading

Running Apache Airflow DAG with Docker

Reading Time: 3 minutes In this blog, we are going to run the sample dynamic DAG using docker. Before that, let’s get a quick idea about the airflow and some of its terms. What is Airflow? Airflow is a workflow engine which is responsible for managing and scheduling running jobs and data pipelines. It ensures that the jobs are ordered correctly based on dependencies and also manages the allocation Continue Reading

KSnow: Know about Cloning in Snowflake

Reading Time: 2 minutes This blog pertains to Cloning feature in Snowflake, and I will explain you all the things you need to know about these features with practical example. So let’s get started. Zero Copy Clone Cloning also Snowflake as Zero Copy Clone in Snowflake. It used to create a copy of a Table or Schema or a Database. In most database, in order to make a copy Continue Reading

integrating Cucumber with Akka-Http

Akka Cluster in use (Part 9): Effectively Resolving Split Brain Problem

Reading Time: 7 minutes We can deal with cluster failures manually, as mentioned in our blog post Manually Healing an Akka Cluster. However, it requires a DevOps engineer to be available 24 x 7. This process is very expensive and can be very frustrating (sometimes). Hence, we really need a way via which we can resolve cluster failures in an automated fashion. This is where a Split Brain Resolver Continue Reading

KSnow: Time Travel and Fail-safe in Snowflake

Reading Time: 5 minutes This blog pertains to Time Travel and Fail-safe in Snowflake, and I will explain you all the things you need to know about these features with practical example. So let’s get started. Introduction to Time Travel Snowflake allows accessing historical data of a point in the past that may have been modified or deleted at the current time. Using time travel functionality a number of Continue Reading

Wondering how to upgrade your skills in the pandemic? Here’s a simple way you can do it.

Reading Time: 3 minutes Corona Virus Pandemic has brought the world to a standstill. Countries are on a major lockdown. Schools, colleges, theatres, gym, clubs, and all other public places are shut down, the country’s economy is suffering, human health is on stake, people are losing their jobs and nobody knows how worse it can get. Since most of the places are on lockdown, and you are working from Continue Reading

integrating Cucumber with Akka-Http

Akka Cluster in use (Part 8): What is a Split Brain?

Reading Time: 5 minutes In our previous blog post, Manually Healing an Akka Cluster, we have already seen that if we do not handle the failures in an Akka Cluster carefully, then it can lead to disastrous situations like Split Brain. Hence, in this blog post we will learn more about the consequences of the Split Brain problem. What is Split Brain? Split Brain is a destructive condition of Continue Reading