Author: Sarfaraz Hussain

Understanding persistence in Apache Spark

Reading Time: 4 minutes In this blog, we will try to understand the concept of Persistence in Apache Spark in a very layman term with scenario-based examples. Note: The scenarios are only meant for your easy understanding. Spark Architecture Note: Cache memory can be shared between Executors. What does it mean by persisting/caching an RDD? Spark RDD persistence is an optimization technique which saves the result of RDD evaluation Continue Reading

Spark Structured Streaming (Part 4) – Handling Late Data

Reading Time: 3 minutes Welcome back folks to this blog series of Spark Structured Streaming. This blog is the continuation of the earlier blog “Understanding Stateful Streaming“. And this blog pertains to Handling Late Arriving Data in Spark Structured Streaming. So let’s get started. Handling Late Data With window aggregates (discussed in the previous blog) Spark automatically takes cares of late data. Every aggregate window is like a bucket Continue Reading

Spark Structured Streaming (Part 3) – Stateful Streaming

Reading Time: 4 minutes Welcome back folks to this blog series of Spark Structured Streaming. This blog is the continuation of the earlier blog “Internals of Structured Streaming“. And this blog pertains to Stateful Streaming in Spark Structured Streaming. So let’s get started. Let’s start from the very basic understanding of what is Stateful Stream Processing. But to understand that, let’s first understand what Stateless Stream Processing is. In Continue Reading

Spark Structured Streaming (Part 2) – The Internals

Reading Time: 2 minutes Welcome back folks to this blog series of Spark Structured Streaming. This blog is the continuation of the earlier blog “Introduction to Structured Streaming“. So I’ll exactly start from the point where I left in the previous blog. Structure of Streaming Query When we call start() API, Spark internally translates this code into a Logical Plan (an abstract representation of what the code does), then Continue Reading

Spark Structured Streaming (Part 1) – Introduction

Reading Time: 5 minutes In this Spark Structured Streaming series of blogs, we will have a deep look into what structured streaming is in a very layman language. So let’s get started. Introduction Structured streaming is a stream processing engine built on top of the Spark SQL engine and uses the Spark SQL APIs. It is fast, scalable and fault-tolerant. It provides rich, unified and high-level APIs in the Continue Reading

KSnow: Know about Cloning in Snowflake

Reading Time: 2 minutes This blog pertains to Cloning feature in Snowflake, and I will explain you all the things you need to know about these features with practical example. So let’s get started. Zero Copy Clone Cloning also Snowflake as Zero Copy Clone in Snowflake. It used to create a copy of a Table or Schema or a Database. In most database, in order to make a copy Continue Reading

KSnow: Time Travel and Fail-safe in Snowflake

Reading Time: 5 minutes This blog pertains to Time Travel and Fail-safe in Snowflake, and I will explain you all the things you need to know about these features with practical example. So let’s get started. Introduction to Time Travel Snowflake allows accessing historical data of a point in the past that may have been modified or deleted at the current time. Using time travel functionality a number of Continue Reading

Adaptive Query Execution (AQE) in Spark 3.0

Reading Time: 4 minutes This blog pertains to Apache SPARK 3.x, where we will find out how Spark’s new feature named Adaptive Query Execution (AQE) works internally. So let’s get started. One of the major feature introduced in Apache Spark 3.0 is the new Adaptive Query Execution (AQE) over the Spark SQL engine. So, in this feature, the Spark SQL engine can keep updating the execution plan per computation Continue Reading

KSnow: Load continuous data into Snowflake using Snowpipe

Reading Time: 5 minutes In this blog, we will discuss loading streaming data into Snowflake table using Snowpipe. But before that, if you haven’t read the previous part of this blog i.e., Loading Bulk Data into Snowflake then I would suggest you go through it. As now we have been set so let’s get started and see what Snowpipe is all about. Introduction Snowpipe is a mechanism provided by Continue Reading

KSnow: Loading Data Into Snowflake

Reading Time: 5 minutes This blog pertains to Loading Data into Snowflake, and I will explain you about the various step involved in this process. So let’s get started. Before moving ahead, you can visit the blog on understanding the basic of Snowflake Data Warehouse in case you want to refresh your concepts. Now let’s talk about the actual topic for which you have click on this blog. To Continue Reading

Unfolding foldLeft and foldRight in Scala

Reading Time: 4 minutes The fold method is a Higher Order Function in Scala and it has two variant namely,i. foldLeftii. foldRightIn this blog, we will look into them in detail and try to understand how they work. Before moving ahead, I want to clarify that the fold method is just a wrapper to foldLeft, i.e. the fold method internally invokes the foldLeft method. So, now let’s get started. Continue Reading

Back2Basics: Currying Function in Scala

Reading Time: 2 minutes Normally we write function and it seems like below: We declare a function with all the arguments needed inside a single parameter list. In currying however, we can split this parameter list into multiple parameter lists. For example, we could split the parameter list into multiple where each list just takes one parameter. However, the function body in both cases remains the same. Also, the Continue Reading

Companion Object in Scala

Reading Time: 4 minutes In order to understand the use of companion objects, it is important to understand the static members first. We have learned that in object-oriented languages, classes are the blueprint that contains members such as fields and methods. But in order to access these fields or methods, we first need to construct the objects from these classes. Let’s look at an example. We created a class Continue Reading