Big Data and Fast Data

Understanding persistence in Apache Spark

Reading Time: 4 minutes In this blog, we will try to understand the concept of persistence in Apache Spark in layman's terms, with scenario-based examples. Note: The scenarios are only meant to aid understanding. Spark Architecture Note: Cache memory can be shared between Executors. What does persisting/caching an RDD mean? Spark RDD persistence is an optimization technique which saves the result of RDD evaluation Continue Reading

Learning about Reactive Messaging Patterns

Reading Time: 4 minutes Overview According to the Reactive Manifesto, a critical element in any Reactive system is that it is message-driven. But what does it mean to be message-driven? Message-driven systems are those that communicate primarily through asynchronous, non-blocking messages. Messages enable us to build systems that are both resilient and elastic, and therefore responsive under a variety of conditions. Message Driven Architecture We have various ways Continue Reading

Implementing Akka Cluster Sharding

Reading Time: 3 minutes Now that we have a basic understanding of Akka Cluster Sharding from my previous blog, let’s look at how to implement it and what we need to keep in mind while doing so. To shard a specific type of actor, we use the Cluster Sharding Akka extension and call start on it; we can Continue Reading

Introduction to Akka Cluster Sharding

Reading Time: 3 minutes When we think of sharding or partitioning, we typically think of databases. Databases use sharding to improve resilience and elasticity. The Akka Toolkit provides Cluster Sharding as a way to introduce sharding in your application. Instead of distributing database records across a cluster, we’re distributing actors across nodes in the cluster, and it enables running at most one instance of a given actor at any Continue Reading

Circuit Breaker in Akka

Reading Time: 3 minutes Hey everyone, in today’s blog I will be covering the concept of Circuit Breaker in Akka. Before moving on, think of a situation where you make a request to a website and it takes too long to respond. You try to refresh the page and nothing changes. Would you like to use that website again? I think the answer will be no. Continue Reading

Flink: Union operator on Multiple Streams

Reading Time: 3 minutes Apache Flink offers a rich set of APIs and operators that make Flink application developers productive when dealing with multiple data streams. Flink provides many multi-stream operations like Union, Join, and so on. In this blog, we will explore the Union operator in Flink, which can combine two or more data streams. In real-time scenarios, we can have multiple data streams from different sources Continue Reading

Flink: Implementing the Session Window

Reading Time: 3 minutes In the previous blogs, we learned about Tumbling, Sliding, and Count windows in Flink. Flink offers another useful way to window data: the Session window. In this blog, we will explore the Session window in detail with an example. In the real world, all the work that we do online, such as visiting a website, clicking around the website, doing online Continue Reading

Flink: Implementing the Count Window

Reading Time: 3 minutes In the previous blog, we learned about Tumbling and Sliding windows, which are based on time. In this blog, we are going to learn to define Flink’s windows on another property: the Count window. As the name suggests, a count window is evaluated when the number of records received hits the threshold. A count window sets the window size based on how many entities exist within that window. For example, if we fixed the count Continue Reading

Flink: Time Windows based on Processing Time

Reading Time: 4 minutes In the previous blog, we talked about Flink’s windows operator, the heart of processing infinite streams. Generally in Flink, after specifying whether the stream is keyed or non-keyed, the next step is to define a window assigner. The window assigner defines how elements are assigned to windows. Flink provides some useful predefined window assigners like Tumbling windows, Sliding windows, Session windows, Count windows, and Continue Reading

Spark SQL in Delta Lake 0.7.0

Reading Time: 3 minutes Nowadays Delta Lake is a buzzword in the Big Data world, especially among Spark developers, because it addresses many issues found in the Big Data domain. Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It is evolving day by day and adds cool features in every release. Continue Reading

Basic Anatomy of a Flink Program

Reading Time: 3 minutes Hi Folks! Hope you are all safe during the COVID-19 pandemic and learning new tools and tech while staying at home. I have also just started learning a very prominent Big Data framework for stream processing: Flink. Flink is a distributed framework based on the streaming-first principle, meaning it is a true stream processing engine that implements batch processing as a special case. In Continue Reading

Windows operator: Heart of processing infinite streams in Flink

Reading Time: 3 minutes Apache Flink is an open-source, distributed Big Data framework for stream and batch data processing. Flink is based on the streaming-first principle, which means it is a true stream processing engine that implements batching as a special case. The “Windows” operator is considered the heart of Flink. It makes Flink capable of processing infinite streams quickly and efficiently. Windows split Continue Reading