Big Data and Fast Data

Big Data Analytics: An Introduction

Reading Time: 5 minutes DATA ANALYTICS Data can help businesses better understand their customers and improve their advertising campaigns. It can also help them personalise their content and improve their bottom lines. The advantages of data are many, but you can’t access these benefits without the proper data analytics tools and processes. While raw data has a lot of potential, you need data analytics to unlock the power to grow Continue Reading

Spark 3.0: Adaptive Query Execution (AQE)

Reading Time: 3 minutes Introduction As we all know, optimization plays an important role in the success of Spark SQL, so a lot of work has been done in this direction. Before Spark 3.0, cost-based optimization was a major hit, in which different strategies are compared by cost (based on time efficiency and estimated CPU and I/O usage) and the one that minimizes the cost is executed. But, because Continue Reading

Exploring HepPlanner for Apache Calcite

Reading Time: 3 minutes In this blog, we will see different ways to manipulate a rel node tree using a HepPlanner. A basic understanding of Apache Calcite is necessary for this. Check out the homepage here: https://calcite.apache.org/ What is a HepPlanner? It is a rule-based planner that transforms a relational expression represented as a tree-like structure. It allows us to specify a condition to identify particular nodes of Continue Reading

Synchronous Testing In Akka ToolKit | Testing Classic Akka Actors

Reading Time: 3 minutes Akka is a free, open-source toolkit that simplifies the construction of concurrent and distributed systems and applications. In this blog, we are going to discuss testing Akka actors synchronously. People often say that testing Akka actors is confusing and tricky, but it isn’t. The Akka Toolkit has two types of testing, i.e. Synchronous Testing and Asynchronous Continue Reading

Understanding Akka Streams and Its Components

Reading Time: 4 minutes Overview In this blog, we’ll learn about Akka Streams and its components. We’ll also do a simple exercise that involves each of these components. Introduction Stream A stream is a flow of data that involves moving and transforming data. An element is the processing unit of the stream. Akka Streams In software development, there can be cases where we need to handle the potentially Continue Reading

How to do Unit testing using embedded PostgreSQL in Akka

Reading Time: 2 minutes Embedded PostgreSQL provides a platform-neutral way of running the PostgreSQL binary in unit tests. It is an efficient database for writing test cases, as it supports all the data types of PostgreSQL. In this blog I will not dive deep into the features of Embedded PostgreSQL but rather focus on its integration with an Akka application. I have added a sample project for better understanding. Continue Reading

Apache Calcite: Adding custom types and functions

Reading Time: 2 minutes Introduction In this blog we will introduce a custom function and type in our SQL. In the end, we want to parse, validate and convert to a relational node a simple query like “SELECT CAST(my_custom_function(name) as my_custom_type) FROM SAMPLE”. Setting up the basics A sample schema First we need a simple table named Sample: Sample(ID int not null, NAME varchar not null) FrameworkConfig Next we Continue Reading

Apache Beam Overview

Reading Time: 2 minutes This blog gives an overview of Apache Beam. What is Apache Beam? Apache Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. Moreover, the available open-source Beam SDKs can help us easily build a program for our pipeline. Apache Flink, Apache Spark, and Cloud Dataflow are some of the possible runners to run the program. Why use Continue Reading

Introduction to Akka Streams

Reading Time: 4 minutes Hey folks, let us understand the basics of Akka Streams. I hope you have a basic understanding of Akka Actors. What is Akka Streams? Akka Streams is a library to process and transfer a sequence of elements. It is built on top of Akka Actors to make the ingestion and processing of streams easy. As it is built on top of Akka Actors, it provides Continue Reading

Mailboxes in Akka

Reading Time: 5 minutes Mailboxes are one of the fundamental parts of the actor model. Through the mailbox mechanism, actors can decouple the reception of a message from its processing. So, let’s see how Akka Typed, the most famous incarnation of the actor system, implements the concept of mailboxes. Logging Users’ Navigations First, an actor is an object that carries out its actions in response to communications it receives. Hence, in Akka Typed, Continue Reading

Scheduler in Akka

Reading Time: 2 minutes The Akka Actor System provides the Akka Scheduler for managing the periodic execution of tasks. In this blog, we’ll see how we can schedule tasks using the Akka Scheduler. Dependency Let’s add the akka-actor dependency to our project: Single Execution Scheduler A single execution scheduler lets us defer the execution of a task. The task will execute after the configured delay. Let’s see how we can create a single Continue Reading

Backpressure in Akka Stream

Reading Time: 4 minutes “Reactive Streams” — whenever we come across these words, there are two things that come to our mind. The first is asynchronous stream processing, and the second is non-blocking backpressure. In this blog, we are going to learn about the latter part. Understanding Backpressure Very simply put, the idea behind backpressure is the ability to say “hey slow down!”. Let’s start with an example that Continue Reading

Set-up Kafka Cluster On GCP

Reading Time: 4 minutes In this article, we are going to create Kafka clusters on GCP. We can do it in various ways, like uploading the Kafka directory to GCP, creating multiple ZooKeepers, creating multiple copies of the server.properties file, etc. But in this article, we are doing it in a simpler way, i.e. by creating a Kafka cluster (with replication). Let’s start… What is GCP? GCP Continue Reading