Big Data and Fast Data

Backpressure in Akka Stream

Reading Time: 4 minutes “Reactive Streams”: whenever we come across these words, two things come to mind. The first is asynchronous stream processing, and the second is non-blocking backpressure. In this blog, we are going to learn about the latter. Understanding Backpressure Very simply put, the idea behind backpressure is the ability to say “Hey, slow down!”. Let’s start with an example that Continue Reading

Set-up Kafka Cluster On GCP

Reading Time: 4 minutes In this article, we are going to create Kafka clusters on the GCP platform. We can do this in various ways, such as uploading the Kafka directory to GCP, creating multiple ZooKeeper nodes, or creating multiple copies of the server.properties file. In this article, however, we take a simpler route: creating a Kafka cluster (with replication). Let’s start… What is GCP? GCP Continue Reading

Debugging Apache Beam Pipeline

Reading Time: 2 minutes Overview Apache Beam is one of the widely used frameworks for stream and batch processing in a distributed environment and provides some unique features. It is an open-source, unified bulk data processing framework that supports data processing through various SDKs that allow the execution of pipelines in different processing engines/runners. Apache Beam runners: Spark, Flink, Apex, Google Cloud Dataflow, DirectRunner. A Continue Reading

Fault tolerance and Resiliency in Apache Kafka.

Reading Time: 5 minutes Kafka is known for its performance, resiliency, and fault tolerance. In this article, we’ll see how to change the configuration to achieve the fault tolerance and resilience that an architecture needs. Before starting the article, we need to have basic knowledge of Kafka, or we can go through the documentation. Apache Kafka is a distributed system, and the term fault tolerance is very Continue Reading

Akka Actors: How do they actually work

Reading Time: 2 minutes You might have used Akka actors for building concurrent and distributed systems. But do you know how actors actually work under the hood? In this blog, let us understand how Akka actors actually work. First of all, let us understand what Akka actors are. Introduction Akka is an open-source library that helps to easily develop concurrent and distributed applications using Java or Scala by leveraging the Continue Reading

Complete CI/CD Pipeline For MicroService Using Akka Http And BitBucket

Reading Time: 4 minutes In this blog, we will see how to set up a Bitbucket Pipeline to get CI/CD for your Akka HTTP application. We will be deploying the application on a Heroku server, but in this blog we will also see how to dockerize the application and push the image to Docker Hub, from where it can be deployed on any container orchestration platform. Let’s get Continue Reading

Build your first API with Scala and Akka HTTP

Reading Time: 3 minutes Akka is a free and open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM. It is built by Lightbend. Akka supports multiple programming models for concurrency, but it prefers actor-based concurrency. One can integrate this library into any JVM-supported language. It implements the actor-based model. The Actor Model provides a higher level of abstraction for writing concurrent and Continue Reading

How to delete record from Kafka Topic : Tombstone

Reading Time: 4 minutes Hello Reader, here we will see how we can delete records from a Kafka topic (a compacted topic as well as a non-compacted topic). Problem: GDPR: General Data Protection Regulation is a regulation that requires businesses to protect the personal data and privacy of EU citizens for transactions that occur within EU member states. CCPA: The California Consumer Privacy Act is a state-wide data privacy law that regulates Continue Reading

Akka Toolkit | Creating your First Akka Actor | Scala

Reading Time: 3 minutes Akka is a free, open-source toolkit that simplifies the construction of concurrent and distributed systems and applications. In this blog, we are going to discuss Akka and actors, and finally create and run our first actor. Akka – Again, it is a free and open-source toolkit and runtime. It is used to develop highly concurrent, distributed, and fault-tolerant message-driven applications on the JVM (Java Virtual Machine). It includes features for Continue Reading

Creating a DAG in Apache Airflow

Reading Time: 4 minutes In my previous blog, I discussed Airflow – A Workflow Manager. In this blog, we will write a DAG for Airflow that defines a workflow of tasks and their dependencies. Before writing a DAG file, we will first look into the operators that can be used while writing a DAG. Airflow Operators An operator represents a single, ideally idempotent, task. Operators determine what actually Continue Reading

Apache Airflow – A Workflow Manager

Reading Time: 4 minutes As the industry becomes more data-driven, we need solutions that can process the large amounts of data required. A workflow management system provides an infrastructure for the set-up, performance and monitoring of a defined sequence of tasks, arranged as a workflow application. Workflow management has become such a common need that most Continue Reading

Beginners Level: Akka Typed API

Reading Time: 3 minutes In this blog, I will be explaining the Akka Typed API. This is going to be my first blog on Akka Typed, so let us name it “Beginner Level: Akka Typed API”. Here, I will explain the reasons for preferring Akka Typed over untyped. Along with that, I will also be demonstrating some implementations with Akka Typed. Now before heading towards Akka Typed API, Continue Reading

Writing Unit Test for Apache Spark using Memory Streams

Reading Time: 2 minutes In this post, we are going to look into how we can leverage Apache Spark’s memory streams for unit testing. What is it? Apache Spark’s memory stream is a concrete streaming source for the memory data source that supports reading in micro-batch stream processing. Let’s jump into it. We will be using a memory stream, writing some test data in memory as a stream. We Continue Reading