Big Data and Fast Data

Introduction to Scanamo Version – 1.0.0-M11

Reading Time: 2 minutes Scanamo is a library for using DynamoDB with Scala. In this blog, we are going to work with the latest version, 1.0.0-M11. So, let's get started: 1. Add library dependencies: set up an sbt project and add these library dependencies. 2. Set up application.conf: create an application.conf file and place it inside src/main/resources/. 3. Create a case class and DAO: create a case class Continue Reading
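
As a quick, hedged sketch of step 1, the dependency would typically be added like this (only the artifact coordinates and version come from the post; the model class mentioned in the comment is hypothetical):

```scala
// build.sbt -- Scanamo dependency for the version discussed in the post
libraryDependencies += "org.scanamo" %% "scanamo" % "1.0.0-M11"

// A hypothetical model for the case-class/DAO step; Scanamo maps case classes
// like this to DynamoDB items via a DynamoFormat instance in scope:
// final case class User(id: String, name: String)
```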

Introduction to Akka Http

Reading Time: 2 minutes Akka HTTP The Akka HTTP modules implement a full server- and client-side HTTP stack on top of akka-actor and akka-stream. You can read about akka-http in more depth here. We are going to use IntelliJ for the project setup and Scala as the programming language. The steps are as follows: 1. Importing library dependencies 2. Creating a User case class We are going to create Continue Reading
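
For illustration, a minimal Akka HTTP server with a User case class might look like the sketch below (the route, port, and field names are assumptions, and the binding call is the Akka HTTP 10.2+ style):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("akka-http-demo")

  // Hypothetical domain model, standing in for the User case class from the post
  final case class User(id: Int, name: String)

  // A single GET route that answers with plain text
  val route = path("health") {
    get {
      complete("OK")
    }
  }

  // Bind the route on localhost:8080 (Akka HTTP 10.2+ API)
  Http().newServerAt("localhost", 8080).bind(route)
}
```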

Introduction to Alpakka DynamoDB

Reading Time: 3 minutes Alpakka DynamoDB The Alpakka DynamoDB connector provides a flow for streaming DynamoDB requests, using Akka Streams and the AWS Java DynamoDB SDK. In this blog, we are going to work with version 1.1.2. So, let's get started: 1. Add library dependencies: set up an sbt project and add these library dependencies in build.sbt. 2. Set up application.conf: create an application.conf file and place it inside the Continue Reading
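
As a hedged sketch of step 1, the connector dependency for the version mentioned above would typically look like this in build.sbt:

```scala
// build.sbt -- Alpakka DynamoDB connector, version as per the post
libraryDependencies += "com.lightbend.akka" %% "akka-stream-alpakka-dynamodb" % "1.1.2"
```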

Rebalancing in Akka Cluster Sharding

Reading Time: 4 minutes In this blog we will discuss one of the important features of Akka Cluster Sharding: rebalancing. Before moving forward, make sure you have some basic knowledge of Akka Cluster Sharding; if not, please read Introduction to Akka Cluster Sharding and Implementing Akka Cluster Sharding. Before diving into this feature that Akka Sharding provides, let's first understand the need Continue Reading

Cloudstate (Part 3): Giving a Second-Thought to CRUD

Reading Time: 4 minutes In this blog post, we will take a slight detour from Cloudstate and understand why we need to give a second thought to the way CRUD operations are done in serverless computing. Before diving into why the CRUD strategy deserves reconsideration in serverless computing, let's first understand CRUD. What is CRUD? CRUD is an acronym for the four general operations that a database Continue Reading

A Quick Demo: Kafka to Flink to Cassandra

Reading Time: 3 minutes Hi folks! In this blog, we are going to learn how to integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine used for stateful computations over unbounded and bounded data streams. Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. Cassandra is a distributed, wide-column Continue Reading
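
To make the pipeline shape concrete, here is a rough sketch of reading from Kafka and writing to Cassandra with Flink's Scala API (the topic, keyspace, table, hosts, and the trivial transformation are made-up placeholders, and connector class names can vary slightly across Flink versions):

```scala
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.cassandra.CassandraSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object KafkaFlinkCassandra extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  // Kafka consumer configuration (placeholder broker/group values)
  val props = new Properties()
  props.setProperty("bootstrap.servers", "localhost:9092")
  props.setProperty("group.id", "flink-demo")

  // Source: read plain-text messages from a Kafka topic
  val source = new FlinkKafkaConsumer[String]("input-topic", new SimpleStringSchema(), props)

  // A trivial transformation: pair each message with its length
  val stream: DataStream[(String, Int)] =
    env.addSource(source).map(msg => (msg, msg.length))

  // Sink: write each tuple into a Cassandra table (keyspace/table are hypothetical)
  CassandraSink
    .addSink(stream)
    .setQuery("INSERT INTO demo_ks.messages(msg, len) VALUES (?, ?);")
    .setHost("127.0.0.1")
    .build()

  env.execute("kafka-flink-cassandra-demo")
}
```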

Loading JSON data into Snowflake

Reading Time: 4 minutes Have you ever faced a use case or scenario where you had to load JSON data into Snowflake? As we know, JSON is one of the common data formats for storing and exchanging information between systems, and it is a relatively concise format. If we are implementing a database solution, it is very common that we will come across a system that provides data in Continue Reading

Demystifying Dispatchers in Akka

Reading Time: 2 minutes As the name suggests, a dispatcher decides when an actor should dispatch a message and when it should not. If you think your actor system should be faster but it is not performing as expected, dispatchers are the first place you should look. It's up to the dispatchers to decide when to allow the actor to process Continue Reading
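
As a small, hedged illustration: a custom dispatcher is typically defined in application.conf and assigned to an actor when it is created. The dispatcher name and the actor below are assumptions, not taken from the post:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// A trivial actor used only to demonstrate assigning a dispatcher
class Worker extends Actor {
  def receive: Receive = {
    case msg => println(s"processing $msg on ${Thread.currentThread().getName}")
  }
}

object DispatcherDemo extends App {
  val system = ActorSystem("dispatcher-demo")

  // Assumes a dispatcher named "my-dispatcher" is configured in application.conf,
  // e.g. a fork-join-executor with its own parallelism settings.
  val worker = system.actorOf(Props[Worker]().withDispatcher("my-dispatcher"), "worker")

  worker ! "hello"
}
```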

Flink: Join two Data Streams

Reading Time: 3 minutes Apache Flink offers a rich set of APIs and operators that make Flink application developers productive when dealing with multiple data streams. Flink provides many multi-stream operations like Union, Join, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window. Let's say we have one stream which contains salary information of all Continue Reading
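
Here is a rough sketch of what such a window join could look like with Flink's Scala API (the case classes, key, and window size are hypothetical, and a processing-time window is used to keep the example self-contained):

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object WindowJoinDemo extends App {
  // Hypothetical records for the two streams
  case class Salary(empId: Int, amount: Double)
  case class Department(empId: Int, name: String)

  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val salaries: DataStream[Salary] =
    env.fromElements(Salary(1, 1000.0), Salary(2, 2000.0))
  val departments: DataStream[Department] =
    env.fromElements(Department(1, "engineering"), Department(2, "sales"))

  // Join the two streams on empId within a common 10-second tumbling window
  val joined: DataStream[(Int, String, Double)] = salaries
    .join(departments)
    .where(_.empId)
    .equalTo(_.empId)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
    .apply((s, d) => (s.empId, d.name, s.amount))

  joined.print()
  env.execute("window-join-demo")
}
```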

DevOps Shorts: How to increase the replication factor for a Kafka topic

Reading Time: 2 minutes Have you ever faced a situation where you had to increase the replication factor for a topic? It turns out it's really easy to do. In this super-short blog, let's try to do just that. We'll start by creating a topic, one, with a replication factor of just 1, and then work on the bits that include creating the increase.json file and then actually triggering the plan. Step 1: Create Continue Reading
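
For a concrete picture of those steps, here is roughly what they look like with the standard Kafka CLI tools (the broker IDs, partition count, and bootstrap address are placeholders, and older Kafka versions use --zookeeper instead of --bootstrap-server for the reassignment tool):

```sh
# Step 1: create the topic "one" with a replication factor of 1 (placeholder broker address)
kafka-topics.sh --create --topic one --partitions 1 --replication-factor 1 \
  --bootstrap-server localhost:9092

# Step 2: describe the desired assignment in increase.json (broker IDs 1,2,3 are placeholders)
cat > increase.json <<'EOF'
{"version": 1,
 "partitions": [
   {"topic": "one", "partition": 0, "replicas": [1, 2, 3]}
 ]}
EOF

# Step 3: trigger the reassignment plan
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file increase.json --execute
```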

Supervising Actors in Akka

Reading Time: 3 minutes After going through the previous blogs, we are now familiar with Akka Actors, their implementation, and the Ask pattern. In this blog, we are going to discuss supervision and various supervision strategies. So, let's begin. What is supervision? In case of failure, rather than forcing it back on the caller (customer), we prefer to handle it internally. Within Akka, this is done using a technique Continue Reading
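
A minimal sketch of a classic-actors supervisor is shown below (the child actor, retry limits, and the mapping from exceptions to directives are illustrative choices, not prescriptions):

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart, Resume, Stop}
import scala.concurrent.duration._

// A child actor that may fail while processing messages
class FailingWorker extends Actor {
  def receive: Receive = {
    case "boom" => throw new IllegalStateException("simulated failure")
    case msg    => println(s"handled: $msg")
  }
}

// The parent decides, per failure type, what should happen to the child
class Supervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: ArithmeticException   => Resume   // keep state, continue
      case _: IllegalStateException => Restart  // recreate the child
      case _: NullPointerException  => Stop     // stop the child
      case _: Exception             => Escalate // let my own parent decide
    }

  private val worker = context.actorOf(Props[FailingWorker](), "worker")

  def receive: Receive = {
    case msg => worker.forward(msg)
  }
}

object SupervisionDemo extends App {
  val system = ActorSystem("supervision-demo")
  val supervisor = system.actorOf(Props[Supervisor](), "supervisor")
  supervisor ! "boom" // triggers the Restart directive above
}
```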

Using Spark as a Database

Reading Time: 4 minutes You must have heard that Apache Spark is a powerful distributed data processing engine. But did you know that Spark (with the help of Hive) can also act as a database? So, in this blog, we will learn how Apache Spark can be leveraged as a database by creating tables in it and querying them. Introduction Since Spark is a database in itself, we Continue Reading
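
For a feel of what "Spark as a database" means in practice, here is a hedged sketch using Spark SQL with Hive support (the database, table, and rows are made up, and enableHiveSupport needs the spark-hive module on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object SparkAsDatabase extends App {
  // Hive support lets Spark persist table metadata across sessions via a metastore
  val spark = SparkSession.builder()
    .appName("spark-as-database")
    .master("local[*]")
    .enableHiveSupport()
    .getOrCreate()

  // Create a database and a managed table, then query it -- all through Spark SQL
  spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
  spark.sql("CREATE TABLE IF NOT EXISTS demo_db.users (id INT, name STRING)")
  spark.sql("INSERT INTO demo_db.users VALUES (1, 'alice'), (2, 'bob')")
  spark.sql("SELECT * FROM demo_db.users").show()

  spark.stop()
}
```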