Streaming

Let’s get to know Data Streaming: A dev’s point of view

Reading Time: 5 minutes Streaming of data is the need of the hour. This blog focuses on the developer’s need to process this stream, benefits, and the challenges it introduces.

Reading Avro files using Apache Flink

Reading Time: 2 minutes In this blog, we will see how to read the Avro files using Flink. Before reading the files, let’s get an overview of Flink. There are two types of processing – batch and real-time. Batch Processing: Processing based on the data collected over time. Real-time Processing: Processing based on immediate data for an instant result. Real-time processing is in demand and Apache Flink is the Continue Reading

Realtime Supply Chains

Reading Time: 7 minutes Supply chains is a serious topic and is critical to the survival of mankind. COVID has proven that the current supply chains performed reasonably well and ensured ‘essential’ goods are delivered. But, it was also evident that even with couple of months of disruption things have become very scary. There are several trends that are emerging. First is the impact of corona virus. Reconfiguring Global Continue Reading

Apache Spark: Delta Lake as a Solution – Part I

Reading Time: 3 minutes Today, everyone is talking about Delta Lake. Why? Ever tried to find the answer to this question? Yes or No doesn’t matter, don’t worry here in Part1 we will be discussing the same & also will be targetting the following questions: What are the features missing from Apache Spark? What kind of issues it causes in executing Data Lake? Answering the above questions will definitely Continue Reading

Apache Spark: Tricks to Increase Job Performance

Reading Time: 2 minutes Apache Spark is quickly adopting the Real-world and most of the companies like Uber are using it in their production. Spark is gaining its popularity in the market as it also provides you with the feature of developing Streaming Applications and doing Machine Learning, which helps companies get better results in their production along with proper analysis using Spark. Although companies are using Spark in Continue Reading

Data Lake – Build it in Phases

Reading Time: 3 minutes Data Lake – How to build a data lake and what are the phases involved in the same.

Kafka Timestamp Extractor

Reading Time: 3 minutes Hi folks, I hope you all’re doing well, so if you land up here you probably looking for Timestamp Extractor for kafka streams, so whats the buzz is all about? So in this blog we are going to look what it is and would explore it as well, so buckle up. The Timestamp Extractor As per docs, A timestamp extractor extracts a timestamp from an Continue Reading

Apache Spark

Deep Dive into Apache Spark Transformations and Action

Reading Time: 4 minutes In our previous blog of Apache Spark, we discussed a little about what Transformations & Actions are? Now we will get deeper into the topic and will understand what actually they are & how they play a vital role to work with Apache Spark? What is Spark RDD? Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable fault-tolerant, distributed collection of objects Continue Reading

Diving deeper into Delta Lake

Reading Time: 6 minutes Delta Lake is an open-source storage layer that brings reliability to data lakes. It has numerous reliability features including ACID transactions, scalable metadata handling, and unified streaming and batch data processing.

Delta Lake To the Rescue

Reading Time: 4 minutes Welcome Back. In our previous blogs, we tried to get some insights about Spark RDDs and also tried to explore some new things in Spark 2.4. You can go through those blogs here: RDDs – The backbone of Apache Spark Spark 2.4: Adding a little more Spark to your code In this blog, we will be discussing something called a Delta Lake. But first, let’s Continue Reading

Spark – Actions and Transformations

Reading Time: 4 minutes Hey guys, welcome to series of spark blogs, this blog being the first blog in this series we would try to keep things as crisp as possible, so let’s get started. So I recently get to start learning spark about believe me and now it has made me inquisitive about it, for a brief introduction of spark, I would say that it is a pretty Continue Reading

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced extensively in today’s world and it is going to be generated more rapidly in future. 90% of total data that is produced in the world is produced in last two years only and it is estimated that in 2020 world’s total data would reach 45 ZB and data generated each day would be enough that if we try to store it Continue Reading

Streaming data from Cassandra using Alpakka

Reading Time: 7 minutes Alpakka project is an open-source initiative to implement stream aware and reactive pipelines using Java and Scala which is built on top of Akka streams and specially designed to provide a DSL for reactive and stream-oriented programming with built-in support for backpressure to avoid the flood of data. As a reference, Akka streams supports reactive streams and JDK 9+ compliant implementation and therefore fully interoperable Continue Reading