Spark Stream-Stream Join


Spark 2.3 added support for stream-stream joins, i.e., we can join two streaming Datasets/DataFrames. In this blog we are going to see how Spark now supports joining two streaming DataFrames. In this example,…

SOLID Principles: Basic building block of the software system


"A good software system begins with clean code", and by "clean code" I mean code that is easy to understand and easy to change. This is where we need to know about the SOLID principles, which…

HDFS Erasure Coding in Hadoop 3.0


HDFS Erasure Coding (EC) in Hadoop 3.0 is the solution to a problem in earlier versions of Hadoop: the 3x replication factor, which is the simplest way to protect our data even in…

Quick Start with Finagle


Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. Finagle implements uniform client and server APIs for several protocols, and is designed for high performance and concurrency. Most of Finagle’s code is protocol agnostic, simplifying…

Apache Storm: Architecture


Apache Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used…

Apache Storm: The Hadoop of Real-Time


Apache Storm is an open-source, distributed stream processing framework written predominantly in the Clojure programming language. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.…

Basic Example for Spark Structured Streaming & Kafka Integration


The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration…