Author: Ayush Tiwari

The Rise Of Scanamo: Async Access For DynamoDB In Scala

Scanamo is a library that makes using DynamoDB from Scala simpler and less error-prone. Now the question is, “Why should anyone use it?” The answer is simple: the DynamoDB clients provided by AWS do not offer a Scala DSL, so a number of libraries have appeared for writing DynamoDB queries in Scala. But what makes Scanamo different from other Continue Reading

Spark Stream-Stream Join

Spark 2.3 added support for stream-stream joins, i.e., we can join two streaming Datasets/DataFrames. In this blog we are going to learn about Spark stream-stream joins and see how Spark now supports joining two streaming DataFrames. In this example, I am going to use Apache Spark 2.3.0, Apache Kafka 0.11.0.1, and Scala 2.11.8. The build.sbt looks like the following: scalaVersion := Continue Reading

SOLID PRINCIPLES

SOLID Principles: Basic building block of the software system

“A good software system begins with clean code”, and by “clean code” I mean code that is easy to understand and easy to change. This is where the SOLID principles come in: they help us write clean code, because code is like humour. “When you have to explain it, it’s bad.” The SOLID PRINCIPLES tell Continue Reading

HDFS Erasure Coding in Hadoop 3.0

HDFS Erasure Coding (EC) in Hadoop 3.0 solves a problem in earlier versions of Hadoop: the 3x replication factor, which is the simplest way to protect our data against DataNode failure but needs too much extra storage. With EC, the storage overhead drops to 50%, down from the earlier 200%, because of Continue Reading
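
As a rough worked example of the overhead figures quoted in this excerpt, here is the arithmetic under one assumption: a Reed-Solomon layout with six data blocks and three parity blocks, written RS(6,3). The excerpt itself does not name the coding scheme, so treat the layout as illustrative.

```latex
\[
\text{overhead} = \frac{\text{extra blocks stored}}{\text{data blocks}}
\]
\[
\text{3x replication: } \frac{3 - 1}{1} = 200\%
\qquad
\text{RS}(6,3)\text{ (assumed): } \frac{3}{6} = 50\%
\]
```

Under the same assumption, any three of the nine blocks can be lost and the data can still be rebuilt, whereas 3x replication survives the loss of two of its three copies.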

Quick Start with Finagle

Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. It implements uniform client and server APIs for several protocols, and is designed for high performance and concurrency. Most of Finagle’s code is protocol agnostic, simplifying the implementation of new protocols. Here I am going to implement a Finagle example in Scala, where I send a request with some message Continue Reading

Apache Storm: Architecture

Apache Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use! Components of a Storm cluster Apache Storm cluster Continue Reading

Apache Storm

Apache Storm: The Hadoop of Real-Time

Apache Storm is an open source, distributed stream processing computation framework written predominantly in the Clojure programming language. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm allows developers to build powerful applications that are highly responsive and can find trends between topics on Twitter, monitor spikes in payment failures, and Continue Reading

Basic Example for Spark Structured Streaming & Kafka Integration

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as Continue Reading
