Author Archives: Ayush Tiwari

HDFS Erasure Coding in Hadoop 3.0


HDFS Erasure Coding (EC) in Hadoop 3.0 addresses a problem in earlier versions of Hadoop: the 3x replication factor, which is the simplest way to protect our data even in … Continue reading

Posted in Apache Spark, HDFS, Scala, Spark | 1 Comment

Quick Start with Finagle


Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. Finagle implements uniform client and server APIs for several protocols, and is designed for high performance and concurrency. Most of Finagle’s code is protocol agnostic, simplifying … Continue reading

Posted in Scala, Web Services | Leave a comment

Apache Storm: Architecture


Apache Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used … Continue reading

Posted in big data, Clojure, Scala, Streaming | 2 Comments

Apache Storm: The Hadoop of Real-Time


Apache Storm is an open source, distributed stream processing framework written predominantly in the Clojure programming language. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. … Continue reading

Posted in Scala | 2 Comments

Basic Example for Spark Structured Streaming & Kafka Integration


The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration … Continue reading

Posted in Scala, Spark, Streaming | 10 Comments