Author: Ramandeep

Serialization in Kafka

Reading Time: 2 minutes Serialization is the process of converting an object into a stream of bytes for transmission. Kafka stores and transmits these arrays of bytes in its queue. Deserialization, as the name suggests, does the opposite of serialization: it converts an array of bytes back into the desired data type. Continue Reading
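The round trip the teaser describes can be sketched in a few lines. This is a minimal, self-contained illustration of the idea (object to bytes and back), not Kafka's actual StringSerializer; the class and method names here are illustrative.

```java
import java.nio.charset.StandardCharsets;

// A minimal sketch of serialization/deserialization: the same round trip a
// Kafka serializer/deserializer pair performs before bytes hit the wire.
public class SerDes {
    // Object -> bytes: what a producer-side serializer does
    static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> object: what a consumer-side deserializer does
    static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = serialize("kafka-message");
        System.out.println(deserialize(wire)); // round trip restores the value
    }
}
```

In real Kafka code, the producer and consumer are configured with matching serializer and deserializer classes so both sides agree on the byte format.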

Rebalancing: What the fuss is all about?

Reading Time: 4 minutes Apache Kafka is ruling the world of Big Data. It is not just a messaging queue but a full-fledged event streaming platform. We have looked through the basic idea of Kafka and what makes it faster than other messaging queues. You can read about it in my previous blog. We also looked at Partitions, Replicas, and ISR. We are now ready for our Continue Reading


Apache Kafka: Topic Partitions, Replicas & ISR

Reading Time: 6 minutes In earlier blogs, we have gone through the basic terminologies of Kafka and taken one step deeper into Zookeeper. Now let’s talk in detail about topic partitions and replicas. Topic Partitions A topic is a placeholder for your data in Kafka. Data on a topic is further divided into partitions. Each partition is an ordered, immutable sequence of records that is continually appended to a Continue Reading
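How a record lands on a particular partition can be sketched briefly. Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count; the sketch below uses Java's `hashCode()` as a stand-in for the hash, and the class name is illustrative.

```java
// Sketch of key-based partitioning: records with the same key always map to
// the same partition, which is what preserves per-key ordering in Kafka.
public class Partitioner {
    static int partitionFor(String key, int numPartitions) {
        // Kafka's default uses murmur2; hashCode() stands in here.
        // Masking the sign bit keeps the result non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println("user-42 -> partition " + partitionFor("user-42", 6));
    }
}
```

Because the mapping depends on the partition count, changing the number of partitions on an existing topic changes where new records with a given key land.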

Apache Zookeeper: Does Kafka need it?

Reading Time: 3 minutes In my previous blog, we started with what Kafka is and what makes Kafka fast. If you haven’t read it already, you should give it a read. We also talked briefly about Zookeeper. We know that Zookeeper keeps track of the status of the Kafka cluster nodes, and it also keeps track of Kafka topics, partitions, etc. But what else? In this blog, we will learn more Continue Reading

Apache Kafka: What & Why?

Reading Time: 6 minutes What is Apache Kafka? Apache Kafka is a well-known name in the world of Big Data. It is one of the most used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform. It is a framework for storing, reading, and analyzing streaming data. It is a publish-subscribe-based durable messaging system that exchanges data between processes, applications, and servers. Continue Reading

A tour to the Scala Futures

Reading Time: 6 minutes While executing long computations, performance is always a concern. Luckily, Futures come to our rescue. A Future gives you a simple way to run an algorithm concurrently. Futures are the standard mechanism for writing multithreaded code in Scala. Whenever we create a new Future operation, Scala runs that Future’s code on another thread, and after completion, it executes any provided Continue Reading
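The pattern the teaser describes, start a computation on another thread and attach a callback for when it completes, has a close Java analogue in `CompletableFuture`, shown here as a hedged stand-in for Scala's `Future { ... }.map(...)`; the method name is illustrative.

```java
import java.util.concurrent.CompletableFuture;

public class FutureDemo {
    // Run a computation asynchronously, then transform its result when done --
    // the shape Scala expresses as Future { ... }.map(...).
    static int doubleThenIncrement(int n) {
        CompletableFuture<Integer> result = CompletableFuture
                .supplyAsync(() -> n * 2)   // runs on the common ForkJoinPool
                .thenApply(x -> x + 1);     // callback executed after completion
        return result.join();               // block only at the very edge
    }

    public static void main(String[] args) {
        System.out.println(doubleThenIncrement(20)); // prints 41
    }
}
```

In Scala the thread pool is supplied explicitly via an implicit ExecutionContext rather than defaulting to the common pool, but the chaining style is the same.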

Do you really need Spark? Think Again!

Reading Time: 5 minutes With the massive growth of big data technologies today, it is becoming very important to use the right tool for every process. The process can be anything: data ingestion, data processing, data retrieval, data storage, etc. Today we are going to focus on one of those popular big data technologies, i.e., Apache Spark. Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Spark Continue Reading

Knolx: How Spark does it internally?

Reading Time: < 1 minute Knoldus organized a 30-minute session on Oct 12 at 3:30 PM. The topic was How Spark does it internally? Many people joined and enjoyed the session. I am going to share the slides and the video here. Please let me know if you have any questions related to the linked slides. How Spark Does It Internally? from Knoldus Inc. Here’s the video of the Continue Reading

kafka with spark

RDD: Spark’s Fault Tolerant In-Memory weapon

Reading Time: 5 minutes A fault-tolerant collection of elements that can be operated on in parallel: “Resilient Distributed Dataset”, a.k.a. RDD. An RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark: an immutable collection of objects computed on the different nodes of the cluster. Every dataset in a Spark RDD is logically partitioned across many servers so that they can be computed on Continue Reading
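The RDD contract in the teaser, an immutable collection transformed in parallel rather than mutated in place, can be illustrated on a single JVM with a parallel stream. This is a conceptual sketch only, not Spark's API, and the class name is illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RddSketch {
    // Immutable input, a transformation that produces a new collection, and
    // work spread across threads: the RDD idea in miniature, on one machine.
    static List<Integer> squareAll(List<Integer> data) {
        return data.parallelStream()        // partitions the work across cores
                   .map(n -> n * n)         // transformation, never mutation
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squareAll(List.of(1, 2, 3, 4, 5))); // [1, 4, 9, 16, 25]
    }
}
```

What Spark adds on top of this picture is distribution (partitions live on different nodes) and fault tolerance: each RDD remembers the lineage of transformations that produced it, so a lost partition can be recomputed.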


Spark Unconstructed | Deep dive into DAG

Reading Time: 4 minutes Apache Spark is all the rage these days. For people who work with Big Data, Spark is a household name. We have been using it for quite some time now, so we already know that Spark is a lightning-fast cluster-computing technology, faster than Hadoop MapReduce. If you ask any of these Spark techies how Spark is so fast, they would give you a Continue Reading

The Law of Demeter

Reading Time: 3 minutes You’ll often hear good programmers talk about having “loosely coupled” classes. What do they mean by that? Let’s understand this first before jumping into the Law of Demeter. Loosely Coupled In object-oriented design, the amount of coupling refers to how much the design of one class depends on the design of another class. In other words, how often do changes in class
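A short sketch of the idea, with hypothetical classes: a Demeter violation looks like `customer.getWallet().getMoney()`, where the caller reaches through one object into another's internals. The fix is to talk only to your direct collaborator.

```java
// Hypothetical example. Wallet guards its own state; nobody reaches inside it.
class Wallet {
    private double balance = 50.0;

    double withdraw(double amount) {
        if (amount > balance) throw new IllegalArgumentException("insufficient funds");
        balance -= amount;
        return amount;
    }
}

class Customer {
    private final Wallet wallet = new Wallet();

    // Demeter-friendly: ask the direct collaborator to do the work, instead of
    // exposing getWallet() and letting callers dig into Wallet's design.
    double pay(double amount) {
        return wallet.withdraw(amount);
    }
}

public class Demeter {
    public static void main(String[] args) {
        Customer customer = new Customer();
        System.out.println(customer.pay(20.0)); // prints 20.0
    }
}
```

With this shape, Wallet's internals can change freely (say, balance moves to a ledger) without touching any code that uses Customer, which is exactly the loose coupling described above.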

Clean Code – Robert C. Martin’s Way

Reading Time: 7 minutes Writing good code in accordance with all the best practices is often considered overrated. But is it really? Writing good, clean code is like any good habit: it comes with time and practice. We always make excuses to continue with our patently inefficient, bad code: no time for best practices, meeting deadlines, an angry boss, being tired of the project, etc. Most of Continue Reading