Big Data and Fast Data

Flinkathon: What makes Flink better than Kafka Streams?

Before comparing the frameworks, I would like you all to consider a few questions: 1. Is there any similarity or basis for comparison between Flink and Kafka? 2. What does Flink do better than Kafka? 3. Is it the problem or the system requirements that determine which one to use? Before talking about Flink’s advantages and use cases over Kafka, let’s first understand their Continue Reading

Commit Log: A commitment that Cassandra provides.

Welcome back, everyone. I have been working with Cassandra for quite some time now but never actually got to explore its inner workings in depth. We know that its decentralized nature, as well as its ability to handle a large volume of writes, makes it really commendable. But how does it manage to be efficient? How is it able to achieve what it is so Continue Reading

Data Ingestion in Druid – Overview

Hey folks, Big Data is one of the most talked-about topics these days, and most of us are familiar with it as well as with the technologies around it. One of them is Druid, which is a distributed, column-oriented, real-time analytical data store. A quick recap on Druid can be found in the blog post https://blog.knoldus.com/introducing-druid-realtime-fast-data-analytics-database/. In this blog post, we will be discussing how data ingestion can be done in Continue Reading

Kafka: Consumer – Push vs Pull approach

Have you ever thought about the push vs. pull approach for a system, and which one suits which problem? Another question: why did Kafka choose a pull design over a push design for consumers? Before talking about the Kafka approach, that is, whether the broker should push data to the consumer or the consumer should pull it from Kafka, let’s first understand both of the approaches, as each one has its Continue Reading

Lagom Persistence API with Couchbase

In this blog, I will talk about using Couchbase with Lagom’s Persistent Entity API, and then we will see how to query the data. We already know that Lagom handles data persistence with a ‘Persistent Entity’, which holds the state of an individual entity. To interact with an entity, one must know its identifier. So Lagom provides the support to build Continue Reading

KSQL: Streams and Tables

By now you must be familiar with KSQL and how to get started with it. If not, check out Part 1 of this series, KSQL: Getting started with Streaming SQL for Apache Kafka. In this blog, we’ll move one step forward to get an understanding of the dual streaming model and see what abstractions KSQL uses to process the data. All the data that we Continue Reading

Integrating Cucumber with Akka-Http

Akka Persistence: Making Actor Stateful

Akka is a toolkit for designing scalable, resilient systems that span processor cores and networks. Akka allows you to focus on meeting business needs instead of writing low-level code to provide reliable behavior, fault tolerance, and high performance. An Akka actor can have state, but that state is lost when the actor is shut down or crashes. Fortunately, we can persist actor state using Akka Persistence, which is one of Akka Continue Reading

Flinkathon: Why Flink is better for Stateful Streaming applications?

Stream processing is a way to query a continuous stream of data and draw conclusions from it within the boundaries of a real-time scenario. For example, receiving an alert as soon as a fraudulent transaction is made with a credit/debit card. The two main types of stream processing are: Stateless: Where every event is handled completely independently of the preceding events. Stateful: Where a Continue Reading

Monitor a Kafka stream application with Graphite-Grafana using JMX metrics

A few days back, we got a requirement to monitor a Kafka stream application using JMX metrics. We looked for a solution and arrived at the one we will discuss in this blog. I will try to explain every component of the solution, along with the setup and integration of the whole system. Proposed solution: Service (application) exposes Continue Reading

Reactivate your streams with Reactive Streams!!

As you all might know by now, one of the hot topics for quite some time has been the streaming of big data. Day after day, we see tons of streaming technologies out there competing with one another. The obvious reason is that processing big volumes of data is not enough. We need real-time processing of data, especially when we need to handle continuously increasing Continue Reading

Spark: Introduction to Datasets

In my previous blog, Spark: RDD vs DataFrames, I discussed the shortcomings of RDDs and how DataFrames overcome them. Now we’ll look at the shortcomings of DataFrames and how the Dataset API can overcome them. DataFrames: A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to a relational table with Continue Reading

Akka Stream: Map And MapAsync

In this blog, we will discuss what “map” and “mapAsync” are when used in Akka Streams and how to use them. The difference is highlighted in their signatures: Flow.map takes a function that returns a type T, while Flow.mapAsync takes a function that returns a type Future[T]. Let’s take a practical example to understand both. Problem: Suppose we have a user with a userId and Continue Reading
