Streaming

Structured Streaming: Philosophy behind it

In our previous posts, Structured Streaming: What is it? and Structured Streaming: How it works?, we learned two major points about Structured Streaming. It is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. It treats the live data stream as a table that is continuously appended to or updated, which allows us to express our streaming computation as Continue Reading

Structured Streaming: How it works?

In our previous blog post, Structured Streaming: What is it?, we learned that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. Now it’s time to learn how it works. In this blog post, we will look at how a structured stream works via an example. So, let’s take a Continue Reading

Structured Streaming: What is it?

With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about issues common to streaming applications, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business challenges. The reason is that these frameworks provide built-in support for all of them. For example, in Spark Streaming, by just adding Continue Reading

Kafka And Spark Streams: The happily ever after !!

Hi everyone, today we are going to look at using Spark Streaming to transform and transport data between Kafka topics. The demand for stream processing is increasing every day. The reason is that, often, processing big volumes of data is not enough. We need real-time processing of data, especially when we need to handle continuously increasing volumes of data and also need Continue Reading

KnolX: Understanding Spark Structured Streaming

Hello everyone, Knoldus organized a session on 5th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides:

Knolx: Guaranteed No Stress Baby Steps Using Akka Streams Part-II

Hello everyone, Knoldus organized a session on 25th November 2017. The topic was “Guaranteed No Stress Baby Steps Using Akka Streams Part-II”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides:

Knolx: Guaranteed No Stress Baby Steps Using Akka Streams Part-I

Hello everyone, Knoldus organized a session on 28th October 2017. The topic was “Guaranteed No Stress Baby Steps Using Akka Streams Part-I”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides:

Rules while working with stream in Java 8

First, let’s have a basic understanding of streams. Then we will have a look at the side effects that can occur while working with streams. A stream represents a sequence of objects from a source, which supports aggregate operations. One thing to note while working with streams is that aggregate operations (intermediate operations) are lazily evaluated, i.e., they do not start processing the contents of Continue Reading

One-way & two-way streaming in a Lagom application

Nowadays, streaming is a buzzword, and you have probably heard of many types of streaming by now, e.g., Kafka streaming, Spark streaming, etc. In this blog, we will look at another type of streaming: Lagom streaming. Lagom streaming internally uses Akka Streams, with the help of which we will see one-way and two-way streaming. But before going forward, it Continue Reading

Apache Storm: Architecture

Apache Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use! Components of a Storm cluster: Apache Storm cluster Continue Reading

Case Study to understand Kafka Consumer and its offsets

In this blog post, we will mainly discuss the Kafka Consumer and its offsets. We will understand this using a case study implemented in Scala. This blog post assumes that you are aware of basic Kafka terminology. CASE STUDY: The Producer is continuously producing records to the source topic. The Consumer is consuming those records from the same topic, since it has subscribed to that topic. Continue Reading

Fetching data from different sources using Spark 2.1

What’s new in Apache Spark 2.2

Apache recently released a newer version of Spark, i.e., Apache Spark 2.2. The new version comes with improvements as well as new functionality. The major highlight of this release is Structured Streaming, which has been marked as production ready and has had its experimental tag removed. Some of the high-level changes and improvements: Production ready Structured Streaming; Expanding SQL functionalities; New Continue Reading

Basic Example for Spark Structured Streaming & Kafka Integration

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as Continue Reading
