Streaming

Reactivate your streams with Reactive Streams!

As you may know by now, streaming of big data has been one of the hot topics for quite some time. Day after day, we see more streaming technologies competing with one another. The reason is obvious: processing big volumes of data is not enough. We need real-time processing of data, especially when we need to handle continuously increasing…

Spark Streaming vs. Structured Streaming

Fan of Apache Spark? I am too. The reason is simple: interesting APIs to work with, fast and distributed processing, far less disk I/O overhead than MapReduce, fault tolerance, and much more. With all this, you can do a lot in the world of big data and fast data. From “processing huge chunks of data” to “working on streaming data”, Spark handles it all. In this…

Is Apache Flink the future of Real-time Streaming?

In our last blog, we discussed the latest version of Spark, i.e., 2.4, and the new features it introduced. While looking for ways to improve our performance, we got the chance to explore one of the major contenders in the race, Apache Flink. Apache Flink is an open-source platform which is a streaming…


Apache Spark 2.4: Adding a little more Spark to your code

Continuing with its objective of making Spark faster, easier, and smarter, Apache Spark recently shipped the fifth release in the 2.x line, i.e., Spark 2.4. We were lucky enough to experiment with it early in one of our projects. Today we will highlight the major changes in this version that we explored and experienced in our project. In our…

Alpakka – Connecting Kafka and ElasticSearch to Akka streams

In our previous blog, we looked at what Akka Streams are and how they differ from the other streaming mechanisms we have. In this blog, we take a step further into the world of Akka Streams. To work with Akka Streams, we need a mechanism to connect them to existing system components. That is where Alpakka…

Akka Streams: Is it a Solution to Your Streaming Problems?

A few days ago, we were using Spark Streaming in our project, and initially it worked like a charm. But as we got very close to completing our use case, the unexpected happened. Spark does have a lot of interesting features, but we had more custom needs, such as running many different jobs with different actors/flows. Also, we needed something which…

Spark Structured Streaming with Elasticsearch

We have been working on streaming data for a long time, and Apache Spark makes that very convenient. Spark provides two APIs for streaming data. One is Spark Streaming, a separate library provided by Spark. The other is Structured Streaming, which is built on the Spark SQL library. We will discuss the trade-offs and differences between these two libraries in…

Structured Streaming: Philosophy behind it

In our previous blogs, Structured Streaming: What is it? and Structured Streaming: How it works?, we learned two major points about Structured Streaming. First, it is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. Second, it treats the live data stream as a table that is continuously appended/updated, which allows us to express our streaming computation as…
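The idea of treating a live stream as a continuously appended table can be illustrated without Spark at all. The plain-Python sketch below is a conceptual model only, not Spark's implementation: on each trigger, newly arrived lines are appended to an unbounded "input table", and a running word count (the "result table") is updated from just those new rows instead of being recomputed from scratch. The class `UnboundedTableWordCount` and its `on_trigger` method are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List


class UnboundedTableWordCount:
    """Hypothetical model of the 'stream as an ever-growing table' idea.

    New rows are appended to an unbounded input table, and the result
    table (word counts) is updated incrementally from only the new rows.
    """

    def __init__(self) -> None:
        self.input_table: List[str] = []        # every line received so far
        self.result_table: Counter = Counter()  # running word counts

    def on_trigger(self, new_rows: Iterable[str]) -> Counter:
        """Process the rows that arrived since the previous trigger."""
        new_rows = list(new_rows)
        # Append the newly arrived data to the unbounded input table.
        self.input_table.extend(new_rows)
        # Incremental update: only the new rows are scanned.
        for line in new_rows:
            self.result_table.update(line.split())
        # Return a full snapshot of the result table after this trigger.
        return Counter(self.result_table)


if __name__ == "__main__":
    query = UnboundedTableWordCount()
    print(query.on_trigger(["cat dog", "dog dog"]))  # Counter({'dog': 3, 'cat': 1})
    print(query.on_trigger(["owl cat"]))             # Counter({'dog': 3, 'cat': 2, 'owl': 1})
```

Returning the whole result table on every trigger loosely mirrors Structured Streaming's "complete" output mode. Spark itself does not materialize the entire input table; it keeps only the intermediate state needed to update the result, so the `input_table` list here exists purely to make the model visible.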

Structured Streaming: How it works?

In our previous blog post, Structured Streaming: What is it?, we learned that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. Now it's time to learn how it works. In this blog post, we will walk through how a structured stream works using an example. So, let's take a…

Structured Streaming: What is it?

With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about common streaming-application issues, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business challenges. The reason is that these frameworks provide built-in support for all of them. For example, in Spark Streaming, by just adding…

Kafka and Spark Streams: The happily ever after!

Hi everyone! Today we are going to look at using Spark Streaming to transform and transport data between Kafka topics. The demand for stream processing is increasing every day, because processing big volumes of data is often not enough. We need real-time processing of data, especially when we need to handle continuously increasing volumes of data and also need…

KnolX: Understanding Spark Structured Streaming

Hello everyone, Knoldus organized a session on 5th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session.

Knolx: Guaranteed No Stress Baby Steps Using Akka Streams Part-II

Hello everyone, Knoldus organized a session on 25th November 2017. The topic was “Guaranteed No Stress Baby Steps Using Akka Streams Part-II”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session.