Structured Streaming

Spark: Introduction to Datasets

In my previous blog, Spark: RDD vs DataFrames, I discussed the shortcomings of RDDs and how DataFrames overcome them. Now we’ll look at the shortcomings of DataFrames and how the Dataset API can overcome them. DataFrames: A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to a relational table with …

Spark Streaming vs. Structured Streaming

Fan of Apache Spark? I am too. The reason is simple: interesting APIs to work with, fast and distributed processing, far less disk I/O overhead than MapReduce thanks to in-memory computation, fault tolerance, and much more. With all this, you can do a lot in the world of big data and fast data. From “processing huge chunks of data” to “working on streaming data”, Spark handles both well. In this …

Spark Structured Streaming with Elasticsearch

We have been working with streaming data for quite some time, and Apache Spark makes that work much more convenient. Spark provides two APIs for streaming data. One is Spark Streaming, a separate library provided by Spark. The other is Structured Streaming, which is built on the Spark SQL library. We will discuss the trade-offs and differences between these two libraries in …

Structured Streaming: Philosophy behind it

In our previous blogs, Structured Streaming: What is it? and Structured Streaming: How it works?, we learned two major points about Structured Streaming. First, it is a fast, scalable, fault-tolerant stream processing API with end-to-end exactly-once guarantees that helps users build streaming applications. Second, it treats the live data stream as a table that is continuously appended to or updated, which allows us to express our streaming computation as …

Spark Structured Streaming: A Simple Definition

“Structured Streaming” is a term we hear quite a lot in the Apache Spark ecosystem these days, as it is being promoted as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a stream with structured data in it, very few of us know what exactly it is and where we can use it. So, in this blog post …
