Author: Himanshu Gupta

Structured Streaming: Philosophy behind it

In our previous blogs, Structured Streaming: What is it? and Structured Streaming: How it works?, we got to know two major points about Structured Streaming. It is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications, and it treats the live data stream as a table that is being continuously appended/updated, which allows us to express our streaming computation as Continue Reading
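
The stream-as-table idea described above can be illustrated with a short sketch. Below is a minimal Scala example of expressing a streaming word count with the ordinary DataFrame API; the socket source, port, and console sink are assumptions chosen for illustration, not details taken from the post.

    import org.apache.spark.sql.SparkSession

    object StreamAsTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("stream-as-table")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // The socket source is treated as an unbounded input table;
        // every new line arriving on the socket is a new row appended to it.
        val lines = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .load()

        // Ordinary DataFrame/Dataset operations express the streaming computation.
        val wordCounts = lines.as[String]
          .flatMap(_.split(" "))
          .groupBy("value")
          .count()

        // The result table is continuously updated and written to the console sink.
        wordCounts.writeStream
          .outputMode("complete")
          .format("console")
          .start()
          .awaitTermination()
      }
    }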

Structured Streaming: How it works?

In our previous blog post, Structured Streaming: What is it?, we got to know that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. Now it’s time to learn how it works. So, in this blog post, we will look at the working of a structured stream via an example. Let’s take a Continue Reading

Structured Streaming: What is it?

With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about issues related to a streaming application, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business challenges. The reason is that these frameworks provide inbuilt support for all of them. For example, in Spark Streaming, by just adding Continue Reading
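
A minimal sketch of what that inbuilt support can look like, assuming the truncated example refers to Spark Streaming’s checkpointing; the checkpoint directory, batch interval, and socket source below are illustrative, not taken from the post.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object CheckpointedStreamingApp {
      def main(args: Array[String]): Unit = {
        val checkpointDir = "/tmp/spark-checkpoint" // illustrative path

        // All DStream logic lives inside the creating function so that the
        // context can be rebuilt from the checkpoint after a failure.
        def createContext(): StreamingContext = {
          val conf = new SparkConf().setAppName("checkpointed-app").setMaster("local[2]")
          val ssc = new StreamingContext(conf, Seconds(10))
          ssc.checkpoint(checkpointDir) // the one-line addition that enables recovery
          ssc.socketTextStream("localhost", 9999).count().print()
          ssc
        }

        // Restores the previous context from the checkpoint if one exists,
        // otherwise creates a fresh one.
        val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
        ssc.start()
        ssc.awaitTermination()
      }
    }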

A Beginner’s Guide to Deploying a Lagom Service Without ConductR

How do you deploy a Lagom service without ConductR? This question has been asked and answered by many people on different forums. For example, take a look at this question on StackOverflow: Lagom without ConductR? Here the user wants to know whether it is possible to use Lagom in production without ConductR. The best answer that came up was: “Yes, it is Continue Reading

KnolX: Understanding Spark Structured Streaming

Hello everyone, Knoldus organized a session on 5th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides:

KnolX: Learning Kafka Streams with Scala

Hello everyone, Knoldus organized a session on 22nd September 2017. The topic was “Learning Kafka Streams with Scala”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides: Video: If you have any queries, please feel free to comment below.

Self-Learning Kafka Streams with Scala – #2

In our previous blog, Self-Learning Kafka Streams with Scala – #1, we saw how to create a simple KStream in Scala. In this blog, we will see how to transform a KStream and create a new stream from it. But before we get into the details of KStream transformations, let’s take a look at the code:
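
The code the excerpt refers to is not shown here, so as a stand-in, here is a minimal Scala sketch of deriving a new stream from a KStream using the Kafka Streams Java API; the topic names, the uppercasing transformation, and the configuration values are assumptions for illustration.

    import java.util.Properties

    import org.apache.kafka.common.serialization.Serdes
    import org.apache.kafka.streams.kstream.{KStream, ValueMapper}
    import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

    object KStreamTransformation {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kstream-transformation") // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")      // illustrative
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass.getName)
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass.getName)

        val builder = new StreamsBuilder()

        // Read the source topic as a KStream of String keys and values.
        val source: KStream[String, String] = builder.stream[String, String]("source-topic")

        // Derive a new stream by transforming the value of every record.
        val upperCased: KStream[String, String] = source.mapValues(
          new ValueMapper[String, String] {
            override def apply(value: String): String = value.toUpperCase
          }
        )

        // Write the transformed stream to a new topic.
        upperCased.to("sink-topic")

        val streams = new KafkaStreams(builder.build(), props)
        streams.start()
        sys.addShutdownHook(streams.close())
      }
    }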

Self-Learning Kafka Streams with Scala – #1

A few days ago, I came across a situation where I wanted to do a stateful operation on streaming data. So, I started looking for possible solutions and came across many that used different technologies, like Spark Structured Streaming, Apache Flink, Kafka Streams, etc. All of them solved my problem, but I selected Kafka Streams because it met most of my Continue Reading

Spark Structured Streaming: A Simple Definition

“Structured Streaming” is a term we are hearing quite a lot in the Apache Spark ecosystem nowadays, as it is being preached as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a stream having structured data in it, very few of us know what exactly it is and where we can use it. So, in this blog post Continue Reading

Apache Spark: 3 Reasons Why You Should Not Use RDDs

Apache Spark: whenever we hear these two words, the first thing that comes to our mind is RDDs, i.e., Resilient Distributed Datasets. It has been more than 5 years since Apache Spark came into existence, and after its arrival a lot of things changed in the big data industry. But the major change was the dethroning of Hadoop MapReduce. Spark literally replaced MapReduce, and this happened Continue Reading

Hello React #2: Smallest Example in React

Hello folks, in our previous blog post, Hello React #1: Creating a Single Page Application with React, we saw how to create a SPA with React, but we didn’t get into its details. Here is the link for a quick recap. Before we start building production-grade applications with React, let’s create a small example in React which would give us an idea as to Continue Reading

Hello React #1: Creating a Single Page Application with React

A few days ago, we were looking for a JavaScript library that is flexible and can be used in a variety of projects; basically, something with which we can create new apps, or which we can introduce into an existing project without rewriting it. In our search, we came across React, a JavaScript library for building user interfaces, and AngularJS, a JavaScript MVW framework. Before we start creating a single-page application with React Continue Reading

Partition-Aware Data Loading in Spark SQL

Data loading, in Spark SQL, means loading data into the memory/cache of the Spark worker nodes, for which we usually write the following code:

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.table", connectionProperties)

Here we are using the jdbc function of Spark SQL's DataFrameReader API to load the data from the table into the Spark executors' memory, no matter how many rows are Continue Reading
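
For contrast with the call above, Spark SQL’s DataFrameReader also has a partition-aware jdbc overload that splits the load across slices of a numeric column. A minimal sketch follows; the partitioning column, bounds, and partition count are illustrative, and the excerpt does not show whether the post ultimately settles on exactly this overload.

    import java.util.Properties

    import org.apache.spark.sql.SparkSession

    object PartitionAwareLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partition-aware-load")
          .getOrCreate()

        val connectionProperties = new Properties()
        connectionProperties.put("user", "username")
        connectionProperties.put("password", "password")

        // Each of the 10 partitions issues its own JDBC query over a slice of the
        // "id" range, so the table is loaded in parallel across the executors.
        val jdbcDF = spark.read.jdbc(
          "jdbc:postgresql:dbserver", // JDBC URL (same as the excerpt above)
          "schema.table",             // table to load
          "id",                       // illustrative numeric partitioning column
          1L,                         // lower bound of the column values
          100000L,                    // upper bound of the column values
          10,                         // number of partitions
          connectionProperties
        )

        jdbcDF.show()
      }
    }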
