Author: Rahul Agarwal

Creating a DataFrame in Apache Spark from scratch

Reading Time: 3 minutes. In Apache Spark, we have what's called a DataFrame, which is the primary abstraction that Spark provides. In this blog, we will learn how to create a DataFrame in Spark from scratch. Introduction: In broad terms, a DataFrame (DF) is a distributed, table-like structure with rows and columns and a well-defined schema. DataFrames can be constructed from a wide variety of sources such Continue Reading

Using Spark as a Database

Reading Time: 4 minutes. You must have heard that Apache Spark is a powerful distributed data processing engine. But did you know that Spark (with the help of Hive) can also act as a database? In this blog, we will learn how Apache Spark can be leveraged as a database by creating tables in it and querying them. Introduction: Since Spark is a database in itself, we Continue Reading

Understanding Java enums

A Guide to Method References in Java 8

Reading Time: 2 minutes. In this blog, we will learn about method references in Java and where and how to use them. What are Method References? Method references allow us to pass the name of a method where a functional interface is expected. This brings us to another question: what is a functional interface? Functional interfaces are interfaces with a single abstract method that encapsulate a single behaviour. For example, Runnable which Continue Reading
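As a rough illustration of the idea in the excerpt above, the following minimal sketch passes a method reference where a functional interface (`Function` or `Runnable`) is expected. The class name `MethodReferenceSketch` and the helpers `shout` and `applyAll` are hypothetical names invented for this example and are not taken from the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MethodReferenceSketch {

    // Hypothetical helper, used only to demonstrate a static method reference.
    public static String shout(String s) {
        return s.toUpperCase() + "!";
    }

    // Applies any Function<String, String> to each element of the list.
    public static List<String> applyAll(List<String> input, Function<String, String> fn) {
        return input.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java");

        // The lambda form and the method-reference form do the same thing here.
        List<String> viaLambda = applyAll(words, s -> shout(s));
        List<String> viaReference = applyAll(words, MethodReferenceSketch::shout);
        System.out.println(viaLambda.equals(viaReference));
        System.out.println(viaReference);

        // Runnable is a functional interface, so a no-argument method reference fits it.
        Runnable printer = System.out::println;
        printer.run();
    }
}
```

The method reference `MethodReferenceSketch::shout` is shorthand for the lambda `s -> shout(s)`; which form to prefer is largely a readability choice.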

Java 8 Futures: Introduction & Best Practices

Reading Time: 3 minutes. Hi there! Today, we are going to talk about Futures in Java and look at some of the best practices related to them. What are Java Futures and why do we need them? To understand this better, we must first understand what blocking is and why it is bad for our software. BLOCKING – A blocking/long-running call occurs when a thread is tied Continue Reading