Author: Ayush Hooda

Akka: A modern need

Akka has been in the news for some time now, and there are good reasons for that. But before moving on to those key points, let's review what Akka actually is.

CRT020: Databricks Spark Certification

Last week, I passed the Databricks Spark certification exam. Here is the link to the exam. In this post, I'll try to cover everything you need to know to pass it.


Scala: Extractors and Pattern Matching

An extractor in Scala is an object that has an unapply method as one of its members. Often, the extractor object also defines an apply method for building values, but this is not required. An apply method works like a constructor: it takes arguments and creates an object. The unapply method does the opposite: it takes an object and tries to give back the arguments. The unapply method reverses the construction procedure of the …
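To make the apply/unapply relationship concrete, here is a minimal sketch. The `Email` extractor and the `describe` helper are hypothetical names chosen for illustration, not taken from the post: `apply` builds a string from its parts, `unapply` tries to take it apart again, and pattern matching calls `unapply` for us.

```scala
// Hypothetical Email extractor: apply builds a value, unapply takes it apart.
object Email {
  // apply acts like a constructor: (user, domain) -> "user@domain"
  def apply(user: String, domain: String): String = s"$user@$domain"

  // unapply reverses apply: it tries to recover (user, domain) from a string
  // and returns None when the string does not look like an email address.
  def unapply(str: String): Option[(String, String)] = {
    val parts = str.split("@")
    if (parts.length == 2 && parts(0).nonEmpty && parts(1).nonEmpty) Some((parts(0), parts(1)))
    else None
  }
}

// The `case Email(user, domain)` pattern calls Email.unapply behind the scenes.
def describe(s: String): String = s match {
  case Email(user, domain) => s"user=$user, domain=$domain"
  case _                   => "not an email"
}

val address = Email("jane", "example.com") // calls Email.apply
println(address)                 // jane@example.com
println(describe(address))       // user=jane, domain=example.com
println(describe("hello world")) // not an email
```

Returning `Option` is what lets a failed match fall through to the next `case` instead of throwing.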

Knolx: Structured Streaming in Spark

Knoldus organized a session on 8th February 2019 on the topic “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session. If you have any queries, please feel free to comment below.

Scala: Type Bounds

In my last blog, we discussed type parameterization concepts such as generics, generic classes, and variance. As a follow-up, this blog discusses type bounds. Type Bounds: How can we restrict a type parameter? We do this with Scala type bounds. In Scala, type bounds are restrictions on type parameters or type variables. By using type …
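A short sketch of the two kinds of bounds Scala offers, assuming a small hypothetical `Animal`/`Dog`/`Cat` hierarchy: an upper bound (`A <: Animal`) limits a type parameter to a type and its subtypes, while a lower bound (`B >: A`) allows a supertype of `A`, which is how an element of a wider type can be prepended to a list.

```scala
// Hypothetical class hierarchy used only for illustration.
abstract class Animal { def name: String }
class Dog extends Animal { val name = "Dog" }
class Cat extends Animal { val name = "Cat" }

// Upper bound: A must be Animal or a subtype, so calling .name is allowed.
class PetContainer[A <: Animal](val pet: A) {
  def petName: String = pet.name
}

// Lower bound: B must be A or a supertype of A; the result widens to List[B].
def prependTo[A, B >: A](elem: B, list: List[A]): List[B] = elem :: list

val dogs = new PetContainer(new Dog)
println(dogs.petName) // Dog

// A = Dog, and B is inferred as Animal so that both Cat and Dog fit.
val animals: List[Animal] = prependTo(new Cat, List(new Dog))
println(animals.map(_.name)) // List(Cat, Dog)

// new PetContainer("not an animal") // would not compile: String is not a subtype of Animal
```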

Scala: Generic classes and Variance

Generic classes are classes that take a type as a parameter. This means one class can be used with different types without writing it out multiple times. They are particularly useful for collection classes. Defining a generic class: Generic classes take a type as a parameter within square brackets [ ]. One convention is to use the letter A as the type parameter identifier, though …
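As a rough illustration of the square-bracket syntax and the `A` convention, here is a minimal generic `Stack[A]` (a textbook-style example; the names are illustrative, not necessarily the post's own code), followed by a hypothetical covariant `Box[+A]` to show what variance changes.

```scala
// A minimal generic stack; A is the type parameter.
class Stack[A] {
  private var elements: List[A] = Nil

  def push(x: A): Unit = elements = x :: elements
  def peek: A = elements.head
  def pop(): A = {
    val top = elements.head
    elements = elements.tail
    top
  }
  def size: Int = elements.size
}

val ints = new Stack[Int]
ints.push(1)
ints.push(2)
println(ints.pop()) // 2
println(ints.peek)  // 1

val strings = new Stack[String] // same class, different type argument
strings.push("scala")
println(strings.peek) // scala

// Variance: with +A, Box[String] is a subtype of Box[Any] because String <: Any.
final case class Box[+A](value: A)
val anyBox: Box[Any] = Box[String]("hi")

// val s: Stack[Any] = new Stack[String] // would not compile: Stack[A] is invariant
```

`Stack` has to stay invariant because `push` takes an `A` as input; a covariant `Box` is safe here only because it never accepts an `A` after construction.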

Spark: Introduction to Datasets

In my previous blog, Spark: RDD vs DataFrames, I discussed the shortcomings of RDDs and how DataFrames overcome them. Now we'll look at the shortcomings of DataFrames and how the Dataset API can overcome them. DataFrames: A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to the relational tables with …

Spark: RDD vs DataFrames

Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. One use of Spark SQL is to execute SQL queries. When running SQL from within another …

Functional Programming: Lambda Calculus

We have already explored an introduction to FP in my previous blog. Once you get into FP, you'll quickly start hearing the terms lambda and lambda calculus. Lambda: Lambda is simply the symbol λ that Alonzo Church chose when he first defined the concept of a function. In modern functional programming, lambda means “anonymous function”. Calculus: A calculus is a formal system in mathematical logic for expressing …
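In Scala, "lambda" in the anonymous-function sense looks like the sketch below. The second half uses Church numerals, a standard lambda-calculus encoding in which the number n means "apply f n times"; whether the post itself covers them is not stated, and the numerals are specialised to `Int => Int` here purely to keep the types simple.

```scala
// A named function and the equivalent anonymous function (a "lambda").
def addOneNamed(x: Int): Int = x + 1
val addOne: Int => Int = x => x + 1 // λx. x + 1

// Applying a lambda directly: (λx. x + 1) 41 reduces to 41 + 1.
println(((x: Int) => x + 1)(41)) // 42

// Church numerals: n is represented as "apply f to x, n times".
type Church = (Int => Int) => Int => Int

val zero: Church = f => x => x
def succ(n: Church): Church = f => x => f(n(f)(x))
def plus(m: Church, n: Church): Church = f => x => m(f)(n(f)(x))

// Convert back to an Int by applying +1 to 0 the encoded number of times.
def toInt(n: Church): Int = n(_ + 1)(0)

val one = succ(zero)
val two = succ(one)
println(toInt(plus(two, one))) // 3
```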

HDFS: A Conceptual View

There has been a significant boom in distributed computing over the past few years. Various components communicate with each other over a network despite being deployed on different physical machines. A distributed file system (DFS) is a file system with data stored on a server. The data is accessed and processed as if it were stored on the local client machine. The DFS makes it convenient to share information …

Back2Basics: Algebraic Data Types (ADTs)

To understand ADTs, we first have to understand the meaning of the word “algebra”. Algebra: Algebra is basically the study of mathematical symbols and the rules for manipulating these symbols. So, logically, an algebra can be thought of as consisting of two things: a set of objects, and the operations that can be applied to those objects to create new objects. Numeric algebra: Numeric algebra is the …
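In Scala, ADTs are usually written with case classes (product types, "this AND that") and sealed traits (sum types, "this OR that"). The sketch below uses hypothetical `Point` and `Shape` types; operations over the set of values are defined by pattern matching on each case, which mirrors the "set of objects plus operations" view of an algebra.

```scala
// Product type: a Point holds an Int AND an Int.
final case class Point(x: Int, y: Int)

// Sum type: a Shape is a Circle OR a Rectangle.
// `sealed` means every case is known at compile time.
sealed trait Shape
final case class Circle(center: Point, radius: Double) extends Shape
final case class Rectangle(topLeft: Point, width: Double, height: Double) extends Shape

// An operation on the algebra, defined case by case; because Shape is sealed,
// the compiler can warn if a case is missing.
def area(s: Shape): Double = s match {
  case Circle(_, r)       => math.Pi * r * r
  case Rectangle(_, w, h) => w * h
}

val shapes: List[Shape] = List(Circle(Point(0, 0), 1.0), Rectangle(Point(1, 1), 2.0, 3.0))
println(shapes.map(area)) // List(3.141592653589793, 6.0)
```

The "algebra" name shows up when counting values: a pair of Booleans has 2 × 2 = 4 possible values, while `Either[Boolean, Boolean]` has 2 + 2 = 4.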

Dependency Injection: The Core

We can define Dependency Injection as a technique where one object supplies the dependencies of another object. But what does that actually mean? There are many answers to this question, and some of them are quite confusing. In this blog, I'll try to keep things as straightforward and simple as possible. A class has a dependency on another class if it …
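One common form of this idea is constructor injection, sketched below with hypothetical `MessageService`, `EmailService`, and `Notifier` types. Instead of `Notifier` creating its own `EmailService`, the caller supplies whatever implementation it wants, which also makes it easy to pass in a fake for testing. This is only one style of DI; the post may cover others.

```scala
// Hypothetical dependency: something that can send a message.
trait MessageService {
  def send(to: String, body: String): String
}

class EmailService extends MessageService {
  def send(to: String, body: String): String = s"Email to $to: $body"
}

// Without DI, Notifier would call `new EmailService` itself and be tied to it.
// With constructor injection, the dependency is supplied from outside.
class Notifier(service: MessageService) {
  def notifyUser(user: String): String = service.send(user, "Welcome!")
}

// Production wiring: inject the real implementation.
val notifier = new Notifier(new EmailService)
println(notifier.notifyUser("alice")) // Email to alice: Welcome!

// Test wiring: inject a fake without changing Notifier's code.
val fake = new MessageService {
  def send(to: String, body: String): String = s"FAKE($to)"
}
println(new Notifier(fake).notifyUser("bob")) // FAKE(bob)
```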


Functional Programming: A Paradigm

It's surprisingly hard to find a consistent definition of functional programming. But I think you can define FP with just two statements: 1. FP is about writing software applications using only pure functions. 2. When writing FP code, you only use immutable values. Hence, functional programming is a way of writing software applications using only pure functions and immutable values. Now, let us understand the …
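The two statements can be seen side by side in a small sketch (the function names are hypothetical): `double` is pure, since its result depends only on its input and it returns a new immutable `List` rather than modifying the old one, while `nextId` is included only as a contrast because it reads and mutates a `var` outside itself.

```scala
// Pure: the output depends only on the input, and nothing outside is modified.
def double(xs: List[Int]): List[Int] = xs.map(_ * 2)

// Impure (for contrast): reads and mutates state outside the function.
var counter = 0
def nextId(): Int = { counter += 1; counter }

val original = List(1, 2, 3)    // immutable List bound to a val
val doubled  = double(original) // returns a new list
println(original) // List(1, 2, 3) (unchanged)
println(doubled)  // List(2, 4, 6)

// Same input, same output, every time.
println(double(original) == double(original)) // true

// nextId() gives a different result on each call, so it is not pure.
println(nextId()) // 1
println(nextId()) // 2
```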