Author: kundankumarr

Delta Lake: Schema Enforcement & Evolution

Reading Time: 4 minutes Nowadays, data is constantly evolving and changing. As business problems and requirements evolve, the shape or structure of the data changes as well. When that happens, we want to be in control of how the data or schema changes. But how can we achieve this? Delta Lake has good ways to control how the schema changes. With Delta Lake, users have Continue Reading
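A minimal Scala sketch of the idea, assuming Spark with the Delta Lake library on the classpath and a hypothetical table path /tmp/delta/events: the first write fixes the table schema (enforcement), and the mergeSchema option opts in to schema evolution for a write that adds a column.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-schema-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val path = "/tmp/delta/events"  // hypothetical table location

// The initial write defines the table schema: (id: Int, name: String)
Seq((1, "alice"), (2, "bob")).toDF("id", "name")
  .write.format("delta").mode("overwrite").save(path)

// A DataFrame with an extra column would normally be rejected (schema enforcement).
val withCountry = Seq((3, "carol", "IN")).toDF("id", "name", "country")

// Opting in to schema evolution lets the new column be merged into the table.
withCountry.write
  .format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .save(path)
```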


Spark: createDataFrame() vs toDF()

Reading Time: 2 minutes There are two different ways to create a DataFrame in Spark: first, using toDF(), and second, using createDataFrame(). In this blog we will see how we can create a DataFrame using these two methods and what the exact difference between them is. toDF(): The toDF() method provides a very concise way to create a DataFrame. This method can be applied to a sequence of objects. To access Continue Reading
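A short, hedged Scala sketch contrasting the two approaches (the column names and sample values are made up for illustration):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("df-demo").master("local[*]").getOrCreate()
import spark.implicits._  // brings toDF() into scope

// toDF(): concise, column names passed inline, schema inferred from the data
val df1 = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// createDataFrame(): explicit Rows plus an explicit schema
val rows = spark.sparkContext.parallelize(Seq(Row(1, "alice"), Row(2, "bob")))
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))
val df2 = spark.createDataFrame(rows, schema)
```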

Cluster vs Client: Execution modes for a Spark application

Reading Time: 3 minutes Whenever we submit a Spark application to the cluster, the Driver, or the Spark App Master, gets started, and the Driver starts N workers. The Spark driver manages the SparkContext object to share data and coordinate with the workers and the cluster manager across the cluster. The cluster manager can be Spark Standalone, Hadoop YARN, or Mesos. Workers will Continue Reading
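As a small sketch, and noting that the deploy mode is normally chosen with the --deploy-mode flag at spark-submit time rather than in code, an application can inspect which mode it was launched in via its Spark configuration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mode-check").getOrCreate()

// "client": the driver runs in the submitting process; "cluster": the driver runs inside the cluster.
val deployMode = spark.sparkContext.getConf.get("spark.submit.deployMode", "client")
val master     = spark.sparkContext.master

println(s"master = $master, deployMode = $deployMode")
```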


Spark: Type Safety in Dataset vs DataFrame

Reading Time: 4 minutes With type safety, programming languages prevent type errors; in other words, type safety means the compiler will validate types while compiling and throw an error when we try to assign a wrong type to a variable. Spark, a unified analytics engine for big data processing, provides two very useful APIs, DataFrame and Dataset, that are easy to use, intuitive, and expressive, which makes Continue Reading
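A brief Scala sketch of the difference, using a hypothetical Person case class: with a Dataset the compiler catches a bad field reference, while with a DataFrame the same mistake only surfaces at runtime.

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

val spark = SparkSession.builder().appName("typesafety").master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(Person("alice", 30), Person("bob", 25)).toDS()  // Dataset[Person]
val df = ds.toDF()                                           // DataFrame = Dataset[Row]

// Dataset: the compiler checks field names and types.
ds.map(p => p.age + 1)          // compiles
// ds.map(p => p.salary)        // would fail at compile time: no such field

// DataFrame: column names are plain strings, so mistakes appear only at runtime.
df.select("age")                // fine
// df.select("salary")          // compiles, but throws AnalysisException when run
```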

Scala: map vs flatMap

Reading Time: 3 minutes While working with collections in Scala, we frequently find ourselves using the two most popular functional combinators, i.e., map() and its close cousin flatMap(). Both are higher-order functions. Click here to know more about higher-order functions. In this blog, we will explore map and flatMap in detail. map(): The map method transforms a collection by applying a function to each element of that collection. map is a function Continue Reading
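A quick sketch of the difference on ordinary Scala collections (the sample lists are arbitrary):

```scala
// map: one output element per input element
val nums = List(1, 2, 3)
val doubled = nums.map(_ * 2)               // List(2, 4, 6)

// flatMap: each element maps to a collection, and the results are flattened
val words = List("hello world", "scala rocks")
val tokens = words.flatMap(_.split(" "))    // List("hello", "world", "scala", "rocks")

// map with the same function would keep the nesting
val nested = words.map(_.split(" ").toList) // List(List("hello", "world"), List("scala", "rocks"))
```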

A tour to the Scala Tuples

Reading Time: 3 minutes In Scala, a tuple is a class that gives us a simple way to store heterogeneous items or different data types in the same container. A tuple's purpose is to combine a fixed and finite number of items so that the programmer can pass the tuple around as a whole. A tuple is immutable in Scala. Click here to know more about mutability and immutability. Continue Reading
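A few illustrative lines, with made-up values, showing how tuples are created, accessed, and destructured:

```scala
// A Tuple3 holding heterogeneous, immutable values
val person: (String, Int, Boolean) = ("Alice", 30, true)

// Access elements by position (1-based accessors)
val name = person._1   // "Alice"
val age  = person._2   // 30

// Destructure a tuple into named values
val (n, a, active) = person

// Tuples are handy for returning multiple values from a function
def minMax(xs: Seq[Int]): (Int, Int) = (xs.min, xs.max)
val (lo, hi) = minMax(Seq(3, 1, 4, 1, 5))  // lo = 1, hi = 5
```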

Scala variables – var vs val

Reading Time: 3 minutes We know that in any programming language, variables are used to store information to be referenced and manipulated. They also provide a way of labeling data with a descriptive name, so our programs can be understood more clearly by the reader and ourselves. Scala, a multi-paradigm programming language, allows one to declare variables as mutable or immutable. We can create mutable variables via the var keyword and immutable variables via the val keyword. Let's understand each one Continue Reading
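A small sketch of the difference:

```scala
// val: immutable reference, reassignment is a compile-time error
val greeting = "hello"
// greeting = "hi"        // error: reassignment to val

// var: mutable reference, can be reassigned
var counter = 0
counter = counter + 1     // fine, counter is now 1

// Note: a val holding a mutable object can still have its contents changed
val buffer = scala.collection.mutable.ListBuffer(1, 2, 3)
buffer += 4               // allowed: the reference is fixed, the contents are not
```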

Spark: ACID Transaction with Delta Lake

Reading Time: 3 minutes Spark doesn’t provide some of the most essential features of a reliable data processing system, such as atomic APIs and ACID transactions, as discussed in the blog Spark: ACID compliant or not. Spark welcomes a solution to the problem by working with Delta Lake. Delta Lake acts as an intermediary service between Apache Spark and the storage system. Instead of directly interacting with the storage layer, Continue Reading
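As a rough sketch of the usage pattern, assuming Spark with the Delta Lake library available and a hypothetical path /tmp/delta/orders: writes go through the Delta format, so they are committed via Delta's transaction log rather than straight to files.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-acid-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val path = "/tmp/delta/orders"  // hypothetical table location

// Each of these commits is atomic: readers see either all of it or none of it.
Seq((1, "created"), (2, "created")).toDF("orderId", "status")
  .write.format("delta").mode("overwrite").save(path)

Seq((3, "created")).toDF("orderId", "status")
  .write.format("delta").mode("append").save(path)

// Reads always see a consistent snapshot of the table.
spark.read.format("delta").load(path).show()
```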

Time Travel: Data versioning in Delta Lake

Reading Time: 3 minutes In today’s Big Data world, we process large amounts of data continuously and store the resulting data in a data lake. This keeps changing the state of the data lake. But sometimes we would like to access a historical version of our data, which requires versioning of data. Such data management simplifies our data pipeline by making it easy for professionals or organizations to Continue Reading
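A minimal Scala sketch, assuming an existing Delta table at a hypothetical path with some write history, showing Delta Lake's time-travel read options:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-time-travel")
  .master("local[*]")
  .getOrCreate()

val path = "/tmp/delta/events"  // hypothetical Delta table with prior versions

// Read the table as of an earlier version number...
val v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

// ...or as of a point in time.
val asOfDate = spark.read.format("delta")
  .option("timestampAsOf", "2019-09-01 00:00:00")
  .load(path)

v0.show()
```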

The breaking changes in Dgraph v1.1.0

Reading Time: 4 minutes Dgraph v1.1.0 was released on 3rd September, 2019 with significant changes and new features. So, it becomes important to know the changes that could break our existing code when we upgrade Dgraph to the new version. In this blog, we will cover the important changes introduced. We can find details of all the changes and new features in the change-log. 1. Predicates of type UID Continue Reading

What's new in Dgraph v1.1.0?

Reading Time: 3 minutes In this blog, we will cover the new features introduced in Dgraph v1.1.0, which was released on 3rd September. We can also find the breaking changes in Dgraph v1.1.0. Also, we can find details of all the changes and new features in the change-log. New Type System: This version of Dgraph supports a type system that can be used to categorize nodes and query them based Continue Reading

JSON Web Token (JWT): A Complete Guide

Reading Time: 5 minutes In this blog, we are going to talk about a very powerful yet simple way to represent a user's identity securely during a two-party interaction. What I mean to say is that when two systems exchange data, we can identify a user without having to send private credentials on every request. But how is it possible? The answer is JSON Web Token. So, we are going to Continue Reading
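To make the token structure concrete, here is a hedged Scala sketch that assembles an HS256-signed token by hand using only the JDK; the claims and secret are placeholders, and in practice a JWT library would handle this.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// base64url encoding without padding, as JWT requires
def b64url(bytes: Array[Byte]): String =
  Base64.getUrlEncoder.withoutPadding.encodeToString(bytes)

val header  = """{"alg":"HS256","typ":"JWT"}"""
val payload = """{"sub":"user-123","name":"Alice","admin":true}"""  // hypothetical claims

val signingInput = b64url(header.getBytes(UTF_8)) + "." + b64url(payload.getBytes(UTF_8))

// Sign header.payload with HMAC-SHA256 and a shared secret
val secret = "change-me"  // placeholder secret for illustration only
val mac = Mac.getInstance("HmacSHA256")
mac.init(new SecretKeySpec(secret.getBytes(UTF_8), "HmacSHA256"))
val signature = b64url(mac.doFinal(signingInput.getBytes(UTF_8)))

val jwt = signingInput + "." + signature
println(jwt)  // the three dot-separated parts of a JWT: header.payload.signature
```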