delta lake

Apache Spark: Delta Lake as a Solution – Part II

Reading Time: 3 minutes Well, we have already covered the missing features in Apache Spark & also the causes of the issue in executing Delta Lake in Part1. However, today we will be talking about What Delta Lake is & how it provides the solution to all those problems discussed herein Delta Lake as a Solution: Part1.As we all know that Spark is just a processing engine, it doesn’t Continue Reading

Apache Spark: Delta Lake as a Solution – Part I

Reading Time: 3 minutes Today, everyone is talking about Delta Lake. Why? Ever tried to find the answer to this question? Yes or No doesn’t matter, don’t worry here in Part1 we will be discussing the same & also will be targetting the following questions: What are the features missing from Apache Spark? What kind of issues it causes in executing Data Lake? Answering the above questions will definitely Continue Reading

Spark: ACID Transaction with Delta Lake

Reading Time: 3 minutes Spark doesn’t provide some of the most essential features of a reliable data processing system such as Atomic APIs and ACID transactions as discussed in the blog Spark: ACID compliant or not. Spark welcomes a solution to the problem by working with Delta Lake. Delta Lake plays an intermediary service between Apache Spark and the storage system. Instead of directly interacting with the storage layer, Continue Reading

Time Travel: Data versioning in Delta Lake

Reading Time: 3 minutes In today’s Big Data world, we process large amounts of data continuously and store the resulting data into data lake. This keeps changing the state of the data lake. But, sometimes we would like to access a historical version of our data. This requires versioning of data. Such kinds of data management simplifies our data pipeline by making it easy for professionals or organizations to Continue Reading

Diving deeper into Delta Lake

Reading Time: 6 minutes Delta Lake is an open-source storage layer that brings reliability to data lakes. It has numerous reliability features including ACID transactions, scalable metadata handling, and unified streaming and batch data processing.

Delta Lake To the Rescue

Reading Time: 4 minutes Welcome Back. In our previous blogs, we tried to get some insights about Spark RDDs and also tried to explore some new things in Spark 2.4. You can go through those blogs here: RDDs – The backbone of Apache Spark Spark 2.4: Adding a little more Spark to your code In this blog, we will be discussing something called a Delta Lake. But first, let’s Continue Reading