NoSQL

Apache Spark: Delta Lake as a Solution – Part I

Reading Time: 3 minutes Today, everyone is talking about Delta Lake. Why? Have you ever tried to find the answer? Whether you have or not, don’t worry: here in Part I we will discuss exactly that, and we will also address the following questions: What features are missing from Apache Spark? What kinds of issues does this cause when running a data lake? Answering the above questions will definitely Continue Reading

Couchbase Disaster recovery

Couchbase – Enhance Database Performance

Reading Time: 5 minutes When transitioning from a relational to a NoSQL database, architects expect little or no effect on performance as the size of the data scales up. Handling huge amounts of data may be the USP of a database, but we still need to design things so that they run well at scale. In this blog, I’ll try to explain what Continue Reading

Apache Spark: Tricks to Increase Job Performance

Reading Time: 2 minutes Apache Spark is quickly being adopted in the real world, and companies such as Uber are using it in production. Spark is gaining popularity in the market because it also lets you develop streaming applications and do machine learning, which helps companies get better results in production along with proper analysis using Spark. Although companies are using Spark in Continue Reading

Spark: ACID Transaction with Delta Lake

Reading Time: 3 minutes Spark doesn’t provide some of the most essential features of a reliable data processing system, such as atomic APIs and ACID transactions, as discussed in the blog Spark: ACID compliant or not. Spark addresses this problem by working with Delta Lake. Delta Lake acts as an intermediary service between Apache Spark and the storage system. Instead of directly interacting with the storage layer, Continue Reading

The breaking changes in Dgraph v1.1.0

Reading Time: 4 minutes Dgraph v1.1.0 was released on 3rd September, 2019 with significant changes and new features. It is therefore important to know which of these changes could break our existing code when we upgrade Dgraph to the new version. In this blog, we will cover the important changes introduced. The details of all the changes and new features can be found in the changelog. 1. Predicates of type UID Continue Reading

Data Lake – Build it in Phases

Reading Time: 3 minutes How to build a data lake, and the phases involved in doing so.

Apache Spark: Repartitioning v/s Coalesce

Reading Time: 3 minutes Does partitioning help you increase or decrease job performance? Spark splits data into partitions, and computation is done in parallel for each partition. It is very important to understand how data is partitioned and when you need to manually modify the partitioning to run Spark applications efficiently. Now, diving into our main topic, i.e. Repartitioning v/s Coalesce. What is Coalesce? The coalesce method reduces the number Continue Reading

Apache Spark

Deep Dive into Apache Spark Transformations and Action

Reading Time: 4 minutes In our previous blog on Apache Spark, we briefly discussed what Transformations & Actions are. Now we will go deeper into the topic and understand what they actually are and how they play a vital role in working with Apache Spark. What is Spark RDD? Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects Continue Reading

Reactive Architecture

Reading Time: 2 minutes Recently I was invited to give a guest lecture for faculty of engineering colleges at ABES College of Engineering. I chose one of the most trending topics, i.e. Reactive Architecture. We talked about what this buzzword means and why it came into existence, as well as the challenges people were facing and how real-world problems are being solved by using Continue Reading

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced extensively in today’s world, and it is going to be generated even more rapidly in the future. 90% of the total data in the world was produced in the last two years alone, and it is estimated that in 2020 the world’s total data would reach 45 ZB and the data generated each day would be enough that if we try to store it Continue Reading

Can we do joins in MongoDB?

Reading Time: 3 minutes MongoDB is a NoSQL document database designed for ease of development and scaling. The best part about using a relational DBMS is that we can perform a wide range of relational queries on it, and doing joins across different tables is very easy. In MongoDB, however, the way data is stored is quite different from any relational DBMS. How Data is Stored Continue Reading

First steps to build Spring Boot Application with Couchbase

Reading Time: 2 minutes As the name of the blog suggests, we will be taking the first steps to build a simple application from scratch using Couchbase as the database and Spring Boot as the framework. Let’s get started. You can create a starter Maven project using this link. Add the Web, Lombok and Couchbase dependencies. The pom.xml should look like: <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>2.1.3.RELEASE</version> </parent> <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> Continue Reading