Database

Apache Spark: Tricks to Increase Job Performance

Reading Time: 2 minutes Apache Spark is quickly being adopted in the real world, and companies like Uber are using it in production. Spark is gaining popularity because it also lets you develop streaming applications and do machine learning, which helps companies get better results in production along with proper analysis. Although companies are using Spark in … Continue Reading

Spark: ACID Transaction with Delta Lake

Reading Time: 3 minutes Spark doesn’t provide some of the most essential features of a reliable data processing system, such as atomic APIs and ACID transactions, as discussed in the blog Spark: ACID compliant or not. Delta Lake offers a solution to this problem: it acts as an intermediary service between Apache Spark and the storage system. Instead of directly interacting with the storage layer, … Continue Reading

The breaking changes in Dgraph v1.1.0

Reading Time: 4 minutes Dgraph v1.1.0 was released on 3rd September, 2019 with significant changes and new features. It is therefore important to know which of these changes could break our existing code when we upgrade Dgraph to the new version. In this blog, we will cover the important changes introduced. All the changes and new features are detailed in the change-log. 1. Predicates of type UID … Continue Reading

Want to know about Greenplum?

Reading Time: 6 minutes Hello developers, this blog is about what Greenplum is and the features it offers. Greenplum is an MPP (massively parallel processing) SQL database based on PostgreSQL. Greenplum Database scales to multi-petabyte data sizes with ease. It also allows a cluster of powerful servers to work together to provide a single SQL interface to the data. We also discuss how to install Greenplum on the system. What is … Continue Reading

Data Lake – Build it in Phases

Reading Time: 3 minutes Data Lake – how to build a data lake and the phases involved in building it.

Apache Spark: Read Data from S3 Bucket

Reading Time: 2 minutes Anyone working with Spark is familiar with the ways of reading data from a local file, a table, or HDFS. But do you know how tricky it is to read data into Spark from an S3 bucket? This blog gives you a step-by-step guide to reading data from an S3 bucket. Continue Reading

Apache Spark: Repartitioning v/s Coalesce

Reading Time: 3 minutes Does partitioning help you increase or decrease job performance? Spark splits data into partitions, and computation is done in parallel for each partition. It is very important to understand how data is partitioned and when you need to manually modify the partitioning to run Spark applications efficiently. Now, diving into our main topic, i.e. Repartitioning v/s Coalesce: What is Coalesce? The coalesce method reduces the number … Continue Reading
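
As a rough illustration of the partitioning idea in the excerpt above, here is a toy model in plain Python. It is not the Spark API: `split_into_partitions`, `toy_coalesce`, and `toy_repartition` are hypothetical names invented for this sketch, and the round-robin placement is a simplifying assumption. It only models the general behaviour that coalescing merges whole partitions without a full shuffle, while repartitioning redistributes every record.

```python
from typing import List

Partitions = List[List[int]]


def split_into_partitions(data: List[int], num_partitions: int) -> Partitions:
    """Split data into contiguous chunks (hypothetical helper for this sketch)."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(num_partitions)]


def toy_coalesce(partitions: Partitions, num_partitions: int) -> Partitions:
    """Reduce the partition count by merging whole partitions together.

    No individual record is re-routed: each existing partition is appended
    intact to one target, which models "no full shuffle".
    """
    if num_partitions >= len(partitions):
        # Modelling choice: without a shuffle the partition count cannot grow.
        return [list(p) for p in partitions]
    merged: Partitions = [[] for _ in range(num_partitions)]
    for index, part in enumerate(partitions):
        merged[index * num_partitions // len(partitions)].extend(part)
    return merged


def toy_repartition(partitions: Partitions, num_partitions: int) -> Partitions:
    """Redistribute every record round-robin, modelling a full shuffle."""
    result: Partitions = [[] for _ in range(num_partitions)]
    records = [x for part in partitions for x in part]
    for position, record in enumerate(records):
        result[position % num_partitions].append(record)
    return result


parts = split_into_partitions(list(range(12)), 4)
coalesced = toy_coalesce(parts, 2)         # neighbouring partitions merged
repartitioned = toy_repartition(parts, 6)  # every record moved individually
print(parts)
print(coalesced)
print(repartitioned)
```

In this model, coalescing keeps records that were together in one partition together, whereas repartitioning spreads them evenly at the cost of moving every record.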

Understanding data persistence in Lagom

Reading Time: 4 minutes When we create any microservice, or in general any service, one of the biggest tasks is managing data persistence. Lagom supports various databases for this task. By default, Lagom uses Cassandra to persist data.

Big Data Landscape explained

Reading Time: 5 minutes Big Data has now evolved into a buzzword, and it seems everyone is either working on it or wants to work on it. However, most people associate Big Data with some of the popular tool sets like Hadoop, Spark, and NoSQL databases like Hive, Cassandra, HBase etc. HDFS made Big Data popular as it gave us an option to distribute the data … Continue Reading

An Overview of the Stored Procedures in PostgreSQL

Reading Time: 4 minutes As you may know, in all versions up to PostgreSQL 10 it was not possible to create a procedure in PostgreSQL. In PostgreSQL 11, PROCEDURE was added as a new schema object, similar to FUNCTION but without a return value. Over the years many people were eager to have this functionality, and it was finally added in PostgreSQL 11. Traditionally, PostgreSQL has provided all … Continue Reading

Apache Spark

Deep Dive into Apache Spark Transformations and Action

Reading Time: 4 minutes In our previous blog on Apache Spark, we briefly discussed what Transformations and Actions are. Now we will go deeper into the topic and understand what they actually are and how they play a vital role in working with Apache Spark. What is Spark RDD? Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects … Continue Reading

Reactive Architecture

Reading Time: 2 minutes Recently I was invited to present a guest lecture for engineering college faculty at ABES College of Engineering. I picked the most trending topic, i.e. Reactive Architecture. We talked about what this buzzword means and why it came into existence, as well as the challenges people were facing and how real-world problems are being solved by using … Continue Reading

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced extensively in today’s world, and it is going to be generated even more rapidly in the future. 90% of the total data in the world was produced in the last two years alone, and it is estimated that in 2020 the world’s total data would reach 45 ZB and that the data generated each day would be enough that if we try to store it … Continue Reading