SQL

Different types of Keys in DBMS (Database Management)

Reading Time: 4 minutes Introduction: A huge amount of data is available in the real world, and storing it in a DBMS requires a large number of tables. These tables may contain thousands of duplicate, sorted, and unsorted records. Fetching a specific record from these tables without any constraints or restrictions is very difficult. To overcome these difficulties, a new concept of …

Using Spark as a Database

Reading Time: 4 minutes You must have heard that Apache Spark is a powerful distributed data processing engine. But did you know that Spark (with the help of Hive) can also act as a database? In this blog, we will learn how Apache Spark can be leveraged as a database by creating tables in it and querying them. Introduction: Since Spark is a database in itself, we …

Spark SQL in Delta Lake 0.7.0

Reading Time: 3 minutes Nowadays Delta Lake is a buzzword in the Big Data world, especially among Spark developers, because it resolves many issues found in the Big Data domain. Delta Lake is an open-source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. It is evolving day by day and adds new features in every release. …

KSnow: Know about Cloning in Snowflake

Reading Time: 2 minutes This blog covers the Cloning feature in Snowflake, and I will explain everything you need to know about it with a practical example. So let’s get started. Zero Copy Clone: Cloning is also known as Zero Copy Clone in Snowflake. It is used to create a copy of a table, a schema, or a database. In most databases, in order to make a copy …

KSnow: Time Travel and Fail-safe in Snowflake

Reading Time: 5 minutes This blog covers Time Travel and Fail-safe in Snowflake, and I will explain everything you need to know about these features with practical examples. So let’s get started. Introduction to Time Travel: Snowflake allows access to historical data from a point in the past, that is, data that may since have been modified or deleted. Using Time Travel functionality, a number of …

KSnow: Load continuous data into Snowflake using Snowpipe

Reading Time: 5 minutes In this blog, we will discuss loading streaming data into a Snowflake table using Snowpipe. But before that, if you haven’t read the previous part of this blog, i.e., Loading Bulk Data into Snowflake, I would suggest you go through it. Now that we are all set, let’s get started and see what Snowpipe is all about. Introduction: Snowpipe is a mechanism provided by …

Import multiple CSV files into the Postgres through Java/Scala code.

Reading Time: 2 minutes It’s pretty simple to ingest data into Postgres using insert queries, but in the big data world we often have far more data than we can practically load that way. We get the data in CSV files that we want to import directly into Postgres. It will take a lot of effort and time if we try to import these …

KSnow: Loading Data Into Snowflake

Reading Time: 5 minutes This blog covers Loading Data into Snowflake, and I will explain the various steps involved in this process. So let’s get started. Before moving ahead, you can visit the blog on understanding the basics of the Snowflake Data Warehouse in case you want to refresh your concepts. Now let’s talk about the actual topic for which you clicked on this blog. To …

Apache Spark: Delta Lake as a Solution – Part I

Reading Time: 3 minutes Today, everyone is talking about Delta Lake. Why? Have you ever tried to find the answer to this question? Whether you have or not, don’t worry: here in Part 1 we will discuss exactly that and also target the following questions: What features are missing from Apache Spark? What kinds of issues does that cause when running a data lake? Answering the above questions will definitely …

Apache Spark: Handle Corrupt/Bad Records

Reading Time: 3 minutes Most of the time, writing ETL jobs becomes very expensive when it comes to handling corrupt records. In such cases, ETL pipelines need a good solution for handling them, because the larger the ETL pipeline is, the more complex it becomes to handle bad records along the way. Corrupt data includes missing information, incomplete information, schema mismatches, and differing formats or data types. Apache Spark: …

Parsing database Query with Apache Calcite

Reading Time: 3 minutes Hey there! As technical people, we sometimes write a database query that looks good, but we don’t know whether it is syntactically correct. So in this blog, we parse a database query and test it using a test case with the help of Apache Calcite. Without wasting any time, let’s discuss Apache Calcite and …

Database Normalization :: Part 2

Reading Time: 6 minutes Introduction: Normalization helps one attain a good database design and thereby ensures the continued efficiency of the database. Normalization, which is a process for assigning attributes to entities, offers the following advantages: There are 7 types of normal forms: In my previous blog, Database Normalization :: Part 1, I discussed the first four. In this blog, we will look at 4NF, 5NF and DKNF. Fourth Normal …

Database Normalization :: Part 1

Reading Time: 6 minutes Introduction: Normalization helps one attain a good database design and thereby ensures the continued efficiency of the database. Normalization, which is a process for assigning attributes to entities, offers the following advantages: There are 7 types of normal forms: In this blog, we will look at the first four only; I’ll cover the rest in Part 2 of Database Normalization. First Normal Form (1NF): …