Database

Top 5 Reasons to Convert Your Cloud Data Lake to a Delta Lake

Reading Time: 6 minutes There are various resources that give advice on how (and how not) to partition your data, how to calculate the ideal file size, how to handle evolving schemas, how to build compaction routines, how to recover from failed ETL jobs, how to stream raw data into the data lake, and so on. We have been working with customers throughout this time to encapsulate all of the Continue Reading
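
The post goes deeper on each of these practices; as a minimal sketch of the conversion step itself (the path is a placeholder, and this assumes Delta Lake is on the Spark classpath), an existing Parquet data lake can be converted in place:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object ConvertToDelta extends App {
  val spark = SparkSession.builder()
    .appName("convert-to-delta")
    // Delta Lake's SQL extensions must be enabled on the session.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()

  // Hypothetical path; the conversion writes a Delta transaction log
  // alongside the existing Parquet files, so no data is rewritten.
  DeltaTable.convertToDelta(spark, "parquet.`s3://my-bucket/events`")

  // From here on, the same path can be read as a Delta table.
  spark.read.format("delta").load("s3://my-bucket/events").printSchema()
}
```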

DAO | Abstraction of the Application/Business layer from the persistence layer

Reading Time: 3 minutes DAO…. In this blog, we'll learn about Data Access Objects, their pros and cons, and their implementation in Scala. What is a DAO? It's an abbreviation for Data Access Object. It is a structural pattern that provides an abstract API to isolate the application/business layer from the persistence layer. This layer could be a database or any other persistence mechanism. The main purpose Continue Reading
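
As a minimal sketch of the pattern in Scala (the names are illustrative, not the post's own code): a trait defines the abstract API, and an in-memory implementation stands in for the persistence layer:

```scala
// The domain object the application layer works with.
case class User(id: Int, name: String)

// The DAO: an abstract API that hides how users are persisted.
trait UserDao {
  def findById(id: Int): Option[User]
  def save(user: User): Unit
  def delete(id: Int): Unit
}

// One possible persistence mechanism: a simple in-memory map.
// A JDBC- or Cassandra-backed class could implement the same trait.
class InMemoryUserDao extends UserDao {
  private var store = Map.empty[Int, User]
  def findById(id: Int): Option[User] = store.get(id)
  def save(user: User): Unit = store += (user.id -> user)
  def delete(id: Int): Unit = store -= id
}

object DaoDemo extends App {
  // Business logic depends only on the trait, not on the storage.
  val dao: UserDao = new InMemoryUserDao
  dao.save(User(1, "Alice"))
  println(dao.findById(1)) // Some(User(1,Alice))
}
```

Because callers only see the trait, the in-memory implementation can later be swapped for a real database without touching the business layer.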

Indexes in Cassandra

Reading Time: 2 minutes Cassandra is a distributed database from Apache that is highly scalable and effective at managing large amounts of structured data. It provides high availability with no single point of failure. Cassandra is a column-oriented database, often used for time-series data. Primary keys in Cassandra: It is a primary-key database, which means data is persisted and organised around the cluster based on hash values (partition Continue Reading
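
To make the partition-key idea concrete, here is a small sketch using the DataStax Java driver from Scala (a local node and a "demo" keyspace are assumed; the table and column names are hypothetical). The partition key determines which node holds a row, a clustering column orders rows within a partition, and a secondary index allows filtering on a non-key column:

```scala
import com.datastax.oss.driver.api.core.CqlSession

object CassandraKeysDemo extends App {
  // Connects to a local node; the keyspace is assumed to exist.
  val session = CqlSession.builder().withKeyspace("demo").build()

  // sensor_id is the partition key (hashed to place the row on a node);
  // reading_time is a clustering column (orders rows inside a partition).
  session.execute(
    """CREATE TABLE IF NOT EXISTS readings (
      |  sensor_id text,
      |  reading_time timestamp,
      |  value double,
      |  PRIMARY KEY (sensor_id, reading_time)
      |)""".stripMargin)

  // A secondary index lets us query on a non-primary-key column.
  session.execute("CREATE INDEX IF NOT EXISTS value_idx ON readings (value)")

  session.close()
}
```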

Generate logical plan in Calcite

Reading Time: 2 minutes Hello everyone! In the previous blog on Apache Calcite, we discussed how Apache Calcite helps you parse a database query, along with some basics. In this blog, we will discuss how to generate the logical plan of the query you have written. What is a logical plan? A logical plan is a relational expression containing only logical operators. Logical algebra has no implementation of the relational operators and therefore Continue Reading
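
To make that concrete, here is a small hand-built plan using Calcite's RelBuilder from Scala (a sketch only: the EMPS table is hypothetical and would have to be registered in the schema beforehand):

```scala
import org.apache.calcite.plan.RelOptUtil
import org.apache.calcite.rel.RelNode
import org.apache.calcite.sql.fun.SqlStdOperatorTable
import org.apache.calcite.tools.{FrameworkConfig, Frameworks, RelBuilder}

object LogicalPlanDemo extends App {
  // Assumes the root schema already exposes an "EMPS" table (not set up here).
  val rootSchema = Frameworks.createRootSchema(true)
  val config: FrameworkConfig = Frameworks.newConfigBuilder()
    .defaultSchema(rootSchema)
    .build()

  val builder = RelBuilder.create(config)

  // Compose LogicalTableScan -> LogicalFilter -> LogicalProject by hand.
  val plan: RelNode = builder
    .scan("EMPS")
    .filter(builder.call(SqlStdOperatorTable.EQUALS,
      builder.field("DEPTNO"), builder.literal(10)))
    .project(builder.field("NAME"))
    .build()

  // Prints the logical plan as a tree of relational operators.
  println(RelOptUtil.toString(plan))
}
```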

Different types of Keys in DBMS (Database Management)

Reading Time: 4 minutes Introduction A huge amount of data is available in the real world. To store this data in a DBMS, a large number of tables are required, and these tables may contain thousands of duplicate, sorted, and unsorted records. Fetching any particular record from these tables without any constraints or restrictions is a very difficult process. To overcome these difficulties, a new concept of Continue Reading
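
The key types land more easily with a concrete table definition. Here is an illustrative sketch in Scala over JDBC against an in-memory H2 database (the driver, tables, and columns are assumptions for illustration), showing a primary key, a unique candidate key, a composite key, and a foreign key:

```scala
import java.sql.DriverManager

object KeysDemo extends App {
  // In-memory H2 database, used purely for illustration.
  val conn = DriverManager.getConnection("jdbc:h2:mem:demo")
  val stmt = conn.createStatement()

  // roll_no is the PRIMARY KEY: it uniquely identifies each student row.
  // email is a candidate key, enforced here as UNIQUE.
  stmt.execute(
    """CREATE TABLE student (
      |  roll_no INT PRIMARY KEY,
      |  email   VARCHAR(100) UNIQUE,
      |  name    VARCHAR(100)
      |)""".stripMargin)

  // roll_no in marks is a FOREIGN KEY: it must match an existing student.
  // (roll_no, subject) together form a composite primary key.
  stmt.execute(
    """CREATE TABLE marks (
      |  roll_no INT REFERENCES student(roll_no),
      |  subject VARCHAR(50),
      |  score   INT,
      |  PRIMARY KEY (roll_no, subject)
      |)""".stripMargin)

  conn.close()
}
```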

A Quick Demo: Kafka to Flink to Cassandra

Reading Time: 3 minutes Hi folks! In this blog, we are going to learn how to integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine used for stateful computations over unbounded and bounded data streams. Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. Cassandra is a distributed, wide-column Continue Reading
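
As an illustrative sketch of such a pipeline in Scala (the topic, keyspace, and table names are invented here, the Cassandra table is assumed to exist, and the classic FlinkKafkaConsumer connector is used): read lines from Kafka, count words, and sink the counts to Cassandra:

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.cassandra.CassandraSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object KafkaFlinkCassandra extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  // Kafka consumer properties (broker address and group id are assumptions).
  val props = new Properties()
  props.setProperty("bootstrap.servers", "localhost:9092")
  props.setProperty("group.id", "flink-demo")

  // Read a stream of lines from a hypothetical "events" topic.
  val lines: DataStream[String] =
    env.addSource(new FlinkKafkaConsumer[String]("events", new SimpleStringSchema(), props))

  // Turn each line into a (word, 1) pair and keep a running count per word.
  val counts: DataStream[(String, Long)] = lines
    .flatMap(_.toLowerCase.split("\\W+"))
    .filter(_.nonEmpty)
    .map((_, 1L))
    .keyBy(_._1)
    .sum(1)

  // Write each pair into a pre-created Cassandra table demo.word_count.
  CassandraSink.addSink(counts)
    .setQuery("INSERT INTO demo.word_count (word, count) VALUES (?, ?);")
    .setHost("127.0.0.1")
    .build()

  env.execute("kafka-flink-cassandra-demo")
}
```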

Loading JSON data into Snowflake

Reading Time: 4 minutes Have you ever faced a use case or scenario where you have to load JSON data into Snowflake? As we know, JSON is one of the most common data formats for storing and exchanging information between systems, and it is a relatively concise format. If we are implementing a database solution, it is very common to come across a system that provides data in Continue Reading
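
As a rough sketch of the idea (the account URL, credentials, table, and JSON shape are all placeholders; the post covers the full flow): Snowflake stores JSON in a VARIANT column, and individual fields are then reachable with path-plus-cast syntax. Here it is driven from Scala over the standard Snowflake JDBC driver:

```scala
import java.sql.DriverManager
import java.util.Properties

object SnowflakeJsonDemo extends App {
  // Placeholder account URL and credentials.
  val props = new Properties()
  props.put("user", "MY_USER")
  props.put("password", "MY_PASSWORD")
  props.put("db", "DEMO_DB")
  props.put("schema", "PUBLIC")
  props.put("warehouse", "DEMO_WH")

  val conn = DriverManager.getConnection(
    "jdbc:snowflake://myaccount.snowflakecomputing.com/", props)
  val stmt = conn.createStatement()

  // A VARIANT column holds semi-structured data such as JSON.
  stmt.execute("CREATE OR REPLACE TABLE raw_json (v VARIANT)")
  stmt.execute(
    """INSERT INTO raw_json
      |SELECT PARSE_JSON('{"name": "Alice", "address": {"city": "Pune"}}')""".stripMargin)

  // Path syntax plus a cast pulls typed values out of the JSON.
  val rs = stmt.executeQuery(
    "SELECT v:name::string, v:address.city::string FROM raw_json")
  while (rs.next()) println(s"${rs.getString(1)} lives in ${rs.getString(2)}")

  conn.close()
}
```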

Using Spark as a Database

Reading Time: 4 minutes You must have heard that Apache Spark is a powerful distributed data processing engine. But did you know that Spark (with the help of Hive) can also act as a database? So, in this blog, we will learn how Apache Spark can be leveraged as a database by creating tables in it and querying them. Introduction Since Spark is a database in itself, we Continue Reading
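
A minimal sketch of that idea (the database and table names are invented for illustration): enabling Hive support backs Spark's catalog with a persistent metastore, so tables created through SQL survive across sessions and can be queried like ordinary database tables:

```scala
import org.apache.spark.sql.SparkSession

object SparkAsDatabase extends App {
  // Hive support backs Spark's catalog with a persistent metastore.
  val spark = SparkSession.builder()
    .appName("spark-as-database")
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("CREATE DATABASE IF NOT EXISTS demo")
  spark.sql(
    """CREATE TABLE IF NOT EXISTS demo.employees (
      |  id INT, name STRING, dept STRING
      |) USING parquet""".stripMargin)

  spark.sql("INSERT INTO demo.employees VALUES (1, 'Alice', 'eng')")

  // Query the managed table just like any database table.
  spark.sql("SELECT dept, count(*) FROM demo.employees GROUP BY dept").show()
}
```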

How to Analyze query performance in MongoDB

Reading Time: 2 minutes Analyzing query performance in MongoDB may become complicated if we do not really know which part should be measured. Fortunately, MongoDB provides a very handy tool that can be used to evaluate query performance: explain("executionStats"). This tool provides general measurements, such as the number of examined documents and the execution time, that can be used for statistical analysis. The Database and Collection In this easy tutorial, Continue Reading
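
For instance (the database, collection, and filter below are hypothetical), the same executionStats measurement can be requested from the Scala driver by wrapping a find in an explain command; the result carries fields such as totalDocsExamined and executionTimeMillis:

```scala
import org.mongodb.scala.{Document, MongoClient}
import scala.concurrent.Await
import scala.concurrent.duration._

object ExplainDemo extends App {
  val client = MongoClient("mongodb://localhost:27017")
  val db = client.getDatabase("test")

  // Wrap a find in an explain command with executionStats verbosity.
  val command = Document(
    "explain" -> Document("find" -> "users", "filter" -> Document("age" -> 25)),
    "verbosity" -> "executionStats"
  )

  val result = Await.result(db.runCommand(command).head(), 10.seconds)

  // executionStats reports documents examined, execution time, and more.
  println(result.get("executionStats"))

  client.close()
}
```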

Creating Data Pipeline with Spark streaming, Kafka and Cassandra

Reading Time: 3 minutes Hi folks! In this blog, we are going to learn how to integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data Continue Reading
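
As an illustrative sketch of such a pipeline (the topic, keyspace, and table names are invented, and the Cassandra write goes through the DataStax spark-cassandra-connector, assumed to be on the classpath):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToCassandra extends App {
  val spark = SparkSession.builder()
    .appName("kafka-spark-cassandra")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()

  // Read a stream of records from a hypothetical Kafka topic.
  val events = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

  // foreachBatch writes each micro-batch to a pre-created Cassandra table.
  val query = events.writeStream
    .foreachBatch { (batch: DataFrame, _: Long) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "demo", "table" -> "events"))
        .mode("append")
        .save()
    }
    .start()

  query.awaitTermination()
}
```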

Incorporate Postgres with Rust

Reading Time: 4 minutes PostgreSQL is a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. Hello, folks! Your wait is over; we have come up with a new blog. In this blog, we will discuss how to incorporate the Postgres database using the Rust programming language, with the help of a sample example. I Continue Reading

KSnow: Know about Cloning in Snowflake

Reading Time: 2 minutes This blog pertains to the Cloning feature in Snowflake, and I will explain everything you need to know about this feature with a practical example. So let's get started. Zero Copy Clone Cloning is also known as Zero Copy Clone in Snowflake. It is used to create a copy of a table, schema, or database. In most databases, in order to make a copy Continue Reading
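
The feature itself is a single SQL statement per object. For example (the object names and connection details are placeholders), driven from Scala over the Snowflake JDBC driver:

```scala
import java.sql.DriverManager
import java.util.Properties

object CloneDemo extends App {
  // Placeholder credentials and account URL.
  val props = new Properties()
  props.put("user", "MY_USER")
  props.put("password", "MY_PASSWORD")

  val conn = DriverManager.getConnection(
    "jdbc:snowflake://myaccount.snowflakecomputing.com/", props)
  val stmt = conn.createStatement()

  // Zero-copy clones at the table, schema, and database level.
  stmt.execute("CREATE TABLE orders_dev CLONE prod.public.orders")
  stmt.execute("CREATE SCHEMA staging CLONE prod.public")
  stmt.execute("CREATE DATABASE prod_backup CLONE prod")

  conn.close()
}
```

Because only metadata is copied, a clone completes almost instantly and consumes no extra storage until either the source or the clone is modified.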