Author: kundankumarr

BigQuery: Querying nested arrays

Reading Time: 2 minutes In a previous blog, we had seen BigQuery facilitate efficient data warehouse schema design. BigQuery supports the nested & repeated columns. We can use a combination of ARRAY and STRUCT data types to define our schema in BigQuery. It enables to denormalize data efficiently in single table. In this blog, for the same schema of sales data, we will execute a few DML operations on nested array fields. Schema In Continue Reading

BigQuery:  Efficient Data Warehouse Schema Design

Reading Time: 3 minutes Conventional data warehouses support data models based on star schema and snowflake schema. In these models, there are a number of fact tables and dimension tables. In order to minimize redundancy it is recommends to split data into multiple tables in . This is a normalization process. Normalization is the technique of eliminating the redundant data. It minimize the insertion, deletion, and update anomalies. It saves the disk Continue Reading

BigQuery: Rescue to the Conventional Data warehouse Problems

Reading Time: 4 minutes The present and future of every industry sector somehow depends on the ability to use the massive amounts of data. Use the data available to drive better product quality at a lower cost. Make favourable business decisions with data. Primarily, for decades, to store a wide variety of massive data and perform analysis on it, using Data Warehouse solutions. Traditional data warehouses designed on-premise specifically Continue Reading

Apache Beam: Ways to join PCollections

Reading Time: 4 minutes Joining multiple sets of data into a singular entity is very often when working with data pipelines. In this blog, We will cover how we can perform Join operations between datasets in Apache Beam. There are different ways to Join PCollections in Apache beam – Extension-based joins Group-by-key-based joins Join using side input Let’s understand the above different way’s to perform Join with examples. We Continue Reading

Apache Beam: Side input Pattern

Reading Time: 3 minutes Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. It is a modern way of defining data processing pipelines. It has rich sources of APIs and mechanisms to solve complex use cases. In some use cases, while we define our data pipelines the requirement is, the pipeline should use some additional inputs. For example, In streaming analytics applications, it Continue Reading

Stateful stream processing with Apache Flink(part 1): An introduction

Reading Time: 4 minutes Apache Flink, a 4th generation Big Data processing framework provides robust stateful stream processing capabilities. So, in a few parts of the blogs, we will learn what is Stateful stream processing. And how we can use Flink to write a stateful streaming application. What is stateful stream processing? In general, stateful stream processing is an application design pattern for processing an unbounded stream of events. Continue Reading

A Quick Demo: Kafka to Flink to Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine. it is used for stateful computations over unbounded and bounded data streams.Kafka is a scalable, high performance, low latency platform. It allows reading and writing streams of data like a messaging system.Cassandra: A distributed and wide-column Continue Reading

Loading JSON data into Snowflake

Reading Time: 4 minutes Have you ever faced any use case or scenario where you’ve to load JSON data into the Snowflake? We better know JSON data is one of the common data format to store and exchange information between systems. JSON is a relatively concise format. If we are implementing a database solution, it is very common that we will come across a system that provides data in Continue Reading

Flink: Join two Data Streams

Reading Time: 3 minutes Apache Flink offers rich sources of API and operators which makes Flink application developers productive in terms of dealing with the multiple data streams. Flink provides many multi streams operations like Union, Join, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window. Let say we have one stream which contains salary information of all Continue Reading

Flink: Union operator on Multiple Streams

Reading Time: 3 minutes Apache Flink offers rich sources of API and operators which makes Flink application developers productive in terms of dealing with the multiple data streams. Flink provides many multi streams operations like Union, Join, and so on. In this blog, we will explore the Union operator in Flink that can combine two or more data streams together. We know in real-time we can have multiple data streams from different sources Continue Reading

Flink: Implementing the Session window.

Reading Time: 3 minutes In the previous blogs, we learned about Tumbling, Sliding, and Count windows in Flink. There is one another useful way to window the data which Flink offers i.e, Session window. So in this blog, we will explore the Session window in detail with an example. In the real world, all the work that we do online- Visiting a website, Clicking around the website, do online Continue Reading

Flink: Implementing the Count Window

Reading Time: 3 minutes In the blog, we learned about Tumbling and Sliding windows which is based on time. In this blog, we are going to learn to define Flink’s windows on other properties i.e Count window. As the name suggests, count window is evaluated when the number of records received, hits the threshold. Count window set the window size based on how many entities exist within that window. For example, if we fixed the count Continue Reading

Flink: Time Windows based on Processing Time

Reading Time: 4 minutes In the previous blog, we talked about Flink’s windows operator, a heart of processing infinite streams. Generally in Flink, after specifying that the stream is keyed or non keyed, the next step is to define a window assigner. The window assigner defines how elements are assigned to windows. Flink provides some useful predefined window assigners like Tumbling windows, Sliding windows, Session windows, Count windows, and Continue Reading