Author: Niraj Kumar

Lasso And Ridge Regression

Reading Time: 4 minutes. In this blog, we will learn about the lasso and ridge regression techniques. We will compare and analyze the methods in detail. Introducing Linear Models: Linear regression is a type of linear model and the most basic and commonly used predictive algorithm. Its popularity cannot be dissociated from its simple yet effective architecture. A linear model assumes a linear relationship between input Continue Reading

Terraform: Loops with For Expressions

Reading Time: 4 minutes. In this blog, we are going to learn how to use for loops in Terraform. Introduction: As we know, Terraform is a declarative language. Infrastructure-as-code in a declarative language tends to provide a more accurate picture of the deployed items. It is easier to reason about and makes it easier to keep the codebase small. However, without access to a full programming language, certain types of Continue Reading

Terraform: Loops with Count and Problems

Reading Time: 6 minutes. In this blog, we are going to expand our Terraform toolbox with some more advanced tips and tricks, such as how to use loops with count. We’ll also discuss some of Terraform’s weaknesses so we can avoid the most common problems. Introduction: As we know, Terraform is a declarative language. Infrastructure-as-code in a declarative language tends to provide a more accurate picture of the deployed items. Continue Reading

Flink Architecture And Cluster Deployment

Reading Time: 4 minutes. In this blog, we will discuss the Flink architecture and its core components. Introduction: Flink is a distributed system and requires effective allocation and management of compute resources in order to execute streaming applications. It integrates with all common cluster resource managers, such as Hadoop YARN and Kubernetes. In addition, it can run as a standalone cluster or even as a library. Components of a Flink Cluster: The Flink Architecture Continue Reading

Stateful processing with Apache Beam

Reading Time: 6 minutes. Overview: Beam lets us process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam. With this feature, we can unlock new use cases and new efficiencies. Quick Recap: In Beam, a big data processing pipeline is a directed, acyclic graph of parallel operations called PTransforms that process data from PCollections. The boxes are PTransforms and the edges Continue Reading

ProtoBuf: New way of Serialization

Reading Time: 5 minutes. In this blog, we will learn how to use ProtoBuf with Java and how it compares with JSON. Overview: Protobuf is short for protocol buffers, which are language- and platform-neutral mechanisms for serializing structured data for use in communications protocols, data storage, and more. Think XML, but smaller, faster, and simpler. The method involves an interface description language that describes the structure of some data and a program Continue Reading

Apache Beam Overview

Reading Time: 2 minutes. This blog gives an overview of Apache Beam. What is Apache Beam? Apache Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. Moreover, the available open-source Beam SDKs can help us easily build a program for our pipeline. Apache Flink, Apache Spark, and Cloud Dataflow are some of the possible runners to run the program. Why use Continue Reading