ML, AI and Data Engineering

Spark Structured Streaming (Part 3) – Stateful Streaming

Reading Time: 4 minutes Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues the earlier blog “Internals of Structured Streaming” and covers Stateful Streaming in Spark Structured Streaming. So let’s get started. Let’s start with a basic understanding of what Stateful Stream Processing is. But to understand that, let’s first understand what Stateless Stream Processing is. In Continue Reading

Analysis of campus placement dataset using decision tree

Reading Time: 3 minutes KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. With KNIME Analytics Platform, you can create visual workflows with an intuitive, drag-and-drop style graphical interface, without the need for coding. Hello, folks! In this blog, we will analyse the campus placement data Continue Reading

Stateful Streaming in Spark

Reading Time: 4 minutes Apache Spark is a fast, general-purpose cluster computing system. Spark supports both batch processing and stream processing. Its stream processing is near real-time, meaning that it processes the data in micro-batches. I discussed Spark Streaming in more detail in my previous blog. In this blog, I’ll discuss Stateful Streaming in Spark. So let’s start! What is Stateful Streaming? Continue Reading

Spark Structured Streaming (Part 2) – The Internals

Reading Time: 2 minutes Welcome back, folks, to this blog series on Spark Structured Streaming. This blog is the continuation of the earlier blog “Introduction to Structured Streaming”, so I’ll start exactly where I left off in the previous blog. Structure of Streaming Query When we call the start() API, Spark internally translates this code into a Logical Plan (an abstract representation of what the code does), then Continue Reading

Spark Structured Streaming (Part 1) – Introduction

Reading Time: 5 minutes In this Spark Structured Streaming series of blogs, we will take a deep look at what Structured Streaming is, in plain layman’s terms. So let’s get started. Introduction Structured Streaming is a stream processing engine that is built on top of the Spark SQL engine and uses the Spark SQL APIs. It is fast, scalable and fault-tolerant. It provides rich, unified and high-level APIs in the Continue Reading

KSnow: Know about Cloning in Snowflake

Reading Time: 2 minutes This blog pertains to the Cloning feature in Snowflake, and I will explain everything you need to know about this feature with a practical example. So let’s get started. Zero Copy Clone Cloning, also known as Zero Copy Clone in Snowflake, is used to create a copy of a Table, Schema or Database. In most databases, in order to make a copy Continue Reading

Product demand forecasting with Knime

Reading Time: 5 minutes In this blog, we will look at the importance of demand forecasting and how we can easily create forecasting workflows with KNIME. Market demand forecasting is a basic process for any business, but perhaps none more so than those in consumer packaged goods (CPG). Stock, production, storage, delivery, marketing – every aspect of CPG and retail organizations’ activities is influenced by accurate forecasting. Identifying shoppers’ Continue Reading

KSnow: Time Travel and Fail-safe in Snowflake

Reading Time: 5 minutes This blog pertains to Time Travel and Fail-safe in Snowflake, and I will explain everything you need to know about these features with practical examples. So let’s get started. Introduction to Time Travel Snowflake allows access to historical data as of a point in the past, including data that has since been modified or deleted. Using Time Travel functionality, a number of Continue Reading

ICC Test Cricket Data Analysis using KNIME

Reading Time: 4 minutes KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. With KNIME Analytics Platform, you can create visual workflows with an intuitive, drag and drop style graphical interface, without the need for coding. Hello, folks! In this blog, we will analyse Continue Reading