Analytics

Introduction to Ensemble Learning

Reading Time: 4 minutes Ensemble methods are techniques that create multiple models and then combine them to produce improved results. Ensemble learning usually produces more accurate solutions than a single model would. This has been the case in a number of machine learning competitions, where the winning solutions used ensemble methods. Ensemble methods: You must ensure that your models are independent of one another and when creating a Continue Reading
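As a rough illustration of "create multiple models and then combine them", here is a minimal sketch of majority voting, one simple way to combine classifiers. The three prediction lists are made-up stand-ins for the outputs of independently trained models, and `majority_vote` is a hypothetical helper name, not something taken from the post.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions by majority vote, sample by sample.

    `predictions` is a list of equal-length lists, one list per model.
    """
    combined = []
    # zip(*predictions) groups together every model's vote for the same sample
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three models on four samples
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)  # [1, 0, 1, 1]
```

Each individual model is wrong on a different sample here, which is why independence between models matters: the vote only helps when the models do not all make the same mistakes.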

Build REST API in Scala with Play Framework

Reading Time: 4 minutes Overview: In earlier blogs I discussed the Play Framework; now let’s move on to further topics in Play. For building simple, CRUD-style REST APIs in Scala, the Play Framework is a good solution. It has an uncomplicated API that doesn’t require us to write too much code. In this blog, we’re going to build a REST API in Scala with Play. We’ll use JSON as Continue Reading

Dealing with Missing Values in Python

Reading Time: 4 minutes For any data scientist, it’s quite normal to deal with data sets that have missing values and still be expected to build a good predictive model from them. Here we will discuss some techniques to handle missing data in a given data set. Missing values occur when no data is stored for a variable or feature. They can be represented as “?”, “NA”, Continue Reading
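To make the idea concrete, here is a small sketch that normalizes textual missing-value markers to NaN and then fills them with the column mean. Mean imputation is just one common option and is not necessarily the technique the post goes on to describe; the marker set, the sample column, and the helper names are assumptions for illustration.

```python
import math
from statistics import mean

# Assumed set of markers; real data sets may use others
MISSING_MARKERS = {"?", "NA", ""}


def to_float_or_nan(raw):
    """Parse a raw cell, mapping known missing markers to NaN."""
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return math.nan
    return float(text)


def impute_mean(values):
    """Replace NaN entries with the mean of the observed entries.

    Raises statistics.StatisticsError if every entry is missing.
    """
    observed = [v for v in values if not math.isnan(v)]
    fill = mean(observed)
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical raw column with two missing entries
raw_ages = ["25", "?", "31", "NA", "40"]
ages = [to_float_or_nan(v) for v in raw_ages]
filled = impute_mean(ages)
print(filled)  # [25.0, 32.0, 31.0, 32.0, 40.0]
```

Normalizing every marker to a single representation first keeps the imputation step independent of how the source file happened to encode missing data.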

dev-tools

Dev Tools to the Rescue – Part 2

Reading Time: 6 minutes In my previous article, Dev Tools to the Rescue – Part 1, we looked at some of the best developer tools for software development, project management, continuous delivery/integration, designing, testing, etc. In this article, we’ll continue with tools that are helpful for purposes like monitoring, analysis, cloud development, security, etc. Confluence: Confluence is a team collaboration application that allows teams to work together and share Continue Reading

Snowflake integration with Power BI tool

Reading Time: 4 minutes Snowflake is a popular cloud data warehouse (DWH) solution, and in this blog we will discuss how to get insights from your data using Power BI as the reporting and analytics tool.

Understanding persistence in Apache Spark

Reading Time: 4 minutes In this blog, we will try to understand the concept of persistence in Apache Spark in layman’s terms, with scenario-based examples. Note: The scenarios are only meant to make the concept easier to understand. Spark Architecture: Note: Cache memory can be shared between Executors. What does persisting/caching an RDD mean? Spark RDD persistence is an optimization technique which saves the result of RDD evaluation Continue Reading

Spark Structured Streaming (Part 4) – Handling Late Data

Reading Time: 3 minutes Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues from the earlier blog, “Understanding Stateful Streaming”, and covers handling late-arriving data in Spark Structured Streaming. So let’s get started. Handling Late Data: With window aggregates (discussed in the previous blog), Spark automatically takes care of late data. Every aggregate window is like a bucket Continue Reading

Spark Structured Streaming (Part 3) – Stateful Streaming

Reading Time: 4 minutes Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues from the earlier blog, “Internals of Structured Streaming”, and covers stateful streaming in Spark Structured Streaming. So let’s get started. Let’s begin with a basic understanding of what stateful stream processing is. But to understand that, let’s first understand what stateless stream processing is. In Continue Reading

Spark Structured Streaming (Part 2) – The Internals

Reading Time: 2 minutes Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues from the earlier blog, “Introduction to Structured Streaming”, so I’ll start exactly where I left off in the previous blog. Structure of Streaming Query: When we call the start() API, Spark internally translates this code into a Logical Plan (an abstract representation of what the code does), then Continue Reading

Spark Structured Streaming (Part 1) – Introduction

Reading Time: 5 minutes In this Spark Structured Streaming series of blogs, we will take a deep look at what Structured Streaming is, in layman’s terms. So let’s get started. Introduction: Structured Streaming is a stream processing engine built on top of the Spark SQL engine, and it uses the Spark SQL APIs. It is fast, scalable and fault-tolerant. It provides rich, unified and high-level APIs in the Continue Reading

KSnow: Know about Cloning in Snowflake

Reading Time: 2 minutes This blog pertains to the Cloning feature in Snowflake, and I will explain everything you need to know about this feature, with practical examples. So let’s get started. Zero Copy Clone: Cloning is also known as Zero Copy Clone in Snowflake. It is used to create a copy of a table, a schema, or a database. In most databases, in order to make a copy Continue Reading

Product demand forecasting with Knime

Reading Time: 5 minutes In this blog, we are going to look at the importance of demand forecasting and how we can easily create forecasting workflows with Knime. Market demand forecasting is a basic process for any business, but perhaps none more so than those in consumer packaged goods (CPG). Stock, production, storage, delivery, marketing: every aspect of CPG and retail organizations’ activities is influenced by accurate forecasting. Identifying shoppers’ Continue Reading

KSnow: Time Travel and Fail-safe in Snowflake

Reading Time: 5 minutes This blog pertains to Time Travel and Fail-safe in Snowflake, and I will explain everything you need to know about these features, with practical examples. So let’s get started. Introduction to Time Travel: Snowflake allows access to historical data as of a point in the past, that is, data that may since have been modified or deleted. Using time travel functionality a number of Continue Reading