Analytics

MachineX: Run ML model prediction faster with Hummingbird

Reading Time: 3 minutes In this blog, we will see how to make our machine learning model’s predictions faster with Hummingbird, a recently open-sourced library. Nowadays, there are many frameworks for deploying or serving machine learning models in production. As a result, it is a headache for a data scientist to choose between these frameworks, keeping in mind how their model, either Sklearn or LightGBM Continue Reading

KnolSnow: Load continuous data into Snowflake using Snowpipe

Reading Time: 5 minutes In this blog, we will discuss loading streaming data into a Snowflake table using Snowpipe. But before that, if you haven’t read the previous part of this blog, i.e., Loading Bulk Data into Snowflake, I would suggest you go through it. Now that we are all set, let’s get started and see what Snowpipe is all about. Introduction: Snowpipe is a mechanism provided by Continue Reading

KnolSnow: Loading Data Into Snowflake

Reading Time: 5 minutes This blog covers loading data into Snowflake, and I will explain the various steps involved in this process. So let’s get started. Before moving ahead, you can visit the blog on understanding the basics of Snowflake Data Warehouse in case you want to refresh your concepts. Now let’s talk about the actual topic for which you clicked on this blog. To Continue Reading

Knime Analytics Platform: A dream for a data scientist

Reading Time: 3 minutes In this blog, we are going to see what the KNIME Analytics Platform is and which of its features make it easy to create an analytics workflow. Introduction to KNIME Analytics Platform: KNIME is a platform built for powerful analytics on a GUI-based workflow. This means you do not have to know how to code to be able to work with KNIME and derive Continue Reading

COVID 19: Mitigating Procurement Challenges in Supply Chain with Technology

Reading Time: 6 minutes Supply chain executives across the globe are perplexed! For years, companies have been designing their supply chain strategies on the assumption that materials will be easily available from anywhere across the globe. But the procurement challenges during COVID-19 have taken a new turn as this reality changes faster than anyone could have imagined. An Institute for Supply Chain Management survey conducted in March says Continue Reading

MachineX: performance metrics for Model Evaluation

Reading Time: 6 minutes In this blog, we are going to see how to choose the right metrics for model evaluation in different kinds of applications. There are different metric categories based on the ML model/application, and we are going to cover the popular metrics used in the following problems: Classification Metrics (accuracy, precision, recall, F1-score, ROC, AUC); Regression Metrics (MSE, MAE). There are more metrics, like Computer Vision Continue Reading
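As a rough illustration of the metrics named in this excerpt, here is a minimal sketch in plain Python, assuming binary labels and the standard textbook definitions. The function names (`confusion_counts`, `classification_metrics`, `mse`, `mae`) are made up for this example and are not taken from the post; ROC and AUC are left out because they need predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dict.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch, not a universal rule).
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error for regression targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Calling `classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])` gives one false positive and one false negative out of six predictions, so precision and recall are both 2/3. In practice a library such as scikit-learn would normally be used instead of hand-rolled functions.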

Tracking Pixels with Google Tag Manager

Reading Time: 3 minutes Introduction: Hi everyone! This is my first blog, and in it I’ll try to explain how we can use Google Tag Manager to manage and deploy tracking pixels (e.g. Google Analytics) on a website without having to modify the code. Tracking Pixels: A tracking pixel is an HTML code snippet that is loaded when a user visits a website. It is useful for tracking user Continue Reading

Apache Spark: Delta Lake as a Solution – Part II

Reading Time: 3 minutes Well, we have already covered the missing features in Apache Spark and the causes of the issues in executing Delta Lake in Part 1. Today, we will be talking about what Delta Lake is and how it provides the solution to all the problems discussed in Delta Lake as a Solution: Part 1. As we all know, Spark is just a processing engine; it doesn’t Continue Reading

Apache Spark: Delta Lake as a Solution – Part I

Reading Time: 3 minutes Today, everyone is talking about Delta Lake. Why? Ever tried to find the answer to this question? Whether yes or no, don’t worry: here in Part 1 we will discuss exactly that, and also target the following questions: What features are missing from Apache Spark? What kind of issues does that cause in executing Data Lake? Answering the above questions will definitely Continue Reading

MachineX: Boosting performance with XGBoost

Reading Time: 5 minutes In this blog, we are going to see how XGBoost works and some of the important features of XGBoost with the help of an example. Many of us have heard about tree models and boosting techniques. Let’s put these concepts together and talk about XGBoost, one of the most powerful machine learning algorithms out there. XGBoost stands for eXtreme Gradient Boosted trees. The name XGBoost, though, actually Continue Reading

MachineX: Analysing COVID-19 Pandemic

Reading Time: 5 minutes Introduction: COVID-19 disease, caused by the SARS-CoV-2 virus, was identified in December 2019 in China and declared a global pandemic by the WHO (World Health Organization) on 11 March 2020. The disease originated in Wuhan, China, and has since spread across the world, affecting more than 200 countries. Coronavirus disease 2019 (COVID-19) is a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The number Continue Reading

Apache Spark: Handle Corrupt/Bad Records

Reading Time: 3 minutes Most of the time, writing ETL jobs becomes very expensive when it comes to handling corrupt records, and in such cases ETL pipelines need a good solution for handling them, because the larger the ETL pipeline is, the more complex it becomes to handle such bad records along the way. Corrupt data includes: missing information, incomplete information, schema mismatch, and differing formats or data types. Apache Spark: Continue Reading
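To make the categories of corrupt data above a little more concrete, here is a small sketch in plain Python (not Spark, and not the approach the post itself describes) that separates good records from bad ones against an expected schema. The schema, field names, and the helpers `check_record` and `split_records` are all hypothetical, chosen only for illustration.

```python
# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA = {"id": int, "name": str, "amount": float}


def check_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems found in one record (empty if it is clean)."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record or record[field] is None:
            # Missing or incomplete information.
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            # Differing formats or data types.
            problems.append(
                f"type mismatch in {field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    for field in record:
        if field not in schema:
            # Schema mismatch: the record carries a field the schema lacks.
            problems.append(f"unexpected field: {field}")
    return problems


def split_records(records, schema=EXPECTED_SCHEMA):
    """Split records into (good, bad); bad entries keep their problem list."""
    good, bad = [], []
    for record in records:
        problems = check_record(record, schema)
        if problems:
            bad.append((record, problems))
        else:
            good.append(record)
    return good, bad
```

Keeping the bad records alongside the reason they were rejected, rather than silently dropping them, makes it possible to inspect or reprocess them later, which becomes more valuable as the pipeline grows.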