MachineX: Run ML model prediction faster with Hummingbird

Reading Time: 3 minutes. In this blog, we will see how to make our machine learning model’s predictions faster with Hummingbird, a recently open-sourced library. Nowadays, there are many frameworks for deploying or serving machine learning models in production. As a result, it is a headache for a data scientist to choose between these frameworks, keeping in mind how their model, either Sklearn or LightGBM, …

MachineX: Ultimate guide to NLP (Part 1)

Reading Time: 7 minutes. In this blog, we are going to look at some basic text operations in NLP that help solve different problems. This blog is part of the series Ultimate guide to NLP, and it focuses on basic text pre-processing techniques. Some of the major areas that we will cover in this series include the following: Text Pre-Processing, Understanding of Text & Feature Engineering, …
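
As a rough illustration of the kind of basic pre-processing this series starts with, here is a minimal sketch in plain Python (no NLP library assumed) that lowercases text, strips punctuation and digits, tokenizes on whitespace, and drops stopwords. The tiny `STOPWORDS` set and the `preprocess` function are hypothetical stand-ins; libraries such as NLTK ship much fuller stopword lists.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list for illustration only
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "on", "with"}


def preprocess(text):
    """Lowercase, remove punctuation and digits, tokenize, and drop stopwords."""
    text = text.lower()
    # Delete every ASCII punctuation character
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Delete runs of digits
    text = re.sub(r"\d+", "", text)
    # Split on any whitespace (collapses repeated spaces)
    tokens = text.split()
    return [token for token in tokens if token not in STOPWORDS]


print(preprocess("The quick brown fox, aged 3, jumps over the lazy dog!"))
```

Each step is a separate, easily swappable stage, which is why pre-processing pipelines are usually written this way: you can later replace the whitespace split with a proper tokenizer or add stemming without touching the rest.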

MachineX: Boosting performance with XGBoost

Reading Time: 5 minutes. In this blog, we are going to see how XGBoost works, along with some of its important features, with the help of an example. Many of us have heard about tree models and boosting techniques. Let’s put these concepts together and talk about XGBoost, one of the most powerful machine learning algorithms out there. XGBoost stands for eXtreme Gradient Boosting. The name XGBoost, though, actually …
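
To make the idea of boosting concrete, here is a minimal sketch of plain gradient boosting for regression, written with NumPy. It is not XGBoost itself: it assumes squared-error loss, uses one-split trees (stumps) on a single feature, and leaves out the regularization and second-order optimizations that XGBoost adds. The function names `fit_stump` and `gradient_boost` are hypothetical. Each round fits a small tree to the current residuals and adds a shrunken copy of it to the ensemble.

```python
import numpy as np


def fit_stump(x, residual):
    """Fit a depth-1 regression tree (one threshold on 1-D x) to the residuals."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so both sides of the split are non-empty.
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = residual[left].mean(), residual[~left].mean()
        err = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x values identical: fall back to a constant
        m = residual.mean()
        return lambda z: np.full(len(z), m)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Squared-loss gradient boosting with stumps; returns a predict function."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    base = y.mean()  # start from a constant prediction
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual y - pred
        residual = y - pred
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)  # shrink each tree's contribution
        stumps.append(stump)

    def predict(z):
        z = np.asarray(z, dtype=float)
        out = np.full(len(z), base)
        for s in stumps:
            out = out + learning_rate * s(z)
        return out

    return predict


x = np.arange(10, dtype=float)
y = np.where(x < 5, 1.0, 3.0)
model = gradient_boost(x, y)
print(np.round(model(x), 3))
```

The small `learning_rate` means no single tree fixes the errors on its own; many weak trees are added gradually, which is the same "boosting" principle XGBoost builds on.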

MachineX: Demystifying Market Basket analysis

Reading Time: 7 minutes. In this blog, we are going to see how we can anticipate customer behavior with Market Basket Analysis using association rules, starting with an introduction to Market Basket Analysis. Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it …
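
The core association-rule metrics can be computed directly: support(X) is the share of transactions containing X, confidence(X → Y) is support(X ∪ Y) / support(X), and lift is confidence(X → Y) / support(Y). Below is a simplified sketch in plain Python that scores rules between single items; the baskets are made-up toy data, the function names are hypothetical, and a real analysis would use a full frequent-itemset algorithm such as Apriori rather than this pairwise loop.

```python
from itertools import combinations

# Hypothetical toy transactions; each set is one customer's basket
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Rules lhs -> rhs between single items, filtered by support and confidence."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support({lhs}, transactions)  # P(rhs | lhs)
            lift = confidence / support({rhs}, transactions)  # > 1: positive association
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s_ab, confidence, lift))
    return rules


for lhs, rhs, s, c, l in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Lift is the metric to watch: a rule like diapers → beer can have high confidence simply because beer is popular, while a lift above 1 indicates the items co-occur more often than chance would predict.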

Python Scripts: An Introduction

Reading Time: 7 minutes. Python is a great, flexible programming language that can be used in many situations. In this tutorial, we will focus primarily on its ability to enhance the Unix/Linux shell environment. Typically in Unix we create “bash” shell scripts, but we can also create shell scripts using Python, and it’s really simple! We can even name our shell scripts with the .sh extension and …
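
For instance, a small utility that would usually be a bash script can be written in Python. The sketch below assumes Python 3.7+ and a hypothetical file name `linecount.sh`; the `#!/usr/bin/env python3` shebang is what lets the shell run it directly once it is executable, regardless of the extension. The hypothetical `run` helper shows how to call other shell commands through the standard `subprocess` module.

```python
#!/usr/bin/env python3
# Hypothetical example script, e.g. saved as linecount.sh and made executable with:
#   chmod +x linecount.sh
# The shebang above tells the shell to run this file with python3, whatever its extension.
import subprocess
import sys


def run(cmd):
    """Run a command given as a list of arguments and return its stdout (like $(...) in bash)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def count_lines(path):
    """Count the lines in a text file (similar to `wc -l`)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def main(args):
    if not args:
        print(f"usage: {sys.argv[0]} FILE...", file=sys.stderr)
        return 1  # a non-zero exit status signals an error to the shell
    for path in args:
        print(f"{count_lines(path):>8} {path}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

After `chmod +x linecount.sh`, running `./linecount.sh notes.txt` would print the line count of a (hypothetical) notes.txt, and inside the script `run(["ls", "-l"])` would return the output of `ls -l` as a string.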

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes. What is Apache Airflow? Airflow is a platform to programmatically author, schedule, and monitor workflows or data pipelines. It does this with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the Apache incubator stage. It was started at Airbnb in 2014, and since then it has earned an excellent reputation, with approximately 800 contributors and 13,000 stars on GitHub.
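
Airflow's own DAG classes are beyond the scope of this teaser, but the underlying idea, running each task only after its upstream tasks finish, can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is not Airflow's API: the task names and helper functions are hypothetical, and the sketch only shows how a DAG determines which tasks can run together and how a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks it depends on
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "archive": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def execution_batches(dag):
    """Group tasks into batches; tasks in one batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstreams are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


def is_dag(dag):
    """Return True if the dependency graph has no cycles."""
    try:
        TopologicalSorter(dag).prepare()
        return True
    except CycleError:
        return False


print(execution_batches(pipeline))
# [['extract'], ['archive', 'clean'], ['aggregate'], ['report']]
print(is_dag({**pipeline, "extract": {"report"}}))
# False: extract -> clean -> aggregate -> report -> extract is a cycle
```

The "acyclic" part matters: with a cycle there is no valid order in which to run the tasks, so a scheduler has to reject such a graph up front.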

Getting Started with Vault

Reading Time: 3 minutes. HashiCorp Vault is a secret management tool that provides a secure and reliable way to store secrets such as passwords, access tokens, and secret API keys.

Many applications need to interact with third-party services, and for that they need various credentials. In some scenarios, different requests need different credentials. So, where will you store them? Can you really hard-code them and commit them to your version control system? Of course not; that is not a recommended approach.

Build your first web application using Django

Reading Time: 3 minutes. In our previous blog, Introduction to Django, we discussed Django’s features and architecture. In this blog, we will create a web application in Django. To start a new project, go to the folder where you want your project to live and run the command django-admin startproject django_proj. Here, django-admin is Django’s command-line utility for administrative tasks. manage.py is automatically created in each Django project. manage.py does the …

MachineX: k-Nearest Neighbors (KNN) for classification

Reading Time: 4 minutes. In this blog, we are going to go through one of the most widely used classification algorithms, KNN (k-Nearest Neighbors). Since I started doing data science, I have observed that most problems end up needing a classification model. The main reason is that most analytic problems involve decision making. For instance, to identify loan applicants as low, …
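
The algorithm itself is short: measure the distance from a new point to every labelled training point, take the k closest, and let them vote. Here is a minimal pure-Python sketch assuming Euclidean distance and a simple majority vote; the two-feature toy data, the `low`/`high` labels, and the `knn_predict` name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point with its label and sort by Euclidean distance to query
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    # most_common(1) returns [(label, count)] for the label with the most votes
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (feature_1, feature_2) -> class label
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, (1.5, 1.5)))  # low
print(knn_predict(train_X, train_y, (6.5, 6.5)))  # high
```

Choosing an odd k avoids ties in two-class problems; in practice features are also usually scaled first, since KNN's distances are dominated by whichever feature has the largest range.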

Introduction to Django

Reading Time: 3 minutes. In this blog, we are going to talk about Django. Before that, let’s understand what a web framework is and why we need one. A web framework is a software tool that helps us develop applications faster and smarter. It eliminates the need to write a lot of repetitive code and saves time. What is Django? Django is a free, open-source, high-level web framework …

MachineX: Data Cleaning in Python

Reading Time: 8 minutes. In this blog, we are going to learn how to do data cleaning in Python. Most data scientists spend only about 20 percent of their time on actual data analysis and about 80 percent finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy. The reason data scientists are hired in the first place is to develop algorithms and …

Introduction to NumPy

Reading Time: 4 minutes. In this blog, I will walk you through the basics of NumPy. If you want to do machine learning, knowledge of NumPy is necessary. NumPy, short for Numerical Python, is one of the most widely used Python libraries and one of the most useful if you are dealing with numbers in Python. For numerical work on arrays, NumPy is typically much faster than equivalent code written with plain Python lists and loops, because its array operations run in compiled code. It comes with a …
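
A quick way to see the speed difference is to compute the same sum of squares with a Python loop and with a vectorized NumPy expression. This is a rough sketch: the array size and function names are arbitrary choices for illustration, and the exact timings depend on your machine and NumPy version, so none are quoted here.

```python
import timeit

import numpy as np

N = 100_000
py_values = list(range(N))
np_values = np.arange(N, dtype=np.int64)  # explicit 64-bit ints so the squares cannot overflow


def sum_squares_python(values):
    # Interpreted loop: one Python-level multiplication per element
    return sum(v * v for v in values)


def sum_squares_numpy(arr):
    # Vectorized: squaring and summing both run in NumPy's compiled loops
    return int(np.sum(arr * arr))


t_py = timeit.timeit(lambda: sum_squares_python(py_values), number=10)
t_np = timeit.timeit(lambda: sum_squares_numpy(np_values), number=10)
print(f"pure Python: {t_py:.4f}s   NumPy: {t_np:.4f}s")
```

Both functions return the same number; the NumPy version simply avoids running a Python-level loop body for every element, which is where most of the speedup comes from.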