Reading Time: 7 minutes In this blog, we are going to see how we can anticipate customer behavior with Market Basket Analysis by using association rules. Introduction to Market Basket Analysis Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it Continue Reading
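The idea of items that occur together frequently can be illustrated with a minimal sketch. This is not the method from the full post; the baskets and the helper names `pair_support` and `confidence` are hypothetical, and the sketch only counts pairs rather than running a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, e.g. ("bread", "milk")
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets containing `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
```

Here `support[("bread", "milk")]` is the share of baskets containing both items, and `confidence(transactions, "bread", "milk")` estimates how often milk is bought when bread is.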
Reading Time: 7 minutes Introduction Python is a great, flexible programming language that can be used in many situations. In this tutorial, we will focus primarily on its ability to enhance the Unix/Linux shell environment. Typically in Unix we will create “bash” shell scripts, but we can also create shell scripts using Python, and it’s really simple! We can even name our shell scripts with the .sh extension and Continue Reading
Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule, and monitor workflows or data pipelines. These functions are achieved with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the incubator stage. It was started in 2014 under the umbrella of Airbnb, and since then it has earned an excellent reputation, with approximately 800 contributors on GitHub and 13000 stars. Continue Reading
Reading Time: 3 minutes HashiCorp Vault is a secret management tool that provides a secure and reliable way to store secrets such as passwords, access tokens, and secret API keys.
Some applications need to interact with third-party services, and for that they need various credentials. There are scenarios in which we need different credentials to process different requests. So, where will you store them? Can you really hard-code them and publish them to your version control tool? Of course not. This is not a recommended approach.
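To make the hard-coding problem concrete, here is a minimal sketch of the simplest alternative: looking a credential up at runtime instead of writing it into the source. It does not use Vault or its API; the helper `get_credential` and the variable name `PAYMENT_API_KEY` are hypothetical, and environment variables are only a stand-in for a real secret store.

```python
import os

def get_credential(name, env=os.environ):
    """Look up a secret by name instead of hard-coding it in source."""
    value = env.get(name)
    if not value:
        # Fail loudly rather than continuing with an empty credential
        raise RuntimeError(f"Missing credential: {name}")
    return value

# PAYMENT_API_KEY is a hypothetical name; in practice it is set outside the code base.
# A plain dict stands in for the real environment so the sketch runs anywhere.
demo_env = {"PAYMENT_API_KEY": "example-value"}
api_key = get_credential("PAYMENT_API_KEY", env=demo_env)
```

Because the value never appears in the repository, rotating it does not require a code change.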
Reading Time: 3 minutes In our previous blog, Introduction to Django, we discussed Django’s features and architecture. In this blog, we will create a web application in Django. To start a new project, go to the folder where you want your project to be and run the command: django-admin startproject django_proj django-admin is Django’s command-line utility for administrative tasks. manage.py is automatically created in each Django project. manage.py does the Continue Reading
Reading Time: 4 minutes In this blog, we are going to go through one of the widely used classification algorithms, KNN (K-Nearest Neighbors). Since I started doing data science, I have observed that most problems end up with a classification model. The main reason behind this bias is that most analytic problems are based on decision making. For instance, to identify loan applicants as low, Continue Reading
Reading Time: 3 minutes In this blog, we are going to talk about Django. Before that, let’s understand what a web framework is and why we need it. A web framework is a software tool that helps us develop applications faster and smarter. It eliminates the need to write a lot of repetitive code and saves time. What is Django? Django is a free, open-source, high-level web framework Continue Reading
Reading Time: 8 minutes In this blog, we are going to learn how to do data cleaning in Python. Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy. The reason data scientists are hired in the first place is to develop algorithms and Continue Reading
Reading Time: 4 minutes In this blog, I will walk you through the basics of NumPy. If you want to do machine learning, then knowledge of NumPy is necessary. It is one of the most widely used Python libraries, and its name is short for Numerical Python. It is the most useful library if you are dealing with numbers in Python. NumPy offers great execution speed compared with Python’s standard library. It comes with a Continue Reading
Reading Time: 3 minutes In this blog, I am going to explain pandas, which is an open-source library for data manipulation, analysis, and cleaning. Pandas is a high-level data manipulation tool developed by Wes McKinney. The name Pandas is derived from “panel data”, an econometrics term for multidimensional data. Pandas is built on top of NumPy. Five typical steps in the processing and analysis of Continue Reading
Reading Time: 5 minutes It’s been a great year for us at Knoldus Inc. and we would like to thank you for your constant support and valuable interactions in 2018. We are sure you are looking forward to this brand new year as much as we are. All of our successes so far would not have been possible without the support of our team, partners, and of course, our Continue Reading