# MachineX: Demystifying Market Basket Analysis

In this blog, we are going to see how we can anticipate customer behavior with Market Basket Analysis by using association rules.

## Introduction to Market Basket Analysis

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

The approach is based on the theory that customers who buy a certain item are more likely to buy another specific item.

For example, people who buy bread usually buy butter too. The marketing teams at retail stores can target customers who buy bread and butter with an offer that nudges them to buy a third item, like eggs.

So if customers buy bread and butter and see a discount or an offer on eggs, they will be encouraged to spend more and buy the eggs. This is what market basket analysis is all about.
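To make the idea of "items that occur together frequently" concrete, here is a minimal sketch that counts how often each pair of items shares a basket. The baskets are made up for illustration and are not taken from any real store.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only
baskets = [
    ["bread", "butter", "eggs"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "eggs"],
    ["bread", "butter", "milk"],
]

pair_counts = Counter()
for basket in baskets:
    # set() drops duplicate items, sorted() gives each pair one canonical order
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)  # ('bread', 'butter') 3
```

In this toy data, bread and butter share a basket more often than any other pair, which is exactly the kind of signal a retailer would act on.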

This is just a small example. If you take the data for 10,000 or 20,000 items from your supermarket to a data scientist, just imagine the number of insights you can get. That is why association rule mining is so important.

### Real-life application

Market basket analysis can also be used to cross-sell products. Amazon famously uses an algorithm to suggest items that you might be interested in, based on your browsing history or what other people have purchased.

A well-known urban legend is that a supermarket, after running a market basket analysis, found that men were likely to buy beer and diapers together. The store then increased sales by placing the beer next to the diapers.

It sounds simple (and much of the time, it is). However, there are pitfalls to be aware of:

• For large inventories (for example, more than 10,000 items), the number of item combinations can explode into the billions, making the computation practically infeasible.
• The data is usually mined from large transaction histories. Such large volumes of data are normally handled by specialized statistical software.
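The scale of the first pitfall can be checked with a quick calculation. The sketch below counts how many distinct item combinations exist for an inventory of 10,000 items, using `math.comb` from the Python standard library (Python 3.8 or later).

```python
from math import comb

n_items = 10_000  # inventory size from the example above

# Number of distinct k-item combinations that could appear in a basket
for k in (2, 3, 4):
    print(f"{k}-item combinations: {comb(n_items, k):,}")

pairs = comb(n_items, 2)
triples = comb(n_items, 3)
```

There are about 50 million possible pairs and about 167 billion possible triples, which is why algorithms such as Apriori prune candidates instead of counting every combination.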

### Association Rule Mining

Association Rule Mining is used when we want to find associations between objects in a given set, or to find hidden patterns in a body of data.

Market Basket Analysis or Basket Data Analysis in retailing or clustering are some applications of Association Rule Mining.

The most widely used approach to these problems is Market Basket Analysis. It is a key technique used by many big companies in the retail sector, such as Amazon and Flipkart, to analyze purchasing behavior by identifying the relationships between the different items that users place in their “shopping baskets”. The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. These strategies may include:

• Changing the store layout according to trends
• Cross marketing on online stores
• Identifying the trending items customers buy
• Customized emails with add-on sales
• Customer behavior analysis
• Catalog design

Note: Market Basket Analysis and Recommendation Systems look similar and are often confused with each other.

### Difference between Association and Recommendation

Association rules do not depend on an individual’s preferences. They always look for relationships between sets of items across all transactions. This makes them quite different from the recommendation-system method called collaborative filtering.

If you want to learn about the recommendation system, you can go through my previous blog Recommendation Engines.

#### Example:

To understand it better, take a look at a product page on amazon.com. You will notice two headings on each product’s info page: “Frequently Bought Together” and “Customers who bought this item also bought”.

• Frequently Bought Together → Association
• Customers who bought this item also bought → Recommendation

So this was the difference between association rules and recommendations.

Now, let’s talk about one of the main association rule learning algorithms, i.e. the Apriori algorithm.

## Apriori Algorithm

Let’s assume that a transaction containing the set {Banana, Pineapple, Mango} also contains the subset {Banana, Mango}. According to the Apriori principle, if {Banana, Pineapple, Mango} is frequent, then {Banana, Mango} must also be frequent, because every transaction that contains the larger set also contains each of its subsets.
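The principle can be sketched as a level-wise search: candidates of size k+1 are built only from itemsets of size k that are already frequent. The code below is a minimal illustration under stated assumptions, not the implementation of any particular library. The six transactions are hypothetical, and `support` and `frequent_itemsets` are helper names chosen for this example.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana"},
    {"Mango"},
    {"Mango"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(min_support):
    """Return all itemsets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items if support([i]) >= min_support}
    found = set(level)
    while level:
        # Join: merge frequent k-itemsets into (k+1)-item candidates
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Prune (Apriori principle): every k-item subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, len(c) - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        found |= level
    return found

frequent = frequent_itemsets(min_support=0.3)
print(len(frequent))  # 7
```

With a minimum support of 0.3, the three-item set is frequent and so is every one of its subsets. Raising the threshold to 0.5 leaves only the single items and {Banana, Pineapple}, so no three-item candidate is ever generated.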

We have a dataset that consists of some transactions, encoded as follows:

• 0 → absence of an item
• 1 → presence of an item

To pick out the interesting rules from the many possible rules in this small business scenario, we will use the following metrics:

Support: Support is the popularity (frequency of occurrence) of an item. It is calculated as the number of transactions containing the item divided by the total number of transactions. So, if we want to calculate the support for Banana, here it is:

Support(Banana) = (Transactions involving Banana)/(Total transactions)

Support(Banana) = 0.667
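As a rough sketch, support can be computed directly from a 0/1 table like the one described above. The table here is hypothetical: it is a six-transaction stand-in chosen so that Banana appears in 4 of 6 rows, and `columns`, `rows` and `support` are names made up for this example.

```python
# Hypothetical 0/1 table: one row per transaction, one column per item
columns = ["Banana", "Pineapple", "Mango"]
rows = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
]

def support(item):
    """Share of transactions in which the item is present (value 1)."""
    col = columns.index(item)
    return sum(row[col] for row in rows) / len(rows)

print(round(support("Banana"), 3))  # 0.667
```

Because presence is encoded as 1, the support of an item is simply the mean of its column.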

Confidence: The likelihood that item B occurs given that item A occurs (a conditional probability).

Confidence(A => B) = (Transactions involving both A and B)/(Transactions involving A)

Confidence({Banana, Pineapple} => {Mango}) = Support(Banana, Pineapple, Mango)/Support(Banana, Pineapple)

= (2/6) / (3/6)

= 0.667

Lift: The increase in the ratio of occurrence of item B when item A occurs.

Lift(A => B) = Confidence(A => B) / Support(B)

Lift({Banana, Pineapple} => {Mango}) = 1

So, the likelihood of a customer buying A and B together is ‘lift-value’ times what we would expect if the two were purchased independently.

• Lift (A=> B) = 1 means that there is no correlation within the item set.
• Lift (A => B) > 1 means that there is a positive correlation within the itemset, i.e., products in the itemset, A, and B, are more likely to be bought together.
• Lift (A => B) < 1 means that there is a negative correlation within the itemset, i.e., products in itemset, A, and B, are unlikely to be bought together.
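The confidence and lift figures above can be reproduced with a few lines of code. This is a sketch on a hypothetical six-transaction dataset, chosen only so that the counts match the worked numbers (2 of 6 transactions contain all three fruits, 3 of 6 contain Banana and Pineapple). The function names are made up for this example.

```python
# Hypothetical transactions, chosen to match the worked numbers above
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana"},
    {"Mango"},
    {"Mango"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence of the rule divided by support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

a, b = {"Banana", "Pineapple"}, {"Mango"}
rule_confidence = confidence(a, b)
rule_lift = lift(a, b)
print(round(rule_confidence, 3), round(rule_lift, 3))  # 0.667 1.0
```

A lift of 1 says that, in this toy data, knowing a basket holds Banana and Pineapple tells us nothing extra about whether it also holds Mango.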

## Implementation

You can get the data from here.

This dataset contains the transaction data of a store with various products.

Install the apyori package before importing the library:

```
conda install --yes apyori
# OR
pip3 install apyori
```

#### Import the packages

``````import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori``````

We have imported all the necessary libraries:

• NumPy and pandas for basic operations
• Matplotlib for data visualization
• apyori for our data modeling

#### Import the data

``store_data = pd.read_csv("store_data.csv",header = None)``

We have read the dataset into a pandas data frame named “store_data”. Now let’s look at the data:

``store_data.head()``

So, this is what our data looks like. It contains the full transaction history of various products.

``store_data.shape``

7501 is the total number of transactions, each listing the items bought together. 20 is the number of columns used to hold the items of a transaction.

#### Data Preprocessing

The apyori library requires our dataset to be in the form of a list of lists: the whole dataset is one big list, and each transaction is an inner list within it, i.e. [ [transaction1], [transaction2], ..., [transaction7501] ].

Let’s convert our pandas data frame into a list of lists as follows:

```python
records = []
for i in range(0, 7501):
    # Every cell goes through str(), so empty cells (NaN) become the string 'nan'
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
```
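One side effect of the loop above is that empty cells are read by pandas as NaN and end up in each transaction as the string 'nan'. As an alternative sketch, the transactions can be built with the standard-library `csv` module, skipping empty cells. The three-row sample below is a hypothetical stand-in for `store_data.csv`, and `records_clean` is a name made up for this example. Using this variant on the real file would likely change the rule counts reported later in this post.

```python
import csv
import io

# Hypothetical sample standing in for store_data.csv
sample = io.StringIO(
    "shrimp,almonds,avocado\n"
    "burgers,meatballs,eggs\n"
    "chutney,,\n"
)

# One inner list per transaction; empty cells are dropped
records_clean = [
    [item.strip() for item in row if item.strip()]
    for row in csv.reader(sample)
]

for transaction in records_clean:
    print(transaction)
```

To read the real file instead, the `io.StringIO` object would be replaced by `open("store_data.csv", newline="")`.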

Let’s see these transaction sets:

```python
for sets in records:
    print(sets)
```

#### Apriori Algorithm

Parameters of apriori:

• records: list of lists
• min_support: probability value to select the items with support values greater than the value specified by the parameter
• min_confidence: probability value to filter rules with greater confidence than the specified threshold
• min_lift: minimum lift value to shortlist the list of rules
• min_length: minimum number of items you want in your rules (note that the length option documented by apyori is max_length, so min_length may have no effect)
``association_rules = apriori(records, min_support = 0.0055, min_confidence = .3, min_lift = 3, min_length = 2)``
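To show what the three thresholds do, here is a brute-force sketch that enumerates every rule on a tiny dataset and keeps only those passing `min_support`, `min_confidence` and `min_lift`. It is not how apyori is implemented. The transactions are hypothetical and `mine_rules` is a helper name made up for this example.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only
transactions = [
    {"bread", "butter", "eggs"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_rules(min_support, min_confidence, min_lift):
    """Return (base, add, support, confidence, lift) for each surviving rule."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue  # itemset is too rare to produce rules
            # Try every split of the itemset into base => add
            for k in range(1, size):
                for base in combinations(combo, k):
                    base = frozenset(base)
                    add = itemset - base
                    conf = s / support(base)
                    lift = conf / support(add)
                    if conf >= min_confidence and lift >= min_lift:
                        rules.append((base, add, s, conf, lift))
    return rules

rules = mine_rules(min_support=0.3, min_confidence=0.6, min_lift=1.1)
for base, add, s, conf, lift in rules:
    print(sorted(base), "->", sorted(add), round(s, 3), round(conf, 3), round(lift, 3))
```

On this toy data three rules survive: bread => butter, butter => bread and eggs => butter. The rule milk => bread has enough support and confidence but a lift of 1, so the lift threshold removes it.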

Convert the `association_rules` generator returned by `apriori` into a list of rules:

``association_results = list(association_rules)``

Now let’s see how many rules have been generated by our algorithm:

``print(len(association_results))``

So, in total, we have 18 rules whose support, confidence and lift are all higher than the thresholds we set. Let’s look at one of the rules:

``print(association_results[5])``

We can see that rule 5 contains (spaghetti, ground beef, frozen vegetables), which have a good association between them.

#### Display the list of rules

```python
for item in association_results:
    # item = (items, support, ordered_statistics); take the first statistic
    stat = item[2][0]
    # stat = (items_base, items_add, confidence, lift)
    print("Rule : " + ", ".join(stat[0]) + " -> " + ", ".join(stat[1]))
    print("Support : {}".format(item[1]))
    print("Confidence : {}".format(stat[2]))
    print("Lift : {}".format(stat[3]))
    print("\n-------------------------------------------------\n")
```

So, this was all about how to implement the apriori algorithm to find associativity in our set of transactions.

Stay tuned, happy learning 🙂

#### Written by Shubham Goyal

Shubham Goyal is a Data Scientist at Knoldus Inc. He is also an artificial intelligence researcher, interested in research on different domain problems, and a regular contributor to society through blogs and webinars on machine learning and artificial intelligence. He has also written a few research papers on machine learning, and is a conference speaker and an official author at Towards Data Science.