Market Basket Analysis using Association Rule Mining.

Ritik Vaidande
5 min read · Nov 7, 2020

Recently, I got an opportunity to work on a project based on Market Basket Analysis, and here I am sharing my experience.

What is Market basket analysis?

Market Basket Analysis is one of the fundamental techniques used by large retailers to uncover associations between items. In other words, it allows retailers to identify items that are frequently bought together.

Association Rules :

Association Rules are widely used to analyze retail basket or transaction data. They are intended to identify strong rules in that data using measures of interestingness such as support, confidence and lift.

An example of Association Rules

  • Assume there are 100 customers
  • 10 of them bought milk, 8 bought butter and 6 bought both of them.
  • Rule: bought milk => bought butter
  • support = P(Milk & Butter) = 6/100 = 0.06
  • confidence = support/P(Milk) = 0.06/0.10 = 0.60
  • lift = confidence/P(Butter) = 0.60/0.08 = 7.5

This example is extremely small. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Key metrics for association rules:

Consider this example:

order 1: apple, egg, milk  
order 2: carrot, milk
order 3: apple, egg, carrot
order 4: apple, egg
order 5: apple, carrot

There are 4 key metrics to consider when evaluating association rules:

1.Support :
This is the percentage of orders that contain the item set. In the example above, there are 5 orders in total and {apple,egg} occurs in 3 of them, so:

support{apple,egg} = 3/5 or 60%

The minimum support threshold required by apriori can be set based on knowledge of your domain.
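As a minimal sketch of the definition above, support can be computed directly on the five example orders. The helper names `support` and `frequent_pairs` are illustrative, not part of any library, and the 0.4 threshold is an arbitrary choice for the demonstration.

```python
from itertools import combinations

# The five example orders, one set of items per order.
orders = [
    {"apple", "egg", "milk"},
    {"carrot", "milk"},
    {"apple", "egg", "carrot"},
    {"apple", "egg"},
    {"apple", "carrot"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support reaches the minimum threshold."""
    items = sorted(set().union(*transactions))
    pairs = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            pairs[pair] = s
    return pairs

print(support({"apple", "egg"}, orders))         # 0.6
print(frequent_pairs(orders, min_support=0.4))
```

Raising `min_support` shrinks the set of surviving pairs, which is how the threshold keeps the number of candidate rules manageable.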

2.Confidence :
Given two items, A and B, confidence measures the percentage of times that item B is purchased, given that item A was purchased. This is expressed as:

confidence{A->B} = support{A,B} / support{A}

Confidence values range from 0 to 1, where 0 indicates that B is never purchased when A is purchased, and 1 indicates that B is always purchased whenever A is purchased. Note that the confidence measure is directional. This means that we can also compute the percentage of times that item A is purchased, given that item B was purchased:

confidence{B->A} = support{A,B} / support{B}

In our example, the percentage of times that egg is purchased, given that apple was purchased is:

confidence{apple->egg} = support{apple,egg} / support{apple}
= (3/5) / (4/5)
= 0.75 or 75%

A confidence value of 0.75 implies that out of all orders that contain apple, 75% of them also contain egg. Now, we look at the confidence measure in the opposite direction (i.e., egg->apple):

confidence{egg->apple} = support{apple,egg} / support{egg}
= (3/5) / (3/5)
= 1 or 100%
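The two directions can be checked with a short sketch. It works with raw order counts rather than support values, which gives the same ratio because the total number of orders cancels out. The function names are illustrative.

```python
# The five example orders, one set of items per order.
orders = [
    {"apple", "egg", "milk"},
    {"carrot", "milk"},
    {"apple", "egg", "carrot"},
    {"apple", "egg"},
    {"apple", "carrot"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """confidence{A->B} = support{A,B} / support{A}, computed from counts."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

print(confidence({"apple"}, {"egg"}, orders))  # 0.75
print(confidence({"egg"}, {"apple"}, orders))  # 1.0
```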

3.Lift :

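Lift measures how much more often A and B occur together than would be expected if they were statistically independent. Unlike the confidence metric, whose value may vary depending on direction (e.g., confidence{A->B} may be different from confidence{B->A}), lift has no direction. This means that lift{A,B} is always equal to lift{B,A}: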

lift{A,B} = lift{B,A} = support{A,B} / (support{A} * support{B})

In our example, we compute lift as follows:

lift{apple,egg} = lift{egg,apple} = support{apple,egg} / (support{apple} * support{egg})
= (3/5) / (4/5 * 3/5)
= 1.25

In summary, lift can take the following values:

  • Lift = 1; implies no relationship between A and B (i.e., A and B occur together only by chance)
  • Lift > 1; implies that there is a positive relationship between A and B (i.e., A and B occur together more often than random)
  • Lift < 1; implies that there is a negative relationship between A and B (i.e., A and B occur together less often than random)

In our example, apple and egg occur together 1.25 times as often as would be expected by chance, so we conclude that there is a positive relationship between them.
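A small sketch confirms both the value and the symmetry. It rewrites the lift formula in terms of order counts, which is algebraically the same and keeps the arithmetic exact. The function names are illustrative.

```python
# The five example orders, one set of items per order.
orders = [
    {"apple", "egg", "milk"},
    {"carrot", "milk"},
    {"apple", "egg", "carrot"},
    {"apple", "egg"},
    {"apple", "carrot"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def lift(a, b, transactions):
    """support{A,B} / (support{A} * support{B}), rewritten with counts."""
    n = len(transactions)
    n_a = count(a, transactions)
    n_b = count(b, transactions)
    n_ab = count(set(a) | set(b), transactions)
    return (n_ab * n) / (n_a * n_b)

print(lift({"apple"}, {"egg"}, orders))         # 1.25
print(lift({"egg"}, {"apple"}, orders))         # 1.25
print(lift({"apple"}, {"carrot"}, orders) < 1)  # True
```

The last line shows a pair with lift below 1: apple and carrot appear together in 2 of 5 orders, less often than their individual frequencies would suggest.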

4.Conviction :

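The conviction of a rule A->B is defined as

conviction{A->B} = (1 - support{B}) / (1 - confidence{A->B})

In our example:

conviction{apple->egg} = (1 - support{egg}) / (1 - confidence{apple->egg})
= (1 - 3/5) / (1 - 0.75)
= 1.6

It is interpreted as the ratio of the frequency with which A would be expected to occur without B if A and B were independent (that is, the frequency with which the rule would make an incorrect prediction) to the observed frequency of incorrect predictions. Like confidence, conviction is directional. When confidence is 1, as for egg->apple above, the denominator is zero and conviction is taken to be infinite.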
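A minimal sketch of conviction on the same five orders follows. Returning `math.inf` when confidence equals 1 is a convention assumed here to avoid dividing by zero, and the function names are illustrative.

```python
import math

# The five example orders, one set of items per order.
orders = [
    {"apple", "egg", "milk"},
    {"carrot", "milk"},
    {"apple", "egg", "carrot"},
    {"apple", "egg"},
    {"apple", "carrot"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def conviction(a, b, transactions):
    """(1 - support{B}) / (1 - confidence{A->B}) for the rule A->B."""
    support_b = count(b, transactions) / len(transactions)
    conf = count(set(a) | set(b), transactions) / count(a, transactions)
    if conf == 1:
        # The rule never fails in the data, so the ratio is unbounded.
        return math.inf
    return (1 - support_b) / (1 - conf)

print(conviction({"apple"}, {"egg"}, orders))  # 1.6
print(conviction({"egg"}, {"apple"}, orders))  # inf
```

A value of 1.6 for apple->egg means the rule would make wrong predictions 1.6 times as often if apple and egg were independent.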

Input Dataset :

https://github.com/stedy/Machine-Learning-with-R-datasets/blob/master/groceries.csv
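Before mining, the file has to be turned into a list of transactions. The sketch below uses the standard `csv` module and assumes one transaction per line with comma-separated items. It drops empty cells so that padding never becomes an item. The helper name `read_transactions` and the sample rows are made up for illustration and are not taken from the real file.

```python
import csv
import io

def read_transactions(lines):
    """Parse one transaction per line, dropping empty (padding) cells."""
    transactions = []
    for row in csv.reader(lines):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:
            transactions.append(items)
    return transactions

# Tiny in-memory stand-in for the file (made-up rows, not the real data).
sample = io.StringIO(
    "citrus fruit,root vegetables,other vegetables\n"
    "whole milk,,\n"
    "tropical fruit,yogurt\n"
)
transactions = read_transactions(sample)
print(transactions)

# With the real file, assuming it is saved locally as groceries.csv:
# with open("groceries.csv", newline="") as f:
#     transactions = read_transactions(f)
```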

Let’s look at the code of market basket analysis using Python:

CODE :
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Data Preprocessing
dataset = pd.read_csv('groceries.csv')

# Build one list of item strings per row. Empty cells are read as NaN and
# str() turns them into the literal item 'nan', which is why 'nan' appears
# in some of the rules below. Skipping cells with pd.isna() avoids this.
transactions = []
for i in range(0, 9835):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 32)])

# Training Apriori on the dataset
from apyori import apriori
rules = apriori(transactions, min_support = 0.007, min_confidence = 0.5, min_lift = 3, min_length = 2)

# Visualising the results
results = list(rules)

dataset.head()
dataset.shape

OUTPUT :
(9835, 32)

CODE :
for a in results:
    print("------------------------------------------------------")
    print(a)

OUTPUT :
RelationRecord(items=frozenset({'citrus fruit', 'other vegetables', 'root vegetables'}), support=0.010371123538383325, ordered_statistics=[OrderedStatistic(items_base=frozenset({'citrus fruit', 'root vegetables'}), items_add=frozenset({'other vegetables'}), confidence=0.5862068965517241, lift=3.0296084222733612)])
-------------------------------------------------------------------------------------------------------------
RelationRecord(items=frozenset({'tropical fruit', 'other vegetables', 'root vegetables'}), support=0.012302999491611592, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tropical fruit', 'root vegetables'}), items_add=frozenset({'other vegetables'}), confidence=0.5845410628019324, lift=3.020999134344196)])
-------------------------------------------------------------------------------------------------------------
RelationRecord(items=frozenset({'nan', 'citrus fruit', 'other vegetables', 'root vegetables'}), support=0.010269445856634469, ordered_statistics=[OrderedStatistic(items_base=frozenset({'nan', 'citrus fruit', 'root vegetables'}), items_add=frozenset({'other vegetables'}), confidence=0.5838150289017341, lift=3.0172468782178425)])
-------------------------------------------------------------------------------------------------------------
RelationRecord(items=frozenset({'tropical fruit', 'nan', 'other vegetables', 'root vegetables'}), support=0.012201321809862735, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tropical fruit', 'nan', 'root vegetables'}), items_add=frozenset({'other vegetables'}), confidence=0.5825242718446603, lift=3.010576044977527)])
-------------------------------------------------------------------------------------------------------------
RelationRecord(items=frozenset({'tropical fruit', 'whole milk', 'other vegetables', 'root vegetables'}), support=0.007015760040671073, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tropical fruit', 'whole milk', 'root vegetables'}), items_add=frozenset({'other vegetables'}), confidence=0.5847457627118644, lift=3.0220570553185424)])
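
The raw records are hard to read, so one option is to flatten them into simple rows. The sketch below is not part of the original code: the two namedtuples are local stand-ins that mirror the field names printed above (with apyori installed, the objects in `results` already carry these attributes), and `flatten` is a hypothetical helper. Skipping records that contain 'nan' is a workaround; filtering empty cells before mining is cleaner.

```python
from collections import namedtuple

# Local stand-ins mirroring the fields shown in the output above.
OrderedStatistic = namedtuple(
    "OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple(
    "RelationRecord", "items support ordered_statistics")

def flatten(records, ignore=frozenset({"nan"})):
    """Turn records into dict rows, skipping item sets with ignored items."""
    rows = []
    for rec in records:
        if rec.items & ignore:
            continue
        for stat in rec.ordered_statistics:
            rows.append({
                "antecedent": sorted(stat.items_base),
                "consequent": sorted(stat.items_add),
                "support": round(rec.support, 4),
                "confidence": round(stat.confidence, 4),
                "lift": round(stat.lift, 4),
            })
    # Strongest associations first.
    return sorted(rows, key=lambda r: r["lift"], reverse=True)

# Two records copied from the output above, one of them containing 'nan'.
results = [
    RelationRecord(
        frozenset({"citrus fruit", "other vegetables", "root vegetables"}),
        0.010371123538383325,
        [OrderedStatistic(frozenset({"citrus fruit", "root vegetables"}),
                          frozenset({"other vegetables"}),
                          0.5862068965517241, 3.0296084222733612)]),
    RelationRecord(
        frozenset({"nan", "citrus fruit", "other vegetables", "root vegetables"}),
        0.010269445856634469,
        [OrderedStatistic(frozenset({"nan", "citrus fruit", "root vegetables"}),
                          frozenset({"other vegetables"}),
                          0.5838150289017341, 3.0172468782178425)]),
]

rows = flatten(results)
for row in rows:
    print(row)
```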

CONCLUSION :

From the output above, we see that the top associations are not surprising: baskets that contain root vegetables together with citrus or tropical fruit also tend to contain other vegetables. One common application of association rule mining is in recommender systems. Once item pairs have been identified as having a positive relationship, recommendations can be made to customers in order to increase sales. And hopefully, along the way, they also introduce customers to items they never would have tried before or even imagined existed!

I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a coding internship experience. Thank you, www.suvenconsultants.com.
