Eddie Gulay

Posted on Aug 2, 2024

Using Association Rules for Recommendation

#recommendation #association #apriori #suggestion

1. Introduction

What is this about 💡

This is a short guide to building a product recommendation system with association rules. It covers simple next-item suggestion from a list of previous items, and it also suits tasks that just need a quick recommender.

Purpose of the Recommendation System

The main goal of this recommendation system is to enhance the shopping experience by providing personalized suggestions to customers. By analyzing past transaction data, we can identify patterns and relationships between different products. These insights allow us to recommend complementary ingredients that customers might be interested in, helping them discover new products and make more informed purchasing decisions.

But we won't do that

This tutorial is designed for data enthusiasts, developers, and anyone interested in doing what was described in the last section. And since I have you here, we are all going to suggest food ingredients 😋

2. Prerequisites

Basic Knowledge of Python

Before diving into this tutorial, it is essential to have a basic understanding of Python programming. Familiarity with Python's syntax and basic data structures will help you follow along with the code examples and understand the logic behind the implementation.

You'll also need a little bit of pandas, plus your good copy-pasting skills, if you don't mind.

Introduction to Association Rule Mining and Its Importance

Association rule mining is a data mining technique used to identify interesting relationships or patterns between different items in large datasets. It is particularly useful in market basket analysis, where the goal is to discover associations between products purchased together.
In our case, the items are ingredients that occur together in the same dish.
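
To make "items that occur together" concrete, here is a minimal sketch in plain Python (no mlxtend) that counts how often each pair of ingredients shares a basket. The four baskets are the sample dishes shown later in this guide, and the cut-off of two baskets is an arbitrary choice for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy baskets, taken from the sample ingredient lists in this guide
baskets = [
    ["maize flour", "water", "salt"],
    ["beef", "goat meat", "salt", "spices"],
    ["flour", "meat", "vegetables", "spices", "oil"],
    ["flour", "water", "salt", "oil"],
]

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair one canonical order; set() ignores duplicates
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Keep pairs that appear together in at least two baskets (arbitrary cut-off)
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= 2}
print(frequent_pairs)
```

Association rule mining generalizes this idea from pairs to itemsets of any size and adds measures of how strong each relationship is.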

3. Setting Up the Environment

Tools and Libraries

We'll be using Python, along with the pandas and mlxtend libraries for data manipulation and association rule mining.

Installation Instructions

First, make sure you have Python installed on your system. You can download and install Python from the official website. Once Python is installed, you'll need to install the necessary libraries. You can do this using pip, the Python package installer.

Open your terminal or command prompt and run the following command:

pip install pandas mlxtend
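
If you want to confirm the install worked, one option is to ask Python which versions it can see. This is only a sketch using the standard library; installed_version is a helper name made up for this example.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ("pandas", "mlxtend"):
    print(name, installed_version(name) or "not installed")
```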

4. Preparing the Dataset

Dataset to Use

I had a quick chat with ChatGPT and asked: "Can you mention 50 traditional Tanzanian foods? Now, for each traditional Tanzanian food you mentioned, list its ingredients in a Python list. Create ingredients = [ [food 1], [food 2] ... ]." This is what it returned:

ingredients = [
    ["maize flour", "water", "salt"],  # Ugali
    ["beef", "goat meat", "salt", "spices"],  # Nyama Choma
    ["flour", "meat", "vegetables", "spices", "oil"],  # Samosa
    ["flour", "water", "salt", "oil"],  # Chapati 
    # ... more  ...
]

😂 Who uses salt in ugali? If you do, you are weird ...

Creating the Data

We'll use the TransactionEncoder from the mlxtend library to convert the list of transactions into a one-hot encoded table: one row per dish, one True/False column per ingredient.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Initialize the TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit_transform(ingredients)

# Convert to DataFrame
df = pd.DataFrame(te_ary, columns=te.columns_)
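
For intuition, the encoding step can be approximated in plain Python. This is a sketch of the idea, not mlxtend's actual implementation, and it uses just two of the sample dishes.

```python
transactions = [
    ["maize flour", "water", "salt"],
    ["flour", "water", "salt", "oil"],
]

# Sorted list of every distinct item; these become the column names
columns = sorted({item for t in transactions for item in t})

# One row per transaction, one True/False flag per column
encoded = [[col in t for col in columns] for t in transactions]

print(columns)
for row in encoded:
    print(row)
```

Each row answers the question "does this dish contain this ingredient?" for every ingredient seen anywhere in the data.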

5. Generating Association Rules

Frequent Itemsets

Use the Apriori algorithm to generate frequent itemsets from the transaction data. These itemsets represent combinations of ingredients that appear together frequently.

from mlxtend.frequent_patterns import apriori

# Generate frequent itemsets with a minimum support of 0.2
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
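
To see what Apriori is doing under the hood, here is a small level-wise sketch in plain Python. It is an illustration under simplifying assumptions (tiny data, no performance tricks), not mlxtend's implementation, and frequent_itemsets_sketch is a name invented for this example.

```python
from itertools import combinations

def frequent_itemsets_sketch(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, pruning by support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: single items that meet the threshold
    items = {item for b in baskets for item in b}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    while level:
        for s in level:
            result[s] = support(s)
        # Candidates one item larger, built by joining frequent sets of this level
        size = len(next(iter(level))) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every subset one item smaller must itself be frequent
        level = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
            and support(c) >= min_support
        }
    return result

sample = [
    ["maize flour", "water", "salt"],
    ["beef", "goat meat", "salt", "spices"],
    ["flour", "meat", "vegetables", "spices", "oil"],
    ["flour", "water", "salt", "oil"],
]
found = frequent_itemsets_sketch(sample, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

The key idea is the pruning step: an itemset can only be frequent if all of its smaller subsets are frequent, so most combinations never need to be counted.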

Association Rules

Next, we will derive association rules from the frequent itemsets. These rules will help us understand the relationships between different products.

from mlxtend.frequent_patterns import association_rules

# Generate association rules, keeping those with a support of at least 0.2
# (to filter by confidence instead, use metric="confidence", min_threshold=0.6)
rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.2)
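
Each rule comes with metrics derived from the itemset supports. The sketch below computes two of them by hand for one pair of ingredients. The support values are assumed for the example (they match the four sample dishes), and rule_metrics is a hypothetical helper, not part of mlxtend.

```python
# Supports assumed for this sketch (fractions of all transactions)
support = {
    frozenset(["salt"]): 0.75,
    frozenset(["water"]): 0.5,
    frozenset(["salt", "water"]): 0.5,
}

def rule_metrics(antecedent, consequent):
    a, c = frozenset(antecedent), frozenset(consequent)
    both = support[a | c]
    confidence = both / support[a]  # how often the rule holds when A is present
    lift = confidence / support[c]  # above 1: A and C co-occur more than by chance
    return {"support": both, "confidence": confidence, "lift": lift}

water_to_salt = rule_metrics(["water"], ["salt"])
salt_to_water = rule_metrics(["salt"], ["water"])
print(water_to_salt)
print(salt_to_water)
```

Note that the two directions share the same support and lift but differ in confidence, which is why a rule A -> B is not interchangeable with B -> A.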

6. Creating the Recommendation Function

We will define a function to recommend products based on the association rules. This function takes a list of products and returns a list of recommended products, drawn from the consequents of the matching rules ranked by confidence and then lift.

# here ingredients -> products

def recommend_ingredients(products, rules=rules, top_n=10):
    # Keep rules whose antecedents contain at least one of the given products.
    # The frozensets from mlxtend support `in` directly, so the shared
    # rules DataFrame is left unmodified.
    matches = rules[rules['antecedents'].apply(lambda x: any(product in x for product in products))]
    # Strongest rules first: sort by confidence, then lift
    matches = matches.sort_values(by=['confidence', 'lift'], ascending=False)
    top_recommendations = matches.head(top_n)

    result = []
    for _, row in top_recommendations.iterrows():
        for item in row['consequents']:
            item = item.lower()
            # Compare the lowercased name so duplicates are caught reliably
            if item not in result:
                result.append(item)
    return result
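
One possible refinement is to skip ingredients the user already has. The sketch below shows the idea on plain dictionaries rather than the DataFrame. The function name recommend_new_items and the toy rule values are made up for illustration; with real rules you could pass rules.to_dict("records") instead.

```python
def recommend_new_items(products, rule_rows, top_n=10):
    """rule_rows: dicts with 'antecedents', 'consequents', 'confidence', 'lift'."""
    have = {p.lower() for p in products}
    # Keep rules that share at least one antecedent with the current basket
    matches = [r for r in rule_rows if have & {a.lower() for a in r["antecedents"]}]
    matches.sort(key=lambda r: (r["confidence"], r["lift"]), reverse=True)

    result = []
    for r in matches[:top_n]:
        for item in r["consequents"]:
            item = item.lower()
            # Skip what the user already has and what was already suggested
            if item not in have and item not in result:
                result.append(item)
    return result

# Toy rules with invented metric values, for illustration only
toy_rules = [
    {"antecedents": {"oil"}, "consequents": {"flour"}, "confidence": 1.0, "lift": 2.0},
    {"antecedents": {"salt"}, "consequents": {"water"}, "confidence": 0.67, "lift": 1.33},
    {"antecedents": {"water"}, "consequents": {"salt"}, "confidence": 1.0, "lift": 1.33},
    {"antecedents": {"flour"}, "consequents": {"oil"}, "confidence": 1.0, "lift": 2.0},
]
suggestions = recommend_new_items(["oil", "salt"], toy_rules)
print(suggestions)
```

Whether to filter this way is a design choice: suggesting something already in the basket is rarely useful, but it can be handy when debugging which rules fired.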

7. Testing the Recommendation System

Example Usage

Let's test the recommendation system with an example list of ingredients.

product_list = ['oil', 'salt']
prods = recommend_ingredients(product_list)

print(prods)

Expected Output

The output is a list of recommended ingredient names: the consequents of the strongest rules whose antecedents include at least one of the input ingredients. The exact items depend on the full ingredients list you use.

8. Conclusion

Recap

In this tutorial, we have walked through the process of building a recommendation system using association rules. We covered data preparation, frequent itemset generation, rule mining, and how to create a recommendation function based on these rules.

Further Exploration

You can further explore by experimenting with different datasets, adjusting the parameters for the Apriori algorithm, and fine-tuning the recommendation function. This will help you understand the nuances of association rule mining and its application in various domains.

⚠️ Also go read about the terms used here, like support, confidence, and lift.
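
As a starting point, these are the standard definitions for a rule A ⇒ B, where A and B are sets of items:

```latex
\begin{aligned}
\text{support}(A \cup B) &= \frac{\text{number of transactions containing every item of } A \text{ and } B}{\text{total number of transactions}} \\[4pt]
\text{confidence}(A \Rightarrow B) &= \frac{\text{support}(A \cup B)}{\text{support}(A)} \\[4pt]
\text{lift}(A \Rightarrow B) &= \frac{\text{confidence}(A \Rightarrow B)}{\text{support}(B)}
\end{aligned}
```

A lift above 1 suggests the items appear together more often than they would if they were independent.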

9. Q&A Section

Common Questions

What if my dataset is large?
- For large datasets, consider a more efficient algorithm such as FP-Growth (mlxtend provides it as fpgrowth), or mine a sample of the transactions.
How do I choose the right support and confidence thresholds?
- Experiment with different thresholds to find a balance between generating useful rules and avoiding too many irrelevant ones.
Can I use this method for other types of data?
- Yes, association rule mining can be applied to various types of transactional data, not just kitchen ingredients.
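
On choosing thresholds, a quick sweep shows how sharply the number of surviving patterns can change. This is a plain-Python sketch that only counts ingredient pairs in the four sample dishes; pairs_at is a helper name made up for this example.

```python
from collections import Counter
from itertools import combinations

baskets = [
    ["maize flour", "water", "salt"],
    ["beef", "goat meat", "salt", "spices"],
    ["flour", "meat", "vegetables", "spices", "oil"],
    ["flour", "water", "salt", "oil"],
]
n = len(baskets)

# Count, for every pair of ingredients, how many baskets contain both
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(set(b)), 2)
)

def pairs_at(min_support):
    """Number of pairs whose support meets the threshold."""
    return sum(1 for count in pair_counts.values() if count / n >= min_support)

for threshold in (0.25, 0.5, 0.75):
    print(threshold, pairs_at(threshold))
```

A threshold that is too low keeps nearly everything, while one that is too high leaves nothing to build rules from, so it is worth trying a few values on your own data.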

10. Finally

If you are a programmer, go finish that project and stop procrastinating. For everyone else, it's been nice to have you here 🙂

DEV Community