Aniket Hingane

SupplyChainAI: Building an Intelligent Vendor Recommendation Engine (PoC)

Title Diagram

SupplyChainAI: Solving B2B Procurement Headaches with a Simple Recommendation System

TL;DR

I built a Python-based intelligent recommendation engine to solve a real-world B2B problem: finding the right vendor for niche procurement needs. Using TF-IDF and Cosine Similarity, I created a system that matches natural language queries (like "sustainable packaging for electronics") to a database of vendors. This article documents my experiment, the architecture I designed, and the code I wrote.


Introduction

In my day-to-day experience observing various business processes, I’ve often found that B2B procurement feels surprisingly archaic. We live in an era where I can ask a smart speaker to order me a specific brand of toothpaste, yet procurement officers in large enterprises are often stuck traversing massive spreadsheets, sending out generic RFPs (Requests for Proposals), or relying on legacy "Preferred Vendor" lists that haven’t been updated in years.

I observed this friction firsthand and thought, "Why is this so hard? Why can't finding a supplier for 'high-grade industrial steel' be as easy as searching for a restaurant on Google Maps?"

The gap, in my opinion, isn't a lack of data. Enterprises usually have massive internal "Knowledge Bases"—wikis, SharePoint sites, or databases filled with vendor capabilities. The problem is retrieval. The disconnect between the language a buyer uses ("I need fast shipping for frozen fish") and the language a vendor uses ("Cold-chain logistics specialist, ISO 9001 certified") is where the matchmaking fails.

I decided to run an experiment. I vividly remember sitting down with a cup of coffee and thinking, "Can I build a lightweight, intelligent recommendation engine that bridges this semantic gap without training a massive Neural Network?"

I didn't want to use an overkill solution like a fine-tuned GPT-4 for a problem that might be solvable with classical NLP (Natural Language Processing). I wanted to see if the "old school" methods still had bite.

This post details my journey in building SupplyChainAI, a personal Proof of Concept (PoC) that aims to solve this very specific, very real business problem.


What's This Article About?

This article is more than just a code tutorial; it is a comprehensive log of my engineering thought process. I believe that writing code is easy, but designing the right solution is hard.

Here is what I cover in detail:

  1. The Problem Space: A deep dive into why keyword search fails in B2B contexts.
  2. The Solution Architecture: Why I chose a content-based recommendation approach over collaborative filtering.
  3. The Tech Stack: A defense of Scikit-Learn in the age of LLMs.
  4. The Code: A line-by-line walkthrough of the Python implementation.
  5. The Verification: How I tested the system against "weird" queries.

I am treating this as a strictly personal exploration. I want to emphasize that this is not production code from my employer, but rather a weekend hackathon project where I allowed myself to make mistakes and learn from them.


Tech Stack

For this experiment, I had to make some hard choices. The AI landscape is shifting daily. Should I use a Vector Database like Pinecone? Should I use OpenAI's embeddings API?

I thought about it, and I decided to go lean.

  • Python 3.12: My absolute go-to for rapid prototyping. The typing improvements in recent versions make the code so much more readable.
  • Scikit-Learn: I chose to use TfidfVectorizer and cosine_similarity.
    • Why not BERT? BERT requires heavy compute and is slower for simple matching tasks.
    • Why not OpenAI? I wanted this to run locally, offline, and for free. TF-IDF is mathematically elegant and incredibly fast for datasets under 1 million rows.
  • Mermaid.js: Documentation is code. I firmly believe that if you can't diagram it, you don't understand it.

In my opinion, developers often reach for the shiny new tool (LLMs) when a sharp knife (TF-IDF) will do the job better. This project is a testament to that belief.


Why Read It?

If you are a developer, a data scientist, or just a tinkerer looking to apply Machine Learning to boring, real-world business problems, this might be interesting to you.

I wrote this to show that Business AI doesn't always have to be a chatbot. Sometimes, it's a quiet, invisible engine that takes a messy input and returns a structured, useful output. If you've ever wondered how recommendation systems work "under the hood" without the magic of black-box APIs, this is for you.


Let's Design

Before I wrote a single line of code, I stepped back to design the system. I needed a clean separation of concerns.

I visualized the system as three distinct agents:

  1. The User: The procurement officer with a need.
  2. The Knowledge Base: The static repository of truth.
  3. The Engine: The active processor that connects the two.

Here is the high-level architecture I sketched out:

Architecture Diagram

The Thought Process Behind the Architecture

I modeled the Knowledge Base as a separate entity because, in a real-world scenario, this wouldn't be a Python list. It would be a connection to an ERP system like SAP or Oracle. By abstracting it now, I ensured that my RecommendationEngine class wouldn't care where the data came from, only what it looked like.
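
One lightweight way to express that contract in Python is a typing.Protocol. To be clear, this sketch is my illustration and not code from the repo; VendorSource is a name I'm inventing here.

# A hypothetical interface sketch (not in the repo).
# Anything exposing get_all_vendors() satisfies it: the in-memory
# KnowledgeBase today, an SAP or Oracle adapter tomorrow.
from typing import List, Protocol

from src.knowledge_base import Vendor

class VendorSource(Protocol):
    def get_all_vendors(self) -> List[Vendor]: ...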

I also spent time thinking about the data flow. How does the query travel?

Sequence Diagram

  1. Load: The system wakes up and ingests the "worldview" (the vendors).
  2. Fit: The engine builds a vocabulary. It learns that "logistics" is a common word, but "cryogenic" is rare and therefore important.
  3. Search: The user's query is transformed into the same mathematical space as the vendors.
  4. Rank: We measure the similarity between the query and every vendor (the cosine of the angle between their vectors), ordering them by "closeness".
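
In code terms, those four steps collapse into a handful of calls. Here is a quick preview using the classes I build in the next section (the query string is just an example):

from src.knowledge_base import KnowledgeBase
from src.recommendation_engine import RecommendationEngine

kb = KnowledgeBase()                          # 1. Load: ingest the worldview
engine = RecommendationEngine()
engine.fit(kb.get_all_vendors())              # 2. Fit: build the vocabulary
results = engine.search("cryogenic freight")  # 3. Search + 4. Rank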

Let's Get Cooking

I structured my project into three main components. I will explain my code block by block, detailing exactly why I wrote it that way.

1. The Knowledge Base (src/knowledge_base.py)

First, I needed data. Since I didn't have access to a real enterprise database, I created a KnowledgeBase class to manage a rich set of mock vendors.

I started by defining a Vendor data class. I am a huge fan of Python's dataclasses because they reduce boilerplate. All I wanted was a structured way to hold vendor metadata.

# src/knowledge_base.py

import dataclasses
from typing import List

@dataclasses.dataclass
class Vendor:
    id: str
    name: str
    category: str
    description: str
    location: str
    rating: float

Next, the KnowledgeBase class itself. I designed this to be the "Source of Truth".

class KnowledgeBase:
    """
    Acts as the data retriever / store for our vendors.
    In a real app, this would query a Vector DB or SQL database.
    For this PoC, we hold everything in memory.
    """
    def __init__(self):
        self.vendors: List[Vendor] = []
        self._load_mock_data()

    def _load_mock_data(self):
        """Loads a diverse set of mock B2B vendors."""
        # I simulated a diverse market: Logistics, Packaging, Manufacturing, etc.
        # This diversity is crucial to test if the AI can distinguish between
        # "Logistics for food" vs "Logistics for steel".
        self.vendors = [
            Vendor(
                id="v1", 
                name="SpeedyTrans Logistics", 
                category="Logistics",
                description="Global shipping and freight forwarding with a focus on cross-border e-commerce and rapid customs clearance. Specialized in cold chain logistics.",
                location="Germany",
                rating=4.8
            ),
            Vendor(
                id="v2", 
                name="TechPack Solutions", 
                category="Packaging",
                description="Sustainable packaging materials for electronics and fragile goods. Offers biodegradable bubble wrap and custom-sized boxes.",
                location="USA",
                rating=4.5
            ),
            # ... and so on. I added about 8 distinct profiles.
        ]

    def get_all_vendors(self) -> List[Vendor]:
        return self.vendors

    def get_descriptions(self) -> List[str]:
        # This helper method extracts just the text we want to analyze.
        return [v.description for v in self.vendors]

I manually crafted the descriptions to include specific keywords. For instance, I added "cold chain" to SpeedyTrans and "biodegradable" to TechPack. I intuitively knew that these specific "anchors" would be what the TF-IDF algorithm latches onto.

2. The Recommendation Engine (src/recommendation_engine.py)

This is the brain of the operation. This is where the math happens.

I chose TF-IDF (Term Frequency-Inverse Document Frequency) for a specific reason. In a Knowledge Base, some words are useless: "the", "we", "provide", "services" appear in every vendor description. If a user searches "Provide services for shipping", checking for the word "provide" is a waste of time.

TF-IDF automatically downweights these common words and upweights the rare ones. If Quantum Chipsets is the only vendor with the word "silicon", that word becomes a very powerful signal.
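
You can watch this weighting happen by inspecting the vectorizer's idf_ attribute after fitting. Here is a standalone sketch with toy documents (not the actual vendor data):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "we provide global shipping and cold chain logistics services",
    "we provide sustainable packaging services for electronics",
    "we provide high performance silicon chips",
]
vectorizer = TfidfVectorizer()
vectorizer.fit(docs)

# idf_ holds one rarity weight per vocabulary term.
idf = dict(zip(vectorizer.get_feature_names_out(), vectorizer.idf_))
print(idf["provide"])  # 1.0   -> appears in every document, downweighted
print(idf["silicon"])  # ~1.69 -> appears in one document, upweighted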

# src/recommendation_engine.py

from typing import List, Dict, Any
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from .knowledge_base import Vendor

class RecommendationEngine:
    def __init__(self):
        # Stop words removal helps clean up common noise like "the", "and", etc.
        # This was a critical step I added after my first failed test where "and" 
        # was matching irrelevant results.
        self.vectorizer = TfidfVectorizer(stop_words='english')
        self.vendor_vectors = None
        self.vendors: List[Vendor] = []

    def fit(self, vendors: List[Vendor]):
        """
        'Trains' the engine by vectorizing the vendor descriptions.
        """
        self.vendors = vendors
        descriptions = [v.description for v in vendors]

        # fit_transform converts text to a matrix of numbers.
        # Each row is a vendor, each column is a word in the entire vocabulary.
        self.vendor_vectors = self.vectorizer.fit_transform(descriptions)
        print(f"Engine fitted with {len(vendors)} vendors.")

    def search(self, query: str, top_k: int = 3) -> List[Dict[str, Any]]:
        """
        Searches for vendors matching the query string.
        Returns a list of dictionaries with vendor info and score.
        """
        if not self.vendors or self.vendor_vectors is None:
            return []

        # 1. Transform the user's query into the SAME vector space as the vendors.
        # Note: We use 'transform', NOT 'fit_transform'. The model is already frozen.
        query_vector = self.vectorizer.transform([query])

        # 2. Calculate Cosine Similarity.
        # Imagine every vendor is an arrow pointing out from the origin in a
        # high-dimensional space, and the query is another arrow. We measure
        # the angle between the query arrow and every vendor arrow.
        cosine_similarities = cosine_similarity(query_vector, self.vendor_vectors).flatten()

        # 3. Sort the results.
        # argsort gives us indices of the sorted array. 
        # [:-top_k-1:-1] is Python magic slice to get the last K items in reverse order (highest score first).
        related_docs_indices = cosine_similarities.argsort()[:-top_k-1:-1]

        results = []
        for match_index in related_docs_indices:
            score = cosine_similarities[match_index]

            # I found that sometimes you get a 0.0 score match if there are no shared words.
            # It's better to filter those out than show a "0% match".
            if score > 0.0: 
                vendor = self.vendors[match_index]
                results.append({
                    "vendor": vendor,
                    "score": float(score)
                })

        return results
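
That argsort slice is dense, so here is a quick sanity check on a toy array:

import numpy as np

scores = np.array([0.12, 0.91, 0.05, 0.47])
top_k = 2
# argsort() sorts ascending and returns indices: [2, 0, 3, 1].
# [:-top_k-1:-1] walks backwards from the end, yielding the indices
# of the top_k highest scores, best first.
print(scores.argsort()[:-top_k-1:-1])  # [1 3]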

In my experience, cosine_similarity is the gold standard for text comparison. Unlike Euclidean distance, which cares about magnitude (how long the text is), Cosine cares about direction (content overlap). This means a short description like "Fast shipping" can match perfectly with a query "Fast shipping", even if another vendor has a 1000-word essay that happens to mention "shipping" once.
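
A tiny experiment makes that length-invariance concrete. This is a standalone sketch with made-up texts:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "fast shipping",                          # short and on-topic
    "shipping " + "warehouse storage " * 50,  # long, mentions shipping once
]
vec = TfidfVectorizer()
matrix = vec.fit_transform(docs)
query = vec.transform(["fast shipping"])

print(cosine_similarity(query, matrix).flatten())
# The short description scores 1.0; the long one scores near zero,
# because cosine normalizes away document length.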

3. The Orchestrator (src/workflow.py)

Finally, I wrote a workflow script to tie it all together. I wanted a script that I could run in the terminal to visually verify the outputs.

# src/workflow.py

import sys
import os

# Ensure we can import from src
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src.knowledge_base import KnowledgeBase
from src.recommendation_engine import RecommendationEngine

def main():
    print("--- Starting SupplyChainAI Recommendation Engine ---")

    # 1. Initialize Knowledge Base
    kb = KnowledgeBase()
    vendors = kb.get_all_vendors()
    print(f"Loaded {len(vendors)} vendors from Knowledge Base.")

    # 2. Initialize and Fit Recommendation Engine
    engine = RecommendationEngine()
    engine.fit(vendors)

    # 3. Define Test Queries
    # I specifically chose these queries to be tricky.
    # "frozen goods" should match "cold chain".
    # "chips" should match "silicon" or "electronics".
    queries = [
        "Need a fast logistics provider for frozen goods in Europe",
        "Sustainable packaging for electronics",
        "High quality precision manufacturing for chips",
        "Automated assembly line robots",
        "Cloud hosting for healthcare data"
    ]

    # 4. Run Queries
    print("\n--- Running Search Queries ---")
    for query in queries:
        print(f"\nQuery: '{query}'")
        results = engine.search(query)

        if not results:
            print("  No matching vendors found.")
        else:
            for res in results:
                vendor = res['vendor']
                score = res['score']
                print(f"  > [{score:.4f}] {vendor.name} ({vendor.location}) - {vendor.category}")
                print(f"    Desc: {vendor.description[:80]}...")

    print("\n--- Demo Completed ---")

if __name__ == "__main__":
    main()

Let's Setup

If you want to run this experiment yourself, I've detailed the steps below. I've hosted the code on GitHub so you can clone it and play with the mock data yourself.

  1. Clone the Repo:

    git clone https://github.com/aniket-work/SupplyChainAI.git
    cd SupplyChainAI
    
  2. Install Dependencies:
    You only need a few libraries. I kept requirements.txt minimal (my guess at its contents is sketched just after these steps).

    pip install -r requirements.txt
    
  3. Run the Demo:

    python src/workflow.py
    

Full instructions are available in the README.
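
For reference, here is my best guess at the minimal requirements.txt; check the repo for the authoritative file:

# requirements.txt (plausible minimal version)
scikit-learn  # pulls in numpy and scipy as dependencies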


Let's Run

The moment of truth. I ran the script, and the output was genuinely satisfying.

For the query "High quality precision manufacturing for chips", the system correctly identified Precise Moulding and Quantum Chipsets.

Query: 'High quality precision manufacturing for chips'
  > [0.3210] Precise Moulding (China) - Manufacturing
    Desc: Custom injection moulding for plastic parts. High precision prototyping and mass...
  > [0.2893] Quantum Chipsets (Taiwan) - Electronics
    Desc: High-performance silicon chips for AI processing units and industrial automation...

Notice the leap here. The query mentioned "chips", so Quantum Chipsets obviously matched. But Precise Moulding matched too, because its description shares the token "precision" (and likely "manufacturing") with the query. TF-IDF has no hidden concepts; literal overlap on those rare words is what carries the score.
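
If you ever want to verify which terms drove a match, you can multiply the query vector against a vendor's row element-wise. Because TfidfVectorizer L2-normalizes its vectors by default, those products are the exact per-term contributions to the cosine score. A debugging sketch, assuming the fitted engine from above and a hypothetical matched row index idx:

# Which vocabulary terms contributed to the match? (debugging sketch)
terms = engine.vectorizer.get_feature_names_out()
query_vec = engine.vectorizer.transform(["High quality precision manufacturing for chips"])

idx = 0  # hypothetical: row index of the matched vendor in engine.vendors
contributions = query_vec.multiply(engine.vendor_vectors[idx]).toarray().flatten()

# Print the top contributing terms, largest first.
for term_idx in contributions.argsort()[::-1][:5]:
    if contributions[term_idx] > 0:
        print(terms[term_idx], round(float(contributions[term_idx]), 4))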

Even better was the result for "Cloud hosting for healthcare data".

Query: 'Cloud hosting for healthcare data'
  > [0.5222] SecureCloud Host (UK) - IT Services
    Desc: Enterprise-grade cloud hosting and cybersecurity solutions. HIPAA and GDPR compl...

It found the only relevant vendor, with a strong similarity score of 0.5222.


Closing Thoughts

Building SupplyChainAI was a fun weekend experiment, but it left me with a profound realization.

We often overcomplicate business software. We think we need the latest 100-billion-parameter model to understand user intent. But for a domain as constrained and specific as "Vendor Procurement", simpler statistical models often win. They are explainable, they are fast, and they are incredibly cheap to run.

My PoC proved that you can build a helpful recommendation assistant in under 200 lines of Python.

If I were to take this further—say, for a real client project—I would look into connecting this KnowledgeBase class to a live SQL database, or perhaps swapping the TF-IDF vectorizer for a lightweight HuggingFace embedding model like all-MiniLM-L6-v2 to capture even deeper semantic meaning (like knowing that "lorry" and "truck" are the same thing).
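
For the curious, that embedding swap would only be a few lines with the sentence-transformers package. A rough sketch, not part of this PoC and untested here:

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

descriptions = [v.description for v in vendors]  # same Vendor objects as before
vendor_vecs = model.encode(descriptions)         # dense 384-dim embeddings
query_vecs = model.encode(["lorry transport for frozen goods"])

scores = cosine_similarity(query_vecs, vendor_vecs).flatten()
# Unlike TF-IDF, this should score "lorry" close to "truck" and "logistics".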

But for now, this little script does the job. And in my experience, "doing the job" is exactly what businesses care about.

I hope this walkthrough inspires you to try building your own recommendation engines for the "boring" problems around you.


Disclaimer

The views and opinions expressed here are solely my own and do not represent the views, positions, or opinions of my employer or any organization I am affiliated with. The content is based on my personal experience and experimentation and may be incomplete or incorrect. Any errors or misinterpretations are unintentional, and I apologize in advance if any statements are misunderstood or misrepresented.
