Building LLM-Powered Recommendation Systems: A Technical Guide

In this guide I build a lightweight recommendation layer that uses an LLM to rank items from a small product catalog based on a user's recent history and stated preferences. It is useful for teams that need explainable suggestions without deploying a full embedding pipeline or training a collaborative filtering model.

What you'll need

Python 3.10 or newer
The OpenAI SDK: pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai

Step 1: Define the catalog and user profile

I start with a hardcoded catalog of electronics and a short user history. Keeping everything in plain Python dictionaries makes the tutorial easy to port to your own database later.

CATALOG = [
    {"id": "p1", "name": "Sony WH-1000XM5", "category": "headphones", "price": 348, "tags": ["noise-canceling", "wireless", "over-ear"]},
    {"id": "p2", "name": "Apple AirPods Pro 2", "category": "headphones", "price": 249, "tags": ["noise-canceling", "wireless", "in-ear"]},
    {"id": "p3", "name": "Audio-Technica ATH-M50x", "category": "headphones", "price": 149, "tags": ["studio", "wired", "over-ear"]},
    {"id": "p4", "name": "Logitech MX Master 3S", "category": "mouse", "price": 99, "tags": ["wireless", "ergonomic", "productivity"]},
    {"id": "p5", "name": "Keychron Q1 Pro", "category": "keyboard", "price": 199, "tags": ["mechanical", "wireless", "hot-swappable"]},
]

USER_PROFILE = {
    "recent_views": ["p1", "p4"],
    "bought_last_month": ["p5"],
    "preferred_categories": ["headphones", "keyboard"],
    "budget_hint": "under 300",
}
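
Because the profile refers to products only by ID, it can help to index the catalog once and check that every ID in the profile actually exists before anything is sent to the model. The sketch below is a minimal illustration: `build_index` and `unknown_ids` are hypothetical helper names, not part of the tutorial script, and the data is a trimmed copy of the catalog above.

```python
# Hypothetical helpers for illustration; the tutorial script does not define them.
catalog = [
    {"id": "p1", "name": "Sony WH-1000XM5", "category": "headphones", "price": 348},
    {"id": "p4", "name": "Logitech MX Master 3S", "category": "mouse", "price": 99},
    {"id": "p5", "name": "Keychron Q1 Pro", "category": "keyboard", "price": 199},
]
profile = {"recent_views": ["p1", "p4"], "bought_last_month": ["p5"]}


def build_index(catalog):
    """Map product id -> product dict for constant-time lookups."""
    return {item["id"]: item for item in catalog}


def unknown_ids(profile, index, fields=("recent_views", "bought_last_month")):
    """Return the sorted profile ids that are missing from the catalog index."""
    return sorted(
        {pid for field in fields for pid in profile.get(field, []) if pid not in index}
    )


index = build_index(catalog)
viewed_names = [index[pid]["name"] for pid in profile["recent_views"]]
print(viewed_names)
print(unknown_ids(profile, index))
```

Catching a stale ID here is cheaper than discovering it in a model response that reasons about a product it was never shown.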

Step 2: Build a minimal retrieval layer

Before sending everything to the model, I filter the catalog to the user's preferred categories. This cuts token usage and keeps the context focused.

def retrieve_candidates(catalog, user_profile, max_items=8):
    preferred = set(user_profile.get("preferred_categories", []))
    candidates = [
        item for item in catalog
        if item["category"] in preferred
    ]
    # Cheapest first, then cap the list at max_items (a simple heuristic, not real diversification)
    candidates.sort(key=lambda x: x["price"])
    return candidates[:max_items]

candidates = retrieve_candidates(CATALOG, USER_PROFILE)
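
The filter above still passes already-purchased items to the model and relies on the prompt to exclude them. One possible variant, sketched below, drops purchased items during retrieval and can optionally treat the budget hint as a hard cap. The names `parse_budget`, `retrieve_unbought`, and `enforce_budget` are assumptions for this example, and reading "under 300" as a strict price limit is a guess at what the hint means.

```python
import re


def parse_budget(hint):
    """Return the first integer in a hint such as 'under 300', or None."""
    match = re.search(r"\d+", hint or "")
    return int(match.group()) if match else None


def retrieve_unbought(catalog, user_profile, max_items=8, enforce_budget=False):
    """Category filter that also skips purchased items (hypothetical variant)."""
    preferred = set(user_profile.get("preferred_categories", []))
    bought = set(user_profile.get("bought_last_month", []))
    budget = parse_budget(user_profile.get("budget_hint")) if enforce_budget else None
    candidates = [
        item for item in catalog
        if item["category"] in preferred
        and item["id"] not in bought
        and (budget is None or item["price"] < budget)
    ]
    candidates.sort(key=lambda item: item["price"])
    return candidates[:max_items]


# Trimmed copy of the tutorial data so the sketch runs on its own.
sample_catalog = [
    {"id": "p1", "category": "headphones", "price": 348},
    {"id": "p2", "category": "headphones", "price": 249},
    {"id": "p3", "category": "headphones", "price": 149},
    {"id": "p4", "category": "mouse", "price": 99},
    {"id": "p5", "category": "keyboard", "price": 199},
]
sample_profile = {
    "bought_last_month": ["p5"],
    "preferred_categories": ["headphones", "keyboard"],
    "budget_hint": "under 300",
}

default_ids = [item["id"] for item in retrieve_unbought(sample_catalog, sample_profile)]
strict_ids = [
    item["id"]
    for item in retrieve_unbought(sample_catalog, sample_profile, enforce_budget=True)
]
print(default_ids)
print(strict_ids)
```

With the hard cap enabled only two candidates remain, so the model could no longer return a top 3. That is why the cap is off by default here and the budget is left for the model to weigh.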

Step 3: Write the system prompt

The system prompt tells the model how to behave, what JSON structure to return, and which factors to weigh. I keep it strict so the output is predictable.

SYSTEM_PROMPT = """
You are a product recommendation engine.
Given a JSON user profile and a JSON list of candidate products, rank the top 3 products for this user.
Return ONLY a JSON object with this exact structure:
{
  "recommendations": [
    {
      "product_id": "string",
      "rank": 1,
      "reason": "One sentence explaining why this fits the user."
    }
  ]
}
Consider the user's recent views, purchase history, preferred categories, and budget hint.
Do not recommend items the user already bought.
"""

Step 4: Call Oxlo.ai for ranking and explanation

I format the user message as a JSON string containing both the profile and the filtered candidates. Oxlo.ai's flat per-request pricing means extra context does not raise the cost of a request, so I can include full product descriptions without ballooning cost. The model's context window is still the practical limit on prompt length.

import json
import os

from openai import OpenAI

# Read the key exported in the "Run it" section instead of hardcoding it.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ["OXLO_API_KEY"])

user_message = json.dumps({
    "user_profile": USER_PROFILE,
    "candidates": candidates
}, indent=2)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
    temperature=0.2,
)

output = response.choices[0].message.content
recommendations = json.loads(output)
print(json.dumps(recommendations, indent=2))
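
Calling `json.loads` directly on the reply assumes the model followed the format instructions exactly. In practice a model may wrap the JSON in a Markdown code fence or name a product that was never in the candidate list. The sketch below shows one defensive way to parse and validate the reply. `parse_recommendations` is a hypothetical helper, and the sample reply is hand-written for the example, not real model output.

```python
import json


def parse_recommendations(raw, candidate_ids, bought_ids=()):
    """Extract the JSON object from a reply and keep only valid recommendations."""
    # Take everything between the first "{" and the last "}" so that
    # surrounding code fences or stray prose are ignored.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])

    # Only accept ids that were offered as candidates and not already bought.
    allowed = set(candidate_ids) - set(bought_ids)
    valid = [
        rec for rec in data.get("recommendations", [])
        if isinstance(rec, dict) and rec.get("product_id") in allowed
    ]
    return sorted(valid, key=lambda rec: rec.get("rank", 0))


# Hand-written sample reply wrapped in a code fence, for illustration only.
fence = "`" * 3
payload = json.dumps({"recommendations": [
    {"product_id": "p2", "rank": 2, "reason": "Fits the budget."},
    {"product_id": "p9", "rank": 1, "reason": "Not in the catalog."},
    {"product_id": "p3", "rank": 1, "reason": "Cheapest option."},
]})
raw_output = f"{fence}json\n{payload}\n{fence}"

cleaned = parse_recommendations(raw_output, candidate_ids=["p1", "p2", "p3"])
print([rec["product_id"] for rec in cleaned])
```

The unknown ID `p9` is dropped silently here. A production system would probably log it or retry the request instead.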

Step 5: Add conversational follow-up

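A real system should handle follow-up questions. The chat completions API is stateless, so I resend the full history on each turn: the original messages, the assistant's previous response, and a new user message asking for a cheaper alternative. Because Oxlo.ai has no cold starts, the second request should not incur extra startup latency. The system prompt still applies, so the follow-up reply should use the same JSON structure.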

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_message},
    {"role": "assistant", "content": output},
    {"role": "user", "content": "Can you suggest something cheaper? My budget is tighter now."},
]

follow_up = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
    temperature=0.2,
)

print(follow_up.choices[0].message.content)
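
For more than one follow-up, rebuilding the message list by hand gets repetitive. A small helper can append each completed turn, as in the sketch below. `extend_history` is a hypothetical name, and the placeholder strings stand in for the real prompt and model reply.

```python
def extend_history(messages, assistant_reply, user_follow_up):
    """Return a new message list with the last reply and the next question appended."""
    # Build a new list instead of mutating the input, so earlier turns
    # can still be replayed or logged unchanged.
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_follow_up},
    ]


# Placeholder content standing in for SYSTEM_PROMPT, user_message, and output.
history = [
    {"role": "system", "content": "You are a product recommendation engine."},
    {"role": "user", "content": "{\"user_profile\": {}, \"candidates\": []}"},
]
turn_two = extend_history(history, "{\"recommendations\": []}", "Can you suggest something cheaper?")
turn_three = extend_history(turn_two, "{\"recommendations\": []}", "Only wired options, please.")

print([m["role"] for m in turn_three])
```

Each returned list can be passed directly as the `messages` argument of the next request. Since the whole history is resent every turn, long conversations will eventually need trimming to stay within the context window.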

Run it

Save the complete script as recs.py, export your key, and run it.

export OXLO_API_KEY="sk-oxlo.ai-..."
python recs.py

Example output from Step 4 (the exact ranking and wording will vary from run to run):

{
  "recommendations": [
    {
      "product_id": "p2",
      "rank": 1,
      "reason": "The user prefers wireless noise-canceling headphones and has not yet purchased the AirPods Pro 2, which fits the under-300 budget."
    },
    {
      "product_id": "p1",
      "rank": 2,
      "reason": "They recently viewed the WH-1000XM5, indicating strong interest, though at $348 it exceeds the under-300 budget hint."
    },
    {
      "product_id": "p3",
      "rank": 3,
      "reason": "A solid wired studio option at a lower price point, useful as a secondary pair for their setup."
    }
  ]
}

Next steps

Swap the category filter in Step 2 for semantic retrieval using Oxlo.ai's embeddings endpoint. Send product names and descriptions through bge-large or e5-large, store the vectors in your database, and retrieve candidates by cosine similarity.
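
The retrieval half of that idea can be sketched without any API calls. The example below assumes the embeddings have already been computed and stored. The three-dimensional vectors are made-up toy values, not real model output, and `top_k_by_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query_vec, product_vecs, k=3):
    """Return the k (product_id, score) pairs most similar to the query vector."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in product_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
product_vecs = {
    "p1": [0.9, 0.1, 0.0],
    "p3": [0.6, 0.0, 0.4],
    "p4": [0.0, 1.0, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

top_two = top_k_by_similarity(query_vec, product_vecs, k=2)
print([pid for pid, _ in top_two])
```

A linear scan like this is fine for a small catalog. A larger one would normally use a vector index or the similarity search built into the database.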

Alternatively, add an explicit feedback loop. Log each recommendation with a thumbs-up or thumbs-down, append that signal to the user profile JSON, and pass it back in the next request. The model's weights do not change, but it can take the stated preferences into account in context, without any retraining.
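
A minimal sketch of that loop follows. The `feedback` field, the `thumbs_up` and `thumbs_down` labels, and the `record_feedback` helper are assumptions made for this example. For the model to use the signal, the system prompt would also need a line telling it to consider the feedback list.

```python
import copy
import json


def record_feedback(user_profile, product_id, liked):
    """Return a copy of the profile with the latest feedback for a product recorded."""
    updated = copy.deepcopy(user_profile)
    # Keep at most one entry per product: the most recent signal wins.
    feedback = [
        entry for entry in updated.get("feedback", [])
        if entry["product_id"] != product_id
    ]
    feedback.append({
        "product_id": product_id,
        "signal": "thumbs_up" if liked else "thumbs_down",
    })
    updated["feedback"] = feedback
    return updated


base_profile = {"preferred_categories": ["headphones"], "budget_hint": "under 300"}
profile_v2 = record_feedback(base_profile, "p2", liked=True)
profile_v3 = record_feedback(profile_v2, "p3", liked=False)
profile_v4 = record_feedback(profile_v3, "p2", liked=False)  # user changed their mind

# The updated profile serializes into the next user message like any other field.
print(json.dumps(profile_v4["feedback"], indent=2))
```

Storing the signal in the profile keeps the request format unchanged, at the cost of a prompt that grows with every rating. Capping the list to the most recent entries is one way to bound it.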