<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: tzedaka</title>
    <description>The latest articles on DEV Community by tzedaka (@jose_torrealba_5175019345).</description>
    <link>https://dev.to/jose_torrealba_5175019345</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3288199%2F88e85782-3091-46da-b5d1-a36e01182267.png</url>
      <title>DEV Community: tzedaka</title>
      <link>https://dev.to/jose_torrealba_5175019345</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jose_torrealba_5175019345"/>
    <language>en</language>
    <item>
      <title>How We Pass Listing Context to an LLM...</title>
      <dc:creator>tzedaka</dc:creator>
      <pubDate>Thu, 09 Apr 2026 10:09:01 +0000</pubDate>
      <link>https://dev.to/jose_torrealba_5175019345/how-we-pass-listing-context-to-an-llm-lfh</link>
      <guid>https://dev.to/jose_torrealba_5175019345/how-we-pass-listing-context-to-an-llm-lfh</guid>
      <description>&lt;p&gt;We built &lt;a href="https://shopao.io" rel="noopener noreferrer"&gt;Shopao&lt;/a&gt;, an AI agent that automatically answers buyer questions on MercadoLibre listings. When a buyer posts a question, the system intercepts the webhook, assembles context from the listing, the seller's business profile, and their catalog — then either sends a reply directly to MeLi or holds it in the seller's portal for one-click approval.&lt;/p&gt;

&lt;p&gt;This post covers the specific decisions we made in wiring real marketplace data into an LLM reasoning loop, including one bug that took us an embarrassing amount of time to debug.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: Spring Boot 3 (REST API + webhook handler)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent service&lt;/strong&gt;: Spring Boot 3 + LangChain4j (stateless reasoning loop)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM&lt;/strong&gt;: swappable via config — currently OpenAI &lt;code&gt;gpt-4o-mini&lt;/code&gt;, with Gemini and Ollama also supported&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketplace&lt;/strong&gt;: MercadoLibre (Argentina/LATAM)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent service is fully stateless — it receives a request, runs the LangChain4j reasoning loop, and returns a suggested reply. No session, no DB access inside the loop.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. How the Prompt Is Constructed
&lt;/h2&gt;

&lt;p&gt;The most important architectural decision: &lt;strong&gt;the LLM only sees the buyer's question and the item ID in the user message&lt;/strong&gt;. Everything else arrives through tool calls during the reasoning loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LangChain4j @AiService interface&lt;/span&gt;
&lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fromResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/prompts/system-message.txt"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@UserMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fromResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/prompts/user-message.txt"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@V&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"questionText"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;questionText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;@V&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"itemId"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;itemId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user message template is minimal — question text, item ID, and a workflow instruction. The LLM then calls three tools in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;QuestionContextTool&lt;/code&gt;&lt;/strong&gt; — hits MeLi's public API, formats listing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SellerProfileTool&lt;/code&gt;&lt;/strong&gt; — returns seller policies (warranty, shipping, returns) from a &lt;code&gt;ThreadLocal&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;RelatedProductsTool&lt;/code&gt;&lt;/strong&gt; — returns catalog context for upsell/cross-sell, called only when relevant&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We intentionally don't pre-inject listing data as a JSON blob into the user message. The tool-calling pattern lets the LLM skip the related products tool on purely technical questions, saving tokens and ~200ms per request.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Handling Incomplete Listing Data
&lt;/h2&gt;

&lt;p&gt;MercadoLibre listings vary wildly in data quality. Some have 40 structured attributes; some have a title and nothing else. A naive implementation passes the raw API response to the LLM and lets it deal with nulls — we don't do that.&lt;/p&gt;

&lt;p&gt;Three-layer defense:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Description fallback.&lt;/strong&gt; The description endpoint returns 404 for many listings. Feign throws on 4xx by default, so we catch it and return the literal string &lt;code&gt;"(no description)"&lt;/code&gt;. An empty string would be worse — the model might try to fabricate one.&lt;/p&gt;
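&lt;p&gt;A minimal sketch of that fallback — &lt;code&gt;DescriptionFetcher&lt;/code&gt; is a hypothetical stand-in for the Feign client, which throws on 4xx by default:&lt;/p&gt;

```java
// Sketch of the Layer 1 fallback. DescriptionFetcher is an illustrative
// stand-in for the Feign client; names here are not the production ones.
public class DescriptionFallback {

    interface DescriptionFetcher {
        String fetch(String itemId) throws Exception;
    }

    static String descriptionOrFallback(DescriptionFetcher fetcher, String itemId) {
        try {
            String description = fetcher.fetch(itemId);
            // An empty string would invite the model to fabricate a description,
            // so normalize blanks to the explicit marker as well.
            if (description == null || description.isBlank()) {
                return "(no description)";
            }
            return description;
        } catch (Exception e) { // a 404 from the description endpoint lands here
            return "(no description)";
        }
    }
}
```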

&lt;p&gt;&lt;strong&gt;Layer 2 — Attribute filtering at render time.&lt;/strong&gt; MeLi's API frequently returns attribute objects where &lt;code&gt;name&lt;/code&gt; is populated but &lt;code&gt;value_name&lt;/code&gt; is null. We drop those silently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Attribute&lt;/span&gt; &lt;span class="n"&gt;attr&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValueName&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValueName&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isBlank&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"- "&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;()).&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;": "&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValueName&lt;/span&gt;&lt;span class="o"&gt;()).&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sending &lt;code&gt;- Brand: null&lt;/code&gt; to the LLM is worse than omitting it — it takes up tokens and can confuse the model into treating "null" as a real value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — Tool-level fallback.&lt;/strong&gt; If the entire tool call fails (network timeout, MeLi outage), the tool returns &lt;code&gt;"Product context not available for item {id}."&lt;/code&gt; instead of throwing. The agent falls back to whatever it can infer from the seller profile alone — imperfect, but it keeps the pipeline alive.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Passing Per-Request State Without Polluting the LLM Context
&lt;/h2&gt;

&lt;p&gt;The seller's OAuth token, business profile, and email address all need to be reachable during the reasoning loop — but none of them should appear in any LLM message.&lt;/p&gt;

&lt;p&gt;We use a set of &lt;code&gt;ThreadLocal&lt;/code&gt; values, loaded before the LLM call and cleared in a &lt;code&gt;finally&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AgentServiceImpl.generateResponse():
  → set ThreadLocals: accessToken, sellerProfile, itemProfile, sellerEmail, questionId
  → call aiService.answer(questionText, itemId)
  → check escalation flag
  → finally: clear all ThreadLocals
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Feign interceptor for MeLi's API reads the OAuth token from the same &lt;code&gt;ThreadLocal&lt;/code&gt; and injects it as an &lt;code&gt;Authorization&lt;/code&gt; header — no token ever touches the LLM conversation, no database lookup needed mid-loop.&lt;/p&gt;
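&lt;p&gt;Stripped to its essentials, the holder looks something like this (names are illustrative; the real Feign &lt;code&gt;RequestInterceptor&lt;/code&gt; reads &lt;code&gt;accessToken()&lt;/code&gt; and sets the &lt;code&gt;Authorization&lt;/code&gt; header):&lt;/p&gt;

```java
// Illustrative per-request holder. The Feign interceptor calls accessToken()
// while building each MeLi request; nothing here ever enters an LLM message.
public final class RequestContext {
    private static final ThreadLocal<String> ACCESS_TOKEN = new ThreadLocal<>();

    private RequestContext() {}

    public static void setAccessToken(String token) { ACCESS_TOKEN.set(token); }

    public static String accessToken() { return ACCESS_TOKEN.get(); }

    // Called from the finally block so a pooled thread never leaks a token
    // into the next request it happens to serve.
    public static void clear() { ACCESS_TOKEN.remove(); }
}
```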




&lt;h2&gt;
  
  
  4. Latency Breakdown
&lt;/h2&gt;

&lt;p&gt;The webhook returns &lt;code&gt;200 OK&lt;/code&gt; immediately. Everything after is asynchronous:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Webhook received → 200 OK                              &amp;lt; 5ms
Async thread starts:
  Token validation (fast path: DB read)               ~20–50ms
  Fetch question from MeLi API                        ~100–200ms
  DB write (PROCESSING status)                        ~5ms
  Fetch seller + item profiles from DB                ~10ms
  Build related products context (DB query)           ~10ms
  Agent call (dominant cost):
    QuestionContextTool → getItem()                   ~100–250ms
    QuestionContextTool → getItemDescription()        ~80–200ms
    LLM inference (gpt-4o-mini, ~400–800 tokens in)   ~600ms–2.5s
  Post answer to MeLi (if auto-send)                  ~100–200ms
  DB update (final status)                            ~5ms
─────────────────────────────────────────────────────
Total: 1.2s – 5s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two MeLi API calls inside &lt;code&gt;QuestionContextTool&lt;/code&gt; run sequentially — they could be parallelized. We haven't done it yet because it requires moving parallelism into the assembler layer explicitly, and current P95 latency is acceptable. It's the clearest optimization still on the table.&lt;/p&gt;
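&lt;p&gt;If we do parallelize, the shape would be roughly this — a sketch with &lt;code&gt;Supplier&lt;/code&gt; stand-ins for the two Feign calls, not our actual assembler code:&lt;/p&gt;

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class ParallelContextFetch {
    // Sketch: run getItem() and getItemDescription() concurrently instead of
    // back-to-back, so the slower call bounds the latency rather than the sum.
    static String buildContext(Supplier<String> itemCall, Supplier<String> descriptionCall) {
        CompletableFuture<String> item = CompletableFuture.supplyAsync(itemCall);
        CompletableFuture<String> description = CompletableFuture.supplyAsync(descriptionCall);
        return item.join() + "\n\n" + description.join();
    }
}
```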




&lt;h2&gt;
  
  
  5. The Auto-Send Decision
&lt;/h2&gt;

&lt;p&gt;The most sensitive gate in the pipeline: send the LLM's answer directly to MeLi, or hold it in the &lt;a href="https://shopao.io" rel="noopener noreferrer"&gt;Shopao seller portal&lt;/a&gt; for review.&lt;/p&gt;

&lt;p&gt;Three independent conditions must all be true to auto-send:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;autoRespondEnabled&lt;/code&gt;&lt;/strong&gt; — a per-seller flag (defaults to &lt;code&gt;true&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work schedule&lt;/strong&gt; — timezone-aware blackout windows the seller configures as JSON (&lt;code&gt;[{"days":["MON",...], "startHour":9, "endHour":18}]&lt;/code&gt;), with support for overnight ranges like 22:00–06:00&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No escalation sentinel&lt;/strong&gt; — if the LLM returned &lt;code&gt;__ESCALATED__&lt;/code&gt;, the question is always held for review, regardless of the other settings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If any condition fails, the question lands in &lt;code&gt;READY_FOR_REVIEW&lt;/code&gt;. The seller approves or edits with one click.&lt;/p&gt;
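&lt;p&gt;The overnight-range support in condition 2 is the only non-obvious piece. A sketch of the hour check under stated assumptions — hour granularity, with the timezone conversion already done upstream (field names mirror the JSON config):&lt;/p&gt;

```java
public class WorkScheduleWindow {
    // Returns true when hourNow falls inside [startHour, endHour), treating
    // startHour > endHour as an overnight range that wraps past midnight.
    // Whether "inside" means auto-send or hold is the caller's decision.
    static boolean inWindow(int startHour, int endHour, int hourNow) {
        if (startHour <= endHour) {
            return hourNow >= startHour && hourNow < endHour; // e.g. 9-18
        }
        return hourNow >= startHour || hourNow < endHour;     // e.g. 22-6
    }
}
```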




&lt;h2&gt;
  
  
  6. The Escalation System — and What Broke
&lt;/h2&gt;

&lt;p&gt;This was the most interesting problem to debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Early design: custom exception.&lt;/strong&gt; The escalation tool threw a &lt;code&gt;RuntimeException&lt;/code&gt; with &lt;code&gt;super(null)&lt;/code&gt; — the idea was to signal "stop, don't answer" by interrupting the LangChain4j loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Early version — this broke&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;escalate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;EscalationSignal&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// RuntimeException with null message&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangChain4j 1.2.0 catches exceptions thrown from tool methods and wraps them in a &lt;code&gt;ToolExecutionResultMessage&lt;/code&gt;. Internally it calls &lt;code&gt;e.getMessage()&lt;/code&gt;, which returned &lt;code&gt;null&lt;/code&gt; for our exception. That null hit &lt;code&gt;ToolExecutionResultMessage.from(id, null)&lt;/code&gt;, which threw &lt;code&gt;IllegalArgumentException&lt;/code&gt;, which surfaced as a 500 to the caller. Every escalation killed the request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current design: sentinel string + ThreadLocal flag.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tool returns a sentinel string and sets a flag&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;escalate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="no"&gt;TRIGGERED&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;       &lt;span class="c1"&gt;// ThreadLocal flag&lt;/span&gt;
    &lt;span class="n"&gt;sendEscalationEmail&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;     &lt;span class="c1"&gt;// notify seller immediately&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"__ESCALATED__"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// LangChain4j sees this as a normal tool result&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Caller checks the flag after the LLM returns&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questionText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;itemId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;EscalationTool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;wasTriggered&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;ESCALATED_SENTINEL&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// discard whatever text the LLM generated post-escalation&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We check the flag &lt;em&gt;after&lt;/em&gt; &lt;code&gt;answer()&lt;/code&gt; returns because the LLM sometimes echoes the sentinel string back as part of its response text. Checking the flag lets us discard that and return cleanly.&lt;/p&gt;

&lt;p&gt;The escalation tool also sends a transactional email to the seller with a direct link to the question in the portal. If the email fails, it logs a warning and continues — the escalation happens regardless.&lt;/p&gt;
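&lt;p&gt;That "log and continue" behavior is worth making explicit, since a failed SMTP call must never turn an escalation into a 500. A hedged sketch (the &lt;code&gt;Runnable&lt;/code&gt; stands in for our mail client):&lt;/p&gt;

```java
public class EscalationNotifier {
    // Best-effort notification: the escalation itself is already recorded via
    // the ThreadLocal flag, so an email failure only costs the seller an alert.
    static boolean notifySeller(Runnable sendEmail) {
        try {
            sendEmail.run();
            return true;
        } catch (RuntimeException e) {
            System.err.println("WARN: escalation email failed: " + e.getMessage());
            return false; // swallowed — the question is still escalated
        }
    }
}
```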




&lt;h2&gt;
  
  
  7. Preventing Over-Escalation
&lt;/h2&gt;

&lt;p&gt;We learned the hard way that LLMs escalate on borderline questions whenever you give them an opening. The tool description is now deliberately over-specified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ONLY call this tool when ALL of these conditions are met:
(1) You already called the product context tool AND the seller profile tool.
(2) The answer is genuinely impossible — not just uncertain, but impossible.
(3) The question matches one of these specific categories:
    price negotiation; post-sale issue; request for private contact info;
    product the seller doesn't carry; real-time ERP data unavailable in the listing.

When in doubt: answer. Do not escalate.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key distinction is &lt;strong&gt;"genuinely impossible — not just uncertain"&lt;/strong&gt;. The system prompt backs this up with concrete examples: if the listing says "compatible with iPhone 15" and the buyer asks about the 15 Pro, the agent should answer — the Pro is part of the same family. Reasonable inference is expected. Fabrication is not.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Multi-SKU Variants and the Stock Tradeoff
&lt;/h2&gt;

&lt;p&gt;MercadoLibre listings have a &lt;code&gt;variations&lt;/code&gt; array with &lt;code&gt;attribute_combinations&lt;/code&gt; per variant. A phone listing might have 12 variations: 4 colors × 3 storage tiers. We pass these as a flat list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AVAILABLE VARIANTS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Color: Negro Almacenamiento&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;128GB&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Color: Negro Almacenamiento&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256GB&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Color: Blanco Almacenamiento&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;128GB&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Color: Blanco Almacenamiento&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256GB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We deliberately don't include per-variant stock levels or per-variant prices. That would require a separate API call per variation ID — too slow for the real-time pipeline. The LLM knows which combinations exist but not whether each is currently in stock. For the vast majority of questions this doesn't matter; for "do you have the white 256GB?" the model answers affirmatively if the variant exists and hedges if it can't confirm stock.&lt;/p&gt;
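&lt;p&gt;The flattening step above can be sketched like this — an illustrative renderer over already-parsed &lt;code&gt;attribute_combinations&lt;/code&gt; pairs, joined with commas here for readability (the production format differs slightly):&lt;/p&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class VariantRenderer {
    // Sketch: flatten each variation's attribute_combinations into one line.
    // Per-variant stock and price are deliberately omitted — fetching them
    // would cost one extra API call per variation ID.
    static String render(List<Map<String, String>> variations) {
        StringBuilder sb = new StringBuilder("AVAILABLE VARIANTS:\n");
        for (Map<String, String> combo : variations) {
            StringJoiner line = new StringJoiner(", ", "- ", "\n");
            combo.forEach((name, value) -> line.add(name + ": " + value));
            sb.append(line);
        }
        return sb.toString();
    }
}
```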




&lt;h2&gt;
  
  
  9. OAuth Token Safety Under Concurrent Webhooks
&lt;/h2&gt;

&lt;p&gt;MercadoLibre issues &lt;strong&gt;single-use refresh tokens&lt;/strong&gt; — use one to refresh, and the old token is immediately invalidated. Two concurrent webhook threads hitting a near-expired token simultaneously would cause the second refresh to fail with &lt;code&gt;invalid_token&lt;/code&gt;, requiring a full seller re-auth.&lt;/p&gt;

&lt;p&gt;The fix: per-account &lt;code&gt;synchronized&lt;/code&gt; block with double-checked locking. The first thread acquires the lock, refreshes, and persists the new token before releasing. Any thread that was waiting re-reads from the DB inside the lock, finds the token already fresh, and returns early without touching the refresh endpoint.&lt;/p&gt;

&lt;p&gt;We also apply a 5-minute buffer — we refresh before the token actually expires to avoid edge-case 401s mid-request. If the refresh itself fails (seller revoked access), we mark the account &lt;code&gt;needsReconnect=true&lt;/code&gt; and surface a 401 so the seller re-authenticates through the &lt;a href="https://shopao.io" rel="noopener noreferrer"&gt;Shopao&lt;/a&gt; OAuth flow.&lt;/p&gt;
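&lt;p&gt;The locking pattern, sketched with an in-memory store standing in for the DB (names are illustrative; a real implementation keys token state per account, not just the locks):&lt;/p&gt;

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TokenStore {
    private static final long BUFFER_MS = 5 * 60 * 1000; // refresh 5 minutes early

    private final Map<String, Object> accountLocks = new ConcurrentHashMap<>();
    private volatile String token;
    private volatile long expiresAtMs;
    int refreshCount = 0; // exposed for the sketch; real code persists to the DB

    TokenStore(String initialToken, long expiresAtMs) {
        this.token = initialToken;
        this.expiresAtMs = expiresAtMs;
    }

    String freshToken(String accountId, long nowMs) {
        if (expiresAtMs - nowMs > BUFFER_MS) {
            return token; // fast path: token still comfortably valid
        }
        synchronized (accountLocks.computeIfAbsent(accountId, k -> new Object())) {
            // Re-check inside the lock: a thread that was waiting here finds the
            // token already refreshed and must NOT burn the single-use refresh token.
            if (expiresAtMs - nowMs > BUFFER_MS) {
                return token;
            }
            refreshCount++;
            token = "token-" + refreshCount;          // stand-in for the refresh call
            expiresAtMs = nowMs + 6 * 60 * 60 * 1000; // stand-in expiry
            return token;
        }
    }
}
```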




&lt;p&gt;Full context on why we built this: &lt;a href="https://shopao.io/es/blog/chatbot-vs-ia-mercadolibre-cual-entiende-tus-preguntas" rel="noopener noreferrer"&gt;chatbot vs. AI agent for MercadoLibre sellers&lt;/a&gt;&lt;/p&gt;

</description>
      <category>langchain4j</category>
      <category>mercadolibre</category>
      <category>springboot</category>
      <category>automation</category>
    </item>
    <item>
      <title>Keyword bot vs. LLM agent for e-commerce Q&amp;A: a technical breakdown</title>
      <dc:creator>tzedaka</dc:creator>
      <pubDate>Thu, 09 Apr 2026 09:28:41 +0000</pubDate>
      <link>https://dev.to/jose_torrealba_5175019345/keyword-bot-vs-llm-agent-for-e-commerce-qa-a-technical-breakdown-3fja</link>
      <guid>https://dev.to/jose_torrealba_5175019345/keyword-bot-vs-llm-agent-for-e-commerce-qa-a-technical-breakdown-3fja</guid>
      <description>&lt;p&gt;Most automation tools for MercadoLibre sellers fall into one of two categories: keyword-based chatbots (MercadoBot, Yobot, JaimeBot) or LLM-powered agents. From the outside, they look similar — buyer asks a question, system sends a reply. Under the hood, they're completely different architectures with very different failure modes.&lt;/p&gt;

&lt;p&gt;This post breaks down how each works technically, where each fails, and why the difference matters at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  How a keyword bot works
&lt;/h2&gt;

&lt;p&gt;The core logic is a rule engine. At setup, the seller configures a list of keyword → response pairs. At runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;keyword_bot_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;question_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;question_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;kw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keywords&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# no match → fallback or no reply
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's essentially it. The system checks if the incoming text contains a preconfigured string. If it does, it returns the associated template. If it doesn't, the question goes unanswered or gets a generic fallback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The structural problem:&lt;/strong&gt; the bot has no access to the product listing. It responds with text the seller wrote at configuration time — which may be outdated, incomplete, or wrong for a specific variant. The seller has to manually anticipate every phrasing a buyer might use.&lt;/p&gt;

&lt;p&gt;At small scale with simple catalogs (10 products, predictable questions), this works fine. At scale or with technical products, it breaks.&lt;/p&gt;




&lt;h2&gt;
  
  
  How an LLM agent works
&lt;/h2&gt;

&lt;p&gt;An LLM agent doesn't pattern-match — it reasons. The key architectural difference is &lt;strong&gt;context injection&lt;/strong&gt;: before generating a reply, the agent retrieves real data from the product listing and injects it into the prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Fetch live listing context from MeLi API
&lt;/span&gt;    &lt;span class="n"&gt;listing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mercadolibre_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_listing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listing_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;       &lt;span class="c1"&gt;# voltage, dimensions, compatibility...
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;       &lt;span class="c1"&gt;# sizes, colors, models
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seller_profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_seller_profile&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;    &lt;span class="c1"&gt;# warranty, shipping, return policy
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Build prompt with context
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a sales assistant for a MercadoLibre seller.

    Product context:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Buyer question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Answer the question using only the product context provided.
    If the information is not available in the context, say so clearly.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Call LLM
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can answer questions about voltage, compatibility, variants, and warranty without any preconfigured rules — because it reads the actual listing before generating the reply.&lt;/p&gt;




&lt;h2&gt;
  
  
  Failure modes by question type
&lt;/h2&gt;

&lt;p&gt;This is where the architectural difference becomes practical:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question type&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Keyword bot&lt;/th&gt;
&lt;th&gt;LLM agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Exact keyword match&lt;/td&gt;
&lt;td&gt;"¿tiene garantía?"&lt;/td&gt;
&lt;td&gt;✅ works&lt;/td&gt;
&lt;td&gt;✅ works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Synonym / paraphrase&lt;/td&gt;
&lt;td&gt;"¿qué cobertura tiene en fallas?"&lt;/td&gt;
&lt;td&gt;❌ no match&lt;/td&gt;
&lt;td&gt;✅ reads warranty from profile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical spec&lt;/td&gt;
&lt;td&gt;"¿sirve para 220V?"&lt;/td&gt;
&lt;td&gt;❌ unless pre-configured&lt;/td&gt;
&lt;td&gt;✅ reads voltage attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variant combination&lt;/td&gt;
&lt;td&gt;"¿el azul también viene en XL?"&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ reads variations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compatibility&lt;/td&gt;
&lt;td&gt;"¿funciona con mi HP 14s?"&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ reads compatibility attributes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Out-of-scope&lt;/td&gt;
&lt;td&gt;"¿hacen instalación a domicilio?"&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ returns "not available" cleanly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For catalogs with simple, predictable questions, keyword bots cover 60–70% of cases. For technical catalogs (electronics, tools, auto parts), the unmatched rate is typically 30–50%.&lt;/p&gt;




&lt;h2&gt;
  
  
  The context window problem
&lt;/h2&gt;

&lt;p&gt;LLM agents have their own failure mode: &lt;strong&gt;context quality&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The agent is only as good as the data injected into the prompt. If the listing has incomplete attributes — no voltage listed, missing dimensions, vague description — the agent has nothing to work with and will either hallucinate or return a non-answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad listing attributes&lt;/span&gt;
&lt;span class="n"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BRAND&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Samsung&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Galaxy Tab&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Agent can't answer "¿funciona con 220V?" — voltage is not in the context&lt;/span&gt;

&lt;span class="c1"&gt;# Good listing attributes&lt;/span&gt;
&lt;span class="n"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BRAND&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Samsung&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Galaxy Tab&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VOLTAGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;110V/220V&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONNECTIVITY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WiFi, Bluetooth 5.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPATIBLE_DEVICES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Android, Windows, Mac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Agent answers voltage and compatibility questions correctly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the quality of automated replies is directly tied to how complete the listing data is — which is actually a useful forcing function for sellers to maintain better catalog hygiene.&lt;/p&gt;
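&lt;p&gt;One way to turn that forcing function into tooling is a pre-flight check that flags listings missing the attributes buyers ask about most. A sketch under stated assumptions — the attribute IDs mirror the examples above, but the "expected" set here is our own illustration, not an official MeLi list:&lt;/p&gt;

```python
# High-signal attributes that commonly drive buyer questions (illustrative set)
EXPECTED_ATTRIBUTES = {"BRAND", "MODEL", "VOLTAGE", "CONNECTIVITY", "COMPATIBLE_DEVICES"}

def missing_attributes(attributes: list[dict]) -> set[str]:
    # Report which expected attributes the listing lacks, so the seller
    # can fill them in before the agent ever needs them at inference time.
    present = {a["id"] for a in attributes if a.get("value")}
    return EXPECTED_ATTRIBUTES - present

bad_listing = [
    {"id": "BRAND", "value": "Samsung"},
    {"id": "MODEL", "value": "Galaxy Tab"},
]
missing_attributes(bad_listing)
# {'VOLTAGE', 'CONNECTIVITY', 'COMPATIBLE_DEVICES'}
```

&lt;p&gt;Surfacing this set in the seller portal closes the loop: incomplete listings are fixed before they generate non-answers.&lt;/p&gt;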




&lt;h2&gt;
  
  
  MercadoLibre's native AI: a hybrid case
&lt;/h2&gt;

&lt;p&gt;MeLi introduced an AI suggestion feature in 2024–2025. It reads the listing and generates a suggested reply that is genuinely high quality for standard questions.&lt;/p&gt;

&lt;p&gt;The catch: &lt;strong&gt;it doesn't auto-send&lt;/strong&gt;. The seller has to approve each suggestion manually. Architecturally, it's an LLM agent without the final delivery step. For sellers with off-hours volume (60% of questions arrive outside business hours per Ventiapp 2025 data), this doesn't solve the automation problem.&lt;/p&gt;

&lt;p&gt;The delta between MeLi's native AI and external agents like &lt;a href="https://shopao.io" rel="noopener noreferrer"&gt;Shopao&lt;/a&gt; is one step in the pipeline: auto-send vs. manual approval.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to use each
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Keyword bot:&lt;/strong&gt; small catalog, simple questions, budget-constrained, seller willing to invest setup time per product category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM agent:&lt;/strong&gt; technical catalog, high off-hours volume, multi-variant products, seller wants zero configuration per question type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid (keyword first, LLM fallback):&lt;/strong&gt; high-volume scenarios where you want to minimize LLM API calls for trivially answerable questions. The keyword layer handles FAQ-type questions cheaply; the LLM handles everything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hybrid_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Try cheap keyword match first
&lt;/span&gt;    &lt;span class="n"&gt;quick_reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;keyword_bot_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;quick_reply&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;quick_reply&lt;/span&gt;

    &lt;span class="c1"&gt;# Fall back to LLM for complex/unmatched questions
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;agent_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listing_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
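&lt;p&gt;To check whether the hybrid actually saves money, it helps to measure how often the keyword layer short-circuits before the LLM is called. A sketch with a stubbed-out keyword layer — the stub and the sample questions are illustrative, not measurements:&lt;/p&gt;

```python
def measure_keyword_hit_rate(questions: list[str], keyword_layer) -> float:
    # Fraction of questions answered without touching the LLM
    hits = sum(1 for q in questions if keyword_layer(q) is not None)
    return hits / len(questions)

# Stub keyword layer: only matches an exact FAQ phrase
faq = {"¿tiene garantía?": "Sí, 12 meses."}
layer = faq.get

questions = [
    "¿tiene garantía?",          # keyword hit
    "¿sirve para 220V?",         # falls through to the LLM
    "¿el azul viene en XL?",     # falls through to the LLM
    "¿tiene garantía?",          # keyword hit
]
measure_keyword_hit_rate(questions, layer)  # 0.5
```

&lt;p&gt;The hit rate times your question volume is roughly the number of LLM calls the keyword layer saves per day.&lt;/p&gt;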






&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;The choice isn't really "chatbot vs. AI" — it's about whether your reply system has access to real product context at inference time. A keyword bot configured with correct answers can outperform a poorly-prompted LLM agent. But a keyword bot fundamentally cannot answer questions it wasn't configured for, while an LLM agent with good listing data can handle the full distribution of buyer questions without any manual rule configuration.&lt;/p&gt;

&lt;p&gt;For technical catalogs on MercadoLibre, the unmatched question rate with keyword bots is high enough that the difference in conversion is measurable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full comparison including pricing and real case studies: &lt;a href="https://shopao.io/es/blog/chatbot-vs-ia-mercadolibre-cual-entiende-tus-preguntas" rel="noopener noreferrer"&gt;shopao.io/blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#ecommerce&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>ecommerce</category>
      <category>python</category>
    </item>
  </channel>
</rss>
