We built Shopao, an AI agent that automatically answers buyer questions on MercadoLibre listings. When a buyer posts a question, the system intercepts the webhook, assembles context from the listing, the seller's business profile, and their catalog — then either sends a reply directly to MeLi or holds it in the seller's portal for one-click approval.
This post covers the specific decisions we made in wiring real marketplace data into an LLM reasoning loop, including one bug that took us an embarrassing amount of time to debug.
The Stack
- Backend: Spring Boot 3 (REST API + webhook handler)
- Agent service: Spring Boot 3 + LangChain4j (stateless reasoning loop)
- LLM: swappable via config — currently OpenAI gpt-4o-mini, with Gemini and Ollama also supported
- Marketplace: MercadoLibre (Argentina/LATAM)
The agent service is fully stateless — it receives a request, runs the LangChain4j reasoning loop, and returns a suggested reply. No session, no DB access inside the loop.
1. How the Prompt Is Constructed
The most important architectural decision: the LLM only sees the buyer's question and the item ID in the user message. Everything else arrives through tool calls during the reasoning loop.
// LangChain4j @AiService interface
@SystemMessage(fromResource = "/prompts/system-message.txt")
@UserMessage(fromResource = "/prompts/user-message.txt")
String answer(@V("questionText") String questionText, @V("itemId") String itemId);
The user message template is minimal — question text, item ID, and a workflow instruction. The LLM then calls three tools in order:
- QuestionContextTool — hits MeLi's public API, formats listing data
- SellerProfileTool — returns seller policies (warranty, shipping, returns) from a ThreadLocal
- RelatedProductsTool — returns catalog context for upsell/cross-sell, called only when relevant
We intentionally don't pre-inject listing data as a JSON blob into the user message. The tool-calling pattern lets the LLM skip the related products tool on purely technical questions, saving tokens and ~200ms per request.
2. Handling Incomplete Listing Data
MercadoLibre listings vary wildly in data quality. Some have 40 structured attributes; some have a title and nothing else. A naive implementation passes the raw API response to the LLM and lets it deal with nulls — we don't do that.
Three-layer defense:
Layer 1 — Description fallback. The description endpoint returns 404 for many listings. Feign throws on 4xx by default, so we catch it and return the literal string "(no description)". An empty string would be worse — the model might try to fabricate one.
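The fallback is a small wrapper around the client call. A minimal sketch, assuming a client interface like the one below (the real one is a Feign client, and the real exception is a FeignException; a plain RuntimeException stands in here since FeignException extends it):

```java
public class DescriptionFetcher {
    // Hypothetical stand-in for the real Feign client interface.
    interface MeliClient { String getItemDescription(String itemId); }

    static String fetchDescription(MeliClient client, String itemId) {
        try {
            String desc = client.getItemDescription(itemId);
            // Treat empty the same as missing, so the model never sees a blank field.
            return (desc == null || desc.isBlank()) ? "(no description)" : desc;
        } catch (RuntimeException e) { // Feign throws on 4xx, including the common 404
            return "(no description)";
        }
    }
}
```

The explicit literal gives the model something concrete to anchor on, instead of an empty slot it might feel compelled to fill.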
Layer 2 — Attribute filtering at render time. MeLi's API frequently returns attribute objects where name is populated but value_name is null. We drop those silently:
// Render only attributes whose value is actually populated
StringBuilder sb = new StringBuilder();
for (Attribute attr : attributes) {
    if (attr.getValueName() != null && !attr.getValueName().isBlank()) {
        sb.append("- ").append(attr.getName()).append(": ").append(attr.getValueName()).append("\n");
    }
}
Sending - Brand: null to the LLM is worse than omitting it — it takes up tokens and can confuse the model into treating "null" as a real value.
Layer 3 — Tool-level fallback. If the entire tool call fails (network timeout, MeLi outage), the tool returns "Product context not available for item {id}." instead of throwing. The agent falls back to whatever it can infer from the seller profile alone — imperfect, but it keeps the pipeline alive.
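The outer guard can be sketched as a wrapper around the whole context assembly. This is an illustration, not the production code; `contextOrFallback` and the `Supplier` shape are assumptions:

```java
import java.util.function.Supplier;

public class ToolFallback {
    // Wraps the entire tool body; buildContext stands in for the real assembler.
    static String contextOrFallback(Supplier<String> buildContext, String itemId) {
        try {
            return buildContext.get();
        } catch (RuntimeException e) {
            // Network timeout, MeLi outage, parse failure: degrade, never throw.
            return "Product context not available for item " + itemId + ".";
        }
    }
}
```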
3. Passing Per-Request State Without Polluting the LLM Context
The seller's OAuth token, business profile, and email address all need to be reachable during the reasoning loop — but none of them should appear in any LLM message.
We use a set of ThreadLocal values, loaded before the LLM call and cleared in a finally block:
AgentServiceImpl.generateResponse():
→ set ThreadLocals: accessToken, sellerProfile, itemProfile, sellerEmail, questionId
→ call aiService.answer(questionText, itemId)
→ check escalation flag
→ finally: clear all ThreadLocals
The Feign interceptor for MeLi's API reads the OAuth token from the same ThreadLocal and injects it as an Authorization header — no token ever touches the LLM conversation, no database lookup needed mid-loop.
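The set-then-clear discipline matters: a leaked ThreadLocal on a pooled thread would bleed one seller's token into another seller's request. A minimal sketch of the pattern, with hypothetical names (the real holder carries several values, not just the token):

```java
import java.util.function.Supplier;

public class SellerContext {
    private static final ThreadLocal<String> ACCESS_TOKEN = new ThreadLocal<>();

    static String token() { return ACCESS_TOKEN.get(); } // read by the Feign interceptor

    // Load before the LLM call, clear in finally, exactly as in the flow above.
    static String withToken(String token, Supplier<String> llmCall) {
        ACCESS_TOKEN.set(token);
        try {
            return llmCall.get();
        } finally {
            ACCESS_TOKEN.remove(); // never leak state across pooled threads
        }
    }
}
```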
4. Latency Breakdown
The webhook returns 200 OK immediately. Everything after is asynchronous:
Webhook received → 200 OK < 5ms
Async thread starts:
Token validation (fast path: DB read) ~20–50ms
Fetch question from MeLi API ~100–200ms
DB write (PROCESSING status) ~5ms
Fetch seller + item profiles from DB ~10ms
Build related products context (DB query) ~10ms
Agent call (dominant cost):
QuestionContextTool → getItem() ~100–250ms
QuestionContextTool → getItemDescription() ~80–200ms
LLM inference (gpt-4o-mini, ~400–800 tokens in) ~600ms–2.5s
Post answer to MeLi (if auto-send) ~100–200ms
DB update (final status) ~5ms
─────────────────────────────────────────────────────
Total: 1.2s – 5s
The two MeLi API calls inside QuestionContextTool run sequentially — they could be parallelized. We haven't done it yet because it requires moving parallelism into the assembler layer explicitly, and current P95 latency is acceptable. It's the clearest optimization still on the table.
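If we do parallelize, the shape would likely be two CompletableFutures joined in the assembler, so total latency approaches the max of the two calls rather than their sum. A sketch under those assumptions (the suppliers stand in for getItem() and getItemDescription()):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class ParallelFetch {
    static String assemble(Supplier<String> fetchItem, Supplier<String> fetchDescription) {
        CompletableFuture<String> itemF = CompletableFuture.supplyAsync(fetchItem);
        CompletableFuture<String> descF = CompletableFuture.supplyAsync(fetchDescription);
        // join() waits for both; the two HTTP calls overlap instead of running back to back.
        return itemF.join() + "\n" + descF.join();
    }
}
```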
5. The Auto-Send Decision
The most sensitive gate in the pipeline: send the LLM's answer directly to MeLi, or hold it in the Shopao seller portal for review.
Three independent conditions must all be true to auto-send:
- autoRespondEnabled — a per-seller flag (defaults to true)
- Work schedule — timezone-aware blackout windows the seller configures as JSON ([{"days":["MON",...], "startHour":9, "endHour":18}]), with support for overnight ranges like 22:00–06:00
- No escalation sentinel — if the LLM returned __ESCALATED__, the question is always held for review regardless of the other settings
If any condition fails, the question lands in READY_FOR_REVIEW. The seller approves or edits with one click.
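The gate itself is a conjunction of three cheap checks. A simplified sketch, with hypothetical names: the Window record mirrors the seller's JSON config, and day matching for overnight ranges is simplified (a window crossing midnight is matched by hour only, not by which calendar day the post-midnight portion falls on):

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Set;

public class AutoSendGate {
    record Window(Set<DayOfWeek> days, int startHour, int endHour) {
        boolean contains(LocalDateTime now) {
            if (!days.contains(now.getDayOfWeek())) return false;
            int h = now.getHour();
            // Overnight ranges like 22:00-06:00 wrap past midnight.
            return startHour <= endHour
                ? h >= startHour && h < endHour
                : h >= startHour || h < endHour;
        }
    }

    static boolean shouldAutoSend(boolean autoRespondEnabled, boolean escalated,
                                  List<Window> blackoutWindows, LocalDateTime now) {
        if (!autoRespondEnabled || escalated) return false;
        // Hold the answer whenever "now" falls inside any configured blackout window.
        return blackoutWindows.stream().noneMatch(w -> w.contains(now));
    }
}
```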
6. The Escalation System — and What Broke
This was the most interesting problem to debug.
Early design: custom exception. The escalation tool threw a RuntimeException with super(null) — the idea was to signal "stop, don't answer" by interrupting the LangChain4j loop.
// Early version — this broke
public String escalate(String reason) {
throw new EscalationSignal(); // RuntimeException with null message
}
LangChain4j 1.2.0 catches exceptions thrown from tool methods and wraps them in a ToolExecutionResultMessage. Internally it calls e.getMessage(), which returned null for our exception. That null hit ToolExecutionResultMessage.from(id, null), which threw IllegalArgumentException, which surfaced as a 500 to the caller. Every escalation killed the request.
Current design: sentinel string + ThreadLocal flag.
// Tool returns a sentinel string and sets a flag
public String escalate(String reason) {
TRIGGERED.set(true); // ThreadLocal flag
sendEscalationEmail(); // notify seller immediately
return "__ESCALATED__"; // LangChain4j sees this as a normal tool result
}
// Caller checks the flag after the LLM returns
String result = aiService.answer(questionText, itemId);
if (EscalationTool.wasTriggered()) {
return ESCALATED_SENTINEL; // discard whatever text the LLM generated post-escalation
}
We check the flag after answer() returns because the LLM sometimes echoes the sentinel string back as part of its response text. Checking the flag lets us discard that and return cleanly.
The escalation tool also sends a transactional email to the seller with a direct link to the question in the portal. If the email fails, it logs a warning and continues — the escalation happens regardless.
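Put together, the flag plumbing is small. A self-contained version of the pattern (email notification omitted; the reset call is an assumption about how the caller clears state between requests):

```java
public class EscalationTool {
    public static final String SENTINEL = "__ESCALATED__";
    private static final ThreadLocal<Boolean> TRIGGERED = ThreadLocal.withInitial(() -> false);

    public String escalate(String reason) {
        TRIGGERED.set(true);     // out-of-band signal, invisible to the LLM
        return SENTINEL;         // LangChain4j sees a perfectly normal tool result
    }

    public static boolean wasTriggered() { return TRIGGERED.get(); }

    public static void reset() { TRIGGERED.remove(); } // clear per request, like the other ThreadLocals
}
```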
7. Preventing Over-Escalation
We learned the hard way that LLMs escalate on borderline questions whenever you give them an opening. The tool description is now deliberately over-specified:
ONLY call this tool when ALL of these conditions are met:
(1) You already called the product context tool AND the seller profile tool.
(2) The answer is genuinely impossible — not just uncertain, but impossible.
(3) The question matches one of these specific categories:
price negotiation; post-sale issue; request for private contact info;
product the seller doesn't carry; real-time ERP data unavailable in the listing.
When in doubt: answer. Do not escalate.
The key distinction is "genuinely impossible — not just uncertain". The system prompt backs this up with concrete examples: if the listing says "compatible with iPhone 15" and the buyer asks about the 15 Pro, the agent should answer — the Pro is part of the same family. Reasonable inference is expected. Fabrication is not.
8. Multi-SKU Variants and the Stock Tradeoff
MercadoLibre listings have a variations array with attribute_combinations per variant. A phone listing might have 12 variations: 4 colors × 3 storage tiers. We pass these as a flat list:
AVAILABLE VARIANTS:
- Color: Negro Almacenamiento: 128GB
- Color: Negro Almacenamiento: 256GB
- Color: Blanco Almacenamiento: 128GB
- Color: Blanco Almacenamiento: 256GB
We deliberately don't include per-variant stock levels or per-variant prices. That would require a separate API call per variation ID — too slow for the real-time pipeline. The LLM knows which combinations exist but not whether each is currently in stock. For the vast majority of questions this doesn't matter; for "do you have the white 256GB?" the model answers affirmatively if the variant exists and hedges if it can't confirm stock.
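The flattening itself is trivial string assembly. A sketch, assuming the attribute_combinations have already been reduced to ordered name-value pairs (the real MeLi payload has more fields per combination):

```java
import java.util.List;
import java.util.Map;

public class VariantRenderer {
    // Each map is one variant's attribute_combinations, simplified to name -> value_name.
    static String render(List<Map<String, String>> variations) {
        StringBuilder sb = new StringBuilder("AVAILABLE VARIANTS:\n");
        for (Map<String, String> combo : variations) {
            sb.append("-");
            combo.forEach((name, value) -> sb.append(" ").append(name).append(": ").append(value));
            sb.append("\n");
        }
        return sb.toString();
    }
}
```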
9. OAuth Token Safety Under Concurrent Webhooks
MercadoLibre issues single-use refresh tokens — use one to refresh, and the old token is immediately invalidated. Two concurrent webhook threads hitting a near-expired token simultaneously would cause the second refresh to fail with invalid_token, requiring a full seller re-auth.
The fix: per-account synchronized block with double-checked locking. The first thread acquires the lock, refreshes, and persists the new token before releasing. Any thread that was waiting re-reads from the DB inside the lock, finds the token already fresh, and returns early without touching the refresh endpoint.
We also apply a 5-minute buffer — we refresh before the token actually expires to avoid edge-case 401s mid-request. If the refresh itself fails (seller revoked access), we mark the account needsReconnect=true and surface a 401 so the seller re-authenticates through the Shopao OAuth flow.
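The locking scheme can be sketched as follows. All names here are illustrative, and an in-memory map stands in for the database; the essential moves are the per-account lock object, the re-read inside the lock, and the 5-minute freshness buffer:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TokenRefresher {
    record Token(String value, Instant expiresAt) {}
    interface RefreshFn { Token refresh(Token old); }

    private static final Duration BUFFER = Duration.ofMinutes(5);
    private final Map<String, Object> locks = new ConcurrentHashMap<>();
    private final Map<String, Token> store = new ConcurrentHashMap<>(); // stands in for the DB

    Token getFresh(String accountId, RefreshFn refreshFn) {
        Token t = store.get(accountId);
        if (stillFresh(t)) return t;                       // fast path: no lock taken
        Object lock = locks.computeIfAbsent(accountId, k -> new Object());
        synchronized (lock) {                              // per-account, not global
            t = store.get(accountId);                      // re-read inside the lock
            if (stillFresh(t)) return t;                   // another thread already refreshed
            Token fresh = refreshFn.refresh(t);            // single-use refresh token consumed exactly once
            store.put(accountId, fresh);                   // persist before releasing the lock
            return fresh;
        }
    }

    private boolean stillFresh(Token t) {
        // Refresh 5 minutes early to avoid 401s mid-request.
        return t != null && Instant.now().plus(BUFFER).isBefore(t.expiresAt());
    }
}
```

Because waiting threads re-check freshness inside the lock, at most one thread per account ever hits MeLi's refresh endpoint, which is exactly what single-use refresh tokens demand.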
Full context on why we built this: chatbot vs. AI agent for MercadoLibre sellers