Building Language Understanding Systems for Chatbots with LLMs

#learnai #oxlo #ai

We are building a language understanding layer for a customer support chatbot that classifies user intent and extracts structured entities before generating a response. This pattern replaces brittle keyword matching with an LLM-powered router that classifies each message in context. Teams running high-volume support workloads can deploy this on Oxlo.ai and pay per request instead of per token, which keeps costs predictable even when the context window grows.

What you'll need

Python 3.10 or newer
An Oxlo.ai API key from https://portal.oxlo.ai
The OpenAI SDK: pip install openai
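Rather than pasting the key into source code, it can be read from the environment at startup. The following is a minimal sketch: `load_api_key` and the `OXLO_API_KEY` variable name are hypothetical choices for illustration, not something Oxlo.ai prescribes.

```python
import os


def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Read the API key from an environment variable (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

The returned string could then be passed as `api_key` when constructing the client in Step 2.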

Step 1: Define the intent schema and system prompt

We start by locking down the intents and entities our bot must recognize. I keep these in Python lists so the prompt stays in sync with the code.

INTENTS = [
    "billing_issue",
    "technical_support",
    "account_access",
    "refund_request",
    "general_question"
]

ENTITIES = ["order_id", "email", "product_name", "severity"]

SYSTEM_PROMPT = """You are the language understanding layer for a customer support chatbot.
Analyze the user's message and output strictly valid JSON with no markdown formatting.

Fields:
- intent: one of {intents}
- confidence: float 0.0 to 1.0
- entities: object with keys {entities}; use null if missing
- next_action: one of ["ask_clarification", "provide_solution", "escalate_human"]
- reasoning: one sentence explaining your classification

Rules:
- If an order_id is mentioned, always extract it.
- If the user sounds frustrated, set severity to "high" and next_action to "escalate_human".
- Do not output markdown code blocks, only raw JSON.
""".format(intents=", ".join(INTENTS), entities=", ".join(ENTITIES))
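The prompt asks the model to follow this schema but cannot guarantee it, so it may help to check the parsed result before routing. Here is a minimal sketch of such a check. `validate_understanding` and `ALLOWED_ACTIONS` are hypothetical names, and the intent and entity lists are repeated only so the block runs on its own.

```python
# Hypothetical validator for the schema described in the system prompt (illustrative sketch).
INTENTS = ["billing_issue", "technical_support", "account_access",
           "refund_request", "general_question"]
ENTITIES = ["order_id", "email", "product_name", "severity"]
ALLOWED_ACTIONS = ["ask_clarification", "provide_solution", "escalate_human"]


def validate_understanding(data: dict) -> list[str]:
    """Return a list of problems found; an empty list means the payload looks valid."""
    problems = []

    if data.get("intent") not in INTENTS:
        problems.append(f"unknown intent: {data.get('intent')!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        problems.append("confidence is not a number")
    elif not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range: {confidence}")

    entities = data.get("entities")
    if not isinstance(entities, dict):
        problems.append("entities is not an object")
    else:
        unexpected = sorted(set(entities) - set(ENTITIES))
        if unexpected:
            problems.append(f"unexpected entity keys: {unexpected}")

    if data.get("next_action") not in ALLOWED_ACTIONS:
        problems.append(f"unknown next_action: {data.get('next_action')!r}")

    return problems
```

A non-empty result could be logged and treated as a reason to ask the user for clarification rather than acting on a malformed classification.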

Step 2: Build the understanding function with Oxlo.ai

Now we wire the prompt to an Oxlo.ai model. I use Llama 3.3 70B because it follows structured output instructions reliably, and on Oxlo.ai each call is billed as one flat request regardless of how much prompt context we include.

import json
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def understand(message: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": message},
        ],
        temperature=0.1,
        max_tokens=512,
    )
    raw = response.choices[0].message.content.strip()
    # Strip any accidental markdown fences
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    return json.loads(raw)
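`json.loads` raises an exception if the model returns anything other than JSON, which would crash the bot mid-conversation. One way to stay responsive is to fall back to a clarification request. This is a sketch under that assumption; `parse_or_fallback` and `FALLBACK` are hypothetical names, not part of the code above.

```python
import copy
import json

# Hypothetical fallback payload, used when the model's reply is not usable JSON.
FALLBACK = {
    "intent": "general_question",
    "confidence": 0.0,
    "entities": {},
    "next_action": "ask_clarification",
    "reasoning": "Model output could not be parsed.",
}


def parse_or_fallback(raw: str) -> dict:
    """Parse the model's raw text; return a fresh copy of FALLBACK if it is not a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return copy.deepcopy(FALLBACK)
    # json.loads also accepts arrays, strings, and numbers; the router needs an object.
    if not isinstance(data, dict):
        return copy.deepcopy(FALLBACK)
    return data
```

Swapping this in for the final `json.loads(raw)` call would trade a hard failure for a generic reply, so it is worth logging the raw text whenever the fallback fires.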

Step 3: Route based on extracted intent and entities

With structured intent data coming back, we can route to the right handler instead of generating a generic reply every time. This keeps responses accurate and avoids unnecessary LLM calls for simple routing decisions.

def route(intent_data: dict, original_message: str) -> str:
    intent = intent_data.get("intent")
    action = intent_data.get("next_action")
    # The model may return null for "entities", so fall back to an empty dict.
    entities = intent_data.get("entities") or {}

    if action == "escalate_human":
        return "I'm connecting you with a specialist now. Please hold."

    if intent == "billing_issue" and entities.get("order_id"):
        return f"I see you're asking about order {entities['order_id']}. Let me pull up your billing details."

    if intent == "technical_support":
        return "I can help troubleshoot that. What operating system are you running?"

    if intent == "account_access":
        return "I can help you regain access. Can you confirm the email on your account?"

    return "Thanks for reaching out. Can you tell me more so I can point you in the right direction?"
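The schema includes a `confidence` field that `route` never consults. One option is to downgrade low-confidence results to a clarification request before routing. The sketch below assumes a threshold of 0.6, which is an arbitrary illustrative value; `apply_confidence_floor` is a hypothetical helper.

```python
def apply_confidence_floor(intent_data: dict, threshold: float = 0.6) -> dict:
    """Return a copy of intent_data that asks for clarification when confidence is low.

    Escalations are left alone so a frustrated user still reaches a human.
    """
    result = dict(intent_data)  # Shallow copy so the caller's dict is not mutated.
    confidence = result.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        confidence = 0.0  # Treat a missing or malformed score as no confidence.
    if confidence < threshold and result.get("next_action") != "escalate_human":
        result["next_action"] = "ask_clarification"
    return result
```

Note that `route` as written only branches on `escalate_human`, so it would need an `ask_clarification` branch for this adjustment to change the reply.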

Step 4: Add session memory for multi-turn context

Real conversations are not single shots. We feed the last three turns (each user message and its classified intent) back into the classifier so it does not flip intent mid-thread when the user replies with a short follow-up like "Yes, that one."

class ChatSession:
    def __init__(self):
        self.history = []

    def step(self, user_message: str):
        # Summarize recent turns for context
        context = ""
        for h in self.history[-3:]:
            context += f"Previous message: {h['user']}\nPrevious intent: {h['intent']['intent']}\n"

        enriched = context + f"Current message: {user_message}"
        intent_data = understand(enriched)
        reply = route(intent_data, user_message)

        self.history.append({
            "user": user_message,
            "intent": intent_data,
            "reply": reply
        })
        return reply, intent_data
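`self.history` grows for the life of the session even though only the last three turns are ever read. A bounded alternative can be sketched with `collections.deque`. `build_context` and `MAX_TURNS` are hypothetical names, and the sample turns are made up for the demonstration.

```python
from collections import deque

MAX_TURNS = 3  # Assumed window size, matching the three turns used above.


def build_context(history, user_message: str) -> str:
    """Format prior turns plus the current message into one classifier input."""
    lines = []
    for turn in history:
        # Use .get so a malformed earlier result does not break later turns.
        previous_intent = turn["intent"].get("intent", "unknown")
        lines.append(f"Previous message: {turn['user']}")
        lines.append(f"Previous intent: {previous_intent}")
    lines.append(f"Current message: {user_message}")
    return "\n".join(lines)


# deque(maxlen=N) silently discards the oldest entry once N items are stored.
history = deque(maxlen=MAX_TURNS)
for i in range(5):
    history.append({"user": f"message {i}", "intent": {"intent": "billing_issue"}})

print(build_context(history, "Yes, that one."))
```

After five appends only messages 2 through 4 remain, so the context string stays the same size no matter how long the session runs.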

Run it

Here is how to spin up a session and push a few messages through. The printed JSON lets us verify that the understanding layer is actually extracting structure before the user sees a reply.

if __name__ == "__main__":
    session = ChatSession()

    tests = [
        "I was charged twice for order #48291 and I need a refund now",
        "Actually I also cannot log into my account",
        "Yes, I tried resetting it already",
    ]

    for msg in tests:
        reply, data = session.step(msg)
        print(f"User: {msg}")
        print(f"Understanding: {json.dumps(data, indent=2)}")
        print(f"Bot: {reply}")
        print("-" * 40)

Example output from the first message:

User: I was charged twice for order #48291 and I need a refund now
Understanding: {
  "intent": "billing_issue",
  "confidence": 0.97,
  "entities": {
    "order_id": "48291",
    "email": null,
    "product_name": null,
    "severity": "high"
  },
  "next_action": "escalate_human",
  "reasoning": "User mentioned a duplicate charge with a specific order ID and expressed urgency."
}
Bot: I'm connecting you with a specialist now. Please hold.
----------------------------------------

Wrap-up

This understanding layer turns an LLM into a structured router. A solid next step is to wire the escalate_human path into a real ticketing system like Zendesk or Jira. Another is to cache frequent intent patterns in Redis so you skip the LLM call entirely on exact repeats, cutting costs further while keeping Oxlo.ai as the fallback for novel queries.
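The caching idea can be sketched with an in-memory dictionary standing in for Redis. `CachedUnderstander` and `normalize` are hypothetical names, and the wrapped function is assumed to be any callable with the same shape as `understand`.

```python
import hashlib
from typing import Callable


def normalize(message: str) -> str:
    """Lowercase and collapse whitespace so trivially different repeats share a key."""
    return " ".join(message.lower().split())


class CachedUnderstander:
    """Wrap an understanding function with an exact-match cache (a dict stands in for Redis)."""

    def __init__(self, understand_fn: Callable[[str], dict]):
        self._understand = understand_fn
        self._cache: dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, message: str) -> dict:
        # Hash the normalized text so keys have a fixed length regardless of message size.
        key = hashlib.sha256(normalize(message).encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._understand(message)
        self._cache[key] = result
        return result
```

This kind of cache suits standalone first messages best. Once session context is prepended, as in Step 4, the enriched input rarely repeats exactly, so the hit rate would likely drop.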

For pricing details on running this at scale, see https://oxlo.ai/pricing.