We are going to build a support ticket triage agent that classifies incoming messages, extracts key entities, and drafts a response in a single inference pass. A traditional machine learning stack would need a dedicated classifier, a named entity recognition model, and a template engine stitched together with feature pipelines and versioning. We will replace that stack with one LLM call through Oxlo.ai.
What you'll need
- Python 3.10 or newer
- The OpenAI SDK: pip install openai
- An Oxlo.ai API key from https://portal.oxlo.ai
Step 1: Set up the Oxlo.ai client
Initialize the OpenAI SDK against Oxlo.ai and verify connectivity. I use Llama 3.3 70B as the workhorse model because it follows complex instructions reliably and runs with no cold starts on Oxlo.ai.
import os
from openai import OpenAI

# Check the key before building the client so a missing key gives a clear error
api_key = os.environ.get("OXLO_API_KEY")
if not api_key:
    raise ValueError("Set the OXLO_API_KEY environment variable.")

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=api_key,
)

# Verify connectivity with a lightweight call
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Reply OK"},
        {"role": "user", "content": "ping"},
    ],
)
print(response.choices[0].message.content)
Step 2: Encode the triage policy in a system prompt
This prompt is the only training we need. It defines the taxonomy, priority rules, and JSON schema that the LLM will emit. No feature engineering or labeled dataset is required.
SYSTEM_PROMPT = """
You are a support triage agent. Analyze the user ticket and output a single JSON object with these exact keys:
- category: one of Billing, Technical, Account, or Other
- priority: one of Low, Medium, High, Critical
- product_area: a short string like Auth, API, Dashboard, or Unknown
- entities: a list of objects, each with "name" and "type" (for example {"name": "Pro Plan", "type": "product"})
- draft_response: a brief, helpful first response written in the same language as the ticket
Rules:
- Critical priority for outages, security reports, or payment failures.
- Output only valid JSON. No markdown fences, no commentary.
"""
Step 3: Build the triage function
This is the core replacement for a traditional ML pipeline. Where we would otherwise vectorize text, run a scikit-learn classifier for category, run spaCy for entities, and render a Jinja template for the response, one Oxlo.ai request returns structured data.
import json

def triage_ticket(ticket_text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket_text},
        ],
    )
    raw = response.choices[0].message.content.strip()
    # Strip accidental markdown fences
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return json.loads(raw.strip())
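The fence check above only helps when the reply starts with a fence. If the model adds a sentence before or after the JSON, parsing still fails. As a hedged alternative, the sketch below scans for the first decodable JSON object using the standard library's `json.JSONDecoder.raw_decode`. The helper name `extract_json_object` is hypothetical and not part of the original script.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Parse the first JSON object found in raw, ignoring any text around it."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`
            # and tolerates trailing text after it.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open a valid object; try the next one.
            start = raw.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime to keep this example readable
    reply = f"Sure, here it is:\n{fence}json\n{{\"category\": \"Billing\"}}\n{fence}"
    print(extract_json_object(reply))
```

Inside `triage_ticket`, this could stand in for the fence-stripping lines and the final `json.loads` call, at the cost of silently discarding whatever text surrounds the object.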
Step 4: Process a batch of tickets
In production, tickets arrive continuously. Because Oxlo.ai uses request-based pricing, long tickets with stack traces do not inflate cost the way they would on token-based providers, so we can pass the full raw text without aggressive truncation.
if __name__ == "__main__":
    tickets = [
        "I was charged twice for the Pro Plan this month. My account ID is 88231. Please fix this immediately.",
        "The /v1/batch endpoint returns a 500 error every time I send more than 1000 items. This is blocking our nightly sync.",
        "How do I reset my password? I tried the link but it expired.",
    ]
    for t in tickets:
        result = triage_ticket(t)
        print(json.dumps(result, indent=2))
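The loop above handles one ticket at a time and stops at the first exception. Since each call mostly waits on the network, a thread pool may improve throughput. The sketch below is one way to do that under stated assumptions: `triage_batch` and its `max_workers` default are illustrative, and the right concurrency level depends on rate limits that this article does not document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def triage_batch(
    tickets: list[str],
    triage_fn: Callable[[str], dict],
    max_workers: int = 4,  # assumed default; tune to your rate limits
) -> list[dict]:
    """Run triage_fn over tickets concurrently, preserving input order.

    A failure on one ticket is recorded as {"error": ...} instead of
    aborting the whole batch.
    """

    def safe_call(text: str) -> dict:
        try:
            return triage_fn(text)
        except Exception as exc:  # e.g. network errors or invalid JSON
            return {"error": f"{type(exc).__name__}: {exc}"}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs
        return list(pool.map(safe_call, tickets))
```

With the script above, the call would look like `results = triage_batch(tickets, triage_ticket)`. Passing the function as an argument also makes it easy to substitute a stub in tests.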
Step 5: Escalate ambiguous cases with a reasoning model
Traditional classifiers fail silently on edge cases. When a ticket contains conflicting signals, we can route it to a reasoning model like Qwen 3 32B on Oxlo.ai to get explicit chain-of-thought analysis before finalizing structured output.
def triage_with_reasoning(ticket_text: str) -> dict:
    # Append the Step 2 policy so the rules this prompt refers to reach the model
    reasoning_prompt = """
You are a senior support engineer. Think step by step about this ticket.
First, analyze the user's intent and urgency signals.
Then output a JSON object with:
- reasoning: your step-by-step analysis as a string
- category, priority, product_area, entities, draft_response (same rules as the triage policy below)
Output only valid JSON.

Triage policy:
""" + SYSTEM_PROMPT
    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": reasoning_prompt},
            {"role": "user", "content": ticket_text},
        ],
    )
    raw = response.choices[0].message.content.strip()
    # Strip accidental markdown fences
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return json.loads(raw.strip())
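The step above does not say how to decide that a ticket is ambiguous. One possible heuristic, sketched below, escalates when the first pass lands in a catch-all bucket or when urgent wording conflicts with a calm priority rating. The keyword list, `needs_escalation`, and `triage_with_fallback` are all hypothetical and would need tuning against real tickets.

```python
from typing import Callable

# Hypothetical urgency keywords; substring matching is crude, so tune this list.
URGENCY_HINTS = ("outage", "breach", "blocking", "immediately", "urgent")


def needs_escalation(ticket_text: str, first_pass: dict) -> bool:
    """Guess whether a first-pass triage result looks ambiguous."""
    # Catch-all labels suggest the fast model could not place the ticket.
    if first_pass.get("category") == "Other":
        return True
    if first_pass.get("product_area") == "Unknown":
        return True
    text = ticket_text.lower()
    sounds_urgent = any(hint in text for hint in URGENCY_HINTS)
    rated_calm = first_pass.get("priority") in ("Low", "Medium")
    # Conflicting signals: urgent wording but a calm priority rating.
    return sounds_urgent and rated_calm


def triage_with_fallback(
    ticket_text: str,
    fast_fn: Callable[[str], dict],
    slow_fn: Callable[[str], dict],
) -> dict:
    """Use fast_fn first and call slow_fn only when the result looks ambiguous."""
    first = fast_fn(ticket_text)
    if needs_escalation(ticket_text, first):
        return slow_fn(ticket_text)
    return first
```

In this article's script, the wiring would be `triage_with_fallback(t, triage_ticket, triage_with_reasoning)`, so only the tickets that trip the heuristic pay for a second request.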
Run it
Execute the batch script. Below is representative output from Oxlo.ai: the OK from the connectivity check, followed by the structured JSON for the three tickets.
$ python triage.py
OK
{
  "category": "Billing",
  "priority": "High",
  "product_area": "Billing",
  "entities": [
    {"name": "Pro Plan", "type": "product"},
    {"name": "88231", "type": "account_id"}
  ],
  "draft_response": "I have located the duplicate charge on account 88231 and initiated a refund. You should see the credit within 3 to 5 business days."
}
{
  "category": "Technical",
  "priority": "Critical",
  "product_area": "API",
  "entities": [
    {"name": "/v1/batch", "type": "endpoint"}
  ],
  "draft_response": "Thank you for reporting this. Our engineering team is investigating the 500 error on the batch endpoint. As a workaround, please cap requests at 900 items while we deploy a fix."
}
{
  "category": "Account",
  "priority": "Low",
  "product_area": "Auth",
  "entities": [
    {"name": "password reset link", "type": "feature"}
  ],
  "draft_response": "You can request a new password reset link from the sign-in page. The link expires after 24 hours for security. Let me know if you need further help."
}
Wrap-up and next steps
Next, wire this into a FastAPI endpoint that receives webhooks from your helpdesk, or add a Pydantic model to validate the JSON schema before writing to Postgres. If you are processing thousands of tickets daily, Oxlo.ai's flat per-request pricing removes the cost uncertainty that comes with long stack traces and conversation history. See https://oxlo.ai/pricing for plan details.
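Before adopting Pydantic, a dependency-free check can already catch most schema drift. The sketch below is a stand-in for the Pydantic model mentioned above, not a Pydantic example. It validates a result against the keys and allowed values from the Step 2 prompt using only the standard library. The name `validate_triage` and the list-of-problems return style are illustrative choices.

```python
ALLOWED_CATEGORIES = {"Billing", "Technical", "Account", "Other"}
ALLOWED_PRIORITIES = {"Low", "Medium", "High", "Critical"}
REQUIRED_KEYS = {"category", "priority", "product_area", "entities", "draft_response"}


def validate_triage(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result fits the schema."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if result.get("category") not in ALLOWED_CATEGORIES:
        problems.append(f"unexpected category: {result.get('category')!r}")
    if result.get("priority") not in ALLOWED_PRIORITIES:
        problems.append(f"unexpected priority: {result.get('priority')!r}")
    if not isinstance(result.get("product_area"), str):
        problems.append("product_area is not a string")
    entities = result.get("entities")
    if not isinstance(entities, list):
        problems.append("entities is not a list")
    else:
        for i, ent in enumerate(entities):
            # Each entity needs both keys named in the prompt.
            if not (isinstance(ent, dict) and {"name", "type"} <= ent.keys()):
                problems.append(f"entity {i} lacks name/type")
    if not isinstance(result.get("draft_response"), str):
        problems.append("draft_response is not a string")
    return problems
```

A caller could skip the database write, or retry the request, whenever `validate_triage(result)` returns a non-empty list.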