Using LLMs for Text Classification Tasks

#learnai #oxlo #ai

We are going to build a support ticket classifier that reads unstructured customer messages and returns a structured category, urgency level, one-line summary, and confidence score. The classification logic lives in a prompt. That lets it replace fragile keyword rules, and changing the schema means editing the prompt instead of training and deploying a separate model. I run it on Oxlo.ai because it charges one flat price per request rather than per token. A five-word subject line and a five-paragraph error log therefore cost exactly the same (https://oxlo.ai/pricing), which keeps batch processing costs predictable.
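
To see why flat pricing matters for a classifier, compare the two billing models with some back-of-envelope arithmetic. Every price below is a placeholder invented for illustration, not Oxlo.ai's or any other provider's actual rate. The token counts are rough guesses for a short and a long ticket, including the system prompt.

```python
def per_token_cost(tokens_in: int, tokens_out: int,
                   usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost of one call under per-token billing (prices quoted per million tokens)."""
    return tokens_in / 1_000_000 * usd_per_m_in + tokens_out / 1_000_000 * usd_per_m_out


def per_request_cost(num_requests: int, usd_per_request: float) -> float:
    """Cost of a batch under flat per-request billing."""
    return num_requests * usd_per_request


# Hypothetical figures: a one-line ticket vs. a ticket with a pasted error log.
# Both get a short JSON answer of roughly 60 tokens.
short_ticket = per_token_cost(300, 60, usd_per_m_in=0.50, usd_per_m_out=1.50)
long_ticket = per_token_cost(3_000, 60, usd_per_m_in=0.50, usd_per_m_out=1.50)
print(f"per-token:   short={short_ticket:.5f} USD, long={long_ticket:.5f} USD")
print(f"per-request: any ticket={per_request_cost(1, 0.001):.5f} USD")
```

Under per-token billing the long ticket costs several times more than the short one. Under per-request billing both are a single request. Which model is cheaper overall depends on the real prices and on how long your tickets actually are.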

What you'll need

Python 3.10 or newer
An Oxlo.ai API key from https://portal.oxlo.ai
The OpenAI SDK: pip install openai

Step 1: Configure the Oxlo.ai client

I keep the API key in an environment variable and point the OpenAI SDK at Oxlo.ai's base URL. A quick smoke test confirms three things before any classification logic is built on top: the endpoint is reachable, the key is accepted, and the model name resolves.

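import os

from openai import OpenAI

# Oxlo.ai exposes an OpenAI-compatible endpoint, so the stock SDK works with a custom base_url.
# os.environ[...] fails immediately with a KeyError if the key was never exported.
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"],
)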

# Smoke test
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "hello"}]
)
print(resp.choices[0].message.content)
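
You may want a clearer error than a bare KeyError or the SDK's generic missing-key message. A tiny helper can name the variable you forgot. require_env is a hypothetical helper, not part of the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable, or stop with a message naming it."""
    # .strip() also catches a stray newline pasted along with the key
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; run `export {name}=...` first.")
    return value
```

You would then pass api_key=require_env("OXLO_API_KEY") when constructing the client.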

Step 2: Design the system prompt

The system prompt is the entire classifier. It defines the categories, the urgency scale, and the exact JSON keys the application will parse, so I treat it as a config file I version control.

SYSTEM_PROMPT = """You are a support ticket classifier.
Analyze the user's message and return a single JSON object with exactly these keys:
  category: one of ["Technical", "Billing", "Feature Request"]
  urgency: one of ["Low", "Medium", "High", "Critical"]
  summary: a concise summary of the issue in 12 words or fewer
  confidence: a float between 0.0 and 1.0 representing classification certainty

Classification rules:
- Technical covers bugs, crashes, errors, and integration problems.
- Billing covers invoices, refunds, failed charges, and plan changes.
- Feature Request covers asks for new functionality or improvements.
Return only raw JSON. Do not wrap the output in markdown fences."""
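
Since the prompt is effectively config, one option is to generate it from a small taxonomy dictionary. That way the label list in the prompt and the label list your code validates against come from the same place. TAXONOMY, URGENCY_LEVELS, and build_system_prompt are illustrative names, not part of any library. With the values below, the rendered string is identical to the hand-written prompt above.

```python
# Hypothetical single source of truth for the label set:
# edit this dict, not the prompt string, when the schema changes.
TAXONOMY = {
    "Technical": "bugs, crashes, errors, and integration problems",
    "Billing": "invoices, refunds, failed charges, and plan changes",
    "Feature Request": "asks for new functionality or improvements",
}
URGENCY_LEVELS = ["Low", "Medium", "High", "Critical"]


def build_system_prompt(taxonomy: dict[str, str], urgency_levels: list[str]) -> str:
    """Render the classifier prompt from the taxonomy so prompt and code never drift."""
    categories = ", ".join(f'"{name}"' for name in taxonomy)
    urgencies = ", ".join(f'"{level}"' for level in urgency_levels)
    rules = "\n".join(f"- {name} covers {desc}." for name, desc in taxonomy.items())
    return (
        "You are a support ticket classifier.\n"
        "Analyze the user's message and return a single JSON object with exactly these keys:\n"
        f"  category: one of [{categories}]\n"
        f"  urgency: one of [{urgencies}]\n"
        "  summary: a concise summary of the issue in 12 words or fewer\n"
        "  confidence: a float between 0.0 and 1.0 representing classification certainty\n"
        "\n"
        "Classification rules:\n"
        f"{rules}\n"
        "Return only raw JSON. Do not wrap the output in markdown fences."
    )


SYSTEM_PROMPT = build_system_prompt(TAXONOMY, URGENCY_LEVELS)
```

Adding a category then becomes a one-line dict change that updates both the prompt and any validation built from TAXONOMY.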

Step 3: Build the classifier function

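I use JSON mode (response_format set to json_object) so the model is constrained to syntactically valid JSON, then parse it with the standard library. JSON mode does not enforce the specific keys or allowed values, which is why the system prompt spells them out. A low temperature reduces run-to-run variation, although it does not make the output strictly deterministic.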

import json

def classify_ticket(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
        temperature=0.1,
    )
    raw = response.choices[0].message.content
    return json.loads(raw)

# Single test
ticket = "I was charged twice on March 1st and I need a refund."
result = classify_ticket(ticket)
print(json.dumps(result, indent=2))
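
JSON mode only covers syntax. It does not stop the model from inventing a category, omitting a key, or returning confidence as a string. Below is a minimal validator, assuming the four keys defined in the system prompt. parse_classification and the ALLOWED_* tuples are illustrative names, not part of the OpenAI SDK.

```python
import json

ALLOWED_CATEGORIES = ("Technical", "Billing", "Feature Request")
ALLOWED_URGENCY = ("Low", "Medium", "High", "Critical")


def parse_classification(raw: str) -> dict:
    """Parse a model reply and check it against the prompt's schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    text = raw.strip()
    # Tolerate a ```json fence in case a model ignores the "raw JSON" instruction.
    if text.startswith("```"):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("category") not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data.get('category')!r}")
    if data.get("urgency") not in ALLOWED_URGENCY:
        raise ValueError(f"unknown urgency: {data.get('urgency')!r}")
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    confidence = data.get("confidence")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return data
```

In classify_ticket you could return parse_classification(raw) instead of json.loads(raw). A schema violation then raises an exception rather than silently flowing into the batch report.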

Step 4: Process a batch

In production you will classify more than one ticket, so I loop over a list and print a simple report. Because Oxlo.ai bills per request, the cost of this batch is the number of tickets multiplied by the per-request price, regardless of how verbose each ticket is.

tickets = [
    "The API returns a 500 error when I upload files larger than 1MB.",
    "Can you add webhook signatures so I can verify events?",
    "My invoice shows the wrong company name and tax ID.",
    "Dashboard is completely blank after I log in. I cleared cache already.",
]

for t in tickets:
    out = classify_ticket(t)
    print(f"{out['urgency']:8} | {out['category']:15} | {out['summary']}")
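
The loop above sends requests one after another. Since the bill is per request either way, running a few calls concurrently changes only wall-clock time, not cost. Below is a minimal thread-pool sketch. classify_batch is a hypothetical helper, and max_workers=4 is an arbitrary default; keep it within whatever rate limits apply to your Oxlo.ai account.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def classify_batch(
    texts: list[str],
    classify_fn: Callable[[str], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Run classify_fn over texts in a thread pool; results keep the input order."""
    # Each call mostly waits on the network, so threads overlap that waiting.
    # Executor.map re-raises a call's exception when its result is reached,
    # so pass a function that never raises (such as classify_safe from Step 5).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_fn, texts))
```

For example, classify_batch(tickets, classify_safe) returns one dict per ticket, in the same order as tickets, so it can be zipped back with the inputs for the report.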

Step 5: Add confidence guardrails

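LLMs can hallucinate categories on ambiguous input. I wrap the core call with three guards: a confidence threshold, a check that category and urgency are values from the schema, and a catch-all exception handler. Low-confidence, off-schema, or malformed outputs then land in a human review queue instead of the wrong automation path. Keep in mind that confidence is the model's own estimate rather than a calibrated probability, so treat 0.85 as a starting point to tune.

ALLOWED_CATEGORIES = {"Technical", "Billing", "Feature Request"}
ALLOWED_URGENCY = {"Low", "Medium", "High", "Critical"}

def classify_safe(text: str, min_confidence: float = 0.85) -> dict:
    try:
        out = classify_ticket(text)
        # float() rejects non-numeric confidence values (handled below)
        out["confidence"] = float(out.get("confidence", 0.0))
        out.setdefault("summary", "")
        # Low confidence or off-schema labels go to a human instead of automation
        if (
            out["confidence"] < min_confidence
            or out.get("category") not in ALLOWED_CATEGORIES
            or out.get("urgency") not in ALLOWED_URGENCY
        ):
            out["category"] = "Needs Review"
            out["urgency"] = "Medium"
        return out
    except Exception as exc:
        # API errors, invalid JSON, or a non-numeric confidence end up here
        return {
            "category": "Error",
            "urgency": "Medium",
            "summary": str(exc),
            "confidence": 0.0,
        }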

# Re-run with guardrails
for t in tickets:
    out = classify_safe(t)
    print(f"{out['category']:15} | conf={out['confidence']} | {out['summary']}")
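
One way to choose min_confidence empirically is to measure the trade-off on tickets you have labeled by hand. This sketch assumes you ran classify_ticket over such a sample and recorded whether each predicted category was correct. Use classify_ticket rather than classify_safe here, because classify_safe rewrites low-confidence labels. threshold_report and the sample data are hypothetical. A higher threshold usually routes fewer tickets automatically (coverage) in exchange for fewer mistakes on those it does route (accuracy).

```python
def threshold_report(
    scored: list[tuple[float, bool]],
    thresholds: list[float],
) -> list[tuple[float, float, float]]:
    """Return (threshold, coverage, accuracy) for each candidate threshold.

    scored holds (model confidence, predicted category was correct) pairs.
    Coverage is the share of tickets auto-routed (confidence >= threshold,
    mirroring classify_safe); accuracy is measured on those tickets only.
    """
    report = []
    for threshold in thresholds:
        routed = [correct for confidence, correct in scored if confidence >= threshold]
        coverage = len(routed) / len(scored) if scored else 0.0
        accuracy = sum(routed) / len(routed) if routed else 0.0
        report.append((threshold, coverage, accuracy))
    return report


# Hypothetical labeled sample: (confidence, was the category right?)
sample = [(0.95, True), (0.90, True), (0.80, False), (0.70, True)]
for threshold, coverage, accuracy in threshold_report(sample, [0.85, 0.75, 0.0]):
    print(f"threshold={threshold:.2f} coverage={coverage:.0%} accuracy={accuracy:.0%}")
```

With a few hundred real tickets instead of four, the report shows how much automation you give up for each step up in accuracy.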

Run it

Save the script as classifier.py, export your key, and execute it.

$ export OXLO_API_KEY="sk-oxlo.ai-..."
$ python classifier.py

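Example output from the batch report (illustrative only; the model's summaries and urgency levels can vary between runs):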

High     | Billing         | Double charge refund request
High     | Technical       | API 500 error on large uploads
Low      | Feature Request | Add webhook signatures
Medium   | Technical       | Dashboard blank after login

Wrap-up and next steps

This classifier can replace brittle regex rules or per-token-billed classification pipelines. On Oxlo.ai, you can swap llama-3.3-70b for deepseek-r1-671b or kimi-k2.6 when ambiguous edge cases need deeper reasoning, and pricing stays flat per request. Check that the model you switch to honors response_format, since reasoning models may emit extra text around the JSON. Two concrete next steps: wire the classify_safe output into a Slack webhook for human review, or extend the schema with a custom taxonomy and feed the results into your CRM.
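
For the Slack step, Slack's incoming webhooks accept an HTTP POST whose JSON body has a text field. Below is a stdlib-only sketch. build_review_payload and post_to_slack are illustrative names, and the webhook URL is whatever Slack generates when you add an incoming webhook to a channel.

```python
import json
import urllib.request


def build_review_payload(ticket: str, result: dict) -> dict:
    """Format a ticket that needs a human as a Slack incoming-webhook message."""
    confidence = float(result.get("confidence", 0.0))
    text = (
        f"*Ticket needs review* ({result.get('category', 'Unknown')}, "
        f"confidence {confidence:.2f})\n"
        f"> {ticket}\n"
        f"Model summary: {result.get('summary', '')}"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> None:
    """POST the payload as JSON; urllib raises HTTPError on a 4xx/5xx response."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # the response body is not needed

# Hypothetical wiring in the Step 5 loop (SLACK_WEBHOOK_URL is an assumed env var):
# if out["category"] in ("Needs Review", "Error"):
#     post_to_slack(os.environ["SLACK_WEBHOOK_URL"], build_review_payload(t, out))
```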
