Building an IT Service Management Tool with LLMs: A Step-by-Step Guide

#engineering #oxlo #ai

We are building a lightweight IT service management triage agent that turns raw help-desk tickets into structured incident reports. It categorizes issues, assigns priority, and suggests runbook steps so IT teams can cut initial response time. The entire tool runs on Oxlo.ai's request-based API, which keeps costs flat even when ticket descriptions get long.

What you'll need

Python 3.10 or newer
The OpenAI SDK: pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai

Step 1: Configure the Oxlo.ai client

First, I set up the OpenAI-compatible client pointing at Oxlo.ai and verify connectivity with a small ping. I am using llama-3.3-70b here because it handles short instructions reliably and responds without a cold-start delay.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY")
)

# Verify the endpoint is alive
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=10
)
print("Oxlo.ai client ready:", resp.choices[0].message.content)
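
One thing the snippet above leaves open is what happens when OXLO_API_KEY is not exported: os.environ.get then returns None, and the problem only shows up later as a less obvious error. A minimal sketch of a fail-fast check follows. The helper name require_api_key is my own assumption for illustration and is not part of the OpenAI SDK or the Oxlo.ai API.

```python
import os


def require_api_key(env=None, name="OXLO_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

With this in place, the client could be built with api_key=require_api_key() instead of reading the variable directly.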

Step 2: Define the incident schema and system prompt

Next, I lock down the output format. The model must return strict JSON with a category, priority, summary, and runbook steps. Keeping the schema rigid reduces downstream parsing errors, although a prompt alone cannot guarantee that every reply conforms.

import json

SYSTEM_PROMPT = """You are an IT service management triage agent. Analyze the incident report and return valid JSON with exactly these keys:
- category: one of [Hardware, Software, Network, Security, Access]
- priority: one of [P1-Critical, P2-High, P3-Medium, P4-Low]
- summary: a one-sentence summary of the issue
- runbook_steps: a list of 2-4 immediate remediation steps

Respond ONLY with raw JSON. Do not wrap the output in markdown code fences."""

print("Prompt loaded.")
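
Since the prompt only asks for this schema, it can help to check each parsed reply against it before using the result. The sketch below is one possible validator, assuming the allowed values are exactly the ones listed in SYSTEM_PROMPT. The names validate_triage, ALLOWED_CATEGORIES, ALLOWED_PRIORITIES, and REQUIRED_KEYS are my own and do not come from any library.

```python
# Allowed values copied from SYSTEM_PROMPT; all names here are illustrative.
ALLOWED_CATEGORIES = ("Hardware", "Software", "Network", "Security", "Access")
ALLOWED_PRIORITIES = ("P1-Critical", "P2-High", "P3-Medium", "P4-Low")
REQUIRED_KEYS = {"category", "priority", "summary", "runbook_steps"}


def validate_triage(result):
    """Return a list of schema problems; an empty list means the result is valid."""
    if not isinstance(result, dict):
        return ["result is not a JSON object"]

    missing = REQUIRED_KEYS - set(result)
    if missing:
        # Stop early: the remaining checks need every key to be present.
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    extra = set(result) - REQUIRED_KEYS
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if result["category"] not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {result['category']!r}")
    if result["priority"] not in ALLOWED_PRIORITIES:
        problems.append(f"unknown priority: {result['priority']!r}")

    summary = result["summary"]
    if not isinstance(summary, str) or not summary.strip():
        problems.append("summary must be a non-empty string")

    steps = result["runbook_steps"]
    steps_ok = (
        isinstance(steps, list)
        and 2 <= len(steps) <= 4
        and all(isinstance(step, str) for step in steps)
    )
    if not steps_ok:
        problems.append("runbook_steps must be a list of 2-4 strings")
    return problems
```

A caller could retry the request or route the ticket to a human whenever the returned list is not empty.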

Step 3: Build the core triage function

Now I wire the prompt into a reusable function. I use qwen-3-32b because its reasoning capabilities handle noisy ticket text well. I also strip accidental markdown fences so a fenced reply does not break json.loads. A reply that is not valid JSON will still raise json.JSONDecodeError.

def triage_incident(description):
    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Incident report:\n{description}"},
        ],
        temperature=0.2,
        max_tokens=512,
    )
    
    raw = response.choices[0].message.content.strip()
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    
    return json.loads(raw)

# Smoke test
ticket = "VPN client disconnects every 5 minutes on macOS 14. No error messages."
result = triage_incident(ticket)
print(json.dumps(result, indent=2))
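
The fence handling inside triage_incident can also be pulled out into a small helper, which makes it easy to test without calling the API. This is a minimal sketch. The name extract_json is my own, and it assumes the reply is either bare JSON or JSON wrapped in a single markdown fence.

```python
import json


def extract_json(raw):
    """Parse a model reply as JSON, tolerating one optional markdown code fence.

    Raises json.JSONDecodeError (a ValueError subclass) if the remaining
    text is not valid JSON.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line, which may carry a tag such as ```json.
        _, _, text = text.partition("\n")
        # Drop the closing fence if one is present.
        text = text.rsplit("```", 1)[0]
    return json.loads(text.strip())
```

Because partition never raises, a reply that consists of a fence with no newline ends up as a JSON decode error rather than an IndexError.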

Step 4: Add lightweight incident history

Recurring incidents waste time. I keep a short in-memory log of previous tickets and feed the last three into the prompt as context. For this step I switch to kimi-k2.6, which excels at reasoning over longer context windows. Because Oxlo.ai charges per request rather than per token, those extra history lines do not increase the cost.

INCIDENT_HISTORY = []

def triage_with_history(description):
    history_block = ""
    if INCIDENT_HISTORY:
        history_block = "Previously today:\n"
        for i, inc in enumerate(INCIDENT_HISTORY[-3:], 1):
            history_block += f"{i}. {inc['summary']} -> {inc['status']}\n"
    
    user_msg = f"{history_block}\nNew incident report:\n{description}"
    
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + " If the new incident resembles a previous one, note the similarity and suggest the same fix first."},
            {"role": "user", "content": user_msg},
        ],
        temperature=0.2,
        max_tokens=512,
    )
    
    raw = response.choices[0].message.content.strip()
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    
    parsed = json.loads(raw)
    INCIDENT_HISTORY.append({
        "summary": parsed.get("summary"),
        "status": "open"
    })
    return parsed
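
To make the history text that reaches the model easier to inspect, the formatting loop can be written as a standalone function. This is a sketch under the same assumptions as triage_with_history, where each entry is a dict with summary and status keys. The name build_history_block and its limit parameter are hypothetical.

```python
def build_history_block(history, limit=3):
    """Format the most recent incidents as numbered lines for the prompt."""
    if limit <= 0 or not history:
        return ""
    recent = history[-limit:]  # oldest of the recent incidents comes first
    lines = [
        f"{i}. {inc['summary']} -> {inc['status']}"
        for i, inc in enumerate(recent, 1)
    ]
    return "Previously today:\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"summary": "VPN drops on macOS 14", "status": "open"},
        {"summary": "Printer offline on 4th floor", "status": "open"},
    ]
    print(build_history_block(sample))
```

Printing the block for a couple of sample entries shows exactly what is prepended to the new incident report.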

Step 5: Wrap it in a CLI

Finally, I add argument parsing so the team can pass a ticket copied from email or Slack to the agent as a command-line argument. This turns the script into a real utility.

import argparse

def main():
    parser = argparse.ArgumentParser(description="ITSM Triage Agent on Oxlo.ai")
    parser.add_argument("ticket", help="Raw incident description")
    args = parser.parse_args()
    
    result = triage_with_history(args.ticket)
    
    print("=== INCIDENT TRIAGE ===")
    print(f"Category      : {result['category']}")
    print(f"Priority      : {result['priority']}")
    print(f"Summary       : {result['summary']}")
    print("Runbook steps :")
    for step in result["runbook_steps"]:
        print(f"  - {step}")

if __name__ == "__main__":
    main()
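
The CLI above accepts the ticket only as a positional argument. If tickets should also arrive through a shell pipe, one option is to fall back to standard input when the argument is omitted. The sketch below assumes a hypothetical helper named read_ticket. It uses only argparse features from the standard library.

```python
import argparse
import sys


def read_ticket(argv=None, stdin=None):
    """Return the ticket text from the positional argument, or from stdin.

    Hypothetical helper: stdin is read when the argument is omitted or is "-".
    """
    parser = argparse.ArgumentParser(description="ITSM Triage Agent on Oxlo.ai")
    parser.add_argument(
        "ticket",
        nargs="?",      # make the positional argument optional
        default="-",
        help="Raw incident description, or '-' to read it from stdin",
    )
    args = parser.parse_args(argv)
    stream = sys.stdin if stdin is None else stdin
    text = stream.read() if args.ticket == "-" else args.ticket
    text = text.strip()
    if not text:
        parser.error("empty ticket description")  # prints usage and exits
    return text
```

In main(), the result of read_ticket() could then be passed to triage_with_history in place of args.ticket, which would allow an invocation such as cat ticket.txt | python itsm_agent.py.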

Run it

Save the full script as itsm_agent.py, export your key, and pass a ticket string:

export OXLO_API_KEY="YOUR_OXLO_API_KEY"
python itsm_agent.py "Printer on 4th floor showing offline, red blinking light. Multiple users reported."

Example output (the model's exact wording will vary between runs):

=== INCIDENT TRIAGE ===
Category      : Hardware
Priority      : P3-Medium
Summary       : 4th floor printer offline with red blinking light affecting multiple users
Runbook steps :
  - Check printer power and cable connections
  - Restart printer and verify network connectivity
  - Inspect toner levels and clear any paper jams
  - Escalate to hardware vendor if fault persists

Next steps

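Store INCIDENT_HISTORY in SQLite instead of memory so the history persists across runs. As written, each CLI invocation is a new process and starts with an empty history, so the similarity check only helps once the log is persisted or the function is called repeatedly inside one long-running process. You could also add a Slack webhook that forwards P1 tickets directly to the on-call channel. Oxlo.ai's flat per-request pricing makes it cheap to run this agent continuously, even when tickets arrive with lengthy log attachments. For details on request-based plans, see https://oxlo.ai/pricing.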
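
As a starting point for the SQLite idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name incidents, the file name incident_history.db, and the three helper functions are assumptions made for illustration. Only summary and status are stored, mirroring the entries kept in INCIDENT_HISTORY.

```python
import sqlite3


def open_history(path="incident_history.db"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incidents ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "summary TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'open')"
    )
    return conn


def add_incident(conn, summary, status="open"):
    """Append one incident; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO incidents (summary, status) VALUES (?, ?)",
            (summary, status),
        )


def recent_incidents(conn, limit=3):
    """Return the latest incidents, oldest first, like INCIDENT_HISTORY[-3:]."""
    rows = conn.execute(
        "SELECT summary, status FROM incidents ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [{"summary": s, "status": st} for s, st in reversed(rows)]
```

In triage_with_history, recent_incidents(conn) would take the place of INCIDENT_HISTORY[-3:] and add_incident(conn, parsed.get("summary")) would take the place of the append call. Passing ":memory:" as the path gives a throwaway database for tests.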