We are building a network incident triage agent that ingests raw syslog and alarm text from cell towers and core routers, then returns structured severity scores, probable root causes, and next-step runbooks. It is aimed at NOC engineers who need to cut through alert noise during outages. Because the agent often processes large syslog dumps, Oxlo.ai's request-based pricing keeps costs flat regardless of how many lines you feed into the context window.
What you'll need
Python 3.10 or newer, an Oxlo.ai API key from https://portal.oxlo.ai, and the OpenAI SDK and Pydantic installed with pip install openai pydantic.
Step 1: Design the incident schema
Start with a Pydantic model that defines the structure downstream automation expects. Validating the LLM's JSON reply against it catches malformed output before it reaches your pipeline and removes fragile string parsing.
```python
from typing import List

from pydantic import BaseModel, Field


class NetworkIncident(BaseModel):
    severity: str = Field(description="One of: CRITICAL, HIGH, MEDIUM, LOW")
    # Same category set that the system prompt allows
    category: str = Field(description="One of: BGP, RF, Backhaul, Power, Core, Transport, Other")
    root_cause: str = Field(description="Short technical summary, max 20 words")
    affected_services: List[str] = Field(description="Impacted cell sectors, VLANs, or service IDs")
    next_steps: List[str] = Field(description="Ordered remediation actions")
```
Step 2: Write the system prompt
The system prompt sets the SRE persona and the output contract. It tells the model to return only JSON and limits it to telecom-specific categories, though the model can still deviate, so the client should not trust the reply blindly.
SYSTEM_PROMPT = """You are a senior telecom SRE with 10 years of experience in RAN and core networks.
Analyze the provided syslog or alarm text and produce a structured incident assessment.
Rules:
- Output ONLY valid JSON. No markdown, no preamble.
- severity must be one of: CRITICAL, HIGH, MEDIUM, LOW.
- category must be one of: BGP, RF, Backhaul, Power, Core, Transport, Other.
- root_cause must be a single sentence.
- affected_services must be a list of strings.
- next_steps must be an ordered list of concrete remediation actions.
If the log is ambiguous, mark severity as MEDIUM and set root_cause to \"Ambiguous - manual triage required\"."""
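The ambiguity rule only helps when the model actually follows it. A minimal sketch of a client-side safety net that returns the same fallback assessment when the reply is not a JSON object at all. The helper names, the `"Other"` fallback category, and the placeholder next step are assumptions; the prompt itself does not specify them.

```python
import json

# Mirrors the system prompt's rule for ambiguous logs
AMBIGUOUS_ROOT_CAUSE = "Ambiguous - manual triage required"


def fallback_assessment() -> dict:
    """Assessment to use when the model reply cannot be trusted."""
    return {
        "severity": "MEDIUM",
        "category": "Other",  # assumption: the prompt defines no fallback category
        "root_cause": AMBIGUOUS_ROOT_CAUSE,
        "affected_services": [],
        "next_steps": ["Hand off to an on-call engineer for manual triage"],  # placeholder action
    }


def parse_or_fallback(raw: str) -> dict:
    """Return the parsed reply, or the ambiguity fallback if it is not a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback_assessment()
    return data if isinstance(data, dict) else fallback_assessment()


print(parse_or_fallback('{"severity": "LOW", "category": "Core"}')["severity"])  # LOW
print(parse_or_fallback("Sorry, I cannot help with that.")["root_cause"])
```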
Step 3: Build the triage client
Wire the prompt to Oxlo.ai using the OpenAI SDK. I use Llama 3.3 70B here because it follows structured instructions reliably and handles technical jargon well.
```python
import json

from openai import OpenAI

# Oxlo.ai is called through the OpenAI SDK, so only base_url and api_key change
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")


def triage_incident(syslog_text: str) -> dict:
    user_message = f"Syslog:\n{syslog_text}"
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.1,  # low temperature reduces sampling randomness
        max_tokens=1024,
    )
    # message.content can be None, so fall back to an empty string
    raw = (response.choices[0].message.content or "").strip()
    # Strip accidental markdown code fences around the JSON
    raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    return json.loads(raw)
```
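The `removeprefix` chain only handles a reply that starts exactly with a lowercase `json` fence or a bare fence. Models occasionally add a one-line preamble or write the tag as `JSON`, and `json.loads` then fails. A minimal sketch of a more tolerant parser, assuming the reply contains exactly one top-level JSON object; `extract_json_object` is a hypothetical helper that could stand in for the fence stripping.

```python
import json

FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence


def extract_json_object(reply: str) -> dict:
    """Parse the span from the first "{" to the last "}" in a chat reply.

    Tolerates code fences (any language tag, any case) and short preambles,
    assuming the reply holds exactly one top-level JSON object.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model reply")
    return json.loads(reply[start : end + 1])


fenced = FENCE + 'JSON\n{"severity": "HIGH", "category": "RF"}\n' + FENCE
chatty = 'Here is the assessment:\n{"severity": "LOW", "category": "Core"}'

print(extract_json_object(fenced)["category"])  # RF
print(extract_json_object(chatty)["severity"])  # LOW
```

If the reply could ever contain trailing text with its own braces, this slicing approach breaks; pairing it with schema validation catches those cases.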
Step 4: Add context from a knowledge base
To reduce hallucination, prepend relevant runbook snippets based on keyword matches. In production you would replace this with a vector store, but a simple lookup is enough to show the pattern.
```python
import re

KB = {
    "BGP": "BGP peer flaps often indicate upstream provider maintenance or a failing optic on port xe-0/0/1.",
    "RF": "High VSWR alarms usually trace to loose antenna connectors or water ingress in the jumper cable.",
    "POWER": "Rectifier faults at remote sites typically follow battery degradation or AC input swings.",
}


def triage_with_context(syslog_text: str) -> dict:
    upper = syslog_text.upper()
    # Match each key as a standalone token so "RF" fires on RF_LINK_FAIL but not on INTERFACE
    hits = [
        f"{k}: {v}"
        for k, v in KB.items()
        if re.search(rf"(?<![A-Z0-9]){re.escape(k)}(?![A-Z0-9])", upper)
    ]
    kb_block = "\n".join(hits) if hits else "No matching runbook entries."
    user_message = f"Relevant runbook context:\n{kb_block}\n\nSyslog:\n{syslog_text}"
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.1,
        max_tokens=1024,
    )
    raw = (response.choices[0].message.content or "").strip()
    raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    return json.loads(raw)
```
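Keying the lookup on category names alone misses alarms that name the symptom instead: the RF runbook entry is about VSWR, yet a log that reports only a VSWR reading without the letters RF would get no context. A minimal sketch that maps each entry to several trigger tokens and matches whole tokens; the trigger sets and the `runbook_context` name are hypothetical and should be tuned to your own alarm vocabulary.

```python
import re

KB = {
    "BGP": "BGP peer flaps often indicate upstream provider maintenance or a failing optic on port xe-0/0/1.",
    "RF": "High VSWR alarms usually trace to loose antenna connectors or water ingress in the jumper cable.",
    "POWER": "Rectifier faults at remote sites typically follow battery degradation or AC input swings.",
}

# Hypothetical trigger tokens per runbook entry
KB_TRIGGERS = {
    "BGP": {"BGP"},
    "RF": {"RF", "VSWR"},
    "POWER": {"POWER", "RECTIFIER", "BATTERY"},
}


def tokenize(text: str) -> set[str]:
    """Split on anything that is not a letter or digit, so RF_LINK_FAIL yields RF, LINK, FAIL."""
    return {tok for tok in re.split(r"[^A-Z0-9]+", text.upper()) if tok}


def runbook_context(syslog_text: str) -> str:
    """Collect every runbook entry whose trigger tokens appear in the log."""
    tokens = tokenize(syslog_text)
    hits = [f"{key}: {KB[key]}" for key, triggers in KB_TRIGGERS.items() if tokens & triggers]
    return "\n".join(hits) if hits else "No matching runbook entries."


print(runbook_context("cell-tower-42 RF_LINK_FAIL VSWR=3.4"))  # RF entry only
print(runbook_context("%LINK-3-UPDOWN: Interface Gi0/1, changed state to down"))  # no match
```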
Step 5: Wrap it in a CLI
Add a small argparse interface so you can pipe raw alarm files directly into the agent from your NOC workstation.
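```python
import argparse
import json
import sys
from pathlib import Path


def main():
    parser = argparse.ArgumentParser(description="Triage a telecom syslog file via Oxlo.ai")
    parser.add_argument("file", help="Path to syslog or alarm text, or - to read from stdin")
    args = parser.parse_args()
    if args.file == "-":
        # Read piped input, e.g. `cat alarm.txt | python triage.py -`
        log_text = sys.stdin.read()
    else:
        # read_text opens and closes the file; errors="replace" tolerates stray non-UTF-8 bytes
        log_text = Path(args.file).read_text(encoding="utf-8", errors="replace")
    result = triage_with_context(log_text)
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
```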
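When the CLI runs from cron or a shell pipeline, an exit status that reflects severity lets the calling script branch without parsing JSON. A minimal sketch; the severity-to-code mapping is an arbitrary convention chosen here, and `severity_exit_code` is a hypothetical helper.

```python
# Hypothetical convention: LOW exits 0; higher severities use codes that stay clear
# of 1 (uncaught Python exceptions) and 2 (argparse usage errors)
EXIT_CODES = {"LOW": 0, "MEDIUM": 10, "HIGH": 20, "CRITICAL": 30}


def severity_exit_code(result: dict) -> int:
    """Map a triage result's severity to a process exit status; unknown values count as MEDIUM."""
    severity = str(result.get("severity", "")).upper()
    return EXIT_CODES.get(severity, EXIT_CODES["MEDIUM"])


print(severity_exit_code({"severity": "CRITICAL"}))  # 30
print(severity_exit_code({"severity": "unknown"}))  # 10
```

Ending `main()` with `sys.exit(severity_exit_code(result))` after the `print` call would let a wrapper script check `$?` to decide whether to escalate.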
Run it
Create a sample alarm and invoke the script. The agent returns structured JSON you can feed directly into a ticketing system.
```console
$ cat alarm.txt
2024-05-21T14:03:11Z cell-tower-42 RF_LINK_FAIL VSWR=3.4 Antenna-3G-SEC-07
$ python triage.py alarm.txt
```
Example output (the exact wording varies from run to run):
```json
{
  "severity": "HIGH",
  "category": "RF",
  "root_cause": "High VSWR on Antenna-3G-SEC-07 indicates a physical connector or cable fault.",
  "affected_services": [
    "3G Sector 07"
  ],
  "next_steps": [
    "Dispatch field team to inspect antenna connector on SEC-07",
    "Check jumper cable for water ingress or kinks",
    "Verify torque specs on all RF connectors",
    "Escalate to RAN engineering if VSWR persists after reseating"
  ]
}
```
Next steps
Wire the JSON output into a PagerDuty or Slack webhook so your NOC receives structured alerts with severity already ranked. If you later need to analyze hours of continuous syslog history in a single shot, switch the model to Kimi K2.6 on Oxlo.ai and take advantage of its 131K context window without worrying about per-token cost escalation.
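For the Slack route, Slack incoming webhooks accept an HTTP POST whose JSON body carries a `text` field. A minimal sketch using only the standard library; the message layout is an arbitrary choice, `YOUR_SLACK_WEBHOOK_URL` is a placeholder for a webhook you create in Slack, and PagerDuty would need its own payload format instead.

```python
import json
import urllib.request


def format_slack_message(result: dict) -> dict:
    """Build an incoming-webhook payload from a triage result."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(result.get("next_steps", []), start=1))
    services = ", ".join(result.get("affected_services", [])) or "none listed"
    text = (
        f"[{result.get('severity', 'UNKNOWN')}] {result.get('category', 'Other')}: "
        f"{result.get('root_cause', '')}\n"
        f"Affected: {services}\n"
        f"Next steps:\n{steps}"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, result: dict) -> int:
    """POST the formatted message and return the HTTP status code."""
    body = json.dumps(format_slack_message(result)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


sample = {
    "severity": "HIGH",
    "category": "RF",
    "root_cause": "High VSWR on Antenna-3G-SEC-07.",
    "affected_services": ["3G Sector 07"],
    "next_steps": ["Inspect antenna connector", "Check jumper cable"],
}
print(format_slack_message(sample)["text"])
# post_to_slack("YOUR_SLACK_WEBHOOK_URL", sample)  # placeholder URL, not called here
```

Leading the message with the severity tag keeps it visible in Slack's notification preview, which is usually all an on-call engineer sees first.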