We are going to build a support ticket agent that reads a customer message and drafts a helpful reply. It is a simple way to see how a large language model behaves inside a real workflow, and a practical first project for any engineer learning LLMs by shipping code.
What you'll need
- Python 3.10 or newer.
- The OpenAI SDK. Install it with pip install openai.
- An Oxlo.ai API key from https://portal.oxlo.ai. The free tier includes 60 requests per day, which is plenty for this tutorial.
Step 1: Set up the Oxlo.ai client
I always verify the endpoint with a quick ping before adding any logic. This confirms your Oxlo.ai key and network path are working, and it shows the drop-in compatibility with the OpenAI SDK.
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Hello, are you there?"},
    ],
)
print(response.choices[0].message.content)
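Hardcoding the key is fine for a first ping, but it is easy to commit by accident. Here is a minimal sketch that reads the key from an environment variable instead. The variable name OXLO_API_KEY and the helper load_api_key are my own choices for illustration, not something the Oxlo.ai docs prescribe.

```python
import os


def load_api_key(env=os.environ, var_name="OXLO_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error.

    Both the variable name and this helper are assumptions for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage with the client from this step (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

Passing env as a parameter keeps the helper easy to test without touching your real shell environment.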
Step 2: Write the system prompt
The system prompt is the one instruction the model receives on every call. Keep it short and specific so the agent stays on task and replies in the tone you want.
SYSTEM_PROMPT = """You are a tier-1 support agent for a SaaS analytics platform.
Your job is to read the customer's message, identify the issue type, and draft a concise, helpful reply.
If you do not know the answer, say so honestly and escalate to engineering.
Always be polite. Keep responses under 100 words."""
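A prompt instruction like "under 100 words" is a request, not a guarantee, so it can help to check the draft before it goes anywhere. The sketch below is one simple way to do that. The helper name within_word_limit is hypothetical, and splitting on whitespace is only a rough word count.

```python
def within_word_limit(reply, limit=100):
    """Return True when the reply has fewer than `limit` whitespace-separated words.

    Hypothetical helper: it mirrors the "under 100 words" rule in the prompt.
    """
    return len(reply.split()) < limit


draft = "Thanks for reaching out. Please try a hard refresh and let me know."
if not within_word_limit(draft):
    print("Draft is too long, consider asking the model to shorten it.")
```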
Step 3: Send a single request
Now we pass a realistic support ticket to see how the model responds in a single turn. No memory yet, just a request and a response.
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

user_message = (
    "Hi, my dashboard won't load after the latest update. "
    "I cleared my cache but the spinner just sits there. "
    "Account email is alice@example.com. Help?"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
)
print(response.choices[0].message.content)
Step 4: Add conversation memory
Real conversations have history. The API itself is stateless, so we append each exchange to a messages list and send the whole list on every subsequent call. This is the core mechanic that turns a stateless API into a conversational agent.
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
]

user_message = (
    "Hi, my dashboard won't load after the latest update. "
    "I cleared my cache but the spinner just sits there."
)
messages.append({"role": "user", "content": user_message})

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
)
assistant_reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_reply})
print(assistant_reply)

# Follow-up
messages.append({"role": "user", "content": "It still hangs and I see a 500 error in the console."})
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
)
print(response.choices[0].message.content)
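Because the whole list is resent on every call, it grows with each turn, and every model has a finite context window. One common way to keep it bounded is a sliding window: always keep the system prompt and only the most recent messages. The sketch below shows that idea. The function name trim_history and the default of 8 messages are assumptions for illustration, not tuned values.

```python
def trim_history(messages, max_messages=8):
    """Return a new list: all system messages plus the last `max_messages` others.

    Hypothetical helper. The input list is left unchanged.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]


history = [{"role": "system", "content": "You are a support agent."}]
for i in range(10):
    role = "user" if i % 2 == 0 else "assistant"
    history.append({"role": role, "content": f"message {i}"})

trimmed = trim_history(history, max_messages=4)
print([m["content"] for m in trimmed])
```

Using an even max_messages keeps user and assistant messages in pairs, so the window starts on a user turn. You would pass the trimmed list as messages instead of the full history.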
Step 5: Package the agent
Wrapping the logic in a class keeps state tidy and makes the agent reusable across files. Here is the complete, runnable agent you can drop into any project.
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a tier-1 support agent for a SaaS analytics platform.
Your job is to read the customer's message, identify the issue type, and draft a concise, helpful reply.
If you do not know the answer, say so honestly and escalate to engineering.
Always be polite. Keep responses under 100 words."""


class SupportAgent:
    def __init__(self):
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    def reply(self, user_message):
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=self.messages,
        )
        text = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": text})
        return text


if __name__ == "__main__":
    agent = SupportAgent()
    print(agent.reply(
        "My dashboard won't load after the latest update. I cleared my cache."
    ))
    print("---")
    print(agent.reply(
        "It still hangs and I see a 500 error in the console."
    ))
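With only 60 free requests per day, it is handy to exercise the memory logic without calling the API at all. The sketch below is one way to do that: a variant of the agent that takes its client as a constructor argument, plus a stub that returns canned replies. FakeClient and InjectableSupportAgent are hypothetical names I made up for this example. The stub only imitates the small slice of the SDK response shape the agent reads (choices[0].message.content).

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in that mimics client.chat.completions.create."""

    def __init__(self, canned_replies):
        self._replies = iter(canned_replies)
        self.calls = []  # snapshot of the messages list sent on each call
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model, messages):
        self.calls.append(list(messages))
        message = SimpleNamespace(content=next(self._replies))
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class InjectableSupportAgent:
    """Same flow as SupportAgent, but the client is passed in."""

    def __init__(self, client, system_prompt, model="llama-3.3-70b"):
        self.client = client
        self.model = model
        self.messages = [{"role": "system", "content": system_prompt}]

    def reply(self, user_message):
        self.messages.append({"role": "user", "content": user_message})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        text = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": text})
        return text


fake = FakeClient(["Try a hard refresh.", "Escalating to engineering."])
agent = InjectableSupportAgent(fake, "You are a support agent.")
first = agent.reply("My dashboard won't load.")
second = agent.reply("I see a 500 error.")

# The second call should carry the full history: system, user, assistant, user.
print(len(fake.calls[0]), len(fake.calls[1]))
```

In real use you would pass the OpenAI client from this step in place of the stub, which also lets each agent instance carry its own model name.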
Run it
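Save the Step 5 script as support_agent.py and run it from your terminal. You should see two distinct replies, with the second building on the context of the first. The exact wording will differ from run to run, so treat the output below as one example.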
$ python support_agent.py
Thanks for reaching out. A spinning loader after an update often means a stuck CDN edge node. Please try a hard refresh with Ctrl+F5 and confirm if the issue persists. If it does, I will open a priority ticket for you right away.
---
A 500 error in the console confirms this is a server-side regression. I have escalated this to our engineering team and CC'd your account. You should receive a follow-up within 30 minutes. I appreciate your patience while we resolve this.
Next steps
Add function calling so the agent can query account status or create tickets through internal APIs. Oxlo.ai supports tool use on models like qwen-3-32b and kimi-k2.6, so you can give your agent real powers without switching providers.
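To make that concrete, here is a rough sketch of the local half of function calling: a tool definition in the shape the OpenAI SDK accepts for its tools parameter, and a small dispatcher that runs the function the model asks for. The tool get_account_status, its fields, and the lookup data are all invented for illustration. I am also assuming Oxlo.ai accepts the same tools format as the OpenAI API, which you should confirm in their docs.

```python
import json

# Hypothetical tool definition, written in the OpenAI "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_account_status",
            "description": "Look up the status of a customer account by email.",
            "parameters": {
                "type": "object",
                "properties": {"email": {"type": "string"}},
                "required": ["email"],
            },
        },
    }
]


def get_account_status(email):
    # Stand-in for a real internal API call; the data here is made up.
    fake_db = {"alice@example.com": "active"}
    return {"email": email, "status": fake_db.get(email, "unknown")}


HANDLERS = {"get_account_status": get_account_status}


def run_tool_call(name, arguments_json):
    """Run one requested tool and return a JSON string for the tool message."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    return json.dumps(handler(**args))


print(run_tool_call("get_account_status", '{"email": "alice@example.com"}'))
```

With the OpenAI SDK you would pass tools=TOOLS to client.chat.completions.create. When the model wants a tool, the response message carries tool_calls, each with a function.name and a JSON string in function.arguments. You feed those to run_tool_call, append the result as a message with role "tool" and the matching tool_call_id, and call the model again.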
When you are ready to deploy, Oxlo.ai's flat per-request pricing keeps costs predictable even as your system prompts and conversation histories grow. Long context does not inflate your bill the way token-based metering would. See https://oxlo.ai/pricing to compare plans.