Meet Support Sage: The AI Agent That Remembers Every Ticket Your Team Ever Closed
It's 2:47 PM on a Tuesday. A new support engineer — three weeks into the job — pings the team Slack channel: "Hey, has anyone seen this Zapier integration error before? Customer says it stopped syncing overnight."
Two people react with the 👀 emoji. One senior engineer thinks, "We fixed this exact thing in January." But nobody can find the ticket. It was closed, marked resolved, and its resolution lived exclusively in the memory of someone who left the company in March. The new hire spends two hours triaging a problem that had already been solved. The customer waits. The wheel reinvents itself.
This is the support knowledge problem — and it's not a tooling problem, it's a memory problem. Ticketing systems are graveyards of institutional knowledge. Wikis go stale. Runbooks get skipped. And every time someone experienced leaves, they take a piece of your resolution history with them.
That's what I built Support Sage to solve.
What Support Sage Does
Support Sage is an AI support agent that remembers. When a new ticket arrives, it doesn't just pattern-match against static documentation — it recalls actual resolved tickets your team has closed, identifies which past resolutions are most relevant, and generates two things: a polished, ready-to-send customer reply and a concise set of internal resolution steps for whoever picks up the ticket.
The "sage" name is intentional. A sage is a wise advisor — but it's also an herb. And like the herb, Sage grows with use. Every ticket your team resolves and stores makes the next response sharper.
Architecture
The system has four moving parts:
[Vanilla HTML Frontend]
        |
        v
[FastAPI Backend] ──── retain endpoint ────> [Hindsight Cloud]
        |                                           |
        v                                           |
[Groq API (llama3-70b)] <──── recalled context ─────┘
        |
        v
[Customer Reply + Internal Steps]
When a support engineer closes a ticket, they POST the resolution to /retain. That resolution is embedded and stored in Hindsight, an open-source agent memory layer built on top of Vectorize. When a new ticket arrives at /resolve, Sage queries Hindsight for the most semantically similar past resolutions, injects them into a prompt, and sends the enriched context to Groq's llama3-70b for generation.
The entire backend is about 180 lines of Python. The intelligence isn't in the code — it's in the memory layer.
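The endpoints below reference two request models that aren't shown in the excerpts. A minimal sketch of what they might look like — field names inferred from the payloads used later in this post, not copied from the repo — would be:

```python
from pydantic import BaseModel


class ResolvedTicket(BaseModel):
    """Payload POSTed to /retain when an engineer closes a ticket."""
    ticket_id: str
    issue_description: str
    root_cause: str
    resolution_steps: str
    tags: list[str] = []
    resolved_at: str  # ISO-8601 timestamp


class NewTicket(BaseModel):
    """Payload POSTed to /resolve for an incoming ticket."""
    ticket_id: str
    issue_description: str
```

FastAPI validates incoming JSON against these models automatically, so a malformed retain request is rejected before it can pollute the memory store.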
The Core Pattern: Retain and Recall
Agent memory is what separates a stateless LLM call from an agent that compounds value over time. Hindsight gives you that memory as a managed service, with a clean API that abstracts away vector storage, embedding, and retrieval. Here's how the retain/recall pattern looks in practice.
Retaining a Resolved Ticket
When a ticket is closed, the support engineer submits the issue description, root cause, and resolution steps. The /retain endpoint packages this and stores it:
@app.post("/retain")
async def retain_ticket(ticket: ResolvedTicket):
    """Store a resolved ticket in Hindsight memory."""
    memory_text = f"""
ISSUE: {ticket.issue_description}
ROOT CAUSE: {ticket.root_cause}
RESOLUTION: {ticket.resolution_steps}
TAGS: {', '.join(ticket.tags)}
"""
    payload = {
        "pipeline_id": HINDSIGHT_PIPELINE_ID,
        "document": {
            "content": memory_text,
            "metadata": {
                "ticket_id": ticket.ticket_id,
                "resolved_at": ticket.resolved_at,
                "tags": ticket.tags,
            },
        },
    }
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{HINDSIGHT_BASE_URL}/retain",
            json=payload,
            headers={"Authorization": f"Bearer {HINDSIGHT_API_KEY}"},
        )
        response.raise_for_status()  # surface storage failures instead of silently dropping the memory
    return {"status": "stored", "ticket_id": ticket.ticket_id}
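For a concrete sense of what actually lands in the memory store, here's the retained text for a hypothetical ticket, built the same way the endpoint's f-string formats it (the helper name is mine, not from the repo):

```python
def format_memory(issue: str, root_cause: str, resolution: str, tags: list[str]) -> str:
    """Mirror of the formatting used in /retain: one labeled line per field."""
    return (
        f"ISSUE: {issue}\n"
        f"ROOT CAUSE: {root_cause}\n"
        f"RESOLUTION: {resolution}\n"
        f"TAGS: {', '.join(tags)}"
    )


text = format_memory(
    issue="Zapier integration stopped syncing overnight",
    root_cause="OAuth token silently expired after 90 days",
    resolution="Settings > Integrations > Zapier > Reconnect, then re-authenticate",
    tags=["zapier", "oauth", "sync"],
)
print(text)
```

The labeled structure matters: it's what the embedding model sees, so keeping issue, cause, and resolution clearly delineated helps recall pull back the right tickets later.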
Recalling Relevant Context
When a new ticket arrives, Sage queries for the top semantically similar past resolutions before calling the LLM:
async def recall_similar_tickets(issue_description: str, top_k: int = 3) -> list[str]:
    """Retrieve the most relevant resolved tickets from Hindsight."""
    payload = {
        "pipeline_id": HINDSIGHT_PIPELINE_ID,
        "query": issue_description,
        "top_k": top_k,
    }
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{HINDSIGHT_BASE_URL}/recall",
            json=payload,
            headers={"Authorization": f"Bearer {HINDSIGHT_API_KEY}"},
        )
        response.raise_for_status()  # fail loudly rather than silently generating without context
    results = response.json().get("results", [])
    return [r["content"] for r in results]
Generating the Response
The recalled context is injected into the Groq prompt as institutional memory:
@app.post("/resolve")
async def resolve_ticket(ticket: NewTicket):
    recalled = await recall_similar_tickets(ticket.issue_description)
    memory_context = "\n\n".join(recalled) if recalled else "No similar tickets found."

    prompt = f"""You are Support Sage, an expert support agent with access to
your team's historical resolution database.

PAST RESOLVED TICKETS (most relevant):
{memory_context}

NEW TICKET:
{ticket.issue_description}

Generate:
1. CUSTOMER REPLY: A professional, empathetic message to send directly to the customer.
2. INTERNAL STEPS: Concise resolution steps for the engineer picking this up.

Base your response on the historical patterns above where applicable."""

    response = groq_client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": prompt}],
    )
    return parse_sage_response(response.choices[0].message.content)
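The `parse_sage_response` helper isn't shown above. A minimal sketch — assuming the completion echoes the "CUSTOMER REPLY" and "INTERNAL STEPS" labels from the prompt, which LLM output doesn't strictly guarantee — could split it like this:

```python
def parse_sage_response(text: str) -> dict:
    """Split the LLM completion into the two sections the prompt asks for.

    Falls back to returning the raw text as the customer reply if the
    expected section headers are missing from the completion.
    """
    reply, steps = "", ""
    if "CUSTOMER REPLY:" in text and "INTERNAL STEPS:" in text:
        head, steps = text.split("INTERNAL STEPS:", 1)
        reply = head.split("CUSTOMER REPLY:", 1)[1]
    return {
        "customer_reply": reply.strip() or text.strip(),
        "internal_steps": steps.strip(),
    }
```

The fallback branch matters in practice: a parser that raises on unexpected formatting turns a slightly off-script completion into a 500 error for the support engineer.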
The full Hindsight docs cover additional configuration options — including pipeline setup, embedding model selection, and metadata filtering for multi-tenant deployments.
Before and After: The OAuth Token Moment
This is where the rubber meets the road. Here's what Sage returns on a fresh deployment with zero stored tickets:
New Ticket: "Our Zapier integration stopped syncing data overnight. No error messages in Zapier, but records aren't flowing through."
Sage (no memory):
"It sounds like there may be a connectivity issue with your Zapier integration. I'd recommend trying the following: 1) Check that your API credentials are still valid, 2) Try disconnecting and reconnecting the integration, 3) Verify that your Zapier zap is still enabled. If the issue persists, please contact Zapier support."
Generic. Useless. The kind of reply that makes customers feel unseen.
Now here's Sage after five resolved tickets have been retained — including one from January where the root cause was: "Zapier OAuth tokens expire after 90 days; customer had connected the integration in October. Resolution: navigate to Settings > Integrations > Zapier > Reconnect, then re-authenticate via OAuth. Token silently expires without error notification."
Sage (with memory):
Customer Reply:
"Thanks for reaching out! Based on the symptoms you're describing — especially the silent failure with no error messages — this looks like an OAuth token expiry issue. Our Zapier integration tokens expire after 90 days, and unfortunately Zapier doesn't surface an error when this happens, so it can appear to stop working without warning. To fix this: navigate to Settings → Integrations → Zapier → Reconnect and complete the OAuth re-authentication flow. Your zaps should resume syncing within a few minutes. Let us know if you run into anything!"
Internal Steps:
- Confirm token age: check integration connected_at timestamp in admin panel
- Direct customer to Settings → Integrations → Zapier → Reconnect
- After reconnect, verify sync in Zapier task history (allow 5 min)
- If token was < 90 days old, escalate — may indicate a revocation issue
The difference is not prompt engineering. It's memory. Sage recalled the exact issue, the exact mechanism (90-day expiry), and the exact path to resolution. In seconds. Without paging anyone.
Lessons Learned
1. The memory layer earns its weight immediately. I initially considered rolling my own retrieval with a local vector store, but Hindsight's managed pipeline — including chunking, embedding, and similarity search — saved several days of infrastructure work. For a support agent specifically, where retrieval precision matters more than latency, having a dedicated memory service rather than bolting on vector search as an afterthought made the architecture cleaner and the results noticeably better.
2. Resolution quality at write-time determines everything at read-time. The biggest performance variable isn't the LLM — it's how well engineers document resolutions when they close tickets. A resolution that says "fixed the config issue" gives Sage nothing to work with. One that says "updated the SMTP relay port from 465 to 587 after confirming TLS was required by the customer's mail provider" gives it everything. I added a structured resolution form — issue description, root cause, exact steps, tags — specifically to enforce quality at the input stage.
3. Groq's speed changes the UX equation. Running llama3-70b through Groq means end-to-end response time (recall + generation) typically lands under two seconds. This matters because the frontend isn't a background job — it's a synchronous assistant that a support engineer is actively waiting on. Slow generation would break the flow. Fast generation makes it feel like asking a colleague.
4. Semantic recall surfaces non-obvious connections. One thing that surprised me: Hindsight occasionally surfaces tickets that don't look obviously related by keywords but are semantically close. A ticket about "dashboard not loading after password change" recalled a previous resolution about "session tokens invalidated after SSO configuration update." Different symptoms, same root mechanic. A keyword search would have missed it entirely.
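The write-time quality point from lesson 2 can even be enforced in code. A minimal quality gate for the structured resolution form — the thresholds and stock-phrase list here are arbitrary placeholders, not from the repo — might look like:

```python
MIN_WORDS = 8  # arbitrary threshold; tune to your team's norms

STOCK_PHRASES = {"fixed", "config issue", "user error", "resolved"}


def resolution_quality_errors(root_cause: str, resolution_steps: str) -> list[str]:
    """Reject low-information resolutions before they pollute the memory store."""
    errors = []
    if len(root_cause.split()) < MIN_WORDS:
        errors.append("root_cause too short -- describe the actual mechanism")
    if len(resolution_steps.split()) < MIN_WORDS:
        errors.append("resolution_steps too short -- list the exact steps taken")
    if root_cause.strip().lower() in STOCK_PHRASES:
        errors.append("root_cause is a stock phrase, not a diagnosis")
    return errors
```

Wiring a check like this into the `/retain` endpoint (returning a 422 with the error list) pushes the quality conversation to the moment the ticket closes, when the details are still fresh.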
The Honest Limitation
Sage is only as wise as the resolutions your team writes.
If your engineers close tickets with one-line notes, Sage will generate one-line advice. If your historical resolutions are vague, incomplete, or flat-out wrong, Sage will confidently reproduce that vagueness with perfect grammar. The model doesn't know what it doesn't know — it just retrieves and synthesizes.
This is the garbage in, garbage out problem applied to institutional memory, and it's worth being direct about. Sage amplifies whatever resolution culture your team already has. If that culture is weak, the first thing to fix isn't the AI — it's the closing hygiene. A structured resolution template, a peer review step on complex tickets, even a weekly "resolution quality" spot-check — these compound over time into a memory store that actually earns the name sage.
The system is also bounded by the domain of what's been seen before. Novel failure modes — new integrations, infrastructure changes, edge-case bugs — will fall back to generic LLM reasoning until similar tickets accumulate. That's expected behavior, not a bug. Memory-augmented agents are a complement to expertise, not a replacement for it.
Support Sage is a small system that solves a real problem: institutional knowledge that walks out the door every time someone does. The retain/recall pattern is deceptively simple, but the compounding value is real. Every ticket your team resolves and stores is a piece of expertise that never leaves — and the next time a new hire hits that 2:47 PM wall, Sage already knows the answer.
The herb grows every time you water it.