How I Built an AI Email Triage Agent That Creates Linear Issues and Fires Slack Alerts

#python #ai #automation #fastapi

Most engineering teams I talk to have the same problem: support emails pile up, someone manually reads them, decides what kind of issue it is, guesses at priority, creates a Linear ticket, and then pings the right person on Slack. That whole chain takes 5-15 minutes per email and happens inconsistently across team members.

The obvious answer sounds like "just automate it." But after building this system, I'd reframe the actual problem: triage is a synchronization problem, not just an automation problem. Engineering teams using both GitHub and Linear need the judgment calls - priority, story points, team assignment - to happen at the moment the email arrives, not hours later when a human finally gets to it. If a production outage report hits your inbox at 2am, the ticket should already exist with P0 priority and the on-call engineer assigned before anyone opens their laptop.

This article walks through how I built exactly that with pydantic-ai, FastAPI, Gmail IMAP, Linear API, and Slack webhooks.

The Problem: Manual Triage Doesn't Scale

Here is what the broken version looks like in practice.

An inbound email arrives: "Hey, users in the EU region can't log in, getting 503 errors, this started about 20 minutes ago." Someone on your team reads it, decides it's an auth bug, marks it P0, creates a Linear issue, and posts in #incidents on Slack. That whole process works fine with two engineers and 10 emails a day.

At 40 emails a day, it starts breaking. Things get missed. A critical auth failure sits in an inbox for 45 minutes because the person who usually triages it is in a meeting. Another email gets classified as P2 when it should be P0 because the triager didn't read closely enough. A third gets a Linear ticket created, but no Slack alert fires because the person forgot that step.

The deeper problem: your team's triage logic exists entirely in people's heads. There is no consistent definition of what makes something P0 vs P1. Different engineers make different calls. New hires make worse calls. And none of this is auditable.

What you actually want: a system that reads the email, applies your team's specific classification logic, creates the Linear issue with correct metadata, and fires Slack alerts for anything critical - all within seconds of the email arriving, at any hour.

The Approach: pydantic-ai + FastAPI for Structured Judgment

The key architectural insight here is that LLM outputs need to be trustworthy enough to trigger side effects. If your agent classifies an email as P0 and your code blindly creates a Linear issue and pages an on-call engineer, you need that classification to be a proper typed object, not a string you parse with regex.

This is exactly where pydantic-ai earns its place. Unlike plain OpenAI API calls where you prompt-engineer your way to JSON and hope it validates, pydantic-ai lets you define the output schema as a Pydantic model and the library enforces it. The agent either returns a valid TicketClassification object or raises an exception you can handle. No silent failures where priority comes back as "high" instead of "P1".
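
To make the "high" versus "P1" point concrete, here is a minimal sketch of the validation step on its own, with no LLM involved. It assumes Pydantic v2; MiniClassification and parse_classification are hypothetical names used only for illustration, not part of the project.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError


class Priority(str, Enum):
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


class MiniClassification(BaseModel):
    # Hypothetical, trimmed-down stand-in for a full classification model.
    priority: Priority
    title: str


def parse_classification(raw: dict) -> Optional[MiniClassification]:
    """Return a validated object, or None if the payload breaks the schema."""
    try:
        return MiniClassification.model_validate(raw)
    except ValidationError:
        return None


good = parse_classification({"priority": "P1", "title": "Login fails in EU"})
bad = parse_classification({"priority": "high", "title": "Login fails in EU"})
```

Here good is a typed object whose priority is the enum member Priority.P1, while bad is None because "high" is not one of the allowed values. Nothing downstream ever sees the malformed payload.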

Why not LangChain? I've used it. The abstraction layer is heavy, debugging is painful, and structured output handling requires enough boilerplate that you end up writing similar amounts of code with less visibility into what's happening. pydantic-ai is thinner and more explicit - closer to writing a typed Python function that happens to call an LLM.

Why FastAPI over a simple script? Because you want this running as a service. Gmail IMAP polling runs on a background task. The Linear and Slack integrations are async HTTP calls. FastAPI gives you a health endpoint, request logging, and a clean place to hang background tasks with lifespan context managers. It also makes the system testable - you can POST a fake email payload to your /triage endpoint in tests without touching Gmail at all.

The design decision that makes this reliable: the classification and the side effects are separated. The agent produces a TicketClassification. Then separate, deterministic functions consume that object to create the Linear issue and fire the Slack alert. The LLM never touches the API clients directly. If Linear is down, your classification still works. If the classification fails validation, nothing gets created.

The Central Code Pattern

Here is the core of the system - the classification agent and the downstream dispatch:

from pydantic import BaseModel
from pydantic_ai import Agent
from enum import Enum

class Priority(str, Enum):
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"

class IssueType(str, Enum):
    BUG = "bug"
    FEATURE = "feature"
    QUESTION = "question"
    INCIDENT = "incident"

class TicketClassification(BaseModel):
    issue_type: IssueType
    priority: Priority
    title: str
    summary: str
    suggested_team: str
    story_points: int
    requires_immediate_alert: bool

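# The agent - structured output enforced by pydantic-ai.
# Note: newer pydantic-ai releases rename result_type to output_type
# (and result.data to result.output); match this to your installed version.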
triage_agent = Agent(
    "openai:gpt-4o",
    result_type=TicketClassification,
    system_prompt="""
    You are an engineering triage agent. Classify inbound support emails.
    P0: production down, data loss, security breach affecting users.
    P1: major feature broken, significant user impact, no workaround.
    P2: partial functionality broken, workaround exists.
    P3: minor issue, cosmetic, low user impact.
    story_points: 1-8 based on estimated fix complexity.
    requires_immediate_alert: true only for P0 and P1 incidents.
    """
)

async def process_email(raw_email: str) -> None:
    # Classification - LLM call with validated output
    result = await triage_agent.run(raw_email)
    classification = result.data  # This is a real TicketClassification object

    # Deterministic side effects - no LLM involved past this point
    linear_issue = await create_linear_issue(classification)

    if classification.requires_immediate_alert:
        await fire_slack_alert(classification, linear_issue.url)

A few things worth noting here.

result_type=TicketClassification is doing the heavy lifting. If the output doesn't conform to the schema, pydantic-ai retries the LLM call with the validation errors as feedback, up to a configurable retry limit, and raises if the output still fails. You get a real typed object back, not a dict.

The requires_immediate_alert boolean is intentional. You could derive this from priority in code (if classification.priority in [Priority.P0, Priority.P1]), but having the LLM make this call explicitly means it can account for context. A P2 email that mentions "this is affecting 10,000 users right now" might warrant an alert that pure priority logic would miss. Note that the system prompt above restricts the flag to P0 and P1, so to allow that P2 case you would need to relax that line of the prompt.
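
One way to get both behaviors is to combine the model's flag with a deterministic floor. This is a sketch of that idea, not what the project necessarily does: the Classification dataclass and should_alert are hypothetical names, and it uses only the standard library.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(str, Enum):
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


# Priorities that always page someone, regardless of what the model says.
ALWAYS_ALERT = {Priority.P0, Priority.P1}


@dataclass
class Classification:
    # Hypothetical stand-in carrying only the two fields this decision needs.
    priority: Priority
    requires_immediate_alert: bool


def should_alert(c: Classification) -> bool:
    # Deterministic floor: P0/P1 always alert, even if the model's flag is False.
    # The model's flag can only add alerts (e.g. a P2 with wide impact), never remove them.
    return c.priority in ALWAYS_ALERT or c.requires_immediate_alert
```

With this rule a model mistake on the flag cannot silence a P0, while a high-impact P2 can still raise an alert.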

story_points gives Linear a starting estimate. It won't always be right, but having something there is better than creating every issue with no estimate.
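
The 1-8 range only lives in the prompt, so the schema itself would accept 0 or 500. A small sketch of enforcing the range in the model instead, assuming Pydantic v2; Estimate and is_valid_estimate are hypothetical names. Linear teams can be configured with different estimate scales, so treat 1-8 as an assumption to check against your workspace.

```python
from pydantic import BaseModel, Field, ValidationError


class Estimate(BaseModel):
    # Bounds mirror the "1-8" instruction in the system prompt.
    story_points: int = Field(ge=1, le=8, description="Estimated fix complexity, 1-8")


def is_valid_estimate(value: int) -> bool:
    """True if the value passes the schema bounds, False otherwise."""
    try:
        Estimate(story_points=value)
        return True
    except ValidationError:
        return False
```

Putting the same Field constraint on the story_points field of the classification model means an out-of-range estimate fails validation like any other schema violation.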

Integration: Gmail to Linear to Slack

The data flow runs like this. A background task polls Gmail via IMAP every 60 seconds, fetching unread emails from a designated support address. Each email gets decoded from MIME format, stripped to plain text, and pushed to the classification pipeline. After classification, the Linear issue gets created via Linear's GraphQL API with the full metadata. If requires_immediate_alert is true, a Slack webhook posts to your #incidents or #support channel with the issue link, priority, and summary.
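
The last hop in that flow, the Slack post, can be sketched with the standard library alone. This assumes a Slack incoming webhook, which accepts a JSON body with a text field. The function names and message layout are illustrative choices, not the project's actual code.

```python
import json
import urllib.request


def build_slack_payload(priority: str, title: str, summary: str, issue_url: str) -> dict:
    # Slack mrkdwn: *bold* and <url|label> links.
    text = (
        f":rotating_light: *[{priority}] {title}*\n"
        f"{summary}\n"
        f"<{issue_url}|Open in Linear>"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to an incoming-webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

post_to_slack blocks, so inside an async service it would need to run via asyncio.to_thread or be swapped for an async HTTP client. Keeping the payload builder separate makes the message format testable without a network call.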

The Linear integration uses their GraphQL API directly rather than an SDK - the SDK adds little value here and the GraphQL call is straightforward:

async def create_linear_issue(c: TicketClassification) -> LinearIssue:
    mutation = """
    mutation CreateIssue($input: IssueCreateInput!) {
      issueCreate(input: $input) {
        issue { id url title }
      }
    }
    """
    variables = {
        "input": {
            "title": c.title,
            "description": c.summary,
            "priority": PRIORITY_MAP[c.priority],
            "estimate": c.story_points,
            "teamId": TEAM_ID_MAP[c.suggested_team],
        }
    }
    # ... execute mutation
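
The snippet above leaves PRIORITY_MAP, TEAM_ID_MAP, LinearIssue, and the actual HTTP call undefined. The following is one possible way to fill them in, offered as a sketch under assumptions rather than the project's implementation. The team names and IDs are made up, the fallback-team behavior is my own choice, and the request uses the standard library instead of an async HTTP client. Linear's numeric priority field runs from 1 (urgent) to 4 (low), with 0 meaning no priority.

```python
import asyncio
import json
import urllib.request
from dataclasses import dataclass

LINEAR_API_URL = "https://api.linear.app/graphql"

# Linear priority values: 0 none, 1 urgent, 2 high, 3 medium, 4 low.
PRIORITY_MAP = {"P0": 1, "P1": 2, "P2": 3, "P3": 4}

# Hypothetical team names and IDs; replace with the IDs from your workspace.
TEAM_ID_MAP = {"platform": "team-id-platform", "web": "team-id-web"}
DEFAULT_TEAM = "platform"


@dataclass
class LinearIssue:
    id: str
    url: str
    title: str


def resolve_team_id(suggested_team: str) -> str:
    # suggested_team is free text from the model, so fall back instead of raising KeyError.
    return TEAM_ID_MAP.get(suggested_team.strip().lower(), TEAM_ID_MAP[DEFAULT_TEAM])


def parse_issue_response(body: dict) -> LinearIssue:
    # GraphQL reports failures in an "errors" array, often alongside HTTP 200.
    if body.get("errors"):
        raise RuntimeError(f"Linear API error: {body['errors']}")
    issue = body["data"]["issueCreate"]["issue"]
    return LinearIssue(id=issue["id"], url=issue["url"], title=issue["title"])


def _post_graphql(api_key: str, query: str, variables: dict) -> dict:
    request = urllib.request.Request(
        LINEAR_API_URL,
        data=json.dumps({"query": query, "variables": variables}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)


async def execute_mutation(api_key: str, query: str, variables: dict) -> LinearIssue:
    # Run the blocking request in a worker thread so the event loop stays free.
    body = await asyncio.to_thread(_post_graphql, api_key, query, variables)
    return parse_issue_response(body)
```

The team lookup matters more than it looks: because suggested_team is an unconstrained string, indexing TEAM_ID_MAP directly would raise on any name the model invents.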

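Gotcha worth knowing: an IMAP UNSEEN search will return the same emails again if your process restarts before they are marked as read. Mark each email as seen (the \Seen flag) immediately after fetching, before classification, not after. Otherwise a crash between fetch and marking means you'll process the same email twice and create duplicate Linear issues. The tradeoff is that a crash after marking but before the issue is created drops that email silently, so pair this with logging or a retry queue if a missed email is worse for you than a duplicate.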
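
A sketch of that fetch-then-mark ordering using the standard library's imaplib conventions, plus the MIME-to-plain-text step. It assumes conn behaves like a logged-in imaplib.IMAP4_SSL with a mailbox already selected. fetch_unseen and to_plain_text are hypothetical names, and HTML-only emails are not handled.

```python
import email
from email import policy


def fetch_unseen(conn) -> list[bytes]:
    """Fetch raw unread messages, flagging each one as seen right after it is read."""
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        return []
    raw_messages = []
    for num in data[0].split():
        # BODY.PEEK[] reads the message without implicitly setting the seen flag,
        # so the flag is set in exactly one place below.
        status, parts = conn.fetch(num, "(BODY.PEEK[])")
        if status != "OK":
            continue
        raw_messages.append(parts[0][1])
        # Mark as seen before any classification work happens.
        conn.store(num, "+FLAGS", "\\Seen")
    return raw_messages


def to_plain_text(raw: bytes) -> str:
    """Decode a raw MIME message into 'Subject: ...' plus its text/plain body."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    content = body.get_content() if body is not None else ""
    return f"Subject: {msg['subject']}\n\n{content.strip()}"
```

Taking the connection as a parameter keeps the ordering logic testable with a fake object. A production version would likely also record each Message-ID so that a duplicate can be detected even if a flag update is lost.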

Tradeoffs and Limitations

This system has real limitations you should know going in.

The classification quality depends heavily on your system prompt. GPT-4o is good, but it doesn't know your product's specific failure modes. A generic prompt will produce generic classifications. You will need to iterate on the prompt with real emails from your inbox before this is actually useful.

IMAP polling has latency. Sixty-second poll intervals mean a P0 incident email might sit for up to a minute before it creates a ticket. For most teams this is fine. For true real-time needs, you'd want Gmail push notifications via Pub/Sub instead.

This approach is probably overkill if your team gets fewer than 20 support emails per day. A simple Zapier workflow with a pre-defined category mapping would give you 80% of the value with zero maintenance. Use this when you have volume, varied email content, and need nuanced classification that rule-based systems get wrong.

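Cost is real but small. At ~1,000 tokens per email, 100 emails/day comes to about 100,000 tokens/day. With GPT-4o list prices in the range of a few dollars per million input tokens (output tokens cost more, but the structured output is short), that works out to well under $1/day. Check current pricing for your model, but this is unlikely to be a blocker.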

Try It Yourself

I packaged this as an open-source template on GitHub: https://github.com/Reactance0083/pydantic-ai-email-linear-auto-triage

The scaffold gives you the agent structure, FastAPI setup, and integration patterns to get started. The full production version with tests, error handling, retry logic, duplicate detection, and complete docs is available here: https://reactance0083.gumroad.com/l/dcror

If you're running a similar triage setup or have tried other approaches (Zapier, custom scripts, other agent frameworks), I'd genuinely like to hear how it's working. Drop a comment with what broke first - that's usually the most useful part of any production story.