How I Built an Email Auto-Triage System with pydantic-ai, FastAPI, and Linear
Support email is a graveyard of good intentions. Every team I've worked with has some version of the same problem: a shared inbox accumulates emails, someone manually reads them, decides it's a bug or a billing question, copies the text into a Linear ticket, assigns a priority based on gut feel, and maybe pings Slack if it seems urgent. This process takes 5-10 minutes per email on a good day, and it scales terribly.
This article walks through the architecture and key code patterns for an automated triage pipeline that handles the full loop: classify incoming emails, create structured Linear issues, and fire Slack alerts for anything critical, all without a human in the loop.
The Problem: Manual Triage Doesn't Scale
Here's the concrete scenario that motivated this build.
A small SaaS team receives 80-150 support emails per day. Three categories consistently matter: bugs (customer-reported crashes or broken features), billing issues (failed charges, incorrect invoices), and feature requests (nice-to-haves that need product review). Everything else is general inquiry or noise.
Without automation, what happens is this: emails pile up overnight. The first engineer on in the morning spends 45 minutes triaging before writing a single line of code. A P0 bug report from a paying customer that arrived at 2 AM sits unread until 9 AM. Billing issues that should route to a different Slack channel get lost in the engineering queue. Feature requests never make it into the backlog because nobody wants to do the copy-paste work.
The real cost isn't the minutes per email. It's the decisions made inconsistently, the critical tickets that sit too long, and the cognitive load that comes with context-switching into support mode at the start of every day. Manual triage is a process that looks manageable until you actually measure it.
The Architecture: pydantic-ai + FastAPI as the Spine
The core insight here is that email triage is a structured extraction problem, not a generative one. You're not asking an LLM to write anything creative. You're asking it to read text and fill out a form with specific fields: category, priority, summary, suggested assignee. That's exactly what pydantic-ai is designed for.
Why pydantic-ai over LangChain or plain OpenAI requests?
LangChain adds a lot of abstraction for problems that don't need it. Output parsers in LangChain feel bolted on. Plain OpenAI API calls require you to write JSON schema definitions manually and then validate the output yourself, which inevitably means writing brittle string parsing.
pydantic-ai lets you define a Pydantic model as your expected output, and the library handles the prompting strategy and validation loop. If the LLM returns something malformed, pydantic-ai retries with the validation error included in context. In practice, this means you get typed, validated objects back from every agent call rather than dictionaries you hope have the right keys.
FastAPI hosts the whole pipeline. A background task polls Gmail over IMAP for unread messages (or you can swap in Gmail push notifications delivered to a webhook endpoint). Each email is processed through the agent, and then the Linear and Slack API calls fire. This keeps the pipeline stateless and easy to deploy.
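The polling side can be sketched with the standard library alone. This is a minimal sketch, not the production code. `ParsedEmail` mirrors the fields `process_email()` reads (sender, subject, body). The function names `parse_raw_email()`, `fetch_unseen()` and `poll_forever()` are hypothetical. The host would be something like `imap.gmail.com`, with an app password for the login. Fetching with `(RFC822)` sets IMAP's `\Seen` flag, so each message is picked up only once.

```python
import asyncio
import imaplib
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from email import message_from_bytes, policy


@dataclass
class ParsedEmail:
    # Hypothetical container matching the fields process_email() reads
    sender: str
    subject: str
    body: str


def parse_raw_email(raw: bytes) -> ParsedEmail:
    """Turn raw RFC 822 bytes into a ParsedEmail, preferring the text/plain part."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return ParsedEmail(
        sender=str(msg.get("From", "")),
        subject=str(msg.get("Subject", "")),
        body=body,
    )


def fetch_unseen(host: str, user: str, password: str) -> list[ParsedEmail]:
    """Fetch unread INBOX messages; an RFC822 fetch marks them as read."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN")
        emails = []
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            emails.append(parse_raw_email(msg_data[0][1]))
        return emails


async def poll_forever(
    host: str,
    user: str,
    password: str,
    handler: Callable[[ParsedEmail], Awaitable[None]],
    interval_seconds: int = 60,
) -> None:
    """Poll the inbox every interval_seconds and hand each new email to handler."""
    while True:
        # imaplib is blocking, so run the fetch in a worker thread
        emails = await asyncio.to_thread(fetch_unseen, host, user, password)
        for parsed in emails:
            await handler(parsed)
        await asyncio.sleep(interval_seconds)
```

In the FastAPI app you could start `poll_forever()` from a lifespan or startup hook and pass `process_email` as the handler.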
The key design decision: each email gets one agent call that returns a fully structured triage object. There's no chain of calls, no memory, no conversation state. This makes the system predictable, cheap to run, and easy to debug. A single email costs roughly 300-500 input tokens, which at current GPT-4o-mini pricing is fractions of a cent.
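To make "fractions of a cent" concrete, here is the arithmetic as a small function. The default prices are assumptions based on GPT-4o-mini's published per-million-token rates at the time of writing ($0.15 input, $0.60 output), so check current pricing before relying on them. The 100 output tokens for one triage object is also an assumption.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float = 0.15,   # assumed USD per 1M input tokens
    output_price_per_million: float = 0.60,  # assumed USD per 1M output tokens
) -> float:
    """Estimate the cost in USD of one agent call."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Upper end of the 300-500 input-token range, plus an assumed ~100 output tokens
per_email = estimate_cost_usd(input_tokens=500, output_tokens=100)
# Upper end of the 80-150 emails/day scenario
per_day = per_email * 150
print(f"${per_email:.6f} per email, ${per_day:.4f} per day")
```

Under these assumptions, a full day of the 150-email scenario comes to roughly two cents.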
The Central Code Pattern: Structured Triage with pydantic-ai
Here's the core of the system, simplified but real:
```python
from enum import Enum

from pydantic import BaseModel, Field
from pydantic_ai import Agent


class TicketCategory(str, Enum):
    BUG = "bug"
    BILLING = "billing"
    FEATURE_REQUEST = "feature_request"
    GENERAL = "general"


class TicketPriority(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class TriageResult(BaseModel):
    category: TicketCategory
    priority: TicketPriority
    summary: str = Field(
        description="One sentence summary of the issue, max 100 characters"
    )
    customer_sentiment: str = Field(
        description="Brief assessment: frustrated, neutral, or positive"
    )
    suggested_team: str = Field(
        description="Which team should own this: engineering, billing, or product"
    )
    needs_immediate_slack_alert: bool = Field(
        description="True only if CRITICAL priority or customer mentions churn/legal"
    )


TRIAGE_AGENT = Agent(
    model="openai:gpt-4o-mini",
    # Newer pydantic-ai releases rename this argument to output_type
    result_type=TriageResult,
    system_prompt="""
    You are a support triage specialist. Analyze incoming support emails and
    classify them accurately. Be conservative with CRITICAL priority - only
    use it for active outages, data loss, or customers threatening to cancel.
    Billing issues are almost always HIGH, not CRITICAL, unless the customer
    reports fraudulent charges.
    """,
)


async def triage_email(subject: str, body: str, sender: str) -> TriageResult:
    # Truncate outside the f-string so this comment isn't sent to the model
    truncated_body = body[:2000]  # keeps token usage predictable
    email_content = f"""
From: {sender}
Subject: {subject}
Body:
{truncated_body}
"""
    result = await TRIAGE_AGENT.run(email_content)
    # Newer pydantic-ai releases expose the typed result as result.output
    return result.data
```
A few things worth explaining here:
The Field(description=...) on each model field is not just documentation. pydantic-ai passes these descriptions into the schema that guides the LLM's output. This is how you constrain the model's behavior without writing verbose few-shot examples. The description on needs_immediate_slack_alert embeds your business logic directly into the type definition.
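You can see what the model is given by dumping the JSON schema Pydantic generates. This sketch uses plain Pydantic v2 (`model_json_schema()`) on a trimmed copy of two `TriageResult` fields. Exactly how pydantic-ai wraps the schema into a tool or structured-output definition depends on the provider, so treat this as an approximation of what gets sent.

```python
from pydantic import BaseModel, Field


class TriageSketch(BaseModel):
    # Two fields copied from TriageResult, enough to inspect the schema
    summary: str = Field(
        description="One sentence summary of the issue, max 100 characters"
    )
    needs_immediate_slack_alert: bool = Field(
        description="True only if CRITICAL priority or customer mentions churn/legal"
    )


schema = TriageSketch.model_json_schema()
for name, prop in schema["properties"].items():
    # Each description travels with its field into the schema
    print(f"{name} ({prop['type']}): {prop['description']}")
```

If you want the 100-character limit enforced rather than merely requested, `Field(max_length=100, description=...)` adds a `maxLength` constraint that Pydantic also validates. An overlong summary then fails validation instead of slipping through.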
Body truncation at 2000 characters is deliberate. Support emails are either short (the important signal is in the first paragraph) or extremely long (forwarded threads, logs pasted into the body). Truncating keeps costs predictable and prevents the occasional giant email from burning through your token budget.
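A slightly gentler variation on the plain `body[:2000]` slice is sketched below. It is an assumption, not the author's code. It drops quoted reply lines (those starting with `>`), which are often the bulk of a long thread, before truncating. It also appends a marker so the model knows the text was cut. `prepare_body()` is a hypothetical helper name.

```python
def prepare_body(body: str, max_chars: int = 2000) -> str:
    """Drop quoted reply lines, then truncate to max_chars with a visible marker."""
    # Quoted history from earlier replies conventionally starts with '>'
    lines = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    cleaned = "\n".join(lines).strip()
    if len(cleaned) <= max_chars:
        return cleaned
    # The marker adds a few characters beyond max_chars
    return cleaned[:max_chars] + "\n[... truncated ...]"


print(prepare_body("The export button crashes.\n> On Monday you wrote:\n> old text\nThanks"))
```

Stripping quotes is a heuristic. Some customers reply inline between quoted lines, so their new text survives, but a client that does not use `>` quoting will not benefit.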
The system_prompt includes explicit guidance about when NOT to use CRITICAL. Without this, LLMs tend to over-escalate because they have no sense of what your alert fatigue threshold is.
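Prompt guidance is a nudge, not a guarantee. One option, sketched here as an assumption rather than something the article's pipeline does, is to enforce the non-negotiable half of the alert rule deterministically with a Pydantic `model_validator`. Whatever the LLM returns, a CRITICAL ticket always triggers the alert. `TriageGuarded` is a hypothetical, trimmed-down variant of `TriageResult`.

```python
from enum import Enum

from pydantic import BaseModel, model_validator


class TicketPriority(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class TriageGuarded(BaseModel):
    priority: TicketPriority
    needs_immediate_slack_alert: bool

    @model_validator(mode="after")
    def critical_always_alerts(self) -> "TriageGuarded":
        # Regardless of what the LLM put in the flag, CRITICAL always alerts
        if self.priority is TicketPriority.CRITICAL:
            self.needs_immediate_slack_alert = True
        return self


print(TriageGuarded(priority="critical", needs_immediate_slack_alert=False))
```

This validator quietly corrects the flag. A validator that raised `ValueError` instead would surface as a validation error on the agent's output.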
Integration: Gmail to Linear to Slack
The data flow works like this:
- A FastAPI background task polls Gmail via IMAP every 60 seconds, fetching unread emails from the support inbox.
- Each email runs through `triage_email()` and returns a `TriageResult`.
- The result maps to a Linear issue via the Linear GraphQL API. Category becomes the label, priority maps to Linear's 1-4 scale, and the summary becomes the issue title.
- If `needs_immediate_slack_alert` is true, the pipeline posts to a `#critical-support` Slack channel with the sender, summary, and a direct link to the newly created Linear issue.
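`PRIORITY_MAP` itself isn't shown in the article. Here is a minimal sketch, assuming Linear's numeric priority convention (0 = no priority, 1 = urgent, 2 = high, 3 = medium, 4 = low) and a one-to-one mapping from the four triage levels. Both the mapping policy and the import-time check are illustrative choices.

```python
from enum import Enum


class TicketPriority(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Linear stores priority as an integer: 0 = no priority, 1 = urgent,
# 2 = high, 3 = medium, 4 = low. The mapping below is an assumed policy.
PRIORITY_MAP: dict[TicketPriority, int] = {
    TicketPriority.CRITICAL: 1,
    TicketPriority.HIGH: 2,
    TicketPriority.MEDIUM: 3,
    TicketPriority.LOW: 4,
}

# Fail fast at import time if a new TicketPriority is added without a mapping
_missing = set(TicketPriority) - set(PRIORITY_MAP)
if _missing:
    raise RuntimeError(f"No Linear priority for: {sorted(p.value for p in _missing)}")
```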
```python
async def process_email(email: ParsedEmail) -> None:
    # ParsedEmail, PRIORITY_MAP, create_linear_issue and post_slack_alert
    # are part of the project but not shown here.
    triage = await triage_email(email.subject, email.body, email.sender)

    linear_issue = await create_linear_issue(
        title=triage.summary,
        description=email.body,
        priority=PRIORITY_MAP[triage.priority],
        label=triage.category.value,
        team=triage.suggested_team,
    )

    # Only page the channel for tickets the agent flagged as urgent
    if triage.needs_immediate_slack_alert:
        await post_slack_alert(
            channel="#critical-support",
            message=f"*Critical ticket created*\nFrom: {email.sender}\n"
            f"Issue: {triage.summary}\nLinear: {linear_issue.url}",
        )
```
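`post_slack_alert()` isn't shown in the article either. One way to write it is a POST to Slack's `chat.postMessage` Web API method with a bot token. This sketch assumes a `SLACK_BOT_TOKEN` environment variable (a hypothetical name), uses only the standard library, and pushes the blocking HTTP call onto a worker thread. Slack reports most errors as HTTP 200 with `"ok": false` in the body, so the sketch checks that field rather than relying on the status code.

```python
import asyncio
import json
import os
import urllib.request

SLACK_POST_MESSAGE_URL = "https://slack.com/api/chat.postMessage"


def build_slack_request(channel: str, message: str, token: str) -> urllib.request.Request:
    """Build a chat.postMessage request; no network I/O happens here."""
    payload = json.dumps({"channel": channel, "text": message}).encode("utf-8")
    return urllib.request.Request(
        SLACK_POST_MESSAGE_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def _send(request: urllib.request.Request) -> dict:
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


async def post_slack_alert(channel: str, message: str) -> None:
    # SLACK_BOT_TOKEN is an assumed environment variable name
    request = build_slack_request(channel, message, os.environ["SLACK_BOT_TOKEN"])
    # urllib is blocking, so run it off the event loop
    body = await asyncio.to_thread(_send, request)
    if not body.get("ok"):
        raise RuntimeError(f"Slack API error: {body.get('error')}")
```

Keeping the request construction separate from the send makes the payload easy to unit-test without touching the network.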
The gotcha worth knowing: Linear's GraphQL API requires you to fetch team IDs and label IDs before you can create issues. These IDs are workspace-specific and not human-readable. The production version caches these at startup rather than fetching them on every email, which matters when you're processing a burst of 20 emails after an incident.
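Here is a sketch of that startup cache. It assumes the lookup runs once against Linear's GraphQL endpoint (`https://api.linear.app/graphql`) using the `teams` and `issueLabels` queries, with results kept in dicts keyed by lowercased name. It also assumes your Linear teams and labels are named to match `suggested_team` and the category values. `LinearIdCache` and its methods are hypothetical, and the GraphQL field names should be checked against Linear's API reference before use.

```python
from dataclasses import dataclass, field

# Both queries return paginated connections; large workspaces may need to page
TEAMS_AND_LABELS_QUERY = """
query {
  teams { nodes { id key name } }
  issueLabels { nodes { id name } }
}
"""

ISSUE_CREATE_MUTATION = """
mutation IssueCreate($input: IssueCreateInput!) {
  issueCreate(input: $input) {
    success
    issue { id identifier url }
  }
}
"""


@dataclass
class LinearIdCache:
    # Human-readable names (lowercased) -> workspace-specific IDs
    team_ids: dict[str, str] = field(default_factory=dict)
    label_ids: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_query_result(cls, data: dict) -> "LinearIdCache":
        """Build the cache from the `data` object of TEAMS_AND_LABELS_QUERY."""
        return cls(
            team_ids={team["name"].lower(): team["id"] for team in data["teams"]["nodes"]},
            label_ids={label["name"].lower(): label["id"] for label in data["issueLabels"]["nodes"]},
        )

    def issue_create_variables(
        self, title: str, description: str, priority: int, label: str, team: str
    ) -> dict:
        """Translate names into IDs for the issueCreate mutation's variables."""
        return {
            "input": {
                "teamId": self.team_ids[team.lower()],
                "labelIds": [self.label_ids[label.lower()]],
                "title": title,
                "description": description,
                "priority": priority,
            }
        }
```

Building the cache once means a burst of emails costs one lookup in total rather than one per email. A `KeyError` from the cache is a useful early signal that a team or label was renamed in Linear.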
Tradeoffs and Limitations
This approach works well for teams with relatively consistent email volume and well-defined categories. It does not handle a few things cleanly:
Thread context is lost. Each email is processed independently. If a customer replies to an existing thread, the system will create a duplicate Linear issue rather than appending to the existing one. You need email threading logic (matching by subject or Message-ID header) to solve this, which adds meaningful complexity.
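A minimal sketch of the Message-ID approach uses only the standard library. The `References` header lists a thread's message IDs oldest-first, so its first entry identifies the thread root. `In-Reply-To` is the fallback, and a message with neither starts a new thread under its own `Message-ID`. `thread_key()`, `existing_issue_for()` and the in-memory dict are hypothetical stand-ins. A real deployment would persist the mapping, and some clients drop these headers, so subject matching may still be needed as a fallback.

```python
from email import message_from_string, policy
from email.message import EmailMessage
from typing import Optional


def thread_key(msg: EmailMessage) -> Optional[str]:
    """Return the Message-ID of the thread's root message, if it can be found."""
    references = str(msg.get("References", "")).split()
    if references:
        # References lists ancestors oldest-first, so [0] is the root
        return references[0]
    in_reply_to = str(msg.get("In-Reply-To", "")).strip()
    if in_reply_to:
        return in_reply_to
    message_id = str(msg.get("Message-ID", "")).strip()
    return message_id or None


# Hypothetical in-memory store: thread root Message-ID -> Linear issue ID
issue_for_thread: dict[str, str] = {}


def existing_issue_for(msg: EmailMessage) -> Optional[str]:
    """Return the Linear issue already tracking this thread, if any."""
    key = thread_key(msg)
    return issue_for_thread.get(key) if key else None
```

When a new issue is created, store it with `issue_for_thread[thread_key(msg)] = issue_id`. Later replies can then append a comment instead of opening a duplicate.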
LLM classification has a tail of errors. On roughly 3-5% of emails in testing, the category is wrong. Ambiguous emails ("Your tool deleted all my data but I also want to request a refund and ask about your enterprise plan") get assigned to whichever category the model prioritizes. You still want a human review queue for anything below HIGH priority.
IMAP polling is not ideal for high volume. If you're processing thousands of emails per day, you'll want to switch to Gmail's Pub/Sub push notifications or a proper email processing service. Polling every 60 seconds is fine for most support inboxes.
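If you do move to push notifications, the webhook receives a Pub/Sub push request rather than the email itself. The sketch below assumes the standard Pub/Sub push envelope, where `message.data` is base64-encoded, and Gmail's notification payload with `emailAddress` and `historyId`. From the history ID you would call the Gmail API's history listing to find which messages are new. `decode_gmail_push()` is a hypothetical name, and in the app it would be called from a FastAPI POST route.

```python
import base64
import json


def decode_gmail_push(envelope: dict) -> tuple[str, int]:
    """Extract (emailAddress, historyId) from a Pub/Sub push request body."""
    data = envelope["message"]["data"]
    # Pub/Sub base64-encodes the payload; Gmail's payload is a small JSON object
    notification = json.loads(base64.b64decode(data))
    # historyId may arrive as a string, so normalize it to an int
    return notification["emailAddress"], int(notification["historyId"])
```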
For very low email volume, this is probably over-engineered. A simple filter rule plus a Zapier workflow might be the right call.
Closing
This pipeline eliminated the morning triage ritual for the team that tested it. Engineers stopped starting their days by reading email. Critical tickets started landing in Slack within two minutes of arrival rather than hours later.
I packaged this as an open-source template you can deploy in an afternoon:
GitHub scaffold: https://github.com/Reactance0083/pydantic-ai-email-linear-auto-triage
The scaffold gives you the core architecture. The full production version with proper error handling, retry logic, email thread deduplication, test suite, and deployment config is available here:
Full production code: https://reactance0083.gumroad.com/l/dcror
If you've built something similar or run into different edge cases with LLM-based classification in production, I'd genuinely like to hear about it in the comments. Particularly curious whether anyone has solved the thread-matching problem cleanly.