Kartik Hirijaganer

I Built an AI Email Agent: Here’s What Nobody Told Me

“People hand you these single little messages that are no heavier than a river pebble. But it doesn’t take long until you have acquired a pile of pebbles that’s taller than you and heavier than you could ever hope to move.” — Merlin Mann

Multiple inboxes. Multiple accounts. A pile of notifications I kept meaning to unsubscribe from but never did.

By the time I actually checked my email each morning, I was already staring at 300+ unread messages, a mix of stuff that mattered, stuff that used to matter, and a graveyard of newsletters I’d forgotten I signed up for. It compounds fast.

Paid tools felt like overkill. Free ones didn’t fit how I actually worked. So I built my own, and honestly, with AI tools in the mix, the barrier to shipping is lower than ever.

I spent a few days mapping out the problem: which features actually mattered and what the architecture should look like.

What Briefed actually does
Briefed is a personal AI email agent. Every morning, it reads your Gmail and hands you back a brief. Not a notification, not a summary dump: a brief, like an executive assistant wrote it for you.

It does four things:

Classifies. Every email gets sorted into one of four buckets (must-read, good-to-read, ignore, or waste) against a classification rubric you own and can edit.

Summarizes. The must-read pile gets condensed. You read the gist, not the thread.

Clusters. Thirty newsletter rows become one digest entry. Your inbox stops looking like a fire hose.

Recommends. It scores your noisiest senders by volume, engagement rate, and wasted-email signals, then recommends which ones to unsubscribe from, with reasoning.
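To make the scoring step concrete, here is a minimal sketch of how volume, engagement rate, and waste signals could be combined into one number. The weights, the 0.6 threshold, the 50-email volume cap, and every name in it (`SenderStats`, `noise_score`, `recommend`) are illustrative assumptions, not values taken from Briefed.

```python
from dataclasses import dataclass


@dataclass
class SenderStats:
    sender: str
    received: int  # emails received in the scoring window
    opened: int    # emails the user actually opened
    wasted: int    # emails classified as "waste"


def noise_score(s: SenderStats, volume_cap: int = 50) -> float:
    """Return a 0..1 score; higher means a stronger unsubscribe candidate."""
    if s.received == 0:
        return 0.0
    volume = min(s.received / volume_cap, 1.0)  # saturates at volume_cap
    engagement = s.opened / s.received
    waste_rate = s.wasted / s.received
    # Hypothetical weights: volume 0.3, low engagement 0.4, waste 0.3.
    return 0.3 * volume + 0.4 * (1.0 - engagement) + 0.3 * waste_rate


def recommend(stats: list[SenderStats], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Return (sender, reasoning) pairs for senders at or above the threshold."""
    picks = []
    for s in sorted(stats, key=noise_score, reverse=True):
        score = noise_score(s)
        if score >= threshold:
            reason = (
                f"{s.received} emails, {s.opened} opened, "
                f"{s.wasted} marked waste (score {score:.2f})"
            )
            picks.append((s.sender, reason))
    return picks
```

The reasoning string matters as much as the score: a recommendation you can't explain is one the user won't trust.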

The constraint I never compromised on: it never acts without you. No auto-archiving. No auto-unsubscribing. Every destructive action requires explicit confirmation.
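One way to enforce that constraint in code is to make every destructive operation a pending action that refuses to run until it has been confirmed. This is a sketch under my own assumptions; `PendingAction` and `ConfirmationRequired` are hypothetical names, not Briefed's API.

```python
from dataclasses import dataclass
from typing import Callable


class ConfirmationRequired(Exception):
    """Raised when a destructive action is attempted without user approval."""


@dataclass
class PendingAction:
    description: str          # shown to the user when asking for confirmation
    run: Callable[[], str]    # the destructive operation itself
    confirmed: bool = False

    def confirm(self) -> None:
        # Called only from an explicit user interaction, never by the agent.
        self.confirmed = True

    def execute(self) -> str:
        if not self.confirmed:
            raise ConfirmationRequired(self.description)
        return self.run()
```

Usage: the agent can create `PendingAction("Unsubscribe from deals@shop.example", do_unsubscribe)`, but calling `execute()` before `confirm()` raises instead of acting.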

The thing nobody talks about with AI products
Here’s the truth about building production AI systems.

The LLM call is the easy part. This is the hard part:

  • 3 retries with exponential backoff and jitter, applied only to retryable errors
  • A circuit breaker that trips after 5 consecutive failures and fails fast instead of cascading
  • Per-model hard caps (100 Haiku calls/day; cost control is not optional)
  • A catalog-driven fallback chain: Gemini 2.5 Flash primary, Claude Haiku fallback, so models can be swapped without touching application code
  • Per-call cost and token logging so every LLM call is auditable
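The first two items in that list can be sketched in a few dozen lines. This is an illustration, not Briefed's implementation: `RetryableError`, the delay values, the 30-second cooldown, and the choice to count every failed attempt toward the breaker are all assumptions.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, 429s, 5xx)."""


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls fail fast."""


class CircuitBreaker:
    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold  # consecutive failures before tripping
        self.cooldown = cooldown    # seconds to stay open before a trial call
        self.failures = 0
        self.opened_at = None

    def before_call(self) -> None:
        if self.opened_at is None:
            return
        if time.monotonic() - self.opened_at < self.cooldown:
            raise CircuitOpenError("circuit open; failing fast")
        # Cooldown elapsed: let one trial call through (half-open).

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()


def call_with_retry(fn, breaker, retries=3, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call fn, retrying up to `retries` times on RetryableError only."""
    attempt = 0
    while True:
        breaker.before_call()
        try:
            result = fn()
        except RetryableError:
            breaker.record_failure()
            if attempt >= retries:
                raise
            # Full jitter: sleep a random amount up to the exponential cap.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
            attempt += 1
        else:
            breaker.record_success()
            return result
```

Any exception that is not a `RetryableError` propagates on the first attempt, which is the "applied only to retryable errors" part. The `sleep` parameter exists so tests can skip the real delays.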

The stuff nobody sees until it breaks.
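The caps, the fallback chain, and the cost logging fit naturally into one small routing function. What follows is a sketch under stated assumptions: the catalog keys are placeholder strings rather than real provider model IDs, the per-token prices are invented, `send` is a hypothetical client callable, and the daily reset of `calls_today` is left out.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("llm")


@dataclass
class ModelEntry:
    name: str
    daily_cap: Optional[int]   # None means uncapped (an assumption)
    usd_per_1k_tokens: float   # placeholder price, not a real rate
    calls_today: int = 0


# Catalog order is fallback order; swapping models means editing this list only.
CATALOG = [
    ModelEntry("gemini-2.5-flash", daily_cap=None, usd_per_1k_tokens=0.001),
    ModelEntry("claude-haiku", daily_cap=100, usd_per_1k_tokens=0.002),
]


def complete(prompt, send, catalog=CATALOG):
    """Try each model in catalog order.

    `send(model_name, prompt)` is a hypothetical client call that returns
    (text, tokens_used) or raises on failure.
    """
    for m in catalog:
        if m.daily_cap is not None and m.calls_today >= m.daily_cap:
            log.warning("model=%s skipped: daily cap reached", m.name)
            continue
        m.calls_today += 1  # failed attempts count against the cap too
        try:
            text, tokens = send(m.name, prompt)
        except Exception as exc:
            log.warning("model=%s failed: %s", m.name, exc)
            continue
        cost = tokens / 1000 * m.usd_per_1k_tokens
        log.info("model=%s tokens=%d cost_usd=%.6f", m.name, tokens, cost)
        return text
    raise RuntimeError("all models failed or are capped")
```

Application code only ever calls `complete(prompt, send)`; which model answered is visible in the log line, not in the call site.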

The security decision I’m most glad I made

Every user’s Gmail token and email content is encrypted at rest. Not just encrypted: envelope-encrypted, per row, with two customer-managed KMS keys.

The encryption context binds {user_id, table, row_id} to every operation. A leaked ciphertext can’t be replayed across rows or users.
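The effect of that binding is easy to demonstrate in a toy model. The sketch below is stdlib-only so it runs anywhere, and it is deliberately NOT production cryptography: a hand-rolled SHA-256 keystream and an HMAC tag stand in for AES-GCM, and a local `master_key` stands in for a KMS key that would never leave KMS. All function names are hypothetical.

```python
import hashlib
import hmac
import json
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _tag(key: bytes, context: dict, nonce: bytes, ciphertext: bytes) -> bytes:
    # Canonical JSON so the same context always yields the same bytes.
    aad = json.dumps(context, sort_keys=True).encode()
    mac_key = hashlib.sha256(b"mac" + key).digest()
    msg = len(aad).to_bytes(4, "big") + aad + nonce + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def seal(key: bytes, plaintext: bytes, context: dict) -> bytes:
    nonce = os.urandom(16)
    enc_key = hashlib.sha256(b"enc" + key).digest()
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, stream))
    return nonce + ct + _tag(key, context, nonce, ct)


def unseal(key: bytes, blob: bytes, context: dict) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, _tag(key, context, nonce, ct)):
        raise ValueError("wrong encryption context or tampered ciphertext")
    enc_key = hashlib.sha256(b"enc" + key).digest()
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, stream))


def encrypt_row(master_key: bytes, plaintext: bytes, context: dict):
    data_key = os.urandom(32)  # fresh data key per row (the "envelope")
    wrapped_key = seal(master_key, data_key, context)
    return wrapped_key, seal(data_key, plaintext, context)


def decrypt_row(master_key: bytes, wrapped_key: bytes, ciphertext: bytes,
                context: dict) -> bytes:
    data_key = unseal(master_key, wrapped_key, context)
    return unseal(data_key, ciphertext, context)
```

Encrypt with `{"user_id": "u1", "table": "emails", "row_id": "42"}` and try to decrypt with `row_id` set to `"43"`: `decrypt_row` raises `ValueError` before any plaintext is produced, which is the replay protection described above.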

This took extra days to implement correctly. It’s the kind of decision that feels over-engineered until the day you’re glad you made it.

Shipping for real
Briefed was designed and built solo. Backend, frontend, infrastructure, CI, and documentation. What surprised me wasn’t the complexity of any single piece; it was that every piece had to work together before any of it felt real.

It runs in production on AWS Lambda with SnapStart, behind CloudFront and AWS WAF, at roughly $8–11 a month, including two customer-managed KMS keys.

The stack:

  • Backend: FastAPI · Python 3.11 · Pydantic v2 · SQLAlchemy 2.0 async · Alembic
  • AI layer: OpenRouter → Gemini 2.5 Flash + Claude Haiku fallback
  • Frontend: React 18 · TypeScript · Vite
  • Data: Supabase Postgres with per-row envelope encryption
  • Infra: AWS Lambda · SQS · EventBridge Scheduler · CloudFront + WAF · Terraform

What I’d tell someone starting this today
Document your decisions before you regret them. Every time I revisited a choice, I asked why: Why Lambda over Fargate? Why OpenRouter? Why two KMS keys instead of one? If I hadn’t written the answers down, I’d be reverse-engineering my own thinking three months later.

Chaos drills before launch, not after. I drilled DLQ replay, KMS key revocation, secret rotation, and circuit breaker trips before calling the project done. Each drill uncovered at least one gap.

Try it yourself
🔗 Live demo: https://d2vki955e8ckrc.cloudfront.net/
📦 Full source: https://github.com/Kartik-Hirijaganer/Briefed

What’s the gap between your LLM prototype and a production system?