How I auto-triage 200 emails a day with Aider and Nylas

#aiagents #productivity #ai #tutorial

My inbox averages 200 messages a workday. Half are noise. A quarter need a fast acknowledgement. The remainder need real work. The split is mostly stable, so the triage rules are mostly stable, so it is a good fit for an LLM.

I wired Aider to it. Aider is an AI pair-programming CLI: it lives in the terminal, it can run shell commands, and it is itself written in Python. Pairing it with the Nylas CLI gives a triage pipeline that runs on my laptop in the background and surfaces only the messages I should personally read.

The triage rules

Three buckets:

Bucket	Action	What lands here
🔥 Action	Star, leave unread	Customer escalations, oncall pages, anything from my CEO
👀 Skim	Mark read	Newsletters, build notifications, "FYI"
🗑 Drop	Mark read, delete	Cold sales, recruiter spam, marketing

A modest LLM gets these right >95% of the time. Aider drives it.

The script

# /opt/triage/triage.py
import json
import re
import subprocess

def llm_classify(subject: str, snippet: str, sender: str) -> str:
    prompt = f"""Classify this email into one of: ACTION, SKIM, DROP.
ACTION: needs my response or attention soon.
SKIM: informational, can wait.
DROP: spam, recruiter, marketing, newsletter.

From: {sender}
Subject: {subject}
Snippet: {snippet}

Reply with only one word."""
    try:
        out = subprocess.run(
            ["aider", "--message", prompt, "--no-auto-commits", "--yes-always"],
            capture_output=True, text=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        # Not a bucket: main() takes no action, so the message stays unread
        # and is picked up again on the next run.
        return "RETRY"
    # aider may print status lines around the reply, so take the last label
    # found anywhere in the output instead of trusting the final word.
    found = re.findall(r"\b(ACTION|SKIM|DROP)\b", out.stdout.upper())
    return found[-1] if found else "SKIM"

def main():
    raw = subprocess.check_output(
        ["nylas", "email", "list", "--unread", "--limit", "50", "--json"]
    )
    msgs = json.loads(raw)
    for m in msgs:
        # Tolerate messages with a missing sender, subject, or snippet.
        senders = m.get("from") or [{}]
        sender = senders[0].get("email", "unknown")
        subject = m.get("subject") or "(no subject)"
        bucket = llm_classify(subject, m.get("snippet") or "", sender)
        if bucket == "ACTION":
            subprocess.run(["nylas", "email", "mark-starred", m["id"]])
        elif bucket == "SKIM":
            subprocess.run(["nylas", "email", "mark-read", m["id"]])
        elif bucket == "DROP":
            subprocess.run(["nylas", "email", "mark-read", m["id"]])
            subprocess.run(["nylas", "email", "delete", m["id"], "--yes"])
        print(f"{bucket}: {subject[:60]}")

if __name__ == "__main__":
    main()

A short Python file. Nothing clever.

Run it

# Manual trigger
python /opt/triage/triage.py

# Every 5 minutes
crontab -e
# Add:
*/5 * * * * /usr/bin/python3 /opt/triage/triage.py >> /var/log/triage.log 2>&1

The LLM call takes ~2 seconds per message; on a 50-message batch that is roughly 100 seconds. Cron's */5 is plenty of breathing room.
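One caveat on that arithmetic: with a 30-second timeout per call, a bad batch could in principle take 50 × 30 = 1,500 seconds, which is five cron intervals. Below is a minimal sketch of a guard that lets a new run bail out while the previous one is still going. It assumes a Unix-like system (it relies on `fcntl`), and the lock path and helper names are hypothetical.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; any path writable by the cron user works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "triage.lock")

def acquire_lock(path: str = LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle

def worst_case_seconds(batch_size: int = 50, timeout_s: int = 30) -> int:
    """Upper bound on one run if every LLM call hits its timeout."""
    return batch_size * timeout_s
```

Calling `acquire_lock()` at the top of `main()` and returning early when it yields `None` should keep runs from stacking up. Keep the returned handle in a variable for the whole run; the lock is released when the file is closed or the process exits.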

Why Aider specifically

Three reasons:

It treats prompts as commands. aider --message '...' is a one-liner. No SDK to import, no auth ceremony.
It is local and fast. I am calling it 200 times a day, which rules out browser-loop tools.
It is bring-your-own-key. I run it with Anthropic's Claude Sonnet for triage, and switched to Opus when I experimented with drafting replies.

If you prefer the OpenAI o1 model or a local Llama, swap the aider line for llm "..." (Simon Willison's tool), or directly call the API. The pipeline is provider-agnostic.
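Swapping the backend can be as small as changing the argument list handed to `subprocess.run`. Here is a sketch, assuming the `llm` CLI takes the prompt as a positional argument (the `llm "..."` form mentioned above). The `BACKENDS` table and both helpers are hypothetical names, not part of either tool.

```python
import subprocess

# Hypothetical table of backends; each entry turns a prompt into an argv list.
BACKENDS = {
    "aider": lambda prompt: ["aider", "--message", prompt,
                             "--no-auto-commits", "--yes-always"],
    "llm": lambda prompt: ["llm", prompt],
}

def build_command(backend: str, prompt: str) -> list[str]:
    """Return the argv for the chosen backend, or raise on an unknown name."""
    try:
        return BACKENDS[backend](prompt)
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None

def run_backend(backend: str, prompt: str, timeout: int = 30) -> str:
    """Run the backend CLI and return its raw stdout (the CLI must be installed)."""
    result = subprocess.run(build_command(backend, prompt),
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Keeping the command construction separate from the call means the classifier only needs a backend name, and the label parsing stays the same whichever tool answers.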

Why the Nylas CLI specifically

The CLI gives me the same surface across Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP — without writing six different SDK integrations. Adding a second account is one nylas auth login command. The script does not change.

It also exposes --json on every list command. That makes it pipeable into Python's json.loads without parsing prose. No HTML-stripping, no MIME decoding, just structured data.
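As a sketch of what that buys you, the snippet below pulls out the four fields the triage script reads. The sample payload is made up for illustration and only mirrors the keys the script uses (`id`, `subject`, `snippet`, and the first `from` entry's `email`); real output may carry more fields or differ in shape.

```python
import json

# Made-up sample shaped after the keys triage.py reads; not real CLI output.
SAMPLE = """[
  {"id": "msg-1", "subject": "Prod is down", "snippet": "Paging oncall",
   "from": [{"email": "alerts@example.com"}]},
  {"id": "msg-2", "subject": "Weekly digest", "from": []}
]"""

def extract_fields(raw: str) -> list[dict]:
    """Flatten each message to the fields the classifier needs, tolerating gaps."""
    rows = []
    for msg in json.loads(raw):
        # An empty or missing "from" list falls back to a placeholder sender.
        senders = msg.get("from") or [{}]
        rows.append({
            "id": msg["id"],
            "sender": senders[0].get("email", "unknown"),
            "subject": msg.get("subject") or "(no subject)",
            "snippet": msg.get("snippet") or "",
        })
    return rows

for row in extract_fields(SAMPLE):
    print(row["id"], row["sender"], row["subject"])
```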

What I get out of it

After 60 days running this on my main inbox:

Time saved per workday: ~35 minutes (estimated by stopwatch on a sample week)
False negatives (action emails wrongly archived): 4 in 60 days, all when subject lines were ambiguous ("Quick question" from a sender I had not seen before)
False positives (drop emails left in inbox): too many to count at first, mostly recruiter LinkedIn forwards. I tightened the prompt twice and they faded.
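Counts like these can be tallied from a small hand-labeled sample. Here is a sketch, assuming each message is recorded as a (predicted, actual) pair; the `tally` helper and the sample pairs are hypothetical and made up for illustration.

```python
from collections import Counter

def tally(pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Count outcomes for (predicted, actual) label pairs.

    missed_action: the message needed action but was filed elsewhere.
    noise_kept:    the message was droppable but was not labeled DROP.
    other_error:   any remaining mismatch (for example SKIM labeled ACTION).
    """
    counts = Counter()
    for predicted, actual in pairs:
        if predicted == actual:
            counts["correct"] += 1
        elif actual == "ACTION":
            counts["missed_action"] += 1
        elif actual == "DROP":
            counts["noise_kept"] += 1
        else:
            counts["other_error"] += 1
    keys = ("correct", "missed_action", "noise_kept", "other_error")
    return {key: counts[key] for key in keys}

# Made-up sample, not measured data.
sample = [
    ("ACTION", "ACTION"),
    ("SKIM", "ACTION"),
    ("SKIM", "DROP"),
    ("DROP", "DROP"),
    ("ACTION", "SKIM"),
]
print(tally(sample))
```

Tracking missed action mail separately from leftover noise matters because the two errors have very different costs.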

The wins compound: less time triaging means I read action mail sooner, which means faster replies, which means fewer follow-ups.

What it does not do

Draft replies: I tried. The replies sounded like me from a distance; up close they sounded like a chatbot. I removed it.
Schedule meetings: handled by calendar-schedule-ai, which is a separate command and worth its own writeup.
Deal with attachments: passes through untouched. I read attachments manually.

A starter prompt

Classify this email into one of: ACTION, SKIM, DROP.

ACTION = something I personally need to do or reply to within 24 hours.
SKIM = useful but not urgent.
DROP = spam, sales outreach, recruiters, marketing, automated build/deploy notifications.

If unsure, prefer ACTION (false positive is cheap; false negative is expensive).

Sender: {sender}
Subject: {subject}
First 200 chars: {snippet}

That "if unsure, prefer ACTION" instruction, which in effect lowers the confidence threshold for ACTION, is the most important tuning. Send too much to ACTION and you get noise; send too little and you miss escalations. The bias toward ACTION costs you 30 seconds of skimming. The other direction costs you a customer.
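The same bias can be carried into the parsing step, so that an unparseable reply also lands in ACTION rather than being filed away. Here is a sketch, assuming the starter prompt is stored as a template. `build_prompt` and `parse_label` are hypothetical helpers, and this fallback differs from the script earlier in the post, which falls back to SKIM.

```python
import re

# The starter prompt as a template; field names match its placeholders.
PROMPT_TEMPLATE = """Classify this email into one of: ACTION, SKIM, DROP.

ACTION = something I personally need to do or reply to within 24 hours.
SKIM = useful but not urgent.
DROP = spam, sales outreach, recruiters, marketing, automated build/deploy notifications.

If unsure, prefer ACTION (false positive is cheap; false negative is expensive).

Sender: {sender}
Subject: {subject}
First 200 chars: {snippet}"""

def build_prompt(sender: str, subject: str, snippet: str) -> str:
    """Fill the template, truncating the snippet to 200 characters."""
    return PROMPT_TEMPLATE.format(
        sender=sender, subject=subject, snippet=snippet[:200]
    )

def parse_label(reply: str) -> str:
    """Return the last label found in the reply, defaulting to ACTION."""
    found = re.findall(r"\b(ACTION|SKIM|DROP)\b", reply.upper())
    return found[-1] if found else "ACTION"
```

With this fallback, a garbled model reply costs a few seconds of skimming instead of a missed escalation.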