My inbox averages 200 messages a workday. Half are noise. A quarter need a fast acknowledgement. The remainder need real work. The split is mostly stable, so the triage rules are mostly stable, so it is a good fit for an LLM.
I wired Aider to it. Aider is the AI pair-programming CLI — it has a shell, it can call commands, and it speaks Python natively. Pairing it with the Nylas CLI gives a triage pipeline that runs on my laptop in the background and surfaces only the messages I should personally read.
The triage rules
Three buckets:
| Bucket | Action | What lands here |
|---|---|---|
| 🔥 Action | Star, leave unread | Customer escalations, oncall pages, anything from my CEO |
| 👀 Skim | Mark read, archive | Newsletters, build notifications, "FYI" |
| 🗑 Drop | Mark read, archive, mark spam if confident | Cold sales, recruiter spam, marketing |
A modest LLM gets these right >95% of the time. Aider drives it.
The script
```python
# /opt/triage/triage.py
import json
import subprocess

LABELS = ("ACTION", "SKIM", "DROP")


def llm_classify(subject: str, snippet: str, sender: str) -> str:
    """Ask the LLM (via Aider) for one of ACTION, SKIM, DROP."""
    prompt = f"""Classify this email into one of: ACTION, SKIM, DROP.
ACTION: needs my response or attention soon.
SKIM: informational, can wait.
DROP: spam, recruiter, marketing, newsletter.
From: {sender}
Subject: {subject}
Snippet: {snippet}
Reply with only one word."""
    try:
        out = subprocess.run(
            ["aider", "--message", prompt, "--no-auto-commits", "--yes-always"],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        # A hung call gets the same fallback as an unparseable reply.
        return "SKIM"
    # Guard against empty output before taking the last word.
    words = out.stdout.split()
    label = words[-1].upper() if words else ""
    return label if label in LABELS else "SKIM"


def main():
    raw = subprocess.check_output(
        ["nylas", "email", "list", "--unread", "--limit", "50", "--json"]
    )
    msgs = json.loads(raw)
    for m in msgs:
        sender = m["from"][0]["email"]
        subject = m.get("subject") or ""  # some messages have no subject
        bucket = llm_classify(subject, m.get("snippet", ""), sender)
        if bucket == "ACTION":
            # Star it and leave it unread.
            subprocess.run(["nylas", "email", "mark-starred", m["id"]])
        elif bucket == "SKIM":
            subprocess.run(["nylas", "email", "mark-read", m["id"]])
        elif bucket == "DROP":
            # Mark read, then delete without a confirmation prompt.
            subprocess.run(["nylas", "email", "mark-read", m["id"]])
            subprocess.run(["nylas", "email", "delete", m["id"], "--yes"])
        print(f"{bucket}: {subject[:60]}")


if __name__ == "__main__":
    main()
```
Roughly 50 lines of Python. Nothing clever.
Run it
```bash
# Manual trigger
python /opt/triage/triage.py

# Every 5 minutes
crontab -e
# Add:
*/5 * * * * /usr/bin/python3 /opt/triage/triage.py >> /var/log/triage.log 2>&1
```
The LLM call takes ~2 seconds per message; on a 50-message batch that is roughly 100 seconds. Cron's */5 is plenty of breathing room.
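The typical case fits comfortably, but the worst case is worth checking as well. Below is a minimal sketch, assuming every call in a batch could hit the script's 30-second timeout. The helper names and the lock path are hypothetical, and `fcntl` is POSIX-only, so this is an illustration rather than part of the original script.

```python
import fcntl
from contextlib import contextmanager

BATCH_SIZE = 50          # --limit passed to the list command
TYPICAL_SECONDS = 2      # observed per-message latency
TIMEOUT_SECONDS = 30     # per-call timeout in llm_classify
CRON_INTERVAL = 5 * 60   # */5 expressed in seconds


def batch_seconds(per_message: float, batch_size: int = BATCH_SIZE) -> float:
    """Duration of a sequential batch: one LLM call per message."""
    return per_message * batch_size


@contextmanager
def single_instance(lock_path: str = "/tmp/triage.lock"):
    """Yield True if this process holds the lock, False if another run does."""
    with open(lock_path, "w") as handle:
        try:
            # Non-blocking exclusive lock; fails at once if already held.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


if __name__ == "__main__":
    print(batch_seconds(TYPICAL_SECONDS))   # 100, well under CRON_INTERVAL
    print(batch_seconds(TIMEOUT_SECONDS))   # 1500, longer than CRON_INTERVAL
    with single_instance() as acquired:
        if acquired:
            pass  # call main() from triage.py here
```

If every call timed out, a batch would outlast the cron interval and two runs could overlap on the same unread messages. Skipping a run when the lock is already held is one way to avoid that.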
Why Aider specifically
Three reasons:
- It treats prompts as commands. aider --message '...' is a one-liner. No SDK to import, no auth ceremony.
- It is local and fast. I am calling it 200 times a day, which rules out browser-loop tools.
- It is bring-your-own-key. I run it with Anthropic's Claude Sonnet for triage and switch to Opus when I want it to draft replies.
If you prefer the OpenAI o1 model or a local Llama, swap the aider line for llm "..." (Simon Willison's tool), or directly call the API. The pipeline is provider-agnostic.
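One way to keep that swap to a single line is to isolate the command that gets built. The sketch below is not the script above. The `BACKENDS` table and `classify_with` helper are hypothetical names, the `llm` entry assumes the tool takes the prompt as its first positional argument, and the runner is injectable so the example can be exercised without either CLI installed.

```python
import subprocess
from typing import Callable, Sequence

VALID = ("ACTION", "SKIM", "DROP")

# Hypothetical table: backend name -> function that builds the argv for a prompt.
BACKENDS: dict[str, Callable[[str], list[str]]] = {
    "aider": lambda p: ["aider", "--message", p, "--no-auto-commits", "--yes-always"],
    "llm": lambda p: ["llm", p],
}


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return whatever it printed to stdout."""
    return subprocess.run(argv, capture_output=True, text=True, timeout=30).stdout


def classify_with(backend: str, prompt: str,
                  runner: Callable[[Sequence[str]], str] = run_cli) -> str:
    """Classify via the named backend; fall back to SKIM like the original script."""
    words = runner(BACKENDS[backend](prompt)).split()
    label = words[-1].upper() if words else ""
    return label if label in VALID else "SKIM"
```

Calling `classify_with("llm", prompt)` instead of `classify_with("aider", prompt)` is then the whole change; the rest of the pipeline never sees which provider answered.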
Why the Nylas CLI specifically
The CLI gives me the same surface across Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP — without writing six different SDK integrations. Adding a second account is one nylas auth login command. The script does not change.
It also exposes --json on every list command. That makes it pipeable into Python's json.loads without parsing prose. No HTML-stripping, no MIME decoding, just structured data.
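To illustrate why structured output helps, here is a small sketch that turns the list output into typed records. The sample payload is made up, and its shape (`id`, `subject`, `snippet`, and a `from` list whose entries have an `email` key) is an assumption inferred from the fields the script reads, not taken from the CLI's documented schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Message:
    id: str
    sender: str
    subject: str
    snippet: str


def parse_messages(raw: str) -> list[Message]:
    """Convert the JSON array printed by the list command into Message records."""
    records = []
    for item in json.loads(raw):
        # Tolerate a missing or empty sender list instead of raising.
        senders = item.get("from") or [{}]
        records.append(Message(
            id=item["id"],
            sender=senders[0].get("email", ""),
            subject=item.get("subject") or "",
            snippet=item.get("snippet") or "",
        ))
    return records


# Made-up sample in the assumed shape.
SAMPLE = '[{"id": "m1", "from": [{"email": "ceo@example.com"}], "subject": "Q3 plan"}]'

if __name__ == "__main__":
    for msg in parse_messages(SAMPLE):
        print(msg.sender, "|", msg.subject)
```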
What I get out of it
After 60 days running this on my main inbox:
- Time saved per workday: ~35 minutes (estimated by stopwatch on a sample week)
- False negatives (action emails wrongly archived): 4 in 60 days, all when subject lines were ambiguous ("Quick question" from a sender I had not seen before)
- False positives (drop emails left in inbox): too many to count, mostly recruiter LinkedIn forwards. I tightened the prompt twice and they faded.
The wins compound: less time triaging means I read action mail sooner, which means faster replies, which means fewer follow-ups.
What it does not do
- Draft replies: I tried. From a distance the replies sounded like me; up close they sounded like a chatbot. I removed it.
- Schedule meetings: handled by calendar-schedule-ai which is a separate command and worth its own writeup.
- Deal with attachments: passes through untouched. I read attachments manually.
A starter prompt
```text
Classify this email into one of: ACTION, SKIM, DROP.
ACTION = something I personally need to do or reply to within 24 hours.
SKIM = useful but not urgent.
DROP = spam, sales outreach, recruiters, marketing, automated build/deploy notifications.
If unsure, prefer ACTION (false positive is cheap; false negative is expensive).
Sender: {sender}
Subject: {subject}
First 200 chars: {snippet}
```
That last instruction (in effect, a confidence threshold for ACTION lower than 95%) is the most important tuning. Send too much to ACTION and you get noise; send too little and you miss escalations. A bias toward ACTION costs you 30 seconds of skimming. The other direction costs you a customer.
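The same bias can be applied in code when the model's reply cannot be parsed. Here is a minimal sketch, assuming the reply arrives as plain text. `parse_label` is a hypothetical helper, and it defaults to ACTION, whereas the script earlier in this post falls back to SKIM.

```python
VALID_LABELS = ("ACTION", "SKIM", "DROP")


def parse_label(reply: str, default: str = "ACTION") -> str:
    """Return the label in the model's reply, or `default` if none is recognisable.

    Only the last whitespace-separated word is considered, with surrounding
    punctuation stripped, so "Answer: skim." still parses as SKIM.
    """
    words = reply.split()
    if not words:
        return default
    candidate = words[-1].strip(".,:;!?\"'*`").upper()
    return candidate if candidate in VALID_LABELS else default


if __name__ == "__main__":
    for reply in ("DROP", "Answer: skim.", "", "I am not sure"):
        print(repr(reply), "->", parse_label(reply))
```

With this default, a timeout or a rambling reply leaves the message starred and unread rather than quietly marked as read, which matches the cost asymmetry described above.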
Next steps
- Build an AI email triage agent in Python — full reference implementation
- Build an LLM agent with email and calendar tools — broader agent surface
- Why AI agents need email — the case for agent inboxes
- Full command reference