Qasim Muhammad

How I auto-triage 200 emails a day with Aider and Nylas

My inbox averages 200 messages a workday. Half are noise. A quarter need a fast acknowledgement. The remainder need real work. The split is mostly stable, so the triage rules are mostly stable, so it is a good fit for an LLM.

I wired Aider to it. Aider is an AI pair-programming CLI: it runs in the terminal, it can call commands, and it speaks Python natively. Pairing it with the Nylas CLI gives a triage pipeline that runs on my laptop in the background and surfaces only the messages I should personally read.

The triage rules

Three buckets:

| Bucket | Action | What lands here |
| --- | --- | --- |
| 🔥 Action | Star, leave unread | Customer escalations, oncall pages, anything from my CEO |
| 👀 Skim | Mark read, archive | Newsletters, build notifications, "FYI" |
| 🗑 Drop | Mark read, archive, mark spam if confident | Cold sales, recruiter spam, marketing |

A modest LLM gets these right >95% of the time. Aider drives it.

The script

# /opt/triage/triage.py
import json
import subprocess

def llm_classify(subject: str, snippet: str, sender: str) -> str:
    prompt = f"""Classify this email into one of: ACTION, SKIM, DROP.
ACTION: needs my response or attention soon.
SKIM: informational, can wait.
DROP: spam, recruiter, marketing, newsletter.

From: {sender}
Subject: {subject}
Snippet: {snippet}

Reply with only one word."""
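    try:
        out = subprocess.run(
            ["aider", "--message", prompt, "--no-auto-commits", "--yes-always"],
            capture_output=True, text=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        return "SKIM"  # a hung call should not crash the whole batch
    # Aider may print banner or usage lines around the reply, so the final
    # word of stdout is not reliable. Scan backwards for the last valid label.
    for token in reversed(out.stdout.upper().split()):
        word = token.strip(".,:;!\"'`*")
        if word in ("ACTION", "SKIM", "DROP"):
            return word
    return "SKIM"  # empty or unparseable output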

def main():
    raw = subprocess.check_output(
        ["nylas", "email", "list", "--unread", "--limit", "50", "--json"]
    )
    msgs = json.loads(raw)
    for m in msgs:
        # Tolerate messages with a missing or empty sender list.
        sender = (m.get("from") or [{}])[0].get("email", "unknown")
        bucket = llm_classify(m["subject"], m.get("snippet", ""), sender)
        if bucket == "ACTION":
            subprocess.run(["nylas", "email", "mark-starred", m["id"]])
        elif bucket == "SKIM":
            subprocess.run(["nylas", "email", "mark-read", m["id"]])
        elif bucket == "DROP":
            subprocess.run(["nylas", "email", "mark-read", m["id"]])
            subprocess.run(["nylas", "email", "delete", m["id"], "--yes"])
        print(f"{bucket}: {m['subject'][:60]}")

if __name__ == "__main__":
    main()

About 40 lines of Python. Nothing clever.

Run it

# Manual trigger
python /opt/triage/triage.py

# Every 5 minutes
crontab -e
# Add:
*/5 * * * * /usr/bin/python3 /opt/triage/triage.py >> /var/log/triage.log 2>&1

The LLM call takes ~2 seconds per message; on a 50-message batch that is roughly 100 seconds. Cron's */5 is plenty of breathing room.
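If a batch ever runs past five minutes (a slow API, a bigger backlog), cron will start a second copy while the first is still working. One way to guard against that, sketched here assuming a Unix-like system (`fcntl` is not available on Windows) and a hypothetical lock path, is a non-blocking file lock taken at startup:

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; any path writable by the cron user works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "triage.lock")

def acquire_lock(path: str = LOCK_PATH):
    """Return an open, locked file handle, or None if another run holds the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle

def batch_seconds(messages: int, seconds_per_message: float) -> float:
    """Rough batch duration: one sequential LLM call per message."""
    return messages * seconds_per_message

if __name__ == "__main__":
    lock = acquire_lock()
    if lock is None:
        print("previous triage run still active, skipping")
    else:
        # 50 messages at about 2 s each, against cron's 300 s interval.
        print(f"estimated batch time: {batch_seconds(50, 2):.0f}s of a 300s window")
```

The lock is released when the process exits or the handle is closed, so a crashed run does not block the next one.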

Why Aider specifically

Three reasons:

  1. It treats prompts as commands. aider --message '...' is a one-liner. No SDK to import, no auth ceremony.
  2. It runs locally and it is fast. I am calling it 200 times a day, which rules out browser-loop tools.
  3. It is bring-your-own-key. I run it with Anthropic's Sonnet for triage and switch to Opus when I want it to draft replies.

If you prefer OpenAI's o1 model or a local Llama, swap the aider line for llm "..." (Simon Willison's tool), or call the API directly. The pipeline is provider-agnostic.
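To make the swap concrete, here is a minimal sketch that hides the provider behind one argv-building function. The aider flags are the ones from the script; `llm PROMPT` is the basic invocation of Simon Willison's tool. `BACKENDS`, `build_command`, `parse_label`, and `classify` are illustrative names, not part of either tool.

```python
import subprocess

LABELS = ("ACTION", "SKIM", "DROP")

# Each backend turns a prompt into the argv that sends it to a model.
BACKENDS = {
    "aider": lambda prompt: ["aider", "--message", prompt,
                             "--no-auto-commits", "--yes-always"],
    "llm": lambda prompt: ["llm", prompt],
}

def build_command(backend: str, prompt: str) -> list[str]:
    """Look up the backend and build its command line."""
    return BACKENDS[backend](prompt)

def parse_label(output: str, default: str = "SKIM") -> str:
    """Return the last valid label found in the tool's output, else default."""
    for token in reversed(output.upper().split()):
        word = token.strip(".,:;!\"'`*")
        if word in LABELS:
            return word
    return default

def classify(prompt: str, backend: str = "aider") -> str:
    """Run the chosen backend and map its output to a bucket."""
    try:
        out = subprocess.run(build_command(backend, prompt),
                             capture_output=True, text=True, timeout=30)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return "SKIM"  # tool missing or hung: fall back to the middle bucket
    return parse_label(out.stdout)
```

Because the parsing lives in `parse_label`, it can be unit-tested without calling any model, and adding a provider is one more entry in `BACKENDS`.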

Why the Nylas CLI specifically

The CLI gives me the same surface across Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP, without writing six different SDK integrations. Adding a second account is one nylas auth login command. The script does not change.

It also exposes --json on every list command. That makes it pipeable into Python's json.loads without parsing prose. No HTML-stripping, no MIME decoding, just structured data.
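As an illustration, the sketch below parses a made-up payload and pulls out the three fields the script uses. The field names (`id`, `subject`, `snippet`, `from[0].email`) are inferred from the script above, not taken from the Nylas documentation, and the sample values are invented.

```python
import json

# Invented sample in the shape the script assumes; real output may differ.
SAMPLE = """[
  {"id": "msg_1", "subject": "Prod is down", "snippet": "Paging you now",
   "from": [{"name": "Oncall", "email": "oncall@example.com"}]},
  {"id": "msg_2", "subject": "Weekly digest", "from": []}
]"""

def extract_fields(message: dict) -> tuple[str, str, str]:
    """Return (sender, subject, snippet), tolerating missing keys."""
    senders = message.get("from") or [{}]
    sender = senders[0].get("email", "unknown")
    return sender, message.get("subject", ""), message.get("snippet", "")

for msg in json.loads(SAMPLE):
    print(msg["id"], extract_fields(msg))
```

Using `.get` with defaults keeps one malformed message from aborting the whole batch.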

What I get out of it

After 60 days running this on my main inbox:

  • Time saved per workday: ~35 minutes (estimated by stopwatch on a sample week)
  • False negatives (action emails wrongly archived): 4 in 60 days, all when subject lines were ambiguous ("Quick question" from a sender I had not seen before)
  • False positives (drop emails left in inbox): too many to count at first, mostly recruiter LinkedIn forwards. I tightened the prompt twice and they faded.

The wins compound: less time triaging means I read action mail sooner, which means faster replies, which means fewer follow-ups.
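One way to keep counts like the two above is to hand-label a sample and compare it with what the model chose. A minimal sketch follows; `triage_errors` is an illustrative name and the sample pairs are invented, not real data from this inbox.

```python
from collections import Counter

def triage_errors(pairs):
    """Count the two costly mistakes from (predicted, actual) label pairs."""
    counts = Counter()
    for predicted, actual in pairs:
        if actual == "ACTION" and predicted != "ACTION":
            counts["action_missed"] += 1   # important mail that got buried
        elif actual == "DROP" and predicted != "DROP":
            counts["drop_kept"] += 1       # noise that stayed in the inbox
        else:
            counts["other"] += 1           # correct, or a harmless mix-up
    return counts

# Invented sample: (model label, hand label)
sample = [
    ("ACTION", "ACTION"),
    ("SKIM", "ACTION"),
    ("SKIM", "DROP"),
    ("DROP", "DROP"),
    ("ACTION", "DROP"),
]
print(dict(triage_errors(sample)))
```

Tracking the two error types separately matters because they have very different costs: a missed action mail is far worse than a recruiter email that lingers.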

What it does not do

  • Draft replies: I tried. From a distance the replies sounded like me; up close they sounded like a chatbot. I removed it.
  • Schedule meetings: handled by calendar-schedule-ai, which is a separate command and worth its own writeup.
  • Deal with attachments: passes through untouched. I read attachments manually.

A starter prompt

Classify this email into one of: ACTION, SKIM, DROP.

ACTION = something I personally need to do or reply to within 24 hours.
SKIM = useful but not urgent.
DROP = spam, sales outreach, recruiters, marketing, automated build/deploy notifications.

If unsure, prefer ACTION (false positive is cheap; false negative is expensive).

Sender: {sender}
Subject: {subject}
First 200 chars: {snippet}

The "If unsure, prefer ACTION" instruction (a lower confidence bar for ACTION) is the most important tuning. Send too much to ACTION and you get noise; send too little and you miss escalations. The bias toward ACTION costs you 30 seconds of skimming. The other direction costs you a customer.
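A minimal sketch of filling that template from Python, assuming the same three fields the script passes in. `STARTER_PROMPT` and `build_prompt` are illustrative names. The closing "Reply with only one word." line is borrowed from the script's prompt so the output stays easy to parse, and the snippet is cut to 200 characters to match the template's label.

```python
STARTER_PROMPT = """Classify this email into one of: ACTION, SKIM, DROP.

ACTION = something I personally need to do or reply to within 24 hours.
SKIM = useful but not urgent.
DROP = spam, sales outreach, recruiters, marketing, automated build/deploy notifications.

If unsure, prefer ACTION (false positive is cheap; false negative is expensive).

Sender: {sender}
Subject: {subject}
First 200 chars: {snippet}

Reply with only one word."""

def build_prompt(sender: str, subject: str, snippet: str) -> str:
    """Fill the template; only the template is parsed for {placeholders},
    so braces inside the email text are passed through untouched."""
    return STARTER_PROMPT.format(
        sender=sender,
        subject=subject,
        snippet=snippet[:200],  # keep the prompt short and the cost predictable
    )

print(build_prompt("ceo@example.com", "Quick question", "Do you have five minutes?"))
```

Keeping the template in one constant means prompt tweaks, like the two tightenings mentioned earlier, are a one-place change.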
