DEV Community

Matheus D. Santos
n8n daily email insight generator

n8n and Bright Data Challenge: Unstoppable Workflow

n8n Daily Email Insight Generator — Real-Time Triage & Bright Data Enrichment

Submission for the AI Agents Challenge powered by n8n and Bright Data
Owner: Matheus Santos — Version 0.2
Timeline: Aug 24, 2025 → Aug 31, 2025
Status: Completed (prototype)


Overview

I built an n8n workflow that automatically ingests new emails, classifies and prioritizes them with an LLM (Gemini), enriches results using Bright Data Marketplace datasets, and writes concise, actionable entries into a Google Sheet. The workflow runs on a schedule (three times/day) and produces an “email insights” sheet that teams can use for triage, tracking, and follow-up.


What I built

A production-style automation that:

  • pulls emails via IMAP,
  • cleans and normalizes message payloads,
  • classifies topic & priority with a chat model,
  • enriches classification with Bright Data Marketplace dataset lookups,
  • merges and finalizes the enriched output with an LLM pass,
  • and writes tasks, dataset evidence, and actions into Google Sheets (create/update/append).

Key requirement satisfied: the workflow uses n8n’s AI/Chat model nodes and the Bright Data verified node, both as part of the enrichment toolchain.


Live demo & workflow

If you want to try it quickly: import the workflow JSON into your n8n instance, configure IMAP, Bright Data, Gemini, and Google Sheets credentials, then run a test email.


Why this is useful

  • Saves time by turning inbox noise into action items and evidence.
  • Surfaces business-critical mail (investment, billing, client requests) with a consistent triage policy.
  • Provides provenance: Bright Data dataset matches are logged so you can verify sender/company context.
  • Easily extensible: add more enrichment sources, task integrations (Asana/Trello), or dashboards.

Architecture (high level)

```
[Email Provider/IMAP Trigger]
        ↓
[Email Cleaning Function]  (extract snippet, from, to, date, subject)
        ↓
[Chat Model: Classify (Gemini)] → (validate & clean)
        ↓
[Branch A: Prepare Bright Data queries]
        ↓
[Bright Data: List Datasets / Marketplace Lookups] → [Clean results]
        ↓
[Merge: classification + enrichment]
        ↓
[Chat Model: Final enrichment / summarize (Gemini) — may call Bright Data as a tool]
        ↓
[Final Cleaning function]
        ↓
[Chat Model → Google Sheets op generator]
        ↓
[Google Sheets: Tasks, DatasetMatches, EmailActions]
```

Technical implementation

  1. IMAP Email Trigger — watches inbox for new messages.
  2. Function node — Email Content Cleaning — normalizes the raw payload to { content, From, To, subject, dates }.
  3. Chat model (Gemini) — Categorize — prompt classifies the email into one of the categories and assigns priority (High / Medium / Low). Output is strict JSON (so downstream parsing is reliable).
  4. Output cleaning — function removes fences/extra characters from model output.
  5. Decision & enrichment prep — an LLM checks whether the classification has enough signals; if not, it suggests Bright Data queries (domains, keywords).
  6. Bright Data — Marketplace (list datasets) — uses suggested queries to retrieve datasets to validate or enrich classification.
  7. Enrichment cleaning — function strips unnecessary fields (IDs) and normalizes dataset objects.
  8. Merge — combines classification JSON with dataset results.
  9. Final summarization (Gemini) — produces final summary, recommended actions, and Google Sheets tasks.
  10. Google Sheets operations — a final chat model node or function maps final JSON to append/update rows in Tasks, DatasetMatches, and EmailActions.
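Step 2’s cleaning logic can be sketched as an n8n Function/Code node. The output shape matches the { content, From, To, subject, dates } object above; the raw IMAP field names (textPlain, textHtml, from, to, date) are assumptions and may differ in your instance:

```javascript
// Sketch of the Email Content Cleaning node: normalize a raw IMAP
// item into the compact shape fed to the classification prompt.
// Raw field names are assumptions about the IMAP node's output.
function cleanEmail(raw) {
  const text = (raw.textPlain || raw.textHtml || "")
    .replace(/<[^>]+>/g, " ")      // strip leftover HTML tags
    .replace(/\s+/g, " ")          // collapse runs of whitespace
    .trim();
  return {
    content: text.slice(0, 2000),  // cap length to keep prompts small
    From: raw.from || "",
    To: raw.to || "",
    subject: raw.subject || "",
    dates: raw.date || "",
  };
}

// Inside an n8n Code node this would be applied per item, e.g.:
// return items.map(i => ({ json: cleanEmail(i.json) }));
```

Capping the snippet length also helps with the privacy and cost points discussed later: only a short excerpt ever leaves your inbox.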

Example of what the workflow writes (example JSON)

```json
{
  "category": "Investment",
  "priority": "High",
  "confidence": 0.92,
  "summary": "OLITEF 2025 registration closing soon — investment education program from Tesouro Direto; consider applying before deadline.",
  "reasons": ["sender domain: tesourodireto.com", "contains 'Inscrições' and deadline-like phrase"],
  "suggested_enrichments": [
    { "type": "domain_lookup", "query": "tesourodireto.com", "purpose": "verify official sender" },
    { "type": "web_search", "query": "OLITEF 2025 Tesouro Direto inscrições", "purpose": "confirm deadline" }
  ],
  "actions": [
    { "type":"label","detail":"Investment,High","urgency":"immediate" },
    { "type":"create_task","detail":"Review OLITEF registration page","urgency":"routine" }
  ]
}
```
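The final write step (step 10 above) essentially flattens JSON like this into sheet rows. A minimal sketch; the column order is an assumption and should match your actual sheet headers:

```javascript
// Sketch: map the final enriched JSON into row arrays for the
// Google Sheets append operations. Column order is hypothetical;
// adjust it to match the headers in your Tasks/EmailActions sheets.
function toSheetRows(result) {
  const taskRows = result.actions
    .filter(a => a.type === "create_task")
    .map(a => [result.category, result.priority, a.detail, a.urgency]);
  const actionRows = result.actions.map(a => [
    a.type, a.detail, a.urgency, result.summary,
  ]);
  return { taskRows, actionRows };
}
```

In the workflow itself this mapping is produced by a chat model node, but a deterministic function like this is a cheaper and more predictable alternative once the JSON schema is stable.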

Bright Data usage (what I did)

  • Marketplace / list datasets — query for company profiles, news, and web snapshots that match inferred domains/keywords from emails.
  • Tool usage inside LLM — the LLM is allowed to request Bright Data as a verification tool for low-confidence or Investment-class emails.
  • Why it matters: Bright Data adds real-time web signals (company pages, public announcements) to reduce false positives and raise model confidence.

How to reproduce / run locally

  1. Clone repo and import /workflows/n8n Email categorize automation.json.
  2. Add credentials: IMAP, Gemini (or other LLM), Bright Data (verified node), Google Sheets, Slack (optional).
  3. Adjust Priority Rules (JSON/Set node) to include your high-priority senders & keywords.
  4. Run the workflow manually or enable schedule (every 8 hours → 3×/day) and test with sample messages.
  5. Export workflow JSON & Gist when submitting.
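Step 3 refers to a Priority Rules node; a hypothetical shape for that JSON (the senders and keywords below are placeholders, not values from the actual workflow):

```json
{
  "highPrioritySenders": ["billing@yourbank.com", "*@importantclient.com"],
  "highPriorityKeywords": ["invoice", "deadline", "contract", "investment"],
  "defaultPriority": "Low"
}
```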

Challenges & learnings

  • Tuning prompts is critical — very large prompts or mixed content sometimes returned noisy output; I solved that by enforcing strict JSON outputs and adding cleaning nodes.
  • Bright Data Marketplace required trial & exploration — choosing the right dataset filters matters (domain-first approach helps).
  • This challenge was my first time using both n8n and Bright Data, and I couldn't be more impressed with how powerful these tools are together. One hurdle was the Chat Model nodes: when I asked for too much or sent an overly large prompt, they didn't return useful output. Another was learning Bright Data itself, since I had never used it before, I didn't know where to start. :) I'm really happy with the project; there are of course things to improve, but it's a great start. I hope it helps someone else, and I'd love to talk about it, feel free to reach out!
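The "strict JSON outputs plus cleaning nodes" fix mentioned above can be as small as one helper. This sketch (my own illustration, not the exact node from the workflow) strips markdown fences and surrounding prose before parsing:

```javascript
// Sketch: make LLM output safe to JSON.parse by removing markdown
// code fences and any stray text around the JSON object.
function parseModelJson(text) {
  // Drop ```json ... ``` fences if the model added them.
  const unfenced = text.replace(/```(?:json)?/gi, "").trim();
  // Fall back to the first {...} span if prose surrounds the object.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end === -1) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(unfenced.slice(start, end + 1));
}
```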

Links & contact

  • Repo: repository
  • Contact: Matheus Santos — feel free to reach out for questions or collaboration!

Top comments (2)

BernerT

How do you manage privacy and cost when enriching emails with Bright Data and running multiple LLM passes? Next, consider a post on productionizing this—secrets management, retries/monitoring, and an Asana/Trello integration walkthrough.

Matheus D. Santos

I tried to manage cost and privacy by minimizing what I send to external services (only small snippets and metadata), caching enrichment results, and routing only low-confidence or investment-related emails to the more expensive LLM + Bright Data enrichment. For rate limits, options include n8n patterns (loop + Wait node, backoff, and batching) and Bright Data’s spend/usage controls. In production, I plan to use secrets management (n8n credentials plus an external vault), centralized logging/alerts, and retries with exponential backoff. Thanks for the suggestion!