Srija Vuppala

Posted on Mar 29

I Built an AI Agent That Monitors the Internet for Buying Signals and Sends Personalized Emails — Fully Automated

#agents #ai #automation #showdev

From "company just raised funding" to "personalized cold email sent" — zero manual steps.

If you've ever worked in sales or growth, you know the grind: spend hours on Google looking for companies that might be buying, find a contact, guess their email, write a cold email, follow up three times, repeat. It's 70% research and 30% actual selling.

I wanted to eliminate the 70%.

This is my second AI agent project. My first was a standalone outreach agent. This one goes further — it combines a full intent-based lead generation pipeline with a personalized outreach automation system, and wraps everything in a Discord bot so you can manage campaigns without touching a terminal.

Here's exactly how it works, what I built, and what I learned.

What It Does (The Big Picture)

The agent runs a 6-stage pipeline:

Monitor the internet for PropTech buying signals
Classify each signal using an LLM and score purchase intent
Find the right decision-maker at the company
Discover their email address
Generate a personalized 4-email outreach sequence
Send and manage the campaign via a Discord bot

Everything is written to Excel, Google Sheets, and SQLite in real time. A daily scheduler runs the whole pipeline automatically at 9AM IST.
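
To make the daily trigger concrete, here is a minimal sketch of how the next 9AM run could be computed. It assumes the schedule is in IST (as noted in the project structure) and uses a fixed UTC+05:30 offset; `next_run` is a hypothetical helper, not code from the repo.

```python
from datetime import datetime, time, timedelta, timezone

# IST is UTC+05:30 with no daylight saving, so a fixed offset is enough here.
IST = timezone(timedelta(hours=5, minutes=30))


def next_run(now: datetime, run_at: time = time(9, 0)) -> datetime:
    """Return the first 9AM IST strictly after `now` (`now` must be timezone-aware)."""
    local_now = now.astimezone(IST)
    candidate = datetime.combine(local_now.date(), run_at, tzinfo=IST)
    if candidate <= local_now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

A scheduler loop could sleep for `(next_run(now) - now).total_seconds()` and then start the pipeline.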

Stage 1 — Signal Monitoring

The agent monitors three types of sources using RSS feeds parsed with feedparser:

Google News RSS — PropTech and real estate keyword sets, catching funding rounds, digital transformation announcements, app launches, CRM adoption stories globally
PropTech Industry Publications — Inman, Propmodo, HousingWire — high-quality signals from authoritative sources
Real Estate News Sites — CRE Herald, Financial Post Real Estate — broader market signals including India-specific coverage

Time-windowed collection is one of the more important design decisions here. On the first run, the agent collects articles from the last 14 days to build an initial base. Every subsequent run pulls only articles published since the last run timestamp. Every processed URL is stored in SQLite, so if the same article appears across multiple feeds or on a re-run, it is skipped automatically. The result is no duplicate leads and no gap between one run's window and the next.
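
The windowing and deduplication logic can be sketched with nothing but `sqlite3`. This is an illustration under assumptions, not the repo's schema: the table names `seen_urls` and `run_state` and the function names are hypothetical, and articles are represented as plain `(url, published)` tuples instead of feedparser entries.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

FIRST_RUN_WINDOW = timedelta(days=14)  # look-back used when no previous run exists


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one table of processed URLs, one single-row table for the last run.
    conn.execute("CREATE TABLE IF NOT EXISTS seen_urls (url TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS run_state (id INTEGER PRIMARY KEY, last_run TEXT)")


def window_start(conn: sqlite3.Connection, now: datetime) -> datetime:
    row = conn.execute("SELECT last_run FROM run_state WHERE id = 1").fetchone()
    if row is None:
        return now - FIRST_RUN_WINDOW      # first run: last 14 days
    return datetime.fromisoformat(row[0])  # later runs: since the last run


def collect_new(conn: sqlite3.Connection, articles, now: datetime) -> list:
    """Return URLs that fall inside the time window and have not been processed before."""
    start = window_start(conn, now)
    fresh = []
    for url, published in articles:
        if published < start:
            continue  # outside the time window
        cur = conn.execute("INSERT OR IGNORE INTO seen_urls (url) VALUES (?)", (url,))
        if cur.rowcount == 1:  # the row was inserted, so the URL is new
            fresh.append(url)
    conn.execute("INSERT OR REPLACE INTO run_state (id, last_run) VALUES (1, ?)", (now.isoformat(),))
    conn.commit()
    return fresh
```

Using the URL as the primary key lets SQLite do the duplicate check, so the same article arriving from two feeds in one run is handled the same way as a re-run.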

Stage 2 — AI Intent Classification

Each collected article is sent to Groq (Llama 3.3 70B) for intent analysis. To keep API usage efficient, signals are batched 10 at a time, so a single API call classifies 10 articles.

The LLM returns structured data for each signal:

Field	Description
Company name	Extracted from the article
Signal category	funding / product launch / digital transformation / CRM adoption / hiring tech team
Why relevant	One-line explanation of the buying signal
Urgency score	1–10, based on how directly it indicates purchase intent

Scoring logic the LLM applies:

Is the company actively spending money on tech right now?
Is there a concrete trigger — launch, contract, product announcement?
Is the company in the PropTech / real estate digitalization space?
Is this an India-based company? (India signals are prioritized for ICP fit)

Score thresholds:

Score	Action
≥ 7	Hot lead — saved to DB + instant Discord webhook alert
5–6	Warm lead — saved to DB, written to Excel and Sheets
< 5	Discarded
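
The thresholds in the table map to a small routing function. A minimal sketch, where `route_lead` and its return labels are hypothetical names chosen for illustration:

```python
def route_lead(score: int) -> str:
    """Map an urgency score (1-10) to the action tier from the threshold table."""
    if score >= 7:
        return "hot"      # save to DB and send a Discord webhook alert
    if score >= 5:
        return "warm"     # save to DB, write to Excel and Sheets
    return "discard"      # below 5: dropped
```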

This batching approach reduced my Groq API calls by ~10x compared to one-call-per-article.
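
The batching itself is just chunking the article list before calling the model. A minimal sketch, assuming articles are held in a list; `batched` is a hypothetical helper and the actual prompt format is not shown:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 25 articles need 3 classification calls instead of 25.
articles = [f"article {i}" for i in range(25)]
calls_needed = sum(1 for _ in batched(articles))
```

Each chunk would go into one prompt that asks the model to return one structured result per article, in order.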

Stage 3 — Contact Discovery

For every qualified lead, the agent finds the company website and the right decision-maker — entirely using DuckDuckGo search, no paid APIs.

Website Discovery
Runs a DuckDuckGo search for the company name and identifies the official domain using slug-based matching — so century21 resolves to century21.com and not an unrelated result. Subpage URLs are stripped back to the homepage root.
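
One way slug-based matching could work is sketched below. The exact rule in the repo is not shown in this post, so treat this as an assumption: the company name is reduced to lowercase alphanumerics and compared against each label of the hostname except the top-level domain. `slugify` and `pick_official_site` are hypothetical names.

```python
import re
from typing import Iterable, Optional
from urllib.parse import urlparse


def slugify(name: str) -> str:
    """Lowercase and drop everything except letters and digits: 'Century 21' -> 'century21'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def pick_official_site(company: str, result_urls: Iterable[str]) -> Optional[str]:
    """Return the homepage root of the first result whose hostname contains the company slug."""
    slug = slugify(company)
    if not slug:
        return None
    for url in result_urls:
        parsed = urlparse(url)
        labels = parsed.netloc.lower().split(".")[:-1]  # ignore the top-level domain
        if slug in (slugify(label) for label in labels):
            # Strip any subpage path back to the homepage root.
            return f"{parsed.scheme}://{parsed.netloc}/"
    return None
```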

LinkedIn Contact Discovery
Runs a DuckDuckGo search targeting LinkedIn for C-level contacts at the company. Rather than picking the first result, every candidate is scored:

Signal	Points
Role keyword in snippet (CEO, CTO, Founder, Co-Founder, MD, Head of Technology)	+3
Company name matches in the LinkedIn snippet	+2
Founder/executive bonus	+1

A candidate must reach a minimum score to be selected, and the highest-scoring candidate wins. The agent then extracts the full name, job title, and LinkedIn profile URL.

This scoring approach significantly improved accuracy over naive "first result" selection.
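
The scoring table translates into a few lines of code. In this sketch the point values come from the table, but two things are assumptions: the minimum score (`MIN_SCORE = 5`, the post does not state it) and the trigger for the founder/executive bonus (here, "founder" or "chief" appearing in the snippet). `Candidate`, `score`, and `best_candidate` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

ROLE_RE = re.compile(r"\b(ceo|cto|co-founder|founder|md|head of technology)\b", re.I)
BONUS_RE = re.compile(r"\b(founder|chief)\b", re.I)  # assumption: what earns the +1 bonus
MIN_SCORE = 5  # assumption: the actual threshold is not given in the post


@dataclass
class Candidate:
    name: str
    snippet: str
    url: str


def score(candidate: Candidate, company: str) -> int:
    points = 0
    if ROLE_RE.search(candidate.snippet):
        points += 3  # role keyword in snippet
    if company.lower() in candidate.snippet.lower():
        points += 2  # company name matches in the snippet
    if BONUS_RE.search(candidate.snippet):
        points += 1  # founder/executive bonus
    return points


def best_candidate(candidates: List[Candidate], company: str) -> Optional[Candidate]:
    """Return the highest-scoring candidate that reaches MIN_SCORE, or None."""
    scored = [(score(c, company), c) for c in candidates]
    qualified = [pair for pair in scored if pair[0] >= MIN_SCORE]
    if not qualified:
        return None
    return max(qualified, key=lambda pair: pair[0])[1]
```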

Stage 4 — Email Discovery (3-Step Fallback)

Step	Method	Confidence
1	DuckDuckGo search for `"[Name]" [company] email` — finds publicly listed addresses in search snippets	High
2	Website scraping — fetches `/contact`, `/team`, `/about` pages and extracts emails from HTML	Medium
3	Pattern generation — constructs `first.last@domain.com` from contact name + company domain	Low

If no email is found through any step, the lead is marked as not found. Low-confidence (pattern) emails should be verified before a large send — I'd integrate NeverBounce or ZeroBounce in a production version.
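
The fallback chain can be expressed as an ordered list of steps, each tagged with its confidence level. A minimal sketch: the search and scraping steps are passed in as plain callables (the network code is omitted), and `extract_emails`, `pattern_email`, and `discover_email` are hypothetical names. The email regex is a simple approximation, not a full address validator.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> List[str]:
    """Pull email-like strings out of a search snippet or a page's HTML."""
    return EMAIL_RE.findall(text)


def pattern_email(full_name: str, domain: str) -> Optional[str]:
    """Step 3: build first.last@domain from the contact name."""
    parts = re.findall(r"[a-z]+", full_name.lower())
    if len(parts) < 2 or not domain:
        return None
    return f"{parts[0]}.{parts[-1]}@{domain}"


def discover_email(
    steps: Sequence[Tuple[str, Callable[[], Optional[str]]]]
) -> Tuple[Optional[str], str]:
    """Run steps in order and return (email, confidence) from the first one that succeeds."""
    for confidence, step in steps:
        email = step()
        if email:
            return email, confidence
    return None, "not found"
```

Because every result carries its confidence label, low-confidence pattern emails can be filtered for verification before a send.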

Share of leads resolved by each method in my runs:

Direct search (high confidence): ~15%
Website scraping (medium): ~20%
Pattern generation (low): ~50%
Not found: ~15%

Stage 5 — AI Email Generation

For every lead with a discovered email, the agent generates a complete 4-email outreach sequence. Emails are written by the LLM using the specific buying signal as context — no generic templates.

Email	Send On	Purpose
Original	Day 0	Personalized cold email referencing the exact signal (e.g. funding round, product launch)
Followup 1	Day 3	Value-add insight, continues the same thread
Followup 2	Day 7	Softer check-in from a different angle
Followup 3	Day 14	Final graceful close, leaves the door open

All followups are sent inside the same Gmail thread using References and In-Reply-To headers — they appear as a proper reply chain in the recipient's inbox.
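
Here is a minimal sketch of how those threading headers can be set with Python's standard `email` package. It only builds the messages and leaves out the SMTP send; `build_original` and `build_followup` are hypothetical helpers, and `example.com` is a placeholder domain for the generated Message-IDs.

```python
from email.message import EmailMessage
from email.utils import make_msgid
from typing import List


def build_original(sender: str, to: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg["Message-ID"] = make_msgid(domain="example.com")  # store this ID for later followups
    msg.set_content(body)
    return msg


def build_followup(original: EmailMessage, thread_ids: List[str], body: str) -> EmailMessage:
    """thread_ids: Message-IDs of every earlier email in the thread, oldest first."""
    ids = [str(i) for i in thread_ids]
    subject = str(original["Subject"])
    msg = EmailMessage()
    msg["From"] = str(original["From"])
    msg["To"] = str(original["To"])
    msg["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg["In-Reply-To"] = ids[-1]        # the message being replied to
    msg["References"] = " ".join(ids)   # the full chain so far
    msg.set_content(body)
    return msg
```

Keeping the subject stable (with a `Re:` prefix) alongside these headers gives mail clients the best chance of grouping the sequence into one conversation.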

AI provider rotation is built in: Groq → DeepSeek → OpenRouter in sequence. If one provider hits a rate limit, the next is used automatically without interrupting the run. This was essential for running large batches without babysitting the process.
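
The rotation pattern is a loop over providers that moves on when one reports a rate limit. A minimal sketch with the provider clients abstracted as callables; `RateLimitError` and `generate_with_fallback` are hypothetical names, and a real client would raise its own SDK-specific exception that you would catch instead.

```python
from typing import Callable, Sequence, Tuple


class RateLimitError(Exception):
    """Stand-in for whatever a provider client raises on a rate-limit response."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, response) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError as exc:
            errors.append(f"{name}: {exc}")  # remember the failure and try the next provider
    raise RuntimeError("all providers rate-limited: " + "; ".join(errors))
```

The provider list would be ordered Groq, DeepSeek, OpenRouter to match the sequence described above.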

Stage 6 — Discord Campaign Bot

Once emails are generated, a Discord bot with slash commands manages the full outreach campaign.

/send          → send original or followup email for a lead
/leads         → list all leads + email status
/status        → campaign dashboard (generated / sent / replied)
/sheets        → Google Sheets tab info + live link
/preview       → preview full email sequence for any lead
/check_replies → scan Gmail inbox for replies, update DB
/pending       → list all emails ready to send

A few design decisions here worth noting:

/send enforces followup order — Followup 2 cannot be sent before Followup 1
A random delay of 60 seconds to 3 minutes is applied between sends to reduce the risk of Gmail flagging the account
/check_replies scans Gmail via IMAP, matches replies to sent emails by thread, and updates the DB, so the campaign status stays current
/status gives a real-time dashboard: total leads, emails generated, sent, replied, and reply rate
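
The ordering rule and the send delay are both small enough to sketch directly. The step labels in `SEQUENCE` and the function names are hypothetical. The delay bounds come from the 60 seconds to 3 minutes range above.

```python
import random
from typing import Set

SEQUENCE = ["original", "followup_1", "followup_2", "followup_3"]


def can_send(step: str, already_sent: Set[str]) -> bool:
    """Allow a step only if it has not been sent yet and every earlier step has."""
    idx = SEQUENCE.index(step)
    return step not in already_sent and all(s in already_sent for s in SEQUENCE[:idx])


def send_delay_seconds(rng: random.Random) -> float:
    """Random pause between sends: 60 seconds to 3 minutes."""
    return rng.uniform(60, 180)
```

A `/send` handler would call `can_send` before sending and sleep for `send_delay_seconds(...)` between consecutive emails.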

Data Storage & Outputs

Everything is written in real time, not in a batch at the end of the run:

Output	Purpose
`leads.xlsx`	Local structured lead file, color-coded by intent score
Google Sheets	Live team-accessible spreadsheet; auto-creates new tabs after every 500 rows
`leads.db` (SQLite)	Powers deduplication, tracks email campaign status and reply state

Output columns: Company Name · Contact Name · Title · LinkedIn URL · Company Website · Contact Email · Email Status · Signal Source · Signal Summary · Intent Score · Date Found

Project Structure

AI_AGENT/
│
├── main.py                    ← Entry point — interactive CLI menu
├── scheduler.py               ← Daily automated run at 9AM IST
├── agents/
│   ├── signal_monitor.py      ← RSS collection
│   ├── intent_classifier.py   ← Groq LLM classification + scoring
│   ├── contact_finder.py      ← Website + LinkedIn contact discovery
│   ├── email_discovery.py     ← 3-step email address finder
│   ├── email_generator.py     ← AI email + followup generator
│   ├── email_sender.py        ← Gmail SMTP send + IMAP reply check
│   ├── excel_writer.py        ← Real-time Excel writer
│   ├── sheets_writer.py       ← Google Sheets sync
│   ├── discord_alerts.py      ← Webhook hot lead alerts
│   └── discord_bot.py         ← Slash command campaign bot
└── utils/
    ├── database.py            ← SQLite schema + all DB functions
    ├── ai_router.py           ← Multi-provider AI key rotation
    └── rate_limiter.py        ← Centralized random delays

Overall Accuracy

Metric	Result
Signal relevance rate	~70% of collected articles are true buying signals
Contact discovery rate	~65% of leads get a LinkedIn contact found
Email discovery (all methods)	~85%
AI email generation success rate	~85%
Deduplication accuracy	100%

What I'd Improve Next

Email verification — Integrate NeverBounce/ZeroBounce to cut bounce rate from ~40% to ~5%
LinkedIn direct scraping — Use Playwright with a logged-in session instead of search snippets
Auto-retry failed email generation — Strip control characters from AI responses and retry automatically
More signal sources — Add Crunchbase, AngelList, LinkedIn Company Updates
Web dashboard — Replace the Discord bot with a FastAPI + React UI for non-technical team members
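
For the auto-retry item, a minimal sketch of what cleaning and retrying could look like, assuming the model is asked to return JSON. `clean_and_parse` and `generate_with_retry` are hypothetical names, and `generate` stands in for whatever function calls the model. Tabs, newlines, and carriage returns are kept, and `strict=False` lets `json.loads` accept them inside strings.

```python
import json
import re
from typing import Callable

# Control characters except tab, newline, and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_and_parse(raw: str) -> dict:
    """Strip stray control characters, then parse leniently."""
    cleaned = CONTROL_CHARS.sub("", raw)
    return json.loads(cleaned, strict=False)


def generate_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call the model up to max_attempts times until its output parses as JSON."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return clean_and_parse(generate())
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed output: ask the model again
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```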

Key Lessons

Batching LLM calls matters more than you think. Going from one API call per article to one call per 10 articles cut costs and latency dramatically without losing classification quality.

Resilience > features. The hardest parts weren't the AI. They were the boring infrastructure: rate limit fallbacks across 3 providers, deduplication across runs, followup order enforcement, and keeping Excel and Sheets in sync in real time. None of this is glamorous, but all of it is what makes the agent actually usable.

DuckDuckGo is surprisingly capable. Building a scored candidate selection system on top of DuckDuckGo search results gets you 65% contact discovery without spending a dollar on the LinkedIn API or Hunter.io.

No generic templates. The biggest unlock for cold email quality was giving the LLM the specific buying signal as context. An email that says "I saw you just raised a Series A and are expanding your tech team" performs completely differently from a generic "I'd love to connect."

The full repo is on GitHub. Happy to answer questions in the comments.

Tags: #python #ai #automation #machinelearning #llm #proptech #sales #buildinpublic #agentic #opensource
