From "company just raised funding" to "personalized cold email sent" — zero manual steps.
If you've ever worked in sales or growth, you know the grind: spend hours on Google looking for companies that might be buying, find a contact, guess their email, write a cold email, follow up three times, repeat. It's 70% research and 30% actual selling.
I wanted to eliminate the 70%.
This is my second AI agent project. My first was a standalone outreach agent. This one goes further — it combines a full intent-based lead generation pipeline with a personalized outreach automation system, and wraps everything in a Discord bot so you can manage campaigns without touching a terminal.
Here's exactly how it works, what I built, and what I learned.
What It Does (The Big Picture)
The agent runs a 6-stage pipeline:
- Monitor the internet for PropTech buying signals
- Classify each signal using an LLM and score purchase intent
- Find the right decision-maker at the company
- Discover their email address
- Generate a personalized 4-email outreach sequence
- Send and manage the campaign via a Discord bot
Everything writes to Excel, Google Sheets, and SQLite in real time. A daily scheduler runs the whole thing automatically at 9AM IST.
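A daily scheduler mostly needs to know how long to wait until the next 9AM IST run. Here is a minimal stdlib sketch of that calculation, assuming a simple sleep-until-next-run loop. The function names are hypothetical, and `scheduler.py` in the repo may be built differently.

```python
from datetime import datetime, time, timedelta, timezone

# IST is UTC+05:30 with no daylight saving time, so a fixed offset is exact.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")
RUN_AT = time(hour=9, minute=0)


def next_run(now: datetime) -> datetime:
    """Return the next 9:00 IST strictly after `now` (which must be timezone-aware)."""
    local_now = now.astimezone(IST)
    candidate = datetime.combine(local_now.date(), RUN_AT, tzinfo=IST)
    if candidate <= local_now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def seconds_until_next_run(now: datetime) -> float:
    """How long a sleep loop should wait before kicking off the pipeline."""
    return (next_run(now) - now).total_seconds()


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc)  # 08:30 IST
    print(next_run(now).isoformat())  # 2024-05-01T09:00:00+05:30
    print(seconds_until_next_run(now))  # 1800.0
```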
Stage 1 — Signal Monitoring
The agent monitors three types of sources using RSS feeds parsed with feedparser:
- Google News RSS — PropTech and real estate keyword sets, catching funding rounds, digital transformation announcements, app launches, CRM adoption stories globally
- PropTech Industry Publications — Inman, Propmodo, HousingWire — high-quality signals from authoritative sources
- Real Estate News Sites — CRE Herald, Financial Post Real Estate — broader market signals including India-specific coverage
Time-windowed collection is one of the more important design decisions here. On the first run, it collects articles from the last 14 days to build an initial base. Every subsequent run only pulls articles published since the last run timestamp. Every processed URL is stored in SQLite — if the same article appears across multiple feeds or on a re-run, it's automatically skipped. No duplicates, no missed signals.
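The windowing and dedup logic can be sketched with nothing but `sqlite3`. In this sketch, entries are dicts shaped like feedparser entries (`link`, `published_parsed`). The two-table schema and the function names are hypothetical, and entries without a `published_parsed` date are not handled.

```python
import sqlite3
import time
from datetime import datetime, timedelta, timezone

FIRST_RUN_LOOKBACK = timedelta(days=14)


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one table of processed URLs, one log of finished runs.
    conn.execute("CREATE TABLE IF NOT EXISTS seen_urls (url TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS runs (finished_at TEXT NOT NULL)")
    conn.commit()


def window_start(conn: sqlite3.Connection, now: datetime) -> datetime:
    """Last run's timestamp, or 14 days back if this is the first run."""
    (last,) = conn.execute("SELECT MAX(finished_at) FROM runs").fetchone()
    if last is None:
        return now - FIRST_RUN_LOOKBACK
    return datetime.fromisoformat(last)


def to_utc(st: time.struct_time) -> datetime:
    # feedparser's *_parsed fields are struct_time values normalized to UTC.
    return datetime(*st[:6], tzinfo=timezone.utc)


def collect(conn: sqlite3.Connection, entries: list, now: datetime) -> list:
    """Return entries inside the time window whose URL has never been seen.

    `now` must be a UTC-aware datetime so stored ISO timestamps sort correctly.
    """
    start = window_start(conn, now)
    fresh = []
    for entry in entries:
        if to_utc(entry["published_parsed"]) < start:
            continue
        # INSERT OR IGNORE changes 0 rows when the URL is already stored,
        # which covers repeats across feeds, within a run, and across runs.
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_urls (url) VALUES (?)", (entry["link"],)
        )
        if cur.rowcount == 1:
            fresh.append(entry)
    conn.execute("INSERT INTO runs (finished_at) VALUES (?)", (now.isoformat(),))
    conn.commit()
    return fresh
```

Because the URL is the primary key, SQLite enforces dedup. No in-memory set has to survive between runs.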
Stage 2 — AI Intent Classification
Each collected article is sent to Groq (Llama 3.3 70B) for intent analysis. To keep API usage efficient, signals are batched 10 at a time — one API call classifies 10 articles simultaneously.
The LLM returns structured data for each signal:
| Field | Description |
|---|---|
| Company name | Extracted from the article |
| Signal category | funding / product launch / digital transformation / CRM adoption / hiring tech team |
| Why relevant | One-line explanation of the buying signal |
| Urgency score | 1–10, based on how directly it indicates purchase intent |
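The batching itself comes down to chunking, prompting once per chunk, and validating the reply. Here is a sketch of that shape with the actual Groq call left out. The prompt wording and the JSON keys are assumptions that mirror the fields in the table above.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Iterator

BATCH_SIZE = 10


@dataclass
class Classification:
    company: str
    category: str
    why_relevant: str
    urgency: int


def batches(items: list, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


def build_prompt(articles: list[dict]) -> str:
    # Hypothetical prompt: one numbered line per article, one JSON array back.
    lines = [
        "Classify each numbered article as a PropTech buying signal.",
        "Return only a JSON array with one object per article, in the same order,",
        'with keys "company", "category", "why_relevant", "urgency" (integer 1-10).',
        "",
    ]
    for n, article in enumerate(articles, start=1):
        lines.append(f"{n}. {article['title']}: {article['summary']}")
    return "\n".join(lines)


def parse_response(raw: str, expected: int) -> list[Classification]:
    """Parse the model's JSON array and check it covers the whole batch."""
    data = json.loads(raw)
    if len(data) != expected:
        raise ValueError(f"expected {expected} results, got {len(data)}")
    return [
        Classification(d["company"], d["category"], d["why_relevant"], int(d["urgency"]))
        for d in data
    ]
```

For each chunk from `batches(articles)`, send `build_prompt(chunk)` to the model and pass the reply to `parse_response(reply, len(chunk))`. The count check catches a model that silently drops an article from the batch.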
Scoring logic the LLM applies:
- Is the company actively spending money on tech right now?
- Is there a concrete trigger — launch, contract, product announcement?
- Is the company in the PropTech / real estate digitalization space?
- Is this an India-based company? (India signals are prioritized for ICP fit)
Score thresholds:
| Score | Action |
|---|---|
| ≥ 7 | Hot lead — saved to DB + instant Discord webhook alert |
| 5–6 | Warm lead — saved to DB, written to Excel and Sheets |
| < 5 | Discarded |
This batching approach reduced my Groq API calls by ~10x compared to one-call-per-article.
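The score thresholds map onto a small routing function. In this sketch, the enum names are hypothetical, and the actual side effects (DB save, webhook alert, Excel/Sheets writes) are left to the caller.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"          # >= 7: save + instant Discord alert
    WARM = "warm"        # 5-6: save + write to Excel and Sheets
    DISCARD = "discard"  # < 5: dropped


def route(score: int) -> Tier:
    """Map an LLM urgency score (1-10) onto the lead tiers from the table."""
    if score >= 7:
        return Tier.HOT
    if score >= 5:
        return Tier.WARM
    return Tier.DISCARD
```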
Stage 3 — Contact Discovery
For every qualified lead, the agent finds the company website and the right decision-maker — entirely using DuckDuckGo search, no paid APIs.
Website Discovery
Runs a DuckDuckGo search for the company name and identifies the official domain using slug-based matching — so century21 resolves to century21.com and not an unrelated result. Subpage URLs are stripped back to the homepage root.
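Here is a minimal sketch of slug-based matching. It assumes the DuckDuckGo results have already been fetched as a list of URLs, and it implements the simplest version of the idea: compare the normalized company name with the first label of each result's hostname. The function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def slugify(name: str) -> str:
    """'Century 21' -> 'century21': lowercase, alphanumerics only."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def host_without_www(url: str) -> str:
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def pick_official_site(company: str, candidate_urls: list) -> Optional[str]:
    """Return the homepage root of the first result whose domain label equals the slug."""
    slug = slugify(company)
    for url in candidate_urls:
        first_label = host_without_www(url).split(".")[0]
        if first_label == slug:
            parsed = urlparse(url)
            # Strip paths such as /real-estate/agents back to the homepage root.
            return f"{parsed.scheme}://{parsed.netloc}/"
    return None
```

Exact label matching is deliberately strict. It skips Wikipedia and news pages, at the cost of missing companies whose domain differs from their name.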
LinkedIn Contact Discovery
Runs a DuckDuckGo search targeting LinkedIn for C-level contacts at the company. Rather than picking the first result, every candidate is scored:
| Signal | Points |
|---|---|
| Role keyword in snippet (CEO, CTO, Founder, Co-Founder, MD, Head of Technology) | +3 |
| Company name matches in the LinkedIn snippet | +2 |
| Founder/executive bonus | +1 |
A candidate must hit a minimum score to be selected. The highest-scoring candidate wins. Extracted: full name, job title, LinkedIn profile URL.
This scoring approach significantly improved accuracy over naive "first result" selection.
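The point table can be turned into a scoring function along these lines. The minimum score isn't given above, so `MIN_SCORE = 4` is a hypothetical value. The +1 bonus is interpreted here as "snippet mentions founder or chief", which is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

ROLE_KEYWORDS = ("ceo", "cto", "founder", "co-founder", "md", "head of technology")
EXEC_BONUS_KEYWORDS = ("founder", "chief")  # hypothetical reading of the +1 bonus
MIN_SCORE = 4  # hypothetical threshold


@dataclass
class Candidate:
    name: str
    linkedin_url: str
    snippet: str


def has_keyword(text: str, keywords: tuple) -> bool:
    # Word boundaries stop "cto" from matching inside words like "director".
    text = text.lower()
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def score(c: Candidate, company: str) -> int:
    points = 0
    if has_keyword(c.snippet, ROLE_KEYWORDS):
        points += 3
    if company.lower() in c.snippet.lower():
        points += 2
    if has_keyword(c.snippet, EXEC_BONUS_KEYWORDS):
        points += 1
    return points


def best_candidate(cands: list[Candidate], company: str) -> Optional[Candidate]:
    """Highest-scoring candidate at or above MIN_SCORE; ties keep the earlier result."""
    scored = [(score(c, company), c) for c in cands]
    qualified = [pair for pair in scored if pair[0] >= MIN_SCORE]
    if not qualified:
        return None
    return max(qualified, key=lambda pair: pair[0])[1]
```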
Stage 4 — Email Discovery (3-Step Fallback)
| Step | Method | Confidence |
|---|---|---|
| 1 | DuckDuckGo search for `"[Name]" [company] email`: finds publicly listed addresses in search snippets | High |
| 2 | Website scraping: fetches the `/contact`, `/team`, and `/about` pages and extracts emails from the HTML | Medium |
| 3 | Pattern generation: constructs `first.last@domain.com` from the contact name and company domain | Low |
If no email is found through any step, the lead is marked as not found. Low-confidence (pattern) emails should be verified before a large send — I'd integrate NeverBounce or ZeroBounce in a production version.
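The cascade is easiest to see as an ordered list of steps that each return an address or `None`. This sketch covers the regex extraction used in step 2 and the step 3 pattern. Keeping only addresses on the company's own domain is an assumed filtering choice, and the search and page-fetching code is omitted.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str, domain: str) -> list[str]:
    """Pull addresses out of HTML or snippet text, keeping only the company's domain."""
    found = {m.group(0).lower() for m in EMAIL_RE.finditer(text)}
    return sorted(e for e in found if e.endswith("@" + domain))


def pattern_email(full_name: str, domain: str) -> Optional[str]:
    """Low-confidence guess: first.last@domain, ignoring middle names."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return None
    first = re.sub(r"[^a-z]", "", parts[0])
    last = re.sub(r"[^a-z]", "", parts[-1])
    if not first or not last:
        return None
    return f"{first}.{last}@{domain}"


# Each step is (confidence label, zero-argument function returning an email or None).
Step = Callable[[], Optional[str]]


def discover(steps: list[tuple[str, Step]]) -> tuple[Optional[str], str]:
    """Run steps in order and stop at the first one that finds an address."""
    for confidence, step in steps:
        email = step()
        if email:
            return email, confidence
    return None, "not_found"
```

Returning the confidence label alongside the address is what makes it possible to route pattern-generated emails through a verifier before a large send.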
Where the emails came from in my runs (share of leads):
- Direct search (high confidence): ~15%
- Website scraping (medium): ~20%
- Pattern generation (low): ~50%
- Not found: ~15%
Stage 5 — AI Email Generation
For every lead with a discovered email, the agent generates a complete 4-email outreach sequence. Emails are written by the LLM using the specific buying signal as context — no generic templates.
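"Signal as context" mostly means putting the specific trigger into the prompt. Here is a sketch of a prompt builder, assuming a lead record with the fields below. The wording and the `Lead` field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    contact_name: str
    title: str
    company: str
    signal_category: str
    signal_summary: str


def build_email_prompt(lead: Lead, step: str) -> str:
    """Prompt for one email in the sequence, grounded in the lead's buying signal."""
    return (
        f"Write the {step} email of a 4-email cold outreach sequence.\n"
        f"Recipient: {lead.contact_name}, {lead.title} at {lead.company}.\n"
        f"Buying signal ({lead.signal_category}): {lead.signal_summary}\n"
        "Open by referencing this specific signal. Avoid generic lines such as "
        "\"I'd love to connect\". Return only the email body."
    )
```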
| Email | Send On | Purpose |
|---|---|---|
| Original | Day 0 | Personalized cold email referencing the exact signal (e.g. funding round, product launch) |
| Followup 1 | Day 3 | Value-add insight, continues the same thread |
| Followup 2 | Day 7 | Softer check-in from a different angle |
| Followup 3 | Day 14 | Final graceful close, leaves the door open |
All followups are sent inside the same Gmail thread using References and In-Reply-To headers — they appear as a proper reply chain in the recipient's inbox.
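Threading comes down to three headers. Each followup gets a fresh `Message-ID`, points `In-Reply-To` at the previous message, and lists the whole chain in `References`. Here is a sketch using the standard library's `email` package, with SMTP sending omitted. Reusing the subject with a `Re:` prefix is the usual convention and is assumed here.

```python
from __future__ import annotations

from email.message import EmailMessage
from email.utils import make_msgid


def build_original(sender: str, to: str, subject: str, body: str, domain: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg["Message-ID"] = make_msgid(domain=domain)  # stored so followups can reference it
    msg.set_content(body)
    return msg


def build_followup(previous: list[EmailMessage], body: str, domain: str) -> EmailMessage:
    """`previous` holds the earlier messages of the thread, oldest first."""
    first, last = previous[0], previous[-1]
    msg = EmailMessage()
    msg["From"] = first["From"]
    msg["To"] = first["To"]
    subject = str(first["Subject"])
    msg["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    msg["Message-ID"] = make_msgid(domain=domain)
    msg["In-Reply-To"] = last["Message-ID"]  # direct parent
    msg["References"] = " ".join(str(m["Message-ID"]) for m in previous)  # full chain
    msg.set_content(body)
    return msg
```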
AI provider rotation is built in: Groq → DeepSeek → OpenRouter in sequence. If one provider hits a rate limit, the next is used automatically without interrupting the run. This was essential for running large batches without babysitting the process.
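Provider rotation can be sketched as "try each client in order, move on when one is rate-limited". In this sketch, `RateLimitError` is a hypothetical exception your client wrappers would raise on HTTP 429, and each provider is just a name plus a function that takes a prompt and returns text.

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple


class RateLimitError(Exception):
    """Hypothetical: raised by a provider wrapper when the API returns HTTP 429."""


class AllProvidersFailed(Exception):
    pass


Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Return (provider name, completion) from the first provider not rate-limited."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError as exc:
            errors.append(f"{name}: {exc}")  # fall through to the next provider
    raise AllProvidersFailed("; ".join(errors))
```

Only rate limits trigger a fallback here. Other errors propagate, so a malformed request isn't silently retried against every provider.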
Stage 6 — Discord Campaign Bot
Once emails are generated, a Discord bot with slash commands manages the full outreach campaign.
```
/send → send original or followup email for a lead
/leads → list all leads + email status
/status → campaign dashboard (generated / sent / replied)
/sheets → Google Sheets tab info + live link
/preview → preview full email sequence for any lead
/check_replies → scan Gmail inbox for replies, update DB
/pending → list all emails ready to send
```
A few design decisions here worth noting:
- `/send` enforces followup order: Followup 2 cannot be sent before Followup 1.
- A random delay of 60 seconds to 3 minutes is applied between sends to prevent Gmail flagging.
- `/check_replies` scans Gmail via IMAP, matches replies to sent emails by thread, and updates the DB, so your campaign status is always accurate.
- `/status` gives a real-time dashboard: total leads, emails generated, sent, replied, and reply rate.
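The first three rules fit in a few small functions. This sketch covers the ordering check, the jittered delay, and matching a reply to sent mail via its `In-Reply-To`/`References` headers. The step names are hypothetical, and fetching raw messages over IMAP (for example with `imaplib`) is left out.

```python
from __future__ import annotations

import random
from email import message_from_string
from typing import Optional

ORDER = ["original", "followup_1", "followup_2", "followup_3"]
MIN_DELAY_S, MAX_DELAY_S = 60, 180  # 1 to 3 minutes between sends


def can_send(step: str, sent_steps: set[str]) -> bool:
    """Allow a step only if it is unsent and every earlier step has been sent."""
    idx = ORDER.index(step)
    return step not in sent_steps and all(s in sent_steps for s in ORDER[:idx])


def next_delay(rng: Optional[random.Random] = None) -> float:
    """Seconds to sleep before the next send, drawn uniformly from the range."""
    return (rng or random).uniform(MIN_DELAY_S, MAX_DELAY_S)


def is_reply_to(raw_message: str, sent_ids: set[str]) -> bool:
    """True if the message's threading headers mention any Message-ID we sent."""
    msg = message_from_string(raw_message)
    ids: set[str] = set()
    for header in ("In-Reply-To", "References"):
        value = msg.get(header)
        if value:
            ids.update(value.split())
    return bool(ids & sent_ids)
```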
Data Storage & Outputs
Everything writes in real-time — not in a batch at the end of the run:
| Output | Purpose |
|---|---|
| `leads.xlsx` | Local structured lead file, color-coded by intent score |
| Google Sheets | Live team-accessible spreadsheet; auto-creates a new tab after every 500 rows |
| `leads.db` (SQLite) | Powers deduplication; tracks email campaign status and reply state |
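The 500-row tab rollover is just integer division. Here is a sketch with a hypothetical `Leads_N` naming scheme. Creating the tab itself would go through a Sheets client (for example gspread's `add_worksheet`), which is not shown.

```python
from __future__ import annotations

ROWS_PER_TAB = 500


def tab_for_row(row_index: int, prefix: str = "Leads") -> tuple[str, int]:
    """Map a 0-based overall data-row index to (tab name, 0-based row within that tab)."""
    tab_number, row_in_tab = divmod(row_index, ROWS_PER_TAB)
    return f"{prefix}_{tab_number + 1}", row_in_tab
```

When `row_in_tab` comes back as 0, the writer knows it has to create the new tab (with headers) before appending.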
Output columns: Company Name · Contact Name · Title · LinkedIn URL · Company Website · Contact Email · Email Status · Signal Source · Signal Summary · Intent Score · Date Found
Project Structure
```
AI_AGENT/
│
├── main.py ← Entry point — interactive CLI menu
├── scheduler.py ← Daily automated run at 9AM IST
├── agents/
│ ├── signal_monitor.py ← RSS collection
│ ├── intent_classifier.py ← Groq LLM classification + scoring
│ ├── contact_finder.py ← Website + LinkedIn contact discovery
│ ├── email_discovery.py ← 3-step email address finder
│ ├── email_generator.py ← AI email + followup generator
│ ├── email_sender.py ← Gmail SMTP send + IMAP reply check
│ ├── excel_writer.py ← Real-time Excel writer
│ ├── sheets_writer.py ← Google Sheets sync
│ ├── discord_alerts.py ← Webhook hot lead alerts
│ └── discord_bot.py ← Slash command campaign bot
└── utils/
├── database.py ← SQLite schema + all DB functions
├── ai_router.py ← Multi-provider AI key rotation
└── rate_limiter.py ← Centralized random delays
```
Overall Accuracy
| Metric | Result |
|---|---|
| Signal relevance rate | ~70% of collected articles are true buying signals |
| Contact discovery rate | ~65% of leads get a LinkedIn contact found |
| Email discovery (all methods) | ~85% |
| AI email generation success rate | ~85% |
| Deduplication accuracy | 100% |
What I'd Improve Next
- Email verification — Integrate NeverBounce/ZeroBounce to cut bounce rate from ~40% to ~5%
- LinkedIn direct scraping — Use Playwright with a logged-in session instead of search snippets
- Auto-retry failed email generation — Strip control characters from AI responses and retry automatically
- More signal sources — Add Crunchbase, AngelList, LinkedIn Company Updates
- Web dashboard — Replace the Discord bot with a FastAPI + React UI for non-technical team members
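The auto-retry idea could start as small as this. Strip non-whitespace control characters, parse leniently, and regenerate on failure. This is a sketch of the proposed improvement, not existing code. It assumes the model is asked to return JSON, and `generate` stands in for whatever call produces the raw text.

```python
import json
import re
from typing import Callable

# Control characters other than tab (\x09), newline (\x0a) and carriage return (\x0d).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def parse_ai_json(raw: str) -> dict:
    cleaned = CONTROL_CHARS.sub("", raw)
    # strict=False lets raw newlines/tabs appear inside JSON string values.
    return json.loads(cleaned, strict=False)


def generate_with_retry(generate: Callable[[], str], attempts: int = 3) -> dict:
    """Call `generate` until its output parses, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return parse_ai_json(generate())
        except json.JSONDecodeError as exc:
            last_error = exc
    raise last_error
```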
Key Lessons
Batching LLM calls matters more than you think. Going from one API call per article to one call per 10 articles cut costs and latency dramatically without losing classification quality.
Resilience > features. The hardest parts weren't the AI — they were the boring infrastructure: rate limit fallbacks across 3 providers, deduplication across runs, followup order enforcement, keeping Excel and Sheets in sync in real-time. None of this is glamorous but all of it is what makes the agent actually usable.
DuckDuckGo is surprisingly capable. Building a scored candidate selection system on top of DuckDuckGo search results got me ~65% contact discovery without spending a dollar on the LinkedIn API or Hunter.io.
No generic templates. The biggest unlock for cold email quality was giving the LLM the specific buying signal as context. An email that says "I saw you just raised a Series A and are expanding your tech team" performs completely differently from a generic "I'd love to connect."
The full repo is on GitHub. Happy to answer questions in the comments.
Tags: #python #ai #automation #machinelearning #llm #proptech #sales #buildinpublic #agentic #opensource
