I Built an AI That Writes Cold Emails — Here's Why They Have a 34% Reply Rate
The SDR Who Was Getting Replies While Everyone Else Wasn't
Our SDR team sent 400 cold emails a day and got back 14 replies.
3.5% reply rate. Basically failing.
Then I noticed: one person was getting an 18% reply rate. Same email list. Same product. Same company. Different human.
I asked her what was different.
"I actually read about the companies. I personalize the angle. I write like I'd text a friend instead of like a robot."
"How long does that take?"
"15 minutes per email."
15 minutes per email. Everyone else was spending about 2 minutes per email to get 400 out the door each day.
I did the math: she was getting about 5x the reply rate but spending 7.5x as long on each email. Per hour of writing, that works out to roughly 0.7 replies for her against about 1 for everyone else. Her emails were far better, but her approach could never cover the whole list by hand.
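One way to compare the two approaches is replies per hour of writing time. This is a quick sketch using only the figures quoted above (2 minutes at 3.5%, 15 minutes at 18%); the function name `replies_per_hour` is made up for illustration.

```python
def replies_per_hour(minutes_per_email: float, reply_rate: float) -> float:
    """Expected replies produced by one hour of writing time."""
    emails_per_hour = 60 / minutes_per_email
    return emails_per_hour * reply_rate


# Figures from the post: templated emails vs. the one SDR who personalizes.
template = replies_per_hour(minutes_per_email=2, reply_rate=0.035)
personalized = replies_per_hour(minutes_per_email=15, reply_rate=0.18)

print(f"template:     {template:.2f} replies/hour")      # 1.05
print(f"personalized: {personalized:.2f} replies/hour")  # 0.72
```

The per-email quality gap is large, but the time cost cancels it out per hour, which is the gap automation is meant to close.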
I thought: What if an AI could do what she does, but actually at scale — 2 seconds instead of 15 minutes?
It took four weeks to build. But it works.
Building the Personalization Engine
I built a system that reads each prospect's LinkedIn, their company's website, recent news, and writes hyper-personalized angles.
The pipeline:
Lead Input (Name, Email, Company)
↓
Company Research (Website, news, tech stack)
↓
Person Research (LinkedIn, job history, recent posts)
↓
Angle Generation (Why THIS person at THIS company RIGHT NOW)
↓
Email Drafting (Personalized, conversational, not salesy)
↓
Auto-send with tracking
↓
AI Reply Classification (Positive/Negative/Objection/Unsubscribe)
↓
Auto-response suggestion
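The first half of this pipeline can be sketched as a chain of functions that each enrich a shared lead record. This is an illustration, not the code from the repo: `Lead`, `run_pipeline`, and every stage function below are hypothetical names, and each stub stands in for real research or an LLM call. Sending and reply handling are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Lead:
    name: str
    email: str
    company: str
    # Each stage stores its findings here under its own key.
    context: dict = field(default_factory=dict)


Stage = Callable[[Lead], Lead]


def research_company(lead: Lead) -> Lead:
    # Stub: a real stage would look at the website, news, and tech stack.
    lead.context["company"] = {"name": lead.company}
    return lead


def research_person(lead: Lead) -> Lead:
    # Stub: a real stage would look at job history and recent posts.
    lead.context["person"] = {"name": lead.name}
    return lead


def generate_angle(lead: Lead) -> Lead:
    # Stub: a real stage would ask the model why this person, why now.
    lead.context["angle"] = f"Why {lead.name} at {lead.company} right now"
    return lead


def draft_email(lead: Lead) -> Lead:
    # Stub: a real stage would call the email-drafting prompt.
    first_name = lead.name.split()[0]
    lead.context["draft"] = f"Hi {first_name}, ..."
    return lead


PIPELINE: list[Stage] = [research_company, research_person, generate_angle, draft_email]


def run_pipeline(lead: Lead, stages: list[Stage] = PIPELINE) -> Lead:
    """Run each stage in order; later stages can read earlier findings."""
    for stage in stages:
        lead = stage(lead)
    return lead


lead = run_pipeline(Lead("Sarah Chen", "sarah@example.com", "Acme Corp"))
print(list(lead.context))  # ['company', 'person', 'angle', 'draft']
```

Keeping every stage to the same signature makes it easy to reorder stages, skip one, or test each in isolation.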
The Problem We Solved
Before this system, our SDR workflow was brutal:
- Copy a name from LinkedIn
- Paste it into 5 different tools to find the email
- Manually research the company (if you had time)
- Write a generic email template
- Send 50 identical copies with a find/replace on the name
- Hope someone replies
The result: our "personalized" emails had a 3.5% reply rate and took 2 minutes each. The one human doing it right had an 18% reply rate and took 15 minutes each. The system had to bridge that gap.
The Research Layer
Before writing a single word, the AI researches:
Sarah Chen, VP Marketing @ Acme Corp
▌ Company Research
└─ Series B, $12M ARR, hiring (job posts show 8 open roles)
└─ Recent funding: $6M Series B (TechCrunch, 2 weeks ago)
└─ Tech stack: Salesforce, HubSpot, Segment (uses competitor tools)
└─ Website: "Build pipeline your way" (they're replacing legacy systems)
▌ Person Research
└─ VP Marketing for 2 years (LinkedIn posts about "marketing ops modernization")
└─ Previous: Manager at competitor (knows the pain points)
└─ Active on LinkedIn: 3 posts in 2 weeks (engaged, not ghost)
└─ Engaged with your content: Viewed pricing page yesterday
The Angle Generation
Instead of a generic "We help companies build pipeline", the AI writes:
Subject: Replacing Segment at Acme (most companies mess this up)
Hi Sarah,
Saw your company just closed a Series B. Congrats.
Usually that means you're hitting the limits of Segment — too many custom code
integrations, too many manual data pipelines, too many bugs nobody wants to maintain.
(I checked your careers page — 8 open roles for a 40-person company. You're moving fast.)
We rebuilt the data integration layer from scratch. Most of our users cut custom
ETL code by 60% in month one. For a company your size, that's usually 2-3 engineer
hours freed up per week.
Worth 15 minutes on Tuesday?
[Your name]
Why this works:
- Specific company problem (custom Segment integrations)
- Specific trigger (Series B closing, hiring surge)
- Specific person (knows her background, posts about ops)
- Specific ask (15 minutes, Tuesday, not "let's chat sometime")
- No hype. Just "here's what usually happens, here's what changed"
The Reply Classification Engine
When replies come back, the AI reads them and classifies:
"Interesting but our tech stack is locked in for another 6 months."
Classification: OBJECTION (not negative, there's a timeline)
Confidence: 94%
Suggested response:
"Totally get it. Most companies lock in for 12-18 months anyway.
Could I check back in Q2 2025?"
A different reply:
"We're good thanks"
Classification: REJECTION (not objection, just no interest)
Confidence: 87%
Suggested response:
"No problem. Keeping you on list in case things change."
(Actually: move to long-term nurture, email monthly)
The AI doesn't force yes/no. It identifies the real signal.
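A classification like the ones above is only useful once it is routed somewhere. Here is a minimal sketch of that routing, assuming the 90% cutoff mentioned later in this post. The `Classification` type, the `route` function, and the queue names are hypothetical, and honoring unsubscribes regardless of confidence is my assumption, not something the post states.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    POSITIVE = "positive"
    OBJECTION = "objection"
    REJECTION = "rejection"
    UNSUBSCRIBE = "unsubscribe"


@dataclass
class Classification:
    label: Label
    confidence: float  # 0.0 to 1.0
    suggested_response: str


AUTO_ACT_THRESHOLD = 0.90  # below this, a human looks at the reply first


def route(result: Classification) -> str:
    """Return the name of the queue this reply should go to."""
    # Assumption: opt-outs are honored even at low confidence.
    if result.label is Label.UNSUBSCRIBE:
        return "remove_from_list"
    if result.confidence < AUTO_ACT_THRESHOLD:
        return "manual_review"
    if result.label is Label.POSITIVE:
        return "notify_sdr"
    if result.label is Label.OBJECTION:
        return "schedule_follow_up"
    return "long_term_nurture"  # confident rejection


objection = Classification(Label.OBJECTION, 0.94, "Could I check back in Q2 2025?")
rejection = Classification(Label.REJECTION, 0.87, "No problem.")
print(route(objection))  # schedule_follow_up
print(route(rejection))  # manual_review
```

With this cutoff, the 94% objection is acted on automatically while the 87% rejection waits for a human.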
How We Generate the Email (The Prompt That Works)
Here's the actual prompt that gets us to a 34% reply rate:
```python
def generate_personalized_email(prospect_data: dict) -> str:
    """Draft one cold email from the researched prospect fields."""
    prompt = f"""You are writing an email to {prospect_data['first_name']}, not as a company, but as a founder.

Company: {prospect_data['company_name']}
Their problem: {prospect_data['pain_point']}
Your solution: {prospect_data['your_solution']}
Personalization: {prospect_data['personal_angle']}

Rules:
1. Write like you're texting a friend (short, conversational, no corporate phrases)
2. Lead with THEIR specific problem, not your solution
3. End with a specific ask (time, day, or next step)
4. Never use: "I'd love to", "synergies", "leverage", "circle back", "reach out"
5. Max 100 words. If you need more, you're selling not conversing.
6. Include ONE fact about their company (recent funding, hiring, tech stack)

Example:
"Saw you closed a Series B. Usually that means your data pipelines are becoming a nightmare.
We cut those down by 60% for companies your size. Worth 15 minutes Tuesday?"
"""
    # `llm` is the LLM client object, set up elsewhere in the app.
    response = llm.generate(
        prompt=prompt,
        temperature=0.7,  # Warm enough for variation, cold enough for consistency
        max_tokens=200,  # 100 words is roughly 130 tokens; a 100-token cap would cut drafts off
        model="gpt-4",
    )
    return response.text
```
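A prompt states the rules, but the model will not always follow them, so it can help to check each draft before it goes out. The sketch below covers the rules that are easy to verify mechanically (word limit, banned phrases, closing ask). `check_draft` is a hypothetical helper, the phrase list is taken from rule 4, and "ends with a question" is only a rough stand-in for rule 3. It expects the email body without the signature.

```python
import re

# Phrases taken from rule 4 of the prompt.
BANNED_PHRASES = ["i'd love to", "synergies", "leverage", "reach out"]
MAX_WORDS = 100


def check_draft(body: str) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    problems = []

    word_count = len(body.split())
    if word_count > MAX_WORDS:
        problems.append(f"too long: {word_count} words (max {MAX_WORDS})")

    # Normalize curly apostrophes so "I’d" and "I'd" both match.
    lowered = body.lower().replace("\u2019", "'")
    for phrase in BANNED_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            problems.append(f"banned phrase: {phrase!r}")

    # Heuristic for rule 3: the body should close on a question.
    if not body.rstrip().endswith("?"):
        problems.append("does not end with a question")

    return problems


good = (
    "Saw you closed a Series B. Usually that means your data pipelines are "
    "becoming a nightmare. We cut those down by 60% for companies your size. "
    "Worth 15 minutes Tuesday?"
)
print(check_draft(good))  # []
print(len(check_draft("I'd love to reach out about synergies.")))  # 4
```

Drafts that fail can be regenerated or sent to a human, rather than going out as-is.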
Here's What Didn't Work (We Tried These First)
1. Generic prompts → 8% reply rate
- "Write a personalized cold email about our product"
- Problem: AI generated professional, corporate tone. Nobody replies to corporate.
- Fix: Gave the model voice examples ("write like you're texting a friend")
2. Batch processing without rate limiting → Got blocked immediately
- Tried to send 400 emails in 2 hours
- Problem: Apollo, Hunter, and our email provider all rate-limited us within 30 minutes
- Fix: Implemented exponential backoff + wait queues. Now send 50/hour, never blocked
3. Reply classification without confidence scores → 40% false positives
- Marked "We're interested but busy right now" as REJECTION
- Problem: Sales team called them immediately, destroyed the relationship
- Fix: Added confidence scores. Only act on 90%+ confidence. 87%? Put in manual review queue
4. Trying to use competitor data → LLM hallucinated features
- "Segment integration breakdown" became "Salesforce migration path"
- Problem: Sales team quoted features that don't exist
- Fix: Now only use data we can verify from company website + LinkedIn
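The rate-limiting fix in item 2 combines two ideas: pace sends to a fixed hourly budget, and back off exponentially when a provider still pushes back. The sketch below shows one way to do both. `RateLimited`, `with_backoff`, and `send_paced` are hypothetical names, and a real provider would signal rate limits in its own way (for example an HTTP 429), which you would translate into this exception.

```python
import random
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by a provider call when it is rate-limited."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


def send_paced(
    messages: Iterable[str],
    send_one: Callable[[str], object],
    per_hour: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send messages one at a time, spaced to stay within `per_hour`."""
    interval = 3600 / per_hour  # 72 seconds at 50/hour
    for i, message in enumerate(messages):
        if i:
            sleep(interval)
        with_backoff(lambda: send_one(message), sleep=sleep)
```

Passing `sleep` in as a parameter keeps the functions testable without real waiting. In production you would leave the default `time.sleep`.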
What Changed
| Metric | Before | After |
|---|---|---|
| Reply rate | 3.5% | 18-22% |
| Time per email | 15 min (best) / 2 min (average) | 2 sec (AI) |
| Emails/week | 400 | 2,000+ |
| Follow-up accuracy | "I'll remember" | 100% (automated) |
| Pipeline generated | $0 (broken funnel) | $180K/month |
The Real Cost Breakdown
What actually changed wasn't just metrics—it was unit economics:
| Expense | Manual Process | AI System |
|---|---|---|
| Monthly cost (2K emails) | $6,000 | $12 |
| Cost per reply | $42.86 | $0.03 |
| Cost per meeting booked | $857 | $0.30 |
| SDR time/month | 133 hours | 2 hours (monitoring) |
| Replies per month | 140 | 440 |
| Meetings per month | 7 | 40 |
| ACV impact | $0 (broken) | $180K pipeline |
One person was generating $180K/month in pipeline by hand. Now the system generates that automatically while they sleep. The math doesn't lie.
The reply rate jumped 5x. The pipeline quadrupled. But the real story? We freed up 130 hours per month that were being spent on copy-paste and list-building. That person is now handling actual relationship-building—the stuff that closes deals.
The Real Lesson: Consistency > Perfection
Humans are wildly inconsistent.
Monday: you write great, personalized emails. Tuesday: you're tired and they're generic templates. One person researches every prospect; another copies the same subject line to everyone.
One team reads replies carefully ("that's an objection, not a rejection"). Another assumes "no response = not interested" and marks them closed.
AI doesn't get tired. Every email is researched. Every reply is classified the same way. Every follow-up is timed perfectly.
But here's the thing: humans make humans reply. Computers sound like computers.
AI writing like humans works not because the AI is magic. It works because it's mimicking the best human behavior (the one person who got an 18% reply rate) and doing it 400 times before breakfast.
Building This
This is live: agentic-outreach-engine
Stack:
- Next.js 14 (dashboards)
- TypeScript strict (safety)
- Recharts (performance tracking)
- GPT-4 (email generation + reply classification)
Demo works without API keys. 6 campaigns, 12 leads, see how the system classifies replies (positive/objection/rejection).
Questions I'm thinking about:
What's your baseline reply rate today? Ours went from 3.5% to 18-22%. But I'm curious if you're measuring at all or just hoping.
How do you handle reply classification? Are you manually reading every reply, using keywords, or something else? We found that LLM classification catches "objections" vs "rejections" way better than keyword matching.
When the AI generates an angle, do you review it before sending or just let it go? We do 100% automated. But I imagine some teams want a human in the loop for brand safety. What's your comfort level?
If you've built email generation systems, I want to know what breaks. If you've found better ways to classify replies, open an issue.