SpurIQ Engineering
How We Use LLM Agents + CRM APIs to Auto-Generate Contextual Follow-Up Emails

The follow-up email problem is not a writing problem

Everyone frames the sales follow-up email as a writing quality problem. Better copy, better subject lines, better personalization. A/B test the CTA. Shorten the paragraphs.

That is the wrong frame.

The actual problem is context. A rep finishing their sixth call of the day does not write a bad follow-up because they lack writing skill. They write a bad one, or write nothing at all, because they do not have the bandwidth to assemble the right context, synthesize what the prospect actually said and translate it into a message that sounds like it came from a person who was paying full attention.

What we built for LeadIQ and DealIQ is not an email template engine. It is a context assembly and generation pipeline that makes the follow-up email the least effortful and most accurate part of the post-call workflow. This post is the actual implementation.

System overview

Here is what the pipeline does, end to end:

Call ends
    ↓
Transcript processed by call intelligence platform
    ↓
Webhook fires → ingestion service
    ↓
Context assembly: CRM history + transcript + contact record
    ↓
LLM extraction: structured call summary
    ↓
LLM generation: draft follow-up email
    ↓
Validation + tone scoring
    ↓
Pending queue → rep reviews → sends (or auto-sends after TTL)

Let's go layer by layer.

Step 1: Ingestion and webhook handling

We subscribe to call completion webhooks from our call intelligence integration (we support Gong, Fireflies and Chorus). The webhook payload includes:

json
{
  "call_id": "call_abc123",
  "crm_deal_id": "deal_xyz789",
  "participants": ["rep@company.com", "prospect@customer.com"],
  "transcript_url": "https://...",
  "duration_seconds": 2847,
  "completed_at": "2025-04-28T14:32:00Z"
}

Our ingestion service validates the payload, confirms the transcript is ready and pushes a job to the processing queue:

python
from datetime import datetime

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/webhooks/call-complete", methods=["POST"])
def handle_call_complete():
    payload = request.json

    # Reject anything that does not carry a valid provider signature
    if not validate_webhook_signature(request, payload):
        return jsonify({"error": "Invalid signature"}), 401

    job = {
        "call_id": payload["call_id"],
        "deal_id": payload["crm_deal_id"],
        "transcript_url": payload["transcript_url"],
        "participants": payload["participants"],
        "queued_at": datetime.utcnow().isoformat()
    }

    # queue is our processing-queue client, initialized at service startup
    queue.push("post_call_pipeline", job, priority="high")
    return jsonify({"status": "queued"}), 200

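The `validate_webhook_signature` helper is provider-specific, so it is not shown above. The core of such a helper is usually an HMAC comparison over the raw request body; here is a minimal sketch, where the secret handling and signature header format are assumptions that depend on the provider's webhook docs:

```python
import hashlib
import hmac

# Hypothetical shared secret -- in practice this comes from the provider's
# webhook settings and lives in a secrets manager, not in code
WEBHOOK_SECRET = b"example-secret"

def verify_hmac_signature(raw_body: bytes, signature_header: str) -> bool:
    # Compute HMAC-SHA256 over the raw body and compare against the
    # provider-sent signature in constant time
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

The constant-time `hmac.compare_digest` matters: a plain `==` comparison leaks timing information an attacker can use to forge signatures byte by byte.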

Step 2: Context assembly

This is the most important step and the one most people skip when building naive follow-up tools. Generating a good follow-up email requires more than the transcript. It requires:

Deal history: What stage is this deal at? What was discussed in previous calls?
Contact record: What is this person's role, seniority and communication history?
Open tasks and commitments: What did the rep already promise to send?
Previous email threads: What has already been said in writing?

We pull all of this from the CRM before touching the LLM:

python
class ContextAssembler:
    def __init__(self, crm_adapter, call_store):
        self.crm = crm_adapter
        self.calls = call_store

    def assemble(self, deal_id: str, call_id: str) -> DealContext:
        deal = self.crm.get_deal(deal_id)
        contact = self.crm.get_primary_contact(deal_id)

        # Get all previous calls for this deal, compressed if > 3
        call_history = self.calls.get_by_deal(deal_id, limit=10)
        compressed_history = self._compress_history(call_history, current_call_id=call_id)

        # Get open tasks and rep commitments from CRM
        open_tasks = self.crm.get_open_tasks(deal_id)

        # Get last 5 email threads
        email_threads = self.crm.get_email_history(deal_id, limit=5)

        return DealContext(
            deal=deal,
            contact=contact,
            call_history=compressed_history,
            open_tasks=open_tasks,
            email_threads=email_threads
        )

    def _compress_history(self, calls, current_call_id):
        recent = [c for c in calls[-3:] if c.id != current_call_id]
        older = [c for c in calls[:-3] if c.id != current_call_id]

        # Recent calls: return full summaries
        # Older calls: compress to key facts only
        compressed_older = [self._extract_key_facts(c) for c in older]

        return recent + compressed_older
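The `DealContext` container the assembler returns is never defined in the post; a plausible shape, with field types as assumptions, would be:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class DealContext:
    # Everything the generation step needs in one container.
    # Field types are assumptions -- the real class is not shown in the post.
    deal: Any
    contact: Any
    call_history: list
    open_tasks: list
    email_threads: list
```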

The compression logic for older calls uses a lightweight LLM call with a strict output schema. We are not asking it to rewrite, just to extract: stakeholders, key pain points, commitments made, outcomes. This keeps older context within the token budget without losing deal-relevant signal.
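The `_extract_key_facts` call referenced above is not shown in the post; a sketch, assuming the same `model_client` interface as the other extraction calls, might look like:

```python
import json

# Strict-schema prompt: extract only, never rewrite. The exact wording and
# schema are illustrative, not the production prompt.
KEY_FACTS_PROMPT = """Extract from this call summary: stakeholders, key pain
points, commitments made, outcome. Return ONLY valid JSON:
{"stakeholders": [], "pain_points": [], "commitments": [], "outcome": ""}

Summary:
"""

def extract_key_facts(call_summary: str, model_client) -> dict:
    # Lightweight call: low temperature, small token budget
    response = model_client.complete(
        prompt=KEY_FACTS_PROMPT + call_summary,
        max_tokens=300,
        temperature=0.1,
    )
    return json.loads(response.text)
```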

Step 3: LLM extraction: Structured call summary

Before generating the email, we run a structured extraction pass on the current call transcript. This is a separate LLM call with a strict JSON schema output:

python
EXTRACTION_PROMPT = """
You are analyzing a B2B sales call transcript. Extract the following information precisely.
Return ONLY valid JSON matching the schema. Do not add commentary.

Schema:
{
  "discussed_pain_points": ["string"],
  "rep_commitments": ["string - specific, actionable"],
  "prospect_commitments": ["string - specific, actionable"],
  "objections_raised": ["string"],
  "buying_signals": ["string"],
  "risk_signals": ["string"],
  "agreed_next_step": "string or null",
  "agreed_timeline": "string or null",
  "call_sentiment": "positive|neutral|cautious|negative"
}

Transcript:
{transcript}
"""

def extract_call_structure(call_id: str, transcript: str, model_client) -> CallStructure:
    # str.format would choke on the literal braces in the JSON schema above,
    # so substitute the transcript placeholder directly
    prompt = EXTRACTION_PROMPT.replace("{transcript}", transcript)

    response = model_client.complete(
        prompt=prompt,
        max_tokens=800,
        temperature=0.1  # Low temp for extraction, we want consistency
    )

    try:
        data = json.loads(response.text)
        return CallStructure(**data)
    except (json.JSONDecodeError, ValidationError) as e:
        # Log extraction failure, flag for human review
        raise ExtractionFailure(call_id=call_id, error=str(e))

We run validation on every extraction output. A missing agreed_next_step when the transcript clearly contains one was a failure mode we caught early. We added a post-extraction validation pass that re-runs the extraction with a corrective prompt when confidence scores fall below a threshold.
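The exact confidence heuristics are not detailed above, but one cheap pre-check is to look for scheduling language in the transcript when the extraction came back empty. The cue phrases here are illustrative, not the production list:

```python
import re

# Illustrative scheduling cues -- the real system keys off model
# confidence scores, not a hand-written phrase list
NEXT_STEP_CUES = re.compile(
    r"\b(next week|next steps?|schedule|follow up|book a)\b",
    re.IGNORECASE,
)

def needs_corrective_pass(extraction: dict, transcript: str) -> bool:
    # If no agreed next step was extracted but the transcript contains
    # strong scheduling language, flag for a corrective re-run
    if extraction.get("agreed_next_step"):
        return False
    return bool(NEXT_STEP_CUES.search(transcript))
```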

Step 4: Email generation: The actual interesting part

Now we have everything we need. The generation prompt is where the craft lives:

python
FOLLOWUP_PROMPT = """
You are drafting a follow-up email from {rep_name} to {prospect_name} at {company}.

Write as {rep_name}. Use first person. Sound like a real person who was paying full attention on the call, not a template, not a CRM auto-response.

CALL CONTEXT:
- Pain points discussed: {pain_points}
- What {rep_name} committed to: {rep_commitments}
- What {prospect_name} committed to: {prospect_commitments}
- Agreed next step: {next_step}
- Agreed timeline: {timeline}
- Objections raised: {objections}
- Call sentiment: {sentiment}

DEAL HISTORY SUMMARY:
{compressed_history}

OPEN TASKS (from CRM):
{open_tasks}

PREVIOUS EMAIL TONE (last thread):
{last_email_sample}

INSTRUCTIONS:
- Open with a genuine reference to something specific from today's call, not generic
- Address any commitments made by the rep directly
- Reference the agreed next step clearly and confirm the timeline
- If objections were raised, acknowledge one naturally, do not ignore them
- Do not use: 'Hope this finds you well', 'Per our conversation', 'As discussed', 'Circling back'
- Keep it under 200 words
- End with a single, specific call to action

Output format:
Subject: [subject line]
Body: [email body]
"""

The instruction layer is doing a lot of work here. Specifically: the ban on filler phrases, the word limit and the requirement to reference something specific from the call. These three constraints eliminate 90% of the "this sounds like an AI wrote it" problem.

We run the generation at temperature 0.7, high enough for natural variation, low enough that the factual content stays grounded.
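The prompt's Subject/Body output format still has to be parsed on the way out. A defensive sketch (not the production parser) splits on the markers rather than trusting exact line positions:

```python
import re

def parse_email_output(raw: str) -> tuple[str, str]:
    # Match "Subject: ..." followed by "Body: ..." anywhere in the output,
    # tolerating whitespace and line-break variation between them
    match = re.search(r"Subject:\s*(.+?)\s*Body:\s*(.+)", raw, re.DOTALL)
    if match is None:
        raise ValueError("model output did not match Subject/Body format")
    return match.group(1).strip(), match.group(2).strip()
```

Failing loudly on a format miss is deliberate: a malformed draft should go back for regeneration, not reach the rep's queue half-parsed.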

Step 5: Validation and tone scoring

The generated email goes through two checks before it reaches the rep.

Factual grounding check: a lightweight verification pass that asks whether the email references anything that was not in the extraction output. If the model hallucinated a commitment the rep did not make, we catch it here.

python
def validate_factual_grounding(email: str, call_structure: CallStructure) -> ValidationResult:
    claims = extract_claims_from_email(email)

    for claim in claims:
        if not is_grounded_in_context(claim, call_structure):
            return ValidationResult(
                passed=False,
                issue=f"Ungrounded claim detected: {claim}"
            )

    return ValidationResult(passed=True)
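The `is_grounded_in_context` helper is elided above. One cheap lexical fallback, a sketch rather than the real verifier, counts a claim as grounded when enough of its content words appear in some extracted fact:

```python
def is_grounded_in_facts(claim: str, context_facts: list[str]) -> bool:
    # Crude lexical check: grounded if >= half of the claim's content words
    # (length > 3) appear in at least one extracted fact. The production
    # check would use an LLM verifier; this is only a fallback heuristic.
    claim_words = {w for w in claim.lower().split() if len(w) > 3}
    if not claim_words:
        return True
    for fact in context_facts:
        fact_words = set(fact.lower().split())
        if len(claim_words & fact_words) / len(claim_words) >= 0.5:
            return True
    return False
```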

Tone scoring: We score the email on three dimensions: formality match (does this match how the rep usually writes?), specificity (does it reference actual call content?) and CTA clarity (is there exactly one ask?). Emails that score low on any dimension go back for regeneration with corrective instructions.
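As a concrete example of what a dimension score can look like, here is a crude lexical proxy for the specificity dimension; the real scorer is not shown in the post, and substring matching is an assumption, not the production method:

```python
def score_specificity(email: str, call_facts: list[str]) -> float:
    # Fraction of extracted call facts (pain points, commitments, etc.)
    # that the draft actually mentions verbatim
    if not call_facts:
        return 1.0
    email_lower = email.lower()
    hits = sum(1 for fact in call_facts if fact.lower() in email_lower)
    return hits / len(call_facts)
```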

Step 6: Pending queue and rep approval flow

Early versions auto-sent. Reps hated it.

Not because the emails were bad, but because the emails were their emails, going out under their names, and they had no visibility into what was being sent.

We moved to a pending queue model:

python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PendingAction:
    action_type: str  # "send_email", "update_crm", "create_task"
    deal_id: str
    payload: dict
    created_at: datetime
    auto_execute_at: datetime  # 4 hours after creation by default
    status: str  # "pending", "approved", "rejected", "auto_executed"

Reps get a notification: "3 actions ready for your review on Deal X." They can approve, edit, or reject each one. Anything not touched in 4 hours auto-executes (configurable per rep). This is the AI revenue orchestration model that actually works in practice: the system owns follow-through, the rep owns override.
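The TTL side of this is a periodic sweep over the queue. A sketch of that worker, simplified here to plain dicts rather than the `PendingAction` record, with `execute` standing in for whatever dispatches the action:

```python
from datetime import datetime, timedelta

def sweep_pending(actions: list[dict], now: datetime, execute) -> None:
    # Runs on a schedule. Anything still pending past its TTL gets executed;
    # approved or rejected items were already handled by the rep.
    for action in actions:
        if action["status"] == "pending" and now >= action["auto_execute_at"]:
            execute(action)
            action["status"] = "auto_executed"
```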

Adoption went from 31% to 84% when we made this change.

The numbers that matter

Across deals where the pipeline ran vs. deals where it did not (uneven coverage during rollout gave us a natural comparison group):

Average time from call end to follow-up email sent: 94 minutes → 11 minutes
Follow-up emails with specific call references: 22% → 96%
Rep time spent on post-call admin per week: ~4.5 hours → ~40 minutes
The follow-ups are not just faster. They are more accurate, more specific and more likely to move the deal forward, because they are built from what was actually said, not from what the rep remembered an hour later.
