DEV Community: SpurIQ Engineering

How to Build a Signal-to-Action Workflow Without Writing a Monolith

SpurIQ Engineering — Mon, 18 May 2026 11:29:24 +0000

Most RevOps engineering projects start the same way.

Someone on the revenue team identifies a pain point: deals going dark, follow-ups falling through, CRM data rotting quietly in the background. They bring it to engineering. Engineering scopes it. Somebody says "we could automate that." Everyone agrees.

Three months later, there is a sprawling Node service that handles CRM webhooks, enriches contact data, scores leads, triggers email sequences, updates deal stages, sends Slack alerts and logs everything to a data warehouse. It works, until it does not. One upstream API changes. One webhook payload shifts format. One enrichment service goes down for 40 minutes. And now the entire pipeline is silent, nobody knows why and half the team is afraid to touch the code.

This is what happens when you build CRM automation as a monolith. Not because the engineers were sloppy. Because the problem seduces you into building too much in one place.

This post is about building signal-to-action workflows the other way: modular, composable, debuggable and designed so that when one piece inevitably breaks, it does not take everything else down with it.

First: What a signal-to-action workflow actually is

Before the architecture, the definition, because "signal-to-action" gets used loosely and it matters to be precise.

A signal is any event that changes the probability a deal will advance or decay if acted on or ignored. CRM events, call transcripts, website behavior, intent data, stakeholder changes, all of these are signals. They are inputs that carry information about where a deal or account is right now.

An action is anything that happens in response: a follow-up email goes out, a CRM field updates, a Slack message fires, a task gets created, a sequence triggers. Actions are the outputs.

The workflow is the intelligence layer in between, the thing that takes a signal, understands its context, decides what action is warranted and makes sure that action executes reliably.

The problem with monolithic sales workflow automation is that all three of these concerns, signal ingestion, decision logic and action execution, get tangled together in a single codebase. When you need to change how actions execute, you touch the same code that handles signal ingestion. When one enrichment API fails, it blocks action execution entirely. The whole thing becomes fragile in proportion to how tightly the pieces are coupled.

The alternative is a pipeline architecture: discrete, independently deployable services connected by an event queue. Each service does one thing. Each failure is contained. Each piece can be tested, upgraded and replaced without touching the others.

Here is how to build it.

The architecture: Four layers, one queue

[Signal Sources]
      ↓
[Ingestion Layer]    ← normalizes all incoming signals
      ↓
[Event Queue]        ← the backbone; decouples everything
      ↓
[Enrichment Layer]   ← adds context before decisions are made
      ↓
[Decision Layer]     ← determines what action is required
      ↓
[Execution Layer]    ← carries out the action across the GTM stack
      ↓
[Observability]      ← logs, traces, failure alerts

Each layer is a separate service. They communicate exclusively through the queue. No layer knows about the internal implementation of any other. This is the design decision that keeps the system maintainable as it grows.

Layer 1: Signal ingestion - normalize everything at the boundary

Your signal sources will be inconsistent. Your CRM fires webhooks in one format. Your call intelligence platform fires them in another. Your intent data provider sends batch files on a schedule. Your website sends behavioral events via a tracking endpoint.

If you let each of these formats bleed into your core workflow, you will write format-specific handling code everywhere. Instead, write a thin adapter for each source that normalizes into one canonical schema at the point of entry.

# Canonical signal schema
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Signal:
    signal_id: str
    signal_type: str          # "crm_event" | "call_complete" | "intent" | "behavioral"
    source: str               # "hubspot" | "gong" | "bombora" | "website"
    account_id: str
    deal_id: Optional[str]
    contact_id: Optional[str]
    payload: dict             # normalized, source-agnostic
    urgency: str              # "immediate" | "high" | "standard" | "low"
    confidence: float         # 0.0 - 1.0
    fired_at: datetime        # when the source says the event happened (UTC)
    received_at: datetime     # when the adapter ingested it (UTC)

# HubSpot CRM webhook → normalized signal
class HubSpotAdapter:
    def normalize(self, raw: dict) -> Signal:
        # Accept either a {"from": ..., "to": ...} pair or a bare new value
        change = raw.get("propertyValue")
        if not isinstance(change, dict):
            change = {"to": change}

        return Signal(
            signal_id=f"hs_{raw['objectId']}_{raw['propertyName']}",
            signal_type="crm_event",
            source="hubspot",
            account_id=raw.get("companyId"),
            deal_id=raw.get("objectId"),
            contact_id=raw.get("contactId"),
            payload={
                "event": raw["propertyName"],
                "old_value": change.get("from"),
                "new_value": change.get("to"),
            },
            urgency=self._classify_urgency(raw),
            confidence=0.95,
            # occurredAt is in epoch milliseconds; keep both timestamps in UTC
            fired_at=datetime.fromtimestamp(raw["occurredAt"] / 1000, tz=timezone.utc),
            received_at=datetime.now(timezone.utc),
        )

    def _classify_urgency(self, raw: dict) -> str:
        high_urgency_events = {"dealstage", "closedate", "hs_deal_stage_probability"}
        return "high" if raw.get("propertyName") in high_urgency_events else "standard"

Once normalized, every signal gets pushed to the queue in the same format. Downstream layers never need to know where the signal came from or what format the source used. The adapter layer absorbs all that complexity at the edge.

Layer 2: The event queue - the backbone of the whole system

This is the piece that makes everything else composable. Your queue (SQS, RabbitMQ, Kafka, Redis Streams, choose based on your scale and operational preference) sits between every layer. Nothing calls another layer directly. Everything publishes to the queue and subscribes from it.

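import logging
import time
from datetime import datetime, timezone
from typing import Callable
from uuid import uuid4

class EventQueue:
    def __init__(self, backend, poll_interval: float = 1.0):
        self.backend = backend              # broker client exposing push/poll/ack/nack
        self.poll_interval = poll_interval
        self.logger = logging.getLogger(__name__)

    def publish(self, topic: str, event: dict, priority: str = "standard"):
        message = {
            "event_id": str(uuid4()),
            "topic": topic,
            "priority": priority,
            "payload": event,
            "published_at": datetime.now(timezone.utc).isoformat()
        }
        self.backend.push(topic, message)

    def subscribe(self, topic: str, handler: Callable, batch_size: int = 10):
        while True:
            messages = self.backend.poll(topic, batch_size)
            if not messages:
                time.sleep(self.poll_interval)  # avoid a busy loop on an empty topic
                continue
            for msg in messages:
                try:
                    handler(msg["payload"])
                    self.backend.ack(msg)
                except Exception as e:
                    # nack hands the message back to the broker for redelivery
                    self.backend.nack(msg)
                    # "msg" is a reserved LogRecord attribute, so `extra` needs another key
                    self.logger.error(f"Handler failed: {e}", extra={"event_id": msg.get("event_id")})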

The queue gives you three things that a monolith cannot:

Isolation: If the enrichment layer goes down for 20 minutes, signals pile up in the queue and process when it recovers. Nothing is lost, as long as the queue's retention period outlasts the outage. Nothing downstream is blocked by the enrichment failure.

Independent scaling: If your intent data processor needs to handle 10x more events than your call-complete processor, you scale the relevant consumer independently. You do not scale the entire pipeline.

Auditability: Every event that touches the queue is logged with a timestamp. When something goes wrong, you have a full event trail to trace through. This is what makes RevOps engineering debuggable, not guesswork.
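
To make the isolation property concrete, here is a minimal sketch of a broker stand-in. `InMemoryBackend` is hypothetical; it only mirrors the `push`, `poll`, `ack` and `nack` calls that `EventQueue` makes on `self.backend`, and a real broker will behave differently around ordering and redelivery.

```python
from collections import defaultdict, deque


class InMemoryBackend:
    """Hypothetical stand-in for a real broker (SQS, RabbitMQ, ...)."""

    def __init__(self):
        self.topics = defaultdict(deque)   # topic -> pending messages
        self.in_flight = {}                # event_id -> (topic, message)

    def push(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, topic, batch_size):
        batch = []
        while self.topics[topic] and len(batch) < batch_size:
            msg = self.topics[topic].popleft()
            self.in_flight[msg["event_id"]] = (topic, msg)
            batch.append(msg)
        return batch

    def ack(self, msg):
        # Handler succeeded: forget the message for good.
        self.in_flight.pop(msg["event_id"], None)

    def nack(self, msg):
        # Handler failed: put the message back so it is retried later.
        topic, original = self.in_flight.pop(msg["event_id"])
        self.topics[topic].append(original)


backend = InMemoryBackend()

# Ingestion keeps publishing while the enrichment consumer is "down".
for i in range(3):
    backend.push("signals.raw", {"event_id": f"evt-{i}", "payload": {"n": i}})

# The consumer comes back, fails once, then succeeds on redelivery.
first = backend.poll("signals.raw", batch_size=1)[0]
backend.nack(first)                        # simulated handler failure
remaining = backend.poll("signals.raw", batch_size=10)
for msg in remaining:
    backend.ack(msg)
```

Everything published while the consumer was unavailable is still there when it returns, and the message that failed comes back for a second attempt.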

Layer 3: Enrichment - context before decisions

Enrichment is a separate consumer that pulls signals from the queue, adds context and republishes enriched signals to a different topic. It should not make decisions. It should not trigger actions. It should only make signals smarter.

class EnrichmentService:
    def enrich(self, signal: dict) -> dict:
        deal_id = signal.get("deal_id")
        account_id = signal.get("account_id")

        enriched = {**signal}

        # Pull deal context from CRM
        if deal_id:
            deal = self.crm.get_deal(deal_id)
            enriched["deal_context"] = {
                "stage": deal.stage,
                "days_in_stage": deal.days_in_stage,
                "last_activity_days_ago": deal.last_activity_days_ago,
                # close_date may be unset, so guard before formatting it
                "close_date": deal.close_date.isoformat() if deal.close_date else None,
                "open_tasks": [t.description for t in deal.open_tasks],
                "competitors_mentioned": deal.competitors_mentioned
            }

        # Pull account firmographics
        if account_id:
            account = self.crm.get_account(account_id)
            enriched["account_context"] = {
                "employee_count": account.employee_count,
                "industry": account.industry,
                "recent_signals": self.intent_store.get_recent(account_id, days=30)
            }

        # Re-score urgency with context
        enriched["urgency"] = self._rescore_urgency(enriched)

        return enriched

    def _rescore_urgency(self, enriched: dict) -> str:
        deal = enriched.get("deal_context", {})
        days_to_close = self._days_until(deal.get("close_date"))
        idle_days = deal.get("last_activity_days_ago", 0)

        # Promote urgency if the deal is close to close date and idle.
        # `is not None` so a deal closing today (0 days) still qualifies.
        if days_to_close is not None and days_to_close < 21 and idle_days > 10:
            return "immediate"
        return enriched.get("urgency", "standard")

The enrichment layer is where signals go from being raw events to being context-aware inputs. A CRM event that says "deal stage changed" becomes a context-aware signal that says "deal stage changed, this deal has been idle 12 days, close date is in 18 days, a competitor was mentioned in the last call." That context is what the decision layer needs to make a good call.

Keep enrichment idempotent. The same signal enriched twice should produce the same output. This makes retrying safe, which matters when external APIs are flaky.
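
The code above calls `_days_until` without showing it. One possible version is sketched below, assuming `close_date` is stored as an ISO-8601 string. Passing the reference date in as an argument, instead of reading the clock inside the function, is an assumption of this sketch; it is chosen because it keeps a retried enrichment from producing a different answer. The function names are illustrative.

```python
from datetime import date
from typing import Optional


def days_until(close_date_iso: Optional[str], today: date) -> Optional[int]:
    """Days from `today` to an ISO-8601 close date; None if no date is set."""
    if not close_date_iso:
        return None
    # Slice to the date part so "2026-06-05T00:00:00" also parses.
    return (date.fromisoformat(close_date_iso[:10]) - today).days


def rescore_urgency(enriched: dict, today: date) -> str:
    deal = enriched.get("deal_context", {})
    remaining = days_until(deal.get("close_date"), today)
    idle_days = deal.get("last_activity_days_ago", 0)
    if remaining is not None and remaining < 21 and idle_days > 10:
        return "immediate"
    return enriched.get("urgency", "standard")


signal = {
    "urgency": "standard",
    "deal_context": {"close_date": "2026-06-05", "last_activity_days_ago": 12},
}
as_of = date(2026, 5, 18)
first = rescore_urgency(signal, as_of)
second = rescore_urgency(signal, as_of)   # a retry with the same inputs
```

With the same signal and the same reference date, both calls return the same urgency, which is the property that makes a retry safe.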

Layer 4: The decision layer - rules first, LLM second

This is where the GTM automation intelligence lives. The decision layer consumes enriched signals and produces action plans. It does not execute actions itself. It decides what should happen and publishes that decision to the queue.

Run a two-pass decision process:

class DecisionEngine:
    def decide(self, enriched_signal: dict) -> ActionPlan:
        # Pass 1: Rules engine — fast, deterministic, zero-LLM
        rule_result = self.rules_engine.evaluate(enriched_signal)

        if rule_result.is_conclusive:
            return rule_result.action_plan

        # Pass 2: LLM reasoning — for complex, ambiguous cases
        return self.llm_decider.reason(enriched_signal)

class RulesEngine:
    def evaluate(self, signal: dict) -> RuleResult:
        deal = signal.get("deal_context", {})
        idle_days = deal.get("last_activity_days_ago", 0)
        days_to_close = self._days_until(deal.get("close_date"))
        competitors = deal.get("competitors_mentioned", [])

        actions = []

        # Rule: Idle deal approaching close date
        # (`is not None` so a deal closing today, 0 days out, still matches)
        if idle_days > 10 and days_to_close is not None and days_to_close < 21:
            actions.append(Action(
                type="draft_followup",
                priority="immediate",
                context={"reason": "idle_near_close", "idle_days": idle_days}
            ))

        # Rule: Competitor mentioned, no competitive task exists
        if competitors and not self._has_competitive_task(deal):
            actions.append(Action(
                type="create_task",
                priority="high",
                context={"task": f"Address competitive position vs {competitors[0]}"}
            ))

        return RuleResult(
            is_conclusive=len(actions) > 0,
            action_plan=ActionPlan(actions=actions) if actions else None
        )

Rules handle the clear-cut cases fast. Reserve the LLM call for genuinely ambiguous situations: multi-stakeholder complexity, conflicting signals, deals with long histories that require synthesis. This keeps latency low for the 60–70% of signals that have obvious responses and spends inference budget only where it adds real value.

Layer 5: Execution - adapters again, same principle

The execution layer consumes action plans and carries them out against your GTM stack. Apply the same adapter pattern you used for ingestion:

class ExecutionRouter:
    adapters = {
        "draft_followup": EmailDraftAdapter,
        "update_crm": CRMWriteAdapter,
        "create_task": TaskAdapter,
        "send_alert": SlackAdapter,
        "enroll_sequence": SequenceAdapter
    }

    def execute(self, action: Action):
        adapter_class = self.adapters.get(action.type)
        if not adapter_class:
            raise UnknownActionType(action.type)

        adapter = adapter_class(self.config)
        result = adapter.execute(action)

        self.audit_log.record(action, result)
        return result

Each adapter handles one action type against one external system. When Salesforce changes their API, you update one adapter. The rest of the pipeline does not care.

Build a pending queue on top of this for actions that require human approval before executing, follow-up emails especially. Reps want to know what is going out under their name. An approval layer with a 4-hour auto-execute TTL gives them visibility without creating a bottleneck.
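
A rough sketch of that approval layer, assuming an in-memory store and hypothetical names (`PendingQueue`, `PendingAction`, `due_for_execution`). Only the 4-hour TTL comes from the text; a production version would persist items and mark them as executed once they run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

APPROVAL_TTL = timedelta(hours=4)  # auto-execute window from the text above


@dataclass
class PendingAction:
    action_id: str
    action_type: str
    queued_at: datetime
    status: str = "pending"   # "pending" | "approved" | "rejected"


class PendingQueue:
    """Hypothetical in-memory approval queue."""

    def __init__(self):
        self.items: Dict[str, PendingAction] = {}

    def submit(self, item: PendingAction) -> None:
        self.items[item.action_id] = item

    def review(self, action_id: str, approved: bool) -> None:
        self.items[action_id].status = "approved" if approved else "rejected"

    def due_for_execution(self, now: datetime) -> List[PendingAction]:
        """Approved items, plus pending items whose TTL has expired."""
        due = []
        for item in self.items.values():
            expired = now - item.queued_at >= APPROVAL_TTL
            if item.status == "approved" or (item.status == "pending" and expired):
                due.append(item)
        return due


t0 = datetime(2026, 5, 18, 9, 0, tzinfo=timezone.utc)
q = PendingQueue()
for action_id in ("a1", "a2", "a3"):
    q.submit(PendingAction(action_id, "draft_followup", queued_at=t0))
q.review("a2", approved=True)    # rep approves
q.review("a3", approved=False)   # rep rejects; never sent

after_one_hour = [i.action_id for i in q.due_for_execution(t0 + timedelta(hours=1))]
after_five_hours = [i.action_id for i in q.due_for_execution(t0 + timedelta(hours=5))]
```

After one hour only the approved draft is due. After five hours the unreviewed draft is due as well, and the rejected one never is.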

What makes this hold up over time

The teams that build sales workflow automation that actually survives contact with production share a few habits:

They treat observability as a first-class concern, not an afterthought: Every event that enters the queue, every enrichment call, every decision and every execution attempt is logged with a correlation ID that traces the full journey from signal to action. When something breaks, you trace the ID instead of guessing.

They design for partial failure: The enrichment service will go down. An external API will time out. The LLM will occasionally return malformed JSON. Build for it explicitly: dead letter queues, retries with exponential backoff and graceful degradation to rule-based decisions when the LLM is unavailable.

They resist the urge to merge layers: The moment someone says "we could just call the enrichment function directly from the ingestion handler, it would be simpler", that is the moment the monolith starts growing back. The queue is the contract. Maintain it.

They version their schemas: Signal schemas change. Deal context fields get added. Action types evolve. Version from the start so old events can still be processed by newer consumers without breaking.
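
Of the habits above, designing for partial failure is the easiest to show in code. The sketch below combines retry with exponential backoff and a dead letter list. `process_with_retry` and its parameters are hypothetical, and jitter is left out to keep the example short.

```python
import time
from typing import Callable, List


def process_with_retry(
    message: dict,
    handler: Callable[[dict], None],
    dead_letters: List[dict],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `handler` up to `max_attempts` times, doubling the wait each time.

    Returns True on success. After the final failure the message is parked
    in `dead_letters` for inspection instead of being retried forever.
    """
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:
            if attempt == max_attempts - 1:
                dead_letters.append({"message": message, "error": str(exc)})
                return False
            sleep(base_delay * (2 ** attempt))   # 0.5s, 1s, 2s, ...
    return False


# Simulated flaky enrichment API: fails twice, then succeeds.
calls = {"count": 0}


def flaky_handler(msg: dict) -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("enrichment API timed out")


def always_fails(msg: dict) -> None:
    raise ValueError("malformed payload")


waits: List[float] = []
dlq: List[dict] = []
ok = process_with_retry({"id": 1}, flaky_handler, dlq, sleep=waits.append)
bad = process_with_retry({"id": 2}, always_fails, dlq, sleep=lambda s: None)
```

The `sleep` argument is injectable so the backoff schedule can be checked without actually waiting. The flaky handler recovers on its third attempt, while the message that can never succeed ends up in the dead letter list.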

The architecture is the product

One more thing worth saying directly: the reason this architecture matters is not purely technical.

Revenue teams measure pipeline quality, deal velocity and forecast accuracy. Engineering teams measure uptime, latency and deployment frequency. A modular signal-to-action pipeline is the architecture that lets both teams win, because it is observable enough for engineering to trust and fast enough for revenue to rely on.

The 15-minute window between a signal firing and an action executing is not a performance target. It is a revenue metric. Build the system that hits it consistently and the upstream teams will notice.

Signal-to-Action in Under 15 Minutes: Our Real-Time Pipeline Architecture

SpurIQ Engineering — Fri, 01 May 2026 10:14:05 +0000

A signal without action is just noise with a timestamp

I want to be direct about something before we get into the architecture: most revenue teams are not losing deals because they lack data. They are losing deals because data arrives, sits unactioned for 36 hours and by the time someone gets to it the buying window has moved.

A prospect visits your pricing page at 11:42am on a Tuesday. Your intent data platform flags it. The flag lands in a dashboard. The SDR sees it on Thursday morning during their weekly review. They send an outreach email Friday. The prospect is already two calls deep with a competitor.

That is not a data problem. That is a signal-to-action gap problem and it is what we built SpurIQ's real-time pipeline to eliminate.

This post is the technical architecture behind how we get from signal detected to action executed in under 15 minutes. Not as a marketing claim. As an engineering reality.

What counts as a signal

Before the architecture, the definition. A signal, in our system, is any event that changes the probability that a specific deal or prospect will advance or decay, if acted on or ignored.

Signals we ingest:

Each signal type has a different latency profile. Intent data comes in batches (usually daily or twice daily). Behavioral signals from your own website can fire in real time. CRM decay signals require polling. The architecture has to handle all of these without a uniform assumption about when data arrives.

The pipeline architecture

Here is the full system:

[Signal Sources]
    ↓
[Ingestion Layer - per-source adapters]
    ↓
[Signal Normalization - unified schema]
    ↓
[Signal Router - priority + type classification]
    ↓
[Context Enrichment - CRM + deal history]
    ↓
[Action Decision Engine - LLM + rules]
    ↓
[Action Queue - prioritized execution]
    ↓
[Execution Layer - CRM writes, alerts, drafts, sequences]
    ↓
[Feedback Loop - outcome tracking]

Let me go through each layer.

Layer 1: Ingestion - per-source adapters, unified output

Every signal source has its own data format, authentication model and delivery mechanism. Bombora sends batch files. Your website fires webhooks. CRM decay requires scheduled polling. Treating these differently all the way through the pipeline creates unmaintainable complexity.

We built a per-source adapter layer that translates everything into a normalized signal schema at the boundary:

@dataclass
class NormalizedSignal:
    signal_id: str
    signal_type: str           # "intent", "engagement", "behavioral", "decay", "relationship"
    source: str                # "bombora", "gong", "website", "crm"
    account_id: str
    deal_id: Optional[str]     # None if pre-pipeline signal
    contact_id: Optional[str]
    raw_payload: dict
    confidence_score: float    # 0.0 - 1.0
    detected_at: datetime
    received_at: datetime      # When we got it (latency tracking)
    urgency: str               # "immediate", "high", "standard", "low"

The confidence_score and urgency fields are set by each adapter based on source-specific heuristics. A prospect visiting the pricing page for 4 minutes gets urgency: "immediate". A prospect who appeared in a weekly intent batch from three weeks ago gets urgency: "low".

class WebsiteBehaviorAdapter:
    def normalize(self, raw_event: dict) -> NormalizedSignal:
        session_duration = raw_event.get("session_duration_seconds", 0)
        pages_visited = raw_event.get("pages", [])

        # Substring match so paths such as "/pricing" or "/pricing/enterprise" count
        is_high_intent = (
            any("pricing" in page for page in pages_visited) and
            session_duration > 180
        )

        return NormalizedSignal(
            signal_type="behavioral",
            source="website",
            urgency="immediate" if is_high_intent else "standard",
            confidence_score=0.85 if is_high_intent else 0.40,
            # ... other fields
        )

Layer 2: Signal routing - not all signals are equal

After normalization, signals go to the router. The router does two things: deduplication and priority classification.

Deduplication: The same account might fire five behavioral signals in an hour. We do not want five separate action pipelines running for the same account. We aggregate signals within a configurable time window (default: 30 minutes for behavioral, 24 hours for intent) and pass the aggregate downstream.

class SignalAggregator:
    def __init__(self, redis_client, window_seconds=1800):
        self.redis = redis_client
        self.window = window_seconds

    def aggregate(self, signal: NormalizedSignal) -> Optional[AggregatedSignal]:
        key = f"signal_agg:{signal.account_id}:{signal.signal_type}"

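        # Check if an aggregate already exists in window
        existing = self.redis.get(key)

        if existing:
            agg = AggregatedSignal.from_json(existing)
            agg.add(signal)
            # keepttl=True (Redis 6.0+) preserves the original expiry, so the
            # window stays fixed instead of sliding with every new signal
            self.redis.set(key, agg.to_json(), keepttl=True)
            return None  # Don't fire yet - still aggregating
        else:
            # First signal in window - start aggregate, schedule flush.
            # The key outlives the flush delay by a margin so the flush job can
            # still read it; the flush should delete the key once it has.
            agg = AggregatedSignal.start(signal)
            self.redis.setex(key, self.window + 60, agg.to_json())
            schedule_flush(key, delay=self.window)
            return None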

When the flush fires, the aggregated signal, with all component events, moves downstream as a single enriched package.

Priority classification: After aggregation, signals are classified into priority tiers that determine queue placement and processing SLA:

P1 (Immediate - < 5 min SLA): High-intent behavioral signals, strong engagement signals, reactivation of previously dark deals
P2 (High - < 15 min SLA): Intent surge signals, stakeholder job changes, competitor mentions in calls
P3 (Standard - < 2 hour SLA): Routine engagement signals, CRM hygiene flags
P4 (Low - next business day): Low-confidence intent, bulk enrichment updates
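
One way to encode these tiers is as plain data that both the queue and the alerting code can read. The SLA values come from the list above; the names `SLA_BY_TIER`, `QUEUE_RANK` and `sla_deadline`, and the example signal labels, are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# SLA per tier, taken from the list above. P4 ("next business day") has no
# fixed duration, so it is left as None here.
SLA_BY_TIER = {
    "P1": timedelta(minutes=5),
    "P2": timedelta(minutes=15),
    "P3": timedelta(hours=2),
    "P4": None,
}

# Lower number = served first by a priority queue.
QUEUE_RANK = {"P1": 0, "P2": 1, "P3": 2, "P4": 3}


def sla_deadline(tier: str, received_at: datetime) -> Optional[datetime]:
    """Latest acceptable processing time for a signal, or None for P4."""
    sla = SLA_BY_TIER[tier]
    return received_at + sla if sla is not None else None


received = datetime(2026, 5, 1, 11, 42, tzinfo=timezone.utc)
pending = [
    ("P3", "crm_hygiene_flag"),
    ("P1", "pricing_page_visit"),
    ("P2", "intent_surge"),
]
ordered = sorted(pending, key=lambda item: QUEUE_RANK[item[0]])
p1_deadline = sla_deadline("P1", received)
```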

The signal-to-action gap is most acute at P1 and P2. These are the moments where a 15-minute response is meaningfully better than a 4-hour response.

Layer 3: Context enrichment

Before the decision engine sees a signal, we enrich it with deal and account context. This is what separates a signal from an insight.

A pricing page visit from a prospect with no open deal, no prior engagement and no ICP match is noise. The same visit from a prospect who is 3 weeks into an active deal, has a close date in 18 days and whose champion is listed as "evaluating alternatives" in the CRM is P1.

class ContextEnricher:
    def enrich(self, signal: AggregatedSignal) -> EnrichedSignal:
        account = self.crm.get_account(signal.account_id)

        open_deals = self.crm.get_open_deals(signal.account_id)
        deal_context = None

        if open_deals:
            deal = self._select_most_relevant_deal(open_deals, signal)
            deal_context = DealContext(
                stage=deal.stage,
                days_in_stage=deal.days_in_stage,
                close_date=deal.close_date,
                last_activity_days_ago=deal.last_activity_days_ago,
                open_tasks=self.crm.get_open_tasks(deal.id),
                risk_flags=deal.risk_flags
            )

        return EnrichedSignal(
            signal=signal,
            account=account,
            deal_context=deal_context,
            enrichment_at=datetime.utcnow()
        )

The enrichment layer also re-scores urgency. A signal that came in as P2 might get promoted to P1 if the deal context reveals high risk. A P1 signal might be demoted to P3 if the account turns out to be a past customer with no active evaluation.

Layer 4: The action decision engine

This is the AI revenue execution core. The decision engine takes the enriched signal and determines: what should happen, in what order, with what priority.

We run a two-pass approach:

Pass 1: Rules engine: Fast, deterministic, zero-LLM. Rule examples:

If deal has been idle > 14 days AND close date < 21 days → flag as at-risk, notify manager
If pricing page visit AND active deal AND last contact > 5 days → trigger immediate follow-up
If competitor mentioned in call AND no "competitive positioning" task exists → create task

Rules handle ~60% of cases. They are fast, predictable and auditable.

Pass 2: LLM reasoning (for complex cases):

When the rules do not produce a clear action plan (ambiguous deal stage, multiple conflicting signals, multi-stakeholder complexity), we pass the signal to an LLM with the full enriched context:

DECISION_PROMPT = """
You are a senior revenue strategist reviewing a live deal signal.

SIGNAL:
{signal_summary}

DEAL CONTEXT:
{deal_context}

ACCOUNT HISTORY:
{account_summary}

Based on this, determine:
1. Is immediate action required? (yes/no + reason)
2. What is the single most important action to take right now?
3. What is the risk level to this deal if no action is taken in 24 hours?
4. Are there any other stakeholders who should be looped in?

Be specific. Reference the actual signal data. Do not give generic advice.
Output as JSON matching the ActionPlan schema.
"""

The LLM output feeds the action queue as structured tasks, not freeform text.
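
The `ActionPlan` schema is not shown in this post, so the sketch below assumes a simple shape: a list of actions, each with a `type`, a `priority` and a free-form `context`. The allowed action types are also assumed. The point is the pattern: parse the model output strictly and reject anything that does not fit, so freeform text never reaches the action queue.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Assumed set of action types; adjust to whatever the execution layer supports.
ALLOWED_TYPES = {"draft_followup", "update_crm", "create_task", "send_alert", "enroll_sequence"}


@dataclass
class Action:
    type: str
    priority: str
    context: dict = field(default_factory=dict)


@dataclass
class ActionPlan:
    actions: List[Action]


class MalformedPlan(ValueError):
    """Raised when the model output cannot be turned into an ActionPlan."""


def parse_action_plan(raw_text: str) -> ActionPlan:
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise MalformedPlan(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict) or not isinstance(data.get("actions"), list):
        raise MalformedPlan("expected an object with an 'actions' list")

    actions = []
    for item in data["actions"]:
        if not isinstance(item, dict) or item.get("type") not in ALLOWED_TYPES:
            raise MalformedPlan(f"unknown or missing action type: {item!r}")
        actions.append(Action(
            type=item["type"],
            priority=item.get("priority", "standard"),
            context=item.get("context", {}),
        ))
    return ActionPlan(actions=actions)


good = parse_action_plan(
    '{"actions": [{"type": "create_task", "priority": "high", '
    '"context": {"task": "Confirm security review owner"}}]}'
)

try:
    parse_action_plan("Sure! Here is the plan: ...")
    rejected = False
except MalformedPlan:
    rejected = True
```

A caller can catch `MalformedPlan` and either retry the model call or fall back to a rule-based default.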

Layer 5: Execution and feedback

Actions execute against the GTM stack via the same adapter pattern described in a previous post: CRM writes, Slack alerts, email drafts, sequence enrollments. Each execution logs an outcome event that feeds back into our signal scoring model.

If a P1 signal action consistently leads to deal progression, that signal type gets weighted higher. If a certain action type (e.g., manager alert for competitive mention) rarely results in meaningful activity, we adjust the rule. The system gets incrementally better without requiring manual model retraining.

The 15-minute number

P1 signal detected → action queued for rep review: average 4.2 minutes.
P1 signal detected → action executed (including auto-executes): average 11.8 minutes.
P2 signals: average 9.1 minutes to queue, 14.4 minutes to execution.

We track this in real time. When latency creeps above SLA, we get paged. The 15-minute number is a commitment, not an aspiration.
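
Measuring the gap can be as simple as subtracting two timestamps per signal and checking each one against the SLA. A minimal sketch, with made-up timings and hypothetical function names:

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

SLA = timedelta(minutes=15)


def latency(detected_at: datetime, executed_at: datetime) -> timedelta:
    """Signal-to-action gap for one signal."""
    return executed_at - detected_at


def breaches(samples, sla=SLA):
    """Return the (signal_id, latency) pairs that exceeded the SLA."""
    return [(sid, gap) for sid, gap in samples if gap > sla]


t0 = datetime(2026, 5, 1, 11, 42, tzinfo=timezone.utc)
# Illustrative timings only, not measured data.
samples = [
    ("sig-1", latency(t0, t0 + timedelta(minutes=9))),
    ("sig-2", latency(t0, t0 + timedelta(minutes=12))),
    ("sig-3", latency(t0, t0 + timedelta(minutes=21))),
]
average_minutes = mean(gap.total_seconds() for _, gap in samples) / 60
late = breaches(samples)
```

In this example the average sits inside 15 minutes while one signal still breaches, which is one reason to alert on individual breaches or percentiles and not on the mean alone.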

Why this architecture holds up

The design principles that made this work at scale:

Normalize at the boundary: Do not let source-specific formats leak into core processing.
Enrich before deciding: A signal without context is noise. Context before the LLM, not after.
Rules first, LLM second: Fast deterministic paths for common cases. Reserve expensive reasoning for genuinely complex ones.
Measure latency as a first-class metric: If you do not instrument the gap, you will not close it.

Revenue does not wait. The architecture should not either.

How We Use LLM Agents + CRM APIs to Auto-Generate Contextual Follow-Up Emails

SpurIQ Engineering — Thu, 30 Apr 2026 13:44:48 +0000

The follow-up email problem is not a writing problem

Everyone frames the sales follow-up email as a writing quality problem. Better copy, better subject lines, better personalization. A/B test the CTA. Shorten the paragraphs.

That is the wrong frame.

The actual problem is context. A rep finishing their sixth call of the day does not write a bad follow-up because they lack writing skill. They write a bad one, or write nothing at all, because they do not have the bandwidth to assemble the right context, synthesize what the prospect actually said and translate it into a message that sounds like it came from a person who was paying full attention.

What we built for LeadIQ and DealIQ is not an email template engine. It is a context assembly and generation pipeline that makes the follow-up email the least effortful and most accurate part of the post-call workflow. This post is the actual implementation.

System overview

Here is what the pipeline does, end to end:

Call ends
    ↓
Transcript processed by call intelligence platform
    ↓
Webhook fires → ingestion service
    ↓
Context assembly: CRM history + transcript + contact record
    ↓
LLM extraction: structured call summary
    ↓
LLM generation: draft follow-up email
    ↓
Validation + tone scoring
    ↓
Pending queue → rep reviews → sends (or auto-sends after TTL)

Let's go layer by layer.

Step 1: Ingestion and webhook handling

We subscribe to call completion webhooks from our call intelligence integration (we support Gong, Fireflies and Chorus). The webhook payload includes:

{
  "call_id": "call_abc123",
  "crm_deal_id": "deal_xyz789",
  "participants": ["rep@company.com", "prospect@customer.com"],
  "transcript_url": "https://...",
  "duration_seconds": 2847,
  "completed_at": "2025-04-28T14:32:00Z"
}

Our ingestion service validates the payload, confirms the transcript is ready and pushes a job to the processing queue:

@app.route("/webhooks/call-complete", methods=["POST"])
def handle_call_complete():
    payload = request.json

    if not validate_webhook_signature(request, payload):
        return jsonify({"error": "Invalid signature"}), 401

    job = {
        "call_id": payload["call_id"],
        "deal_id": payload["crm_deal_id"],
        "transcript_url": payload["transcript_url"],
        "participants": payload["participants"],
        "queued_at": datetime.utcnow().isoformat()
    }

    queue.push("post_call_pipeline", job, priority="high")
    return jsonify({"status": "queued"}), 200
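
`validate_webhook_signature` is not shown above, and each call intelligence provider documents its own signing scheme. As a generic sketch, assuming the provider sends a hex-encoded HMAC-SHA256 of the raw request body computed with a shared secret:

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    expected = sign(secret, body)
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected, received_signature)


secret = b"example-shared-secret"          # placeholder, not a real key
body = b'{"call_id": "call_abc123"}'
good_signature = sign(secret, body)

accepted = is_valid_signature(secret, body, good_signature)
tampered = is_valid_signature(secret, body + b" ", good_signature)
```

In Flask the raw bytes are available via `request.get_data()`. Sign those and not a re-serialized copy of the parsed JSON, because re-serialization can change whitespace and key order.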

Step 2: Context assembly

This is the most important step and the one most people skip when building naive follow-up tools. Generating a good follow-up email requires more than the transcript. It requires:

Deal history: What stage is this deal at? What was discussed in previous calls?
Contact record: What is this person's role, seniority and communication history?
Open tasks and commitments: What did the rep already promise to send?
Previous email threads: What has already been said in writing?

We pull all of this from the CRM before touching the LLM:

class ContextAssembler:
    def __init__(self, crm_adapter, call_store):
        self.crm = crm_adapter
        self.calls = call_store

    def assemble(self, deal_id: str, call_id: str) -> DealContext:
        deal = self.crm.get_deal(deal_id)
        contact = self.crm.get_primary_contact(deal_id)

        # Get all previous calls for this deal, compressed if > 3
        call_history = self.calls.get_by_deal(deal_id, limit=10)
        compressed_history = self._compress_history(call_history, current_call_id=call_id)

        # Get open tasks and rep commitments from CRM
        open_tasks = self.crm.get_open_tasks(deal_id)

        # Get last 5 email threads
        email_threads = self.crm.get_email_history(deal_id, limit=5)

        return DealContext(
            deal=deal,
            contact=contact,
            call_history=compressed_history,
            open_tasks=open_tasks,
            email_threads=email_threads
        )

    def _compress_history(self, calls, current_call_id):
        # Drop the current call first so "recent" always holds up to three
        # previous calls (assumes `calls` is ordered oldest to newest)
        previous = [c for c in calls if c.id != current_call_id]
        recent = previous[-3:]
        older = previous[:-3]

        # Recent calls: return full summaries
        # Older calls: compress to key facts only
        compressed_older = [self._extract_key_facts(c) for c in older]

        return recent + compressed_older

The compression logic for older calls uses a lightweight LLM call with a strict output schema. We are not asking it to rewrite, just to extract stakeholders, key pain points, commitments made and outcomes. This keeps older context within the token budget without losing deal-relevant signal.

Step 3: LLM extraction: Structured call summary

Before generating the email, we run a structured extraction pass on the current call transcript. This is a separate LLM call with a strict JSON schema output:

# Literal braces in the schema are doubled so str.format() only fills {transcript}.
EXTRACTION_PROMPT = """
You are analyzing a B2B sales call transcript. Extract the following information precisely.
Return ONLY valid JSON matching the schema. Do not add commentary.

Schema:
{{
  "discussed_pain_points": ["string"],
  "rep_commitments": ["string - specific, actionable"],
  "prospect_commitments": ["string - specific, actionable"],
  "objections_raised": ["string"],
  "buying_signals": ["string"],
  "risk_signals": ["string"],
  "agreed_next_step": "string or null",
  "agreed_timeline": "string or null",
  "call_sentiment": "positive|neutral|cautious|negative"
}}

Transcript:
{transcript}
"""

def extract_call_structure(transcript: str, model_client, call_id: str) -> CallStructure:
    response = model_client.complete(
        prompt=EXTRACTION_PROMPT.format(transcript=transcript),
        max_tokens=800,
        temperature=0.1  # Low temp for extraction, we want consistency
    )

    try:
        data = json.loads(response.text)
        return CallStructure(**data)
    except (json.JSONDecodeError, ValidationError) as e:
        # Log extraction failure, flag for human review.
        # call_id is passed in so the failure can be traced to a specific call.
        raise ExtractionFailure(call_id=call_id, error=str(e)) from e

We run validation on every extraction output. Missing agreed_next_step when the transcript clearly has one is a failure mode we caught early. We added a post-extraction validation pass that re-runs the extraction with a corrective prompt if confidence scores fall below threshold.
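The corrective re-run can be sketched roughly as follows. This is an illustration under assumptions, not our production code: `extract_with_retry`, `run_extraction` and `find_problem` are hypothetical names, and the validation check is passed in as a plain function rather than a confidence score.

```python
from typing import Callable, Optional

# Hypothetical wording for the corrective instruction appended on a re-run.
CORRECTIVE_SUFFIX = (
    "\n\nYour previous answer failed validation: {problem}. "
    "Re-read the transcript and return corrected JSON only."
)


def extract_with_retry(
    prompt: str,
    run_extraction: Callable[[str], dict],
    find_problem: Callable[[dict], Optional[str]],
    max_attempts: int = 2,
) -> dict:
    """Run extraction, re-running with a corrective prompt if validation fails."""
    problem = None
    current_prompt = prompt
    for _ in range(max_attempts):
        result = run_extraction(current_prompt)
        problem = find_problem(result)
        if problem is None:
            return result
        # Tell the model what was wrong instead of blindly repeating the call.
        current_prompt = prompt + CORRECTIVE_SUFFIX.format(problem=problem)
    raise RuntimeError(f"extraction still failing after {max_attempts} attempts: {problem}")
```

Capping the attempts matters: a transcript that fails twice is usually better routed to a human than retried indefinitely.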

Step 4: Email generation: The actual interesting part

Now we have everything we need. The generation prompt is where the craft lives:

python
FOLLOWUP_PROMPT = """
You are drafting a follow-up email from {rep_name} to {prospect_name} at {company}.

Write as {rep_name}. Use first person. Sound like a real person who was paying full attention on the call, not a template, not a CRM auto-response.

CALL CONTEXT:
- Pain points discussed: {pain_points}
- What {rep_name} committed to: {rep_commitments}
- What {prospect_name} committed to: {prospect_commitments}
- Agreed next step: {next_step}
- Agreed timeline: {timeline}
- Objections raised: {objections}
- Call sentiment: {sentiment}

DEAL HISTORY SUMMARY:
{compressed_history}

OPEN TASKS (from CRM):
{open_tasks}

PREVIOUS EMAIL TONE (last thread):
{last_email_sample}

INSTRUCTIONS:
- Open with a genuine reference to something specific from today's call, not generic
- Address any commitments made by the rep directly
- Reference the agreed next step clearly and confirm the timeline
- If objections were raised, acknowledge one naturally, do not ignore them
- Do not use: 'Hope this finds you well', 'Per our conversation', 'As discussed', 'Circling back'
- Keep it under 200 words
- End with a single, specific call to action

Output format:
Subject: [subject line]
Body: [email body]
"""

The instruction layer is doing a lot of work here. Specifically: the ban on filler phrases, the word limit and the requirement to reference something specific from the call. These three constraints eliminate 90% of the "this sounds like an AI wrote it" problem.

We run the generation at temperature 0.7: high enough for natural variation, low enough that the factual content stays grounded.
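Because the prompt asks for a fixed `Subject:` / `Body:` layout, the output can be split mechanically before validation. A minimal sketch, assuming the model follows that layout; `parse_email_output` is a hypothetical helper name:

```python
def parse_email_output(raw: str) -> tuple[str, str]:
    """Split model output in the 'Subject: ... Body: ...' layout into two parts."""
    subject_marker, body_marker = "Subject:", "Body:"
    text = raw.strip()
    if not text.startswith(subject_marker) or body_marker not in text:
        raise ValueError("output does not follow the Subject/Body format")
    # Split on the first body marker only, so later mentions stay in the body.
    subject_part, body_part = text.split(body_marker, 1)
    subject = subject_part[len(subject_marker):].strip()
    body = body_part.strip()
    if not subject or not body:
        raise ValueError("subject or body is empty")
    return subject, body
```

Failing loudly on a malformed layout is deliberate here: a draft that cannot be parsed should be regenerated, not sent.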

Step 5: Validation and tone scoring

The generated email goes through two checks before it reaches the rep:
Factual grounding check: We run a lightweight verification pass that asks one question: does the email reference anything that was not in the extraction output? If the model hallucinated a commitment the rep did not make, we catch it here.

python
def validate_factual_grounding(email: str, call_structure: CallStructure) -> ValidationResult:
    claims = extract_claims_from_email(email)

    for claim in claims:
        if not is_grounded_in_context(claim, call_structure):
            return ValidationResult(
                passed=False,
                issue=f"Ungrounded claim detected: {claim}"
            )

    return ValidationResult(passed=True)

Tone scoring: We score the email on three dimensions: formality match (does this match how the rep usually writes?), specificity (does it reference actual call content?) and CTA clarity (is there exactly one ask?). Emails that score low on any dimension go back for regeneration with corrective instructions.
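Some of the prompt's hard constraints can be checked deterministically before any model-based scoring runs. The sketch below is not the tone scorer described above; it is a cheap pre-check for the banned phrases and the 200-word limit from the generation prompt, and `check_hard_constraints` is a hypothetical name.

```python
BANNED_PHRASES = (
    "hope this finds you well",
    "per our conversation",
    "as discussed",
    "circling back",
)
MAX_WORDS = 200  # the prompt asks for "under 200 words"


def check_hard_constraints(body: str) -> list[str]:
    """Return a list of constraint violations; an empty list means the draft passes."""
    issues = []
    lowered = body.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            issues.append(f"banned phrase: {phrase!r}")
    word_count = len(body.split())
    if word_count >= MAX_WORDS:
        issues.append(f"too long: {word_count} words")
    return issues
```

The returned issue strings can double as the corrective instructions for a regeneration pass.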

Step 6: Pending queue and rep approval flow

Early versions auto-sent. Reps hated it.
Not because the emails were bad. Because the emails were their emails, going out under their names and they had no visibility into what was being sent.
We moved to a pending queue model:

python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PendingAction:
    action_type: str  # "send_email", "update_crm", "create_task"
    deal_id: str
    payload: dict
    created_at: datetime
    auto_execute_at: datetime  # 4 hours after creation by default
    status: str  # "pending", "approved", "rejected", "auto_executed"

Reps get a notification: "3 actions ready for your review on Deal X." They can approve, edit, or reject each one. Anything not touched in 4 hours auto-executes (configurable per rep). This is the AI revenue orchestration model that actually works in practice: the system owns follow-through, the rep owns override.
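The auto-execute rule comes down to a periodic sweep over the queue. A minimal sketch, using a simplified stand-in for the pending action record (`QueuedAction` and `sweep` are hypothetical names, and the real queue is persisted rather than held in a list):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class QueuedAction:
    action_type: str
    created_at: datetime
    status: str = "pending"
    review_window: timedelta = timedelta(hours=4)  # configurable per rep

    @property
    def auto_execute_at(self) -> datetime:
        return self.created_at + self.review_window


def sweep(actions: list[QueuedAction], now: datetime) -> list[QueuedAction]:
    """Mark pending actions whose review window has elapsed and return them."""
    due = []
    for action in actions:
        # Approved and rejected actions are never touched by the sweep.
        if action.status == "pending" and now >= action.auto_execute_at:
            action.status = "auto_executed"
            due.append(action)
    return due


t0 = datetime(2026, 5, 18, 9, 0, tzinfo=timezone.utc)
queue_items = [QueuedAction("send_email", t0), QueuedAction("create_task", t0 + timedelta(hours=3))]
due_now = sweep(queue_items, now=t0 + timedelta(hours=4))  # only the email is due
```

Passing `now` in explicitly, rather than reading the clock inside `sweep`, keeps the rule easy to test.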

Adoption went from 31% to 84% when we made this change.

The numbers that matter

Across deals where the pipeline ran vs. deals where it did not (uneven coverage during rollout gave us a natural comparison group):

Average time from call end to follow-up email sent: 94 minutes → 11 minutes
Follow-up emails with specific call references: 22% → 96%
Rep time spent on post-call admin per week: ~4.5 hours → ~40 minutes

The follow-ups are not just faster. They are more accurate, more specific and more likely to move the deal forward, because they are built from what was actually said, not from what the rep remembered an hour later.

Building an AI Agent That Owns Post-Call Execution: Architecture Decisions

SpurIQ Engineering — Wed, 29 Apr 2026 19:43:19 +0000

The call ended. Now what?

Here is something product teams rarely talk about: the most expensive moment in a B2B sales cycle is not the failed cold email or the missed demo request. It is the 48 hours after a good call where nothing happens.

The rep hangs up. They have three more calls. The notes are rough. The CRM still says "Meeting held." The follow-up email goes out two days later, generic, rushed, missing the context that made the call valuable in the first place.

We built SpurIQ to fix that specific failure. Not by reminding reps to follow up. By building an AI agent that actually owns post-call execution, from the moment the call ends to the moment the next committed step is confirmed.

This post walks through the architecture decisions we made, why we made them and what we learned building it.

What "owning execution" actually means in engineering terms

Before we get into the stack, it is worth being precise about what we mean by execution ownership, because it is easy to confuse this with automation.

Automation sends a follow-up. Execution ownership decides:

What the follow-up should say based on what was actually discussed
When it should go based on what was committed in the call
Who it should come from based on deal structure
What happens next if there is no response in a defined window

The difference is consequential. Automation is a trigger-action loop. Execution ownership is a decision-making agent that holds state across the lifecycle of a deal.

That distinction shaped every architecture decision we made.

The core architecture: Event-driven, not scheduled

Our first instinct was a cron-based system. Poll the CRM every 30 minutes, check for calls that ended, kick off a follow-up pipeline. Simple to build, easy to reason about.

We scrapped it for two reasons.

First: polling every 30 minutes means up to a 30-minute lag from call end to action (about 15 minutes on average), before any processing even starts. In B2B sales, that is an eternity. A prospect who just had a strong 45-minute discovery call and then hears nothing for two hours has already started doubting the purchase.

Second: CRM polling is noisy. Every CRM mutation you do not care about (field edits, internal notes, task updates) fires your pipeline. You spend more time filtering events than processing them.

We moved to a fully event-driven architecture using webhook subscriptions from the CRM layer and call intelligence platforms. The moment a call recording is marked as processed, an event fires. Our orchestrator picks it up immediately.

[Call ends]
     ↓
[Call intelligence platform marks recording complete]
     ↓
[Webhook fires → Event queue (SQS)]
     ↓
[Orchestrator picks up event]
     ↓
[Post-call execution agent initializes]

Average latency from call end to agent initialization: under 90 seconds.
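The receiving end of that flow can be sketched as a small handler that filters and enqueues events. This is an illustration under assumptions: the event type name `recording.processed` and the `id` and `type` fields are hypothetical, and a standard-library `queue.Queue` stands in for SQS.

```python
import json
import queue

RELEVANT_EVENT_TYPES = {"recording.processed"}  # hypothetical event name


def handle_webhook(raw_body: str, event_queue: queue.Queue, seen_ids: set) -> bool:
    """Enqueue relevant, not-yet-seen events. Returns True if the event was queued."""
    event = json.loads(raw_body)
    if event.get("type") not in RELEVANT_EVENT_TYPES:
        return False  # ignore mutations we do not care about
    event_id = event.get("id")
    if event_id is None or event_id in seen_ids:
        return False  # webhooks may be redelivered, so drop duplicates
    seen_ids.add(event_id)
    event_queue.put(event)
    return True
```

Keeping the handler this thin means the webhook endpoint can acknowledge quickly and leave the slow work to the orchestrator.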

The agent architecture: Three layers, not one

We tried a single large agent first. One LLM prompt, full context, single output. It was brittle. The model could not reliably separate summarization from decision-making from drafting and when one step failed, the whole output failed.

We landed on a three-layer agent architecture:

Layer 1: Extraction Agent

This agent receives the raw call transcript and pulls structured information:

Stakeholders identified on the call
Pain points surfaced (verbatim, not paraphrased)
Objections raised
Next steps committed to by the rep
Next steps committed to by the prospect
Timeline signals ("we need this by Q3", "our budget cycle closes in April")
Risk signals ("we're also talking to two other vendors")

This is a focused extraction task. The prompt is tight, the output schema is strict JSON and we validate every field before passing downstream.

python
from typing import List

extraction_schema = {
  "stakeholders": List[str],
  "pain_points": List[str],
  "objections": List[str],
  "rep_commitments": List[str],
  "prospect_commitments": List[str],
  "timeline_signals": List[str],
  "risk_signals": List[str]
}

Why separate this from the drafting layer? Because extraction errors compound. If the extraction agent misses a key objection, every downstream agent working from that output will produce work that ignores it. Isolating extraction lets us validate before we spend tokens drafting.

Layer 2: Decision Agent

The decision agent takes the extraction output plus CRM context (deal stage, days since last activity, contact history, existing open tasks) and makes three decisions:

What actions are required? (Follow-up email, internal Slack summary, CRM field updates, task creation, risk flag)

What is the priority order? (What needs to happen in the next 2 hours vs. next 48 hours)

Are there any escalation triggers? (Deal has gone dark before, timeline is tight, multiple competitors mentioned)

This is where the intelligence behind revenue execution in B2B sales lives. The decision agent is not just routing tasks; it is applying deal context to determine what execution actually looks like for this specific call at this specific stage.
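To make the shape of the decision output concrete, here is a rule-based stand-in. It is only a sketch of the plan structure: the thresholds, the `went_dark_before` CRM field and the names `PlannedAction`, `ActionPlan` and `plan_actions` are all hypothetical, and it does not reproduce how the decision agent reasons.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedAction:
    kind: str              # e.g. "follow_up_email", "crm_update", "risk_flag"
    due_within_hours: int  # 2 = urgent, 48 = can wait


@dataclass
class ActionPlan:
    actions: list[PlannedAction] = field(default_factory=list)
    escalate: bool = False


def plan_actions(extraction: dict, crm: dict) -> ActionPlan:
    plan = ActionPlan()
    plan.actions.append(PlannedAction("follow_up_email", 2))
    if extraction.get("rep_commitments"):
        plan.actions.append(PlannedAction("create_task", 2))
    plan.actions.append(PlannedAction("crm_update", 48))
    # Crude proxy for escalation triggers: a deal that has stalled before,
    # or several risk signals on the same call.
    if crm.get("went_dark_before") or len(extraction.get("risk_signals", [])) >= 2:
        plan.escalate = True
        plan.actions.append(PlannedAction("risk_flag", 2))
    # Stable sort: urgent actions first, insertion order preserved within a tier.
    plan.actions.sort(key=lambda a: a.due_within_hours)
    return plan
```

The execution layer then only needs to walk `plan.actions` in order.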

Layer 3: Execution Agent

The execution agent receives the action plan and executes each item:

Drafts the follow-up email with full call context embedded
Writes the internal deal summary for Slack or CRM
Creates CRM tasks with specific next-step language (not "follow up" but "Send security review docs before Friday per Priya's request")
Updates deal fields (stage, next step, close date confidence)
Flags risk in the pipeline view if escalation triggers fired

Context management: The hard part

Here is the thing nobody warns you about when building deal-aware agents: context gets expensive fast.

A single deal might have 12 call transcripts, 40 email threads, 6 months of CRM history and notes from three different reps. You cannot feed all of that into every agent call. But you also cannot ignore it: the agent making decisions about call number 12 needs to know what was said in calls 4, 7 and 9.

We built a context assembly layer that sits between the CRM/call store and the agent layer. Before any agent call, the context assembler:

Retrieves the full deal history
Scores each piece of context for relevance to the current call (using a smaller, cheaper model)
Assembles a compressed context window: recent calls in full, older calls summarized to key facts
Appends CRM state, open tasks and stakeholder map

The compressed context fits within token limits without losing decision-relevant history. The key design principle: recency gets full fidelity, history gets structured compression.
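The assembly steps above can be sketched as a budgeted selection. This is a simplified illustration: the relevance scores are assumed to come from the cheaper scoring model, the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and `ContextItem` and `assemble_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    text: str         # full text for recent calls, key-facts summary for older ones
    relevance: float  # 0..1, assumed to be produced by a smaller scoring model
    is_recent: bool


def estimate_tokens(text: str) -> int:
    # Crude heuristic (roughly 4 characters per token); swap in a real tokenizer.
    return max(1, len(text) // 4)


def assemble_context(items: list[ContextItem], token_budget: int) -> list[ContextItem]:
    """Recent items get first claim on the budget; older ones fill in by relevance."""
    recent = [i for i in items if i.is_recent]
    older = sorted(
        (i for i in items if not i.is_recent),
        key=lambda i: i.relevance,
        reverse=True,
    )
    selected, used = [], 0
    for item in recent + older:
        cost = estimate_tokens(item.text)
        if used + cost > token_budget:
            continue  # skip anything that would overflow the budget
        selected.append(item)
        used += cost
    return selected
```

Giving recent calls first claim on the budget is what "recency gets full fidelity" means in practice.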

The CRM write problem

Writing back to CRM was harder than reading from it. Every CRM has its own field structure, validation logic and update rate limits. Salesforce behaves differently from HubSpot. HubSpot behaves differently from Pipedrive.

We built a CRM adapter layer that abstracts this:

python
class CRMAdapter:
    # Interface only: each CRM-specific subclass overrides these methods.
    def update_deal(self, deal_id, fields: dict) -> UpdateResult: ...
    def create_task(self, deal_id, task: Task) -> TaskResult: ...
    def log_activity(self, deal_id, activity: Activity) -> ActivityResult: ...
    def flag_risk(self, deal_id, risk: RiskFlag) -> FlagResult: ...

Each CRM has its own implementation of this interface. The AI revenue orchestration layer never talks to a CRM directly; it only talks to the adapter. This was the right call. When Salesforce deprecated an API endpoint last year, we fixed one adapter, not twelve agent workflows.
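One way to wire adapters in is a small registry keyed by CRM name. The sketch below uses an in-memory test double rather than a real CRM client; `InMemoryAdapter`, `ADAPTERS` and `get_adapter` are hypothetical names, and only `update_deal` is shown.

```python
class InMemoryAdapter:
    """Test double that records writes instead of calling a CRM API."""

    def __init__(self):
        self.deals = {}

    def update_deal(self, deal_id, fields: dict) -> bool:
        self.deals.setdefault(deal_id, {}).update(fields)
        return True


# Registry: agent code asks for an adapter by name and never imports a CRM SDK.
ADAPTERS = {"in_memory": InMemoryAdapter}


def get_adapter(crm_name: str):
    try:
        return ADAPTERS[crm_name]()
    except KeyError:
        raise ValueError(f"no adapter registered for {crm_name!r}") from None


adapter = get_adapter("in_memory")
adapter.update_deal("deal-1", {"stage": "Proposal"})
```

An in-memory adapter like this is also handy for testing agent workflows without touching a live CRM.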

What we got wrong the first time

We let the agent write directly to CRM without a review buffer: It was fast, but reps hated it. They felt the system was changing their deals without them. We added a "pending actions" queue: agents queue their writes, reps get a one-tap approval UI and unreviewed items auto-execute after 4 hours. Adoption tripled.

We over-indexed on transcript quality: Our extraction agent fell apart when call audio was poor or transcripts had errors. We added a confidence scoring layer: if extraction confidence falls below threshold, the agent flags for human review rather than proceeding with bad data.

We treated all deals the same: A $2,000 SMB deal and a $200,000 enterprise deal should not have the same execution protocol. We added deal-tier routing: larger deals get more conservative execution with more human checkpoints, smaller deals get more aggressive automation.
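The last two fixes combine naturally into a single policy lookup. A minimal sketch, with made-up thresholds (the 100,000 tier boundary and the 0.7 confidence floor are illustrative assumptions, not our production values) and hypothetical names `ExecutionPolicy` and `policy_for`:

```python
from dataclasses import dataclass
from typing import Optional

ENTERPRISE_MIN_VALUE = 100_000   # hypothetical tier boundary, in dollars
MIN_EXTRACTION_CONFIDENCE = 0.7  # hypothetical confidence floor


@dataclass(frozen=True)
class ExecutionPolicy:
    requires_approval: bool
    auto_execute_after_hours: Optional[int]  # None = never auto-execute


def policy_for(deal_value: float, extraction_confidence: float) -> ExecutionPolicy:
    # Low-confidence extractions always go to a human, whatever the deal size.
    if extraction_confidence < MIN_EXTRACTION_CONFIDENCE:
        return ExecutionPolicy(requires_approval=True, auto_execute_after_hours=None)
    # Larger deals get more conservative execution: nothing ships unreviewed.
    if deal_value >= ENTERPRISE_MIN_VALUE:
        return ExecutionPolicy(requires_approval=True, auto_execute_after_hours=None)
    # Smaller deals keep the default 4-hour auto-execute window.
    return ExecutionPolicy(requires_approval=False, auto_execute_after_hours=4)
```

Checking confidence before deal size keeps bad data out of every tier, not just the expensive one.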

The outcome worth mentioning

When we look at deals where the agent executed vs. deals where it did not (early days, inconsistent coverage), the difference in 30-day progression rates is significant enough that we do not run the comparison casually in meetings anymore. It makes the case too loudly.

The point is not that AI replaces reps. The point is that execution should not depend on rep memory, bandwidth, or consistency. Post-call execution is a system design problem. We just built the system.

Curious about how we handle the follow-up drafting at scale with real CRM context? That's the next post.