DEV Community

Zackrag
Zackrag

Posted on

The LLM Outbound Personalization Stack: Actual Prompt Templates, Data Schema, and Fallback Rules

Three months ago I ran a test. Took 500 LinkedIn profiles we'd already enriched in Clay and ran three personalization approaches through the same Smartlead sequence: (1) Lyne.ai-generated openers, (2) our hand-tuned Claude prompt, and (3) a no-personalization control. The reply rates came back at 4.1%, 11.3%, and 2.8% respectively.

The gap wasn't luck. It came down to one thing: our prompt knew exactly what to do when the enrichment data was thin.

Almost every guide covering AI email personalization describes the outcome ("write a custom first line per prospect") and names the tools (Clay, Instantly, Autobound). Nobody publishes the actual prompt. Nobody shows the data schema. Nobody explains what fires when your enrichment returns a blank job title and a company URL that 404s.

That's what this post does.


What the Enrichment Schema Actually Looks Like

Before the prompt matters, the data structure matters more. I standardize every prospect into this JSON schema before any LLM call:

{
  "prospect": {
    "full_name": "Jane Doe",
    "first_name": "Jane",
    "title": "VP of Revenue Operations",
    "seniority": "vp",
    "linkedin_url": "https://linkedin.com/in/...",
    "recent_post_excerpt": "We just rolled out a new sales process...",
    "tenure_months": 8
  },
  "company": {
    "name": "Acme Corp",
    "domain": "acmecorp.com",
    "employee_count": 210,
    "industry": "B2B SaaS",
    "funding_stage": "Series B",
    "recent_news_headline": "Acme raises $18M to expand into EMEA",
    "tech_stack": ["Salesforce", "Outreach", "Gong"],
    "hiring_signal": "3 open SDR roles on LinkedIn"
  },
  "meta": {
    "signal_tier": 1,
    "data_freshness_days": 4,
    "fallback_reason": null
  }
}
Enter fullscreen mode Exit fullscreen mode

I assemble this in Clay using a combination of LinkedIn enrichment, People Data Labs for firmographics, and custom HTTP columns that scrape news headlines. The signal_tier field in meta is the most important — it determines which prompt branch fires downstream.


Signal Priority Order: What Fires When

Not all signals are equal. A prospect's LinkedIn post from last week is worth 10x more than a company description scraped from Crunchbase six months ago. I rank signals in four tiers, evaluated in order:

Tier 1 — Use if present (highest specificity)

  • Prospect's own LinkedIn post in the last 30 days
  • Announced company news in the last 14 days (funding round, product launch, exec hire)
  • Recent job change (tenure under 6 months)

Tier 2 — Use if no Tier 1 signal

  • Technology stack combined with a known pain point for that stack
  • Open hiring roles that imply a team-building or scaling phase
  • Funding stage announced within 90 days

Tier 3 — Use if no Tier 1/2 signal

  • Industry + company size + a generic ICP match
  • Prospect's title + the typical functional pain for that role

Tier 4 — Sparse data fallback

  • Company domain only, or data older than 30 days — triggers a safe fallback prompt

This tiering lives in Clay as a formula column that cascades through each condition. About 58% of my lists hit Tier 1 or 2. Another 30% land in Tier 3. The remaining 12% fall through to Tier 4 and get flagged for manual review before they reach Smartlead.


The Prompt Template I Actually Ship to Claude

Here's the production prompt I send to the Claude API for Tier 1 and Tier 2 signals, lightly anonymized:

You are writing a single cold email opening line for a B2B sales rep.

PROSPECT DATA:
Name: {{first_name}}
Title: {{title}}
Company: {{company_name}} ({{employee_count}} employees, {{industry}})
Signal (anchor the line to this specific fact): {{best_signal}}

RULES:
- Maximum 22 words
- First person is FORBIDDEN ("I noticed", "I saw", "I was looking at")
- Do not start with the prospect's name
- Do not use: congratulations, impressive, love your work, noticed, curious, hope 
  this finds you, reach out, quick question, game-changer, excited to, would love to,
  seamless, unlock, leverage, revolutionary, best-in-class
- Reference the signal specifically — not generically ("expanding into EMEA"
  not "growing internationally")
- Do not make claims about the prospect's feelings or intent
- Output ONLY the opening line. No subject line. No explanation.

VOICE CALIBRATION:
Match this tone from a closed-won email:
"Scaling the SDR team while Salesforce is still the system of record is a specific kind of pain."

EXAMPLES OF BAD OUTPUT:
- "Congratulations on the Series B funding, Jane!"
- "I noticed your company is hiring SDRs."
- "Would love to connect about your outbound strategy."

EXAMPLES OF GOOD OUTPUT:
- "Taking $18M into EMEA with a Salesforce-native stack is a specific sequencing problem."
- "Eight months into a VP role usually means the inherited process is already showing its cracks."
Enter fullscreen mode Exit fullscreen mode

The {{best_signal}} variable is populated by a Clay formula column that selects whichever Tier 1 or Tier 2 signal is present. If multiple Tier 1 signals exist, the LinkedIn post wins — it's the most direct evidence of what the prospect is currently thinking about.

Cost at Claude API rates: roughly $0.003 per row using claude-sonnet-4-6 with a 300-token output cap. For a 1,000-row list that's $3 in LLM spend, plus whatever Clay charges for enrichment credits.


Forbidden Phrases and Why They Destroy Deliverability

The bad-output examples above aren't random. Each one fails for a documented reason:

"Congratulations on..." — Every AI personalization tool defaults to this. Lyne.ai and Autobound both produce it at high frequency. Spam filters have trained on it. More importantly, it positions you as a spectator, not a peer who understands the problem.

"I noticed..." — Appears in an estimated 60–70% of AI-generated cold email openers in circulation. It's a dead signal to any experienced buyer. It also implies surveillance in a way that feels off.

"Would love to connect" — Ends the opener without giving the reader a reason to continue. The first line's job is to earn the second line.

"Quick question" — It is never a quick question. Every reader knows this.

I maintain a forbidden-phrase JSON array that appends to every prompt as a system instruction. It's now 34 phrases long. I add to it whenever I audit a low-performing sequence and find a pattern. Reviewing sequences monthly and updating this list is the single highest-leverage prompt maintenance task.


Voice Calibration from Closed-Won Emails

The VOICE CALIBRATION block in the prompt matters more than it looks. I went through the last 40 closed-won deals and pulled the first email that got a substantive reply. The pattern that emerged: they all assumed a shared problem rather than asking permission to discuss one.

Weak: "Are you struggling with outbound at scale?"

Strong: "Outbound at scale with a two-person RevOps team is basically inventory management with no inventory system."

The calibration sentence gives the model a concrete tone target to mimic. Without it, the output tends to be technically correct but slightly formal — it reads like someone summarizing a LinkedIn post rather than a practitioner who has lived the problem. Two or three voice calibration sentences, drawn from your own best emails, close this gap more reliably than any prompt engineering trick I've tried.


What Happens When Enrichment Returns Garbage

This is the section every published guide skips entirely.

Fallback condition: signal_tier == 4 OR data_freshness_days > 30 OR company domain returns a 4xx/5xx

Fallback prompt:

Write a cold email opening line (max 20 words) for a {{title}} at a
{{employee_count}}-person {{industry}} company.
Use the persona's functional pain, not a specific company event.
Forbidden: I noticed, congratulations, hope this finds you,
love your work, quick question.
Output only the line.
Enter fullscreen mode Exit fullscreen mode

This produces a generic-but-accurate opener rather than a hallucinated claim. A fabricated signal ("I saw your recent post about scaling your SDR team" when no such post exists) is worse than no personalization — it gets a "please remove me" reply at best and a spam report at worst. Generic with no false claims outperforms confident with wrong facts every time.

Tier 4 rows also get a flag in Clay before they enter any Smartlead or Instantly campaign. A human reviews them in batches of 20–30 before launch.


Dedicated Tools vs. DIY Stack

Tool Signal Sources Prompt Control Fallback Handling Cost / 1k rows Verdict
Lyne.ai LinkedIn, website None (black box) Not documented ~$29 Fast, zero control
Autobound 700+ signal types Limited via UI Partial ~$45–80 Strong signals, opaque prompts
Clay + Claude API 50+ via Clay waterfall Full control You build it ~$8–15 Most flexible, highest maintenance
Apollo built-in AI Apollo data only Minimal None Bundled Weakest personalization layer
Instantly Spintax AI None (template spin) Template-level None Bundled Not genuine personalization
RocketReach Contact data only None None Bundled Data layer, not a copywriter

The Clay + Claude API path costs roughly 75–80% less per row than dedicated tools and gives you full ownership of the prompt. The tradeoff: you own the maintenance. Forbidden-phrase lists go stale. Prompt drift happens when you expand into new ICP verticals. I do a prompt audit every three weeks — read 50 generated lines, flag anything that sounds off, update the calibration block or forbidden-phrase list.


What I Actually Use

For most sequences: Clay enrichment feeding a Claude API HTTP column with the tiered prompt above, then pushed to Smartlead or Instantly depending on mailbox infrastructure. Lyne.ai stays in the toolkit for quick agency jobs where I don't have time to build a custom enrichment schema — the output quality is adequate and the turnaround is fast.

For social-profile enrichment specifically — situations where the starting point is a Twitter/X handle or a Facebook profile rather than a work email — Ziwa has been faster for me than People Data Labs's direct API for assembling the signal layer. Autobound is worth testing if you want a fully managed signal layer and prefer not to maintain the Clay waterfall yourself; their 700-signal breadth is genuinely hard to replicate DIY.

The highest-leverage thing you can do today isn't switching tools. It's spending two hours pulling your last 20 positive email replies and identifying the exact sentence structure that made someone respond. That pattern becomes your voice calibration block. The rest is just plumbing.

Top comments (0)