Architecture Overview
This n8n workflow automates backlink outreach by chaining together Google Sheets data retrieval, Apify web scraping, and OpenAI's GPT-4 for email generation. Here's the data flow:
[Google Sheets] → [URL Cleaning] → [Apify Scraper] → [Email Validation] → [GPT-4 Generation] → [Gmail Send]
- Google Sheets: backlink list
- URL Cleaning: normalize URLs
- Apify Scraper: extract contact emails
- Email Validation: check whether any emails were found
- GPT-4 Generation: generate email with AI
- Gmail Send: deliver message
Why this architecture? Sequential processing keeps the workflow under API rate limits and gives each backlink its own scrape-and-generate pass. A batch size of 1 means that if email scraping fails for one URL, the workflow continues with the next. The IF conditional acts as a quality gate: if no emails are found, the workflow skips to the next backlink.
Alternative considered: Parallel processing would be faster, but it risks overwhelming Apify's scraper and makes failures harder to debug. For outreach campaigns, reliability matters more than speed.
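The control flow described above can be sketched in plain JavaScript. This is an illustration only, not how n8n executes nodes internally: `processSequentially` and the `scrape`, `generate`, and `send` callbacks are hypothetical stand-ins for the Apify, OpenAI, and Gmail steps.

```javascript
// Sketch of the sequential, batch-size-1 flow: one backlink at a time,
// with a failure or an empty scrape result skipping to the next URL.
// scrape/generate/send are hypothetical async callbacks supplied by the caller.
async function processSequentially(urls, { scrape, generate, send }) {
  const report = [];
  for (const url of urls) {
    try {
      const emails = await scrape(url);
      if (!emails || emails.length === 0) {
        // Quality gate: nothing to contact, move on.
        report.push({ url, status: "skipped" });
        continue;
      }
      const draft = await generate(url, emails);
      await send(draft);
      report.push({ url, status: "sent" });
    } catch (err) {
      // One bad URL must not stop the rest of the list.
      report.push({ url, status: "failed", reason: err.message });
    }
  }
  return report;
}
```

Because each URL is awaited before the next one starts, a failure is always attributable to a single backlink, which is the debugging benefit the sequential design trades speed for.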
API Integration Deep-Dive
Google Sheets API
Authentication: OAuth 2.0 credential in n8n. Grant read access to your backlink tracking spreadsheet.
Request structure:
// n8n handles this internally, but equivalent REST call:
GET https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}/values/{range}
Headers: { "Authorization": "Bearer {token}" }
Response format:
{
"range": "Sheet1!A1:B100",
"values": [
["URL Source", "Target URL"],
["https://example.com/article", "https://yoursite.com/page"]
]
}
n8n configuration:
- Resource: Sheet Within Document
- Operation: Get Row(s)
- Document: Select from list
- Sheet: Select target sheet
- Returns: Array of objects with column headers as keys
Rate limits: 100 requests per 100 seconds per user. The workflow reads the sheet once per execution, so this limit is not a concern.
Edge cases: Empty rows return null values, so handle them in downstream nodes. Downstream nodes reference columns by name, so renaming or removing a column breaks the workflow.
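To make the "column headers as keys" mapping and the two edge cases concrete, here is a minimal sketch that converts a raw `values` response into row objects. n8n's Google Sheets node does this mapping for you; `rowsFromValues` and its `requiredColumns` parameter are hypothetical names used only for illustration.

```javascript
// Convert a Sheets "values" response (first row = headers) into row objects,
// dropping fully empty rows and failing fast if an expected column is missing.
function rowsFromValues(values, requiredColumns = []) {
  const [headers = [], ...rows] = values ?? [];
  const missing = requiredColumns.filter((name) => !headers.includes(name));
  if (missing.length > 0) {
    throw new Error(`Missing column(s): ${missing.join(", ")}`);
  }
  return rows
    .filter((row) => row.some((cell) => String(cell ?? "").trim() !== ""))
    .map((row) =>
      Object.fromEntries(headers.map((header, i) => [header, row[i] ?? null]))
    );
}
```

Calling `rowsFromValues(response.values, ["URL Source", "Target URL"])` would surface a renamed column as an explicit error instead of a silent `undefined` further down the workflow.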
Apify Website Emails Scraper API
Authentication: API key passed in header.
Request example:
POST https://api.apify.com/v2/acts/maximedupre~website-emails-scraper/runs
Headers: { "Authorization": "Bearer apify_api_xxx" }
Body: {
"startUrls": [{"url": "https://example.com"}],
"maxRequestsPerCrawl": 10,
"maxCrawlingDepth": 2
}
Response structure:
{
"data": {
"items": [
{
"url": "https://example.com",
"email": "contact@example.com",
"foundOnPage": "https://example.com/contact"
}
]
}
}
n8n configuration:
- Resource: Actor
- Operation: Run an Actor and Get Dataset
- Actor: Website Emails Scraper
- Input JSON: {{ $json }} (passes cleaned URL array)
- Memory: 1024 MB (balance cost/performance)
Rate limits: Based on compute units, not requests. One compute unit is 1 GB of memory used for one hour, so a run at 1 GB consumes roughly 0.017 compute units per minute. Free tier: 5 compute units/month.
Cost optimization: Reduce maxCrawlingDepth to 1 if emails are typically on homepage/contact page. Costs ~$0.02-0.05 per website scraped.
Common failures:
- No emails found: Some sites hide contact info or use contact forms only
- JavaScript-heavy sites: May need increased memory allocation
- Cloudflare protection: Scraper may be blocked
Debugging: Check the Apify run logs in the dashboard. An "ACTOR_RUN_FAILED" error usually points to insufficient memory or a timeout.
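A scraper can return the same address from several pages, or no address at all. The sketch below assumes the response shape shown earlier in this section (`data.items[].email`); `uniqueEmails` is a hypothetical helper, not part of the Apify client.

```javascript
// Collect distinct, lowercased emails from a response shaped like the example
// in this section. Returns [] when nothing was found, so a downstream
// "emails exist" check stays a simple length test.
function uniqueEmails(response) {
  const items = response?.data?.items ?? [];
  const seen = new Set();
  for (const item of items) {
    const email =
      typeof item?.email === "string" ? item.email.trim().toLowerCase() : "";
    if (email.includes("@")) seen.add(email);
  }
  return [...seen];
}
```

An empty array here corresponds to the "no emails found" failure case, which the workflow handles by skipping the backlink.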
OpenAI GPT-4 API
Authentication: Bearer token (API key from OpenAI dashboard).
Request format:
POST https://api.openai.com/v1/chat/completions
Headers: {
"Authorization": "Bearer sk-xxx",
"Content-Type": "application/json"
}
Body: {
"model": "gpt-4-1106-preview",
"messages": [
{"role": "system", "content": "You are an expert email outreach specialist..."},
{"role": "user", "content": "Generate email for backlink from example.com to mysite.com"}
],
"response_format": { "type": "json_object" }
}
Response structure:
{
"id": "chatcmpl-xxx",
"choices": [{
"message": {
"role": "assistant",
"content": "{\"selected_email\":\"editor@example.com\",\"subject\":\"Quick request...\",\"html_body\":\"<p>Hi,</p>...\"}"
}
}],
"usage": {
"prompt_tokens": 450,
"completion_tokens": 200,
"total_tokens": 650
}
}
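Note that `content` in the response above is a JSON string, not an object, so it has to be parsed before the Gmail step can read its fields. A minimal sketch, assuming the three keys from the example (`selected_email`, `subject`, `html_body`); `parseEmailDraft` is a hypothetical helper, and n8n's structured output parser covers the same ground inside the workflow.

```javascript
// Parse the JSON string in choices[0].message.content and check the three
// fields the Gmail step reads. Returns null instead of throwing on bad output.
function parseEmailDraft(completion) {
  const content = completion?.choices?.[0]?.message?.content;
  if (typeof content !== "string") return null;
  let draft;
  try {
    draft = JSON.parse(content);
  } catch {
    return null; // the model returned something that is not valid JSON
  }
  const required = ["selected_email", "subject", "html_body"];
  const complete =
    draft !== null &&
    typeof draft === "object" &&
    required.every((key) => typeof draft[key] === "string" && draft[key].length > 0);
  return complete ? draft : null;
}
```

Returning `null` for both invalid JSON and missing fields lets a single IF check decide whether to send or skip.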
n8n configuration:
- Model: gpt-4-1106-preview (or gpt-4-1-mini for cost savings)
- Use Responses API: Enabled
- Output Parser: Structured Output with JSON schema
- System Message: Detailed prompt with email generation guidelines
Rate limits (Tier 1):
- 500 requests per day
- 10,000 tokens per minute
- For outreach: ~650 tokens/email = ~15 emails/minute max
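Cost: GPT-4-1-mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens. At those rates, the 650-token email from the usage example (450 prompt + 200 completion tokens) costs about $0.0002. The $0.01-0.02 per email estimate is closer to what the larger gpt-4-1106-preview model costs.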
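The throughput and cost figures can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this section (10,000 tokens per minute, ~650 tokens per email, and the per-million-token prices); the function names are hypothetical.

```javascript
// Emails per minute allowed by a tokens-per-minute limit, rounded down.
function maxEmailsPerMinute(tokensPerMinute, tokensPerEmail) {
  return Math.floor(tokensPerMinute / tokensPerEmail);
}

// Cost in USD of one completion, given prices per one million tokens.
function completionCostUsd(usage, pricePerMillion) {
  return (
    (usage.prompt_tokens * pricePerMillion.input +
      usage.completion_tokens * pricePerMillion.output) /
    1_000_000
  );
}

const emailsPerMinute = maxEmailsPerMinute(10_000, 650);
const costPerEmail = completionCostUsd(
  { prompt_tokens: 450, completion_tokens: 200 },
  { input: 0.15, output: 0.6 }
);
```

Here `emailsPerMinute` comes out to 15, matching the estimate in the rate-limit list, and `costPerEmail` is about $0.00019 at the quoted mini-model prices.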
Edge cases:
- AI refuses to generate: If scraped data looks like spam/sensitive info
- Invalid JSON: Auto-fix in n8n or add validation prompt
- Generic output: Improve system prompt with more context
Prompt engineering tips:
System message should include:
- Specific role/expertise
- Output format requirements (JSON structure)
- Tone guidelines (professional but friendly)
- What to include (backlink context, value proposition)
- What to avoid (aggressive sales language, false urgency)
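One way to keep those five ingredients explicit is to assemble the system message from named parts. This is a sketch with placeholder wording, not the prompt used by the workflow; `buildSystemMessage` and its parameter names are hypothetical.

```javascript
// Assemble a system message from the five ingredients listed above.
// All wording here is placeholder text for illustration.
function buildSystemMessage({ role, outputFields, tone, include, avoid }) {
  return [
    `You are ${role}.`,
    `Respond with a single JSON object containing exactly these keys: ${outputFields.join(", ")}.`,
    `Tone: ${tone}.`,
    `Always include: ${include.join("; ")}.`,
    `Never use: ${avoid.join("; ")}.`,
  ].join("\n");
}

const systemMessage = buildSystemMessage({
  role: "an expert email outreach specialist",
  outputFields: ["selected_email", "subject", "html_body"],
  tone: "professional but friendly",
  include: ["backlink context", "value proposition"],
  avoid: ["aggressive sales language", "false urgency"],
});
```

Keeping the output keys in one array makes it harder for the prompt and the downstream Gmail expressions to drift apart.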
Gmail API
Authentication: OAuth 2.0 credential. Requires enabling Gmail API in Google Cloud Console.
Request format:
POST https://gmail.googleapis.com/gmail/v1/users/me/messages/send
Headers: { "Authorization": "Bearer {token}" }
Body: {
"raw": "{base64_encoded_email}"
}
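The `raw` field is a complete RFC 2822 message encoded as base64url, which n8n's Gmail node builds for you. For reference, a minimal sketch in Node.js (the `"base64url"` Buffer encoding needs a reasonably recent Node version); `buildRawMessage` is a hypothetical helper and handles only ASCII subjects and a single HTML body.

```javascript
// Build a minimal HTML email and encode it the way the Gmail API's "raw"
// field expects: an RFC 2822 message, base64url-encoded.
function buildRawMessage({ to, subject, htmlBody }) {
  // Reject line breaks in header values to avoid header injection.
  if (/[\r\n]/.test(to) || /[\r\n]/.test(subject)) {
    throw new Error("Header values must not contain line breaks");
  }
  const message = [
    `To: ${to}`,
    `Subject: ${subject}`,
    "MIME-Version: 1.0",
    'Content-Type: text/html; charset="UTF-8"',
    "", // blank line separates headers from the body
    htmlBody,
  ].join("\r\n");
  return Buffer.from(message, "utf8").toString("base64url");
}
```

The line-break check matters here because the subject comes from model output: a stray newline could otherwise inject extra headers such as `Bcc`.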
n8n configuration:
- Resource: Message
- Operation: Send
- To: {{ $json.output.selected_email }}
- Subject: {{ $json.output.subject }}
- Email Type: HTML
- Message: {{ $json.output.html_body }}
Rate limits: 100-500 emails per day (varies by account age/reputation). For outreach, stay under 50/day to avoid spam flags.
Deliverability tips:
- Warm up new accounts (start with 5-10 emails/day)
- Set up SPF/DKIM records
- Use professional signature
- Avoid spam trigger words
Implementation Gotchas
1. Handling Missing Data
When Apify finds no emails, the IF node prevents downstream errors:
// Condition in IF node
{{ $json.email }} exists
Without this check, the AI Agent receives undefined data and fails.
2. URL Normalization
The Function node standardizes URLs because Apify expects clean input:
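// Split the "URL Source" cell on commas, trim each entry, and drop empty ones.
const raw = String($input.first().json["URL Source"] ?? "");
const urls = raw.split(",").map(url => url.trim()).filter(url => url !== "");
// Add a protocol if missing and wrap each result in a `json` key, the item shape n8n uses.
return urls.map(url => ({
  json: { url: /^https?:\/\//i.test(url) ? url : `https://${url}` }
}));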
Handles: missing protocols, extra spaces, comma-separated lists.
3. API Cost Management
- Apify: ~$0.03 per site scraped
- OpenAI: ~$0.015 per email generated
- Gmail: Free
- Total per backlink: ~$0.045
For 100 backlinks: ~$4.50 in API costs. Budget accordingly.
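The same arithmetic as a small sketch, using the per-service estimates from the list above. `COST_MILLS` and `campaignCostUsd` are hypothetical names; amounts are kept in tenths of a cent so the sums stay exact.

```javascript
// Per-service cost estimates from the list above, in tenths of a cent
// (30 = $0.03). Integer units avoid floating-point drift when summing.
const COST_MILLS = { apify: 30, openai: 15, gmail: 0 };

// Estimated campaign cost in USD for a given number of backlinks.
function campaignCostUsd(backlinkCount, costs = COST_MILLS) {
  const perBacklinkMills = Object.values(costs).reduce((sum, c) => sum + c, 0);
  return (perBacklinkMills * backlinkCount) / 1000;
}
```

`campaignCostUsd(100)` gives 4.5. Treat it as an upper bound: backlinks where no email is found never reach the OpenAI step, so they incur only the scraping cost.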
4. Gmail Sending Limits
The workflow processes backlinks sequentially, but Gmail still enforces daily quotas. For large lists:
// Add Wait node after Gmail Send
// Wait: 2 minutes between emails
// Spreads 100 emails over ~3.3 hours
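A fixed wait spreads sends across a few hours, but it does not by itself keep a large list under a daily cap. The sketch below computes both numbers; `sendSchedule` is a hypothetical helper, and the cap of 50 per day is the conservative figure suggested in the Gmail section.

```javascript
// Total wall-clock time for a run with a fixed Wait after each send, and how
// many days the list needs if a daily sending cap is also enforced.
function sendSchedule(emailCount, waitMinutes, dailyCap) {
  const totalHours = (emailCount * waitMinutes) / 60;
  const daysNeeded = Math.ceil(emailCount / dailyCap);
  return { totalHours, daysNeeded };
}

const schedule = sendSchedule(100, 2, 50);
```

For 100 emails with a 2-minute wait, `schedule.totalHours` is about 3.3 and `schedule.daysNeeded` is 2, so a list of that size is better split across two executions than sent in one run.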
5. Error Recovery
If workflow fails mid-execution:
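- n8n keeps the data from a failed execution, provided saving of failed executions is enabled
- Retrying that execution from the Executions list resumes at the node that failed rather than starting over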
- Track sent emails in Google Sheets to prevent duplicates
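Duplicate prevention can be as simple as filtering on the sheet's Status column before any scraping starts. A minimal sketch, assuming rows shaped like the sheet layout in this article; `pendingRows` is a hypothetical helper, and any status other than "Pending" is treated as already handled.

```javascript
// Select rows that still need outreach: status is "Pending", the source URL
// is present, and the same source URL has not already appeared in this run.
function pendingRows(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const source = String(row["URL Source"] ?? "").trim().toLowerCase();
    const status = String(row["Status"] ?? "").trim().toLowerCase();
    if (source === "" || status !== "pending" || seen.has(source)) return false;
    seen.add(source);
    return true;
  });
}
```

For this to work across re-runs, the workflow has to write a new status back to the sheet right after each successful send.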
Prerequisites
Required Accounts:
- n8n Cloud or self-hosted instance
- Google account with Sheets API enabled
- Apify account (free tier: 5 compute units/month)
- OpenAI account with API key and billing
- Gmail account for sending (use dedicated outreach email)
API Credentials Needed:
- Google OAuth 2.0 (Sheets + Gmail)
- Apify API key (Settings → Integrations)
- OpenAI API key (API Keys section)
Estimated Costs:
- n8n Cloud: $20/month (Starter plan) or self-hosted (free)
- Apify: $0.03/backlink or free tier
- OpenAI: $0.015/email
- Gmail: Free
Data Preparation:
Google Sheet structure:
| URL Source | Target URL | Status | Date Contacted |
|-----------------------------|-------------------------------|-------------|----------------|
| https://example.com/article | https://yoursite.com/guide | Pending | |
Get the Complete Workflow Configuration
This tutorial covers the API integration architecture and implementation details. For the complete n8n workflow JSON file, Google Sheets template, AI prompt configuration, and video walkthrough showing the workflow in action, check out the full implementation guide.