Hackceleration

Posted on Mar 10 • Originally published at hackceleration.com

Building an AI-Powered Backlink Outreach System with n8n, Apify, and OpenAI

#n8n #api #automation #tutorial

Architecture Overview

This n8n workflow automates backlink outreach by chaining together Google Sheets data retrieval, Apify web scraping, and OpenAI's GPT-4 for email generation. Here's the data flow:

[Google Sheets] → [URL Cleaning] → [Apify Scraper] → [Email Validation] → [GPT-4 Generation] → [Gmail Send]
       ↓              ↓                   ↓                  ↓                    ↓                ↓
   Backlink      Normalize        Extract Contact      Check if Found      Generate Email    Deliver
   List          URLs             Emails               Emails Exist        with AI           Message

Why this architecture? Sequential processing lowers the risk of hitting API rate limits and handles each backlink in isolation. With a batch size of 1, a scraping failure on one URL does not have to stop the run: provided the node is set to continue on error, the workflow moves on to the next URL. The IF conditional acts as a quality gate: when no emails are found, the workflow skips to the next backlink.

Alternative considered: Parallel processing would be faster, but it risks overwhelming Apify's scraper and makes failures harder to debug. For outreach campaigns, reliability matters more than speed.
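The sequential flow with its quality gate can be sketched outside n8n as a plain loop. This is a minimal illustration, not the workflow's actual implementation: `runOutreach` and the three stage functions passed to it (`scrapeEmails`, `generateEmail`, `sendEmail`) are hypothetical names standing in for the Apify, OpenAI, and Gmail nodes.

```javascript
// Hypothetical stage functions are injected; none of these names come from n8n.
async function runOutreach(backlinks, { scrapeEmails, generateEmail, sendEmail }) {
  const results = [];
  for (const backlink of backlinks) {
    // Batch size of 1: each backlink is fully processed before the next starts.
    try {
      const emails = await scrapeEmails(backlink.url);
      if (emails.length === 0) {
        // Quality gate: no contact emails found, so skip to the next backlink.
        results.push({ url: backlink.url, status: "skipped" });
        continue;
      }
      const draft = await generateEmail(backlink, emails);
      await sendEmail(draft);
      results.push({ url: backlink.url, status: "sent" });
    } catch (err) {
      // A failure on one backlink is recorded and the loop moves on.
      results.push({ url: backlink.url, status: "failed", error: String(err.message || err) });
    }
  }
  return results;
}
```

Because every backlink ends up with exactly one status entry, the returned array doubles as a simple audit log when a run needs debugging.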

API Integration Deep-Dive

Google Sheets API

Authentication: OAuth 2.0 credential in n8n. Grant read access to your backlink tracking spreadsheet.

Request structure:

// n8n handles this internally, but equivalent REST call:
GET https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}/values/{range}
Headers: { "Authorization": "Bearer {token}" }

Response format:

{
  "range": "Sheet1!A1:B100",
  "values": [
    ["URL Source", "Target URL"],
    ["https://example.com/article", "https://yoursite.com/page"]
  ]
}

n8n configuration:

Resource: Sheet Within Document
Operation: Get Row(s)
Document: Select from list
Sheet: Select target sheet
Returns: Array of objects with column headers as keys

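Rate limits: Google documents Sheets API quotas per minute (60 read requests per minute per user per project at the time of writing; check the current quota page). This workflow reads the sheet once per execution, so the quota is not a concern.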

Edge cases: Empty rows come back with null or missing values, so handle them in downstream nodes. The workflow references columns by name, so renaming or removing a column in the spreadsheet breaks it.
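To make the "column headers as keys" mapping and the empty-row edge case concrete, here is a small sketch of the conversion from the raw `values` array to row objects. n8n does this internally; the helper name `rowsToObjects` is an assumption for illustration only.

```javascript
// Convert a Sheets `values` array (first row = headers) into row objects,
// dropping rows in which every cell is empty.
function rowsToObjects(values) {
  if (!Array.isArray(values) || values.length === 0) return [];
  const [headers, ...rows] = values;
  return rows
    .filter((row) => row.some((cell) => cell != null && String(cell).trim() !== ""))
    .map((row) =>
      // Cells missing from a short row become null instead of undefined.
      Object.fromEntries(headers.map((header, i) => [header, row[i] ?? null]))
    );
}

const sample = {
  range: "Sheet1!A1:B100",
  values: [
    ["URL Source", "Target URL"],
    ["https://example.com/article", "https://yoursite.com/page"],
    ["", ""], // empty row, filtered out
  ],
};
console.log(rowsToObjects(sample.values));
```

Filtering blank rows at this stage keeps empty values out of the scraper input further down the chain.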

Apify Website Emails Scraper API

Authentication: API token passed as a Bearer token in the Authorization header.

Request example:

POST https://api.apify.com/v2/acts/maximedupre~website-emails-scraper/runs
Headers: { "Authorization": "Bearer apify_api_xxx" }
Body: {
  "startUrls": [{"url": "https://example.com"}],
  "maxRequestsPerCrawl": 10,
  "maxCrawlingDepth": 2
}

Response structure:

{
  "data": {
    "items": [
      {
        "url": "https://example.com",
        "email": "contact@example.com",
        "foundOnPage": "https://example.com/contact"
      }
    ]
  }
}

n8n configuration:

Resource: Actor
Operation: Run an Actor and Get Dataset
Actor: Website Emails Scraper
Input JSON: {{ $json }} (passes cleaned URL array)
Memory: 1024 MB (balance cost/performance)

Rate limits: Usage is metered in compute units, not request counts. One compute unit equals 1 GB of memory used for one hour, so a 1 GB run consumes about 0.017 compute units per minute. Free tier: 5 compute units/month.

Cost optimization: Reduce maxCrawlingDepth to 1 if emails are typically on homepage/contact page. Costs ~$0.02-0.05 per website scraped.
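For budgeting, compute-unit usage can be estimated from memory and run time. The sketch below assumes Apify's definition of one compute unit as 1 GB of memory for one hour; `estimateComputeUnits` is a hypothetical helper, and real runs may be billed slightly differently.

```javascript
// Estimate Apify compute units, assuming 1 CU = 1 GB of memory for 1 hour.
function estimateComputeUnits(memoryMb, runSeconds) {
  const memoryGb = memoryMb / 1024;
  const runHours = runSeconds / 3600;
  return memoryGb * runHours;
}

// A 1024 MB run that takes two minutes:
console.log(estimateComputeUnits(1024, 120).toFixed(4));
```

Halving either the memory setting or the run time (for example by lowering maxCrawlingDepth) halves the estimate, which is why both settings are worth tuning.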

Common failures:

No emails found: Some sites hide contact info or use contact forms only
JavaScript-heavy sites: May need increased memory allocation
Cloudflare protection: Scraper may be blocked

Debugging: Check Apify run logs in dashboard. Error "ACTOR_RUN_FAILED" means insufficient memory or timeout.

OpenAI GPT-4 API

Authentication: Bearer token (API key from OpenAI dashboard).

Request format:

POST https://api.openai.com/v1/chat/completions
Headers: { 
  "Authorization": "Bearer sk-xxx",
  "Content-Type": "application/json"
}
Body: {
  "model": "gpt-4-1106-preview",
  "messages": [
    {"role": "system", "content": "You are an expert email outreach specialist..."},
    {"role": "user", "content": "Generate email for backlink from example.com to mysite.com"}
  ],
  "response_format": { "type": "json_object" }
}

Response structure:

{
  "id": "chatcmpl-xxx",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "{\"selected_email\":\"editor@example.com\",\"subject\":\"Quick request...\",\"html_body\":\"<p>Hi,</p>...\"}"
    }
  }],
  "usage": {
    "prompt_tokens": 450,
    "completion_tokens": 200,
    "total_tokens": 650
  }
}

n8n configuration:

Model: gpt-4-1106-preview (or gpt-4.1-mini for cost savings)
Use Responses API: Enabled
Output Parser: Structured Output with JSON schema
System Message: Detailed prompt with email generation guidelines

Rate limits (Tier 1):

500 requests per day
10,000 tokens per minute
For outreach: ~650 tokens/email = ~15 emails/minute max

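Cost: At mini-tier rates of $0.15 per 1M input tokens and $0.60 per 1M output tokens, a typical email (~450 input and ~200 output tokens) costs well under a cent. The $0.01-0.02 per-email figure reflects gpt-4-1106-preview pricing. Check OpenAI's pricing page for current rates.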
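The throughput and cost arithmetic can be checked with two small helpers. Both function names are hypothetical, and the rates are passed in as parameters because pricing changes. The token counts come from the `usage` block in the sample response.

```javascript
// How many emails fit under a tokens-per-minute limit.
function emailsPerMinute(tokensPerMinuteLimit, tokensPerEmail) {
  return Math.floor(tokensPerMinuteLimit / tokensPerEmail);
}

// Cost of one completion in USD, given per-million-token rates.
function costPerEmailUsd(usage, inputUsdPerMillion, outputUsdPerMillion) {
  return (
    (usage.prompt_tokens * inputUsdPerMillion +
      usage.completion_tokens * outputUsdPerMillion) / 1e6
  );
}

const usage = { prompt_tokens: 450, completion_tokens: 200, total_tokens: 650 };
console.log(emailsPerMinute(10000, usage.total_tokens)); // 15
console.log(costPerEmailUsd(usage, 0.15, 0.6));
```

Feeding in the `usage` values from real executions gives a more reliable per-email figure than a fixed estimate.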

Edge cases:

AI refuses to generate: If scraped data looks like spam/sensitive info
Invalid JSON: Auto-fix in n8n or add validation prompt
Generic output: Improve system prompt with more context
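One way to catch the invalid-JSON case before it reaches Gmail is to parse and validate the model output explicitly. This is a minimal sketch, assuming the three field names shown in the sample response (`selected_email`, `subject`, `html_body`); `parseEmailDraft` is a hypothetical helper, not part of n8n or the OpenAI SDK.

```javascript
const REQUIRED_FIELDS = ["selected_email", "subject", "html_body"];

// Parse the assistant message and confirm every required field is a non-empty string.
function parseEmailDraft(completion) {
  const content = completion?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    return { ok: false, error: "no message content" };
  }
  let draft;
  try {
    draft = JSON.parse(content);
  } catch {
    return { ok: false, error: "content is not valid JSON" };
  }
  if (draft === null || typeof draft !== "object") {
    return { ok: false, error: "content is not a JSON object" };
  }
  const missing = REQUIRED_FIELDS.filter(
    (field) => typeof draft[field] !== "string" || draft[field].trim() === ""
  );
  if (missing.length > 0) {
    return { ok: false, error: `missing fields: ${missing.join(", ")}` };
  }
  return { ok: true, draft };
}
```

Returning a result object instead of throwing lets the caller route failures to a retry or a manual-review branch.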

Prompt engineering tips:

System message should include:
- Specific role/expertise
- Output format requirements (JSON structure)
- Tone guidelines (professional but friendly)
- What to include (backlink context, value proposition)
- What to avoid (aggressive sales language, false urgency)

Gmail API

Authentication: OAuth 2.0 credential. Requires enabling Gmail API in Google Cloud Console.

Request format:

POST https://gmail.googleapis.com/gmail/v1/users/me/messages/send
Headers: { "Authorization": "Bearer {token}" }
Body: {
  "raw": "{base64_encoded_email}"
}
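The `raw` field expects the full RFC 2822 message encoded as base64url. n8n builds this for you; the sketch below shows roughly what that involves, assuming Node.js with `base64url` Buffer support and an ASCII-only subject line (non-ASCII subjects would need RFC 2047 encoding, which is not handled here). `buildRawMessage` is a hypothetical helper.

```javascript
// Build the base64url-encoded `raw` payload for an HTML email.
// Assumes an ASCII subject; header encoding for other charsets is omitted.
function buildRawMessage({ to, subject, htmlBody }) {
  const message = [
    `To: ${to}`,
    `Subject: ${subject}`,
    "MIME-Version: 1.0",
    'Content-Type: text/html; charset="UTF-8"',
    "", // blank line separates headers from the body
    htmlBody,
  ].join("\r\n");
  return Buffer.from(message, "utf8").toString("base64url");
}

const raw = buildRawMessage({
  to: "editor@example.com",
  subject: "Quick request",
  htmlBody: "<p>Hi,</p>",
});
console.log(raw);
```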

n8n configuration:

Resource: Message
Operation: Send
To: {{ $json.output.selected_email }}
Subject: {{ $json.output.subject }}
Email Type: HTML
Message: {{ $json.output.html_body }}

Rate limits: 100-500 emails per day (varies by account age/reputation). For outreach, stay under 50/day to avoid spam flags.

Deliverability tips:

Warm up new accounts (start with 5-10 emails/day)
Set up SPF/DKIM records
Use professional signature
Avoid spam trigger words

Implementation Gotchas

1. Handling Missing Data

When Apify finds no emails, the IF node prevents downstream errors:

// Condition in IF node
{{ $json.email }} exists

Without this check, the AI Agent receives undefined data and fails.
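If a bare existence check turns out to be too permissive (an empty string still "exists"), a stricter predicate can be used instead. This is a sketch only: `hasUsableEmail` is a hypothetical helper, and the pattern is a deliberately loose shape check, not full address validation.

```javascript
// Stricter than an existence check: rejects missing, empty, and malformed values.
function hasUsableEmail(item) {
  const email = item?.email;
  return (
    typeof email === "string" &&
    /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email.trim())
  );
}

console.log(hasUsableEmail({ email: "contact@example.com" })); // true
console.log(hasUsableEmail({ email: "" })); // false
```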

2. URL Normalization

The Function node standardizes URLs because Apify expects clean input:

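// Split the comma-separated "URL Source" cell, trim each entry, and drop blanks
const raw = $input.first().json["URL Source"] ?? "";
const urls = raw.split(",").map(url => url.trim()).filter(Boolean);
// Prepend https:// when no protocol is present; wrap each result in `json` for n8n
return urls.map(url => ({
  json: { url: /^https?:\/\//i.test(url) ? url : `https://${url}` }
}));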

Handles: missing protocols, extra spaces, comma-separated lists.

3. API Cost Management

Apify: ~$0.03 per site scraped
OpenAI: ~$0.015 per email generated
Gmail: Free
Total per backlink: ~$0.045

For 100 backlinks: ~$4.50 in API costs. Budget accordingly.

4. Gmail Sending Limits

Workflow processes sequentially but Gmail has daily quotas. For large lists:

// Add Wait node after Gmail Send
// Wait: 2 minutes between emails
// Spreads 100 emails over ~3.3 hours
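The pacing arithmetic, combined with a daily cap, can be sketched as follows. `planSending` and its `dailyCap` parameter are assumptions added for illustration; the Wait node itself only handles the per-email delay.

```javascript
// Work out how many days a list takes and how long each day's sending window is.
function planSending(totalEmails, waitMinutes, dailyCap) {
  const days = Math.ceil(totalEmails / dailyCap);
  const perDay = Math.min(totalEmails, dailyCap);
  const hoursPerDay = (perDay * waitMinutes) / 60;
  return { days, perDay, hoursPerDay };
}

// 100 emails, 2 minutes apart, no more than 50 per day:
console.log(planSending(100, 2, 50));
```

With a cap of 50 per day, the same 100-email list takes two days of roughly 1.7 hours each, which keeps volume within the conservative limit suggested for outreach.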

5. Error Recovery

If workflow fails mid-execution:

n8n keeps the execution data of a failed run (when execution saving is enabled)
A failed execution can be retried from the Executions list, resuming at the node that failed
Track sent emails in Google Sheets to prevent duplicates
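Duplicate prevention via the sheet can be sketched with two helpers. The "Status" and "Date Contacted" columns and the "Pending" value appear in the sheet structure used in this tutorial; the "Contacted" value and both function names are assumptions for illustration.

```javascript
// Keep only rows that have not been contacted yet.
function pendingRows(rows) {
  return rows.filter(
    (row) => String(row["Status"] ?? "").trim().toLowerCase() === "pending"
  );
}

// Return an updated copy of the row to write back after a successful send.
// "Contacted" is an assumed status value, not one defined by the workflow.
function markContacted(row, date = new Date()) {
  return {
    ...row,
    Status: "Contacted",
    "Date Contacted": date.toISOString().slice(0, 10), // YYYY-MM-DD (UTC)
  };
}
```

Writing the updated row back right after each send means a re-run after a mid-execution failure only picks up rows that are still pending.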

Prerequisites

Required Accounts:

n8n Cloud or self-hosted instance
Google account with Sheets API enabled
Apify account (free tier: 5 compute units/month)
OpenAI account with API key and billing
Gmail account for sending (use dedicated outreach email)

API Credentials Needed:

Google OAuth 2.0 (Sheets + Gmail)
Apify API key (Settings → Integrations)
OpenAI API key (API Keys section)

Estimated Costs:

n8n Cloud: $20/month (Starter plan) or self-hosted (free)
Apify: $0.03/backlink or free tier
OpenAI: $0.015/email
Gmail: Free

Data Preparation:

Google Sheet structure:

| URL Source                  | Target URL                    | Status      | Date Contacted |
|-----------------------------|-------------------------------|-------------|----------------|
| https://example.com/article | https://yoursite.com/guide    | Pending     |                |


Get the Complete Workflow Configuration

This tutorial covers the API integration architecture and implementation details. For the complete n8n workflow JSON file, Google Sheets template, AI prompt configuration, and video walkthrough showing the workflow in action, check out the full implementation guide.
