DEV Community

Hackceleration

Originally published at hackceleration.com

Building an AI-Powered Backlink Outreach System with n8n, Apify, and OpenAI

Architecture Overview

This n8n workflow automates backlink outreach by chaining together Google Sheets data retrieval, Apify web scraping, and OpenAI's GPT-4 for email generation. Here's the data flow:

[Google Sheets] → [URL Cleaning] → [Apify Scraper] → [Email Validation] → [GPT-4 Generation] → [Gmail Send]
       ↓              ↓                   ↓                  ↓                    ↓                ↓
   Backlink      Normalize        Extract Contact      Check if Found      Generate Email    Deliver
   List          URLs             Emails               Emails Exist        with AI           Message

Why this architecture? Processing one backlink at a time keeps request volume low, which helps stay within API rate limits, and gives each backlink proper attention. With a batch size of 1, a failed email scrape only affects that URL, and the workflow moves on to the next one as long as the Apify node's On Error setting is set to continue (by default, a node error stops the workflow). The IF conditional acts as a quality gate: if no emails are found, the workflow skips to the next backlink.

Alternative considered: Parallel processing would be faster but risks overwhelming Apify's scraper and makes debugging failures harder. For outreach campaigns, reliability matters more than speed.
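The sequential, one-at-a-time control flow can be sketched outside n8n as plain JavaScript. This is a minimal sketch, not the workflow itself: `scrapeEmails`, `generateEmail`, and `sendEmail` are hypothetical stand-ins for the Apify, OpenAI, and Gmail nodes, passed in as async functions so the loop can be exercised with stubs.

```javascript
// Minimal sketch of the workflow's control flow (not n8n internals).
// steps.scrapeEmails / steps.generateEmail / steps.sendEmail are hypothetical
// async stand-ins for the Apify, OpenAI, and Gmail nodes.
async function processBacklinks(rows, steps) {
  const results = [];
  // Batch size of 1: handle one backlink at a time, in order.
  for (const row of rows) {
    const source = row["URL Source"];
    try {
      const emails = await steps.scrapeEmails(source);
      // Quality gate (the IF node): no emails found means skip this backlink.
      if (emails.length === 0) {
        results.push({ source, status: "skipped" });
        continue;
      }
      const draft = await steps.generateEmail(row, emails);
      await steps.sendEmail(draft);
      results.push({ source, status: "sent" });
    } catch (err) {
      // A failure on one backlink is recorded and the loop moves on.
      results.push({ source, status: "error", error: String(err) });
    }
  }
  return results;
}
```

Because each step is awaited before the next backlink starts, at most one Apify run and one OpenAI request are in flight at any time.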

API Integration Deep-Dive

Google Sheets API

Authentication: OAuth 2.0 credential in n8n. Grant read access to your backlink tracking spreadsheet.

Request structure:

// n8n handles this internally, but equivalent REST call:
GET https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}/values/{range}
Headers: { "Authorization": "Bearer {token}" }

Response format:

{
  "range": "Sheet1!A1:B100",
  "values": [
    ["URL Source", "Target URL"],
    ["https://example.com/article", "https://yoursite.com/page"]
  ]
}

n8n configuration:

  • Resource: Sheet Within Document
  • Operation: Get Row(s)
  • Document: Select from list
  • Sheet: Select target sheet
  • Returns: Array of objects with column headers as keys

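Rate limits: The Sheets API allows 60 read requests per minute per user per project (300 per project). The workflow reads the sheet once per execution, so these limits are not a concern.

Edge cases: The API omits trailing empty cells and rows, so blank values can arrive as missing fields; handle them in downstream nodes. The workflow also depends on column names, so renaming or removing a header breaks it.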
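n8n's Get Row(s) operation already returns one object per row keyed by the header names. If you call the REST endpoint directly, a small helper can do the same conversion from the `values` array and handle the blank-cell edge cases. `rowsToObjects` is a hypothetical helper (not part of any Google library) and assumes the first row holds the headers.

```javascript
// Hypothetical helper: convert a Sheets API `values` array (first row = headers)
// into objects keyed by header, similar to n8n's Get Row(s) output.
function rowsToObjects(values) {
  if (!Array.isArray(values) || values.length === 0) return [];
  const [headers, ...rows] = values;
  return rows
    // Skip rows where every cell is empty or whitespace.
    .filter((row) => row.some((cell) => String(cell ?? "").trim() !== ""))
    .map((row) => {
      const obj = {};
      headers.forEach((header, i) => {
        // Trailing empty cells are omitted by the API, so pad them with "".
        obj[header] = row[i] ?? "";
      });
      return obj;
    });
}

const response = {
  range: "Sheet1!A1:B100",
  values: [
    ["URL Source", "Target URL"],
    ["https://example.com/article", "https://yoursite.com/page"],
    [],
  ],
};
console.log(rowsToObjects(response.values));
```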

Apify Website Emails Scraper API

Authentication: API token, passed as a Bearer token in the Authorization header.

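Request example (the `run-sync-get-dataset-items` endpoint runs the actor, waits for it to finish, and returns the dataset items in one response; the asynchronous `/runs` endpoint instead returns a run object whose `defaultDatasetId` you fetch separately):

POST https://api.apify.com/v2/acts/maximedupre~website-emails-scraper/run-sync-get-dataset-items
Headers: { "Authorization": "Bearer apify_api_xxx", "Content-Type": "application/json" }
Body: {
  "startUrls": [{"url": "https://example.com"}],
  "maxRequestsPerCrawl": 10,
  "maxCrawlingDepth": 2
}

Response structure (a JSON array of dataset items; the fields shown are illustrative and depend on the actor's output):

[
  {
    "url": "https://example.com",
    "email": "contact@example.com",
    "foundOnPage": "https://example.com/contact"
  }
]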
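The same call can be sketched with Node's built-in `fetch` (Node 18+). This sketch assumes Apify's `run-sync-get-dataset-items` endpoint, which returns the run's dataset items as a JSON array, and assumes each item carries an `email` field as in the example output. `buildApifyRequest`, `uniqueEmails`, and `scrapeEmails` are hypothetical helpers; the input fields are copied from the example request and depend on the actor's input schema.

```javascript
// Sketch, assuming the actor input/output shapes from the example above.
const APIFY_ACTOR = "maximedupre~website-emails-scraper";

// Build the request for Apify's synchronous "run actor and return dataset items" endpoint.
function buildApifyRequest(url, token) {
  return {
    endpoint: `https://api.apify.com/v2/acts/${APIFY_ACTOR}/run-sync-get-dataset-items`,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify({
        startUrls: [{ url }],
        maxRequestsPerCrawl: 10,
        maxCrawlingDepth: 2,
      }),
    },
  };
}

// Collect distinct, lowercased emails; items without an `email` string are ignored.
function uniqueEmails(items) {
  const seen = new Set();
  for (const item of Array.isArray(items) ? items : []) {
    if (typeof item?.email === "string" && item.email.trim() !== "") {
      seen.add(item.email.trim().toLowerCase());
    }
  }
  return [...seen];
}

// Hypothetical end-to-end helper; requires Node 18+ for the global fetch.
async function scrapeEmails(url, token) {
  const { endpoint, options } = buildApifyRequest(url, token);
  const res = await fetch(endpoint, options);
  if (!res.ok) throw new Error(`Apify request failed: HTTP ${res.status}`);
  return uniqueEmails(await res.json());
}
```

Deduplicating here matters because the scraper can report the same address once per page it was found on.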

n8n configuration:

  • Resource: Actor
  • Operation: Run an Actor and Get Dataset
  • Actor: Website Emails Scraper
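  • Input JSON: {{ $json }} (passes the cleaned item from the URL-normalization step; its shape must match the actor's input schema, e.g. a startUrls array)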
  • Memory: 1024 MB (balance cost/performance)

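Billing: Apify charges by compute units (CU), not by request. One CU is 1 GB of memory used for 1 hour, so a 1024 MB run uses about 0.017 CU per minute. The free plan includes a small monthly usage credit; check Apify's pricing page for the current amount.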

Cost optimization: Reduce maxCrawlingDepth to 1 if emails are typically on homepage/contact page. Costs ~$0.02-0.05 per website scraped.
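The compute-unit arithmetic is easy to check. Apify defines one compute unit as 1 GB of memory used for one hour, so usage scales linearly with both memory and run time. `computeUnits` and `runCost` are hypothetical helpers, and the price per CU is a parameter you take from your own plan, not a figure from this article.

```javascript
// One Apify compute unit (CU) = 1 GB of memory for 1 hour.
function computeUnits(memoryMB, runMinutes) {
  return (memoryMB / 1024) * (runMinutes / 60);
}

// Estimated cost of a run, given your plan's price per CU (an input, not a constant here).
function runCost(memoryMB, runMinutes, pricePerCU) {
  return computeUnits(memoryMB, runMinutes) * pricePerCU;
}

// A 1024 MB run lasting 3 minutes uses 0.05 CU.
console.log(computeUnits(1024, 3));
```

Halving memory halves CU per minute, but only saves money if the run does not take twice as long as a result.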

Common failures:

  • No emails found: Some sites hide contact info or use contact forms only
  • JavaScript-heavy sites: May need increased memory allocation
  • Cloudflare protection: Scraper may be blocked

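Debugging: Check the run log in the Apify Console. An error such as "ACTOR_RUN_FAILED", or a run that ends with status FAILED or TIMED-OUT, usually points to insufficient memory, a timeout, or the site blocking the crawler.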

OpenAI GPT-4 API

Authentication: Bearer token (API key from OpenAI dashboard).

Request format:

POST https://api.openai.com/v1/chat/completions
Headers: { 
  "Authorization": "Bearer sk-xxx",
  "Content-Type": "application/json"
}
Body: {
  "model": "gpt-4-1106-preview",
  "messages": [
    {"role": "system", "content": "You are an expert email outreach specialist..."},
    {"role": "user", "content": "Generate email for backlink from example.com to mysite.com"}
  ],
  "response_format": { "type": "json_object" }
}

Response structure:

{
  "id": "chatcmpl-xxx",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "{\"selected_email\":\"editor@example.com\",\"subject\":\"Quick request...\",\"html_body\":\"<p>Hi,</p>...\"}"
    }
  }],
  "usage": {
    "prompt_tokens": 450,
    "completion_tokens": 200,
    "total_tokens": 650
  }
}
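Outside n8n, building the request and parsing the reply can be sketched as below. The endpoint, `response_format`, and the `choices[0].message.content` path follow the Chat Completions request and response shown above; `buildChatRequest` and `parseDraft` are hypothetical helpers, and the prompt text is a placeholder. In JSON mode the model's JSON arrives as a string inside `content`, so it still has to be parsed.

```javascript
// Hypothetical helpers around OpenAI's Chat Completions endpoint.
function buildChatRequest(apiKey, sourceUrl, targetUrl, emails) {
  return {
    endpoint: "https://api.openai.com/v1/chat/completions",
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-4-1106-preview",
        response_format: { type: "json_object" },
        messages: [
          {
            role: "system",
            // JSON mode requires the word "JSON" somewhere in the messages.
            content:
              "You are an expert email outreach specialist. Reply with a JSON object " +
              'with keys "selected_email", "subject", and "html_body".',
          },
          {
            role: "user",
            content: `Backlink from ${sourceUrl} to ${targetUrl}. Candidate emails: ${emails.join(", ")}`,
          },
        ],
      }),
    },
  };
}

// Extract and parse the JSON string the model returns in choices[0].message.content.
function parseDraft(completion) {
  const content = completion?.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("No message content in completion");
  return JSON.parse(content); // throws SyntaxError on malformed JSON
}
```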

n8n configuration:

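  • Model: gpt-4-1106-preview (or gpt-4.1-mini for cost savings)
  • Use Responses API: Enabled (with this on, n8n calls OpenAI's /v1/responses endpoint rather than the Chat Completions request shown above)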
  • Output Parser: Structured Output with JSON schema
  • System Message: Detailed prompt with email generation guidelines

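Rate limits (example Tier 1 figures; limits vary by model and usage tier, so check your organization's Limits page):

  • 500 requests per minute
  • 10,000 tokens per minute
  • For outreach: at ~650 tokens per email, 10,000 tokens per minute caps throughput at about 15 emails per minute

Cost: gpt-4.1-mini is priced at $0.40 per 1M input tokens and $1.60 per 1M output tokens ($0.15/$0.60 is gpt-4o-mini's pricing), so a ~650-token email costs well under $0.01. On gpt-4-1106-preview ($10/$30 per 1M tokens), the same email comes to about $0.01 (450 × $10/1M + 200 × $30/1M ≈ $0.0105).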
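The per-email cost and throughput figures can be recomputed from token counts whenever prices or limits change. `estimateEmailCost` and `maxEmailsPerMinute` are hypothetical helpers; prices are parameters in USD per 1M tokens, and the example plugs in gpt-4-1106-preview's $10/$30 list prices and the token counts from the sample response.

```javascript
// Cost of one request, with prices given in USD per 1M tokens.
function estimateEmailCost(promptTokens, completionTokens, inputPricePerM, outputPricePerM) {
  return (promptTokens * inputPricePerM + completionTokens * outputPricePerM) / 1_000_000;
}

// Upper bound on emails per minute imposed by a tokens-per-minute (TPM) limit.
function maxEmailsPerMinute(tpmLimit, tokensPerEmail) {
  return Math.floor(tpmLimit / tokensPerEmail);
}

// Token counts from the example response (450 prompt, 200 completion).
const usage = { prompt_tokens: 450, completion_tokens: 200 };
console.log(estimateEmailCost(usage.prompt_tokens, usage.completion_tokens, 10, 30));
console.log(maxEmailsPerMinute(10_000, usage.prompt_tokens + usage.completion_tokens));
```

Output tokens are priced higher than input tokens, so a long generated email moves the cost more than a long system prompt of the same size.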

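Edge cases:

  • AI refuses to generate: the model may refuse if the scraped data looks like spam or sensitive information
  • Invalid JSON: enable auto-fix in n8n or add a validation prompt
  • JSON mode errors: with response_format set to json_object, the messages must contain the word "JSON" or the API rejects the request
  • Generic output: improve the system prompt with more context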
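A small validation step between the model and Gmail can catch bad output before anything is sent. One useful check is that `selected_email` is actually one of the scraped addresses, so a hallucinated address is never used. `validateDraft` is a hypothetical helper; the field names follow the example response.

```javascript
// Hypothetical post-generation check before anything is sent.
function validateDraft(draft, scrapedEmails) {
  const errors = [];
  const allowed = new Set(scrapedEmails.map((e) => e.trim().toLowerCase()));
  const selected = String(draft?.selected_email ?? "").trim().toLowerCase();

  if (!allowed.has(selected)) errors.push("selected_email is not one of the scraped addresses");
  if (!String(draft?.subject ?? "").trim()) errors.push("subject is empty");
  if (!String(draft?.html_body ?? "").trim()) errors.push("html_body is empty");

  return { ok: errors.length === 0, errors };
}

const check = validateDraft(
  { selected_email: "editor@example.com", subject: "Quick request", html_body: "<p>Hi,</p>" },
  ["editor@example.com", "contact@example.com"],
);
console.log(check.ok);
```

Failed drafts can be routed to a second IF branch that logs them to the sheet instead of sending.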

Prompt engineering tips:

System message should include:
- Specific role/expertise
- Output format requirements (JSON structure)
- Tone guidelines (professional but friendly)
- What to include (backlink context, value proposition)
- What to avoid (aggressive sales language, false urgency)

Gmail API

Authentication: OAuth 2.0 credential. Requires enabling Gmail API in Google Cloud Console.

Request format:

POST https://gmail.googleapis.com/gmail/v1/users/me/messages/send
Headers: { "Authorization": "Bearer {token}" }
Body: {
  "raw": "{base64url_encoded_rfc2822_message}"
}
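The `raw` field holds the complete RFC 2822 message (headers, a blank line, then the body), base64url-encoded. n8n's Gmail node builds this for you; the sketch below shows what it amounts to for a single HTML part. `buildRawMessage` is a hypothetical helper, and it assumes ASCII-only header values (a non-ASCII subject would need RFC 2047 encoding, which this sketch skips).

```javascript
// Hypothetical helper: build the Gmail API `raw` value for a simple HTML email.
// Assumes ASCII-only header values; non-ASCII subjects need RFC 2047 encoding.
function buildRawMessage({ to, subject, html }) {
  const message = [
    `To: ${to}`,
    `Subject: ${subject}`,
    "MIME-Version: 1.0",
    'Content-Type: text/html; charset="UTF-8"',
    "", // blank line separates headers from body
    html,
  ].join("\r\n");
  // Gmail expects base64url (URL-safe alphabet); Node's Buffer supports it directly.
  return Buffer.from(message, "utf8").toString("base64url");
}

const raw = buildRawMessage({
  to: "editor@example.com",
  subject: "Quick request",
  html: "<p>Hi,</p>",
});
// Request body for POST https://gmail.googleapis.com/gmail/v1/users/me/messages/send
console.log(JSON.stringify({ raw }));
```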

n8n configuration:

  • Resource: Message
  • Operation: Send
  • To: {{ $json.output.selected_email }}
  • Subject: {{ $json.output.subject }}
  • Email Type: HTML
  • Message: {{ $json.output.html_body }}

Sending limits: about 500 recipients per day for a consumer Gmail account and 2,000 for Google Workspace, with lower effective limits on new or low-reputation accounts. For outreach, stay under 50/day to avoid spam flags.

Deliverability tips:

  • Warm up new accounts (start with 5-10 emails/day)
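  • Set up SPF/DKIM records for your sending domain (requires a custom domain, e.g. Google Workspace; not possible on a plain @gmail.com address)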
  • Use professional signature
  • Avoid spam trigger words
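The warm-up advice can be turned into a simple daily cap. The linear ramp below is an assumption for illustration (start at 5 per day, add 5 each day, stop at the 50 per day ceiling suggested above), not a schedule published by Google. `dailySendCap` is a hypothetical helper.

```javascript
// Hypothetical linear warm-up: `start` emails on day 1, +`step` per day, capped at `max`.
function dailySendCap(day, { start = 5, step = 5, max = 50 } = {}) {
  if (!Number.isInteger(day) || day < 1) throw new RangeError("day must be an integer >= 1");
  return Math.min(max, start + (day - 1) * step);
}

// Caps for the first two weeks of a new account.
console.log(Array.from({ length: 14 }, (_, i) => dailySendCap(i + 1)).join(", "));
```

If bounces or spam complaints show up, holding the cap flat for a few days is safer than continuing the ramp.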

Implementation Gotchas

1. Handling Missing Data

When Apify finds no emails, the IF node prevents downstream errors:

// Condition in IF node
{{ $json.email }} exists
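
Without this check, the AI Agent receives undefined data and fails. If the actor returns no items at all, the Apify node outputs nothing and the IF node never runs; enabling the node's Always Output Data setting makes it emit an empty item so the IF node can still route it.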

2. URL Normalization

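The Function node (the Code node in current n8n versions) standardizes URLs because Apify expects clean input:

// Turn the comma-separated "URL Source" cell into the startUrls array the actor expects
const raw = String($input.first().json["URL Source"] ?? "");
const startUrls = raw
  .split(",")
  .map((u) => u.trim())
  .filter((u) => u !== "") // drop empty entries left by trailing commas
  .map((u) => ({ url: /^https?:\/\//i.test(u) ? u : `https://${u}` }));
return [{ json: { startUrls } }];

Handles: missing protocols, extra spaces, comma-separated lists, and empty entries. The output is a single item whose startUrls array matches the actor input in the Apify request example.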
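Because `$input` only exists inside n8n, the normalization logic is easier to test as a plain function. This sketch follows the same steps (split on commas, trim, drop empty entries, default to `https://`); `normalizeUrls` is a hypothetical name, and the case-insensitive `^https?://` check is a choice made here so that hostnames which merely start with "http" still get a protocol.

```javascript
// Hypothetical standalone version of the URL normalization, for testing outside n8n.
function normalizeUrls(cell) {
  return String(cell ?? "")
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u !== "") // trailing commas would otherwise yield "https://"
    .map((u) => (/^https?:\/\//i.test(u) ? u : `https://${u}`));
}

console.log(normalizeUrls(" example.com/article, https://blog.example.org ,"));
```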

3. API Cost Management

  • Apify: ~$0.03 per site scraped
  • OpenAI: ~$0.015 per email generated
  • Gmail: Free
  • Total per backlink: ~$0.045

For 100 backlinks: ~$4.50 in API costs. Budget accordingly.
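The budget math scales linearly with list size, but not every backlink reaches the OpenAI step. This sketch uses the per-unit estimates above as defaults (they are this article's estimates, not published prices) and adds a hypothetical `shareWithEmails` parameter for the fraction of sites where emails are found; `campaignCost` is a hypothetical helper.

```javascript
// Hypothetical budget estimate. Defaults are the article's per-unit estimates (USD).
// Every backlink is scraped, but only those with emails reach the OpenAI step.
function campaignCost(backlinks, { apifyPerSite = 0.03, openAiPerEmail = 0.015, shareWithEmails = 1 } = {}) {
  const scrape = backlinks * apifyPerSite;
  const generate = backlinks * shareWithEmails * openAiPerEmail;
  return Math.round((scrape + generate) * 100) / 100; // round to cents
}

console.log(campaignCost(100)); // every site yields an email
console.log(campaignCost(100, { shareWithEmails: 0.6 })); // 60% of sites yield an email
```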

4. Gmail Sending Limits

The workflow processes backlinks sequentially, but Gmail enforces daily quotas. For large lists:

// Add a Wait node after Gmail Send
// Wait: 2 minutes between emails
// Spreads 100 emails over ~3.3 hours (100 × 2 min = 200 min)
// To stay under ~50 emails/day, split larger lists across several days
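The Wait-node spacing and a daily cap can be planned together. This sketch assumes a fixed gap between sends and a per-day limit, both parameters; once a day's cap is reached, the next email moves to the same start time on the following day. `planSendTimes` is a hypothetical helper, and the 9:00 UTC start is an arbitrary example.

```javascript
// Hypothetical planner: spread `count` sends `gapMinutes` apart, at most `dailyCap` per day.
function planSendTimes(count, { start, gapMinutes = 2, dailyCap = 50 }) {
  const MS_PER_MIN = 60_000;
  const MS_PER_DAY = 24 * 60 * MS_PER_MIN;
  const times = [];
  for (let i = 0; i < count; i++) {
    const day = Math.floor(i / dailyCap); // which day this send falls on
    const slot = i % dailyCap; // position within that day
    times.push(new Date(start.getTime() + day * MS_PER_DAY + slot * gapMinutes * MS_PER_MIN));
  }
  return times;
}

const plan = planSendTimes(100, { start: new Date("2024-01-01T09:00:00Z") });
console.log(plan[0].toISOString(), plan[49].toISOString(), plan[50].toISOString());
```

With a 50/day cap and a 2-minute gap, each day's batch finishes in under two hours, leaving the rest of the day free of sends.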

5. Error Recovery

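If the workflow fails mid-execution:

  • n8n keeps the data of the nodes that completed in the failed execution
  • Retrying that execution from the Executions list continues from the failure point; starting a fresh run begins again at the first row
  • Track sent emails in Google Sheets to prevent duplicates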
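Tracking sends in the sheet only prevents duplicates if the workflow also filters on that record before doing any work. This sketch assumes the Status and Date Contacted columns from the sheet structure under Data Preparation, with "Pending" marking unsent rows; `pendingRows` and `markSent` are hypothetical helpers, and in n8n the write-back would be a Google Sheets update step after Gmail Send.

```javascript
// Hypothetical helpers: only process rows that have not been contacted yet.
function pendingRows(rows) {
  return rows.filter(
    (row) =>
      String(row["Status"] ?? "").trim().toLowerCase() === "pending" &&
      String(row["Date Contacted"] ?? "").trim() === "",
  );
}

// Values to write back after a successful send (date as YYYY-MM-DD, UTC).
function markSent(row, sentAt = new Date()) {
  return { ...row, Status: "Sent", "Date Contacted": sentAt.toISOString().slice(0, 10) };
}

const rows = [
  { "URL Source": "https://example.com/article", Status: "Pending", "Date Contacted": "" },
  { "URL Source": "https://example.org/post", Status: "Sent", "Date Contacted": "2024-01-01" },
];
console.log(pendingRows(rows).map((r) => r["URL Source"]));
```

Requiring both an unsent status and an empty date makes the filter tolerant of a row whose status was edited by hand.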

Prerequisites

Required Accounts:

  • n8n (Cloud or self-hosted)
  • Google account with Sheets and Gmail access
  • Apify
  • OpenAI

API Credentials Needed:

  • Google OAuth 2.0 (Sheets + Gmail)
  • Apify API key (Settings → Integrations)
  • OpenAI API key (API Keys section)

Estimated Costs:

  • n8n Cloud: $20/month (Starter plan) or self-hosted (free)
  • Apify: $0.03/backlink or free tier
  • OpenAI: $0.015/email
  • Gmail: Free

Data Preparation:

Google Sheet structure:

| URL Source                  | Target URL                    | Status      | Date Contacted |
|-----------------------------|-------------------------------|-------------|----------------|
| https://example.com/article | https://yoursite.com/guide    | Pending     |                |

Documentation Links:

Get the Complete Workflow Configuration

This tutorial covers the API integration architecture and implementation details. For the complete n8n workflow JSON file, Google Sheets template, AI prompt configuration, and video walkthrough showing the workflow in action, check out the full implementation guide.
