Backlink prospecting is a tab-hopping chore: dashboard → export CSV → eyeball → copy domains → switch to email. It's a filter-and-rank problem, which is exactly the kind of thing an agent should do for you. So here's how to do the whole thing from inside Claude Code (or Cursor / Cline / Zed / Windsurf) with one prompt, using a free MCP server.
No affiliation required to follow along — the data is the public Common Crawl webgraph, and the MCP wrapper is open source.
What we're building toward
By the end you'll be able to type this into your agent and get back a ranked, de-noised list:
"Find the sites linking to competitor-a.com, competitor-b.com and competitor-c.com but not to my-site.com, and draft a short outreach email to each."
The agent runs a competitor gap analysis, filters to the highest-value targets, and writes the emails — without you opening a single SEO dashboard.
Step 1: install the MCP server
The server is crawlgraph-mcp on npm. It's a thin TypeScript stdio wrapper over a backlink API; nothing to clone or build.
Add it to your MCP config (claude_desktop_config.json, or .mcp.json for Claude Code):
{
"mcpServers": {
"crawlgraph": {
"command": "npx",
"args": ["-y", "crawlgraph-mcp"],
"env": { "CRAWLGRAPH_API_KEY": "cg_live_..." }
}
}
}
Grab the cg_live_ key from your account page, restart the client, and you'll see four tools appear: backlinks, gap_analysis, gap_outreach_targets, and releases.
Step 2: sanity-check the data
Before trusting it for outreach, point it at a domain you know. In your agent:
"Use the backlinks tool on stripe.com, limit 5, sorted by authority."
You'll get the top referring domains plus the target's own authority score:
{
"domain": "stripe.com",
"total_linking_domains": 100000,
"cg_authority": 78,
"results": [
{ "linking_domain": "github.com", "num_hosts": 2120, "cg_authority": 96 },
{ "linking_domain": "news.ycombinator.com", "num_hosts": 880, "cg_authority": 94 }
]
}
That total_linking_domains count is the quick gut-check: compare it to whatever your current tool reports for the same domain. If it's in the same ballpark, the graph is complete enough to prospect from.
Step 3: the actual play — gap analysis
The primitive is gap_analysis: give it your domain plus 2-5 competitors, and it returns every domain linking to at least one competitor but not to you, each tagged with which competitors it links to (found_on).
The raw output is the material, not the answer — you don't want a 1,000-row dump, you want the few dozen worth emailing. That's what gap_outreach_targets does on top:
- keeps only domains in the gap that link to all your competitors (a site linking to all three covers your whole niche and just hasn't heard of you — the warmest possible target);
- strips platform/CDN noise (amazonaws, github.io, facebook, shorteners);
- ranks the survivors by authority.
So you skip straight to:
"Run gap_outreach_targets for my-site.com against competitor-a.com, competitor-b.com and competitor-c.com."
and get back something like:
{
"priority_targets": [
{ "linking_domain": "industry-roundup.com", "found_on": ["competitor-a.com","competitor-b.com","competitor-c.com"], "cg_authority": 73 },
{ "linking_domain": "niche-review.io", "found_on": ["competitor-a.com","competitor-b.com","competitor-c.com"], "cg_authority": 61 }
],
"secondary_targets": [ /* link to 2 of 3 */ ],
"platforms_filtered": 26
}
Step 4: let the agent write the outreach
This is where doing it in an agent beats a dashboard. The list is already in context, so:
"For each priority target, draft a 3-sentence outreach email: reference the kind of content they link to in my niche, and pitch my-site.com as a fit. Keep it specific, no templates."
You get first-draft emails per target in the same session. You still edit them in your voice — cold outreach that goes out unedited gets ignored — but the gather → filter → rank → draft chain that used to be an afternoon is now one conversation.
Why 2-3 competitors, not one
A site linking to one competitor might be a fluke or a paid placement. A site linking to three of your competitors is a publisher who covers your category. That overlap is the qualifier — it's the difference between a prospect list and a noise list.
Honest limitations
- Quarterly snapshot. Common Crawl publishes ~4×/year, so this is for one-off prospecting, not live link monitoring. For "what changed this week," use a continuous-crawl tool.
- No anchor text in the gap output (the webgraph is source→destination edges).
- Quotas: the free path covers light use; heavier use needs the lifetime tier (1,000 backlink calls + 50 gap jobs/month).
Why this pattern matters beyond backlinks
The interesting bit isn't the SEO — it's that a well-scoped MCP tool can encode a workflow, not just an API call. gap_outreach_targets doesn't mirror an endpoint; it does the filtering-and-ranking judgment you'd otherwise re-explain to the model every time. If you're building MCP servers, that's the lever: ship the composite tool that returns a decision-ready answer, not just the raw rows.
Server's MIT and on GitHub (npx -y crawlgraph-mcp). If you try the gap play, I'd genuinely like to hear whether the priority/secondary split matches what you'd have picked by hand.
Top comments (0)