DEV Community

liav maman

I Built a Python Tool to Automate Link-Building Research (Using the Google Custom Search API)

If you've ever done SEO link building manually, you know the drill: you sit there Googling variations of "submit a guest post" + niche or intitle:"resources" + keyword, opening 40 tabs, and copy-pasting URLs into a spreadsheet. It works, but it doesn't scale, and it's mind-numbing.

So instead of doing that every month for client sites at my agency, I built a small Python tool to do the first pass for me: the Monthly Link Opportunity Finder.

Here's how it works, what I learned building it, and the actual approach if you want to build something similar.

The Problem

Link building research usually comes down to running a bunch of search operator queries against a target niche, then manually eyeballing which results are actually worth pursuing (directories, resource pages, guest post opportunities, local business associations, etc.) versus which are junk.

That triage step is where all the time goes. I wanted a script that could:

  1. Run a batch of targeted search queries automatically
  2. Pull back clean, structured results (not scraped HTML soup)
  3. Filter out obvious noise
  4. Spit out a categorized list I could actually act on

Why the Google Custom Search API (and not scraping)

My first instinct was to just scrape Google search results directly. Don't do this — it's against Google's terms, you'll get IP-blocked fast, and the HTML structure changes constantly, which means your scraper breaks every few weeks.

The Google Custom Search JSON API solves this cleanly. You configure a Programmable Search Engine (even one set to search the entire web), get an API key, and query it like any normal REST API. It returns structured JSON — title, snippet, link, display link — with no parsing headaches.
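As a rough illustration of working with that JSON, the sketch below pulls those four fields out of a response. The `sample_response` dict is a hand-written stand-in rather than real API output, and `extract_results` is a hypothetical helper name; the `items`, `title`, `snippet`, `link`, and `displayLink` keys are the ones the API uses.

```python
def extract_results(response_json):
    """Flatten a Custom Search response into a list of small dicts."""
    results = []
    # "items" can be missing when a query has no results, so default to [].
    for item in response_json.get("items", []):
        results.append({
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
            "link": item.get("link", ""),
            "display_link": item.get("displayLink", ""),
        })
    return results

# Hand-written stand-in for an API response (not real output).
sample_response = {
    "items": [
        {
            "title": "Plumbing Resources",
            "snippet": "Submit your site to our resources page.",
            "link": "https://example.com/resources",
            "displayLink": "example.com",
        }
    ]
}

rows = extract_results(sample_response)
print(rows[0]["display_link"], rows[0]["link"])
```

Using `.get()` with defaults throughout means a result with a missing field degrades to an empty string instead of raising a `KeyError` halfway through a batch.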

The tradeoff: it's rate-limited and metered (100 free queries/day, then paid per 1,000 queries after that), so query efficiency matters. You can't just brute-force every possible search operator combination.
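To make the budgeting concrete, here is a back-of-the-envelope sketch. The niche, template, and page counts are made-up numbers for illustration, and the helper names are hypothetical; the only figure taken from above is the 100-query free daily tier. It assumes every page of every query counts as one request against the quota.

```python
import math

FREE_QUERIES_PER_DAY = 100  # free tier mentioned above

def requests_needed(num_niches, num_templates, pages_per_query=1):
    """Assumes each page of each query costs one API request."""
    return num_niches * num_templates * pages_per_query

def days_on_free_tier(total_requests, daily_quota=FREE_QUERIES_PER_DAY):
    """Days a run must be spread over to stay inside the free quota."""
    return math.ceil(total_requests / daily_quota)

# Illustrative numbers only: 8 client niches, 5 templates, 3 pages each.
total = requests_needed(num_niches=8, num_templates=5, pages_per_query=3)
print(total, days_on_free_tier(total))
```

With these example numbers the run needs 120 requests, so it would either spill into a second day or into paid usage, which is why trimming templates and page depth matters.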

The Core Approach

At a high level, the script does this:

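```python
import requests

API_KEY = "YOUR_API_KEY"
SEARCH_ENGINE_ID = "YOUR_CX_ID"

def search(query, start=1):
    """Run one Custom Search query and return the parsed JSON response."""
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": API_KEY,
        "cx": SEARCH_ENGINE_ID,
        "q": query,
        "start": start,  # 1-based index of the first result to return
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface quota and auth errors early
    return response.json()
```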
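The `start` parameter is what makes paging possible. Below is a minimal sketch of that, assuming 10 results per page (the API's per-request maximum) and a `fetch` callable that takes `(query, start)` like the search function above. `collect_items` and `fake_fetch` are hypothetical names; passing the fetcher in keeps the example runnable without an API key.

```python
RESULTS_PER_PAGE = 10  # the API returns at most 10 results per request

def collect_items(fetch, query, max_pages=2):
    """Page through results by advancing the 1-based `start` index."""
    items = []
    for page in range(max_pages):
        start = 1 + page * RESULTS_PER_PAGE  # 1, 11, 21, ...
        page_items = fetch(query, start).get("items", [])
        items.extend(page_items)
        if len(page_items) < RESULTS_PER_PAGE:
            break  # treat a short page as the last one
    return items

def fake_fetch(query, start=1):
    """Stand-in for the real search call: pretends there are 13 results."""
    links = [f"https://example.com/{i}" for i in range(1, 14)]
    return {"items": [{"link": link} for link in links[start - 1:start + 9]]}

items = collect_items(fake_fetch, '"plumbing" intitle:"resources"', max_pages=3)
print(len(items))
```

Every extra page is another request against the quota, so `max_pages` is worth keeping small.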


The interesting part isn't the API call itself — it's the query construction. I built out a bank of query templates broken into tiers based on link type, roughly:

```python
QUERY_TEMPLATES = [
    '"{niche}" intitle:"resources"',
    '"{niche}" "submit a guest post"',
    '"{niche}" "local business directory"',
    'inurl:directory "{niche}"',
    '"{niche}" "add your business"',
]

def build_queries(niche):
    """Fill the niche into every template, one query string per template."""
    return [t.format(niche=niche) for t in QUERY_TEMPLATES]
```


Each tier maps to a different outreach approach later — a resource page pitch is written completely differently than a directory submission.
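One way to keep the tier attached to each query, so it can drive the outreach approach later, is to key the templates by tier. The tier names and the `TIERED_TEMPLATES` and `build_tiered_queries` names below are assumptions for illustration; the grouping is a guess at how the listed templates might be tiered, not a description of the actual tool.

```python
# Hypothetical tier names; the templates are the ones listed above.
TIERED_TEMPLATES = {
    "resource_page": ['"{niche}" intitle:"resources"'],
    "guest_post": ['"{niche}" "submit a guest post"'],
    "directory": [
        '"{niche}" "local business directory"',
        'inurl:directory "{niche}"',
        '"{niche}" "add your business"',
    ],
}

def build_tiered_queries(niche):
    """Return (tier, query) pairs so each result can be traced to its tier."""
    return [
        (tier, template.format(niche=niche))
        for tier, templates in TIERED_TEMPLATES.items()
        for template in templates
    ]

for tier, query in build_tiered_queries("plumber"):
    print(f"{tier}: {query}")
```

Carrying the tier alongside the query means it can be written straight into the output later without re-deriving it from the query text.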

Filtering the Noise

Raw search results are still noisy — you'll get a mix of huge unrelated sites, dead pages, and results that technically match the query but aren't real opportunities. I added a lightweight scoring pass based on a few heuristics:

- **Domain-level filtering**: strip out major platforms you're never getting a link from anyway (social media profiles, marketplaces, etc.)
- **Snippet keyword matching**: does the snippet actually contain language suggesting submission/inclusion is open?
- **Duplicate domain collapsing**: if five URLs from the same domain show up, keep the most relevant one and drop the rest
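The first and third heuristics can be sketched roughly as follows. The blocklist entries, the function names, and the choice to use the score as the measure of "most relevant" are all assumptions made for this example.

```python
BLOCKED_DOMAINS = {"facebook.com", "linkedin.com", "amazon.com"}  # illustrative only

def normalize_domain(display_link):
    """Lowercase the host and drop a leading 'www.' so variants compare equal."""
    domain = display_link.lower()
    return domain[4:] if domain.startswith("www.") else domain

def is_blocked(domain):
    """True for a blocked domain or any of its subdomains."""
    return any(domain == b or domain.endswith("." + b) for b in BLOCKED_DOMAINS)

def filter_and_collapse(scored_items):
    """Drop blocked domains, then keep the highest-scoring item per domain.

    `scored_items` is a list of (score, item) pairs.
    """
    best = {}
    for score, item in scored_items:
        domain = normalize_domain(item.get("displayLink", ""))
        if not domain or is_blocked(domain):
            continue
        # Strict '>' keeps the earlier item when scores tie.
        if domain not in best or score > best[domain][0]:
            best[domain] = (score, item)
    return list(best.values())

sample = [
    (2, {"displayLink": "www.example.org", "link": "https://www.example.org/a"}),
    (5, {"displayLink": "example.org", "link": "https://example.org/resources"}),
    (3, {"displayLink": "m.facebook.com", "link": "https://m.facebook.com/page"}),
]
kept = filter_and_collapse(sample)
print(kept)
```

Matching on the domain suffix rather than the exact host is what catches subdomains such as `m.facebook.com`.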

```python
def score_result(item, niche_keywords):
    """Score one result item; higher means more worth a manual look."""
    # niche_keywords is accepted but not used by the heuristics shown here.
    score = 0
    snippet = item.get("snippet", "").lower()
    if any(k in snippet for k in ["submit", "add your", "directory", "resources"]):
        score += 2
    if item.get("displayLink", "").endswith((".edu", ".gov")):
        score += 3  # rare, high-value opportunities
    return score
```

Nothing groundbreaking — just enough logic to sort the list roughly by "worth a human's time" before it ever reaches a spreadsheet.

Output

The last step formats everything into a CSV with columns for URL, query tier, score, and a blank "status" column I fill in manually as I work through outreach. Simple, but it turns a few hours of manual searching into a five-minute script run plus a focused review pass.
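A minimal sketch of that output step, using the standard library's `csv` module. The column keys, the `write_csv` name, and the sample rows are assumptions for illustration; only the four columns themselves come from the description above.

```python
import csv
import io

FIELDNAMES = ["url", "tier", "score", "status"]

def write_csv(rows, out):
    """Write rows sorted by score (highest first) with an empty status column."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["score"], reverse=True):
        writer.writerow({**row, "status": ""})

# Made-up rows for demonstration.
rows = [
    {"url": "https://example.org/resources", "tier": "resource_page", "score": 2},
    {"url": "https://example.edu/links", "tier": "resource_page", "score": 5},
]

# StringIO keeps the demo in memory; for a real file, use
# open("opportunities.csv", "w", newline="") instead.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Sorting before writing puts the highest-scoring opportunities at the top of the sheet, where the manual review starts.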

What I'd Improve Next

A few things on my list:

- **Caching queries** so re-runs in the same month don't burn API quota on repeat searches
- **Domain authority lookups** via a third-party API to weight scoring more accurately instead of relying on heuristics alone
- **Scheduling** it to run automatically at the start of each month and email me the CSV
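The caching idea could look something like the sketch below: a JSON file keyed by month plus query, so a re-run in the same month reuses the stored response and a new month triggers a fresh search. Everything here (the file format, the key scheme, the function names) is an assumption, and `fake_fetch` stands in for the real API call so the example runs without a key.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def cache_key(query, today=None):
    """Key on month + query so a new month triggers a fresh search."""
    today = today or date.today()
    return f"{today:%Y-%m}|{query}"

def cached_search(query, fetch, cache_path, today=None):
    """Return a cached response if present; otherwise call fetch and store it."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    key = cache_key(query, today)
    if key not in cache:
        cache[key] = fetch(query)  # only this branch spends API quota
        path.write_text(json.dumps(cache))
    return cache[key]

# Demo with a counting stand-in for the API call.
calls = []

def fake_fetch(query):
    calls.append(query)
    return {"items": [{"link": "https://example.org"}]}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache.json"
    first = cached_search("plumber directory", fake_fetch, cache_file, date(2024, 5, 1))
    second = cached_search("plumber directory", fake_fetch, cache_file, date(2024, 5, 20))
    third = cached_search("plumber directory", fake_fetch, cache_file, date(2024, 6, 1))

print(len(calls))
```

In the demo, the two May lookups share one fetch and the June lookup triggers a second. This sketch caches one response per query, so paged results would need the start index added to the key.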

Takeaway

You don't need a fancy SaaS SEO tool to automate the tedious parts of link research. A few hundred lines of Python and the Google Custom Search API get you most of the way there, and you keep full control over the query logic and scoring instead of being stuck with a black-box "opportunity score."

If you're doing SEO work for local businesses or agency clients, this kind of small internal tooling adds up fast — it's the difference between spending an afternoon on manual research every month versus fifteen minutes reviewing a pre-filtered list.

---

*I write about the tools I build for [Chazak Digital](https://chazakdigital.com), a web design and SEO agency working with local businesses on Long Island. If you're building similar SEO tooling, I'd love to hear how you're approaching it.*