If you're running SEO for a content site, your data lives in three places: Google Search Console has your rankings and impressions, GA4 has your traffic and engagement, and a third-party tool like DataForSEO has competitor positions and keyword difficulty. Checking each dashboard separately wastes time and misses the connections between them.
I built a Python pipeline that pulls from all three APIs, merges the data into a single DataFrame, and scores every keyword by opportunity. This runs on Shatranj Live, a chess analytics platform where I publish 40+ SEO-optimized articles. Here's how it works.
Architecture Overview
┌──────────────────┐ ┌──────────────┐ ┌───────────────┐
│ Google Search │ │ GA4 │ │ DataForSEO │
│ Console │ │ (via API) │ │ (via API) │
│ │ │ │ │ │
│ - Rankings │ │ - Sessions │ │ - SERP data │
│ - Impressions │ │ - Bounce │ │ - KD scores │
│ - CTR │ │ - Duration │ │ - Competitor │
│ - Clicks │ │ - Events │ │ positions │
└────────┬─────────┘ └──────┬───────┘ └───────┬───────┘
│ │ │
└─────────┬───────────┘ │
│ │
┌──────▼──────┐ │
│ Merge on │◄─────────────────────────┘
│ keyword / │
│ URL │
└──────┬──────┘
│
┌──────▼──────┐
│ Opportunity │
│ Scorer │
└──────┬──────┘
│
┌──────▼──────┐
│ Prioritized│
│ Actions │
└─────────────┘
Setting Up the Three APIs
Google Search Console
GSC gives you what no other tool can: your actual ranking data straight from Google.
from datetime import datetime, timedelta

from google.oauth2 import service_account
from googleapiclient.discovery import build


class GoogleSearchConsole:
    def __init__(self, credentials_path: str, site_url: str):
        creds = service_account.Credentials.from_service_account_file(
            credentials_path,
            scopes=['https://www.googleapis.com/auth/webmasters.readonly']
        )
        self.service = build('searchconsole', 'v1', credentials=creds)
        self.site_url = site_url

    def get_keyword_positions(self, days: int = 90, limit: int = 500) -> list:
        end = datetime.now().strftime('%Y-%m-%d')
        start = (datetime.now() - timedelta(days=days)).strftime('%Y-%m-%d')
        response = self.service.searchanalytics().query(
            siteUrl=self.site_url,
            body={
                'startDate': start,
                'endDate': end,
                'dimensions': ['query', 'page'],
                'rowLimit': limit,
                'dimensionFilterGroups': [{
                    'filters': [{
                        'dimension': 'country',
                        'expression': 'ind'  # ISO code; drop this filter to get all countries
                    }]
                }]
            }
        ).execute()
        rows = []
        for row in response.get('rows', []):
            rows.append({
                'query': row['keys'][0],
                'page': row['keys'][1],
                'clicks': row['clicks'],
                'impressions': row['impressions'],
                'ctr': row['ctr'],
                'position': row['position'],
            })
        return rows
The key fields: query (what users searched), position (your average rank), ctr (click-through rate), and impressions (how often you appeared).
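Note that with dimensions ['query', 'page'], the same query appears once per ranking URL. If you want one row per query, clicks and impressions sum across pages, while position should be impression-weighted. A sketch with made-up rows shaped like get_keyword_positions() output:

```python
import pandas as pd

# Illustrative rows; in practice these come from get_keyword_positions().
rows = [
    {'query': 'fide rating system', 'page': 'https://shatranj.live/fide-rating-system/',
     'clicks': 150, 'impressions': 6000, 'ctr': 0.025, 'position': 5.8},
    {'query': 'fide rating system', 'page': 'https://shatranj.live/chess-ratings/',
     'clicks': 39, 'impressions': 2400, 'ctr': 0.016, 'position': 9.4},
]
df = pd.DataFrame(rows)

# Clicks and impressions sum; position is weighted by impressions.
df['pos_x_impr'] = df['position'] * df['impressions']
per_query = df.groupby('query', as_index=False).agg(
    clicks=('clicks', 'sum'),
    impressions=('impressions', 'sum'),
    pos_x_impr=('pos_x_impr', 'sum'),
)
per_query['position'] = per_query['pos_x_impr'] / per_query['impressions']
print(per_query[['query', 'clicks', 'impressions', 'position']])
```

A plain mean of positions would overweight low-traffic pages; weighting by impressions matches how GSC itself averages.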
GA4
GA4 tells you what happens after the click. I use the GA4 Data API (not the old Universal Analytics API):
from datetime import datetime, timedelta

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    RunReportRequest, DateRange, Dimension, Metric
)


class GoogleAnalytics:
    def __init__(self, credentials_path: str, property_id: str):
        self.client = BetaAnalyticsDataClient.from_service_account_json(
            credentials_path
        )
        self.property_id = f'properties/{property_id}'

    def get_page_metrics(self, days: int = 90) -> list:
        end = datetime.now().strftime('%Y-%m-%d')
        start = (datetime.now() - timedelta(days=days)).strftime('%Y-%m-%d')
        request = RunReportRequest(
            property=self.property_id,
            date_ranges=[DateRange(start_date=start, end_date=end)],
            dimensions=[Dimension(name='pagePath')],
            metrics=[
                Metric(name='sessions'),
                Metric(name='averageSessionDuration'),
                Metric(name='bounceRate'),
                Metric(name='engagedSessions'),
                Metric(name='screenPageViews'),
            ],
            limit=500,
        )
        response = self.client.run_report(request)
        rows = []
        for row in response.rows:
            rows.append({
                'page': row.dimension_values[0].value,
                'sessions': int(row.metric_values[0].value),
                'avg_duration': float(row.metric_values[1].value),
                'bounce_rate': float(row.metric_values[2].value),
                'engaged_sessions': int(row.metric_values[3].value),
                'pageviews': int(row.metric_values[4].value),
            })
        return rows
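The bounce-rate field is what I lean on most. As a sketch (with invented numbers), flagging pages whose engagement suggests an intent mismatch is a one-liner over get_page_metrics() output; the 70% threshold is my choice, not a GA4 default:

```python
# Rows shaped like get_page_metrics() output; the values are invented.
rows = [
    {'page': '/fide-rating-system/', 'sessions': 420, 'avg_duration': 145.2,
     'bounce_rate': 0.38, 'engaged_sessions': 260, 'pageviews': 510},
    {'page': '/elo-rating-chess/', 'sessions': 180, 'avg_duration': 22.5,
     'bounce_rate': 0.85, 'engaged_sessions': 27, 'pageviews': 195},
]

# GA4 reports bounce_rate as a fraction; flag anything above 70%.
problem_pages = [r['page'] for r in rows if r['bounce_rate'] > 0.7]
print(problem_pages)  # ['/elo-rating-chess/']
```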
DataForSEO
DataForSEO fills the gap: keyword difficulty, search volume, and competitor SERP positions. Their API is REST-based and straightforward:
import base64

import requests


class DataForSEO:
    def __init__(self, login: str, password: str):
        cred = base64.b64encode(f'{login}:{password}'.encode()).decode()
        self.headers = {
            'Authorization': f'Basic {cred}',
            'Content-Type': 'application/json'
        }
        self.base_url = 'https://api.dataforseo.com/v3'

    def get_keyword_data(self, keywords: list, location: int = 2356) -> list:
        """Get search volume and difficulty for keywords.

        location 2356 = India, 2840 = United States
        """
        payload = [{
            'keywords': keywords[:700],  # API limit per request
            'location_code': location,
            'language_code': 'en',
        }]
        resp = requests.post(
            f'{self.base_url}/keywords_data/google_ads/search_volume/live',
            headers=self.headers,
            json=payload
        )
        resp.raise_for_status()
        results = []
        for task in resp.json().get('tasks', []):
            for item in task.get('result', []):
                results.append({
                    'keyword': item['keyword'],
                    'volume': item.get('search_volume', 0),
                    'competition': item.get('competition', 0),
                    'cpc': item.get('cpc', 0),
                    'kd': item.get('keyword_difficulty', 0),
                })
        return results

    def get_serp_competitors(self, keyword: str, location: int = 2356) -> list:
        """Get top 10 SERP results for a keyword."""
        payload = [{
            'keyword': keyword,
            'location_code': location,
            'language_code': 'en',
            'depth': 10,
        }]
        resp = requests.post(
            f'{self.base_url}/serp/google/organic/live/regular',
            headers=self.headers,
            json=payload
        )
        resp.raise_for_status()
        competitors = []
        for task in resp.json().get('tasks', []):
            for result in task.get('result', []):
                for item in result.get('items', []):
                    if item['type'] == 'organic':
                        competitors.append({
                            'position': item['rank_absolute'],
                            'url': item['url'],
                            'domain': item['domain'],
                            'title': item['title'],
                        })
        return competitors
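One caveat: the keywords[:700] slice silently drops anything past the per-request cap. A small chunking helper (assuming the same 700-keyword limit as above) lets you send every keyword across multiple requests instead:

```python
def chunked(items: list, size: int = 700) -> list:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Each batch would become one get_keyword_data() call.
batches = chunked([f'keyword-{i}' for i in range(1500)])
print([len(b) for b in batches])  # [700, 700, 100]
```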
Merging the Data
This is where it gets useful. Each API uses different identifiers: GSC uses query + page URL, GA4 uses page path, DataForSEO uses keyword. The merge logic normalizes these:
import pandas as pd


class DataAggregator:
    def __init__(self, gsc, ga4, dataforseo):
        self.gsc = gsc
        self.ga4 = ga4
        self.dataforseo = dataforseo

    def build_unified_report(self, days: int = 90) -> pd.DataFrame:
        # Pull from all three sources
        gsc_data = pd.DataFrame(self.gsc.get_keyword_positions(days=days))
        ga4_data = pd.DataFrame(self.ga4.get_page_metrics(days=days))

        # Normalize URLs for joining: GSC returns full URLs, GA4 returns paths.
        # 'https://example.com/foo/' -> '/foo/'; a bare domain maps to '/'.
        gsc_data['page_path'] = gsc_data['page'].apply(
            lambda u: '/' + u.split('/', 3)[3] if u.count('/') >= 3 else '/'
        )
        ga4_data['page_path'] = ga4_data['page']

        # Merge GSC + GA4 on page path
        merged = gsc_data.merge(
            ga4_data,
            on='page_path',
            how='left',
            suffixes=('_gsc', '_ga4')
        )

        # Enrich top keywords with DataForSEO data
        top_keywords = merged.nlargest(100, 'impressions')['query'].unique().tolist()
        kd_data = pd.DataFrame(
            self.dataforseo.get_keyword_data(top_keywords)
        )

        # Merge keyword difficulty
        final = merged.merge(
            kd_data,
            left_on='query',
            right_on='keyword',
            how='left'
        )
        return final
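One gotcha with the two left joins: keywords that GA4 or DataForSEO didn't match come back with NaN in their columns, and dict-style row.get('volume', 0) in a scorer won't substitute its default for a value that is present but NaN. Filling defaults right after the merge avoids NaN scores. A sketch with made-up rows:

```python
import pandas as pd

# Two rows shaped like build_unified_report() output; the second keyword
# got no DataForSEO match, so its enrichment columns are NaN.
final = pd.DataFrame([
    {'query': 'fide rating system', 'volume': 1900.0, 'kd': 52.0},
    {'query': 'shatranj rules', 'volume': float('nan'), 'kd': float('nan')},
])

# Replace NaN with the same fallbacks the scorer assumes.
final = final.fillna({'volume': 0, 'kd': 50})
print(final.to_dict('records'))
```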
The Opportunity Scorer
Raw merged data is useful, but ranking keywords by opportunity is where the pipeline pays for itself. I score each keyword on 8 weighted factors; the function below computes four of them (volume, position, CTR gap, competition), and the remaining four (intent, cluster, freshness, trend) fall back to a neutral 50 in the weighted sum until you wire them up:
def score_opportunity(row: dict) -> float:
    """Score a keyword opportunity from 0-100."""
    weights = {
        'volume': 0.25,
        'position': 0.20,
        'intent': 0.20,
        'competition': 0.15,
        'cluster': 0.10,
        'ctr_gap': 0.05,
        'freshness': 0.05,
        'trend': 0.05,
    }
    scores = {}

    # Volume score: higher volume = higher score
    vol = row.get('volume', 0)
    scores['volume'] = min(vol / 5000, 1.0) * 100

    # Position score: positions 4-15 are the sweet spot
    # (close enough to reach top 3, but not already there)
    pos = row.get('position', 100)
    if 4 <= pos <= 10:
        scores['position'] = 90
    elif 11 <= pos <= 15:
        scores['position'] = 70
    elif 1 <= pos <= 3:
        scores['position'] = 30  # already ranking well
    else:
        scores['position'] = max(0, 50 - (pos - 15) * 2)

    # CTR gap: expected CTR for position vs actual CTR
    expected_ctr = {1: 0.28, 2: 0.15, 3: 0.11, 4: 0.08,
                    5: 0.06, 6: 0.05, 7: 0.04, 8: 0.03}
    exp = expected_ctr.get(int(pos), 0.02)
    actual = row.get('ctr', 0)
    scores['ctr_gap'] = min((exp - actual) / max(exp, 0.01), 1.0) * 100

    # Competition: lower KD = higher opportunity
    kd = row.get('kd', 50)
    scores['competition'] = max(0, 100 - kd)

    # Weighted total; uncomputed factors (intent, cluster, freshness,
    # trend) default to a neutral 50
    total = sum(scores.get(k, 50) * w for k, w in weights.items())
    return round(total, 1)
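To make the weighting concrete, here is the arithmetic for a hypothetical keyword with volume 2400, position 4.2, CTR 0.05, and KD 45 (values chosen for illustration, mirroring the formulas above):

```python
weights = {'volume': 0.25, 'position': 0.20, 'intent': 0.20,
           'competition': 0.15, 'cluster': 0.10, 'ctr_gap': 0.05,
           'freshness': 0.05, 'trend': 0.05}

scores = {
    'volume': min(2400 / 5000, 1.0) * 100,            # 48.0
    'position': 90,                                   # 4.2 falls in the 4-10 band
    'ctr_gap': min((0.08 - 0.05) / 0.08, 1.0) * 100,  # 37.5: actual CTR lags expected
    'competition': 100 - 45,                          # 55.0
}

# Factors not computed here (intent, cluster, freshness, trend) default to 50.
total = sum(scores.get(k, 50) * w for k, w in weights.items())
print(round(total, 1))  # 60.1
```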
Putting It Together
Here's the full script I run weekly:
#!/usr/bin/env python3
"""Weekly SEO opportunity report."""
import os

from dotenv import load_dotenv

from data_aggregator import DataAggregator
from dataforseo import DataForSEO
from google_analytics import GoogleAnalytics
from google_search_console import GoogleSearchConsole
from opportunity_scorer import score_opportunity  # wherever score_opportunity lives

load_dotenv('config/.env')

gsc = GoogleSearchConsole(
    credentials_path='credentials/ga4-credentials.json',
    site_url='sc-domain:shatranj.live'
)
ga4 = GoogleAnalytics(
    credentials_path='credentials/ga4-credentials.json',
    property_id=os.environ['GA4_PROPERTY_ID']
)
dforseo = DataForSEO(
    login=os.environ['DATAFORSEO_LOGIN'],
    password=os.environ['DATAFORSEO_PASSWORD']
)

# Build unified report
aggregator = DataAggregator(gsc, ga4, dforseo)
report = aggregator.build_unified_report(days=90)

# Score opportunities
report['opportunity_score'] = report.apply(score_opportunity, axis=1)

# Top 20 opportunities
top = report.nlargest(20, 'opportunity_score')
print(top[['query', 'position', 'clicks', 'volume', 'kd', 'opportunity_score']])
Sample output from a recent run on Shatranj Live:
query                         position  clicks  volume  kd  opportunity_score
candidates tournament 2026         4.2     312    2400  45               82.3
fide rating system                 6.1     189    1900  52               78.6
chess rating categories            8.3      94    1600  38               76.1
how to get fide rating            11.2      67    1100  28               74.8
elo rating chess                  12.4      45    3200  61               71.2
The "sweet spot" keywords are positions 4-15 with decent volume and low-to-medium KD. These are the ones where a content refresh or new article can push you into the top 3.
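As a quick filter, the sweet spot can be pulled straight out of the report DataFrame. The rows below are the sample output above; the volume and KD cutoffs are illustrative thresholds I use, not part of the scorer:

```python
import pandas as pd

# The rows from the sample report above.
report = pd.DataFrame([
    {'query': 'candidates tournament 2026', 'position': 4.2, 'volume': 2400, 'kd': 45},
    {'query': 'fide rating system', 'position': 6.1, 'volume': 1900, 'kd': 52},
    {'query': 'chess rating categories', 'position': 8.3, 'volume': 1600, 'kd': 38},
    {'query': 'how to get fide rating', 'position': 11.2, 'volume': 1100, 'kd': 28},
    {'query': 'elo rating chess', 'position': 12.4, 'volume': 3200, 'kd': 61},
])

# Positions 4-15 with decent volume and low-to-medium KD.
sweet_spot = report[
    report['position'].between(4, 15)
    & (report['volume'] >= 500)
    & (report['kd'] <= 55)
]
print(sweet_spot['query'].tolist())
```

Only "elo rating chess" drops out here: its KD of 61 means outranking the incumbents would cost more than refreshing the other four pages.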
What I Learned Running This
GSC is the ground truth. DataForSEO's volume estimates are directional, not precise. GSC tells you exactly how many impressions and clicks you actually got. Always prioritize GSC data when the two conflict.
GA4's engagement metrics catch content problems. If a page ranks well (GSC position 5) but has an 85% bounce rate (GA4), the content isn't matching search intent. That's a rewrite opportunity, not a link-building one.
DataForSEO's value is competitor intelligence. Knowing that position 1 for "candidates tournament 2026" is held by Chess.com with a 5,000-word guide tells you what you need to beat. Pull their SERP data, analyze the gap, and write accordingly.
Run it weekly, not daily. Rankings fluctuate daily. Weekly aggregation smooths the noise and shows real trends. I run this every Monday morning and feed the output into my content planning for the week.
Setup Checklist
- GSC: Enable the Search Console API in Google Cloud Console, create a service account, and add it as a user in GSC
- GA4: Enable the Analytics Data API, use the same service account, and grant it Viewer access to your GA4 property
- DataForSEO: Sign up at dataforseo.com and get API credentials (they have a pay-as-you-go model)
- Dependencies: pip install google-api-python-client google-analytics-data pandas python-dotenv requests
- Credentials: Store them in a .env file and never commit it to git
The full pipeline runs in about 30 seconds for 500 keywords. The insights it surfaces in that time would take hours of manual dashboard-hopping.
I use this pipeline for Shatranj Live, where we publish SEO-optimized chess content. Track live Candidates Tournament 2026 standings and explore player ratings.