Most SEO monitoring tools give you a snapshot: today's clicks, today's issues, today's recommendations. You fix something, come back tomorrow, and the tool has no idea what you did or whether it helped.
I wanted something smarter: a system that remembers your site's history, correlates code changes with ranking shifts, and produces AI-generated insights that get better every day. Here's what I built and how.
The Problem With Basic SEO Monitoring
A typical monitoring setup looks like this:
- Fetch Google Search Console data
- Run a Lighthouse audit
- Send a Slack message with today's numbers
That's fine. But it can't answer questions like:
- "Did that metadata fix I deployed 10 days ago actually improve rankings?"
- "This recommendation has been flagged for 15 days. Why hasn't it been fixed?"
- "Clicks dropped this week. Was it a code change or an algorithm shift?"
To answer those questions, you need memory. That's where Cognee comes in.
What Is Cognee?
Cognee is a knowledge graph SDK. Instead of storing data as flat rows in a database, it extracts entities and relationships and stores them as nodes and edges in a graph (Neo4j) with vector embeddings in a vector database (ChromaDB).
Think of it like this: a normal database stores "clicks = 262 on April 8". A knowledge graph stores "keyword 'vibe trading' ranked at position 1.78 on April 8, which is 12 spots better than March 25, and that improvement happened 3 days after a metadata fix was deployed".
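To make the distinction concrete, here is a toy sketch in Python. The node names, relation labels, and the `hops` helper are all illustrative stand-ins for the Neo4j nodes and edges Cognee actually creates, not its real data model:

```python
from datetime import date

# A flat metrics row: one fact, no context.
row = {"date": date(2026, 4, 8), "clicks": 262}

# The same day expressed as graph triples (subject, relation, object).
triples = [
    ("keyword:vibe trading", "RANKED_AT",
     {"position": 1.78, "date": date(2026, 4, 8)}),
    ("keyword:vibe trading", "IMPROVED_FROM",
     {"position": 13.78, "date": date(2026, 3, 25)}),
    ("codechange:metadata fix", "DEPLOYED_ON",
     {"date": date(2026, 4, 5)}),
]

def hops(triples, start):
    """Everything reachable from a node in one hop — the kind of
    traversal a flat metrics row simply can't support."""
    return [(rel, obj) for subj, rel, obj in triples if subj == start]

facts = hops(triples, "keyword:vibe trading")
```

The flat row can only ever answer "how many clicks on April 8?"; the triples let you walk from a keyword to its history to the deploy that preceded the change.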
The difference matters when you want AI to reason across weeks of history, not just today.
System Architecture
The full pipeline runs daily via GitHub Actions:
PHASE 1 — Parallel data collection
├── Lighthouse audit (performance, SEO scores)
├── Broken links check
├── Meta tags validation
├── Core Web Vitals (via PageSpeed API)
└── Google Search Console (clicks, CTR, position, queries)
PHASE 2 — Main analysis job
├── git-change-detector.js → scans commits, classifies SEO-relevant changes
├── cognee_ingest.py → writes today's data to Neo4j + ChromaDB
├── cognee-store-updater.js → updates 30-day rolling JSON snapshot
├── audit-scraper.js → fetches live pages, scores SEO/GEO/AEO signals
├── audit-ingest.py → stores audit scores in the knowledge graph
├── cognee-analyzer.js → builds enriched AI context, calls Azure OpenAI
├── send-ai-slack.js → posts daily report to Slack
└── cognee-blob-sync.js → backs up knowledge graph to Azure Blob Storage
PHASE 3 — Weekly (Sundays)
└── competitor-monitor.js → fetches competitor pages, scores them, posts comparison
Why Cognee? The Knowledge Graph Advantage
Every day, cognee_ingest.py builds a structured document containing today's GSC metrics, top queries, AI recommendations, and recent git commits. Azure OpenAI reads this and extracts entities (keywords, positions, dates, code changes), which Cognee writes to Neo4j as connected nodes. The graph starts to grow.
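A minimal sketch of what that daily document might look like (the field names and layout here are my guesses at the shape; the real cognee_ingest.py may format things differently):

```python
def build_daily_document(gsc: dict, recommendations: list[str],
                         commits: list[dict]) -> str:
    """Assemble today's GSC metrics, open AI recommendations, and recent
    SEO-relevant commits into one text document for ingestion."""
    lines = [
        f"Date: {gsc['date']}",
        f"Clicks: {gsc['clicks']}, CTR: {gsc['ctr']:.2%}, "
        f"Avg position: {gsc['position']:.2f}",
        "Top queries: " + ", ".join(
            f"{q['query']} (pos {q['position']})" for q in gsc["queries"]),
        "Open recommendations: " + "; ".join(recommendations),
        "Recent SEO-relevant commits: " + "; ".join(
            f"{c['date']} {c['message']}" for c in commits),
    ]
    return "\n".join(lines)

doc = build_daily_document(
    {"date": "2026-04-08", "clicks": 262, "ctr": 0.1894, "position": 4.20,
     "queries": [{"query": "vibe trading", "position": 1.78}]},
    ["add FAQ schema"],
    [{"date": "2026-04-05", "message": "fix: metadata overlap"}],
)
```

This plain-text document is what gets handed to Cognee's ingestion pipeline, which does the entity extraction and graph writes.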
After 30 days, the graph contains nodes like:
(Keyword: "platform name") —[RANKED_AT]→ (Position: 1.78, date: April 8)
(CodeChange: "metadata fix") —[HAPPENED_BEFORE]→ (MetricSnapshot: April 5)
(Recommendation: "add FAQ schema") —[FLAGGED_ON]→ (Date: April 1)
(Recommendation: "add FAQ schema") —[FLAGGED_ON]→ (Date: April 2)
... (flagged 12 days in a row)
Now when the AI runs its daily analysis, it doesn't just see today's data. It sees patterns:
- Keyword velocity: which keywords improved or dropped more than 5 positions in 14 days
- Stuck recommendations: same issue flagged 3+ days in a row, still unactioned
- Code change impact: did clicks or position change after a specific deploy?
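The keyword-velocity check above is simple enough to sketch directly. This is my own reimplementation of the idea, assuming position history is available per keyword (the production analyzer reads it out of the graph instead):

```python
from datetime import date, timedelta

def keyword_velocity(history: dict[str, list[tuple[date, float]]],
                     window_days: int = 14,
                     threshold: float = 5.0) -> dict[str, float]:
    """Flag keywords whose average position moved more than `threshold`
    spots across the last `window_days`. Positive delta = improvement,
    since a lower position number is better."""
    movers = {}
    for kw, points in history.items():
        points = sorted(points)
        cutoff = points[-1][0] - timedelta(days=window_days)
        window = [pos for d, pos in points if d >= cutoff]
        if len(window) < 2:
            continue
        delta = window[0] - window[-1]  # earliest minus latest position
        if abs(delta) > threshold:
            movers[kw] = round(delta, 2)
    return movers

movers = keyword_velocity({
    "vibe trading": [(date(2026, 3, 25), 13.78), (date(2026, 4, 8), 1.78)],
    "stable term":  [(date(2026, 3, 25), 3.10), (date(2026, 4, 8), 3.40)],
})
# → {"vibe trading": 12.0}
```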
The Slack report reflects this. Instead of "your CTR is 18.94%", it says:
"Your site has more than doubled daily clicks over the past month (106% growth), driven by a metadata fix on March 26 and header overlap fixes on March 28. Short-term momentum is slowing (7-day clicks are -3%), suggesting you should now expand content around fast-moving branded keywords."
That's a different class of insight.
The Audit Scraper: SEO/GEO/AEO Scoring
Beyond GSC data, audit-scraper.js fetches your actual pages daily and scores them across three dimensions:
- SEO: classic signals (title tag, meta description, H1, canonical, OG tags, schema markup, JS-gated content detection)
- GEO (Generative Engine Optimization): how well AI search engines like Perplexity or ChatGPT Search can read and cite your content, based on structured data presence, content density, and crawlability
- AEO (Answer Engine Optimization): featured snippet and voice search readiness, based on FAQ schema, article schema, H2 density, and word count
Each page gets a score out of 10. The system flags critical issues (JS-gated content = crawlers see a blank page, missing H1 = no primary ranking signal) and sends a separate Slack message with the audit digest.
🔴 SEO Audit — 2026-04-08
Scores: SEO 9/10 | GEO 9/10 | AEO 3/10 | Combined 21/30
Critical Issues:
🚨 missing_h1 on Pricing — Missing primary ranking signal
🚨 js_gated_content on Pricing — Crawlers see blank page
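The scoring itself is just weighted signal checks plus critical-issue flags. Here's a toy version in Python; the weights and the score penalty are illustrative, not the real audit-scraper.js values:

```python
def score_page(signals: dict[str, bool]) -> tuple[int, list[str]]:
    """Score a page out of 10 from extracted on-page signals and
    collect critical issues. Weights are made up for illustration."""
    weights = {
        "title": 2, "meta_description": 2, "h1": 2,
        "canonical": 1, "og_tags": 1, "schema": 2,
    }
    score = sum(w for sig, w in weights.items() if signals.get(sig))
    critical = []
    if not signals.get("h1"):
        critical.append("missing_h1")        # no primary ranking signal
    if signals.get("js_gated"):
        critical.append("js_gated_content")  # crawlers see a blank page
        score = max(score - 3, 0)            # heavy illustrative penalty
    return min(score, 10), critical

score, issues = score_page(
    {"title": True, "meta_description": True, "h1": False,
     "canonical": True, "og_tags": True, "schema": True, "js_gated": True}
)
```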
Code Change Impact Tracking
This is the part I'm most proud of. git-change-detector.js scans git commits and classifies them: it looks for commit messages mentioning SEO-related terms (metadata, schema, redirect, canonical, performance, etc.) and logs them with their date.
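The classifier boils down to a keyword match over commit messages. A Python sketch of the same idea (the actual detector is Node.js, and its term list is longer than this one):

```python
import re

# Illustrative subset of SEO-related terms; the real list is broader.
SEO_TERMS = re.compile(
    r"\b(metadata|schema|redirect|canonical|sitemap|robots|"
    r"performance|seo|title)\b",
    re.IGNORECASE,
)

def classify_commits(commits: list[dict]) -> list[dict]:
    """Return only commits whose message mentions an SEO-related term,
    tagged as relevant so downstream impact tracking can pick them up."""
    return [
        {**c, "seo_relevant": True}
        for c in commits
        if SEO_TERMS.search(c["message"])
    ]

hits = classify_commits([
    {"sha": "a1b2c3", "date": "2026-03-26",
     "message": "fix: metadata overlap on blog pages"},
    {"sha": "d4e5f6", "date": "2026-03-27",
     "message": "chore: bump dependencies"},
])
```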
change-impact-tracker.js then cross-references those commits with GSC metrics. For each logged change, it compares the 7-day window before vs after deployment:
✅ Migrate to new partition keys (2026-03-30)
→ Position improved 6.7 spots (17.39 → 10.71)
✅ API pagination fix (2026-03-18)
→ Clicks grew 81% (127 → 229.7/day)
⏳ content deploy (2026-03-13)
→ Monitoring... (not enough post-deploy data yet)
This surfaces directly in the Slack report under "Code Change Tracker". Over time, it tells you which types of changes actually move the needle.
Storage Architecture
Three layers, each with a different purpose:
| Layer | What it stores | Why |
|---|---|---|
| Neo4j (Azure VM) | Graph nodes + edges — keywords, positions, code changes, relationships | Multi-hop reasoning: "which keyword improved after which deploy?" |
| ChromaDB (Azure VM) | Vector embeddings of all entities | Semantic search across history |
| cognee-knowledge.json (Azure Blob) | 30-day rolling JSON snapshots | Fast daily reads without querying the graph every run |
The JSON file is the workhorse for the daily Slack report. Neo4j and ChromaDB are queried for deeper pattern analysis and become increasingly valuable as history accumulates.
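The rolling snapshot logic is straightforward enough to sketch. This is a Python approximation of what cognee-store-updater.js does (the real file schema is richer than a date-plus-clicks dict):

```python
import json

def update_rolling_snapshot(path: str, today: dict,
                            keep_days: int = 30) -> list[dict]:
    """Append today's snapshot to the rolling JSON file, dedupe on date
    so re-runs are idempotent, and keep only the last `keep_days` days."""
    try:
        with open(path) as f:
            snapshots = json.load(f)
    except FileNotFoundError:
        snapshots = []
    snapshots = [s for s in snapshots if s["date"] != today["date"]]
    snapshots.append(today)
    snapshots.sort(key=lambda s: s["date"])  # ISO dates sort correctly
    snapshots = snapshots[-keep_days:]
    with open(path, "w") as f:
        json.dump(snapshots, f, indent=2)
    return snapshots

# Demo: 31 daily runs leave exactly 30 entries in the file.
import os, tempfile
from datetime import date, timedelta
path = os.path.join(tempfile.mkdtemp(), "cognee-knowledge.json")
for i in range(31):
    day = (date(2026, 3, 1) + timedelta(days=i)).isoformat()
    snaps = update_rolling_snapshot(path, {"date": day, "clicks": 100 + i})
```

Reading one small JSON blob per run keeps the daily report fast; the graph databases are only hit for the heavier pattern queries.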
Key Things I Learned
Cognee initializes config at import time. If you set environment variables after import cognee, they're ignored. You have to call cognee.config.set_graph_db_config() directly after import to update the live config object. This cost me several hours.
The mistralai import conflict. Cognee's dependency instructor==1.14.x tries to import Mistral from mistralai at import time regardless of whether you use it. Fix: inject a fake mistralai module into sys.modules before importing Cognee.
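Both workarounds together look roughly like this. The stub-injection trick is exactly what I described; the config dict keys are from memory and may not match your Cognee version, so treat them as placeholders and check the docs:

```python
import sys
import types

# Workaround 2 (must run FIRST): stub out mistralai before Cognee is
# imported, so instructor's eager `from mistralai import Mistral`
# doesn't blow up when the package isn't installed.
fake = types.ModuleType("mistralai")
fake.Mistral = object  # instructor only needs the name to resolve
sys.modules["mistralai"] = fake

# Workaround 1: set DB config AFTER import via the live config object;
# env vars exported at this point are too late and get ignored.
try:
    import cognee
except ImportError:
    cognee = None  # keeps this sketch runnable without Cognee installed

if cognee is not None:
    cognee.config.set_graph_db_config({
        # Key names are placeholders — verify against your Cognee version.
        "graph_database_provider": "neo4j",
        "graph_database_url": "bolt://your-vm-host:7687",
        "graph_database_username": "neo4j",
        "graph_database_password": "…from your secret store…",
    })
```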
JS-gated content is invisible to the audit scraper. If your page renders entirely client-side, the raw HTML fetch returns fewer than 80 words. The scraper flags this as js_gated_content — which is actually useful because it means Google probably can't index it either.
The knowledge graph gets smarter non-linearly. Day 1 the system is just a fancier GSC dashboard. Day 7 you start seeing real code change verdicts. Day 30 the AI recommendations start referencing patterns that span weeks. The value compounds.
Tech Stack
- GitHub Actions — pipeline orchestration, daily cron
- Node.js — audit scraper, Cognee analyzer, Slack formatting, git change detection
- Python — Cognee SDK ingestion (cognee_ingest.py, audit-ingest.py)
- Cognee 0.5.3 — knowledge graph SDK
- Neo4j Community — graph database
- ChromaDB — vector database
- Azure OpenAI — GPT-4.1 for analysis, text-embedding-3-large for vectors
- Azure Blob Storage — knowledge graph backup/restore
- Azure VM (Standard B2s) — hosts Neo4j + ChromaDB via Docker Compose
- Google Search Console API — real click/impression/position data