Most SEO monitoring tools give you a snapshot: today's clicks, today's issues, today's recommendations. You fix something, come back tomorrow, and the tool has no idea what you did or whether it helped.
I wanted something smarter: a system that remembers your site's history, correlates code changes with ranking shifts, and produces AI-generated insights that get better every day. Here's what I built and how.
The Problem With Basic SEO Monitoring
A typical monitoring setup looks like this:
- Fetch Google Search Console data
- Run a Lighthouse audit
- Send a Slack message with today's numbers
That's fine. But it can't answer questions like:
- "Did that metadata fix I deployed 10 days ago actually improve rankings?"
- "This recommendation has been flagged for 15 days. Why hasn't it been fixed?"
- "Clicks dropped this week. Was it a code change or an algorithm shift?"
To answer those questions, you need memory. That's where Cognee comes in.
What Is Cognee?
Cognee is a knowledge graph SDK. Instead of storing data as flat rows in a database, it extracts entities and relationships and stores them as nodes and edges in a graph (Neo4j) with vector embeddings in a vector database (ChromaDB).
Think of it like this: a normal database stores "clicks = 262 on April 8". A knowledge graph stores "keyword 'vibe trading' ranked at position 1.78 on April 8, which is 12 spots better than March 25, and that improvement happened 3 days after a metadata fix was deployed".
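To make the distinction concrete, here is a toy sketch in Python. The node names, relation labels, and the `hops` helper are all illustrative stand-ins for the Neo4j nodes and edges Cognee actually creates, not its real data model:

```python
from datetime import date

# A flat metrics row: one fact, no context.
row = {"date": date(2026, 4, 8), "clicks": 262}

# The same day expressed as graph triples (subject, relation, object).
triples = [
    ("keyword:vibe trading", "RANKED_AT",
     {"position": 1.78, "date": date(2026, 4, 8)}),
    ("keyword:vibe trading", "IMPROVED_FROM",
     {"position": 13.78, "date": date(2026, 3, 25)}),
    ("codechange:metadata fix", "DEPLOYED_ON",
     {"date": date(2026, 4, 5)}),
]

def hops(triples, start):
    """Everything reachable from a node in one hop — the kind of
    traversal a flat metrics row simply can't support."""
    return [(rel, obj) for subj, rel, obj in triples if subj == start]

facts = hops(triples, "keyword:vibe trading")
```

The flat row can only ever answer "how many clicks on April 8?"; the triples let you walk from a keyword to its history to the deploy that preceded the change.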
The difference matters when you want AI to reason across weeks of history, not just today.
System Architecture
The full pipeline runs daily via GitHub Actions:
PHASE 1 — Parallel data collection
├── Lighthouse audit (performance, SEO scores)
├── Broken links check
├── Meta tags validation
├── Core Web Vitals (via PageSpeed API)
└── Google Search Console (clicks, CTR, position, queries)
PHASE 2 — Main analysis job
├── git-change-detector.js → scans commits, classifies SEO-relevant changes
├── cognee_ingest.py → writes today's data to Neo4j + ChromaDB
├── cognee-store-updater.js → updates 30-day rolling JSON snapshot
├── audit-scraper.js → fetches live pages, scores SEO/GEO/AEO signals
├── audit-ingest.py → stores audit scores in the knowledge graph
├── cognee-analyzer.js → builds enriched AI context, calls Azure OpenAI
├── send-ai-slack.js → posts daily report to Slack
└── cognee-blob-sync.js → backs up knowledge graph to Azure Blob Storage
PHASE 3 — Weekly (Sundays)
└── competitor-monitor.js → fetches competitor pages, scores them, posts comparison
Why Cognee? The Knowledge Graph Advantage
Every day, cognee_ingest.py builds a structured document containing today's GSC metrics, top queries, AI recommendations, and recent git commits. Azure OpenAI reads this and extracts entities (keywords, positions, dates, code changes), which Cognee writes to Neo4j as connected nodes. The graph starts to grow.
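A minimal sketch of what that daily document might look like (the field names and layout here are my guesses at the shape; the real cognee_ingest.py may format things differently):

```python
def build_daily_document(gsc: dict, recommendations: list[str],
                         commits: list[dict]) -> str:
    """Assemble today's GSC metrics, open AI recommendations, and recent
    SEO-relevant commits into one text document for ingestion."""
    lines = [
        f"Date: {gsc['date']}",
        f"Clicks: {gsc['clicks']}, CTR: {gsc['ctr']:.2%}, "
        f"Avg position: {gsc['position']:.2f}",
        "Top queries: " + ", ".join(
            f"{q['query']} (pos {q['position']})" for q in gsc["queries"]),
        "Open recommendations: " + "; ".join(recommendations),
        "Recent SEO-relevant commits: " + "; ".join(
            f"{c['date']} {c['message']}" for c in commits),
    ]
    return "\n".join(lines)

doc = build_daily_document(
    {"date": "2026-04-08", "clicks": 262, "ctr": 0.1894, "position": 4.20,
     "queries": [{"query": "vibe trading", "position": 1.78}]},
    ["add FAQ schema"],
    [{"date": "2026-04-05", "message": "fix: metadata overlap"}],
)
```

This plain-text document is what gets handed to Cognee's ingestion pipeline, which does the entity extraction and graph writes.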
After 30 days, the graph contains nodes like:
(Keyword: "platform name") —[RANKED_AT]→ (Position: 1.78, date: April 8)
(CodeChange: "metadata fix") —[HAPPENED_BEFORE]→ (MetricSnapshot: April 5)
(Recommendation: "add FAQ schema") —[FLAGGED_ON]→ (Date: April 1)
(Recommendation: "add FAQ schema") —[FLAGGED_ON]→ (Date: April 2)
... (flagged 12 days in a row)
Now when the AI runs its daily analysis, it doesn't just see today's data. It sees patterns:
- Keyword velocity: which keywords improved or dropped more than 5 positions in 14 days
- Stuck recommendations: same issue flagged 3+ days in a row, still unactioned
- Code change impact: did clicks or position change after a specific deploy?
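The keyword-velocity check above is simple enough to sketch directly. This is my own reimplementation of the idea, assuming position history is available per keyword (the production analyzer reads it out of the graph instead):

```python
from datetime import date, timedelta

def keyword_velocity(history: dict[str, list[tuple[date, float]]],
                     window_days: int = 14,
                     threshold: float = 5.0) -> dict[str, float]:
    """Flag keywords whose average position moved more than `threshold`
    spots across the last `window_days`. Positive delta = improvement,
    since a lower position number is better."""
    movers = {}
    for kw, points in history.items():
        points = sorted(points)
        cutoff = points[-1][0] - timedelta(days=window_days)
        window = [pos for d, pos in points if d >= cutoff]
        if len(window) < 2:
            continue
        delta = window[0] - window[-1]  # earliest minus latest position
        if abs(delta) > threshold:
            movers[kw] = round(delta, 2)
    return movers

movers = keyword_velocity({
    "vibe trading": [(date(2026, 3, 25), 13.78), (date(2026, 4, 8), 1.78)],
    "stable term":  [(date(2026, 3, 25), 3.10), (date(2026, 4, 8), 3.40)],
})
# → {"vibe trading": 12.0}
```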
The Slack report reflects this. Instead of "your CTR is 18.94%", it says:
"Your site has more than doubled daily clicks over the past month (106% growth), driven by a metadata fix on March 26 and header overlap fixes on March 28. Short-term momentum is slowing (7-day clicks are -3%), suggesting you should now expand content around fast-moving branded keywords."
That's a different class of insight.
The Audit Scraper: SEO/GEO/AEO Scoring
Beyond GSC data, audit-scraper.js fetches your actual pages daily and scores them across three dimensions:
- SEO: classic signals (title tag, meta description, H1, canonical, OG tags, schema markup, JS-gated content detection)
- GEO (Generative Engine Optimization): how well AI search engines like Perplexity or ChatGPT Search can read and cite your content, based on structured data presence, content density, and crawlability
- AEO (Answer Engine Optimization): featured snippet and voice search readiness, based on FAQ schema, article schema, H2 density, and word count
Each page gets a score out of 10. The system flags critical issues (JS-gated content = crawlers see a blank page, missing H1 = no primary ranking signal) and sends a separate Slack message with the audit digest.
🔴 SEO Audit — 2026-04-08
Scores: SEO 9/10 | GEO 9/10 | AEO 3/10 | Combined 21/30
Critical Issues:
🚨 missing_h1 on Pricing — Missing primary ranking signal
🚨 js_gated_content on Pricing — Crawlers see blank page
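The scoring itself is just weighted signal checks plus critical-issue flags. Here's a toy version in Python; the weights and the score penalty are illustrative, not the real audit-scraper.js values:

```python
def score_page(signals: dict[str, bool]) -> tuple[int, list[str]]:
    """Score a page out of 10 from extracted on-page signals and
    collect critical issues. Weights are made up for illustration."""
    weights = {
        "title": 2, "meta_description": 2, "h1": 2,
        "canonical": 1, "og_tags": 1, "schema": 2,
    }
    score = sum(w for sig, w in weights.items() if signals.get(sig))
    critical = []
    if not signals.get("h1"):
        critical.append("missing_h1")        # no primary ranking signal
    if signals.get("js_gated"):
        critical.append("js_gated_content")  # crawlers see a blank page
        score = max(score - 3, 0)            # heavy illustrative penalty
    return min(score, 10), critical

score, issues = score_page(
    {"title": True, "meta_description": True, "h1": False,
     "canonical": True, "og_tags": True, "schema": True, "js_gated": True}
)
```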
Code Change Impact Tracking
This is the part I'm most proud of. git-change-detector.js scans git commits and classifies them: it looks for commit messages mentioning SEO-related terms (metadata, schema, redirect, canonical, performance, etc.) and logs them with their date.
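The classifier boils down to a keyword match over commit messages. A Python sketch of the same idea (the actual detector is Node.js, and its term list is longer than this one):

```python
import re

# Illustrative subset of SEO-related terms; the real list is broader.
SEO_TERMS = re.compile(
    r"\b(metadata|schema|redirect|canonical|sitemap|robots|"
    r"performance|seo|title)\b",
    re.IGNORECASE,
)

def classify_commits(commits: list[dict]) -> list[dict]:
    """Return only commits whose message mentions an SEO-related term,
    tagged as relevant so downstream impact tracking can pick them up."""
    return [
        {**c, "seo_relevant": True}
        for c in commits
        if SEO_TERMS.search(c["message"])
    ]

hits = classify_commits([
    {"sha": "a1b2c3", "date": "2026-03-26",
     "message": "fix: metadata overlap on blog pages"},
    {"sha": "d4e5f6", "date": "2026-03-27",
     "message": "chore: bump dependencies"},
])
```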
change-impact-tracker.js then cross-references those commits with GSC metrics. For each logged change, it compares the 7-day window before vs after deployment:
✅ Migrate to new partition keys (2026-03-30)
→ Position improved 6.7 spots (17.39 → 10.71)
✅ API pagination fix (2026-03-18)
→ Clicks grew 81% (127 → 229.7/day)
⏳ content deploy (2026-03-13)
→ Monitoring... (not enough post-deploy data yet)
This surfaces directly in the Slack report under "Code Change Tracker". Over time, it tells you which types of changes actually move the needle.
Storage Architecture
Three layers, each with a different purpose:
| Layer | What it stores | Why |
|---|---|---|
| Neo4j (Azure VM) | Graph nodes + edges — keywords, positions, code changes, relationships | Multi-hop reasoning: "which keyword improved after which deploy?" |
| ChromaDB (Azure VM) | Vector embeddings of all entities | Semantic search across history |
| cognee-knowledge.json (Azure Blob) | 30-day rolling JSON snapshots | Fast daily reads without querying the graph every run |
The JSON file is the workhorse for the daily Slack report. Neo4j and ChromaDB are queried for deeper pattern analysis and become increasingly valuable as history accumulates.
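The rolling snapshot logic is straightforward enough to sketch. This is a Python approximation of what cognee-store-updater.js does (the real file schema is richer than a date-plus-clicks dict):

```python
import json

def update_rolling_snapshot(path: str, today: dict,
                            keep_days: int = 30) -> list[dict]:
    """Append today's snapshot to the rolling JSON file, dedupe on date
    so re-runs are idempotent, and keep only the last `keep_days` days."""
    try:
        with open(path) as f:
            snapshots = json.load(f)
    except FileNotFoundError:
        snapshots = []
    snapshots = [s for s in snapshots if s["date"] != today["date"]]
    snapshots.append(today)
    snapshots.sort(key=lambda s: s["date"])  # ISO dates sort correctly
    snapshots = snapshots[-keep_days:]
    with open(path, "w") as f:
        json.dump(snapshots, f, indent=2)
    return snapshots

# Demo: 31 daily runs leave exactly 30 entries in the file.
import os, tempfile
from datetime import date, timedelta
path = os.path.join(tempfile.mkdtemp(), "cognee-knowledge.json")
for i in range(31):
    day = (date(2026, 3, 1) + timedelta(days=i)).isoformat()
    snaps = update_rolling_snapshot(path, {"date": day, "clicks": 100 + i})
```

Reading one small JSON blob per run keeps the daily report fast; the graph databases are only hit for the heavier pattern queries.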
Key Things I Learned
Cognee initializes config at import time. If you set environment variables after import cognee, they're ignored. You have to call cognee.config.set_graph_db_config() directly after import to update the live config object. This cost me several hours.
The mistralai import conflict. Cognee's dependency instructor==1.14.x tries to import Mistral from mistralai at import time regardless of whether you use it. Fix: inject a fake mistralai module into sys.modules before importing Cognee.
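Both workarounds together look roughly like this. The stub-injection trick is exactly what I described; the config dict keys are from memory and may not match your Cognee version, so treat them as placeholders and check the docs:

```python
import sys
import types

# Workaround 2 (must run FIRST): stub out mistralai before Cognee is
# imported, so instructor's eager `from mistralai import Mistral`
# doesn't blow up when the package isn't installed.
fake = types.ModuleType("mistralai")
fake.Mistral = object  # instructor only needs the name to resolve
sys.modules["mistralai"] = fake

# Workaround 1: set DB config AFTER import via the live config object;
# env vars exported at this point are too late and get ignored.
try:
    import cognee
except ImportError:
    cognee = None  # keeps this sketch runnable without Cognee installed

if cognee is not None:
    cognee.config.set_graph_db_config({
        # Key names are placeholders — verify against your Cognee version.
        "graph_database_provider": "neo4j",
        "graph_database_url": "bolt://your-vm-host:7687",
        "graph_database_username": "neo4j",
        "graph_database_password": "…from your secret store…",
    })
```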
JS-gated content is invisible to the audit scraper. If your page renders entirely client-side, the raw HTML fetch returns fewer than 80 words. The scraper flags this as js_gated_content — which is actually useful because it means Google probably can't index it either.
The knowledge graph gets smarter non-linearly. Day 1 the system is just a fancier GSC dashboard. Day 7 you start seeing real code change verdicts. Day 30 the AI recommendations start referencing patterns that span weeks. The value compounds.
Tech Stack
- GitHub Actions — pipeline orchestration, daily cron
- Node.js — audit scraper, Cognee analyzer, Slack formatting, git change detection
- Python — Cognee SDK ingestion (cognee_ingest.py, audit-ingest.py)
- Cognee 0.5.3 — knowledge graph SDK
- Neo4j Community — graph database
- ChromaDB — vector database
- Azure OpenAI — GPT-4.1 for analysis, text-embedding-3-large for vectors
- Azure Blob Storage — knowledge graph backup/restore
- Azure VM (Standard B2s) — hosts Neo4j + ChromaDB via Docker Compose
- Google Search Console API — real click/impression/position data