msm yaqoob
Building an AI Visibility Monitoring Tool: A Developer's Guide to Tracking LLM Citations

TL;DR
Build a Python-based monitoring system to track how AI platforms (ChatGPT, Claude, Perplexity, Gemini) cite your brand. Includes automated testing, sentiment analysis, and alerting for perception drift.

The Problem: Traditional SEO Metrics Are Incomplete
You're crushing it on Google. #1 rankings. Solid domain authority. Traffic growing.
But then you discover that when potential users ask ChatGPT or Claude about tools in your category, your product isn't mentioned at all.
Welcome to the new reality: Google rankings ≠ AI visibility.
As a developer, your first instinct is probably the same as mine: "I can build something to monitor this."
Spoiler: You can, and you should. Here's how.

What We're Building
A Python-based monitoring system that:
✅ Tests your brand across multiple AI platforms
✅ Tracks citation frequency and positioning
✅ Detects sentiment changes over time
✅ Alerts when perception drift occurs
✅ Generates weekly reports
Tech Stack:

Python 3.10+
OpenAI API (ChatGPT)
Anthropic API (Claude)
Requests library (Perplexity, Gemini)
SQLite for data storage
Pandas for analysis
Plotly for visualization

Architecture Overview
```python
# High-level flow
query_list = load_queries()
results = {}

for platform in ['chatgpt', 'claude', 'perplexity', 'gemini']:
    results[platform] = {}  # initialize per-platform dict to avoid KeyError
    for query in query_list:
        response = test_platform(platform, query)
        results[platform][query] = analyze_response(response)

store_results(results)
detect_drift(results)
send_alerts_if_needed()
```
Pretty straightforward. The complexity is in the analysis.

Step 1: Setting Up Platform APIs
ChatGPT (OpenAI)
```python
import openai
from datetime import datetime

class ChatGPTTester:
    def __init__(self, api_key):
        self.client = openai.OpenAI(api_key=api_key)

    def test_query(self, query, brand_name):
        """Test a single query and analyze brand mention"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": query}
            ],
            temperature=0.3  # Lower temp for consistency
        )

        content = response.choices[0].message.content

        return {
            'timestamp': datetime.now().isoformat(),
            'query': query,
            'response': content,
            'mentioned': brand_name.lower() in content.lower(),
            'position': self._find_position(content, brand_name),
            'competing_brands': self._extract_competitors(content),
            'sentiment': self._analyze_sentiment(content, brand_name)
        }

    def _find_position(self, content, brand_name):
        """Find position of brand mention (1st, 2nd, 3rd, etc.)"""
        # Simple implementation - can be enhanced
        sentences = content.split('.')
        for i, sentence in enumerate(sentences):
            if brand_name.lower() in sentence.lower():
                return i + 1
        return None

    def _extract_competitors(self, content):
        """Extract competing brand names mentioned"""
        # You'd maintain a list of known competitors
        competitors = ['Competitor1', 'Competitor2', 'Competitor3']
        found = []
        for comp in competitors:
            if comp.lower() in content.lower():
                found.append(comp)
        return found

    def _analyze_sentiment(self, content, brand_name):
        """Basic sentiment analysis for brand mentions"""
        # Find sentences mentioning the brand
        sentences = [s for s in content.split('.') if brand_name.lower() in s.lower()]

        positive_words = ['best', 'leading', 'excellent', 'trusted', 'top', 'recommended']
        negative_words = ['limited', 'expensive', 'complicated', 'outdated', 'lacks']

        sentiment_score = 0
        for sentence in sentences:
            sentence_lower = sentence.lower()
            sentiment_score += sum(1 for word in positive_words if word in sentence_lower)
            sentiment_score -= sum(1 for word in negative_words if word in sentence_lower)

        if sentiment_score > 0:
            return 'positive'
        elif sentiment_score < 0:
            return 'negative'
        return 'neutral'
```
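One weakness worth flagging in the helpers above: plain substring checks misfire on partial words ("best" matches "bestow", "Java" matches "JavaScript"). A word-boundary variant, as a sketch; `mentions_brand` and `keyword_sentiment` are hypothetical helper names, not part of the original class:

```python
import re

def mentions_brand(content: str, brand_name: str) -> bool:
    """True only when brand_name appears as a whole word (case-insensitive)."""
    pattern = r'\b' + re.escape(brand_name) + r'\b'
    return re.search(pattern, content, re.IGNORECASE) is not None

def keyword_sentiment(sentence: str, positive: list, negative: list) -> int:
    """Score one sentence by counting whole-word keyword hits."""
    words = set(re.findall(r'\b\w+\b', sentence.lower()))
    return sum(w in words for w in positive) - sum(w in words for w in negative)
```

Swapping these into `_find_position` and `_analyze_sentiment` removes a whole class of false positives at essentially no cost.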

Claude (Anthropic)
```python
import anthropic
from datetime import datetime

class ClaudeTester:
    def __init__(self, api_key):
        self.client = anthropic.Anthropic(api_key=api_key)

    def test_query(self, query, brand_name):
        """Test query on Claude"""
        message = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            temperature=0.3,
            messages=[
                {"role": "user", "content": query}
            ]
        )

        content = message.content[0].text

        return {
            'timestamp': datetime.now().isoformat(),
            'query': query,
            'response': content,
            'mentioned': brand_name.lower() in content.lower(),
            'position': self._find_position(content, brand_name),
            'competing_brands': self._extract_competitors(content),
            'sentiment': self._analyze_sentiment(content, brand_name)
        }

    # Same helper methods as ChatGPTTester
```

Perplexity (HTTP-based)
```python
import requests
from datetime import datetime

class PerplexityTester:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.perplexity.ai/chat/completions"

    def test_query(self, query, brand_name):
        """Test query on Perplexity"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "llama-3.1-sonar-large-128k-online",
            "messages": [
                {"role": "user", "content": query}
            ],
            "temperature": 0.3
        }

        response = requests.post(self.base_url, json=payload, headers=headers)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = response.json()
        content = data['choices'][0]['message']['content']

        return {
            'timestamp': datetime.now().isoformat(),
            'query': query,
            'response': content,
            'mentioned': brand_name.lower() in content.lower(),
            'position': self._find_position(content, brand_name),
            'citations': data.get('citations', []),  # Perplexity provides citations
            'competing_brands': self._extract_competitors(content),
            'sentiment': self._analyze_sentiment(content, brand_name)
        }

    # Same helper methods as ChatGPTTester
```
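Gemini is in our tech stack but doesn't have a tester yet. A sketch against Google's `generateContent` REST endpoint; the model name and response shape here are assumptions, so check them against the current Gemini API docs before relying on this:

```python
import requests
from datetime import datetime

class GeminiTester:
    def __init__(self, api_key, model="gemini-1.5-flash"):
        self.api_key = api_key
        self.base_url = (
            "https://generativelanguage.googleapis.com/v1beta/"
            f"models/{model}:generateContent"
        )

    def test_query(self, query, brand_name):
        """Test query on Gemini via the REST API."""
        payload = {"contents": [{"parts": [{"text": query}]}]}
        response = requests.post(
            self.base_url, params={"key": self.api_key}, json=payload
        )
        response.raise_for_status()
        content = self._extract_text(response.json())
        return {
            'timestamp': datetime.now().isoformat(),
            'query': query,
            'response': content,
            'mentioned': brand_name.lower() in content.lower(),
        }

    @staticmethod
    def _extract_text(data):
        """Pull the generated text out of a generateContent response dict."""
        candidates = data.get('candidates', [])
        if not candidates:
            return ""
        parts = candidates[0].get('content', {}).get('parts', [])
        return "".join(p.get('text', '') for p in parts)
```

Reuse the same `_find_position` / `_extract_competitors` / `_analyze_sentiment` helpers as the other testers to fill in the remaining result fields.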

Step 2: Query Management
Create a structured query library:
```yaml
# queries.yaml
brand_queries:
  - "What is {brand_name}?"
  - "Tell me about {brand_name}"
  - "What does {brand_name} do?"

category_queries:
  - "What are the best {category} tools?"
  - "Top {category} solutions for {use_case}"
  - "Compare {category} platforms"

competitor_queries:
  - "Compare {brand_name} vs {competitor}"
  - "{brand_name} or {competitor} - which is better?"

problem_solution:
  - "How do I solve {problem}?"
  - "Best way to {use_case}"
```
Load and format queries:
```python
import yaml

class QueryManager:
    def __init__(self, config_file='queries.yaml'):
        with open(config_file, 'r') as f:
            self.templates = yaml.safe_load(f)

    def generate_queries(self, brand_name, category, competitors, problems):
        """Generate formatted queries from templates"""
        queries = []

        # Brand queries
        for template in self.templates['brand_queries']:
            queries.append(template.format(brand_name=brand_name))

        # Category queries
        for template in self.templates['category_queries']:
            for use_case in ['startups', 'enterprise', 'small business']:
                queries.append(template.format(
                    category=category,
                    use_case=use_case
                ))

        # Competitor queries
        for template in self.templates['competitor_queries']:
            for competitor in competitors:
                queries.append(template.format(
                    brand_name=brand_name,
                    competitor=competitor
                ))

        # Problem-solution queries
        for template in self.templates['problem_solution']:
            for problem in problems:
                queries.append(template.format(
                    problem=problem,
                    use_case=problem
                ))

        return queries
```
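To sanity-check the expansion logic (and your API bill) before wiring up the YAML file, here is a standalone mirror of `generate_queries` with a tiny inline template set; `expand` and the counts are illustrative, not part of the tool:

```python
templates = {
    'brand_queries': ["What is {brand_name}?"],
    'category_queries': ["What are the best {category} tools?"],
    'competitor_queries': ["Compare {brand_name} vs {competitor}"],
    'problem_solution': ["How do I solve {problem}?"],
}

def expand(templates, brand_name, category, competitors, problems,
           use_cases=('startups', 'enterprise', 'small business')):
    """Mirror QueryManager.generate_queries for a quick count check."""
    queries = [t.format(brand_name=brand_name) for t in templates['brand_queries']]
    queries += [t.format(category=category, use_case=u)
                for t in templates['category_queries'] for u in use_cases]
    queries += [t.format(brand_name=brand_name, competitor=c)
                for t in templates['competitor_queries'] for c in competitors]
    queries += [t.format(problem=p, use_case=p)
                for t in templates['problem_solution'] for p in problems]
    return queries

queries = expand(templates, 'YourBrand', 'AI SEO Tools',
                 ['Competitor1', 'Competitor2'], ['improve ai visibility'])
# 1 brand + 1x3 category + 1x2 competitor + 1x1 problem = 7 queries
```

Query count grows multiplicatively with templates, use cases, and competitors, which directly drives API cost, so it's worth doing this arithmetic before scheduling daily runs.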

Step 3: Data Storage
Use SQLite for persistence:
```python
import sqlite3
import json

class ResultsDB:
    def __init__(self, db_path='ai_visibility.db'):
        self.conn = sqlite3.connect(db_path)
        self.create_tables()

    def create_tables(self):
        """Initialize database schema"""
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS test_results (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                platform TEXT NOT NULL,
                query TEXT NOT NULL,
                brand_mentioned BOOLEAN,
                position INTEGER,
                sentiment TEXT,
                response_text TEXT,
                competing_brands TEXT,
                raw_data TEXT
            )
        ''')

        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS visibility_scores (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                date TEXT NOT NULL,
                platform TEXT NOT NULL,
                citation_rate REAL,
                avg_position REAL,
                sentiment_score REAL,
                share_of_voice REAL
            )
        ''')

        self.conn.commit()

    def save_result(self, platform, result):
        """Save individual test result"""
        self.conn.execute('''
            INSERT INTO test_results
            (timestamp, platform, query, brand_mentioned, position,
             sentiment, response_text, competing_brands, raw_data)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
        ''', (
            result['timestamp'],
            platform,
            result['query'],
            result['mentioned'],
            result.get('position'),
            result['sentiment'],
            result['response'],
            json.dumps(result.get('competing_brands', [])),
            json.dumps(result)
        ))
        self.conn.commit()

    def calculate_daily_scores(self, date, platform):
        """Calculate visibility scores for a given day"""
        cursor = self.conn.execute('''
            SELECT
                COUNT(*) as total_queries,
                SUM(CASE WHEN brand_mentioned THEN 1 ELSE 0 END) as mentions,
                AVG(position) as avg_pos,  -- AVG skips NULLs, so non-mentions don't drag the average toward 0
                SUM(CASE WHEN sentiment = 'positive' THEN 1
                         WHEN sentiment = 'negative' THEN -1
                         ELSE 0 END) as sentiment_total
            FROM test_results
            WHERE DATE(timestamp) = ? AND platform = ?
        ''', (date, platform))

        row = cursor.fetchone()

        if row[0] == 0:
            return None

        citation_rate = (row[1] / row[0]) * 100
        avg_position = row[2]
        sentiment_score = row[3] / row[0]

        return {
            'citation_rate': citation_rate,
            'avg_position': avg_position,
            'sentiment_score': sentiment_score
        }
    }
```
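The scoring SQL is easy to exercise standalone against an in-memory database. This smoke test uses a pared-down table and sample rows (all values invented for illustration); note that SQLite's `AVG` skips NULLs, so queries with no mention don't skew the average position:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory DB
conn.execute('''CREATE TABLE test_results (
    timestamp TEXT, platform TEXT, query TEXT,
    brand_mentioned BOOLEAN, position INTEGER, sentiment TEXT)''')
rows = [
    ('2025-01-01T09:00:00', 'chatgpt', 'q1', 1, 2, 'positive'),
    ('2025-01-01T09:01:00', 'chatgpt', 'q2', 0, None, 'neutral'),
    ('2025-01-01T09:02:00', 'chatgpt', 'q3', 1, 1, 'negative'),
]
conn.executemany('INSERT INTO test_results VALUES (?, ?, ?, ?, ?, ?)', rows)

cur = conn.execute('''
    SELECT COUNT(*),
           SUM(CASE WHEN brand_mentioned THEN 1 ELSE 0 END),
           AVG(position),                         -- NULL positions are ignored
           SUM(CASE WHEN sentiment = 'positive' THEN 1
                    WHEN sentiment = 'negative' THEN -1 ELSE 0 END)
    FROM test_results
    WHERE DATE(timestamp) = '2025-01-01' AND platform = 'chatgpt'
''')
total, mentions, avg_pos, senti = cur.fetchone()
citation_rate = mentions / total * 100   # 2 of 3 queries mentioned the brand
```

Here `avg_pos` comes out as 1.5 (the mean of positions 2 and 1), and `citation_rate` is 2/3, roughly 66.7%.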

Step 4: Drift Detection
Detect when your visibility changes significantly:
```python
import pandas as pd

class DriftDetector:
    def __init__(self, db):
        self.db = db

    def detect_drift(self, platform, lookback_days=30, threshold=15):
        """
        Detect significant changes in visibility

        Args:
            platform: AI platform name
            lookback_days: Days to analyze
            threshold: % change to trigger alert
        """
        # Get historical data
        query = '''
            SELECT date, citation_rate, avg_position, sentiment_score
            FROM visibility_scores
            WHERE platform = ?
            AND date >= date('now', ? || ' days')
            ORDER BY date DESC
        '''

        df = pd.read_sql_query(
            query,
            self.db.conn,
            params=(platform, f'-{lookback_days}')
        )

        if len(df) < 7:
            return None  # Not enough data

        # Rolling averages (kept for charting/inspection; not used in the check below)
        df['citation_rate_ma7'] = df['citation_rate'].rolling(7).mean()
        df['position_ma7'] = df['avg_position'].rolling(7).mean()

        # Compare recent (newest 3 days) vs baseline (oldest 14 days)
        recent_citation = df['citation_rate'].head(3).mean()
        baseline_citation = df['citation_rate'].tail(14).mean()

        recent_position = df['avg_position'].head(3).mean()
        baseline_position = df['avg_position'].tail(14).mean()

        # Calculate percentage changes
        citation_change = ((recent_citation - baseline_citation) / baseline_citation) * 100
        position_change = ((recent_position - baseline_position) / baseline_position) * 100

        drift_detected = False
        alerts = []

        if abs(citation_change) > threshold:
            drift_detected = True
            direction = "increased" if citation_change > 0 else "decreased"
            alerts.append(f"Citation rate {direction} by {abs(citation_change):.1f}%")

        if abs(position_change) > threshold:
            drift_detected = True
            # Lower position numbers are better, so a drop is an improvement
            direction = "improved" if position_change < 0 else "worsened"
            alerts.append(f"Average position {direction} by {abs(position_change):.1f}%")

        if drift_detected:
            return {
                'platform': platform,
                'drift_detected': True,
                'citation_change': citation_change,
                'position_change': position_change,
                'alerts': alerts,
                'data': df.to_dict('records')
            }

        return None
```
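At its core, the drift check is just percentage change against a baseline, plus a threshold. Pulled out as pure functions so it can be unit-tested without a database; `pct_change` and `drift_alerts` are hypothetical names mirroring the class above, not part of it:

```python
def pct_change(recent: float, baseline: float) -> float:
    """Percentage change of recent vs baseline (positive = increase)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (recent - baseline) / baseline * 100

def drift_alerts(recent_citation, baseline_citation,
                 recent_position, baseline_position, threshold=15):
    """Return alert strings mirroring DriftDetector's threshold logic."""
    alerts = []
    cc = pct_change(recent_citation, baseline_citation)
    pc = pct_change(recent_position, baseline_position)
    if abs(cc) > threshold:
        direction = "increased" if cc > 0 else "decreased"
        alerts.append(f"Citation rate {direction} by {abs(cc):.1f}%")
    if abs(pc) > threshold:
        # Lower position numbers are better, so a drop is an improvement
        direction = "improved" if pc < 0 else "worsened"
        alerts.append(f"Average position {direction} by {abs(pc):.1f}%")
    return alerts
```

For example, a citation rate falling from 50% to 40% is a 20% relative drop, which clears the default 15% threshold and fires one alert.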

Step 5: Automated Reporting
Generate weekly reports:
```python
import os
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

class ReportGenerator:
    def __init__(self, db):
        self.db = db

    def generate_weekly_report(self):
        """Generate comprehensive weekly report"""
        platforms = ['chatgpt', 'claude', 'perplexity', 'gemini']

        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Citation Rate', 'Average Position',
                            'Sentiment Score', 'Share of Voice')
        )

        for platform in platforms:
            # Get last 30 days of data
            query = '''
                SELECT date, citation_rate, avg_position,
                       sentiment_score, share_of_voice
                FROM visibility_scores
                WHERE platform = ?
                AND date >= date('now', '-30 days')
                ORDER BY date ASC
            '''

            df = pd.read_sql_query(query, self.db.conn, params=(platform,))

            # Citation Rate
            fig.add_trace(
                go.Scatter(x=df['date'], y=df['citation_rate'],
                           name=platform, mode='lines+markers'),
                row=1, col=1
            )

            # Average Position
            fig.add_trace(
                go.Scatter(x=df['date'], y=df['avg_position'],
                           name=platform, mode='lines+markers'),
                row=1, col=2
            )

            # Sentiment Score
            fig.add_trace(
                go.Scatter(x=df['date'], y=df['sentiment_score'],
                           name=platform, mode='lines+markers'),
                row=2, col=1
            )

            # Share of Voice
            fig.add_trace(
                go.Scatter(x=df['date'], y=df['share_of_voice'],
                           name=platform, mode='lines+markers'),
                row=2, col=2
            )

        fig.update_layout(height=800, showlegend=True,
                          title_text="AI Visibility Dashboard - 30 Day Trend")

        os.makedirs('reports', exist_ok=True)  # ensure output dir exists
        fig.write_html('reports/weekly_report.html')

        return fig
```

Step 6: Putting It All Together
Main orchestration script:
```python
import schedule
import time
from datetime import datetime

class AIVisibilityMonitor:
    def __init__(self, config):
        self.config = config
        self.db = ResultsDB()
        self.query_manager = QueryManager()

        # Initialize platform testers
        self.testers = {
            'chatgpt': ChatGPTTester(config['openai_api_key']),
            'claude': ClaudeTester(config['anthropic_api_key']),
            'perplexity': PerplexityTester(config['perplexity_api_key']),
        }

        self.drift_detector = DriftDetector(self.db)
        self.reporter = ReportGenerator(self.db)

    def run_daily_tests(self):
        """Run all tests for the day"""
        print(f"Starting daily tests: {datetime.now()}")

        queries = self.query_manager.generate_queries(
            brand_name=self.config['brand_name'],
            category=self.config['category'],
            competitors=self.config['competitors'],
            problems=self.config['problems']
        )

        for platform, tester in self.testers.items():
            print(f"Testing {platform}...")

            for query in queries:
                try:
                    result = tester.test_query(
                        query,
                        self.config['brand_name']
                    )
                    self.db.save_result(platform, result)

                    # Rate limiting
                    time.sleep(2)

                except Exception as e:
                    print(f"Error testing {platform} - {query}: {e}")

            # Calculate daily scores
            today = datetime.now().date().isoformat()
            scores = self.db.calculate_daily_scores(today, platform)

            if scores:
                print(f"{platform} - Citation Rate: {scores['citation_rate']:.1f}%")

        print("Daily tests complete")

    def check_for_drift(self):
        """Check for perception drift"""
        print("Checking for drift...")

        for platform in self.testers.keys():
            drift = self.drift_detector.detect_drift(platform)

            if drift:
                print(f"⚠️ DRIFT DETECTED on {platform}:")
                for alert in drift['alerts']:
                    print(f"  - {alert}")

                # Send alert (implement your notification method)
                self.send_alert(drift)

    def generate_weekly_report(self):
        """Generate and email weekly report"""
        print("Generating weekly report...")
        self.reporter.generate_weekly_report()
        # Email report (implement your email method)

    def send_alert(self, drift_data):
        """Send drift alert via email/Slack/etc"""
        # Implementation depends on your notification preferences
        pass
```
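`send_alert` is deliberately left as a stub. One low-friction option is a Slack incoming webhook; here's a sketch where the webhook URL is a placeholder you'd supply, and `build_slack_payload` is a hypothetical helper split out so the formatting is testable without hitting the network:

```python
import requests

def build_slack_payload(drift):
    """Format a drift result dict as a Slack message payload."""
    lines = [f"⚠️ AI visibility drift detected on *{drift['platform']}*"]
    lines += [f"• {alert}" for alert in drift['alerts']]
    return {"text": "\n".join(lines)}

def send_alert(drift, webhook_url):
    """POST the alert to a Slack incoming webhook (URL is a placeholder)."""
    requests.post(webhook_url, json=build_slack_payload(drift), timeout=10)
```

Keeping the payload builder pure means you can assert on the message content in tests and only mock the single `requests.post` call.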

Configuration

```python
config = {
    'brand_name': 'YourBrand',
    'category': 'AI SEO Tools',
    'competitors': ['Competitor1', 'Competitor2', 'Competitor3'],
    'problems': ['improve ai visibility', 'rank on chatgpt', 'optimize for llms'],
    'openai_api_key': 'your-key',
    'anthropic_api_key': 'your-key',
    'perplexity_api_key': 'your-key',
}

# Initialize monitor
monitor = AIVisibilityMonitor(config)

# Schedule jobs
schedule.every().day.at("09:00").do(monitor.run_daily_tests)
schedule.every().day.at("10:00").do(monitor.check_for_drift)
schedule.every().monday.at("08:00").do(monitor.generate_weekly_report)

# Run the loop
while True:
    schedule.run_pending()
    time.sleep(60)
```

Deployment Options
Option 1: GitHub Actions (Free)
```yaml
# .github/workflows/ai-visibility-monitor.yml
name: AI Visibility Monitor

on:
  schedule:
    - cron: '0 9 * * *'  # Run daily at 9 AM UTC
  workflow_dispatch:     # Allow manual trigger

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run monitoring
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PERPLEXITY_API_KEY: ${{ secrets.PERPLEXITY_API_KEY }}
        run: |
          python monitor.py --single-run

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: visibility-reports
          path: reports/
```
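The workflow invokes `python monitor.py --single-run`, which the long-running scheduler loop shown earlier doesn't support. A minimal argparse sketch to bridge the gap (flag name taken from the workflow; the `__main__` branch is illustrative):

```python
import argparse

def parse_args(argv=None):
    """CLI for monitor.py: --single-run does one pass and exits."""
    parser = argparse.ArgumentParser(description="AI visibility monitor")
    parser.add_argument('--single-run', action='store_true',
                        help='run one test cycle and exit (for CI schedulers)')
    return parser.parse_args(argv)

# In monitor.py's entrypoint you might branch on it:
# args = parse_args()
# if args.single_run:
#     monitor.run_daily_tests()
#     monitor.check_for_drift()
# else:
#     ...start the schedule loop...
```

With `--single-run`, GitHub Actions (or any external scheduler) owns the timing and the process exits cleanly after each run.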

Option 2: Docker Container
```dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "monitor.py"]
```
Option 3: AWS Lambda (Serverless)
A good fit for cost-effective, pay-per-run deployment: package the monitor as a Lambda function and trigger it on a schedule with Amazon EventBridge (formerly CloudWatch Events).
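A sketch of the scheduling side as an AWS SAM template; the resource name, timeout, and cron expression are placeholder assumptions, and the handler would just call `run_daily_tests` and `check_for_drift` once per invocation:

```yaml
# template.yaml (AWS SAM) - illustrative only
Resources:
  VisibilityMonitorFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: monitor.lambda_handler
      Runtime: python3.10
      Timeout: 900          # long-running: many queries x rate limiting
      Events:
        DailyRun:
          Type: Schedule
          Properties:
            Schedule: cron(0 9 * * ? *)   # 9 AM UTC daily
```

Note the 15-minute Lambda timeout ceiling: with 2-second rate limiting between queries, a large query list may need to be split across invocations.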

Cost Analysis
API Costs (Monthly estimates):

OpenAI (ChatGPT): ~$50-100 (depending on query volume)
Anthropic (Claude): ~$40-80
Perplexity: ~$20-40
Total: ~$110-220/month

Infrastructure:

GitHub Actions: Free (2,000 minutes/month)
SQLite storage: Free (or S3 for ~$1/month)

Much cheaper than manual monitoring or enterprise tools ($500-2000/month).
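A back-of-envelope for where those numbers come from; the volumes and blended per-token price below are illustrative assumptions, not current list prices, so plug in figures from the providers' pricing pages:

```python
# Assumptions (illustrative, not current list prices)
queries_per_day = 50
days_per_month = 30
tokens_per_query = 1500          # prompt + completion, rough average
price_per_1k_tokens = 0.005      # blended $/1K tokens, hypothetical

monthly_tokens = queries_per_day * days_per_month * tokens_per_query
monthly_cost = monthly_tokens / 1000 * price_per_1k_tokens
# 50 * 30 * 1500 = 2,250,000 tokens -> 2250 * 0.005 = $11.25 per platform
```

Larger query libraries, pricier models, or longer responses scale this linearly, which is how you land in the ranges above.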

Key Takeaways

Build it yourself - You have the skills, use them
Start simple - Don't over-engineer; iterate based on data
Automate everything - Set it and forget it (mostly)
Monitor trends, not absolutes - Drift matters more than single data points
Act on insights - Build the tool, but use the data to improve visibility

What's Next?
This is a foundation. Extensions you might add:

Natural language analysis using spaCy or transformers
Competitor benchmarking (track their visibility too)
Integration with Google Search Console (correlate traditional SEO)
Machine learning to predict drift before it happens
Multi-region testing (how visibility varies by geography)

Resources
📖 Strategic Framework: For the business side of AI visibility (how to present to executives, budget allocation, quarterly planning), check out this comprehensive guide.

Discussion
What features would you add? How are you tracking AI visibility for your projects?
Drop a comment - I'm curious what approaches other devs are taking to this problem.
