Neil Yan

Posted on Jun 16

Building an AI Visibility Scanner: Hybrid AI Analysis Architecture

#ai #architecture #llm #showdev

If you've been following the AI space, you've likely noticed the shift: users are no longer just "Googling it." They're asking ChatGPT, Perplexity, Claude, and Gemini directly. This changes everything about how content gets discovered — and it's a problem most site owners haven't even realized they have.

Traditional SEO signals (backlinks, domain authority, keyword density) correlate only weakly with AI citation rates (roughly 0.3). A site that ranks #1 on Google can be completely invisible to ChatGPT. This is the gap Generative Engine Optimization (GEO) aims to fill.

In this article, I'll walk through what GEO actually means from a technical perspective, then dive into a real implementation — using GetCiteFlow, the AI visibility scanner I built — with code, architecture decisions, and lessons learned.

1. Understanding GEO: The Technical Layer AI Search Cares About

When an AI like ChatGPT or Claude answers a user query, it doesn't "rank" pages the way Google does. Instead, it looks for signals that make content easy to cite, summarize, and attribute.

Through our analysis of thousands of sites, we found six dimensions that matter most:

Dimension	What It Measures
AI Visibility	Can the AI find and parse your content?
FAQ Coverage	Do you have structured FAQ schema?
Entity Clarity	Does the page clearly define what it is?
Authority	Is there original research or named authors?
Content Structure	Are lists, tables, and headings being used?
Summary Optimization	Is there a clear summary for AI to extract?

The key insight: AI search engines don't read pages the way humans do. They look for machine-readable signals — structured data, entity definitions, llms.txt files — not just keyword density.

2. Architecture Overview: Hybrid Analysis with AI + Deterministic Checks

GetCiteFlow uses a hybrid analysis architecture. Instead of relying solely on an LLM to evaluate a site (which can hallucinate), we combine two independent analysis layers:

User enters URL
      |
      v
  [1] Scrape site → extract signals (HTML parsing)
      |
      v
  [2] Format signals → send to AI (Gemini/OpenAI/Deepseek)
      |
      v
  [3] AI returns structured JSON (score, breakdown, suggestions)
      |
      v
  [4] Merge with deterministic checks (lists, meta length, etc.)
      |
      v
  [5] Cache result + render report

Here's the core orchestration function from lib/analyze.ts:

export async function analyzeSite(url: string): Promise<Record<string, unknown>> {
  const cacheKey = `report:${url}`;
  const cached = cacheGet<Record<string, unknown>>(cacheKey);
  if (cached) return { ...cached, cached: true };

  // Deduplicate concurrent requests to the same URL
  if (pendingCache.has(cacheKey)) {
    return pendingCache.get(cacheKey)!;
  }

  const analyze = async () => {
    const activeProvider = getProvider();
    const fn = providerFns[activeProvider];
    const report = await fn(url);

    // Merge AI results with deterministic signal detection
    const siteData = await getSiteData(url);
    const deterministicMissing = getDeterministicMissing(siteData);
    const aiMissing = (report.missing as string[]) || [];
    const mergedMissing = [...new Set([...aiMissing, ...deterministicMissing])];

    const result = { ...report, missing: mergedMissing };
    cacheSet(cacheKey, result, CACHE_TTL_MS);
    return result;
  };

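  // Drop the in-flight entry once the analysis settles; otherwise later calls
  // would keep receiving this promise even after the TTL cache has expired.
  const promise = analyze().finally(() => pendingCache.delete(cacheKey));
  pendingCache.set(cacheKey, promise);
  return promise;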
}

Why hybrid? LLMs are great at qualitative judgment but bad at counting. An AI might miss that a site has no <ul> tags, but a simple regex check won't. By combining both, we get the best of both worlds.
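As a rough illustration of the deterministic half, the sketch below derives a list of missing items from scraped signals and merges it with an AI-produced list. The SiteSignals fields, the 50-character meta description threshold, and the message strings are assumptions made for this example; the body of the real getDeterministicMissing is not shown in this article.

```typescript
// Hypothetical signal shape; the real scraper output may differ.
interface SiteSignals {
  hasUnorderedLists: boolean;
  hasOrderedLists: boolean;
  hasFaqSchema: boolean;
  metaDescription: string | null;
}

// Assumed threshold for this sketch, not a value taken from GetCiteFlow.
const MIN_META_DESCRIPTION_LENGTH = 50;

function sketchDeterministicMissing(signals: SiteSignals): string[] {
  const missing: string[] = [];
  if (!signals.hasUnorderedLists && !signals.hasOrderedLists) {
    missing.push("No lists on the page");
  }
  if (!signals.hasFaqSchema) {
    missing.push("No FAQ schema");
  }
  const description = signals.metaDescription ?? "";
  if (description.trim().length < MIN_META_DESCRIPTION_LENGTH) {
    missing.push("Meta description is missing or too short");
  }
  return missing;
}

// Union of both lists with exact duplicates removed; AI findings come first.
function mergeMissing(aiMissing: string[], deterministic: string[]): string[] {
  return Array.from(new Set([...aiMissing, ...deterministic]));
}
```

One caveat with this kind of merge: the Set only removes exact string matches, so if the AI phrases the same finding differently, both versions end up in the list.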

3. The Scraper: Extracting Deterministic Signals

The scraper (lib/scrape.ts) is a pure-HTTP fetcher — no headless browser. It fetches the HTML, parses structured signals using regex, and checks for critical static files.

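async function extractFromHtml(html: string) {
  // Assumption: bodyLower is the lowercased HTML; its definition is not shown here.
  const bodyLower = html.toLowerCase();
  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const title = titleMatch ? titleMatch[1].trim() : null;
  const hasOpenGraph =
    /<meta[^>]+property=["']og:(title|description|image)["']/i.test(html);
  // detectFaqSchema and averageParagraphLength are placeholder names for
  // helpers whose bodies are omitted in this excerpt.
  const hasFaqSchema = detectFaqSchema(html); // parses JSON-LD <script> blocks
  const hasOrderedLists = /<ol[\s>]/i.test(html);
  const avgParagraphLength = averageParagraphLength(html); // calculates from <p> tags
  const hasSummarySection =
    /\b(key takeaways?|executive summary|tldr|tl;dr)\b/i.test(bodyLower);

  return { title, hasOpenGraph, hasFaqSchema, hasOrderedLists, /* ... */ };
}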
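The FAQ schema check is the least obvious of these signals, so here is a minimal sketch of one way to do it: pull out each JSON-LD script block, parse it, and walk the result looking for a FAQPage type. The function names are hypothetical and this is not necessarily how GetCiteFlow implements it.

```typescript
// Returns true if any JSON-LD block declares a schema.org FAQPage.
function detectFaqSchema(html: string): boolean {
  const scriptPattern =
    /<script[^>]+type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    try {
      if (containsFaqPage(JSON.parse(match[1]))) return true;
    } catch {
      // Malformed JSON-LD is common in the wild; skip this block.
    }
  }
  return false;
}

// Walks arrays and nested objects (including @graph) for "@type": "FAQPage".
function containsFaqPage(node: unknown): boolean {
  if (Array.isArray(node)) {
    return node.some((item) => containsFaqPage(item));
  }
  if (node === null || typeof node !== "object") return false;
  const record = node as Record<string, unknown>;
  const type = record["@type"];
  const types = Array.isArray(type) ? type : [type];
  if (types.indexOf("FAQPage") !== -1) return true;
  return Object.keys(record).some((key) => containsFaqPage(record[key]));
}
```

Parsing the JSON rather than searching the raw HTML for the string "FAQPage" avoids false positives from pages that merely mention the term in body text.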

We also check three critical files in parallel:

const [hasRobotsTxt, hasSitemap, hasLlmstxt] = await Promise.all([
  checkStaticFile(resolvedOrigin, "/robots.txt"),
  checkStaticFile(resolvedOrigin, "/sitemap.xml"),
  checkStaticFile(resolvedOrigin, "/llms.txt"),
]);

The llms.txt check is particularly important. llms.txt is a relatively new proposed standard (from the llmstxt community) that gives AI crawlers a machine-readable index of a site. In our experience, sites with an llms.txt file get noticeably better AI citation rates.
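For reference, a helper like checkStaticFile could look roughly like the sketch below. The FetchLike type, the injected fetchImpl parameter, and the decision to treat any network error as "not present" are assumptions made for this example; the real helper's signature takes only an origin and a path.

```typescript
// Minimal fetch shape so a stub can be injected in tests;
// the global fetch is compatible with it.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean }>;

// Hypothetical variant of checkStaticFile: true only for a 2xx response.
async function sketchCheckStaticFile(
  origin: string,
  path: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  const url = origin.replace(/\/+$/, "") + path;
  try {
    const response = await fetchImpl(url, { method: "GET" });
    return response.ok;
  } catch {
    // DNS failures, TLS errors, and connection resets all count as "not present".
    return false;
  }
}

// Runs the three checks concurrently, mirroring the Promise.all call above.
async function checkCrawlerFiles(origin: string, fetchImpl: FetchLike) {
  const [hasRobotsTxt, hasSitemap, hasLlmsTxt] = await Promise.all([
    sketchCheckStaticFile(origin, "/robots.txt", fetchImpl),
    sketchCheckStaticFile(origin, "/sitemap.xml", fetchImpl),
    sketchCheckStaticFile(origin, "/llms.txt", fetchImpl),
  ]);
  return { hasRobotsTxt, hasSitemap, hasLlmsTxt };
}
```

In an application you would pass the global fetch as fetchImpl. Two things this sketch leaves out: a request timeout, and protection against sites that answer every path with a 200 "soft 404" page, which would show up here as a false positive.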

4. The AI Layer: Structured Output from Gemini

For the AI analysis, we use Google Gemini's native structured output support. This is critical — without it, parsing free-form JSON from an LLM is fragile and error-prone.

async function analyzeWithGemini(url: string) {
  const siteData = formatSiteData(url, await getSiteData(url));
  const response = await getAiClient().models.generateContent({
    model: "gemini-3-flash-preview",
    contents: ANALYZE_PROMPT(url, siteData),
    config: {
      temperature: 0,           // minimizes run-to-run variation
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        required: ["score", "breakdown", "missing", "suggestions", "summary"],
        properties: {
          score: { type: Type.NUMBER },
          breakdown: {
            type: Type.OBJECT,
            properties: {
              aiVisibility: { type: Type.NUMBER },
              faqCoverage: { type: Type.NUMBER },
              entityClarity: { type: Type.NUMBER },
              authority: { type: Type.NUMBER },
              contentStructure: { type: Type.NUMBER },
              summaryOptimization: { type: Type.NUMBER },
            },
          },
          missing: { type: Type.ARRAY, items: { type: Type.STRING } },
          suggestions: { type: Type.ARRAY, items: { type: Type.STRING } },
          summary: { type: Type.STRING },
        },
      },
    },
  });

  return JSON.parse(response.text || "{}");
}

Key design decisions here:

temperature: 0 - We want results that are as reproducible as possible. No creativity needed. Note that temperature 0 reduces run-to-run variation but does not guarantee identical output every time.
responseSchema - This tells Gemini exactly what JSON shape to return. Without this, you get parsing errors, missing fields, and inconsistent types.
We feed the actual scraped signals into the prompt. The AI doesn't guess — it evaluates what we found.
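Even with a response schema, JSON.parse(response.text || "{}") yields an empty object when the response text is empty, and that object could then be cached as a report. A defensive validation step before caching is one way to guard against this. The sketch below is an assumption about how that could look: parseReport, the GeoReport type, and the clamping of scores to 0-100 are not from the original code.

```typescript
interface GeoReport {
  score: number;
  breakdown: Record<string, number>;
  missing: string[];
  suggestions: string[];
  summary: string;
}

// Keeps scores inside the 0-100 range the prompt asks for.
const clampScore = (value: number): number =>
  Math.min(100, Math.max(0, Math.round(value)));

const isStringArray = (value: unknown): value is string[] =>
  Array.isArray(value) && value.every((item) => typeof item === "string");

// Returns a validated report, or null when the payload is unusable.
function parseReport(text: string | undefined): GeoReport | null {
  let raw: unknown;
  try {
    raw = JSON.parse(text || "{}");
  } catch {
    return null;
  }
  if (raw === null || typeof raw !== "object") return null;

  const { score, summary, missing, suggestions, breakdown } =
    raw as Record<string, unknown>;
  if (typeof score !== "number" || typeof summary !== "string") return null;
  if (!isStringArray(missing) || !isStringArray(suggestions)) return null;
  if (breakdown === null || typeof breakdown !== "object") return null;

  // Copy only numeric dimensions, clamped to the expected range.
  const rawBreakdown = breakdown as Record<string, unknown>;
  const cleanBreakdown: Record<string, number> = {};
  Object.keys(rawBreakdown).forEach((key) => {
    const value = rawBreakdown[key];
    if (typeof value === "number") cleanBreakdown[key] = clampScore(value);
  });

  return { score: clampScore(score), breakdown: cleanBreakdown, missing, suggestions, summary };
}
```

A caller would treat a null result as a failed analysis and skip the cacheSet call, so a bad response never gets served for the next hour.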

Here's the prompt template (lib/ai-provider.ts):

export const ANALYZE_PROMPT = (url: string, siteData?: string) =>
  `Analyze the AI visibility (GEO) of the website: ${url}.

${siteData ? `Here are the actual signals detected from the website:\n${siteData}\n\nBase your analysis on these real signals rather than guessing.` : ''}
Evaluate these factors specifically using the signals above:
- contentStructure (0-100): How well the content is structured for AI parsing...
- summaryOptimization (0-100): How optimized the page is for AI summarization...

Return ONLY a JSON object with these exact keys:
{ "score": <number 0-100>, "breakdown": { ... }, "missing": [...], "suggestions": [...], "summary": "..." }`;

We also support OpenAI and Deepseek as fallback providers, switched via the AI_PROVIDER_DEFAULT environment variable. The architecture makes adding new providers trivial — just implement the same function signature.
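To make the "same function signature" idea concrete, here is a minimal sketch of a provider registry keyed by name. The accepted values for AI_PROVIDER_DEFAULT ("gemini", "openai", "deepseek"), the fallback to Gemini, and the names providerRegistry and resolveProvider are assumptions for this example, and the stub functions stand in for the real API calls.

```typescript
type ProviderName = "gemini" | "openai" | "deepseek";
type AnalyzeFn = (url: string) => Promise<Record<string, unknown>>;

// Stub implementations; each real provider would call its own API here.
const providerRegistry: Record<ProviderName, AnalyzeFn> = {
  gemini: async (url) => ({ provider: "gemini", url }),
  openai: async (url) => ({ provider: "openai", url }),
  deepseek: async (url) => ({ provider: "deepseek", url }),
};

const isProviderName = (value: string): value is ProviderName =>
  Object.prototype.hasOwnProperty.call(providerRegistry, value);

// Reads AI_PROVIDER_DEFAULT from the given env object.
// Unknown or missing values fall back to Gemini (an assumed default).
function resolveProvider(env: Record<string, string | undefined>): ProviderName {
  const requested = (env.AI_PROVIDER_DEFAULT ?? "").trim().toLowerCase();
  return isProviderName(requested) ? requested : "gemini";
}
```

In a Node environment the call would be providerRegistry[resolveProvider(process.env)](url). Because the registry is typed as Record<ProviderName, AnalyzeFn>, adding a name to ProviderName without adding its function is a compile error, which is what keeps new providers cheap to add.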

5. Caching: In-Memory with Deduplication

Since every analysis hits an LLM API (costly) and scrapes a site (slow), caching is essential. We use a simple in-memory Map with 1-hour TTL:

interface CacheEntry<T> {
  data: T;
  expiresAt: number;
}

const store = new Map<string, CacheEntry<unknown>>();
const CLEAN_INTERVAL = 60_000;
let lastClean = 0;

function clean() {
  const now = Date.now();
  if (now - lastClean < CLEAN_INTERVAL) return;
  lastClean = now;
  for (const [key, entry] of store) {
    if (now > entry.expiresAt) store.delete(key);
  }
}

export function cacheGet<T>(key: string): T | null {
  clean();
  const entry = store.get(key);
  if (!entry || Date.now() > entry.expiresAt) return null;
  return entry.data as T;
}

export function cacheSet<T>(key: string, data: T, ttlMs: number): void {
  store.set(key, { data, expiresAt: Date.now() + ttlMs });
}

We also use a pending cache (pendingCache in analyze.ts) to deduplicate concurrent requests for the same URL — so if two users submit the same URL simultaneously, only one analysis runs:

const pendingCache = new Map<string, Promise<Record<string, unknown>>>();
// ...
if (pendingCache.has(cacheKey)) {
  return pendingCache.get(cacheKey)!;  // wait for in-flight request
}
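One detail worth getting right in this pattern is removing the in-flight entry once the promise settles. If the entry stays in the map, later callers keep receiving the old promise even after the TTL cache has expired, and a failed analysis is effectively cached forever. The sketch below isolates the pattern in a generic helper; dedupe and inFlight are hypothetical names, not part of the original code.

```typescript
const inFlight = new Map<string, Promise<unknown>>();

// Runs `work` at most once per key at a time; concurrent callers share one promise.
function dedupe<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  const promise = work();
  const cleanup = () => {
    inFlight.delete(key);
  };
  // Clear the entry on success and on failure, so errors are never reused.
  promise.then(cleanup, cleanup);
  inFlight.set(key, promise);
  return promise;
}
```

Callers still see the original promise, so a rejection propagates to every waiting request, but the next request after that starts a fresh analysis.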

For production at any real scale, you'd want Redis or another distributed cache. The in-memory approach only helps within a single warm instance: on serverless platforms such as Vercel, each function instance has its own Map, and the cache is lost whenever an instance is recycled.

6. Rate Limiting: Graceful Degradation

We use Upstash Redis for rate limiting with a sliding window. The critical design choice: fail open when Redis is unavailable, not fail closed.

let ratelimit: Ratelimit | null = null;

try {
  const redis = Redis.fromEnv();
  ratelimit = new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(max, "1 h"),
    analytics: true,
    prefix: "@citeflow/ratelimit",
  });
} catch {
  // Redis init failed — fall through, rate limiting is degraded
}

export async function checkRateLimit(ip: string): Promise<RateLimitResult> {
  if (!ratelimit) {
    return { success: true };  // allow request when Redis is down
  }
  try {
    const { success } = await ratelimit.limit(ip);
    return success
      ? { success: true }
      : { success: false, reason: 'rate_limited' };
  } catch {
    // Redis call failed at request time: fail open, same as the init-failure path
    return { success: true };
  }
}

Why fail open? Because the free-tier tool is meant to be accessible. Blocking all users because Redis is having a bad day is worse than temporarily bypassing rate limits for a few requests.
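The fail-open idea can be isolated from the Redis specifics. In the sketch below, Limiter is a hypothetical minimal interface modeled on the limit(ip) call used above, and the degraded flag and onDegraded callback are additions for this example so that a bypass is at least visible in monitoring.

```typescript
interface RateLimitResult {
  success: boolean;
  reason?: "rate_limited";
  degraded?: boolean;
}

// Hypothetical minimal limiter interface; a stand-in, not a library type.
interface Limiter {
  limit(identifier: string): Promise<{ success: boolean }>;
}

async function checkRateLimitFailOpen(
  limiter: Limiter | null,
  ip: string,
  onDegraded: (error: unknown) => void = () => {},
): Promise<RateLimitResult> {
  // No limiter configured: allow the request, but mark it as degraded.
  if (!limiter) return { success: true, degraded: true };
  try {
    const { success } = await limiter.limit(ip);
    return success
      ? { success: true }
      : { success: false, reason: "rate_limited" };
  } catch (error) {
    // Backend outage: let the request through and report it.
    onDegraded(error);
    return { success: true, degraded: true };
  }
}
```

The trade-off is that an outage temporarily removes the cost ceiling on LLM calls, so it is worth alerting on the degraded path rather than letting it pass silently.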

7. The Report Page: SSR with Edge OG Images

The report page at app/report/[domain]/page.tsx uses Server-Side Rendering (SSR) with maxDuration = 60, which raises the function timeout to 60 seconds on Vercel (the ceiling you can set depends on your plan). This is necessary because:

We need to scrape the target site (network I/O)
We need to call Gemini/OpenAI (API latency)
We need the full data before rendering

export const maxDuration = 60;

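export default async function ReportPage({
  params,
}: {
  params: Promise<{ domain: string }>;
}) {
  const { domain } = await params;
  // In Next.js 15, headers() is async and must be awaited
  const ip = getClientIp(await headers());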

  const result = await getReport(domain, ip);

  if (!result.ok) {
    // Render error states: rate_limited, timeout, failed
    return <ErrorState reason={result.reason} />;
  }

  return <ReportView data={result.data} />;
}

We also generate dynamic OG images per report using the Edge runtime:

// app/api/og/route.tsx — runs on Vercel Edge
export const runtime = 'edge';
export const dynamic = 'force-dynamic';

This means every report page has a unique social preview showing the domain and score — critical for shareability on X/Twitter and LinkedIn.

8. Lessons Learned

8.1 AI Hallucination Is Real — Always Ground It

The first version of our AI analysis didn't feed real scraped signals into the prompt. The AI made up plausible-sounding but completely wrong assessments. Always provide ground-truth data in the prompt and instruct the model to base its analysis on that data.

8.2 Structured Output > JSON Prompting

Before Gemini supported responseSchema, we used "output valid JSON only" in the prompt. It worked ~70% of the time. With structured output, it's ~99.9%. Use native structured output whenever your provider supports it.

8.3 Cache Aggressively, Deduplicate Concurrently

LLM API calls are expensive ($0.15–$3.00 per million tokens) and slow (2–5 seconds). An in-memory cache with request deduplication eliminated redundant calls within each running instance. When a single instance handles multiple concurrent invocations, as it can on Vercel, the pendingCache pattern is what keeps simultaneous requests for the same URL from each triggering their own analysis.

8.4 Deterministic Checks Catch What AI Misses

The AI often missed simple things like "no lists on the page" or "meta description is too short." These are trivial to detect with regex but easy for an LLM to gloss over. The hybrid approach catches both.

8.5 GEO Is Still Nascent — Standards Are Evolving

llms.txt is a proposed standard, not a W3C spec. FAQ Schema behavior in AI search changes monthly. Building this kind of tool means constantly iterating as the ecosystem evolves. We treat our signal detection as a pluggable layer that can be updated independently of the AI analysis.

The architecture described here is running in production at GetCiteFlow; feel free to test your own site and see how the analysis works end-to-end. The tech stack: Next.js 15 (App Router, SSR, Edge Functions), React 19 with Tailwind CSS 4 + shadcn/ui, Google Gemini for AI analysis (OpenAI/Deepseek fallbacks), Upstash Redis for rate limiting, deployed on Vercel.

GEO is still in its early days, much like SEO was in 1998. The sites that optimize for AI search today will have a compound advantage as AI assistants become the primary interface for information discovery.

If you're building something in this space or have questions about the architecture, I'd love to hear from you. Leave a comment below or reach out on X/Twitter.

Built by GetCiteFlow — AI visibility analysis for the AI-search era.
