Palash Bagchi

Posted on Jun 6 • Edited on Jun 25

The Anatomy of a Machine's Mind - Decoding AEO, GEO

#aeo #generative #ai #seo

We are moving away from traditional "10 blue links" (where Google ranks a document) to a semantic synthesis model (where Google extracts the factual payload and generates the answer directly via AI Overviews or Gemini).

To architect a dashboard for AEO and GEO, we must stop looking at keyword density and start looking at Entity Salience and RAG (Retrieval-Augmented Generation) compatibility.

Here is the architectural breakdown of the Google APIs required to track, test, and optimize for the Generative Search era, which I used while building the nqzai conversational GTM platform.

1. The Entity Truth Layer: Knowledge Graph Search API (AEO)

Answer Engine Optimization relies heavily on Google's Knowledge Graph. If Google doesn't recognize your brand, product, or author as a definitive "Entity," you will not appear in Knowledge Panels, nor will an LLM trust your brand as a source of truth.

The API: Google Knowledge Graph Search API

This API lets you query Google’s exact semantic database to see how it mathematically maps entities (people, places, organizations).

The Data Points:
resultScore: The algorithmic confidence Google has in the entity match.
@id (Machine-Readable Entity ID or MREID): The unique identifier (e.g., /m/0k8z) Google assigns to a recognized entity.
description / detailedDescription: The exact factual payload Google associates with that entity.
The Enrichment Play (Brand Authority): You can programmatically query your brand name or executive team names monthly. If your resultScore is increasing, your AEO efforts (digital PR, schema markup, Wikipedia/Wikidata editing) are working. If your brand returns no MREID, you are invisible to the Answer Engine.
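The monthly check described above can be sketched roughly as follows. This is a minimal sketch, assuming the public entities:search endpoint and a response that wraps hits in an itemListElement array. The helper names (buildKgSearchUrl, summarizeEntities) and the sample payload are illustrative and not part of the original pipeline.

```typescript
// One hit in a Knowledge Graph Search API response (only the fields used here).
interface KgEntityHit {
  result: {
    "@id"?: string;
    name?: string;
    "@type"?: string[];
    detailedDescription?: { articleBody?: string; url?: string };
  };
  resultScore?: number;
}

interface KgSearchResponse {
  itemListElement?: KgEntityHit[];
}

interface EntitySummary {
  id: string | null;
  name: string;
  score: number;
  hasDescription: boolean;
}

// Builds the request URL for the entities:search endpoint.
// apiKey is a placeholder: supply your own Google Cloud API key.
function buildKgSearchUrl(query: string, apiKey: string, limit = 5): string {
  const params = [
    `query=${encodeURIComponent(query)}`,
    `key=${encodeURIComponent(apiKey)}`,
    `limit=${limit}`,
  ];
  return `https://kgsearch.googleapis.com/v1/entities:search?${params.join("&")}`;
}

// Reduces a raw response to the data points worth tracking month over month.
function summarizeEntities(response: KgSearchResponse): EntitySummary[] {
  return (response.itemListElement ?? []).map((hit) => ({
    id: hit.result["@id"] ?? null,
    name: hit.result.name ?? "",
    score: hit.resultScore ?? 0,
    hasDescription: Boolean(hit.result.detailedDescription?.articleBody),
  }));
}

// Hypothetical sample payload, used here instead of a live call.
const sampleResponse: KgSearchResponse = {
  itemListElement: [
    {
      result: {
        "@id": "kg:/g/11bsled",
        name: "Kakunin",
        "@type": ["Organization"],
        detailedDescription: { articleBody: "Kakunin is an established organization." },
      },
      resultScore: 74375,
    },
  ],
};

console.log(summarizeEntities(sampleResponse));
```

In a real job you would fetch the URL from buildKgSearchUrl, parse the JSON body, and store the summary with a timestamp. An empty array means no entity was matched for that name.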

The Knowledge Graph is the right place to start. If you do not understand how Google mathematically defines reality, all downstream Answer Engine Optimization (AEO) efforts are essentially guessing.

When we talk about the Google Knowledge Graph Search API, we are no longer dealing with web pages, URLs, or HTML. We are dealing with Nodes (Entities) and Edges (Relationships).

Here is the microscopic breakdown of how Google categorizes, measures, and scores reality.

1. What are "People, Places, and Organizations"?

In traditional SEO, "Kakunin" is just a string of letters (a keyword). In the Knowledge Graph, an Entity is a fundamental unit of knowledge—a specific, identifiable thing.

Google does not use arbitrary labels to define these; it strictly adheres to the Schema.org vocabulary.

People: Mapped as schema.org/Person (e.g., Taylor Swift, or a company's CEO).
Organizations: Mapped as schema.org/Organization or sub-types like schema.org/LocalBusiness or schema.org/Corporation (e.g., Google, Kakunin).
Places: Mapped as schema.org/Place (e.g., Ranchi, Eiffel Tower).

The Practical Benchmark: When does a brand cross the threshold from being a "keyword" on a webpage to a recognized "Entity" in the Knowledge Graph?
The benchmark is reconciliation. Google’s Entity Reconciliation engine constantly scrapes the web. When it finds enough corroborating "Semantic Triples" (Subject-Predicate-Object data points, like Kakunin -> is a -> SoftwarePlatform), it clusters that data together. You have practically achieved Entity status the moment Google mints a unique machine identifier for you in its database.

2. Expanding the 3 Core Data Points

When you query the Knowledge Graph API, it returns a JSON-LD payload. Here is what those specific data points actually mean and the signals that drive them.

A. `@id` (The Machine-Readable Entity ID or MREID)

This is the canonical database key for the entity. It is the most important data point in AEO.

What it looks like: It typically starts with kg:/m/ (e.g., kg:/m/0dl567) or kg:/g/.
The Heritage: The /m/ prefix stands for "Machine ID" and is a legacy identifier inherited from Freebase, the open knowledge base built by Metaweb, which Google acquired in 2010 and used to seed its Knowledge Graph. Newer entities created directly by Google's ML systems often get a /g/ prefix.
The Signal: You cannot manually edit or request this ID. It is minted automatically when Google's reconciliation algorithms determine that enough structured data (Schema.org), Wikipedia mentions, and high-authority backlinks all point to the exact same distinct concept.
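If you store these IDs in a dashboard, it can help to label which family each one belongs to. The following is a small sketch under the prefix conventions described above; classifyEntityId and its return labels are hypothetical names chosen for illustration.

```typescript
type EntityIdKind = "freebase" | "google" | "unknown";

// Classifies a Knowledge Graph @id by its prefix.
// Accepts both the namespaced form ("kg:/m/...") and the bare form ("/m/...").
function classifyEntityId(id: string): EntityIdKind {
  const bare = id.startsWith("kg:") ? id.slice(3) : id;
  if (bare.startsWith("/m/")) return "freebase"; // legacy Freebase-era ID
  if (bare.startsWith("/g/")) return "google";   // newer Google-minted ID
  return "unknown";
}

console.log(classifyEntityId("kg:/m/0dl567"));
console.log(classifyEntityId("kg:/g/11bsled"));
```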

B. `detailedDescription` (The Factual Payload)

This is the text that an Answer Engine (like Gemini or AI Overviews) will read as the absolute, verified truth about your entity.

How it is measured: Google does not write this text. It extracts it from what it considers "Tier 1 Trust Data Sources."
The Signals: The overwhelming majority of detailed descriptions are pulled verbatim from Wikipedia or Wikidata.org. For organizations, it frequently pulls from licensed databases like Crunchbase or Bloomberg. If you do not have a presence in these highly moderated, public data repositories, your detailedDescription will be completely empty, giving Answer Engines zero factual payload to pull from.

C. `resultScore` (The Salience & Probability Metric)

This is not a static "authority score" like Domain Rating (DR). It is a dynamic numerical value that describes how perfectly an entity matches the search query context.

How it is measured: The score ranges from a few decimals to tens of thousands. For example, if you query "Brad Pitt", the American actor might return a resultScore of 30,000, while a lesser-known Australian boxer named Brad Pitt might return a score of 200.
The Signals that drive the math:
Search Volume (Entity Popularity): High aggregate search volume for the entity name increases its baseline score.
Graph Density (Backlinks for Entities): Just like webpages have backlinks, entities have graph links. If Google maps that the entity "Bill Gates" is connected to "Microsoft", "Melinda Gates", and "Harvard", his graph density is massive, spiking his resultScore.
Co-occurrence: If your organization's name frequently appears in the same paragraph as established industry terms (e.g., Kakunin appearing next to "MiCA regulation" and "AI Compliance" across high-authority news sites), the ML engine raises your probability score for those semantic queries.

Simulating the Knowledge Graph

To visualize how these signals compound to push a brand from a mere "keyword" to a fully reconciled Entity with a high resultScore, here is a simulated Knowledge Graph API response for a fully reconciled entity. The values are illustrative and do not come from a live query.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "goog": "http://schema.googleapis.com/",
    "detailedDescription": "goog:detailedDescription",
    "resultScore": "goog:resultScore"
  },
  "@type": "EntitySearchResult",
  "result": {
    "@id": "kg:/g/11bsled",
    "name": "Kakunin",
    "@type": ["Organization"],
    "detailedDescription": {
      "articleBody": "Kakunin is an established organization recognized by global semantic authorities.",
      "url": "https://en.wikipedia.org/wiki/Kakunin",
      "license": "https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License"
    }
  },
  "resultScore": 74375
}

2. The Semantic Vector Layer: Google Cloud Natural Language API (GEO)

Generative Engine Optimization (GEO) requires your content to be easily parsed by Large Language Models (LLMs). LLMs do not read "keywords"; they calculate the mathematical relationship between words.

To optimize for AI Overviews (formerly SGE), you must feed your content into the same Natural Language Processing (NLP) engines Google uses to train its models.

The API: Google Cloud Natural Language API

This API exposes Google's internal machine learning models for syntax analysis, entity extraction, and sentiment analysis.

The Data Points:
entities: What nouns/concepts Google extracts from your text.
salience: A critical metric (ranging from 0.0 to 1.0) indicating the importance or centrality of an entity to the entire document text.
sentiment.score & sentiment.magnitude: How positive, negative, or neutral the text is.
The Enrichment Play (The Salience Audit): Before publishing a high-value SaaS landing page, pass the text through the NLP API. If your target product feature has a salience score of 0.12, but a competitor's integration mentioned off-hand has a score of 0.85, the LLM will completely misunderstand the core topic of your page. You must rewrite the syntax—using clearer subject-verb-object structures—until your core product hits a salience score above 0.70.
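The salience audit above can be reduced to a small check over the entities array. This is a sketch, assuming each entity exposes a name and a salience value as in the analyzeEntities response. The 0.70 threshold comes from the text above, while auditSalience and the sample values are illustrative.

```typescript
interface ExtractedEntity {
  name: string;
  salience: number; // 0.0 to 1.0
}

interface SalienceAudit {
  targetSalience: number;
  topEntity: string | null;
  passes: boolean;
}

// Compares the target entity's salience against a threshold and reports
// which entity the document is actually "about" according to the scores.
function auditSalience(
  entities: ExtractedEntity[],
  target: string,
  threshold = 0.7
): SalienceAudit {
  const wanted = target.toLowerCase();
  let targetSalience = 0;
  let top: ExtractedEntity | null = null;
  for (const entity of entities) {
    if (entity.name.toLowerCase() === wanted) {
      targetSalience = Math.max(targetSalience, entity.salience);
    }
    if (top === null || entity.salience > top.salience) {
      top = entity;
    }
  }
  return {
    targetSalience,
    topEntity: top ? top.name : null,
    passes: targetSalience >= threshold,
  };
}

// Hypothetical scores mirroring the example in the text.
const pageEntities: ExtractedEntity[] = [
  { name: "Kakunin", salience: 0.12 },
  { name: "Competitor Integration", salience: 0.85 },
];

console.log(auditSalience(pageEntities, "Kakunin"));
```

A failing audit with a different topEntity is the signal to rewrite the page so that the product is the grammatical subject of its key sentences.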

If Answer Engine Optimization (AEO) is about getting Google to recognize your existence as a factual "Entity" (via the Knowledge Graph), Generative Engine Optimization (GEO) is about controlling how an LLM reads, fragments, and scores your content.

Large Language Models (like Gemini or the models powering AI Overviews) do not read pages top-to-bottom like humans, nor do they count keyword frequencies like legacy Googlebot. They convert text into Semantic Vectors—lists of numbers representing the mathematical distance between concepts.

To master the GEO layer using the Google Cloud Natural Language API, there are three critical sub-engines you must understand, as they directly dictate whether your content is "RAG-friendly" (Retrieval-Augmented Generation).

1. Syntax Analysis (The Dependency Tree)

In traditional SEO, you could put the keyword "MiCA AI Compliance" at the top of the page, write 500 words of fluff, and still rank. In GEO, that will completely fail.

The Natural Language API features an analyzeSyntax method that generates a Dependency Parse Tree. It breaks every sentence into tokens (words) and maps the exact grammatical relationship between them (e.g., this noun is the subject, this verb is the root action, this adjective modifies the object).

How it works: It maps the "distance" between the Subject, the Verb, and the Object.
The GEO Benchmark (Subject-Verb Distance): LLMs rely on "attention mechanisms." If you write convoluted sentences using the passive voice (e.g., "Compliance with MiCA regulations, which are becoming increasingly strict across the European Union, is easily handled by Kakunin"), the mathematical distance between the entity ("Kakunin") and the action ("handles MiCA compliance") is too wide. The dependency tree breaks, and the LLM will drop the fact.
The Fix: You must write in dense, active, Subject-Verb-Object structures. ("Kakunin automates MiCA AI compliance.") This creates a tight dependency tree, resulting in a dense semantic vector that the LLM can easily extract and cite.
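One way to put a number on subject-verb distance is to walk the tokens returned by analyzeSyntax and measure how far each grammatical subject sits from its head. The sketch below assumes tokens carry text.content plus a dependencyEdge with headTokenIndex and label, as in the API response. The sample tokens are hand-labeled for illustration, not real API output, and subjectHeadDistances is a hypothetical helper.

```typescript
interface SyntaxToken {
  text: { content: string };
  dependencyEdge: { headTokenIndex: number; label: string };
}

interface SubjectLink {
  subject: string;
  head: string;
  distance: number; // number of token positions between subject and head
  passive: boolean; // true when the subject is a passive subject (NSUBJPASS)
}

// Finds every subject token and measures its token distance to its head.
function subjectHeadDistances(tokens: SyntaxToken[]): SubjectLink[] {
  const links: SubjectLink[] = [];
  tokens.forEach((token, index) => {
    const { label, headTokenIndex } = token.dependencyEdge;
    if (label !== "NSUBJ" && label !== "NSUBJPASS") return;
    const head = tokens[headTokenIndex];
    if (!head) return;
    links.push({
      subject: token.text.content,
      head: head.text.content,
      distance: Math.abs(headTokenIndex - index),
      passive: label === "NSUBJPASS",
    });
  });
  return links;
}

// Hand-labeled tokens for "Kakunin automates MiCA AI compliance."
const sampleTokens: SyntaxToken[] = [
  { text: { content: "Kakunin" }, dependencyEdge: { headTokenIndex: 1, label: "NSUBJ" } },
  { text: { content: "automates" }, dependencyEdge: { headTokenIndex: 1, label: "ROOT" } },
  { text: { content: "MiCA" }, dependencyEdge: { headTokenIndex: 4, label: "NN" } },
  { text: { content: "AI" }, dependencyEdge: { headTokenIndex: 4, label: "NN" } },
  { text: { content: "compliance" }, dependencyEdge: { headTokenIndex: 1, label: "DOBJ" } },
  { text: { content: "." }, dependencyEdge: { headTokenIndex: 1, label: "P" } },
];

console.log(subjectHeadDistances(sampleTokens));
```

What counts as "too wide" is a judgment call. A reasonable starting point is to compare the distances on a failing page with those on pages that already get cited.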

2. Entity Sentiment Analysis (Contextual Bias)

We briefly touched on salience (how important a word is to the page), but the API also exposes Entity Sentiment Analysis. This does not just measure if an article is generally "happy" or "sad"; it measures the exact emotional polarity attached to a specific entity within the text.

The Data Points:
score: Ranges from -1.0 (extremely negative) to 1.0 (extremely positive).
magnitude: Indicates the sheer volume of emotion, regardless of whether it's positive or negative (ranging from 0.0 to +inf).
The GEO Application (Competitor Conquesting): When users ask Gemini, "Which is better for AI governance, Kakunin or [Competitor]?", the engine doesn't just read feature lists. It aggregates the Entity Sentiment of both brands across the web. If your competitor has a higher positive score globally connected to the entity "AI governance", the LLM will confidently recommend them over you.
The Fix: When writing comparison pages (e.g., "Kakunin vs. Competitor X"), if you use overly aggressive, negative language against the competitor, the API will attach a high magnitude of negative score to that paragraph. Because LLMs are strictly programmed with safety filters to avoid generating toxic or highly biased text, they will often refuse to cite your comparison page entirely. Your competitive content must be structurally objective and emotionally neutral (score near 0.0) to be cited by an Answer Engine.
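A neutrality check for comparison pages could look like the sketch below. It assumes entities shaped like the analyzeEntitySentiment response (a name plus a sentiment with score and magnitude). The 0.25 cutoff is an arbitrary assumption for illustration, and flagBiasedEntities is a hypothetical helper.

```typescript
interface EntitySentiment {
  name: string;
  sentiment: { score: number; magnitude: number };
}

// Returns the names of entities whose sentiment score strays too far from neutral.
function flagBiasedEntities(
  entities: EntitySentiment[],
  maxAbsScore = 0.25
): string[] {
  return entities
    .filter((entity) => Math.abs(entity.sentiment.score) > maxAbsScore)
    .map((entity) => entity.name);
}

// Hypothetical scores for a "Kakunin vs. Competitor X" paragraph.
const comparisonEntities: EntitySentiment[] = [
  { name: "Kakunin", sentiment: { score: 0.1, magnitude: 0.4 } },
  { name: "Competitor X", sentiment: { score: -0.8, magnitude: 2.1 } },
];

console.log(flagBiasedEntities(comparisonEntities));
```

Any name in the returned list marks a passage worth rewriting in more neutral terms before publishing.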

3. Content Classification v2 (The Taxonomy Filter)

LLMs have a limited "context window" (how much data they can process at once). To save computing power, before Google feeds a webpage to an LLM to generate an AI Overview, it filters the web using strict taxonomic categories.

The Natural Language API’s classifyText method maps your content against a hardcoded database of over 1,000 specific categories.

The Data Point: Returns categories like /Computers & Electronics/Enterprise Technology/Data Management alongside a confidence score (0.0 to 1.0).
The GEO Application (Topical Siloing): If you publish a blog post titled "The Cost of AI Non-Compliance," you want it classified under /Business & Industrial/Business Services/Consulting or /Law & Government/Legal. However, if your marketing team filled the post with metaphors about "crashing cars" or "paying expensive speeding tickets," the NLP engine might classify the page under /Autos & Vehicles.
The Result: When a user queries a legal/enterprise tech question, the retrieval engine will completely ignore your page because its Content Classification tag is mathematically mapped to the wrong industry taxonomy.
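To catch a misclassified post before it ships, you can compare the top category against the silos you expect. This sketch assumes the classifyText result is a list of categories with a name and a confidence. checkTaxonomy and the sample categories are illustrative.

```typescript
interface ClassifiedCategory {
  name: string;       // e.g. "/Law & Government/Legal"
  confidence: number; // 0.0 to 1.0
}

interface TaxonomyCheck {
  topCategory: string | null;
  onTarget: boolean;
}

// Picks the highest-confidence category and checks it against expected prefixes.
function checkTaxonomy(
  categories: ClassifiedCategory[],
  expectedPrefixes: string[]
): TaxonomyCheck {
  let top: ClassifiedCategory | null = null;
  for (const category of categories) {
    if (top === null || category.confidence > top.confidence) {
      top = category;
    }
  }
  if (top === null) {
    return { topCategory: null, onTarget: false };
  }
  const topName = top.name;
  return {
    topCategory: topName,
    onTarget: expectedPrefixes.some((prefix) => topName.startsWith(prefix)),
  };
}

// Hypothetical output for a post full of driving metaphors.
const postCategories: ClassifiedCategory[] = [
  { name: "/Autos & Vehicles", confidence: 0.71 },
  { name: "/Law & Government/Legal", confidence: 0.38 },
];

console.log(checkTaxonomy(postCategories, ["/Law & Government", "/Business & Industrial"]));
```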

The Ultimate GEO Metric: "RAG-Readiness" (Content Chunking)

When you look at this API as a whole, it reveals how you must re-architect your landing pages.

Because LLMs extract data via RAG, they do not ingest your whole webpage. They ingest Semantic Chunks (usually a single <H2> header and the 1–2 paragraphs immediately below it).

If you pass a webpage through the Natural Language API, the API reads it linearly.

Suppose your <H2> is a clever marketing pun (e.g., "Stop Flying Blind") and the paragraph beneath it is just a bulleted list of features.
The API output: Salience is low, Syntax is broken (no verbs), and Entities are missing.

To achieve GEO dominance, every single section of your page must be a self-contained factual payload:

The Header: Must contain the primary Entity and the Question/Intent (e.g., <H2>How Kakunin Ensures MiCA Compliance</H2>).
The First Sentence: Must be a direct, Subject-Verb-Object answer with a neutral sentiment.
The Formatting: Must use HTML tables (<table>) or structured lists (<ul>) directly underneath the active sentence. LLMs assign incredibly high retrieval weight to HTML tables because the rows and columns already act as a pre-built relational database, requiring zero NLP guesswork.
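To audit a page section by section, you first need to split it into chunks. The sketch below is a deliberately crude, regex-based splitter that pairs each h2 with everything up to the next h2. It assumes reasonably clean HTML (a real pipeline would use an HTML parser), and chunkByH2 and stripTags are hypothetical helpers.

```typescript
interface SemanticChunk {
  heading: string;
  body: string;
}

// Removes tags and collapses whitespace so the text can be sent to an NLP API.
function stripTags(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// Splits an HTML document into one chunk per <h2>, each holding the heading
// text and the plain text that follows it up to the next <h2>.
function chunkByH2(html: string): SemanticChunk[] {
  const chunks: SemanticChunk[] = [];
  const pattern = /<h2[^>]*>([\s\S]*?)<\/h2>([\s\S]*?)(?=<h2[^>]*>|$)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    chunks.push({
      heading: stripTags(match[1] ?? ""),
      body: stripTags(match[2] ?? ""),
    });
  }
  return chunks;
}

const samplePage =
  "<h1>Platform</h1>" +
  "<h2>How Kakunin Ensures MiCA Compliance</h2>" +
  "<p>Kakunin automates MiCA AI compliance.</p>" +
  "<h2>Pricing</h2><ul><li>Starter</li></ul>";

console.log(chunkByH2(samplePage));
```

Each chunk can then be scored on its own, which mirrors the idea that every section must stand alone as a factual payload.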

If you optimize for the Natural Language API's Dependency Tree and Content Classifications rather than just "keyword volume," you ensure that when Gemini looks for a factual chunk to fulfill an enterprise search query, your data is the easiest mathematical vector for it to grab.

3. The Visibility Layer: Google Search Console API (The Blindspot)

Here is where the transition from traditional SEO to GEO gets messy. Google is currently injecting AI Overviews at the top of the SERP, but they are highly secretive about the analytics.

The Data Points: You still use the standard GSC API (impressions, clicks, ctr), but with a specific focus on the searchAppearance dimension.
The Enrichment Play (Structured Data): You can filter GSC data by searchAppearance types like FAQ, HOW_TO, or PRODUCT_SNIPPETS. Because AEO heavily relies on Schema.org markup to spoon-feed facts to Google, correlating a rise in these specific rich results with overall CTR is your best proxy for AEO success.
The "Gotcha" (The AI Overview Black Box): Currently, Google does not provide a searchAppearance filter for "AI Overviews" in GSC. If your site is cited as a source in an AI Overview, the clicks and impressions are simply lumped into standard web traffic.

Measurement here works like traditional SERP/SEO reporting because it uses the exact same API endpoints. However, the way you must interpret the math is entirely inverted.

If you had asked me this last month, I would have told you that AI Overviews were a complete black box. Now we are looking at a massive, real-time architectural shift. On June 3, Google officially rolled out dedicated "Generative AI Performance" reporting inside Google Search Console.

The blindspot is officially lifting. Here is the fine-grained breakdown of how the Search Console API handles the Generative Engine Optimization (GEO) layer, and why the metrics mean something completely different now.

1. The New API Filter: Isolating the Machine

In traditional SEO, you query the GSC API with the searchType set to WEB (to see standard blue-link traffic) or IMAGE.

Now, the API is being updated to accept new Search Type filters specifically for AI Overviews and AI Mode. This allows your data pipeline to completely decouple standard human search behavior from machine-synthesized answers.

When you pass this new filter to the API, it returns the standard four metrics (impressions, clicks, ctr, position), but their definitions have radically mutated.

2. The Metric Mutation (Traditional vs. GEO)

To understand GEO, you have to abandon the traditional SEO dopamine hit of chasing "clicks." Here is how the math changes when you filter the API for AI Overviews:

Metric	Traditional Web Search	Generative AI Search (AI Overviews)
Impression	A user scrolled past your blue link on the page.	An LLM successfully extracted your data, synthesized it, and cited your URL.
Click	The user chose your link over a competitor's.	The user needed deep technical validation and clicked your citation card.
CTR	Standard range is 3% to 15%.	Standard range drops to 0.5% to 3% because the LLM answers the intent inline.
Position	Classical ranking list (1 through 10).	Binary variable. You are either embedded in the synthesis block (Position 0) or omitted.

3. The Fine-Grained Architecture of a GEO Impression

In the AI Overview context, an "Impression" is the ultimate victory metric. It proves that the Natural Language API (which we discussed earlier) successfully parsed your Semantic Vectors, and the Knowledge Graph recognized your Entity Salience.

When your dashboard logs an AI Impression for a query like "MiCA AI Compliance architecture," it means:

Google's LLM fan-out technique triggered.
The model scanned the internet for factual payloads.
It deemed your specific semantic chunk (your <h2> and active-voice paragraph) as the highest-trust, most mathematically relevant data point available.
It rendered your data and cited your brand on the user's screen.

4. Building the GEO Feedback Loop

Because the GSC API now separates this data, you can programmatically track the success of your GEO structural edits.

Here is what that automated workflow looks like:

Extract: Every week, your backend queries the GSC API twice for the exact same date range. Query A uses searchType: WEB. Query B uses the new searchType: AI_OVERVIEWS.
Transform: You join the data by query and page.
Analyze: You isolate the landing pages where WEB impressions are high, but AI_OVERVIEWS impressions are zero.
Action: Those zero-AI pages are failing the machine-readability test. You must pass their text through the Natural Language API to fix the syntax dependency trees, convert bullet points to HTML tables, and tighten the subject-verb distance.
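The Transform and Analyze steps can be sketched as a join on a composite key. This assumes both queries were run with dimensions query and page, so each row's keys array is [query, page]. The 100-impression floor and the name findAiGaps are assumptions for illustration.

```typescript
interface GscRow {
  keys: string[]; // [query, page] when dimensions are ['query', 'page']
  impressions: number;
}

interface AiGap {
  query: string;
  page: string;
  webImpressions: number;
}

// Joins web rows to AI rows on (query, page) and returns the pairs that get
// meaningful web impressions but no AI impressions at all.
function findAiGaps(
  webRows: GscRow[],
  aiRows: GscRow[],
  minWebImpressions = 100
): AiGap[] {
  // A NUL separator cannot appear in queries or URLs, so keys never collide.
  const keyOf = (row: GscRow): string => row.keys.join("\u0000");

  const aiImpressions = new Map<string, number>();
  for (const row of aiRows) {
    aiImpressions.set(keyOf(row), row.impressions);
  }

  const gaps: AiGap[] = [];
  for (const row of webRows) {
    const ai = aiImpressions.get(keyOf(row)) ?? 0;
    if (row.impressions > minWebImpressions && ai === 0) {
      gaps.push({
        query: row.keys[0] ?? "",
        page: row.keys[1] ?? "",
        webImpressions: row.impressions,
      });
    }
  }
  return gaps;
}

// Hypothetical rows standing in for two Search Console responses.
const webRows: GscRow[] = [
  { keys: ["mica compliance", "/platform"], impressions: 500 },
  { keys: ["ai governance", "/governance"], impressions: 300 },
  { keys: ["kakunin login", "/login"], impressions: 50 },
];
const aiRows: GscRow[] = [
  { keys: ["ai governance", "/governance"], impressions: 40 },
];

console.log(findAiGaps(webRows, aiRows));
```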

The goal of AEO and GEO is no longer to drive massive top-of-funnel traffic to your site. The goal is Narrative Control. If you dominate the AI Overview impressions, you control what the machine tells the world about your industry, even if the user never clicks through to your domain.

Migrating this pipeline from a low-code orchestrator (Make.com) into a native Next.js architecture is how you productionize it for a SaaS environment or a high-performance internal tool.

By building this in Next.js, you eliminate the Airtable dependency, reduce API latency, and gain the ability to render the data in a minimalist, high-contrast dashboard (using a shadcn/ui and Tailwind aesthetic).

Here is the complete full-stack architecture for a Generative Engine Optimization (GEO) God-Mode Dashboard.

1. The Backend: The GEO API Route

Instead of relying on Make.com, we will build a Next.js Route Handler (app/api/geo-audit/route.ts). This single endpoint acts as the orchestrator: it queries the Google Search Console API twice, merges the arrays, scrapes the failing URLs, and runs the semantic chunks through the Google Cloud Natural Language API.

First, install the required server dependencies:

npm install googleapis @google-cloud/language cheerio

Create the API route: app/api/geo-audit/route.ts

import { google } from 'googleapis';
import language from '@google-cloud/language';
import * as cheerio from 'cheerio';
import { NextResponse } from 'next/server';

// Initialize Google NLP Client
const nlpClient = new language.LanguageServiceClient();

// Initialize Google Search Console Client
const auth = new google.auth.GoogleAuth({
  scopes: ['https://www.googleapis.com/auth/webmasters.readonly'],
});
const searchconsole = google.searchconsole({ version: 'v1', auth });

const SITE_URL = 'https://kakunin.io'; // Replace with your verified GSC property
const TARGET_ENTITY = 'Kakunin';

export async function GET() {
  try {
    const sevenDaysAgo = new Date();
    sevenDaysAgo.setDate(sevenDaysAgo.getDate() - 7);
    const startDate = sevenDaysAgo.toISOString().split('T')[0];
    const endDate = new Date().toISOString().split('T')[0];

    // 1. Fetch Standard Web Traffic
    const webRes = await searchconsole.searchanalytics.query({
      siteUrl: SITE_URL,
      requestBody: {
        startDate,
        endDate,
        dimensions: ['page'],
        searchType: 'web',
        rowLimit: 1000,
      },
    });

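    // 2. Fetch AI Overview Traffic via a searchAppearance filter.
    // NOTE: 'AI_OVERVIEWS' is an assumed filter value; confirm the exact name
    // against the current Search Console API documentation before relying on it.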
    const aiRes = await searchconsole.searchanalytics.query({
      siteUrl: SITE_URL,
      requestBody: {
        startDate,
        endDate,
        dimensions: ['page'],
        dimensionFilterGroups: [{
          filters: [{ dimension: 'searchAppearance', operator: 'equals', expression: 'AI_OVERVIEWS' }]
        }],
        rowLimit: 1000,
      },
    });

    // 3. Merge the Data (The URL is the Join Key)
    const webData = webRes.data.rows || [];
    const aiData = aiRes.data.rows || [];

    const aiMap = new Map(aiData.map(row => [row.keys![0], row.impressions || 0]));

    const mergedData = webData.map(row => {
      const url = row.keys![0];
      const webImpressions = row.impressions || 0;
      const aiImpressions = aiMap.get(url) || 0;
      const captureRate = webImpressions > 0 ? (aiImpressions / webImpressions) : 0;

      return { url, webImpressions, aiImpressions, captureRate };
    });

    // 4. Filter for Failing Pages (High human traffic, 0 machine traffic)
    const failingPages = mergedData.filter(page => page.webImpressions > 100 && page.aiImpressions === 0);
    const auditResults: Array<{
      url: string;
      webImpressions: number;
      status: 'Optimized' | 'Failing';
      failingChunk: string | null;
      recommendation: string | null;
    }> = [];

    // 5. Scrape & NLP Audit the Failing Pages
    for (const page of failingPages) {
      try {
        const response = await fetch(page.url, {
          headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) GEO-Auditor/1.0' }
        });
        const html = await response.text();
        const $ = cheerio.load(html);

        let isOptimized = true;
        let failingChunk = '';
        let recommendation = '';

        // Extract the primary H2 and paragraph block to test the RAG chunk
        const h2Text = $('h2').first().text().trim();
        const pText = $('h2').first().next('p').text().trim();

        if (h2Text && pText) {
          const chunkText = `${h2Text}. ${pText}`;
          failingChunk = chunkText;

          // Execute live calls to Google Cloud NLP
          const [entityRes] = await nlpClient.analyzeEntities({
            document: { content: chunkText, type: 'PLAIN_TEXT' }
          });
          const [syntaxRes] = await nlpClient.analyzeSyntax({
            document: { content: chunkText, type: 'PLAIN_TEXT' },
            encodingType: 'UTF8'
          });

          // Track if our target entity exists and has sufficient salience
          const targetEntityObj = entityRes.entities?.find(e => e.name?.toLowerCase() === TARGET_ENTITY.toLowerCase());
          const salience = targetEntityObj?.salience ?? 0;

          // Track if there are structural syntax issues (here: a passive subject in the chunk)
          const hasPassiveVoice = syntaxRes.tokens?.some(t => t.dependencyEdge?.label === 'NSUBJPASS');

          if (!targetEntityObj || salience < 0.4) {
            isOptimized = false;
            recommendation = `Entity '${TARGET_ENTITY}' salience is too low (${salience}). Rewrite the chunk to make your brand the active subject.`;
          } else if (hasPassiveVoice) {
            isOptimized = false;
            recommendation = "Passive voice syntax detected (NSUBJPASS). Convert your sentence structures to direct active voice.";
          }
        } else {
          isOptimized = false;
          recommendation = "Missing semantic HTML structure. Ensure your landing pages use explicit H2 tags followed by paragraph text.";
        }

        auditResults.push({
          url: page.url,
          webImpressions: page.webImpressions,
          status: isOptimized ? 'Optimized' : 'Failing',
          failingChunk: isOptimized ? null : failingChunk,
          recommendation: isOptimized ? null : recommendation
        });

      } catch (e) {
        // A single unreachable page should not abort the whole audit
        console.error(`Failed to execute native cloud audit for: ${page.url}`, e);
      }
    }

    // 6. Return the audit in the shape the dashboard expects ({ success, data })
    return NextResponse.json({ success: true, data: auditResults });
  } catch (error) {
    console.error('GEO audit failed', error);
    return NextResponse.json({ success: false, error: 'GEO audit failed' }, { status: 500 });
  }
}

2. The Frontend: The Stripe-Inspired Dashboard

To maintain a high-contrast, minimalist, and professional aesthetic, we will build a client component that fetches the API route and renders the data using standard utility classes that mimic shadcn/ui structures (clean white cards, subtle gray borders, and strict typography).

Create the dashboard page: app/dashboard/geo/page.tsx

'use client';

import { useEffect, useState } from 'react';

// Define the shape of our API response
interface AuditResult {
  url: string;
  webImpressions: number;
  status: 'Optimized' | 'Failing';
  failingChunk: string | null;
  recommendation: string | null;
}

export default function GeoDashboard() {
  const [data, setData] = useState<AuditResult[]>([]);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
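    async function fetchAudit() {
      try {
        const res = await fetch('/api/geo-audit');
        const json = await res.json();
        if (json.success) {
          setData(json.data);
        }
      } catch (err) {
        console.error('Failed to load GEO audit', err);
      } finally {
        // Always clear the loading state, even if the request fails
        setLoading(false);
      }
    }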
    fetchAudit();
  }, []);

  return (
    <div className="min-h-screen bg-neutral-50 text-slate-900 p-8 font-sans">
      <div className="max-w-6xl mx-auto">

        {/* Header Section */}
        <header className="mb-10">
          <h1 className="text-3xl font-semibold tracking-tight">Generative Engine Optimization</h1>
          <p className="text-slate-500 mt-2">
            Monitoring the semantic vectors and AI Overview capture rates of your highest-traffic pages.
          </p>
        </header>

        {/* Dashboard Card */}
        <div className="bg-white border border-slate-200 rounded-xl shadow-sm overflow-hidden">
          {loading ? (
            <div className="p-12 text-center text-slate-400">Running NLP Semantic Vector Audit...</div>
          ) : (
            <table className="w-full text-left border-collapse">
              <thead>
                <tr className="border-b border-slate-100 bg-slate-50/50 text-sm font-medium text-slate-500">
                  <th className="p-4 pl-6">Landing Page URL</th>
                  <th className="p-4">Web Impressions</th>
                  <th className="p-4">GEO Status</th>
                </tr>
              </thead>
              <tbody className="divide-y divide-slate-100 text-sm">
                {data.map((row, index) => (
                  <tr key={index} className="hover:bg-slate-50 transition-colors">
                    <td className="p-4 pl-6 align-top">
                      <a href={row.url} className="font-medium text-indigo-600 hover:underline">
                        {row.url.replace('https://kakunin.io', '')}
                      </a>

                      {/* Inline Error Reporting for Failing Pages */}
                      {row.status === 'Failing' && row.failingChunk && (
                        <div className="mt-3 p-3 bg-red-50 border border-red-100 rounded-md">
                          <p className="text-xs font-semibold text-red-800 mb-1">Syntactic Breakdown:</p>
                          <p className="text-xs text-red-600 italic">"{row.failingChunk}"</p>
                          <p className="text-xs text-slate-600 mt-2 font-medium">↳ Action: {row.recommendation}</p>
                        </div>
                      )}
                    </td>
                    <td className="p-4 align-top tabular-nums text-slate-600">
                      {row.webImpressions.toLocaleString()}
                    </td>
                    <td className="p-4 align-top">
                      <span className={`inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium ${
                        row.status === 'Optimized' 
                          ? 'bg-emerald-100 text-emerald-800' 
                          : 'bg-red-100 text-red-800'
                      }`}>
                        {row.status}
                      </span>
                    </td>
                  </tr>
                ))}
                {data.length === 0 && (
                  <tr>
                    <td colSpan={3} className="p-8 text-center text-slate-500">
                      All high-traffic pages are passing the AI capture threshold.
                    </td>
                  </tr>
                )}
              </tbody>
            </table>
          )}
        </div>

      </div>
    </div>
  );
}

Architectural Deployment Notes

Vercel Cron Jobs: Instead of loading this dynamically every time you visit the page (which will eventually hit Google API rate limits), you should move the logic in the GET route into a background job using Vercel Cron Jobs, which are declared in the crons array of vercel.json. The cron job runs every Monday at 2:00 AM UTC (schedule 0 2 * * 1), writes the JSON payload to a lightweight Vercel Postgres or Redis instance, and the React component simply renders the cached database payload.
Cheerio vs. Puppeteer: The current API route uses Cheerio because it is lightning-fast and runs easily in a Next.js serverless environment. However, if your application is heavily client-side rendered (meaning the HTML is mostly empty until React mounts), Cheerio will fail to find the <h2> tags. In that case, you would need to swap Cheerio for a headless browser instance using puppeteer-core.

This Next.js architecture completely controls your data pipeline. It identifies the gap between your human traffic and machine-readability, audits the syntax exactly as Gemini would, and presents the workflow in a strictly professional interface.

4. The Testing Engine: Google Gemini API / Vertex AI (Simulation)

Because GSC obscures AI Overview data, the only way to truly test your GEO strategy is to build a synthetic testing environment. You must use an LLM to read your live site and see what it concludes.

The API: Gemini API (via Google AI Studio or Vertex AI)

Instead of waiting for Google's crawlers, you build an automated pipeline that asks Gemini questions about your specific niche.

The Data Points: Output tokens, citations, and semantic similarity scores.
The Enrichment Play (Synthetic RAG Testing):
You extract your top 50 target search queries (e.g., "What is the best AI compliance platform for MiCA?").
You use an automation script to ping the Gemini API with these exact queries, using a temperature of 0.0 (to keep the output as deterministic and repeatable as possible rather than creative).
You parse the JSON response to see if your brand, URL, or specific proprietary terminology is cited in the generated answer.
If your brand is missing, it means your entity salience (Pillar 2) is too low, or your competitor's Knowledge Graph score (Pillar 1) is too high.
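Once the responses are collected, the scoring step is simple. The sketch below assumes you have already reduced each Gemini response to its answer text and a list of cited URLs. SyntheticResult, brandCitationRate, and the sample data are hypothetical and only illustrate the calculation.

```typescript
interface SyntheticResult {
  query: string;
  answerText: string;
  citedUrls: string[];
}

// Returns the share of queries (0.0 to 1.0) where the brand is named in the
// answer text or the brand's domain appears among the cited URLs.
function brandCitationRate(
  results: SyntheticResult[],
  brand: string,
  domain: string
): number {
  if (results.length === 0) return 0;
  const brandLower = brand.toLowerCase();
  const domainLower = domain.toLowerCase();
  const hits = results.filter(
    (result) =>
      result.answerText.toLowerCase().includes(brandLower) ||
      result.citedUrls.some((url) => url.toLowerCase().includes(domainLower))
  ).length;
  return hits / results.length;
}

// Hypothetical results for four target queries.
const syntheticRun: SyntheticResult[] = [
  { query: "best MiCA compliance platform", answerText: "Kakunin and others offer...", citedUrls: [] },
  { query: "AI governance tools EU", answerText: "Several vendors exist.", citedUrls: ["https://kakunin.io/governance"] },
  { query: "MiCA audit software", answerText: "Competitor X is popular.", citedUrls: ["https://example.com/x"] },
  { query: "crypto compliance AI", answerText: "Options vary by region.", citedUrls: [] },
];

console.log(brandCitationRate(syntheticRun, "Kakunin", "kakunin.io"));
```

Tracking this rate per run gives you a trend line for whether structural edits are changing what the model says.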

So far we have covered how Google recognizes your existence (Knowledge Graph) and how it reads your syntax (Cloud Natural Language).

Now, we must measure the final output: How does the machine actually answer a human question?

You cannot wait for Google Search Console to slowly trickle in "AI Overview" data. To actively engineer your Generative Engine Optimization (GEO) strategy, you must build a synthetic testing environment. You do this by plugging directly into the Gemini API (or Vertex AI for enterprise endpoints) and turning on a specific feature: Google Search Grounding.

Here is the fine-grained breakdown of the data points exposed by the Gemini API and how to weaponize them for your dashboard.

1. The Data Points: The Anatomy of a Machine's Mind

When you send a prompt to the Gemini API with the GoogleSearch tool enabled, you are not just asking an LLM to guess an answer. You are forcing the model to query the live Google Search index, extract factual chunks, and synthesize a cited response.

The API returns a standard text response, but hidden inside the JSON payload is a critical object called groundingMetadata. This is the absolute goldmine for AEO.

Here are the specific data points exposed inside groundingMetadata:

| Data Point | What it means | The GEO Value |
| --- | --- | --- |
| `webSearchQueries` | An array of the exact search terms the LLM generated to fact-check your prompt. | Query Expansion. If you ask Gemini "Best MiCA compliance tools," and one of its `webSearchQueries` is "Enterprise AI governance software EU," you instantly know the semantic entities the machine associates with your product category. |
| `groundingChunks[].web.uri` | The URI of each source the LLM used to generate the answer. In practice this can be a Google redirect link rather than the publisher's own URL. | The Citation Leaderboard. This tells you who the LLM trusts. If your site is not in this array, your Entity Salience (from our previous step) is too low. |
| `groundingChunks[].web.title` | The title of the cited source; depending on the endpoint this may be the page title or just the source domain. | Snippet Optimization. Shows which pages (and page titles) the RAG engine is extracting data from. |
| `groundingSupports[].segment` | The exact sentence in the LLM's generated response that corresponds to a specific chunk (linked through `groundingChunkIndices`). | Factual Mapping. It maps which competitor's website is responsible for feeding which specific claim to the LLM. |

The JSON Payload (What it actually looks like)

When you query the API, the metadata block looks roughly like this (the values are illustrative). This is the raw data your Next.js dashboard will parse:

"groundingMetadata": {
  "webSearchQueries": [
    "Kakunin MiCA AI compliance",
    "EU AI Act software solutions"
  ],
  "groundingChunks": [
    {
      "web": {
        "uri": "https://kakunin.io/docs/mica-framework",
        "title": "Automating MiCA Compliance | Kakunin Docs"
      }
    },
    {
      "web": {
        "uri": "https://techcrunch.com/2026/01/ai-regulation",
        "title": "How startups are navigating EU AI Rules"
      }
    }
  ],
  "groundingSupports": [
    {
      "segment": {
        "startIndex": 0,
        "endIndex": 88,
        "text": "Kakunin is an enterprise software platform that automates MiCA compliance for AI agents."
      },
      "groundingChunkIndices": [0]
    }
  ]
}
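The "Factual Mapping" row of the table can be made concrete by joining groundingSupports to groundingChunks through groundingChunkIndices. The sketch below runs on a trimmed copy of the sample payload above; mapClaimsToSources is a hypothetical helper name, and the field names are the ones shown in that payload.

```javascript
// Hypothetical helper: pairs each supported sentence with the sources behind it.
function mapClaimsToSources(metadata) {
  const chunks = metadata?.groundingChunks ?? [];
  const supports = metadata?.groundingSupports ?? [];

  return supports.map(support => ({
    claim: support.segment?.text ?? "",
    sources: (support.groundingChunkIndices ?? [])
      .map(i => chunks[i]?.web)   // look up each referenced chunk
      .filter(Boolean)            // drop indices that point at nothing
      .map(web => ({ uri: web.uri, title: web.title }))
  }));
}

// Trimmed copy of the sample groundingMetadata payload
const sample = {
  groundingChunks: [
    { web: { uri: "https://kakunin.io/docs/mica-framework", title: "Automating MiCA Compliance | Kakunin Docs" } },
    { web: { uri: "https://techcrunch.com/2026/01/ai-regulation", title: "How startups are navigating EU AI Rules" } }
  ],
  groundingSupports: [
    {
      segment: { text: "Kakunin is an enterprise software platform that automates MiCA compliance for AI agents." },
      groundingChunkIndices: [0]
    }
  ]
};

const claimMap = mapClaimsToSources(sample);
for (const { claim, sources } of claimMap) {
  console.log(`"${claim}" <- ${sources.map(s => s.uri).join(", ")}`);
}
```

Storing the claim alongside its sources is what later lets you check what was actually said about you, not just whether you were cited.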

2. The Enrichment Play: The "Share of Model" Tracker

In traditional SEO, you use tools like Ahrefs to track your "Share of Voice" (how many keywords you rank for compared to competitors).

In the AEO era, you use the Gemini API to track your "Share of Model" (how often an LLM cites your architecture as the definitive source of truth).

Here is the exact enrichment play you build into your Next.js application:

The Prompt Matrix: You create a database of the top 50 questions an enterprise architect would ask about your industry (e.g., "What are the data retention requirements under MiCA for autonomous AI?" or "Compare Kakunin vs. [Competitor] for GDPR compliance.")
The Automation Loop: Every Friday, your backend pings the Gemini API with all 50 questions, ensuring the Google Search tool is enabled on every request (tools: [{ googleSearch: {} }] in the Node.js SDK, tools=[Tool(google_search=GoogleSearch())] in the Python SDK).
The Data Extraction: You ignore the text the LLM generates. You only care about the web.uri and web.title of each entry in the groundingMetadata.groundingChunks array.
The Dashboard Visualization: Your Next.js frontend aggregates those URLs into a pie chart.

The Result: You now have a repeatable, quantified dashboard showing that out of 50 industry questions, Gemini cited Kakunin.io 14 times, cited Wikipedia 22 times, and cited your biggest competitor 31 times. Because grounded responses vary between runs, treat these counts as a weekly sample rather than a fixed ranking; tracked over time, they show where you stand in the machine's hierarchy of trust.
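The aggregation step behind that pie chart can be sketched as a tally keyed by hostname. This is an illustrative sketch under one loud assumption: the URIs passed in are publisher URLs. If the API hands back redirect links, resolve them or match on the chunk title first. The helper names hostMatches and tallyCitations and the sample URIs are hypothetical.

```javascript
// Hypothetical helper: true when `host` is `domain` or one of its subdomains.
function hostMatches(host, domain) {
  return host === domain || host.endsWith(`.${domain}`);
}

// Hypothetical helper: counts citations per tracked domain, plus an "other" bucket.
function tallyCitations(uris, trackedDomains) {
  const counts = Object.fromEntries(trackedDomains.map(d => [d, 0]));
  counts.other = 0;

  for (const uri of uris) {
    let host;
    try {
      host = new URL(uri).hostname.toLowerCase();
    } catch {
      continue; // skip malformed URIs instead of crashing the weekly run
    }
    const hit = trackedDomains.find(d => hostMatches(host, d));
    counts[hit ?? "other"]++;
  }
  return counts;
}

const uris = [
  "https://kakunin.io/docs/mica-framework",
  "https://docs.kakunin.io/quickstart",
  "https://en.wikipedia.org/wiki/Markets_in_Crypto-Assets",
  "https://notkakunin.io.example.com/",
  "not a url"
];

const counts = tallyCitations(uris, ["kakunin.io", "wikipedia.org"]);
console.log(counts);
```

Matching on the parsed hostname rather than a raw substring avoids crediting look-alike domains to your brand.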

3. The Variance Data Points ("The Gotchas")

Testing with the Gemini API introduces a few strict architectural constraints that differ from standard web APIs:

Temperature Constraints: LLMs have a temperature setting that controls how random the sampling is. At 0.0 the model is close to deterministic (ideal for strict RAG testing on your own internal documents). However, Google's grounding documentation recommends a temperature of 1.0 when Google Search Grounding is enabled, so that is the value to use for these tests. Expect some run-to-run variance in both answers and citations as a result.
The Hallucination Gap: Just because your URL appears in the groundingChunks array does not mean the LLM actually said something positive about you. It just means it read your page. You must cross-reference the output with the Google Cloud Natural Language API's sentiment.score to ensure the LLM isn't extracting your URL merely to criticize your pricing model.
Cost Scaling: Unlike the standard Google Search Console API (which is free), hitting the Gemini API 500 times a day with Grounding enabled incurs token costs plus a flat fee per Search Grounding request. You must cache the JSON responses in your database rather than running the simulation live every time you load your dashboard.
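The caching advice in the last point can be sketched as a small time-to-live cache wrapped around the API call. This is a minimal in-memory illustration, not a production design: createTtlCache and fakeGeminiCall are hypothetical names, and a real deployment would swap the Map for the database mentioned above.

```javascript
// Hypothetical in-memory cache; `now` is injectable so expiry can be tested.
function createTtlCache(ttlMs, now = () => Date.now()) {
  const store = new Map();
  return {
    async getOrCompute(key, compute) {
      const hit = store.get(key);
      // Serve the stored response while it is younger than the TTL
      if (hit && now() - hit.storedAt < ttlMs) return hit.value;
      const value = await compute();
      store.set(key, { value, storedAt: now() });
      return value;
    }
  };
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
let apiCalls = 0;

// Stand-in for a grounded Gemini request (no network call is made here)
const fakeGeminiCall = async () => {
  apiCalls++;
  return { groundingChunks: [] };
};

async function demo() {
  const cache = createTtlCache(WEEK_MS);
  await cache.getOrCompute("prompt-1", fakeGeminiCall);
  await cache.getOrCompute("prompt-1", fakeGeminiCall); // served from cache
  return apiCalls;
}

demo().then(calls => console.log(`API calls made: ${calls}`)); // prints "API calls made: 1"
```

Keying the cache by prompt means a dashboard reload costs nothing, and only the scheduled weekly run pays for tokens and grounding requests.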

This completely closes the loop. You track what humans search (GSC), how your code runs (Cloud Monitoring), how the machine reads your syntax (Cloud NLP), and finally, how the machine regurgitates your facts (Gemini Grounding API).

This is the exact script you need to build your "Share of Model" tracking dashboard.

Google recently released their official @google/genai SDK, which streamlines how we enable Google Search Grounding and extract the metadata payload.

Here is the complete Node.js script to run your automated prompt matrix, extract the machine's citations, and mathematically calculate your Share of Model against your competitors.

Step 1: Install the SDK

Initialize your project, mark it as an ES module package (the script uses import syntax), and install the official Google Gen AI SDK.

npm init -y
npm pkg set type=module
npm install @google/genai

Required Environment Configurations

GOOGLE_APPLICATION_CREDENTIALS="./gcp-service-account.json"
GEMINI_API_KEY="AIzaSy..."

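The tracker script below only reads GEMINI_API_KEY (GOOGLE_APPLICATION_CREDENTIALS is the standard variable for Google Cloud client libraries such as the Natural Language API client). Set your Gemini API key as an environment variable in your terminal: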

export GEMINI_API_KEY="your_api_key_here"

Step 2: The Citation Tracker Script

Save this file as gemini-tracker.js. This script runs a batch of questions, forces the LLM to search the web, extracts the URLs the machine trusts, and builds a citation leaderboard.

import { GoogleGenAI } from '@google/genai';

// Initialize the official Gemini SDK
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// The entities you want to track in the leaderboard
const TARGET_DOMAIN = "kakunin.io";
const COMPETITOR_DOMAINS = ["techcrunch.com", "ibm.com", "wikipedia.org"];

// Your Prompt Matrix (The questions your target audience asks)
const promptMatrix = [
  "What are the best software platforms for automating MiCA AI compliance?",
  "Compare enterprise AI governance tools for EU regulations.",
  "How do developers ensure data retention compliance under MiCA?"
];

async function runShareOfModelTracker() {
  console.log(`\n🚀 INITIATING GEMINI CITATION TRACKER`);
  console.log(`Tracking citations for ${promptMatrix.length} queries...\n`);

  // Initialize our leaderboard scoreboard
  const scoreboard = {
    [TARGET_DOMAIN]: 0,
    "Other/Competitors": 0
  };
  COMPETITOR_DOMAINS.forEach(domain => scoreboard[domain] = 0);

  // Iterate through the Prompt Matrix
  for (const [index, prompt] of promptMatrix.entries()) {
    console.log(`\n[Query ${index + 1}/${promptMatrix.length}]: "${prompt}"`);

    try {
      // Ping Gemini with Google Search Grounding enabled
      const response = await ai.models.generateContent({
        model: "gemini-3.5-flash", // Use the latest flash model for fast/cheap RAG extraction
        contents: prompt,
        config: {
          // Temperature 1.0 is required for optimal Google Search fanning
          temperature: 1.0, 
          // This is the trigger that turns on Answer Engine features
          tools: [{ googleSearch: {} }] 
        }
      });

      // Navigate the JSON payload to extract the Grounding Metadata
      const metadata = response.candidates?.[0]?.groundingMetadata;

      if (!metadata?.groundingChunks) {
        console.log("   ⚠️ No web citations found for this query.");
        continue;
      }

      // Log the internal queries Gemini generated to find the answer
      if (metadata?.webSearchQueries) {
        console.log(`   🔍 Internal LLM Searches: [${metadata.webSearchQueries.join(", ")}]`);
      }

      console.log(`   🔗 URLs Cited by Gemini:`);

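      // Analyze every source the LLM grounded its answer on
      for (const chunk of metadata.groundingChunks) {
        if (!chunk.web?.uri) continue;

        const citedUrl = chunk.web.uri;
        const citedTitle = chunk.web.title ?? "";
        console.log(`      - ${citedUrl} (${citedTitle})`);

        // The uri can be a Google redirect link while the title carries the
        // source domain, so match tracked domains against both fields
        const haystack = `${citedUrl} ${citedTitle}`.toLowerCase();

        // Update the Share of Model Scoreboard
        let matched = false;

        if (haystack.includes(TARGET_DOMAIN)) {
          scoreboard[TARGET_DOMAIN]++;
          matched = true;
        } else {
          for (const competitor of COMPETITOR_DOMAINS) {
            if (haystack.includes(competitor)) {
              scoreboard[competitor]++;
              matched = true;
              break;
            }
          }
        }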

        // Group all other citations into the generic bucket
        if (!matched) {
          scoreboard["Other/Competitors"]++;
        }
      }

    } catch (error) {
      console.error(`   ❌ API Error on query:`, error.message);
    }
  }

  // Calculate and Print the Final Share of Model Leaderboard
  console.log(`\n======================================================`);
  console.log(`📊 FINAL "SHARE OF MODEL" LEADERBOARD`);
  console.log(`======================================================`);

  // Calculate total citations to generate percentages
  const totalCitations = Object.values(scoreboard).reduce((a, b) => a + b, 0);

  if (totalCitations === 0) {
    console.log("No citations extracted across the prompt matrix.");
    return;
  }

  // Sort the scoreboard from highest citations to lowest
  const sortedLeaderboard = Object.entries(scoreboard).sort((a, b) => b[1] - a[1]);

  sortedLeaderboard.forEach(([domain, count]) => {
    const percentage = ((count / totalCitations) * 100).toFixed(1);
    const label = domain === TARGET_DOMAIN ? `🎯 ${domain} (YOU)` : `   ${domain}`;
    console.log(`${label.padEnd(25)} | ${count} citations (${percentage}%)`);
  });
  console.log(`======================================================\n`);
}

// Execute the tracker
runShareOfModelTracker();

How to Run and Interpret This Data

Run the script directly in your terminal:

node gemini-tracker.js

The Console Output:
When the script finishes, you will see a leaderboard that looks like this:

======================================================
📊 FINAL "SHARE OF MODEL" LEADERBOARD
======================================================
   wikipedia.org          | 12 citations (40.0%)
🎯 kakunin.io (YOU)       | 8 citations (26.7%)
   Other/Competitors      | 7 citations (23.3%)
   techcrunch.com         | 3 citations (10.0%)
   ibm.com                | 0 citations (0.0%)
======================================================

The Enrichment Play (What to do next)

This script moves you from reactive SEO to proactive GEO.

Spot Semantic Theft: If techcrunch.com is outscoring you on queries about your own proprietary features, you look at their page using the Cloud Natural Language API. You will likely find their Subject-Verb structures are tighter than yours. You update your landing page to match.
Find New Verticals: Look at the Internal LLM Searches printed in the console. If you prompt Gemini about "AI Compliance," but it natively translates that prompt into an internal search for "LLM bias mitigation frameworks," you now have the exact entity language you need to inject into your <h2> tags.
Dashboard Integration: Instead of running this in the terminal, you can easily plug this exact JavaScript logic into your Next.js API route we built earlier. You store the scoreboard payload in a database every week and render a line chart showing your "Share of Model" growing over time as your optimization efforts compound.
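The weekly line chart described in the last point needs one data point per stored scoreboard. A minimal sketch of that transformation follows; the shareOverTime helper, the snapshot shape ({ week, scoreboard }), and the sample numbers are assumptions for illustration, not a prescribed schema.

```javascript
// Hypothetical helper: turns stored weekly scoreboards into chart-ready points.
function shareOverTime(snapshots, domain) {
  return snapshots.map(({ week, scoreboard }) => {
    const total = Object.values(scoreboard).reduce((a, b) => a + b, 0);
    const count = scoreboard[domain] ?? 0;
    // Guard against weeks where no citations were extracted at all
    const share = total === 0 ? 0 : Number(((count / total) * 100).toFixed(1));
    return { week, share };
  });
}

// Illustrative snapshots, shaped like the scoreboard object the tracker builds
const history = [
  { week: "week-1", scoreboard: { "kakunin.io": 5, "wikipedia.org": 15 } },
  { week: "week-2", scoreboard: { "kakunin.io": 8, "wikipedia.org": 12 } },
  { week: "week-3", scoreboard: {} }
];

const series = shareOverTime(history, "kakunin.io");
console.log(series);
```

Computing the percentage at read time, rather than storing it, keeps the raw counts available if you later add or regroup competitor domains.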

The Architectural "Gotcha" of the GEO Era: Zero-Click Variance

In the traditional dashboard we built earlier, the ultimate metric was GA4 sessions. In the AEO/GEO era, you must prepare executives for the Zero-Click Reality.

If you optimize perfectly for Answer Engines, Google will extract your factual payload (e.g., Kakunin's pricing tiers) and display it directly in the AI Overview. The user gets their answer and never clicks your link.

Your GSC impressions will skyrocket, but your GA4 sessions will plummet. If you do not decouple "Brand Visibility" (Impressions + LLM Citations) from "Traffic Acquisition" (Clicks + Sessions) in your reporting architecture, perfect GEO execution will look like a catastrophic traffic failure on your dashboard.
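One way to keep the two metric families apart in the reporting layer is to compute them as separate groups and flag the zero-click pattern explicitly. The sketch below is illustrative only: buildReport, the field names (impressions, llmCitations, clicks, sessions), and the sample figures are all assumptions.

```javascript
// Percentage change between two periods; null when there is no baseline.
function pctChange(current, previous) {
  if (previous === 0) return null;
  return Number((((current - previous) / previous) * 100).toFixed(1));
}

// Hypothetical report builder: visibility and traffic are reported separately.
function buildReport(current, previous) {
  const visibility = {
    impressionsChangePct: pctChange(current.impressions, previous.impressions),
    citationsChangePct: pctChange(current.llmCitations, previous.llmCitations)
  };
  const traffic = {
    clicksChangePct: pctChange(current.clicks, previous.clicks),
    sessionsChangePct: pctChange(current.sessions, previous.sessions)
  };
  // Visibility rising while sessions fall is the zero-click signature
  const zeroClickPattern =
    visibility.impressionsChangePct > 0 && traffic.sessionsChangePct < 0;
  return { visibility, traffic, zeroClickPattern };
}

// Illustrative monthly figures
const lastMonth = { impressions: 10000, llmCitations: 8, clicks: 500, sessions: 450 };
const thisMonth = { impressions: 15000, llmCitations: 14, clicks: 400, sessions: 360 };

const report = buildReport(thisMonth, lastMonth);
console.log(report);
```

With the flag surfaced next to both groups, a drop in sessions can be read alongside the rise in visibility instead of being mistaken for a traffic failure.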

By unifying client-side visibility data with server-side natural language audits and synthetic LLM simulations, developers can move from blindly chasing legacy keywords to systematically commanding their brand's narrative across the entire generative web ecosystem.

References and citations

Here is the structured list of official documentation and reference links for the APIs discussed in this article.

1. Google Knowledge Graph Search API (AEO Layer)

This API is used to query Google’s semantic database to measure Entity Salience, extract Machine-Readable Entity IDs (MREIDs), and benchmark brand authority.

Official Documentation: Google Knowledge Graph Search API
Enterprise Edition Overview: Enterprise Knowledge Graph Documentation

2. Google Cloud Natural Language API (GEO Semantic Vector Layer)

This API exposes the machine learning models required to run a "Salience Audit," analyze syntax dependency trees, and measure emotional polarity to ensure content is RAG-friendly.

Official Documentation & Quickstarts: Cloud Natural Language API Documentation
REST API Reference: Natural Language API Reference
Client Libraries (Node.js/Python): Natural Language Client Libraries

3. Google Search Console API (Visibility Layer)

Used to extract standard web traffic data and, crucially, to filter for the AI_OVERVIEWS search appearance to decouple human search behavior from machine-synthesized answers.

Official Documentation: Google Search Console API
API Reference (Search Analytics Query): SearchAnalytics.query Method
Search Appearance Dimensions Guide: Search Console Dimensions & Filters

4. Google Gemini API / Vertex AI (Simulation & Grounding Layer)

Used to build the synthetic testing loop that pings the LLM with Google Search Grounding enabled. This allows you to extract groundingMetadata and calculate your "Share of Model."

Gemini API Search Grounding Guide: Grounding with Google Search (Google AI for Developers)
Vertex AI Search Grounding (Enterprise): Grounding in Vertex AI
Official Google Gen AI SDK (@google/genai): Google Gen AI SDK GitHub/NPM Reference

Honorable Mention: Supplemental Reference

Google APIs Explorer: APIs Explorer Dashboard (A highly useful tool for testing these specific API payloads directly in the browser without writing an extraction script first).