I spent the last few weekends turning Glippy, a desktop app and browser extension that scores a site's readiness for AI crawlers, into a Model Context Protocol (MCP) server. The result is glippy-mcp: a Node.js binary that plugs into Claude Desktop, Claude Code, Cursor, Windsurf, and anything else that speaks MCP, then exposes nine tools for analysing, comparing, and exporting GEO reports on any domain.
This post walks through why I built it, what it does, and the handful of design decisions that actually mattered.
## Why an MCP server at all?
Glippy already had a perfectly good desktop app. The engine, `geo-checker.js`, fetches `robots.txt`, `llms.txt`, the homepage HTML, `sitemap.xml`, and a few security headers, then runs 10 weighted scoring categories (Structured Data, Semantic HTML, Machine Readability, Citability & Answer-Readiness, and so on). You paste a domain, you get a report.
The problem is that I kept finding myself copy-pasting that report back into Claude to ask follow-up questions like "which of these issues should I fix first for a Shopify site?" or "compare this to the three competitors in the report I ran yesterday." The conversation loop was slow and lossy — Claude was operating on stale text instead of live crawls.
MCP fixes that. Instead of me being a ferry between two tools, Claude can call `analyze_domain`, `compare_domains`, or `analyze_sitemap` directly during the conversation. The model decides when a fresh crawl is needed, and my job shrinks to asking good questions.
## What the server actually exposes
Nine tools, all stdio-transport JSON-RPC 2.0 under the hood:
| Tool | What it does |
|---|---|
| `analyze_domain` | Full 10-category GEO analysis of one domain |
| `check_robots_txt` | Which AI crawlers (GPTBot, ClaudeBot, …) are blocked |
| `check_llms_txt` | Is there an llms.txt? Show the contents |
| `get_geo_summary` | Quick score + top 3 strengths and weaknesses |
| `compare_domains` | Run 2–10 domains in parallel, rank them |
| `analyze_sitemap` | Fetch a sitemap, score every page |
| `analyze_urls` | Same, but for an arbitrary URL list |
| `export_report` | Styled Markdown or HTML report for one domain |
| `export_bulk_report` | Same, for comparisons / sitemaps / URL sets |
Everything interesting lives in `src/geo-checker.js` (the scoring engine reused from the desktop app) and `src/index.js` (the MCP wrapper).
## The skeleton: less code than you'd think
The MCP SDK does most of the heavy lifting. A minimal version of the server is about twenty lines:
```javascript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { checkGEO } from "./geo-checker.js";

const server = new McpServer({ name: "glippy-mcp", version: "0.1.0" });

server.tool(
  "analyze_domain",
  "Full GEO analysis of a domain",
  {
    domain: z.string().describe("e.g. example.com — no https:// prefix"),
    max_pages: z.number().int().min(1).max(10).optional(),
  },
  async ({ domain, max_pages = 10 }) => {
    const result = await checkGEO(domain, { maxPages: max_pages });
    // formatReport (defined elsewhere in the file) renders the result
    // object as readable text for the model.
    return { content: [{ type: "text", text: formatReport(result) }] };
  },
);

await server.connect(new StdioServerTransport());
```
Zod schemas double as both runtime validation and the JSON schema Claude sees when deciding how to call the tool, so clear `.describe()` text matters more than the parameter name. "Do not include `https://`" in the description saves a lot of round-trips where the model would otherwise guess wrong.
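Even with that hint, the model will occasionally pass a full URL anyway. A small normalization layer makes the tool forgiving either way; this is a hypothetical helper of my own, not code from glippy-mcp:

```javascript
// Hypothetical helper: normalize whatever the model sends into a bare host.
// Accepts "example.com", "https://Example.com/path", "www.example.com/", etc.
function normalizeDomain(input) {
  let s = String(input).trim().toLowerCase();
  s = s.replace(/^[a-z][a-z0-9+.-]*:\/\//, ""); // strip any scheme
  s = s.split("/")[0];                          // drop path, query, fragment
  s = s.replace(/^www\./, "");                  // treat www. as canonical
  return s;
}
```

Cheap insurance: the description steers the model, and the normalizer catches the cases where steering fails.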
## The decisions that actually mattered
### 1. Reuse the engine, don't rewrite it
`geo-checker.js` is ~4,800 lines of cheerio-based HTML inspection that has already been battle-tested against thousands of real-world sites. The MCP wrapper imports its public functions (`checkGEO`, `analyseHTML`, `analyseRobotsTxt`, `parseSitemapUrls`, `throttledFetchUrl`, `aggregatePageScores`) and does zero scoring itself. Every bug fix in the desktop app flows through to the MCP server for free.
If you're MCP-ifying an existing tool, resist the urge to "do it properly this time." Wrap what you have.
### 2. Keep everything local except the license check
A Glippy MCP license key (`GLMCP-XXXX-XXXX-XXXX`) hits a Cloudflare Worker (`mcp-worker/`) on first use and caches the result for 24 hours. Actual crawling and scoring run on the user's machine: no domains, results, or HTML ever leave their box.
That choice kept the server very cheap to run (my Worker handles only verify/deactivate/Stripe-webhook traffic) and kept privacy-sensitive users happy. The validation logic falls back gracefully: if the license server is unreachable but a cached valid license exists, the tool keeps working.
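The fallback behaviour can be sketched like this. `makeLicenseChecker` and `verifyRemote` are illustrative names of my own; the real validation code in glippy-mcp may be shaped differently:

```javascript
// Sketch of "graceful fallback" license validation: a fresh cache skips the
// network, a failed network call falls back to a stale-but-valid cache.
const DAY_MS = 24 * 60 * 60 * 1000;

function makeLicenseChecker(verifyRemote, now = Date.now) {
  let cached = null; // { key, valid, checkedAt }

  return async function checkLicense(key) {
    if (cached && cached.key === key && now() - cached.checkedAt < DAY_MS) {
      return cached.valid; // cache still fresh: no network call at all
    }
    try {
      const valid = await verifyRemote(key); // assumed to hit the Worker
      cached = { key, valid, checkedAt: now() };
      return valid;
    } catch {
      // License server unreachable: keep working if we ever saw a valid key.
      if (cached && cached.key === key && cached.valid) return true;
      throw new Error("License could not be verified and no cached license exists");
    }
  };
}
```

The important property is that a network outage only bites users who have never successfully validated.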
### 3. Two tricks to avoid re-crawling
Crawling a sitemap of 500 pages is expensive and rude. I added two layers of deduplication.
**In-memory cache, 5-minute TTL.** Keyed on domain + `maxPages`. The clever bit: if you ask for `max_pages=3` and there's already a cached run at `max_pages=5`, the cache hits. Subsequent tools in the same conversation (`get_geo_summary`, `export_report`) reuse the crawl automatically.
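The superset-hit logic is small enough to sketch in full. Names here are mine, not glippy-mcp's:

```javascript
// TTL cache where a cached crawl satisfies any request for the same domain
// with an equal-or-smaller page budget.
const TTL_MS = 5 * 60 * 1000;

function makeCrawlCache(now = Date.now) {
  const entries = new Map(); // domain -> { maxPages, result, at }

  return {
    get(domain, maxPages) {
      const e = entries.get(domain);
      if (!e) return null;
      if (now() - e.at > TTL_MS) {
        entries.delete(domain); // expired: evict and miss
        return null;
      }
      return e.maxPages >= maxPages ? e.result : null; // superset hit
    },
    set(domain, maxPages, result) {
      entries.set(domain, { maxPages, result, at: now() });
    },
  };
}
```

Asking for *more* pages than the cached run still misses, which is exactly what you want: a bigger crawl genuinely has new information.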
**Explicit JSON output mode.** For workflows where the model needs to generate multiple report formats, every analysis tool accepts `output_format="json"`. The raw result object can then be handed to `export_report` or `export_bulk_report` via an `analysis_result` parameter, bypassing the cache entirely. This shows up in practice as:
```
analyze_domain domain="example.com" max_pages=5 output_format="json"
→ export_report format="html" analysis_result=<from above>
→ export_report format="markdown_full" analysis_result=<from above>
```
One crawl, three artifacts.
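The resolution order behind that workflow can be sketched as a single function. `resolveAnalysis` is a hypothetical name; I'm only illustrating the precedence, not glippy-mcp's actual internals:

```javascript
// Explicit analysis_result wins over the cache, which wins over a fresh crawl.
async function resolveAnalysis({ domain, analysis_result }, cache, crawl) {
  if (analysis_result) return analysis_result; // explicit JSON handoff
  const cached = cache.get(domain);
  if (cached) return cached;                   // same-conversation reuse
  const fresh = await crawl(domain);           // last resort: hit the network
  cache.set(domain, fresh);
  return fresh;
}
```

Every export tool funnels through something like this, so "one crawl, three artifacts" falls out for free.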
### 4. Per-domain rate limiting, not global
The batch tools (`analyze_sitemap`, `analyze_urls`, `compare_domains`) fan out concurrent fetches. Naively doing this against a single origin will get you rate-limited — or worse, get you blocked. The `throttledFetchUrl` helper in `geo-checker.js` keeps a per-host queue with a configurable rate (default 5 requests per second, tunable via the `GLIPPY_RATE_LIMIT` env var or a `rate_limit` parameter) while a global semaphore caps total in-flight requests at 10.
Result: comparing example.com and competitor.com runs effectively in parallel because they're different origins, but hammering a single sitemap stays polite.
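A minimal sketch of that combination, per-host queues plus a global semaphore, looks like this. The names are illustrative, not the real `throttledFetchUrl`:

```javascript
// Per-host request chains keep each origin sequential (with a polite delay);
// a shared counter caps total concurrency across all hosts.
function makeThrottler({ perHostDelayMs = 200, maxInFlight = 10 } = {}) {
  const hostChains = new Map(); // host -> tail of that host's request chain
  let inFlight = 0;
  const waiters = [];

  // Global semaphore: at most maxInFlight requests at once.
  const acquire = () =>
    inFlight < maxInFlight
      ? (inFlight++, Promise.resolve())
      : new Promise((resolve) => waiters.push(resolve)).then(() => { inFlight++; });
  const release = () => {
    inFlight--;
    const next = waiters.shift();
    if (next) next();
  };
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

  return async function throttledFetch(url, doFetch) {
    const host = new URL(url).host;
    const prev = hostChains.get(host) ?? Promise.resolve();
    const run = prev.then(async () => {
      await acquire();
      try {
        return await doFetch(url);
      } finally {
        release();
      }
    });
    // Next request for this host waits for this one plus the delay,
    // even if this one failed.
    hostChains.set(host, run.catch(() => {}).then(() => sleep(perHostDelayMs)));
    return run;
  };
}
```

Requests to different hosts only contend on the global cap, so cross-domain comparisons stay effectively parallel.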
### 5. Stderr for logs, stdout is sacred
MCP over stdio means stdout is a JSON-RPC channel. A stray `console.log` anywhere in the engine will corrupt the frame and the client will disconnect with a cryptic parse error. Route every log through `console.error`, and audit third-party dependencies for chatty output. I caught one cheerio helper printing a deprecation warning to stdout in an older version; pinning fixed it.
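A blunt defensive guard along those lines (my own pattern, not code from glippy-mcp) is to rebind the console methods at startup, so even a missed `console.log` deep in a dependency lands on stderr:

```javascript
// Redirect all console output to stderr so stdout stays a pure
// JSON-RPC channel. Run this before loading anything chatty.
const realError = console.error.bind(console);
for (const method of ["log", "info", "debug", "warn"]) {
  console[method] = (...args) => realError(`[${method}]`, ...args);
}
```

It won't catch code that writes to `process.stdout` directly, but it covers the common accidental case.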
## Client config: the part users actually interact with
Getting an MCP server installed is still the single biggest adoption barrier. I wrote one config block and then copy-pasted it across every guide:
```json
{
  "mcpServers": {
    "glippy-geo": {
      "command": "npx",
      "args": ["-y", "glippy-mcp"],
      "env": {
        "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
      }
    }
  }
}
```
The same JSON works for Claude Desktop, Claude Code (`.mcp.json`), Cursor (`.cursor/mcp.json`), Windsurf (`.windsurf/mcp.json`), and Continue.dev. Using `npx -y` means users don't manage a global install; they always get the latest published version.
For ChatGPT / OpenAI, which doesn't speak MCP natively yet, a small bridge does the job, but that's a post for another day.
## What I'd do differently
- **Ship JSON-mode from day one.** I added it in v0.1 after realising chained exports were the most common workflow. Cache-hit logic is fine, but explicit result passing is faster and more predictable for agents.
- **Fewer tools, sharper descriptions.** Nine is on the edge of "too many to reason about." In hindsight, `analyze_domain` with rich options subsumes half of the others. Next major version might consolidate.
- **Streaming responses.** A full sitemap crawl can take a minute. Right now it's a single tool call; a streaming update ("scored 42/500 pages…") would be a nicer UX once MCP clients support progress notifications more widely.
## Try it
```bash
npx -y glippy-mcp   # needs a license key — grab one at glippy.dev
```
Drop the config block into your MCP client of choice and ask Claude something like:
> Give me a GEO readiness summary for stripe.com and explain the top three issues in plain English.
The whole project ended up being a small reminder that MCP is mostly just "expose your existing tool well." The SDK is thin, the protocol is boring in a good way, and the hard problems (caching, rate limiting, clean log separation) are the same ones you already know from building any CLI.