Meriç Cintosun

Posted on Apr 6 • Originally published at mericcintosun.com

Web MCP and Agent-Ready Architectures: Building Next.js Sites for AI Agents

#agents #ai #mcp #nextjs

What the Model Context Protocol Actually Does on the Web

The Model Context Protocol (MCP) is an open standard, originally published by Anthropic, that defines how AI agents communicate with external data sources and tools. On the server side, MCP has been used extensively to connect language models to databases, APIs, and file systems. The web variant of MCP, often called Web MCP, extends this idea to the browser and to publicly accessible HTTP surfaces, enabling agents to read, reason over, and take action on web pages and web-hosted interfaces.

When an AI agent navigates the web today without any MCP integration, it is essentially screen-reading. It receives rendered HTML, attempts to extract structure from what is visually presented, and uses heuristics to decide which elements are interactive. This approach works poorly at scale. Pages built for human visual perception carry enormous noise from a machine-reasoning perspective: decorative wrappers, layout scaffolding, duplicate navigation trees, and implicit semantics that humans resolve through visual context. Web MCP addresses this by giving developers a formal channel through which they can expose structured, action-oriented representations of their interfaces directly to agents.

Google's work on Web MCP focuses on making web pages first-class participants in agentic workflows. In the design this article walks through, a page that implements Web MCP exposes a manifest describing its capabilities: what actions are available, what data can be read, and what schemas govern both input and output. The agent consults this manifest before attempting any interaction, which eliminates the guesswork that makes headless browsing fragile and expensive.

Why HTML Semantics Are an Agent Interface

Before reaching for MCP tooling, every developer building an agent-ready interface needs to understand a more fundamental constraint: agents parse HTML, and the information density of that HTML determines how much useful signal the agent can extract without any additional tooling.

Semantic HTML is not a best practice retrofit for accessibility audits. For an agent, the difference between a <button> and a <div class="btn"> is the difference between a clearly typed interaction target and an element that requires learned heuristics to classify. The <button> element carries implicit ARIA role button, is keyboard focusable by default, and appears in the accessibility tree in a way that programmatic clients can enumerate. The styled <div> carries none of that information unless the developer adds it manually and correctly.

The same principle applies at every level of document structure. A <main> landmark tells an agent where primary content begins, allowing it to skip navigation, sidebars, and footer content when answering a question about page content. An <article> element signals a self-contained unit of content with its own meaning. <nav> delineates navigation regions. <section> with an associated aria-labelledby attribute creates a labeled content region that an agent can reference by name rather than by position. None of this requires MCP. It is the baseline on top of which MCP becomes genuinely useful rather than a patch over a structural deficit.

For interactive surfaces, the implications are even more direct. An agent attempting to complete a form needs to know which label corresponds to which input. The <label for="..."> / id pairing provides this unambiguously. The aria-describedby attribute points to supplementary instructions. The required attribute on an <input> communicates validation constraints without requiring the agent to submit the form and parse an error response. These are not conveniences; they are the load-bearing structural elements of a machine-readable interface.

The Web MCP Architecture

A Web MCP implementation has three components: a capability manifest served at a well-known URL, a set of action endpoints that the agent can invoke, and the annotated HTML interface that the agent reads and interacts with when direct API access is insufficient or unavailable.

The manifest is a JSON document, typically served at /.well-known/mcp.json, that describes the page or application in structured terms. It lists named actions with their input schemas, specifies the data types a reading agent can extract, and declares any authentication requirements. When an agent encounters a domain that serves this file, it can shift from heuristic page-reading to structured capability invocation.

{
  "schema_version": "1.0",
  "name": "Product Catalog Interface",
  "description": "Agent-accessible product search and detail retrieval for an e-commerce storefront.",
  "actions": [
    {
      "name": "search_products",
      "description": "Search the product catalog by keyword, category, or price range.",
      "endpoint": "/api/mcp/search",
      "method": "POST",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "category": { "type": "string" },
          "max_price": { "type": "number" }
        },
        "required": ["query"]
      },
      "output_schema": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "id": { "type": "string" },
            "name": { "type": "string" },
            "price": { "type": "number" },
            "url": { "type": "string" }
          }
        }
      }
    }
  ],
  "readable_regions": [
    {
      "selector": "main[aria-label='Product listing']",
      "description": "Paginated list of products matching current filters."
    }
  ]
}

This manifest allows an agent to invoke search_products via a direct HTTP call rather than simulating user interaction. When the agent needs information that only exists in the rendered interface (because it is computed client-side, for example), it falls back to the readable_regions specification, which tells it exactly where to look and what it will find there.
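To make the agent side of this exchange concrete, here is a minimal sketch of how a client might consume the manifest. The helper names (findAction, missingRequired, invokeAction) are hypothetical and not part of any MCP library; the only assumptions are the manifest shape from the JSON example above and a runtime with a global fetch.

```typescript
// Hypothetical agent-side helpers; the manifest shape follows the JSON example above.
interface ManifestAction {
  name: string;
  endpoint: string;
  method: string;
  input_schema: { required?: string[] };
}

interface Manifest {
  actions: ManifestAction[];
}

// Look up an action by name; returns undefined when the site does not offer it.
export function findAction(manifest: Manifest, name: string): ManifestAction | undefined {
  return manifest.actions.find((action) => action.name === name);
}

// List the required input fields that the planned input does not provide.
export function missingRequired(
  action: ManifestAction,
  input: Record<string, unknown>
): string[] {
  return (action.input_schema.required ?? []).filter((field) => input[field] === undefined);
}

// Fetch the manifest, check the input locally, then call the action endpoint.
export async function invokeAction(
  origin: string,
  name: string,
  input: Record<string, unknown>
): Promise<unknown> {
  const manifestResponse = await fetch(new URL("/.well-known/mcp.json", origin));
  if (!manifestResponse.ok) throw new Error(`No manifest at ${origin}`);
  const manifest = (await manifestResponse.json()) as Manifest;

  const action = findAction(manifest, name);
  if (!action) throw new Error(`Action not offered: ${name}`);

  const missing = missingRequired(action, input);
  if (missing.length > 0) throw new Error(`Missing required fields: ${missing.join(", ")}`);

  // GET requests cannot carry a body, so only attach one for other methods.
  const hasBody = action.method !== "GET";
  const response = await fetch(new URL(action.endpoint, origin), {
    method: action.method,
    headers: hasBody ? { "Content-Type": "application/json" } : undefined,
    body: hasBody ? JSON.stringify(input) : undefined,
  });
  if (!response.ok) throw new Error(`Action failed with status ${response.status}`);
  return response.json();
}
```

The local required-field check is the point of the sketch: the agent can reject a malformed plan before spending a network round trip on it.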

Action Endpoints in Next.js

Next.js Route Handlers provide a natural implementation layer for MCP action endpoints. Each action in the manifest maps to a Route Handler that validates input, executes business logic, and returns a structured response. We colocate these under /app/api/mcp/ to make the MCP surface easy to audit and version.

// app/api/mcp/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { searchProducts } from "@/lib/catalog";

const SearchInputSchema = z.object({
  query: z.string().min(1).max(200),
  category: z.string().optional(),
  max_price: z.number().positive().optional(),
});

export async function POST(request: NextRequest) {
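  // request.json() throws on a malformed body; fall back to null so it fails validation with a 400.
  const body = await request.json().catch(() => null);
  const parsed = SearchInputSchema.safeParse(body);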

  if (!parsed.success) {
    return NextResponse.json(
      { error: "Invalid input", details: parsed.error.flatten() },
      { status: 400 }
    );
  }

  const results = await searchProducts(parsed.data);

  return NextResponse.json(results, {
    headers: {
      "Content-Type": "application/json",
      "X-MCP-Action": "search_products",
    },
  });
}

Zod handles input validation before any database or service call executes. The X-MCP-Action response header makes it straightforward to trace agent-originated requests in server logs, which matters when you need to separate agent traffic from human traffic in analytics or rate limiting.

Serving the Manifest

The manifest itself is a static file in most cases, but Next.js allows us to generate it dynamically if the available actions vary by authentication state or deployment configuration.

// app/.well-known/mcp.json/route.ts
import { NextResponse } from "next/server";
import { getMCPManifest } from "@/lib/mcp-manifest";

export async function GET() {
  const manifest = getMCPManifest();
  return NextResponse.json(manifest, {
    headers: {
      "Cache-Control": "public, max-age=3600",
      "Access-Control-Allow-Origin": "*",
    },
  });
}

The Cache-Control header matters here. Agents that repeatedly visit a site should not trigger a manifest fetch on every page load; a one-hour cache is a reasonable default for a stable manifest. The Access-Control-Allow-Origin: * header allows agents operating in browser contexts to fetch the manifest from any origin, which is appropriate because the manifest contains no sensitive information.
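The route above imports getMCPManifest from lib/mcp-manifest.ts without showing it. One possible shape for that builder is sketched below. The requiresFlag field, the structured_read flag, and the read_product action are illustrative assumptions, not part of any specification; the default argument keeps the zero-argument call in the route working.

```typescript
// lib/mcp-manifest.ts (hypothetical sketch)

interface ManifestAction {
  name: string;
  description: string;
  endpoint: string;
  method: string;
  input_schema: Record<string, unknown>;
  output_schema: Record<string, unknown>;
}

// Assumption: an optional feature flag decides whether an action is advertised.
interface ActionDefinition extends ManifestAction {
  requiresFlag?: string;
}

const ACTIONS: ActionDefinition[] = [
  {
    name: "search_products",
    description: "Search the product catalog by keyword, category, or price range.",
    endpoint: "/api/mcp/search",
    method: "POST",
    input_schema: { type: "object", required: ["query"] },
    output_schema: { type: "array" },
  },
  {
    // Hypothetical second action, advertised only when its flag is enabled.
    name: "read_product",
    description: "Return structured detail for a single product.",
    endpoint: "/api/mcp/read",
    method: "POST",
    input_schema: { type: "object", required: ["id"] },
    output_schema: { type: "object" },
    requiresFlag: "structured_read",
  },
];

export function getMCPManifest(enabledFlags: ReadonlySet<string> = new Set<string>()) {
  const actions: ManifestAction[] = ACTIONS
    .filter((action) => !action.requiresFlag || enabledFlags.has(action.requiresFlag))
    // Strip the internal flag so it never appears in the published manifest.
    .map(({ requiresFlag: _flag, ...entry }) => entry);

  return {
    schema_version: "1.0",
    name: "Product Catalog Interface",
    description:
      "Agent-accessible product search and detail retrieval for an e-commerce storefront.",
    actions,
    readable_regions: [
      {
        selector: "main[aria-label='Product listing']",
        description: "Paginated list of products matching current filters.",
      },
    ],
  };
}
```

If the manifest ever varies per caller rather than per deployment, the public Cache-Control value in the route would need to change as well, since a shared cache could otherwise serve one caller's manifest to another.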

Bootstrapping an Agent-Ready Next.js Project

The standard create-next-app scaffolding gives us a solid base, but building an agent-ready application from scratch means making deliberate choices during setup that are easier to establish early than to retrofit.

We start with the App Router, TypeScript, and Tailwind, which is the current default configuration:

npx create-next-app@latest my-agent-app \
  --typescript \
  --tailwind \
  --eslint \
  --app \
  --src-dir \
  --import-alias "@/*"

We then install the packages needed for MCP integration and schema validation:

npm install zod
npm install --save-dev @types/node

For projects that stream action results to agents over server-sent events, we also add eventsource-parser, which parses SSE streams on the consuming side (it does not handle WebSocket connections):

npm install eventsource-parser

The directory structure for an agent-ready application diverges from a purely user-facing site in a few predictable ways. We isolate MCP-specific logic to prevent it from bleeding into application code and to make it easy to audit:

src/
  app/
    .well-known/
      mcp.json/
        route.ts          # Dynamic manifest generation
    api/
      mcp/
        search/
          route.ts        # MCP action: search
        read/
          route.ts        # MCP action: structured read
    (site)/
      layout.tsx          # Human-facing layout with semantic landmarks
      page.tsx
  lib/
    mcp-manifest.ts       # Manifest builder
    mcp-auth.ts           # Agent authentication helpers
  components/
    semantic/             # Components built with correct ARIA roles

TypeScript Contracts for MCP Actions

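Every MCP action should have a TypeScript type that mirrors its JSON schema. This lets the compiler check each handler against its declared input and output, and it makes the action contract readable without consulting external documentation. The types alone do not prove that a hand-written JSON schema matches them, so that agreement still needs a test or a single source from which both are derived.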

// lib/mcp-types.ts

export interface MCPAction<TInput, TOutput> {
  name: string;
  description: string;
  endpoint: string;
  method: "GET" | "POST" | "PUT" | "DELETE";
  inputSchema: Record<string, unknown>;
  outputSchema: Record<string, unknown>;
  handler: (input: TInput) => Promise<TOutput>;
}

export interface ProductSearchInput {
  query: string;
  category?: string;
  max_price?: number;
}

export interface ProductSearchOutput {
  id: string;
  name: string;
  price: number;
  url: string;
}

export interface MCPManifest {
  schema_version: string;
  name: string;
  description: string;
  actions: Array<{
    name: string;
    description: string;
    endpoint: string;
    method: string;
    input_schema: Record<string, unknown>;
    output_schema: Record<string, unknown>;
  }>;
  readable_regions: Array<{
    selector: string;
    description: string;
  }>;
}

These types do not replace the Zod schemas used for runtime validation; they complement them. Zod schemas validate untrusted input at the boundary. TypeScript types express the contract within the codebase.
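As an illustration of how the MCPAction interface might be used, the sketch below defines one action and derives its manifest entry from the same object, so the name, endpoint, and schemas are written once. The CATALOG data and the toManifestEntry helper are hypothetical; the interfaces are repeated from the listing above so the block is self-contained.

```typescript
interface MCPAction<TInput, TOutput> {
  name: string;
  description: string;
  endpoint: string;
  method: "GET" | "POST" | "PUT" | "DELETE";
  inputSchema: Record<string, unknown>;
  outputSchema: Record<string, unknown>;
  handler: (input: TInput) => Promise<TOutput>;
}

interface ProductSearchInput {
  query: string;
  category?: string;
  max_price?: number;
}

interface ProductSearchOutput {
  id: string;
  name: string;
  price: number;
  url: string;
}

// Hypothetical in-memory catalog standing in for a real data source.
const CATALOG: ProductSearchOutput[] = [
  { id: "p1", name: "Trail Shoe", price: 89, url: "/products/p1" },
  { id: "p2", name: "Road Shoe", price: 129, url: "/products/p2" },
];

const searchProductsAction: MCPAction<ProductSearchInput, ProductSearchOutput[]> = {
  name: "search_products",
  description: "Search the product catalog by keyword, category, or price range.",
  endpoint: "/api/mcp/search",
  method: "POST",
  inputSchema: { type: "object", required: ["query"] },
  outputSchema: { type: "array" },
  // The compiler checks this handler against the declared input and output types.
  handler: async ({ query, max_price }) =>
    CATALOG.filter(
      (product) =>
        product.name.toLowerCase().includes(query.toLowerCase()) &&
        (max_price === undefined || product.price <= max_price)
    ),
};

// Derive the manifest entry from the action; the handler is deliberately left out.
function toManifestEntry<TInput, TOutput>(action: MCPAction<TInput, TOutput>) {
  return {
    name: action.name,
    description: action.description,
    endpoint: action.endpoint,
    method: action.method,
    input_schema: action.inputSchema,
    output_schema: action.outputSchema,
  };
}
```

The inputSchema here is still written by hand, so this removes drift between the manifest and the route metadata but not between the JSON schema and the TypeScript type.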

Building Semantic Components for Agent Readability

React components in a Next.js application abstract away HTML. When developers choose component primitives carelessly, that abstraction strips semantic meaning from the rendered output. We need to be deliberate about what HTML each component produces.

Page Layout with Correct Landmarks

The application shell should use the full set of HTML5 sectioning elements. Agents use these to orient themselves within the page before attempting to read or interact.

// app/(site)/layout.tsx
import { ReactNode } from "react";

interface SiteLayoutProps {
  children: ReactNode;
}

export default function SiteLayout({ children }: SiteLayoutProps) {
  return (
    <div className="min-h-screen flex flex-col">
      <header role="banner" className="border-b">
        <nav aria-label="Primary navigation">
          <a href="/">Home</a>
          <a href="/products">Products</a>
          <a href="/about">About</a>
        </nav>
      </header>
      <main id="main-content" className="flex-1">
        {children}
      </main>
      <footer role="contentinfo" className="border-t">
        <p>2024 My Agent App</p>
      </footer>
    </div>
  );
}

The role="banner" on the <header> and role="contentinfo" on the <footer> are technically redundant here. Browsers assign those roles implicitly unless the element is nested inside an <article>, <aside>, <main>, <nav>, or <section>, and a wrapping <div> does not change that. We keep them anyway: the explicit roles cost nothing, and they keep both landmarks intact if a later refactor moves the header or footer into a sectioning element.

Data Regions with Machine-Readable Annotations

When we want agents to read structured data from the DOM rather than calling an API, we annotate the relevant regions with attributes that correspond to the readable_regions entries in the manifest.

// components/ProductListing.tsx
interface Product {
  id: string;
  name: string;
  price: number;
  url: string;
  category: string;
}

interface ProductListingProps {
  products: Product[];
  totalCount: number;
}

export function ProductListing({ products, totalCount }: ProductListingProps) {
  return (
    <main aria-label="Product listing" data-mcp-region="product-listing">
      <p aria-live="polite">
        Showing {products.length} of {totalCount} products
      </p>
      <ul role="list" aria-label="Products">
        {products.map((product) => (
          // Keep the <li> as a list item; the nested <article> carries the product semantics.
          <li key={product.id}>
            <article
              aria-label={product.name}
              data-product-id={product.id}
              data-product-category={product.category}
            >
              <a href={product.url}>
                <h2>{product.name}</h2>
              </a>
              <p>
                {/* aria-label has no effect on a role-less span, so use visually hidden text. */}
                <span className="sr-only">Price: </span>
                <data value={product.price}>${product.price.toFixed(2)}</data>
              </p>
            </article>
          </li>
        ))}
      </ul>
    </main>
  );
}

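The <data> element pairs a machine-readable value attribute with a human-readable display string. An agent reading this page can extract the numeric price directly from the value attribute without parsing "$29.99" from text. The data-mcp-region attribute provides a stable selector for a readable_regions entry, for example [data-mcp-region='product-listing'], without coupling to CSS class names, which change whenever a designer updates the stylesheet. The manifest shown earlier selects by aria-label instead; either works as long as the manifest and the markup agree. One caveat: this component renders its own <main>, and a document should contain only one visible <main>. If ProductListing is rendered inside the site layout above, which already provides <main id="main-content">, change one of the two to a <section> and update the manifest selector to match.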

Forms as Agent Interaction Points

Forms are among the most common agent interaction targets. A well-constructed form is self-documenting for both human users and agents.

// components/SearchForm.tsx
"use client";

import { useState, FormEvent } from "react";
import { useRouter } from "next/navigation";

export function SearchForm() {
  const router = useRouter();
  const [query, setQuery] = useState("");

  function handleSubmit(event: FormEvent<HTMLFormElement>) {
    event.preventDefault();
    const params = new URLSearchParams({ q: query });
    router.push(`/products?${params.toString()}`);
  }

  return (
    <form
      role="search"
      aria-label="Search products"
      onSubmit={handleSubmit}
      data-mcp-form="product-search"
    >
      <div>
        <label htmlFor="search-query">Search products</label>
        <input
          id="search-query"
          name="q"
          type="search"
          value={query}
          onChange={(e) => setQuery(e.target.value)}
          aria-describedby="search-hint"
          placeholder="Enter product name or keyword"
          autoComplete="off"
          required
          minLength={1}
          maxLength={200}
        />
        <p id="search-hint" className="sr-only">
          Enter one or more keywords to search the product catalog, then
          submit the form to see results.
        </p>
      </div>
      <button type="submit">Search</button>
    </form>
  );
}

The role="search" on the form element designates it as a search landmark in the accessibility tree, which an agent can locate directly. The aria-describedby attribute links the input to its hint text. The minLength and maxLength attributes encode the same validation constraints as the Zod schema on the server side, which means an agent can read the form's constraints and validate its planned input before submission rather than after rejection.

Agent Authentication and Rate Limiting

Not every MCP action should be publicly accessible. Some actions read private user data; others mutate state and carry business risk if invoked at scale without controls. We handle agent authentication at the Route Handler level, separating it from human session authentication.

Agents can authenticate using API keys passed in the Authorization header following the Bearer token pattern. We validate these keys in a shared middleware helper rather than duplicating validation logic across every MCP route.

// lib/mcp-auth.ts
import { NextRequest, NextResponse } from "next/server";

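const VALID_AGENT_KEYS = new Set(
  (process.env.MCP_AGENT_KEYS ?? "")
    .split(",")
    .map((k) => k.trim())
    // Drop empty entries (for example from a trailing comma) so an empty token never matches.
    .filter(Boolean)
);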

export interface AgentContext {
  agentId: string;
  tier: "standard" | "premium";
}

export function authenticateAgent(
  request: NextRequest
): AgentContext | NextResponse {
  const authHeader = request.headers.get("Authorization");
  if (!authHeader?.startsWith("Bearer ")) {
    return NextResponse.json(
      { error: "Missing or malformed Authorization header" },
      { status: 401 }
    );
  }

  const token = authHeader.slice(7);
  if (!VALID_AGENT_KEYS.has(token)) {
    return NextResponse.json({ error: "Invalid agent key" }, { status: 403 });
  }

  // In production, look up the agent record by token to get tier and ID.
  return {
    agentId: token.slice(0, 8),
    tier: "standard",
  };
}

We load valid keys from an environment variable at module initialization rather than from a database on each request. For small key sets this is appropriate; for large deployments we replace the Set lookup with a Redis query that the Next.js middleware layer can execute with acceptable latency.
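The comment in authenticateAgent defers the agent lookup to production. As one possible intermediate step, the sketch below parses a richer environment variable into a token-to-agent map. The "token:agentId:tier" entry format and both function names are assumptions made for illustration; nothing in the article prescribes them.

```typescript
interface AgentContext {
  agentId: string;
  tier: "standard" | "premium";
}

// Assumed format: comma-separated "token:agentId:tier" entries; tier defaults to "standard".
export function parseAgentRegistry(raw: string | undefined): Map<string, AgentContext> {
  const registry = new Map<string, AgentContext>();
  for (const entry of (raw ?? "").split(",")) {
    const [token, agentId, tier] = entry.trim().split(":");
    if (!token || !agentId) continue; // skip empty or malformed entries
    registry.set(token, { agentId, tier: tier === "premium" ? "premium" : "standard" });
  }
  return registry;
}

// Resolve a Bearer Authorization header value to an agent record, or null if unknown.
export function lookupAgent(
  registry: Map<string, AgentContext>,
  authHeader: string | null
): AgentContext | null {
  if (!authHeader?.startsWith("Bearer ")) return null;
  return registry.get(authHeader.slice(7)) ?? null;
}
```

Compared with slicing the first eight characters of the token, this keeps key material out of the agent ID that later ends up in logs and rate-limit keys.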

Rate limiting for agent traffic deserves separate accounting from human traffic. Agents can invoke actions in tight loops without the natural pacing that human users provide. We apply rate limits at the agent key level. The example below uses a fixed-window counter, the simplest variant; a sliding window smooths out bursts at window boundaries, and either is straightforward to implement with Upstash Redis or a similar edge-compatible store.

// lib/mcp-rate-limit.ts

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number; // epoch milliseconds at which the current window ends
}

// Fixed-window counter keyed by agent ID.
// In production, replace this in-memory store with a Redis-backed implementation.
const requestCounts = new Map<string, { count: number; windowStart: number }>();

const WINDOW_MS = 60_000; // 1 minute
const MAX_REQUESTS = 60;  // 60 requests per agent per minute

export function checkRateLimit(agentId: string): RateLimitResult {
  const now = Date.now();
  const existing = requestCounts.get(agentId);

  // Use >= so a request arriving exactly at resetAt starts a new window.
  if (!existing || now - existing.windowStart >= WINDOW_MS) {
    requestCounts.set(agentId, { count: 1, windowStart: now });
    return { allowed: true, remaining: MAX_REQUESTS - 1, resetAt: now + WINDOW_MS };
  }

  if (existing.count >= MAX_REQUESTS) {
    return {
      allowed: false,
      remaining: 0,
      resetAt: existing.windowStart + WINDOW_MS,
    };
  }

  existing.count += 1;
  return {
    allowed: true,
    remaining: MAX_REQUESTS - existing.count,
    resetAt: existing.windowStart + WINDOW_MS,
  };
}

The in-memory store in this example resets on every server restart and does not share state across multiple Next.js instances. A production deployment on Vercel or any horizontally scaled infrastructure requires the Redis-backed version. The interface stays the same; only the store changes.
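For comparison with the fixed-window counter above, here is a minimal sketch of a sliding-window variant that keeps a log of request timestamps per agent. It is in-memory and carries the same caveats; the function name checkSlidingWindow and the optional now parameter (added so the behavior can be tested deterministically) are assumptions for illustration.

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number; // epoch milliseconds at which the oldest counted request expires
}

const WINDOW_MS = 60_000; // 1 minute
const MAX_REQUESTS = 60;  // 60 requests per agent per rolling minute

// Timestamps of accepted requests per agent, oldest first.
const requestLog = new Map<string, number[]>();

export function checkSlidingWindow(agentId: string, now: number = Date.now()): RateLimitResult {
  // Keep only requests that are still inside the rolling window.
  const cutoff = now - WINDOW_MS;
  const recent = (requestLog.get(agentId) ?? []).filter((timestamp) => timestamp > cutoff);

  if (recent.length >= MAX_REQUESTS) {
    requestLog.set(agentId, recent);
    // A slot frees up when the oldest counted request leaves the window.
    return { allowed: false, remaining: 0, resetAt: (recent[0] ?? now) + WINDOW_MS };
  }

  recent.push(now);
  requestLog.set(agentId, recent);
  return {
    allowed: true,
    remaining: MAX_REQUESTS - recent.length,
    resetAt: (recent[0] ?? now) + WINDOW_MS,
  };
}
```

The trade-off is memory: this stores up to MAX_REQUESTS timestamps per agent instead of a single counter, in exchange for never admitting more than 60 requests in any 60-second span.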

Structured Data as a Complementary Signal

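JSON-LD, rendered by the page component as a <script type="application/ld+json"> element, gives agents a third reading channel alongside the MCP manifest and the semantic DOM. JSON-LD encodes page-level structured data according to Schema.org vocabularies, a format that search engines and other automated consumers already parse. In the example below, the Metadata API handles only the title and description; the JSON-LD is emitted from the page body.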

// app/(site)/products/[id]/page.tsx
import { Metadata } from "next";
import { getProduct } from "@/lib/catalog";

interface PageProps {
  // Next.js 15 and later pass params as a Promise; on earlier versions it is a plain object.
  params: Promise<{ id: string }>;
}

export async function generateMetadata({ params }: PageProps): Promise<Metadata> {
  const { id } = await params;
  const product = await getProduct(id);
  return {
    title: product.name,
    description: product.description,
  };
}

export default async function ProductPage({ params }: PageProps) {
  const { id } = await params;
  const product = await getProduct(id);

  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: product.name,
    description: product.description,
    sku: product.id,
    offers: {
      "@type": "Offer",
      price: product.price,
      priceCurrency: "USD",
      availability: product.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
      url: `https://example.com/products/${product.id}`,
    },
  };

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{
          // Escape "<" so product text cannot close the script tag early.
          __html: JSON.stringify(jsonLd).replace(/</g, "\\u003c"),
        }}
      />
      <article aria-label={product.name} data-mcp-region="product-detail">
        <h1>{product.name}</h1>
        <p>{product.description}</p>
        <p>
          <data value={product.price}>${product.price.toFixed(2)}</data>
        </p>
      </article>
    </>
  );
}

JSON-LD and MCP serve different purposes. JSON-LD is passive: it describes what the page contains. MCP is active: it specifies what an agent can do. A complete agent-ready implementation uses both, because they inform different parts of the agent's decision process. JSON-LD helps the agent decide whether the page is relevant to its task; MCP tells it how to act once it has decided to proceed.

Testing Agent Readability

An agent-ready interface requires a different testing posture than a standard web application. Beyond functional tests that verify correct behavior for human users, we need tests that verify the interface remains machine-readable under code changes.

We can write Playwright tests that assert on the accessibility tree rather than on visual appearance, which gives us confidence that the semantic structure agents depend on is preserved across refactors.

// tests/agent-readability.spec.ts
import { test, expect } from "@playwright/test";

test("product listing exposes correct landmark structure", async ({ page }) => {
  await page.goto("/products");

  // Match by accessible name so the locator stays unambiguous if more than one landmark exists.
  const listing = page.getByRole("main", { name: "Product listing" });
  await expect(listing).toBeVisible();

  const searchForm = page.getByRole("search");
  await expect(searchForm).toBeVisible();

  const productList = page.getByRole("list", { name: "Products" });
  await expect(productList).toBeVisible();
});

test("MCP manifest is served and valid", async ({ request }) => {
  const response = await request.get("/.well-known/mcp.json");
  expect(response.ok()).toBe(true);
  expect(response.headers()["content-type"]).toContain("application/json");

  const manifest = await response.json();
  expect(manifest.schema_version).toBeDefined();
  expect(Array.isArray(manifest.actions)).toBe(true);
  expect(manifest.actions.length).toBeGreaterThan(0);
});

test("product search MCP action returns valid schema", async ({ request }) => {
  const response = await request.post("/api/mcp/search", {
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.TEST_AGENT_KEY}`,
    },
    data: { query: "test" },
  });

  expect(response.ok()).toBe(true);
  const results = await response.json();
  expect(Array.isArray(results)).toBe(true);

  if (results.length > 0) {
    expect(typeof results[0].id).toBe("string");
    expect(typeof results[0].name).toBe("string");
    expect(typeof results[0].price).toBe("number");
  }
});

These tests run in CI alongside unit and integration tests. When a developer refactors a component and removes an aria-label that the manifest's readable_regions depends on, the accessibility tree test catches the regression before it reaches production.

Observability for Agent Traffic

Human users generate log patterns that are relatively easy to interpret: page views, clicks, form submissions, session lengths. Agent traffic has a different signature: high request frequency, consistent user agents, sequential action invocations that correspond to task completion steps, and no idle time between requests.

We surface agent traffic distinctly in application logs by tagging every MCP request with a structured log entry.

// lib/mcp-logger.ts

interface MCPRequestLog {
  timestamp: string;
  agentId: string;
  action: string;
  duration_ms: number;
  status: number;
  input_size_bytes: number;
}

export function logMCPRequest(log: MCPRequestLog): void {
  // In production, send this to your structured logging provider.
  // Here we write to stdout in JSON Lines format.
  process.stdout.write(JSON.stringify({ type: "mcp_request", ...log }) + "\n");
}

We call this logger from each Route Handler after the response is prepared. The duration_ms field measures the time from when the handler began executing to when the response was assembled, which excludes network transit time. This gives us accurate latency numbers for capacity planning, and it makes it straightforward to identify which actions are slow under agent load.
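One way to avoid repeating the timing code in every Route Handler is a small wrapper, sketched below. The withMCPLogging name, the HandlerResult shape, and the injected sink parameter are assumptions for illustration; in the application, the sink would be logMCPRequest from above.

```typescript
interface MCPRequestLog {
  timestamp: string;
  agentId: string;
  action: string;
  duration_ms: number;
  status: number;
  input_size_bytes: number;
}

// Hypothetical framework-neutral result; a Route Handler would map this to a response.
interface HandlerResult<T> {
  status: number;
  body: T;
}

export async function withMCPLogging<T>(
  meta: { agentId: string; action: string; inputSizeBytes: number },
  handler: () => Promise<HandlerResult<T>>,
  sink: (log: MCPRequestLog) => void
): Promise<HandlerResult<T>> {
  const startedAt = Date.now();
  let status = 500; // reported if the handler throws before producing a result
  try {
    const result = await handler();
    status = result.status;
    return result;
  } finally {
    // Emit the log entry whether the handler succeeded or threw.
    sink({
      timestamp: new Date(startedAt).toISOString(),
      agentId: meta.agentId,
      action: meta.action,
      duration_ms: Date.now() - startedAt,
      status,
      input_size_bytes: meta.inputSizeBytes,
    });
  }
}
```

Because the entry is written in a finally block, a handler that throws is still logged, with status 500, before the error propagates to the caller.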

If the agent key identifies the calling agent, we can also track per-agent usage over time. This data is valuable both for billing purposes if we charge for API access, and for understanding which agent workflows are most common. Common workflows inform which MCP actions to prioritize for performance optimization and which to add next.

