DEV Community: Ethan Cole

Building an MCP server — lessons from thunderbit-mcp

Ethan Cole — Mon, 11 May 2026 04:05:23 +0000

When we started building thunderbit-mcp, the plan sounded straightforward: expose Thunderbit's web extraction API to AI coding agents through the Model Context Protocol.

In practice, the hard parts were not the SDK calls. The hard parts were product-shaped:

How many tools should the server expose?
What should a tool return when a page is blocked, slow, or only partially extracted?
Should the server run locally over stdio, remotely over HTTP, or both?
How much should the LLM decide, and how much should the tool force into a schema?
What makes an MCP server feel dependable instead of magical?

This post is a field guide from shipping an MCP server for web data extraction. The examples use Thunderbit because that is the system we were working on, but the lessons apply to most MCP servers that wrap an existing API.

MCP is small. The product surface is not.

MCP gives you a clean frame: a host application talks to an MCP server; the server exposes tools, resources, and prompts; both sides negotiate capabilities; messages move over JSON-RPC; the connection goes through initialization, operation, and shutdown.

That sounds tiny, which is part of the appeal.

But the minute you ship a server to real users, you are no longer only designing a protocol adapter. You are designing an interface for an AI agent.

That changes the questions.

A REST API can assume the caller is a developer who read the docs. An MCP tool is often called by a model that inferred intent from one sentence:

"Grab the pricing tables from these competitor pages and give me a CSV."

The model may not know whether to fetch raw HTML, render JavaScript, extract structured fields, paginate, or retry with a different region. A good MCP server turns that ambiguity into a small number of safe, predictable decisions.

For thunderbit-mcp, we treated the MCP layer as a product API, not a thin wrapper around every internal endpoint.

Lesson 1: Fewer tools are usually better

The first temptation is to expose everything:

distill
extract
batchDistill
batchExtract
getJob
cancelJob
listJobs
render
screenshot
proxyDebug
credits
schemaInfer

That looks complete, but it creates decision fatigue for the model. Tool descriptions start overlapping. The agent has to decide between five similar verbs before it has even helped the user.

We had better results when tools mapped to user intent instead of internal API shape:

fetch_page_content(url, options)
extract_structured_data(url, schema, options)
extract_many_pages(urls, schema_or_mode, webhook?)
check_extraction_job(job_id)

The important detail is not the exact names. It is that each tool answers a distinct question:

"I need readable page content."
"I need fields that match a schema."
"I need to run this across many URLs."
"I need to check async progress."

If you cannot explain the difference between two tools in one sentence without using implementation words, merge them or make one an option.

Lesson 2: Tool descriptions are part of the runtime

In normal API design, descriptions are documentation. In MCP, descriptions are also model steering.

This means vague descriptions are expensive.

Bad:

Extract data from a URL.

Better:

Extract structured data from a public web page using a JSON Schema.
Use this when the user asks for specific fields such as prices, names,
emails, dates, reviews, listings, tables, or product attributes.
Returns JSON that conforms to the provided schema when possible.

That description teaches the agent when to call the tool. It also prevents the common mistake of using extraction when the user only needs a readable summary.

We also learned to include negative guidance:

Do not use this tool for private pages that require the user's logged-in
browser session. Do not use it for actions such as clicking buttons,
submitting forms, or making purchases.

Negative guidance matters because web automation is a broad mental category. If your server only reads pages, say so. If it can act on pages, be even more explicit.
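As a rough sketch of how the positive and negative guidance can live in one descriptor: the `name`, `description`, and `inputSchema` fields follow the shape MCP uses for tool definitions, while `buildDescription` and the exact wording are illustrative assumptions, not the actual thunderbit-mcp source.

```typescript
// One entry as it might appear in a tools/list response.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Hypothetical helper: keeps "when to use" and "when not to use" lines explicit.
function buildDescription(summary: string, useWhen: string[], doNotUse: string[]): string {
  return [
    summary,
    ...useWhen.map((line) => `Use this when ${line}`),
    ...doNotUse.map((line) => `Do not use this tool ${line}`),
  ].join("\n");
}

const extractStructuredData: ToolDescriptor = {
  name: "extract_structured_data",
  description: buildDescription(
    "Extract structured data from a public web page using a JSON Schema.",
    ["the user asks for specific fields such as prices, names, dates, or tables."],
    [
      "for private pages that require the user's logged-in browser session.",
      "for actions such as clicking buttons, submitting forms, or making purchases.",
    ],
  ),
  inputSchema: {
    type: "object",
    properties: {
      url: { type: "string", description: "Public URL of the page to read." },
      schema: { type: "object", description: "JSON Schema describing the fields to return." },
    },
    required: ["url", "schema"],
  },
};
```

Keeping the negative lines in a separate array makes it harder to drop them by accident when the summary gets rewritten.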

Lesson 3: Schemas beat clever prompts

For web extraction, a natural first version is:

{
  "url": "https://example.com/product",
  "prompt": "Get the product name, price, rating, and availability."
}

That works for demos. In production, a free-form prompt leaves the output shape up to the model, so field names and types can drift from call to call.

We moved toward JSON Schema as the primary contract:

{
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "The product name as shown on the page"
    },
    "price": {
      "type": "number",
      "description": "Current listed price in USD, excluding shipping"
    },
    "inStock": {
      "type": "boolean",
      "description": "Whether the product appears available to buy"
    }
  },
  "required": ["name", "price"]
}

This did three useful things:

It made the user's expected output machine-checkable.
It let the model create or refine the schema before calling the tool.
It reduced downstream cleanup because the result already had shape.

The funny thing about agents is that they are often better at writing a schema than at remembering all the implicit constraints in a prose prompt. Use that.
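To make "machine-checkable" concrete, here is a minimal sketch of a result check. It is a hand-written check for flat objects with primitive fields, not a full JSON Schema validator, and `SimpleSchema` and `checkResult` are names made up for this example.

```typescript
type FieldType = "string" | "number" | "boolean";

// Simplified stand-in for the JSON Schema above: flat properties plus a required list.
interface SimpleSchema {
  properties: Record<string, { type: FieldType }>;
  required: string[];
}

// Returns a list of problems; an empty list means the result matches the schema.
function checkResult(schema: SimpleSchema, result: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of schema.required) {
    if (result[field] === undefined || result[field] === null) {
      problems.push(`${field}: required field is missing`);
    }
  }
  for (const [field, spec] of Object.entries(schema.properties)) {
    const value = result[field];
    if (value === undefined || value === null) continue; // covered by the required check
    if (typeof value !== spec.type) {
      problems.push(`${field}: expected ${spec.type}, got ${typeof value}`);
    }
  }
  return problems;
}

const productSchema: SimpleSchema = {
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    inStock: { type: "boolean" },
  },
  required: ["name", "price"],
};
```

A result such as `{ name: "Widget", price: "$49" }` fails on `price`, which is exactly the kind of drift a prose prompt tends to hide.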

Lesson 4: Return boring errors

AI agents do not need poetic error messages. They need errors they can act on.

For thunderbit-mcp, we tried to keep tool failures in a small set of categories:

INVALID_INPUT
AUTH_REQUIRED
RATE_LIMITED
FETCH_FAILED
EXTRACTION_FAILED
PARTIAL_RESULT
JOB_PENDING

Each error includes:

a short human-readable message
whether retrying makes sense
any safe next action
the request or job ID for debugging

Example:

{
  "code": "RATE_LIMITED",
  "message": "The request hit the current account rate limit.",
  "retryable": true,
  "retryAfterSeconds": 60
}

The goal is not to hide complexity. It is to keep the agent from improvising. A model that sees retryAfterSeconds is much more likely to wait or explain the limit than to spam the same tool call five times.
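One way to keep errors this boring is to build them in a single place. The sketch below maps an upstream HTTP status to one of the codes above. The mapping, the 60 second fallback, and the `toToolError` name are assumptions for illustration, not how Thunderbit's API necessarily behaves.

```typescript
type ErrorCode =
  | "INVALID_INPUT"
  | "AUTH_REQUIRED"
  | "RATE_LIMITED"
  | "FETCH_FAILED"
  | "EXTRACTION_FAILED"
  | "PARTIAL_RESULT"
  | "JOB_PENDING";

interface ToolError {
  code: ErrorCode;
  message: string;
  retryable: boolean;
  retryAfterSeconds?: number;
  nextAction?: string;
  requestId?: string;
}

// Hypothetical mapping from an upstream HTTP status to a stable, agent-readable error.
function toToolError(status: number, requestId: string, retryAfterHeader?: string): ToolError {
  if (status === 401 || status === 403) {
    return {
      code: "AUTH_REQUIRED",
      message: "The API key is missing or was not accepted.",
      retryable: false,
      nextAction: "Ask the user to check the configured API key.",
      requestId,
    };
  }
  if (status === 429) {
    // Fall back to 60 seconds when the header is absent or not a positive number.
    const parsed = Number(retryAfterHeader);
    const retryAfterSeconds = Number.isFinite(parsed) && parsed > 0 ? parsed : 60;
    return {
      code: "RATE_LIMITED",
      message: "The request hit the current account rate limit.",
      retryable: true,
      retryAfterSeconds,
      requestId,
    };
  }
  if (status >= 400 && status < 500) {
    return { code: "INVALID_INPUT", message: "The request was rejected as invalid.", retryable: false, requestId };
  }
  return { code: "FETCH_FAILED", message: "The page could not be fetched.", retryable: true, requestId };
}
```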

Lesson 5: stdio is the best first transport

The MCP spec currently defines two standard transports: stdio and Streamable HTTP.

For a first server, stdio is usually the calmest path:

The client launches your server as a subprocess.
You read JSON-RPC messages from stdin.
You write protocol messages to stdout.
You write logs to stderr.

That last point is worth underlining. Do not log to stdout. In stdio MCP, stdout is protocol space. A single stray console.log("debug") can break the client connection.
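A small sketch of that separation, assuming a Node.js server: protocol messages go through one function that writes a single line of JSON, and debug output goes through `console.error`, which Node sends to stderr. The helper names are made up for this example.

```typescript
// Serialize one JSON-RPC message as a single line.
// JSON.stringify without indentation escapes newlines inside strings,
// so the only raw newline is the delimiter added at the end.
function encodeMessage(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// The writer is injected so the function can be tested without touching real stdout.
function sendMessage(message: unknown, write: (chunk: string) => void): void {
  write(encodeMessage(message));
}

// Debug output goes to stderr, never stdout.
function logDebug(text: string): void {
  console.error(`[debug] ${text}`);
}

// In a real stdio server the writer would be stdout, for example:
// sendMessage(response, (chunk) => process.stdout.write(chunk));
```

Routing every protocol write through one function also makes it easy to search the codebase for any other use of stdout.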

stdio is a good fit when:

users run the server locally
configuration is mostly environment variables
the server is a wrapper around an API
you want broad compatibility with desktop agents and coding tools

Streamable HTTP becomes attractive when:

you want a hosted server
auth is browser-based or OAuth-based
multiple clients need to connect
you need resumability, session management, or server-to-client notifications

For Thunderbit, stdio made the initial developer workflow simple: install, add config to the MCP client, pass an API key, and start using tools. A remote HTTP server is a better second step once the auth and tenancy story is mature.

Lesson 6: Treat authentication as UX

Auth is not just a security feature. In MCP, auth is often the first moment of truth.

If setup requires five steps, three dashboards, and a mystery config file, many users will assume the server is broken.

The local stdio version should make the happy path obvious:

npx thunderbit-mcp

And the MCP client config should be boring:

{
  "mcpServers": {
    "thunderbit": {
      "command": "npx",
      "args": ["thunderbit-mcp"],
      "env": {
        "THUNDERBIT_API_KEY": "your_api_key"
      }
    }
  }
}
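On the server side, the matching piece is a startup check that turns a missing key into a clear message instead of a silent failure. This is a sketch under assumptions: `loadConfig`, the result shape, and the rule that treats the placeholder value as missing are all illustrative.

```typescript
type ConfigResult =
  | { ok: true; apiKey: string }
  | { ok: false; code: "AUTH_REQUIRED"; message: string };

// The environment is passed in so the function is easy to test;
// a real server would call loadConfig(process.env).
function loadConfig(env: Record<string, string | undefined>): ConfigResult {
  const apiKey = (env["THUNDERBIT_API_KEY"] ?? "").trim();
  // Assumption: a copy-pasted placeholder counts as "not configured".
  if (apiKey === "" || apiKey === "your_api_key") {
    return {
      ok: false,
      code: "AUTH_REQUIRED",
      message: "THUNDERBIT_API_KEY is not set. Add it to the env block of your MCP client config.",
    };
  }
  return { ok: true, apiKey };
}
```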

For hosted transports, use real auth. The MCP transport docs call out important security protections for HTTP servers, including origin validation, localhost binding for local servers, and proper authentication. Do not treat "it is just an agent tool" as a reason to relax security. Agent tools are exactly where you want clean boundaries.

Lesson 7: Make the model ask less often

A good MCP server should reduce clarification loops.

For web extraction, the model often needs to know:

Should JavaScript be rendered?
Should the server follow pagination?
Should it return Markdown or JSON?
Should it run one URL or many?
Is partial data acceptable?

You can force the model to ask the user every time, but that makes the workflow feel brittle. Instead, set defaults that match the common case and expose options for the edge cases.

For example:

{
  "url": "https://example.com",
  "renderMode": "auto",
  "countryCode": "US",
  "maxPages": 1,
  "includeLinks": false
}

The model can still override these when the user says "include all pagination" or "check the German version." But the default path stays short.
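In code, that can be as small as a resolver that fills in whatever the model left out. The option names mirror the example above; the `"always"` and `"never"` render modes and the `resolveOptions` name are assumptions added for the sketch.

```typescript
interface FetchOptions {
  renderMode: "auto" | "always" | "never"; // only "auto" comes from the example above
  countryCode: string;
  maxPages: number;
  includeLinks: boolean;
}

const DEFAULT_OPTIONS: FetchOptions = {
  renderMode: "auto",
  countryCode: "US",
  maxPages: 1,
  includeLinks: false,
};

// Any field the model omits (or sends as undefined) falls back to the default.
function resolveOptions(overrides: Partial<FetchOptions> = {}): FetchOptions {
  return {
    renderMode: overrides.renderMode ?? DEFAULT_OPTIONS.renderMode,
    countryCode: overrides.countryCode ?? DEFAULT_OPTIONS.countryCode,
    maxPages: overrides.maxPages ?? DEFAULT_OPTIONS.maxPages,
    includeLinks: overrides.includeLinks ?? DEFAULT_OPTIONS.includeLinks,
  };
}
```

With this in place, "check the German version" only needs `resolveOptions({ countryCode: "DE" })`; everything else stays on the short path.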

Lesson 8: Async jobs need a narrative

Batch extraction is not instant. That is fine, as long as the tool gives the agent a narrative it can relay to the user.

Bad async response:

{
  "id": "job_123"
}

Better:

{
  "jobId": "job_123",
  "status": "queued",
  "submittedUrls": 80,
  "estimatedCompletionSeconds": 120,
  "nextAction": "Call check_extraction_job with this jobId."
}

Agents are very literal. If there is a next action, put it in the result. If there is no next action, say that too.
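A sketch of what "say that too" can look like: every status gets an explicit `nextAction`, including the terminal ones. Only `"queued"` appears in the example above; the other status names and the `describeJob` helper are assumptions.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

interface JobReport {
  jobId: string;
  status: JobStatus;
  submittedUrls: number;
  completedUrls: number;
  nextAction: string;
}

// Builds the response for a job check so that no status leaves the agent guessing.
function describeJob(
  jobId: string,
  status: JobStatus,
  submittedUrls: number,
  completedUrls: number,
): JobReport {
  const nextActionByStatus: Record<JobStatus, string> = {
    queued: `Call check_extraction_job with jobId ${jobId}.`,
    running: `Call check_extraction_job with jobId ${jobId} again to see progress.`,
    completed: "No further action needed. Results are included in this response.",
    failed: "Do not retry automatically. Report the failure and the jobId to the user.",
  };
  return { jobId, status, submittedUrls, completedUrls, nextAction: nextActionByStatus[status] };
}
```

Because `nextActionByStatus` is typed as a `Record<JobStatus, string>`, adding a new status without a next action becomes a compile error.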

Lesson 9: Registry submission is packaging work

The MCP Registry is now the official centralized metadata repository for publicly accessible MCP servers, currently in preview. That is good news for discovery, but it also raises the bar for packaging.

Before submitting, check the unglamorous parts:

Is the package name stable?
Is the README installation flow tested from scratch?
Are required environment variables documented?
Does the server expose a useful version?
Are tool names stable enough to avoid breaking users?
Does the server fail gracefully without credentials?
Is there a minimal example for at least one popular MCP client?

Registry metadata is not a substitute for a good first run. If the first command fails silently, discovery will not save you.

Lesson 10: An MCP server should have opinions

The best MCP servers are not neutral pipes. They encode judgment.

For thunderbit-mcp, those opinions were:

Prefer structured output when the user asks for fields.
Prefer cleaned Markdown when the user asks to read, summarize, or compare pages.
Prefer batch tools when the user provides many URLs.
Avoid browser actions unless the capability is explicitly supported.
Return partial results clearly instead of pretending everything succeeded.
Keep credentials out of prompts and tool outputs.

Your opinions will be different. The point is to have them.

An MCP server that exposes every knob equally forces the model to become your product manager at runtime. That is rarely what you want.

A small implementation checklist

If I were starting another MCP server tomorrow, I would use this checklist:

Start with three to five tools.
Write tool descriptions like model instructions, not API docs.
Use structured inputs and outputs everywhere.
Put logs on stderr for stdio servers.
Add stable error codes before adding more features.
Test with real agent prompts, not only direct tool calls.
Include one copy-paste client config in the README.
Document auth failure, rate limits, retries, and partial results.
Decide which transport is primary before designing auth.
Treat registry submission as part of the release, not an afterthought.

Where Thunderbit fits

Thunderbit is an AI web scraper and web extraction platform. The API is designed to turn web pages into clean Markdown or structured JSON while handling common scraping problems like JavaScript rendering, noisy HTML, anti-bot friction, geo-routing, batch jobs, and webhooks.

That makes it a natural fit for MCP: agents often need fresh web data, but they should not have to manage a browser cluster or maintain brittle CSS selectors just to answer a question.

The weakness is also clear: Thunderbit is not the right tool for every MCP job. If you only need to read local files, query your own database, or call a simple internal API, a tiny custom MCP server will be cheaper and more direct. Thunderbit makes sense when the hard part is the public web.

That distinction matters. MCP works best when each server has a sharp job.

Final thought

Building an MCP server is easy in the same way building a CLI is easy: the first command can work in an afternoon.

Shipping one people trust takes longer.

You have to design the verbs, the defaults, the errors, the auth flow, the packaging, and the story the agent tells when something goes wrong. The protocol gives you the rail. The product work is deciding where the rail should go.

That was the biggest lesson from thunderbit-mcp: the server is not just how an AI calls your API. It is how your API becomes part of somebody's thinking loop.

Designing Reliable Tool Schemas with Zod for LLM Agents

Ethan Cole — Mon, 11 May 2026 03:34:43 +0000

LLM agents often fail in surprisingly ordinary places.

Not in the model call. Not in the prompt. Not even in the function that eventually does the work.

They fail at the boundary between "the model produced some arguments" and "my application trusted those arguments enough to run code."

That boundary is where tool schemas matter.

If you are building anything agent-like in TypeScript, such as an internal automation, an MCP server, a CLI helper, or a backend workflow that lets a model call functions, Zod is a practical way to make that boundary explicit.

The goal is simple: treat model output as untrusted input without filling the codebase with scattered defensive checks.

This article walks through the pattern I use for designing reliable tool schemas with Zod.

No framework required. No product pitch. Just TypeScript boundaries.

The problem with "almost valid" tool calls

Imagine we expose a tool like this:

async function searchDocs(input: {
  query: string;
  limit?: number;
  includeDrafts?: boolean;
}) {
  // Search implementation...
}

If a human developer calls this function, TypeScript helps.

If a model calls it, TypeScript does not help at runtime.

The model might produce:

{
  "query": "OAuth callback errors",
  "limit": "10",
  "includeDrafts": "false"
}

That looks close. It is also wrong.

Depending on how the code handles it, "10" might silently work, "false" might behave as truthy, and the tool might return draft documents even though the user did not ask for them.

This is why I like to think of tool inputs as API requests from a slightly chaotic client. The model is not malicious, but it is not type-safe either.

Start with a runtime schema

The first improvement is to define the input shape with Zod:

import { z } from "zod";

const SearchDocsInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.number().int().min(1).max(20).default(5),
  includeDrafts: z.boolean().default(false),
});

type SearchDocsInput = z.infer<typeof SearchDocsInput>;

Now the actual tool accepts a validated type:

async function searchDocs(input: SearchDocsInput) {
  // input.query is a non-empty string
  // input.limit is an integer from 1 to 20
  // input.includeDrafts is a boolean
}

Then the runtime boundary becomes explicit:

async function runSearchDocs(rawInput: unknown) {
  const input = SearchDocsInput.parse(rawInput);
  return searchDocs(input);
}

That is already safer. But for model-facing tools, I usually prefer safeParse.

async function runSearchDocs(rawInput: unknown) {
  const result = SearchDocsInput.safeParse(rawInput);

  if (!result.success) {
    return {
      ok: false,
      error: "Invalid tool input",
      issues: result.error.issues.map((issue) => ({
        path: issue.path.join("."),
        message: issue.message,
      })),
    };
  }

  const data = await searchDocs(result.data);

  return {
    ok: true,
    data,
  };
}

Instead of crashing the whole run, the tool can return a structured validation error that the agent or application can handle.

Coerce deliberately, not accidentally

Coercion can be useful. Models often produce numbers as strings, especially when values came from natural language.

Zod supports this:

const SearchDocsInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.coerce.number().int().min(1).max(20).default(5),
  includeDrafts: z.coerce.boolean().default(false), // Caution: coerces the string "false" to true
});

But be careful with booleans.

JavaScript boolean coercion is not the same thing as parsing user intent:

Boolean("false"); // true

For model-facing schemas, I usually avoid broad boolean coercion and define a stricter helper:

const BooleanFromModel = z.union([
  z.boolean(),
  z.literal("true").transform(() => true),
  z.literal("false").transform(() => false),
]);

const SearchDocsInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.coerce.number().int().min(1).max(20).default(5),
  includeDrafts: BooleanFromModel.default(false),
});

The point is not "never coerce." The point is to make every coercion a design choice.

Keep schemas boring

Tool schemas should be boring.

That sounds small, but it matters. If a schema is too clever, the model has a harder time producing valid input, and humans have a harder time debugging failures.

Prefer this:

const CreateIssueInput = z.object({
  title: z.string().min(1).max(120),
  body: z.string().max(4000).optional(),
  priority: z.enum(["low", "medium", "high"]).default("medium"),
});

Over this:

const CreateIssueInput = z.object({
  payload: z.object({
    meta: z.object({
      attributes: z.record(z.unknown()),
    }),
  }),
});

Nested shapes are sometimes necessary, but for model-called tools, flat and literal usually wins.

A good tool schema answers three questions quickly:

What fields are allowed?
What values are valid?
What defaults will be applied?

If a future maintainer has to read several transforms to understand the shape, the schema is probably doing too much.

Use enums instead of open strings

Open strings give the model too much room to improvise.

For example:

const ExportReportInput = z.object({
  format: z.string(),
});

The model might send:

{ "format": "spreadsheet" }

But your code expected "csv" or "xlsx".

Use an enum:

const ExportReportInput = z.object({
  format: z.enum(["csv", "xlsx", "json"]),
});

This helps in two ways.

First, runtime validation becomes safer.

Second, if you convert the Zod schema into JSON Schema for a tool definition, the model can see the allowed values directly.
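For a sense of what the model sees, here is a hand-written JSON Schema equivalent of that enum, kept dependency-free. The exact output of any particular Zod-to-JSON-Schema converter may differ, and `EXPORT_FORMATS` and `isExportFormat` are names invented for this sketch.

```typescript
const EXPORT_FORMATS = ["csv", "xlsx", "json"] as const;
type ExportFormat = (typeof EXPORT_FORMATS)[number];

// Hand-written equivalent of the schema above; the allowed values are visible in `enum`.
const exportReportJsonSchema = {
  type: "object",
  properties: {
    format: { type: "string", enum: [...EXPORT_FORMATS] },
  },
  required: ["format"],
};

// Runtime guard that uses the same list, so the two cannot drift apart.
function isExportFormat(value: unknown): value is ExportFormat {
  return typeof value === "string" && (EXPORT_FORMATS as readonly string[]).indexOf(value) !== -1;
}
```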

Separate public input from internal options

One mistake I see in tool design is exposing internal options too early.

Suppose your search system supports these options:

type InternalSearchOptions = {
  query: string;
  limit: number;
  indexName: string;
  rankingProfile: "fast" | "balanced" | "deep";
  debugTraceId?: string;
};

That does not mean the model-facing tool should expose all of them.

Create a smaller public schema:

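const SearchInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.number().int().min(1).max(10).default(5),
});

// Inferred type, so other functions can accept the validated input.
type SearchInput = z.infer<typeof SearchInput>;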

Then map it into internal options:

function toInternalSearchOptions(input: SearchInput): InternalSearchOptions {
  return {
    query: input.query,
    limit: input.limit,
    indexName: "docs",
    rankingProfile: "balanced",
  };
}

This is one of the simplest ways to make tools safer.

The model should control intent, not infrastructure.

Return structured errors

When validation fails, do not return a giant stack trace to the model. Also do not return a vague "bad input".

Return compact, structured feedback:

function formatZodError(error: z.ZodError) {
  return error.issues.map((issue) => ({
    field: issue.path.join(".") || "(root)",
    problem: issue.message,
  }));
}

Example response:

{
  "ok": false,
  "error": "Invalid tool input",
  "issues": [
    {
      "field": "limit",
      "problem": "Number must be less than or equal to 20"
    }
  ]
}

This is useful for three audiences:

The model can retry with better arguments.
The developer can see what went wrong.
The application can log validation failures without leaking sensitive internals.
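On the agent side, that structure is easy to turn into a retry hint. A minimal sketch, assuming the response shape shown above; `toRetryHint` is a hypothetical helper, and it deliberately has no Zod dependency so it can live in whatever code feeds tool results back to the model.

```typescript
interface ValidationIssue {
  field: string;
  problem: string;
}

interface InvalidInputResult {
  ok: false;
  error: string;
  issues: ValidationIssue[];
}

// Turns structured issues into a short message the model can act on.
function toRetryHint(result: InvalidInputResult): string {
  const lines = result.issues.map((issue) => `- ${issue.field}: ${issue.problem}`);
  return [`${result.error}. Fix these fields and call the tool again:`, ...lines].join("\n");
}
```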

Add descriptions where your tool runtime supports them

Zod itself is not an LLM tool spec. But many stacks let you turn Zod schemas into JSON Schema, OpenAPI-like definitions, or tool descriptors.

Descriptions help the model choose fields correctly:

const SearchDocsInput = z.object({
  query: z
    .string()
    .min(1)
    .max(200)
    .describe("The plain-language search query."),
  limit: z
    .number()
    .int()
    .min(1)
    .max(20)
    .default(5)
    .describe("Maximum number of results to return."),
  includeDrafts: z
    .boolean()
    .default(false)
    .describe("Whether unpublished draft documents may be included."),
});

Keep descriptions short and literal.

Bad description:

.describe("Use this when the user really wants to go deep and find all the things.")

Better description:

.describe("Maximum number of results to return.")

The model does not need vibes. It needs constraints.

Put authorization outside the schema

Zod can validate shape. It cannot decide whether the caller is allowed to do something.

Keep those separate:

const DeleteDocumentInput = z.object({
  documentId: z.string().uuid(),
});

// `User`, `permissions`, and `documents` stand in for your application's own types and services.
async function runDeleteDocument(rawInput: unknown, user: User) {
  const result = DeleteDocumentInput.safeParse(rawInput);

  if (!result.success) {
    return {
      ok: false,
      error: "Invalid tool input",
      issues: formatZodError(result.error),
    };
  }

  const canDelete = await permissions.canDeleteDocument(
    user.id,
    result.data.documentId,
  );

  if (!canDelete) {
    return {
      ok: false,
      error: "Not authorized to delete this document",
    };
  }

  await documents.delete(result.data.documentId);

  return {
    ok: true,
  };
}

The schema proves the input is shaped correctly.

The permission check proves the action is allowed.

You need both.

A small wrapper pattern

After writing a few tools, the validation wrapper starts to repeat. I usually extract a tiny helper:

import { z } from "zod";

type ToolResult<T> =
  | { ok: true; data: T }
  | {
      ok: false;
      error: string;
      issues?: Array<{ field: string; problem: string }>;
    };

function defineTool<InputSchema extends z.ZodTypeAny, Output>(
  schema: InputSchema,
  handler: (input: z.infer<InputSchema>) => Promise<Output>,
) {
  return async function run(rawInput: unknown): Promise<ToolResult<Output>> {
    const result = schema.safeParse(rawInput);

    if (!result.success) {
      return {
        ok: false,
        error: "Invalid tool input",
        issues: formatZodError(result.error),
      };
    }

    const data = await handler(result.data);

    return {
      ok: true,
      data,
    };
  };
}

Now tools stay small:

const searchDocsTool = defineTool(SearchDocsInput, async (input) => {
  return searchDocs(input);
});

This is not a full agent framework. It is just a clean runtime boundary.

That is often enough.

Checklist for model-facing Zod schemas

Before I ship a tool schema, I like to check:

Are all fields explicitly listed?
Are strings bounded with min and max?
Are numbers bounded with min, max, and int where appropriate?
Are open strings replaced with enums when possible?
Are defaults safe?
Are coercions deliberate?
Are internal options hidden?
Are validation errors structured?
Are authorization checks separate from validation?

Most tool bugs come from skipping one of those.

Final thought

The best tool schemas are not fancy.

They are narrow, readable, and a little suspicious of everything crossing the model-to-code boundary.

That suspicion is healthy. It lets you build agents that can recover from bad inputs, explain what went wrong, and call real application code without pretending a language model is a type checker.

Zod is not the only way to do this.

But if your agent code is already in TypeScript, it is one of the fastest ways to make the boundary concrete.

How to Give Your AI Agent Live Web Access Without Feeding It Raw HTML

Ethan Cole — Fri, 08 May 2026 08:12:48 +0000

Most AI agents eventually run into the same awkward problem: the web is right there, but reading it cleanly is still annoying.

Your agent can plan tasks, write code, summarize text, call tools, and reason through multi-step workflows. Then someone gives it a URL and the nice abstraction gets messy fast.

How should the agent actually read the page?

Raw HTML is noisy. Modern websites render content with JavaScript. Pages have navigation, cookie banners, modals, ads, related posts, footers, and a surprising number of things that are technically text but absolutely not useful context.

If you dump all of that into an LLM, you burn tokens and usually get worse answers.

The setup I keep coming back to is:

Fetch a URL.
Convert the page into clean Markdown or structured JSON.
Pass the cleaned result to your agent as context.

I will use the Thunderbit Web Scraper API for the extraction step. The product is not really the point, though. The point is the boundary: clean the webpage first, then let the agent work with the cleaned input.

Why agents need cleaner web context

An agent might need live webpage context for all kinds of ordinary product work:

answer questions about a specific article
summarize a competitor page
extract pricing from product pages
monitor job listings
enrich a company database
collect sources for a research workflow
turn documentation pages into RAG-ready content

The first quick version usually looks like this:

const html = await fetch(url).then((res) => res.text());

It is fine for a demo. It is not much of a foundation.

The HTML might not include the rendered content. The useful text might be buried between scripts, nav links, cookie text, and layout markup. You can clean it yourself, but now your agent project has a side quest: building a web extraction pipeline.

For an agent, the best input is usually not raw HTML. It is either:

clean Markdown for reading and reasoning
structured JSON for fields the agent needs to act on

For this pattern, I mostly care about two endpoints:

Distill: URL to clean Markdown
Extract: URL plus schema to JSON or CSV

Use Distill when the agent needs to read a page. Use Extract when your app needs specific fields.

Step 1: Turn a webpage into Markdown

Start with the simplest version: take a URL and turn it into Markdown.

Here is a curl request to the Distill endpoint:

curl -X POST "https://openapi.thunderbit.com/openapi/v1/distill" \
  -H "Authorization: Bearer $THUNDERBIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article"
  }'

The response includes Markdown that is much easier for an LLM to use than raw page HTML.

In Python:

import os
import requests

API_KEY = os.environ["THUNDERBIT_API_KEY"]

response = requests.post(
    "https://openapi.thunderbit.com/openapi/v1/distill",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={"url": "https://example.com/article"},
    timeout=60,
)

response.raise_for_status()
result = response.json()

markdown = result["data"]["markdown"]
print(markdown[:1000])

Now the agent does not need to parse HTML, ignore nav bars, or guess which chunk of the page matters. It gets the cleaned content directly.

Step 2: Give the Markdown to your agent

The prompt can stay plain:

You are helping analyze a webpage.

Use the webpage content below as your source of truth.
If the answer is not supported by the content, say so.

WEBPAGE:
{{markdown}}

USER QUESTION:
{{question}}

That "source of truth" line is doing real work. It keeps the answer grounded in the fetched page instead of letting the model blend page content with whatever it already knows.

In a real app, you might wrap this in a function:

def build_page_context_prompt(markdown: str, question: str) -> str:
    return f"""
You are helping analyze a webpage.

Use the webpage content below as your source of truth.
If the answer is not supported by the content, say so.

WEBPAGE:
{markdown}

USER QUESTION:
{question}
""".strip()

For a lot of small workflows, this gets you surprisingly far:

"Summarize this article in five bullets."
"What are the pricing tiers on this page?"
"Does this documentation mention webhooks?"
"Extract the integration steps from this guide."

Step 3: Use structured extraction when the agent needs fields

Markdown is good when the agent needs to understand a page. Sometimes the app needs fields it can act on.

For example:

product name and price
job title and location
company name and description
article title, author, and date
event name, date, venue, and registration link

When I care about those fields, I reach for schema-based extraction.

Instead of asking the agent to read a big page and then pull fields out of prose, ask the extraction layer to return structured JSON.

curl -X POST "https://openapi.thunderbit.com/openapi/v1/extract" \
  -H "Authorization: Bearer $THUNDERBIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The product name"
        },
        "price": {
          "type": "string",
          "description": "The current displayed price, including currency"
        },
        "availability": {
          "type": "string",
          "description": "Whether the product is in stock, unavailable, or preorder"
        }
      },
      "required": ["name", "price"]
    }
  }'

Now the next step gets a predictable object, not a wall of text.

For example:

{
  "name": "Example Product",
  "price": "$49.00",
  "availability": "In stock"
}

From there, the object can feed a monitoring workflow, a database update, a Slack notification, or another agent step.

A simple Node.js tool function

If you are building with tool calling, expose web reading as a normal tool.

Here is a minimal Node.js function using built-in fetch:

async function distillUrl(url) {
  const response = await fetch("https://openapi.thunderbit.com/openapi/v1/distill", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.THUNDERBIT_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url }),
    // Give up after 60 seconds instead of hanging the agent run.
    signal: AbortSignal.timeout(60000),
  });

  if (!response.ok) {
    throw new Error(`Distill failed: ${response.status} ${await response.text()}`);
  }

  const result = await response.json();
  return result.data.markdown;
}

The agent can call this when the user gives it a URL:

const markdown = await distillUrl("https://example.com/article");

const prompt = `
You are analyzing a webpage.
Answer using only the webpage content below.

WEBPAGE:
${markdown}

QUESTION:
What are the main takeaways?
`;

This code is intentionally boring. The useful part is the boundary: the agent passes in a URL, and the tool returns readable Markdown.

A tool function for structured extraction

You can also expose an extraction tool:

async function extractFromUrl(url, schema) {
  const response = await fetch("https://openapi.thunderbit.com/openapi/v1/extract", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.THUNDERBIT_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, schema }),
    // Give up after 60 seconds instead of hanging the agent run.
    signal: AbortSignal.timeout(60000),
  });

  if (!response.ok) {
    throw new Error(`Extract failed: ${response.status} ${await response.text()}`);
  }

  return response.json();
}

Then define schemas for the objects your app understands.

For a job listing:

const jobSchema = {
  type: "object",
  properties: {
    title: {
      type: "string",
      description: "The job title",
    },
    company: {
      type: "string",
      description: "The hiring company",
    },
    location: {
      type: "string",
      description: "The job location or remote policy",
    },
    salary: {
      type: "string",
      description: "The listed salary range, if available",
    },
    requirements: {
      type: "array",
      description: "Key candidate requirements",
      items: { type: "string" },
    },
    applyUrl: {
      type: "string",
      description: "The application URL, if visible",
    },
  },
  required: ["title", "company"],
};

For most downstream code, that beats making the agent inspect the whole page every time.

Distill vs Extract

My rule of thumb:

Use Distill when the task is reading-heavy:

summarize this page
answer questions from this article
ingest docs into a knowledge base
compare two landing pages
create notes from a report

Use Extract when the task is field-heavy:

get the product price
pull all job listings
extract event details
convert a directory page into rows
enrich a CRM record

In real workflows, you may use both. Distill gives the agent broad context. Extract gives the system reliable fields.

Why not just let the LLM browse?

If your platform already has built-in browsing, that can be useful for general research. Product features usually need something more controlled:

stable API calls
predictable output shape
server-side API key management
batch processing
retries and error handling
logs for debugging
clean content that can be stored or embedded

When this is part of a product, "the model browsed somewhere and said a thing" is hard to debug. A repeatable pipeline is much easier to reason about.

That is why I prefer to separate the pieces:

The web extraction API fetches and cleans the page.
Your app validates and stores the result.
The LLM reasons over the cleaned content.

This gives you logs, retry points, validation points, and fewer mystery failures.
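The three steps above can be sketched as one small function. This is a minimal illustration, not a prescribed architecture: `answerFromPage`, `fetchPage`, `store`, `askModel`, and `log` are hypothetical names, and each collaborator is injected so it can be swapped for a real API client, database, or model call.

```javascript
// Every collaborator is passed in, so each step can be logged, retried,
// or replaced independently. None of these names come from a real SDK.
async function answerFromPage({ url, question, fetchPage, store, askModel, log = () => {} }) {
  // 1. The extraction API fetches and cleans the page.
  const markdown = await fetchPage(url);
  log({ step: "fetch", url, chars: markdown?.length ?? 0 });

  // 2. The app validates and stores the result before any model call.
  if (typeof markdown !== "string" || markdown.trim() === "") {
    throw new Error(`Empty page content for ${url}`);
  }
  const record = { url, markdown, fetchedAt: new Date().toISOString() };
  await store(record);
  log({ step: "store", url });

  // 3. The LLM reasons over the cleaned content only.
  const answer = await askModel({ context: markdown, question });
  log({ step: "answer", url });
  return { url, answer };
}
```

Because the page is validated and stored before the model is called, a bad answer can be traced back to the exact content the model saw.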

A few practical tips

Keep API keys server-side. Do not put your Thunderbit API key in client-side JavaScript.

Cache page reads when possible. If ten users ask about the same URL, you probably do not need to distill it ten times in five minutes.
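One way to do that is an in-memory cache with a time-to-live, sketched below. `withTtlCache` is a hypothetical helper, the five-minute default is only an example value, and a multi-process deployment would more likely use a shared store.

```javascript
// Wrap any async "read this URL" function with an in-memory TTL cache.
// Concurrent calls for the same URL share one in-flight request.
// `now` is injectable so expiry can be tested without waiting.
function withTtlCache(fn, { ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
  const entries = new Map(); // url -> { expiresAt, promise }

  return async function cached(url) {
    const hit = entries.get(url);
    if (hit && hit.expiresAt > now()) {
      return hit.promise;
    }
    const promise = fn(url);
    entries.set(url, { expiresAt: now() + ttlMs, promise });
    try {
      return await promise;
    } catch (err) {
      // Do not cache failures; let the next caller try again.
      if (entries.get(url)?.promise === promise) entries.delete(url);
      throw err;
    }
  };
}
```

Usage would look like `const cachedDistill = withTtlCache(distillUrl);`, after which callers use `cachedDistill(url)` in place of `distillUrl(url)`.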

Store the source URL with every result. When your agent gives an answer, you want to know which page it used.

Validate structured extraction before taking action. If a field is required for a workflow, check it before sending emails, updating records, or triggering automations.
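A minimal sketch of such a check follows. `validateExtraction` is a hypothetical helper that understands only the small subset of JSON Schema used in this post (`required`, plus `string` and `array` property types). A full JSON Schema validator is the better choice for anything richer.

```javascript
// Minimal check for the schema subset used in this post:
// a top-level object, `required`, and string/array property types.
function validateExtraction(data, schema) {
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { valid: false, errors: ["result is not an object"] };
  }
  const errors = [];
  for (const key of schema.required ?? []) {
    const value = data[key];
    if (value === undefined || value === null || value === "") {
      errors.push(`missing required field: ${key}`);
    }
  }
  for (const [key, spec] of Object.entries(schema.properties ?? {})) {
    const value = data[key];
    if (value === undefined || value === null) continue; // absent fields are handled above
    if (spec.type === "string" && typeof value !== "string") {
      errors.push(`${key} should be a string`);
    }
    if (spec.type === "array" && !Array.isArray(value)) {
      errors.push(`${key} should be an array`);
    }
  }
  return { valid: errors.length === 0, errors };
}
```

Pass in whichever part of the Extract response holds the extracted fields (the exact response envelope is not shown in this post), and skip the email, record update, or automation when `valid` is `false`.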

Use retries for temporary failures. Timeouts, rate limits, and transient server errors should be handled differently from invalid URLs or invalid schemas.
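Here is a rough sketch of that split, using exponential backoff. It assumes the thrown error carries a numeric `status` property, which the `distillUrl` and `extractFromUrl` functions in this post do not set (you would attach it before throwing). The status list, delays, and the `withRetry` name are illustrative choices, not Thunderbit guidance.

```javascript
// Statuses that usually indicate a temporary condition.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

// Assumption: errors expose `status` (HTTP code) or are named "TimeoutError",
// which is what AbortSignal.timeout() produces.
function isRetryable(err) {
  return RETRYABLE_STATUSES.has(err?.status) || err?.name === "TimeoutError";
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `sleep` is injectable so tests do not have to wait for real delays.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Invalid URLs or schemas (for example a 400) fail immediately.
      if (!isRetryable(err) || attempt === attempts) break;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 500 ms, then 1 s, ...
    }
  }
  throw lastError;
}
```

A call site would look like `await withRetry(() => distillUrl(url))`, once the error thrown on a non-OK response includes `response.status` as `err.status`.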

Respect the sites you access. Follow applicable laws, site terms, and robots.txt policies where relevant, and keep request rates reasonable.

Example use cases

A few places where I would use this:

Research assistant

A user gives an article URL. Your app distills it into Markdown, then the agent summarizes it, extracts claims, and suggests follow-up questions.

Sales enrichment

The user enters a company website. Your app extracts company name, positioning, target audience, product categories, and contact links, then your agent drafts a personalized outreach note.

Competitive monitoring

Your app checks competitor pricing pages on a schedule. Extract returns structured pricing data. The agent summarizes changes and highlights anything important.
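Detecting what changed between two scheduled runs can be plain code, leaving only the summary to the agent. The sketch below assumes each snapshot is an array of `{ name, price }` objects. That shape and the `diffPlans` name are hypothetical, so substitute whatever your extraction schema defines.

```javascript
// Compare two snapshots of extracted plans, keyed by plan name.
// The { name, price } shape is an assumption for illustration.
function diffPlans(previous, current) {
  const before = new Map(previous.map((plan) => [plan.name, plan]));
  const after = new Map(current.map((plan) => [plan.name, plan]));
  const changes = [];

  for (const [name, plan] of after) {
    const old = before.get(name);
    if (!old) {
      changes.push({ type: "added", name, price: plan.price });
    } else if (old.price !== plan.price) {
      changes.push({ type: "changed", name, from: old.price, to: plan.price });
    }
  }
  for (const [name, plan] of before) {
    if (!after.has(name)) {
      changes.push({ type: "removed", name, price: plan.price });
    }
  }
  return changes;
}
```

Passing only the resulting `changes` array to the agent keeps the prompt small and makes each alert traceable to a specific field.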

Documentation helper

Your app distills docs pages into Markdown and stores them in a vector database. The agent answers support questions from up-to-date docs instead of stale model memory.

Job board tracker

Your app extracts jobs from multiple company career pages using one schema. The agent ranks matches for a candidate profile.

Final thoughts

Giving an AI agent live web access sounds bigger than it has to be.

Do not make the agent fight the webpage.

Give it clean Markdown when it needs to read. Give it structured JSON when it needs fields. Keep the messy parts of web extraction behind a tool boundary.

A scraper API is useful here because it gives you that boundary: URL in, Markdown or JSON out. Thunderbit's Web Scraper API does that with Distill for Markdown and Extract for schema-based JSON. It also handles JavaScript-heavy pages and batch workflows, which are the parts I would rather not rebuild for every agent project.

You can get an API key and try it here: Thunderbit Web Scraper API

Start with one tool: read_url(url). Once your agent can reliably read a page, a lot of web-aware workflows become easier to build.