Ethan Cole
Designing Reliable Tool Schemas with Zod for LLM Agents

LLM agents often fail in surprisingly ordinary places.

Not in the model call. Not in the prompt. Not even in the function that eventually does the work.

They fail at the boundary between "the model produced some arguments" and "my application trusted those arguments enough to run code."

That boundary is where tool schemas matter.

If you are building anything agent-like in TypeScript, such as an internal automation, an MCP server, a CLI helper, or a backend workflow that lets a model call functions, Zod is a practical way to make that boundary explicit.

The goal is simple: treat model output as untrusted input without filling the codebase with scattered defensive checks.

This article walks through the pattern I use for designing reliable tool schemas with Zod.

No framework required. No product pitch. Just TypeScript boundaries.

The problem with "almost valid" tool calls

Imagine we expose a tool like this:

async function searchDocs(input: {
  query: string;
  limit?: number;
  includeDrafts?: boolean;
}) {
  // Search implementation...
}

If a human developer calls this function, TypeScript helps.

If a model calls it, TypeScript does not help at runtime.

The model might produce:

{
  "query": "OAuth callback errors",
  "limit": "10",
  "includeDrafts": "false"
}

That looks close. It is also wrong.

Depending on how the code handles it, "10" might silently work, "false" might behave as truthy, and the tool might return draft documents even though the user did not ask for them.

This is why I like to think of tool inputs as API requests from a slightly chaotic client. The model is not malicious, but it is not type-safe either.

Start with a runtime schema

The first improvement is to define the input shape with Zod:

import { z } from "zod";

const SearchDocsInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.number().int().min(1).max(20).default(5),
  includeDrafts: z.boolean().default(false),
});

type SearchDocsInput = z.infer<typeof SearchDocsInput>;

Now the actual tool accepts a validated type:

async function searchDocs(input: SearchDocsInput) {
  // input.query is a non-empty string
  // input.limit is an integer from 1 to 20
  // input.includeDrafts is a boolean
}

Then the runtime boundary becomes explicit:

async function runSearchDocs(rawInput: unknown) {
  const input = SearchDocsInput.parse(rawInput);
  return searchDocs(input);
}

That is already safer. But for model-facing tools, I usually prefer safeParse.

async function runSearchDocs(rawInput: unknown) {
  const result = SearchDocsInput.safeParse(rawInput);

  if (!result.success) {
    return {
      ok: false,
      error: "Invalid tool input",
      issues: result.error.issues.map((issue) => ({
        path: issue.path.join("."),
        message: issue.message,
      })),
    };
  }

  const data = await searchDocs(result.data);

  return {
    ok: true,
    data,
  };
}

Instead of crashing the whole run, the tool can return a structured validation error that the agent or application can handle.

Coerce deliberately, not accidentally

Coercion can be useful. Models often produce numbers as strings, especially when values came from natural language.

Zod supports this:

const SearchDocsInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.coerce.number().int().min(1).max(20).default(5),
  includeDrafts: z.coerce.boolean().default(false),
});

But be careful with booleans.

JavaScript boolean coercion is not the same thing as parsing user intent:

Boolean("false"); // true

For model-facing schemas, I usually avoid broad boolean coercion and define a stricter helper:

const BooleanFromModel = z.union([
  z.boolean(),
  z.literal("true").transform(() => true),
  z.literal("false").transform(() => false),
]);

const SearchDocsInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.coerce.number().int().min(1).max(20).default(5),
  includeDrafts: BooleanFromModel.default(false),
});

The point is not "never coerce." The point is to make every coercion a design choice.

Keep schemas boring

Tool schemas should be boring.

That sounds small, but it matters. If a schema is too clever, the model has a harder time producing valid input, and humans have a harder time debugging failures.

Prefer this:

const CreateIssueInput = z.object({
  title: z.string().min(1).max(120),
  body: z.string().max(4000).optional(),
  priority: z.enum(["low", "medium", "high"]).default("medium"),
});

Over this:

const CreateIssueInput = z.object({
  payload: z.object({
    meta: z.object({
      attributes: z.record(z.unknown()),
    }),
  }),
});

Nested shapes are sometimes necessary, but for model-called tools, flat and literal usually wins.

A good tool schema answers three questions quickly:

  1. What fields are allowed?
  2. What values are valid?
  3. What defaults will be applied?

If a future maintainer has to read several transforms to understand the shape, the schema is probably doing too much.

Use enums instead of open strings

Open strings give the model too much room to improvise.

For example:

const ExportReportInput = z.object({
  format: z.string(),
});

The model might send:

{ "format": "spreadsheet" }

But your code expected "csv" or "xlsx".

Use an enum:

const ExportReportInput = z.object({
  format: z.enum(["csv", "xlsx", "json"]),
});

This helps in two ways.

First, runtime validation becomes safer.

Second, if you convert the Zod schema into JSON Schema for a tool definition, the model can see the allowed values directly.

Separate public input from internal options

One mistake I see in tool design is exposing internal options too early.

Suppose your search system supports these options:

type InternalSearchOptions = {
  query: string;
  limit: number;
  indexName: string;
  rankingProfile: "fast" | "balanced" | "deep";
  debugTraceId?: string;
};

That does not mean the model-facing tool should expose all of them.

Create a smaller public schema:

const SearchInput = z.object({
  query: z.string().min(1).max(200),
  limit: z.number().int().min(1).max(10).default(5),
});

type SearchInput = z.infer<typeof SearchInput>;

Then map it into internal options:

function toInternalSearchOptions(input: SearchInput): InternalSearchOptions {
  return {
    query: input.query,
    limit: input.limit,
    indexName: "docs",
    rankingProfile: "balanced",
  };
}

This is one of the simplest ways to make tools safer.

The model should control intent, not infrastructure.

Return structured errors

When validation fails, do not return a giant stack trace to the model. Also do not return a vague "bad input".

Return compact, structured feedback:

function formatZodError(error: z.ZodError) {
  return error.issues.map((issue) => ({
    field: issue.path.join(".") || "(root)",
    problem: issue.message,
  }));
}

Example response:

{
  "ok": false,
  "error": "Invalid tool input",
  "issues": [
    {
      "field": "limit",
      "problem": "Number must be less than or equal to 20"
    }
  ]
}

This is useful for three audiences:

  • The model can retry with better arguments.
  • The developer can see what went wrong.
  • The application can log validation failures without leaking sensitive internals.

Add descriptions where your tool runtime supports them

Zod itself is not an LLM tool spec. But many stacks let you turn Zod schemas into JSON Schema, OpenAPI-like definitions, or tool descriptors.

Descriptions help the model choose fields correctly:

const SearchDocsInput = z.object({
  query: z
    .string()
    .min(1)
    .max(200)
    .describe("The plain-language search query."),
  limit: z
    .number()
    .int()
    .min(1)
    .max(20)
    .default(5)
    .describe("Maximum number of results to return."),
  includeDrafts: z
    .boolean()
    .default(false)
    .describe("Whether unpublished draft documents may be included."),
});

Keep descriptions short and literal.

Bad description:

.describe("Use this when the user really wants to go deep and find all the things.")

Better description:

.describe("Maximum number of results to return.")

The model does not need vibes. It needs constraints.

Put authorization outside the schema

Zod can validate shape. It cannot decide whether the caller is allowed to do something.

Keep those separate:

const DeleteDocumentInput = z.object({
  documentId: z.string().uuid(),
});

async function runDeleteDocument(rawInput: unknown, user: User) {
  const result = DeleteDocumentInput.safeParse(rawInput);

  if (!result.success) {
    return {
      ok: false,
      error: "Invalid tool input",
      issues: formatZodError(result.error),
    };
  }

  const canDelete = await permissions.canDeleteDocument(
    user.id,
    result.data.documentId,
  );

  if (!canDelete) {
    return {
      ok: false,
      error: "Not authorized to delete this document",
    };
  }

  await documents.delete(result.data.documentId);

  return {
    ok: true,
  };
}

The schema proves the input is shaped correctly.

The permission check proves the action is allowed.

You need both.

A small wrapper pattern

After writing a few tools, the validation wrapper starts to repeat. I usually extract a tiny helper:

import { z } from "zod";

type ToolResult<T> =
  | { ok: true; data: T }
  | {
      ok: false;
      error: string;
      issues?: Array<{ field: string; problem: string }>;
    };

function defineTool<InputSchema extends z.ZodTypeAny, Output>(
  schema: InputSchema,
  handler: (input: z.infer<InputSchema>) => Promise<Output>,
) {
  return async function run(rawInput: unknown): Promise<ToolResult<Output>> {
    const result = schema.safeParse(rawInput);

    if (!result.success) {
      return {
        ok: false,
        error: "Invalid tool input",
        issues: formatZodError(result.error),
      };
    }

    const data = await handler(result.data);

    return {
      ok: true,
      data,
    };
  };
}

Now tools stay small:

const searchDocsTool = defineTool(SearchDocsInput, async (input) => {
  return searchDocs(input);
});

This is not a full agent framework. It is just a clean runtime boundary.

That is often enough.

Checklist for model-facing Zod schemas

Before I ship a tool schema, I like to check:

  • Are all fields explicitly listed?
  • Are strings bounded with min and max?
  • Are numbers bounded with min, max, and int where appropriate?
  • Are open strings replaced with enums when possible?
  • Are defaults safe?
  • Are coercions deliberate?
  • Are internal options hidden?
  • Are validation errors structured?
  • Are authorization checks separate from validation?

Most tool bugs come from skipping one of those.

Final thought

The best tool schemas are not fancy.

They are narrow, readable, and a little suspicious of everything crossing the model-to-code boundary.

That suspicion is healthy. It lets you build agents that can recover from bad inputs, explain what went wrong, and call real application code without pretending a language model is a type checker.

Zod is not the only way to do this.

But if your agent code is already in TypeScript, it is one of the fastest ways to make the boundary concrete.

Top comments (1)

Bhavin Sheth

Really liked the part about treating LLM output as "untrusted input." I learned that the hard way after a tool accepted "false" as truthy and returned the wrong data 😅
Keeping schemas boring + strict enums honestly makes agent tools way more reliable and easier to debug.