The Dangerous Bugs Are the Ones That Don't Crash: Building Input Validation for My MCP Server

David Tappert — Sun, 03 May 2026 20:34:54 +0000

I was building an MCP server for an event platform that automates speaker communications (confirmations, reminders, calendar invites, follow-ups). An agent created a session confirmation for "Monday March 8th." March 8th was a Sunday.

I caught it. But catching it was just the beginning.

The confirmation email had already been drafted with "Monday March 8th." The calendar invite had the wrong day. The follow-up survey was timestamped against a date that didn't exist on the schedule. One silent error had propagated into every downstream artifact the system touched.

Now I'm not fixing one mistake. I'm chasing it through four different outputs, correcting each one, regenerating, re-checking. Every correction creates noise. Every re-check takes time. And the whole time I'm wondering: what else did it get wrong that I didn't catch?

That's the thing about silent errors. They don't announce themselves. They don't crash. They just quietly spread, and the cleanup costs more than the original task saved you.

I shouldn't have had to catch it. The MCP server should have rejected the input before any of those artifacts were generated.

Why: One Silent Error, Everywhere, All at Once

Most validation advice focuses on the errors that blow up: wrong types, missing fields, injection attacks. Those are easy. Your server crashes, the agent gets an error, everyone knows something went wrong.

The harder problem is the errors that don't blow up. The ones where your server happily does exactly what the LLM asked, and what the LLM asked was wrong. No crash. No warning. Just wrong data flowing downstream with full confidence.

Here's why this is uniquely dangerous for MCP servers:

You don't control the caller. Your MCP server might be called by any number of agents, IDEs, or automation pipelines, each powered by a different model with different strengths and weaknesses. You can't rely on the caller to get it right.

The agent runtime doesn't help. It takes the LLM's tool call (name + JSON arguments) and forwards it to your server. It doesn't validate. It doesn't transform. Whatever the LLM generates, your server receives.

Silent errors propagate. This is the real cost. A wrong date in a session creation doesn't just create one bad record. It poisons every downstream artifact. The confirmation, the reminder, the calendar invite, the follow-up. Each one carries the same wrong date into a different system, a different channel, a different person's inbox. By the time a human notices, the error isn't in one place. It's everywhere. Cleaning it up means touching everything it touched.

The LLM is confident. It doesn't say "I'm not sure about this date." It says "Monday March 8th" with the same certainty it says "Tuesday March 10th." There's no signal that something is wrong, unless your server provides one.

Your MCP server is the one constant across all callers, all models, all agents. Validation there isn't a nice-to-have. It's the only thing standing between one confident mistake and a cleanup that costs more than the automation saved you.

How: Validate the Model, Not Just the Fields

Most people stop at field validation. Is the date valid? Is the string non-empty? Is the number positive? That's necessary, but it misses the entire class of bugs I'm talking about.

A session scheduled for "Monday March 8th" passes every field-level check. The date is valid. The weekday is a real weekday. The title is non-empty. The duration is positive. Every field is correct in isolation, but the model is wrong.
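To make that concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and not taken from the server's code. It shows the opening bug passing two field-level checks while failing the one check that compares the fields to each other.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# The values the agent sent (illustrative names, not the real tool arguments).
session_date = date(2026, 3, 8)
day_of_week = "Monday"

# Field-level view: each value is fine on its own.
date_is_valid = isinstance(session_date, date)
weekday_is_valid = day_of_week in WEEKDAYS

# Model-level view: do the two values agree with each other?
actual_weekday = WEEKDAYS[session_date.weekday()]  # weekday(): Monday is 0
fields_agree = actual_weekday == day_of_week

print(date_is_valid, weekday_is_valid, fields_agree)
print(f"{session_date} is a {actual_weekday}, not a {day_of_week}")
```

The first print shows `True True False`: both field checks pass, and only the comparison exposes the problem.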

The question isn't "is each field valid?" It's "does the whole input make sense together?"

Three principles make this work:

Validate cross-field coherence

A model_validator (in Pydantic v2, one declared with mode='after') runs once all fields have parsed and checks relationships between them. The weekday matches the date. The end time is after the start time. The duration fits the time window. The reminder comes before the event. No single field is wrong, but together, they might be nonsense.

Collect all errors in one pass

If the agent sends a request with three problems, report all three so it can fix them in one retry. Don't play whack-a-mole. That wastes round-trips and burns tokens. Pydantic does this naturally; it collects all field validation errors before raising a single ValidationError. Your model_validator can do the same: accumulate errors in a list and raise once at the end.
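A small sketch of both behaviors, assuming Pydantic v2. The `Booking` model and the `error_messages` helper are hypothetical and exist only for this illustration. One caveat worth knowing: a `mode="after"` model validator only runs once every field has validated, so an agent may see field-level errors on one attempt and cross-field errors on the next.

```python
from datetime import date
from pydantic import BaseModel, Field, ValidationError, model_validator


class Booking(BaseModel):  # hypothetical model, for illustration only
    title: str = Field(min_length=1)
    seats: int = Field(ge=1)
    start: date
    end: date
    reminder: date

    @model_validator(mode="after")
    def check_coherence(self):
        errors = []  # accumulate every cross-field problem, raise once
        if self.end < self.start:
            errors.append(f"end ({self.end}) is before start ({self.start})")
        if self.reminder >= self.start:
            errors.append(
                f"reminder ({self.reminder}) must be before start ({self.start})"
            )
        if errors:
            raise ValueError(" | ".join(errors))
        return self


def error_messages(**kwargs) -> list[str]:
    """Return the message of every error Pydantic reports, or [] if valid."""
    try:
        Booking(**kwargs)
    except ValidationError as exc:
        return [err["msg"] for err in exc.errors()]
    return []


# Two field-level problems (empty title, zero seats) come back together.
field_errors = error_messages(
    title="", seats=0,
    start=date(2026, 3, 8), end=date(2026, 3, 9), reminder=date(2026, 3, 1),
)

# Two cross-field problems come back as one combined message.
model_errors = error_messages(
    title="Kiro Deep Dive", seats=10,
    start=date(2026, 3, 8), end=date(2026, 3, 7), reminder=date(2026, 3, 9),
)
```

In the first call the model validator never runs, because field validation already failed. In the second, both cross-field problems arrive in a single retry-friendly message.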

Write error messages for machines

The error message isn't for a human reading a log. It's for an LLM that needs to fix its own output. "2026-03-08 is a Sunday, not a Monday" is actionable. "Invalid date" is not. The more specific the message, the faster the self-correction.

What: The Implementation

Here's the concrete code. I built this with Kiro using spec-driven development: requirements with acceptance criteria, a design with Pydantic models, and a task breakdown. The implementation follows the spec.

Starting point: MCP tools without validation

My MCP server had tools for creating sessions, searching documents, managing speakers. Each tool accepted arguments directly from the LLM and passed them through. The models looked like this:

from datetime import date, time
from pydantic import BaseModel

class CreateSessionInput(BaseModel):
    title: str
    session_date: date
    day_of_week: str | None = None
    start_time: time
    end_time: time | None = None
    duration_minutes: int
    speaker_aliases: list[str]
    room: str
    reminder_date: date | None = None

Clean. Typed. Totally unvalidated beyond basic types. day_of_week is never checked against session_date. end_time could be before start_time. reminder_date could be after the event. duration_minutes could contradict the time window. Every field is correct in isolation, and the model is wrong.

This is the bug from the opening. "Monday March 8th" passes every type check. The date is valid. The weekday is a real weekday. Pydantic says it's fine. The server creates the session. Four downstream artifacts, all wrong.

Adding validation to an existing spec

The spec was already built. Some tasks were done, some were in progress. Doesn't matter.

When I realized the gap, I added a new validation design standard to my spec workflow (validation patterns, error message quality, boundary checks) that Kiro applies as a review pass before (or sometimes after) implementation starts. Think of it as a standards review, but for the spec. I review the gaps it finds and use them as the input for the next iteration of the spec.

# Remediation Report — Round 2: Pydantic Validators

## Summary
- **Gaps:** 3
- **Overall:** The models use plain BaseModel with type hints only.
  No @field_validator or @model_validator. Validation is handled
  procedurally or not at all.

## Gaps

### GAP-1: Tool accepts any day_of_week without checking the date
- The model accepts any string for day_of_week, including weekdays
  that don't match session_date. This is the "Monday March 8th" bug.
- **Suggested action:** Add @model_validator that checks day_of_week
  against session_date.weekday()

### GAP-2: No cross-field time validation
- end_time can be before start_time. duration_minutes can contradict
  the time window. reminder_date can be after the event.
- **Suggested action:** Add @model_validator that checks all time
  relationships in one pass, collects all errors.

### GAP-3: No protection against empty speaker lists
- speaker_aliases accepts an empty list. A session with no speakers
  generates confirmations addressed to nobody.
- **Suggested action:** Add min_length=1 on the Field, or a
  @field_validator that rejects empty lists.

Then I told Kiro:

"I have a gap on the spec. We need to apply better validation. Take a look at my new remediation file and update the spec please."

Kiro updated all three layers (requirements, design, and tasks) and the new tasks showed up unchecked alongside the completed ones. That's the point: the spec isn't a planning document you write once and forget. It stays in sync with your code, and you can update it at any point: mid-build, after shipping, during a review.

Here's what the updated spec looks like:

Requirements: a new requirement with acceptance criteria, each one specific and testable:

### Requirement: Model-Level Input Validation

1. WHEN a create_session tool call includes a day_of_week that does
   not match the session_date, THEN the model SHALL raise a
   ValidationError with the actual weekday and the provided one
2. WHEN end_time is before or equal to start_time, THEN the model
   SHALL raise a ValidationError
3. WHEN duration_minutes does not match the time window between
   start_time and end_time, THEN the model SHALL raise a
   ValidationError
4. WHEN reminder_date is on or after session_date, THEN the model
   SHALL raise a ValidationError
5. WHEN speaker_aliases is empty, THEN the model SHALL raise a
   ValidationError
6. ALL validation errors SHALL be collected and returned in a single
   response with messages specific enough for the LLM to self-correct

Design: validators added directly to the model:

from datetime import date, time, datetime, timedelta
from pydantic import BaseModel, Field, field_validator, model_validator

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

class CreateSessionInput(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    session_date: date
    day_of_week: str | None = Field(default=None)
    start_time: time
    end_time: time | None = Field(default=None)
    duration_minutes: int = Field(ge=10, le=480)
    speaker_aliases: list[str] = Field(min_length=1)
    room: str = Field(min_length=1)
    reminder_date: date | None = Field(default=None)

    @model_validator(mode='after')
    def check_model_coherence(self):
        errors = []

        # Weekday must match the date
        if self.day_of_week:
            actual = WEEKDAYS[self.session_date.weekday()]
            if actual.lower() != self.day_of_week.lower():
                errors.append(
                    f"{self.session_date} is a {actual}, "
                    f"not a {self.day_of_week}"
                )

        if self.end_time is not None:
            if self.end_time <= self.start_time:
                # End time must be after start time
                errors.append(
                    f"end_time ({self.end_time}) must be after "
                    f"start_time ({self.start_time})"
                )
            else:
                # Duration must match the time window (5-minute tolerance).
                # Only checked for a valid window, so a reversed window
                # produces one message instead of two.
                start_dt = datetime.combine(self.session_date, self.start_time)
                end_dt = datetime.combine(self.session_date, self.end_time)
                actual_minutes = (end_dt - start_dt).total_seconds() / 60
                if abs(actual_minutes - self.duration_minutes) > 5:
                    errors.append(
                        f"duration_minutes ({self.duration_minutes}) doesn't "
                        f"match the time window ({int(actual_minutes)} minutes)"
                    )

        # Reminder must be before the event
        if self.reminder_date and self.reminder_date >= self.session_date:
            errors.append(
                f"reminder_date ({self.reminder_date}) must be before "
                f"session_date ({self.session_date})"
            )

        if errors:
            raise ValueError(" | ".join(errors))
        return self

Tasks: new tasks with traceability, plus updates to existing ones:

- [ ] Add model_validator to CreateSessionInput
    - Weekday vs date check, end_time vs start_time, duration vs
      time window, reminder vs session date
    - Collect all errors in one pass, raise once
    - Requirements: 1–6

- [ ] Write tests for model-level validation
    - Valid input passes, each invalid combination raises
      ValidationError with specific message
    - Validates: Requirements 1–6

Kiro also updated existing tasks that were affected. The session creation task now validates before processing instead of assuming clean input.

From spec to code

Once the spec was updated, Kiro implemented the tasks. The validator lives in the model. Wiring it into the MCP tool is one line:

@mcp.tool()
def create_session(title: str, session_date: str, day_of_week: str | None = None, ...) -> str:
    validated = CreateSessionInput(
        title=title, session_date=session_date, day_of_week=day_of_week, ...
    )
    # Every field is now coherent as a unit

That's it. The validation runs before any downstream logic. If the input is incoherent, the constructor raises a ValidationError and the tool call fails with those messages immediately. No confirmation drafted, no calendar invite created, no follow-up queued.

The cascade that didn't happen

The bad input:

CreateSessionInput(
    title="Kiro Deep Dive",
    session_date=date(2026, 3, 8),
    day_of_week="Monday",
    start_time=time(14, 0),
    duration_minutes=60,
    ...
)

Without model validation: the server creates the session. The confirmation email says "Monday March 8th." The calendar invite is scheduled for Sunday March 8th. The follow-up survey is timestamped against a date that doesn't match the agenda. Every downstream artifact is wrong, and nothing raised an error.

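With model validation (trimmed to the message; Pydantic's full output adds a header naming the model and the error count):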

ValidationError: 2026-03-08 is a Sunday, not a Monday

One error, caught at the boundary, before anything is created. The agent self-corrects and retries with the right date.

What Pydantic catches that type checking doesn't

| What the LLM sends | Why it's wrong | Pydantic catches it? |
| --- | --- | --- |
| `date: "2026-03-08"` with `day: "Monday"` | March 8 is Sunday | ✅ model_validator |
| `start: "14:00"` with `end: "13:30"` | End before start | ✅ model_validator |
| `duration: 20` with a 30-min window | Duration mismatch | ✅ model_validator |
| `reminder: "2026-03-10"` for a March 8 event | Reminder after event | ✅ model_validator |

Beyond This Example

The date-weekday check is one example of validation against silent errors. Others that might make sense for your MCP server:

Session overlap detection: is the speaker already presenting in another session at the same time? This is the kind of cross-record validation that no single model can catch on its own, but your server has the context to check.
Business hours sanity check: a session at 3 AM is technically valid but probably wrong for a community event.
Duplicate detection: flag if a session with the same title and date already exists.
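The overlap check from the list above could be sketched like this, using only the standard library. `BookedSession`, `find_speaker_conflicts`, and the in-memory `schedule` are hypothetical. A real server would query its own store, and this check lives in the tool handler rather than the input model because it needs other records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BookedSession:
    """Hypothetical record shape for sessions that already exist."""
    title: str
    speaker: str
    start: datetime
    end: datetime


def find_speaker_conflicts(existing, speaker, start, end):
    """Return one message per session where `speaker` is busy in [start, end)."""
    conflicts = []
    for s in existing:
        # Two half-open intervals overlap when each starts before the other ends.
        if s.speaker == speaker and s.start < end and start < s.end:
            conflicts.append(
                f"{speaker} is already presenting '{s.title}' "
                f"from {s.start:%H:%M} to {s.end:%H:%M} on {s.start:%Y-%m-%d}"
            )
    return conflicts


schedule = [
    BookedSession("Opening Keynote", "alice",
                  datetime(2026, 3, 9, 14, 0), datetime(2026, 3, 9, 15, 0)),
]

overlapping = find_speaker_conflicts(
    schedule, "alice",
    datetime(2026, 3, 9, 14, 30), datetime(2026, 3, 9, 15, 30))
back_to_back = find_speaker_conflicts(
    schedule, "alice",
    datetime(2026, 3, 9, 15, 0), datetime(2026, 3, 9, 16, 0))
```

Treating intervals as half-open means a session starting exactly when the previous one ends is not flagged. Whether back-to-back slots should be allowed is a product decision, not something the code can settle.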

How much validation is the right amount? That depends on the use case, so it stays a judgment call rather than a checklist. Not every field needs a custom validator. The goal is to catch the errors that propagate silently, not to validate everything the LLM could theoretically get wrong.

One thing that helps more than validation: good data models. Think about which fields are independent inputs and which are derived. If you accept start_time, end_time, and duration_minutes, you have three interrelated values, and now you need a validator just to check they agree. You could accept two and compute the third. Or you could keep all three and treat the redundancy as a checksum, the same way the weekday validates the date. The right call depends on whether the LLM is computing the value (checksum it) or the user is providing it directly (don't duplicate it).
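As a sketch of the "accept two, compute the third" option, here is a standard-library version. The `SessionWindow` class is hypothetical and not part of the server described above. With `end_time` derived rather than supplied, there is nothing left for the caller to contradict.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass(frozen=True)
class SessionWindow:
    """Hypothetical shape: two independent inputs, one derived value."""
    session_date: date
    start_time: time
    duration_minutes: int

    @property
    def end_time(self) -> time:
        # Combine with the date so timedelta arithmetic is possible,
        # then drop the date again. A session that runs past midnight
        # would wrap around here and needs separate handling.
        start = datetime.combine(self.session_date, self.start_time)
        return (start + timedelta(minutes=self.duration_minutes)).time()


window = SessionWindow(date(2026, 3, 8), time(14, 0), 60)
print(window.end_time)
```

The trade-off is the one described above: this removes a class of contradictions, but it also removes the redundancy that would have caught an LLM miscalculating the duration.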

The date-weekday case is a good example. Users say things like "schedule me a session for next Monday" and the LLM resolves that to a date. The weekday is the user's intent; the date is the LLM's computation. Accepting both and validating that they agree is how you catch the LLM's reasoning errors. That's not redundancy, it's a checksum.

Each new check is a new acceptance criterion in the spec. Add it, run the tasks, review the output. Same workflow, same patterns.

Try It Yourself

Get started with Kiro.

No spec yet? If you already have an MCP server but no spec, you can vibe in the validation directly, or you can ask Kiro to build a spec from your existing code first. Building the spec from your hand-crafted or vibe-coded MCP will document your design and help you, Kiro, and others understand how to add to it. It makes it easier to see what's validated and what isn't, and to add new checks systematically.

Have a spec? Describe the validation you want to add, and the new tasks show up alongside your existing ones.

Spec has drifted from the code? Maybe you wrote the spec, executed the tasks, and then kept building without updating it. Ask Kiro to update the spec based on the current code. It isn't foolproof, but it's a good way to get back in sync, and you might understand your own codebase better once you see what it's actually doing described as requirements and design decisions.

Field validation is table stakes. Model validation is where you stop the cascade.

About the Author

I'm David, a Technical Account Manager at AWS. My background is in software development, business systems, product development, and program management. I started building MCP servers and agent tooling not as a side project, but because I needed them. I quickly realized that making LLMs work with my existing workflows, rather than rebuilding everything around them, is the harder and more interesting problem. This article is a first example of that: a real bug, a real fix, and a workflow that scales.

I'm planning more articles in this space. A few topics I'm exploring:

Standards reviews for specs: encoding your best practices (validation patterns, error message quality, security checks) so they're applied automatically to every spec, not just the ones you remember to check
A spec workflow driven by "Start with Why": how I structure every spec around Why → How → What, so the problem shapes the requirements instead of the other way around
Working backwards from code to spec: how to reverse-engineer a spec from an existing codebase and use it to regain control of a project that's drifted

What would be most useful to you? Drop a comment. I'd love to hear what problems you're running into with your MCP servers.

DEV Community: David Tappert