Sudarshan Gouda

Agentic AI: Schema-Validated Tool Execution and Deterministic Caching

Agentic AI systems do not fail because models cannot reason. They fail because tool execution is unmanaged.

Once agents are allowed to plan, retry, self-criticize, or collaborate, tool calls multiply rapidly. Without strict controls, this leads to infrastructure failures, unpredictable cost growth, and non-deterministic behavior.

This article explains how to engineer the tool execution layer of an agentic AI system using two explicit and independent mechanisms:

  1. Contract-driven tool execution
  2. Deterministic tool result caching

Each mechanism solves a different class of production failures and must be implemented separately.


Real Production Scenario

Context

You are building an Incident Analysis Agent for SRE teams.

What the agent does

  • Fetch logs for a service
  • Analyze error patterns
  • Re-fetch logs if confidence is low
  • Allow a second agent (critic) to validate findings

Tool characteristics

  • Tool name: fetch_service_logs
  • Backend: Elasticsearch / Loki / Splunk
  • Latency: 300–800 ms
  • Rate-limited
  • Expensive per execution

This is a common real-world agent workload.


Part I: Contract-Driven Tool Execution in Agentic AI Systems

The problem without contracts

When LLMs emit tool arguments directly, the runtime receives inputs like:

{"service": "auth", "window": "24 hours"}
{"service": "Auth Service", "window": "yesterday"}
{"service": ["auth"], "window": 24}
{"service": "", "window": "24h"}

Why this happens

  • LLMs reason in natural language
  • LLMs paraphrase arguments
  • LLMs are not type-safe systems

What breaks in production

  • Invalid Elasticsearch queries
  • Full index scans
  • Query builder crashes
  • Silent data corruption
  • Retry loops amplify failures

Relying on the model to always produce valid input is not system design.


What contract-driven tool execution means

Contract-driven execution means:

  • The runtime owns the tool interface
  • The model must conform to that interface
  • Invalid input never reaches infrastructure

This is the same boundary enforcement used in production APIs.


Step 1: Define a strict tool contract

from pydantic import BaseModel, Field, field_validator
import re
from typing import List

class FetchServiceLogsInput(BaseModel):
    service: str = Field(
        ...,
        description="Kubernetes service name, lowercase, no spaces"
    )
    window: str = Field(
        ...,
        description="Time window format: 5m, 1h, 24h"
    )

    @field_validator("service")
    @classmethod
    def validate_service(cls, value: str) -> str:
        if not value:
            raise ValueError("service cannot be empty")

        if not re.fullmatch(r"[a-z0-9\-]+", value):
            raise ValueError(
                "service must be lowercase alphanumeric with dashes"
            )
        return value

    @field_validator("window")
    @classmethod
    def validate_window(cls, value: str) -> str:
        if not re.fullmatch(r"\d+(m|h)", value):
            raise ValueError(
                "window must be like 5m, 1h, 24h"
            )
        return value


class FetchServiceLogsOutput(BaseModel):
    logs: List[str]

What these validations prevent

Invalid input → Prevented issue

  • Empty service → Full log scan
  • Mixed case or spaces → Query mismatch
  • Natural language time → Ambiguous queries
  • Lists or numbers → Query builder crashes

Nothing reaches infrastructure unless it passes this gate.
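A quick check (a minimal sketch reusing the contract above; ValidationError is pydantic's standard exception) shows the gate rejecting the malformed inputs from earlier:

from pydantic import ValidationError

bad_inputs = [
    {"service": "auth", "window": "24 hours"},
    {"service": "Auth Service", "window": "yesterday"},
    {"service": "", "window": "24h"},
]

for raw in bad_inputs:
    try:
        FetchServiceLogsInput(**raw)
    except ValidationError as exc:
        # Each malformed payload is rejected before any query is built.
        print(f"rejected {raw}: {exc.errors()[0]['msg']}")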


Step 2: Implement the actual tool

def fetch_service_logs(service: str, window: str) -> list[str]:
    # Stand-in for the real log backend query (Elasticsearch / Loki / Splunk).
    print(f"QUERY logs for service={service}, window={window}")
    return [
        f"[ERROR] timeout detected in {service}",
        f"[WARN] retry triggered in {service}",
    ]
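The stub above stands in for the real backend call. Assuming an Elasticsearch backend, a logs-<service> index naming convention, and a message field on each log document (all assumptions, not part of the article's setup), a real handler might look roughly like this sketch:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def fetch_service_logs_es(service: str, window: str) -> list[str]:
    # The validated window format ("5m", "1h", "24h") maps directly
    # onto Elasticsearch date math: now-5m, now-1h, now-24h.
    response = es.search(
        index=f"logs-{service}",  # assumed index naming convention
        query={"range": {"@timestamp": {"gte": f"now-{window}"}}},
        sort=[{"@timestamp": "desc"}],
        size=100,
    )
    return [hit["_source"]["message"] for hit in response["hits"]["hits"]]

Note how the strict window format pays off here: the contract guarantees the string is already valid date math, so no parsing or guessing happens at the infrastructure boundary.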

Step 3: Runtime-owned tool registry

TOOLS = {
    "fetch_service_logs": {
        "version": "v1",
        "input_model": FetchServiceLogsInput,
        "output_model": FetchServiceLogsOutput,
        "handler": fetch_service_logs,
        "cache_ttl": 3600,
    }
}

The agent cannot invent tools, bypass schemas, or change versions.


Step 4: Contract-driven execution boundary

def execute_tool_contract(tool_name: str, raw_args: dict):
    # Resolve the tool from the runtime-owned registry; unknown names fail here.
    tool = TOOLS[tool_name]

    # Validate untrusted arguments against the input contract.
    # Raises pydantic.ValidationError if the model violated the schema.
    args = tool["input_model"](**raw_args)

    # Execute the handler only with validated, typed arguments.
    raw_result = tool["handler"](**args.model_dump())

    # Validate the output before it is returned to the agent.
    return tool["output_model"](logs=raw_result)
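Calling the boundary with the contract satisfied looks like this (using the same arguments the rest of the article uses):

result = execute_tool_contract(
    "fetch_service_logs",
    {"service": "auth-service", "window": "24h"}
)
print(result.logs)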

Execution flow for contract enforcement

Agent emits tool call
        ↓
Raw arguments (untrusted)
        ↓
Schema validation
   ┌───────────────┐
   │ Invalid       │ → reject and replan
   └───────────────┘
          ↓
       Valid
          ↓
Tool executes
          ↓
Infrastructure queried safely
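The "reject and replan" branch can be made concrete with a thin wrapper around the boundary (a sketch; the error payload shape is an assumption, not part of the registry above):

from pydantic import ValidationError

def execute_tool_safe(tool_name: str, raw_args: dict) -> dict:
    # Invalid arguments never reach infrastructure; the validation errors
    # are returned to the agent so it can correct its call and replan.
    try:
        result = execute_tool_contract(tool_name, raw_args)
        return {"ok": True, "result": result.model_dump()}
    except ValidationError as exc:
        return {"ok": False, "errors": exc.errors()}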

Part II: Deterministic Caching in Agentic AI Systems

The problem after contracts are added

Even with perfect validation, agents repeat work.

execute_tool_contract(
    "fetch_service_logs",
    {"service": "auth-service", "window": "24h"}
)

execute_tool_contract(
    "fetch_service_logs",
    {"window": "24h", "service": "auth-service"}
)

Same intent.

Same backend.

Executed twice.

Why naive caching fails

{"service": "auth-service", "window": "24h"}
{"window": "24h", "service": "auth-service"}

Different strings, same meaning.

Agentic systems require semantic equivalence, not string equality.


Infrastructure required for deterministic caching

  • Redis as shared cache
  • Hash-based cache keys
  • Tool-level TTL
  • Canonicalization logic

Redis is used because it is fast, shared across agents, and supports expiration.


Step 1: Canonicalize validated arguments

def canonicalize(tool_name: str, args, version: str) -> str:
    # model_dump() returns fields in declaration order, so the canonical
    # form is deterministic regardless of how the agent ordered its arguments.
    values = "|".join(str(v) for v in args.model_dump().values())
    return f"{tool_name}|{values}|{version}"

Example canonical form:

fetch_service_logs|auth-service|24h|v1
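Because the validated model dumps fields in declaration order, the agent's argument order no longer matters. A quick sketch:

a = FetchServiceLogsInput(service="auth-service", window="24h")
b = FetchServiceLogsInput(**{"window": "24h", "service": "auth-service"})

# Both reduce to: fetch_service_logs|auth-service|24h|v1
assert canonicalize("fetch_service_logs", a, "v1") == \
       canonicalize("fetch_service_logs", b, "v1")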

Step 2: Cache setup

import redis
import hashlib
import json

redis_client = redis.Redis(host="localhost", port=6379)

def cache_key(canonical: str) -> str:
    return hashlib.sha256(canonical.encode()).hexdigest()

Step 3: Cached tool execution

def execute_tool_cached(tool_name: str, raw_args: dict):
    tool = TOOLS[tool_name]

    args = tool["input_model"](**raw_args)

    canonical = canonicalize(
        tool_name,
        args,
        tool["version"]
    )
    key = cache_key(canonical)

    cached = redis_client.get(key)
    if cached:
        print("CACHE HIT — skipping infra call")
        return tool["output_model"](**json.loads(cached))

    print("CACHE MISS — executing tool")

    raw_result = tool["handler"](**args.model_dump())
    validated = tool["output_model"](logs=raw_result)

    redis_client.setex(
        key,
        tool["cache_ttl"],
        validated.model_dump_json()
    )

    return validated
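Running the two semantically identical calls from earlier now hits the backend only once:

# First call: CACHE MISS — the tool executes and the result is stored.
execute_tool_cached("fetch_service_logs",
                    {"service": "auth-service", "window": "24h"})

# Reordered arguments: CACHE HIT — same canonical key, no infra call.
execute_tool_cached("fetch_service_logs",
                    {"window": "24h", "service": "auth-service"})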

Execution flow for deterministic caching

Validated tool request
        ↓
Canonicalization
        ↓
Hash generation
        ↓
Redis lookup
   ┌───────────────┐
   │ Cache HIT     │ → return cached result
   └───────────────┘
          ↓
       Cache MISS
          ↓
Execute expensive tool
          ↓
Validate output
          ↓
Store result with TTL
          ↓
Return result

Separation of responsibilities

Problem → Solved by

  • Invalid input → Contract-driven execution
  • Infrastructure crashes → Contract-driven execution
  • Duplicate execution → Deterministic caching
  • Cost explosion → Deterministic caching

Final takeaway

Agentic AI systems become production-ready when tool execution is engineered like backend infrastructure, not treated as an LLM side effect.

Contracts make execution safe.

Caching makes execution scalable.

Skipping either guarantees failure.
