TL;DR
You can stop babysitting AI agents by implementing three key systems: guardrails (hard constraints to prevent major failures), observability (detailed logs and metrics for visibility), and checkpoints (automatic pauses for human verification). With these in place, agents can run autonomously for hours, not just minutes. Tools like Apidog let you define strict API contracts, so your API layer acts as a safety net your agents can’t bypass.
Introduction
Last week I saw a developer spend 4 hours supervising an AI agent that was supposed to save time. Every few minutes, he’d step in, fix mistakes, and restart the process. By the end, he had done more manual work than if he’d written the code from scratch.
This is the babysitting problem—the main reason AI agents often fail to deliver value. The models are capable, but without the right setup, teams get stuck in constant supervision mode.
The root issue: most AI agent workflows treat LLMs like junior devs who need hand-holding. But LLMs are more like fast, unpredictable interns—they’ll confidently make mistakes unless you set clear boundaries.
💡 Tip: If you’re building APIs or working with AI agents that call APIs, Apidog helps you enforce those boundaries. By defining exact request/response schemas, you build contracts agents can’t accidentally break—giving them a map instead of letting them wander.
Define API contracts your AI agents can follow
By the end of this guide, you’ll have:
- A mental model for agent autonomy
- Concrete patterns for guardrails, observability, and checkpoints
- Ready-to-use code examples
- A checklist to assess if an agent is ready for unsupervised execution
Why Agents Need Constant Supervision
AI agents fail in predictable ways. Knowing these failure modes helps you prevent them.
Failure mode 1: Scope creep
Request: “Add authentication to the API endpoint.”
Agent adds authentication, then rate limiting, then refactors the database, then deletes what it thinks are unused files. Why? No one told it when to stop. LLMs lack an innate sense of “done.”
Failure mode 2: Wrong abstractions
Ask an agent to “improve error handling” and it might add try-catch blocks everywhere—technically correct, practically useless. The agent follows literal instructions, missing the real intent.
Failure mode 3: Cascading failures
A small mistake in step 1 propagates through every subsequent decision. What starts as a typo becomes a broken API, failed tests, and hours of debugging. Because the agent doesn’t verify its work at each step, the failure goes unnoticed until the end.
Failure mode 4: Resource exhaustion
Without constraints, agents can loop forever—retrying APIs, spawning sub-agents, or generating endless code until you hit a quota or billing limit.
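The fix for this failure mode is mechanical, not prompt-based: cap retries explicitly in code instead of trusting the agent to stop. A minimal sketch (the function name is hypothetical, not from any specific framework):

```python
def call_with_retry_cap(fn, max_attempts: int = 3):
    """Run fn(), retrying on failure at most max_attempts times.

    Without a cap like this, an agent that keeps "trying again"
    can loop until it exhausts a quota or billing limit.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return fn()
        except Exception as e:  # in practice, catch the specific API error
            last_error = e
    raise RuntimeError(f"Gave up after {max_attempts} attempts: {last_error}")
```

The key design choice is that the limit lives outside the agent's control: the loop ends whether or not the model "decides" to stop.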
The Autonomy Framework: Guardrails, Observability, Checkpoints
Solve these problems with three layers—a pyramid:
- Guardrails (bottom, prevention)
- Observability (middle, detection)
- Checkpoints (top, human-in-the-loop recovery)
Layer 1: Guardrails (prevention)
Guardrails are hard rules your agent cannot break, enforced by code.
Hard constraints via code:
```python
# Don't just tell the agent what not to do. Enforce it.
from pathlib import Path

ALLOWED_DIRECTORIES = {"src", "tests", "docs"}

def validate_file_path(path: str) -> bool:
    """Agent cannot write outside allowed directories."""
    abs_path = Path(path).resolve()
    return any(
        # is_relative_to avoids prefix false-positives like "src_old" matching "src"
        abs_path.is_relative_to(Path(d).resolve())
        for d in ALLOWED_DIRECTORIES
    )

def agent_write_file(path: str, content: str):
    if not validate_file_path(path):
        raise ValueError(f"Cannot write to {path}: outside allowed directories")
    with open(path, "w") as f:
        f.write(content)
```
API schema constraints:
When agents call APIs, use schemas to reject malformed requests. Apidog enforces these contracts.
```typescript
// apidog-schema.ts
import Ajv from 'ajv' // JSON Schema validator (import was missing from the original snippet)

// Note: the 'email' format check requires the ajv-formats plugin
const ajv = new Ajv()

export const CreateUserSchema = {
  type: 'object',
  required: ['email', 'name'],
  properties: {
    email: { type: 'string', format: 'email' },
    name: { type: 'string', minLength: 1, maxLength: 100 },
    role: { type: 'string', enum: ['user', 'admin', 'guest'] }
  },
  additionalProperties: false
}

// Validate before calling the API
function validateRequest(schema: object, data: unknown): void {
  const valid = ajv.validate(schema, data)
  if (!valid) {
    throw new Error(`Invalid request: ${JSON.stringify(ajv.errors)}`)
  }
}
```
Budget constraints:
```python
import time
from dataclasses import dataclass

@dataclass
class AgentBudget:
    max_steps: int = 50
    max_tokens: int = 100000
    max_time_seconds: int = 600
    max_api_calls: int = 100

class BudgetEnforcer:
    def __init__(self, budget: AgentBudget):
        self.budget = budget
        self.start_time = time.time()
        self.steps = 0
        self.tokens_used = 0
        self.api_calls = 0

    def check(self) -> bool:
        elapsed = time.time() - self.start_time
        if self.steps >= self.budget.max_steps:
            raise RuntimeError(f"Step limit reached: {self.steps}")
        if self.tokens_used >= self.budget.max_tokens:
            raise RuntimeError(f"Token limit reached: {self.tokens_used}")
        if elapsed >= self.budget.max_time_seconds:
            raise RuntimeError(f"Time limit reached: {elapsed:.0f}s")
        if self.api_calls >= self.budget.max_api_calls:
            raise RuntimeError(f"API call limit reached: {self.api_calls}")
        return True

    def record_step(self, tokens: int, api_calls: int = 0):
        self.steps += 1
        self.tokens_used += tokens
        self.api_calls += api_calls
        self.check()
```
Layer 2: Observability (detection)
With long-running agents, you need visibility into what they’re doing.
Structured logging:
```python
import json
from datetime import datetime, timezone
from typing import Any

class AgentLogger:
    def __init__(self, log_file: str = "agent_trace.jsonl"):
        self.log_file = log_file
        self.entries = []

    def log(self, event: str, data: dict[str, Any] | None = None):
        entry = {
            # datetime.utcnow() is deprecated; use an aware UTC timestamp
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "data": data or {}
        }
        self.entries.append(entry)
        with open(self.log_file, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def log_decision(self, decision: str, reasoning: str, confidence: float):
        self.log("decision", {
            "decision": decision,
            "reasoning": reasoning,
            "confidence": confidence
        })

    def log_action(self, action: str, params: dict, result: str):
        self.log("action", {
            "action": action,
            "params": params,
            "result": result[:200]  # truncate long outputs
        })

    def log_error(self, error: str, context: dict):
        self.log("error", {"error": error, "context": context})

# Usage in agent
logger = AgentLogger()
logger.log_decision(
    decision="Add rate limiting to API",
    reasoning="Current endpoint has no protection against abuse",
    confidence=0.85
)
logger.log_action(
    action="write_file",
    params={"path": "src/middleware/rate-limit.ts"},
    result="Successfully wrote 45 lines"
)
```
Metrics dashboard:
```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class AgentMetrics:
    actions_taken: Counter = field(default_factory=Counter)
    files_modified: list[str] = field(default_factory=list)
    api_calls: dict[str, int] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)
    decisions_by_confidence: dict[str, int] = field(default_factory=lambda: {
        "high (>0.9)": 0,
        "medium (0.7-0.9)": 0,
        "low (<0.7)": 0
    })

    def record_action(self, action: str):
        self.actions_taken[action] += 1

    def record_file_modification(self, path: str):
        if path not in self.files_modified:
            self.files_modified.append(path)

    def record_api_call(self, endpoint: str):
        self.api_calls[endpoint] = self.api_calls.get(endpoint, 0) + 1

    def record_error(self, error: str):
        self.errors.append(error)

    def record_decision(self, confidence: float):
        if confidence > 0.9:
            self.decisions_by_confidence["high (>0.9)"] += 1
        elif confidence >= 0.7:
            self.decisions_by_confidence["medium (0.7-0.9)"] += 1
        else:
            self.decisions_by_confidence["low (<0.7)"] += 1

    def summary(self) -> str:
        return f"""
Agent Metrics Summary
=====================
Actions: {dict(self.actions_taken)}
Files modified: {len(self.files_modified)}
API calls: {self.api_calls}
Errors: {len(self.errors)}
Decisions by confidence: {self.decisions_by_confidence}
"""
```
Layer 3: Checkpoints (recovery)
Checkpoints are automatic pauses for human verification—catch issues early.
Automatic checkpoints:
```python
from enum import Enum
from dataclasses import dataclass

class CheckpointTrigger(Enum):
    BEFORE_FILE_WRITE = "before_file_write"
    BEFORE_API_CALL = "before_api_call"
    BEFORE_GIT_COMMIT = "before_git_commit"
    BEFORE_DELETE = "before_delete"
    AFTER_N_STEPS = "after_n_steps"

@dataclass
class Checkpoint:
    trigger: CheckpointTrigger
    description: str
    data: dict
    requires_approval: bool = True

class CheckpointManager:
    def __init__(self, auto_approve: set[CheckpointTrigger] | None = None):
        self.auto_approve = auto_approve or set()
        self.pending: list[Checkpoint] = []

    def create_checkpoint(
        self,
        trigger: CheckpointTrigger,
        description: str,
        data: dict
    ) -> bool:
        """Returns True if the action may proceed immediately."""
        if trigger in self.auto_approve:
            return True
        self.pending.append(
            Checkpoint(trigger=trigger, description=description, data=data)
        )
        return False

    def approve(self, checkpoint_id: int) -> None:
        if 0 <= checkpoint_id < len(self.pending):
            self.pending.pop(checkpoint_id)

    def reject(self, checkpoint_id: int) -> None:
        raise RuntimeError(f"Checkpoint rejected: {self.pending[checkpoint_id]}")

# Usage (assumes an `agent` object with a pause() method)
checkpoints = CheckpointManager(
    auto_approve={CheckpointTrigger.BEFORE_FILE_WRITE}
)

if not checkpoints.create_checkpoint(
    trigger=CheckpointTrigger.BEFORE_DELETE,
    description="About to delete src/legacy/ directory",
    data={"path": "src/legacy/", "files": ["old_handler.ts", "deprecated.ts"]}
):
    # Wait for human approval
    agent.pause("Waiting for approval to delete files")
```
Building Autonomous Agents with Apidog
When AI agents interact with APIs, malformed requests are a major risk. Apidog lets you define strict API schemas and generate validated clients for your agents.
Setting up API contracts:
- Import or define your OpenAPI spec in Apidog
- Generate client code with built-in validation
- Provide the validated client to your agent (not raw HTTP)
```typescript
// Don't let the agent call APIs directly
const rawResponse = await fetch('/api/users', {
  method: 'POST',
  body: JSON.stringify(data) // No validation
})

// Use the validated client instead
import { UsersApi } from './generated/apidog-client'

const usersApi = new UsersApi()

// The agent can only send valid requests - the schema is enforced
const response = await usersApi.createUser({
  email: 'user@example.com',
  name: 'Test User',
  role: 'user' // Must match the enum
})
```
Now your API layer acts as a guardrail—the agent cannot send invalid data.
Generate validated API clients for your AI agents
Proven Patterns and Common Mistakes
Pattern 1: The Approval Sandwich
For risky operations, require approval both before and after the action.
```python
def risky_operation(agent, operation):
    # Pre-approval
    if not agent.checkpoint(f"About to: {operation.description}"):
        return "Cancelled by user"

    # Execute
    result = operation.execute()

    # Post-approval
    if not agent.checkpoint(f"Verify result of: {operation.description}"):
        operation.rollback()
        return "Rolled back by user"

    return result
```
Pattern 2: Confidence Thresholds
Don’t let agents act on low-confidence decisions.
```python
MIN_CONFIDENCE = 0.75

def agent_decide(options: list[dict]) -> dict:
    best = max(options, key=lambda x: x.get('confidence', 0))
    if best['confidence'] < MIN_CONFIDENCE:
        # Escalate to human
        return {
            'action': 'escalate',
            'reason': f"Best option has confidence {best['confidence']:.2f} < {MIN_CONFIDENCE}",
            'options': options
        }
    return best
```
Pattern 3: Idempotent Operations
Make agent actions repeatable and safe.
```python
import hashlib
import os

def idempotent_write(path: str, content: str) -> bool:
    """Only write if content changed. Uses the AgentLogger defined earlier."""
    content_hash = hashlib.sha256(content.encode()).hexdigest()
    existing_hash = None
    if os.path.exists(path):
        with open(path, "r") as f:
            existing_hash = hashlib.sha256(f.read().encode()).hexdigest()
    if content_hash == existing_hash:
        logger.log_action("write_file", {"path": path}, "Skipped - no changes")
        return False
    with open(path, "w") as f:
        f.write(content)
    logger.log_action("write_file", {"path": path}, f"Wrote {len(content)} bytes")
    return True
```
Common Mistakes to Avoid
- Trusting prompts as constraints: “Don’t delete files” in a prompt isn’t a real constraint. File permissions are.
- No rollback plan: Always use git or backups so you can undo mistakes.
- Ignoring confidence scores: Most LLMs provide or can be prompted for confidence. Low confidence? Pause and escalate.
- Over-monitoring: If you’re watching every step, it’s not automation—just manual work with extra steps.
- Under-specifying success: The agent needs a clear completion signal. “Fix the bug” is vague. “Fix the bug and all tests pass” is actionable.
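That last point is worth making concrete: "done" should be a set of machine-checkable predicates, not prose. A minimal sketch (the criteria names and lambdas are hypothetical placeholders):

```python
from typing import Callable

def is_done(criteria: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Evaluate named success criteria; return (all_passed, failed_names)."""
    failed = [name for name, check in criteria.items() if not check()]
    return (not failed, failed)

# "Fix the bug and all tests pass" becomes two explicit checks the agent
# (or its harness) can evaluate before declaring completion:
done, failed = is_done({
    "tests_pass": lambda: True,            # e.g. run pytest, check exit code
    "reproducer_fixed": lambda: True,      # e.g. rerun the original failing case
})
```

If any predicate fails, the agent keeps working or escalates instead of stopping early.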
Alternatives and Comparisons
| Approach | Autonomy | Risk | Best for |
|---|---|---|---|
| Manual coding | None | Low | Complex, critical work |
| Pair programming with AI | Low | Low | Learning, exploration |
| Supervised agents | Medium | Medium | Routine tasks |
| Autonomous agents with guardrails | High | Controlled | Bulk operations, migrations |
| Fully autonomous agents | Very high | High | Trusted, well-tested workflows |
Most teams should aim for “autonomous with guardrails”—it delivers 80% of the time savings with just 10% of the risk.
Real-World Use Cases
Codebase migration:
A team migrated 200 API endpoints from REST to GraphQL using an autonomous agent. Guardrails blocked schema changes; checkpoints required approvals before deleting old endpoints. Migration finished in 3 days (not 3 weeks) with zero production incidents.
Documentation generation:
An agent auto-generates API docs from code. Guardrails restrict it to specific directories. Checkpoints pause before publishing, so the team reviews docs weekly instead of writing manually.
Test coverage:
An agent analyzes code and writes missing tests. Budget constraints prevent runaway generation. Confidence thresholds flag uncertain tests. Coverage improved from 60% to 85% in a month.
Wrapping Up
Key takeaways:
- AI agents fail in predictable ways: scope creep, wrong abstractions, cascading failures, resource exhaustion
- Three layers solve most issues: guardrails (prevention), observability (detection), checkpoints (recovery)
- Guardrails should be enforced in code—not prompts
- Observability = structured logs and metrics, not manual supervision
- Checkpoints let humans verify decisions at key moments
- API schemas from Apidog make your API layer a guardrail
Your next steps:
- Identify your most repetitive AI-assisted task
- Define guardrails: what must the agent never do?
- Add structured logging to monitor behavior
- Create checkpoints for high-risk operations
- Let it run for 30 minutes and review the logs
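To see how the three layers fit into a single run, here is a deliberately simplified standalone loop: a budget guardrail, a log callback, and an approval checkpoint, as stand-ins for the fuller BudgetEnforcer, AgentLogger, and CheckpointManager classes in this guide. All names here are illustrative.

```python
import time

def run_agent(steps, max_steps=50, max_seconds=600,
              needs_approval=lambda name: False,
              approve=lambda name: True,
              log=print):
    """Run a list of (name, fn) steps under all three layers.

    Guardrail: hard step and time budget.
    Observability: every action is logged.
    Checkpoint: risky steps pause for approval before executing.
    """
    start = time.time()
    for i, (name, fn) in enumerate(steps):
        if i >= max_steps or time.time() - start >= max_seconds:  # guardrail
            raise RuntimeError("Budget exhausted")
        if needs_approval(name) and not approve(name):            # checkpoint
            log(f"rejected: {name}")
            return "stopped"
        log(f"running: {name}")                                   # observability
        fn()
    return "done"
```

In a real system `approve` would block on human input; here it is just a callback so the control flow stays visible.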
The goal isn’t to eliminate humans, but to have them in the right part of the loop: making high-level decisions, not fixing low-level errors.
Build API guardrails for your AI agents—free
FAQ
What’s the difference between an AI agent and an AI assistant?
An assistant waits for your next instruction and responds. An agent takes a goal and autonomously plans and executes steps. Assistants need you at every step; agents run until a checkpoint or completion.
How do I know if my agent is ready to run autonomously?
Run it in supervised mode for 10 sessions. Track interventions. If you intervene less than twice per session, and only for minor clarifications, it’s ready. Frequent or major interventions? Add more guardrails.
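This rule of thumb is easy to compute from your intervention counts; a sketch using the thresholds above (10 sessions, fewer than 2 interventions each):

```python
def ready_for_autonomy(interventions_per_session: list[int]) -> bool:
    """Ready if there are 10+ supervised sessions, each with fewer than 2 interventions."""
    return (len(interventions_per_session) >= 10
            and all(n < 2 for n in interventions_per_session))
```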
What’s the biggest risk with autonomous agents?
Cascading failures the agent doesn’t detect. Small mistakes can snowball. Checkpoints break the chain, forcing verification.
Can I use these patterns with any LLM?
Yes. Guardrails, observability, and checkpoints are model-agnostic—use with Claude, GPT-4, Gemini, etc.
How much does observability slow down the agent?
Negligible—logging is fast. The main slowdown is from checkpoints waiting for human input. Use checkpoints only at high-risk moments for maximum autonomy.
What if the agent makes a decision I disagree with?
Checkpoints enable you to reject those decisions. The agent can roll back or try another approach. Also, update your instructions to reflect preferences.
Should I start with supervised or autonomous agents?
Start supervised. Add checkpoints to every significant action until trust builds. Gradually reduce checkpoints on low-risk operations.
How does Apidog specifically help with AI agents?
Apidog generates validated API clients from your schemas. Agents using these clients can’t send malformed requests—a whole class of errors is prevented before reaching your backend.