Everyone is building chatbots right now. They are the “Hello World” of the GenAI era. But in real-world applications, the real value of AI is not in conversation — it’s in execution.
Real business value comes from AI agents that take actions, make decisions autonomously where possible, integrate with tools and systems, and operate within strict governance and audit boundaries.
When people build AI agents, they have two options:
- Embed the multi-step reasoning and decision logic inside application code.
- Use an agentic workflow built on serverless services such as AWS Step Functions and AWS Lambda.

The first option takes time and effort, and it eventually becomes a liability because of the following:
- No clear audit trail
- Limited observability
- Fragile retries
- Painful debugging
- Hard-to-enforce human approval
In this post, I explain the agentic workflow approach. Instead of orchestrating AI in code, we move orchestration into infrastructure using:
- AWS Step Functions for explicit control flow
- Amazon Bedrock for multimodal and text reasoning
- AWS Lambda for integration and validation
The result is an enterprise-grade AI orchestration pattern that is observable, auditable, secure, and production-ready.
I prefer to explain concepts through hands-on, realistic scenarios, so let's walk through a practical use case!
Automated Insurance Claim Processing
We are building an Automated Insurance Claim Processor that:
- Analyzes an uploaded image of a car accident (multimodal AI).
- Estimates repair cost and damage severity.
- Retrieves the customer’s policy limits.
- Decides whether to:
- Auto-approve the claim, or
- Pause for human review based on confidence and risk.

This is not “just prompting.” It’s a multi-step decision workflow with financial and regulatory impact.
1. The Architecture
Workflow Breakdown
1. Trigger (Image Upload): An S3 upload event initiates the Step Functions state machine.
2. Validate Input (AWS Lambda): Checks file integrity before invoking AI models.
3. AI Vision Analysis (Amazon Bedrock): Uses Claude 3.5 or Amazon Nova to analyze the damage.
- Pro-Tip: Native Structured Output: Bedrock now supports JSON Mode via the Converse API. By providing a schema, you guarantee valid JSON output, eliminating the need for "cleaning" or "validation" Lambdas.
To implement this, you define a schema that Bedrock uses to constrain its response. Here is the specific JSON Schema you can use:
```json
{
  "type": "object",
  "properties": {
    "damage_type": { "type": "string" },
    "estimated_cost": { "type": "number" },
    "severity_score": { "type": "integer", "minimum": 1, "maximum": 5 },
    "confidence_score": { "type": "number", "minimum": 0, "maximum": 1 }
  },
  "required": ["damage_type", "estimated_cost", "severity_score", "confidence_score"]
}
```
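To make this concrete, here is a minimal sketch of the analysis step in Python with boto3. It uses the Converse API's tool-use support to force the model to return JSON matching the schema; the tool name record_damage_assessment and the model ID are illustrative assumptions:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# The schema shown above, passed to Bedrock as the tool's input schema.
DAMAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "damage_type": {"type": "string"},
        "estimated_cost": {"type": "number"},
        "severity_score": {"type": "integer", "minimum": 1, "maximum": 5},
        "confidence_score": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["damage_type", "estimated_cost", "severity_score", "confidence_score"],
}

def analyze_damage(image_bytes: bytes) -> dict:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
                {"text": "Assess the vehicle damage in this photo."},
            ],
        }],
        toolConfig={
            "tools": [{
                "toolSpec": {
                    "name": "record_damage_assessment",
                    "description": "Record the structured damage assessment.",
                    "inputSchema": {"json": DAMAGE_SCHEMA},
                }
            }],
            # Forcing the tool choice means the model must answer with
            # JSON that conforms to the schema.
            "toolChoice": {"tool": {"name": "record_damage_assessment"}},
        },
    )
    # The structured result arrives as the forced tool call's input.
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block:
            return block["toolUse"]["input"]
    raise ValueError("Model returned no structured assessment")
```

Because the tool choice is forced, the structured result comes back as the tool call's input, with no regex cleanup and no second validation Lambda.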
4. Fetch Policy Data (AWS Lambda & DynamoDB): Retrieves coverage limits for comparison (see the sketch after this list).
5. Choice State (Risk Assessment): The orchestrator compares AI estimates against policy data to decide between Auto-Approval and Human Review.
6. Final Update (DynamoDB): Records the outcome and the full audit trail.
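Here is a minimal sketch of the policy-fetch Lambda from step 4, assuming a hypothetical Policies table keyed on policy_id:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Policies")  # hypothetical table name

def handler(event, context):
    # event carries the claim being processed,
    # e.g. {"policy_id": "...", "analysis": {...}}
    item = table.get_item(Key={"policy_id": event["policy_id"]})["Item"]
    return {
        # DynamoDB returns numbers as Decimal; convert for downstream use.
        "auto_approval_limit": float(item["auto_approval_limit"]),
        "coverage_limit": float(item["coverage_limit"]),
    }
```

Its output lands in the execution state next to the AI analysis, so the Choice state in step 5 can compare the two directly with ASL's path-based comparators (for example, NumericGreaterThanPath on $.analysis.estimated_cost versus $.policy.auto_approval_limit, plus a NumericLessThan threshold on $.analysis.confidence_score). If either condition trips, the claim routes to Human Review; otherwise it proceeds to Auto-Approval.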
2. Human-in-the-loop with callback tasks
This is where Step Functions truly shines.
If the AI's estimated cost exceeds the policy's auto-approval limit, or its confidence falls below a defined threshold, the workflow must not finalize the claim automatically.
Here is how the Callback Pattern (.waitForTaskToken) handles this:
- A Choice state detects high risk.
- The workflow pauses execution and waits, holding a task token.
- An SNS notification carrying that task token is sent to a human adjuster.
- When the adjuster responds, a Lambda returns the token to Step Functions and the workflow resumes exactly where it paused (see the sketch below).
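On the adjuster's side, closing the loop can be a small Lambda behind an approval endpoint. A minimal sketch, assuming the approval request carries the task token that the workflow embedded in the SNS message:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # Hypothetical API Gateway proxy payload from the adjuster's approval UI:
    # {"task_token": "...", "approved": true, "adjuster_id": "a-123"}
    body = json.loads(event["body"])

    if body["approved"]:
        # Resumes the paused execution on its approval branch.
        sfn.send_task_success(
            taskToken=body["task_token"],
            output=json.dumps({"decision": "APPROVED", "adjuster_id": body["adjuster_id"]}),
        )
    else:
        # Routes the execution down its explicit rejection path.
        sfn.send_task_failure(
            taskToken=body["task_token"],
            error="ClaimRejected",
            cause=f"Rejected by adjuster {body['adjuster_id']}",
        )
    return {"statusCode": 200}
```

On the workflow side, the pausing state uses a resource like arn:aws:states:::sns:publish.waitForTaskToken and injects $$.Task.Token into the message, so the token the adjuster returns is the one Step Functions is waiting on.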
3. Context Checkpointing (Managing Token & Payload Limits)
Long-running agent workflows suffer from:
- LLM context window limits
- Exploding token costs
- Step Functions payload size limits (256 KB)
We solve this with context checkpointing.
In this Checkpointing pattern we use DynamoDB to store summarized agent context, and S3 to store large artifacts like images.
How it works:
- After a major reasoning step, the agent's state is summarized.
- A History Manager Lambda uses a smaller, cheaper model to produce a concise summary.
- The summary is stored in DynamoDB.
- The next step retrieves only the summary, not the full history.
This ensures that:
- Step Functions payloads stay small
- DynamoDB items stay under the 400 KB item limit
- Bedrock token usage remains predictable

We can also set TTLs in DynamoDB to enforce data retention policies.
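A minimal sketch of the History Manager Lambda, assuming a hypothetical AgentContext table with a TTL attribute named expires_at and Amazon Nova Lite as the cheap summarizer (both illustrative choices):

```python
import time
import boto3

bedrock = boto3.client("bedrock-runtime")
table = boto3.resource("dynamodb").Table("AgentContext")  # hypothetical table

def handler(event, context):
    # event: {"claim_id": "...", "step_name": "...", "full_context": "..."}
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # illustrative small/cheap model
        messages=[{
            "role": "user",
            "content": [{
                "text": "Summarize this claim-processing context in under "
                        "200 words, keeping all figures and decisions:\n\n"
                        + event["full_context"]
            }],
        }],
    )
    summary = response["output"]["message"]["content"][0]["text"]

    table.put_item(Item={
        "claim_id": event["claim_id"],
        "step_name": event["step_name"],
        "summary": summary,
        # DynamoDB TTL: item expires after 30 days, enforcing retention policy.
        "expires_at": int(time.time()) + 30 * 24 * 3600,
    })

    # Downstream states carry only this pointer + summary, keeping payloads small.
    return {"claim_id": event["claim_id"], "summary": summary}
```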
Why this wins over script-based agents
As serverless builders, we prefer explicit orchestration over hidden loops.
Visibility
Failures are visible directly in the Step Functions console. You see which state failed and why, no log archaeology required.
Retries & Resilience
AI APIs fail. Networks glitch. Throttling happens.
Step Functions provides:
- Built-in retries with exponential backoff
- Explicit failure paths
- Idempotent replays
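As a sketch, here is what that looks like if you define the state machine with the AWS CDK in Python (an assumption; the same Retry block can be written directly in the ASL JSON). The construct names and asset path are illustrative:

```python
from aws_cdk import App, Duration, Stack, aws_lambda as lambda_
from aws_cdk import aws_stepfunctions as sfn, aws_stepfunctions_tasks as tasks
from constructs import Construct

class ClaimWorkflowStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Illustrative Lambda; in the real workflow this wraps the Bedrock call.
        analyze_fn = lambda_.Function(
            self, "AnalyzeDamageFn",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="analyze.handler",
            code=lambda_.Code.from_asset("lambda/analyze"),  # illustrative path
        )

        analyze = tasks.LambdaInvoke(self, "AIVisionAnalysis", lambda_function=analyze_fn)

        # Declarative retries: exponential backoff on throttling and transient
        # failures, instead of a hand-rolled retry loop inside agent code.
        analyze.add_retry(
            errors=["Lambda.TooManyRequestsException", "States.TaskFailed"],
            interval=Duration.seconds(2),
            max_attempts=5,
            backoff_rate=2.0,
        )

        sfn.StateMachine(
            self, "ClaimProcessor",
            definition_body=sfn.DefinitionBody.from_chainable(analyze),
        )

app = App()
ClaimWorkflowStack(app, "ClaimWorkflowStack")
app.synth()
```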
Governance & Compliance
Human approval is auditable, secure, and enforceable. Not a while loop buried in a container.
Conclusion
The future of AI systems isn’t just better models; it’s better orchestration.
By treating AI prompts, decisions, and approvals as explicit infrastructure steps, we move from experimental demos to enterprise-grade autonomous systems.
The question is no longer “How smart is the model?” but “Can we trust, observe, govern, and replay its decisions?”
Are you orchestrating your AI agents in code — or in infrastructure? Let’s discuss in the comments.
