Introduction
Over the past couple of years, I have architected and delivered a significant number of agentic AI applications across enterprise environments. Many of these deployments ran on Azure infrastructure, using Azure Web Apps for lightweight agent endpoints and Azure Container Apps for more sophisticated multi-agent systems that required orchestration, scaling, and reliable session routing.
In building these systems, I have repeatedly implemented the underlying foundations myself: credential vaults, memory pipelines, observability layers, and isolation mechanisms. After doing this enough times, you develop a clear understanding of both how long these pieces take to build and where the real production challenges tend to surface.
When I first evaluated Amazon Bedrock AgentCore, it was the first platform I encountered that appeared to address many of these challenges holistically. Not just through surface level abstractions, but with production grade depth designed for real world deployments.
That practical experience is the perspective I bring to this blog.
Before we talk about Amazon Bedrock AgentCore, we need to answer a more fundamental question: what exactly is an AI agent, and why is it so different from a regular chatbot or API call?
What is an AI Agent?
“An AI agent is a software system that uses a large language model not just to generate text, but to reason, plan, take actions, and work toward a goal, often across multiple steps, over time, with minimal human involvement.”
Most people encounter AI through a prompt-response loop: type something in, get something back. That model is useful, but it is fundamentally passive. The language model sits in a box, waits to be asked, generates text, and stops.
An AI agent is something entirely different. Think of a brilliant expert locked in a room with no tools. They can give extraordinary advice, but they cannot act on it. Give that same expert a phone, a laptop, access to databases, and the ability to send emails, run code, and call APIs. They no longer just advise. They act, verify, execute, and report back. That is the agentic paradigm.
“An AI agent doesn’t just answer your question. It takes on your objective, plans a path to achieve it, executes that plan, monitors its own progress, and self-corrects when things go wrong, without you directing each step.”
A Concrete Example
Ask an agent: “Find our top three open support tickets today, check each against the known issues database, draft replies, and email them to the support team.”
A plain language model cannot do this: it has no access to your ticketing system, knowledge base, or email infrastructure. An AI agent handles the entire workflow end to end.
Step 1: Query the ticketing tool for today’s open critical tickets
Step 2: Search the knowledge base for related known issues
Step 3: Reason about which tickets match which issues
Step 4: Draft personalized reply emails using the LLM
Step 5: Send those emails via the email API (this could be a tool or an MCP server)
The LLM is the reasoning engine. The tools are how the agent reaches into real systems. And it does not stop after one response: it pursues the objective through every step until the goal is met.
Agents Are Goal Driven
The most critical characteristic of an AI agent, and the one most often glossed over, is that it is goal driven, not prompt driven.
Prompt driven systems (plain LLMs) receive an input and produce an output. The interaction is complete. No awareness of a broader objective, no adaptation if the first attempt fails.
Goal driven systems (agents) receive an objective and autonomously determine the steps, tool calls, and decisions required to achieve it. They persist, adapt, retry, and self-correct until the goal is met, or they explicitly report that it cannot be.
The Agentic Loop:
Observe, Think, Act, Repeat
The mechanics of goal driven behaviour are captured in what is called the agentic loop: the cognitive cycle every agent runs until its objective is achieved. Strands Agents, AWS’s own open source framework, describes this as its core architecture. In each loop iteration, the model is invoked with the prompt, agent context, and available tools, and it decides whether to respond in natural language, plan next steps, reflect on prior results, or select one or more tools to use. This loop continues until the task is complete.
1. Observe
The agent reads its current goal and decomposed sub goals. It reviews all results from prior steps. It retrieves relevant short term memory. It incorporates new information from the environment since the last cycle.
2. Think
The LLM reasons over accumulated context and available tools to determine the single best next action. It outputs either a tool call with exact parameters or, if the goal is satisfied, a final answer. Planning capable models may first decompose the goal into an explicit sub task sequence.
3. Act
If a tool call was selected, the framework executes the real function: calling an API, querying a database, running code, navigating a browser, or invoking any registered tool. If a Human-in-the-Loop (HITL) checkpoint is configured, the agent pauses and waits for approval. The result is captured and fed back into context.
4. Loop
The action result becomes new input to the next Observe phase. Is the goal achieved? If yes, the agent produces its final answer and terminates. If no, the loop continues, potentially for dozens or hundreds of iterations for complex, long running tasks.
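The four phases above can be sketched in a few lines of framework-free Python. Here the "LLM" is a stub that walks through a canned ticket-triage plan; in a real agent that decision would be a model call, and the tools would hit real systems. All names are illustrative.

```python
# A minimal, framework-free sketch of the Observe-Think-Act loop.
# stub_llm stands in for the model: it inspects the accumulated history
# and returns either the next tool call or a final answer.

def stub_llm(goal, history, tools):
    """Decide the next step: a tool call, or a final answer."""
    if not history:
        return {"tool": "search_tickets", "args": {"status": "open"}}
    if history[-1]["tool"] == "search_tickets":
        return {"tool": "draft_reply", "args": {"ticket": history[-1]["result"][0]}}
    return {"final": f"Drafted reply for {history[-1]['result']}"}

TOOLS = {
    "search_tickets": lambda status: ["TICKET-1", "TICKET-2"],
    "draft_reply": lambda ticket: f"reply-for-{ticket}",
}

def agentic_loop(goal, max_iters=10):
    history = []                                   # Observe: accumulated context
    for _ in range(max_iters):
        decision = stub_llm(goal, history, TOOLS)  # Think: choose next action
        if "final" in decision:                    # goal satisfied -> terminate
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["args"])     # Act: run the tool
        history.append({"tool": decision["tool"], "result": result})  # Loop: feed back

    raise RuntimeError("goal not achieved within iteration budget")

print(agentic_loop("answer open tickets"))
```

The `max_iters` guard is the piece prototypes usually forget: without it, a confused model can loop forever.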
ReAct: The Reasoning Pattern Behind the Loop
ReAct (Reason + Act) is the foundational prompting pattern that implements the agentic loop. The agent produces a structured Thought describing its reasoning, followed by an Action (the chosen tool call), followed by an Observation incorporating the result. This interleaved reasoning action format makes goal pursuit explicit, traceable, and debuggable. It is the pattern that AWS Strands Agents uses internally to power its model driven approach.
Goal: Find top 3 critical GitHub bugs and draft fixes.
Thought: I need to query GitHub for open issues labelled critical.
Action: github_search(repo='company/api', label='critical', state='open', limit=3)
Observation: Returned issues #16, #5018, #18.
Thought: I have the issues. Now I need each affected code file.
Action: github_get_file(repo='company/api', path='src/auth.py')
Observation: File retrieved.
Thought: Enough context. Drafting fix for issue #16.
… [loop continues for all 3 bugs] …
Final Answer: Three bugs with proposed patches attached.
Self Correction and Adaptive Planning
Goal driven agents are not fragile pipelines. When an action fails (a tool returns an error, a query finds nothing, a generated solution does not compile), the agent observes the failure, reasons about the cause, and adapts. A failed search triggers a reformulated query. An unavailable tool triggers a fallback. A complex goal gets decomposed into smaller, achievable sub goals. When the agent determines a goal truly cannot be achieved, it surfaces a clear explanation rather than silently returning empty results.
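This observe-failure-and-adapt behaviour can be made concrete with a small sketch. The search tool and the reformulation rule below are stand-ins; in a real agent the reformulated query would come from the model reasoning about the error.

```python
# Sketch of self-correction: on failure the agent reformulates and retries,
# and only reports failure explicitly (never an empty result) once all
# adaptations are exhausted. flaky_search is a stub tool.

def flaky_search(query):
    if "site:" in query:               # pretend the narrow query finds nothing
        raise LookupError("no results")
    return [f"doc about {query}"]

def search_with_self_correction(query):
    attempts = [query, query.replace("site:internal ", "")]  # reformulations
    errors = []
    for q in attempts:
        try:
            return {"ok": True, "results": flaky_search(q)}
        except LookupError as err:
            errors.append(f"{q!r}: {err}")   # observe the failure, adapt
    # Goal truly not achievable: surface a clear explanation.
    return {"ok": False, "explanation": "; ".join(errors)}

print(search_with_self_correction("site:internal outage runbook"))
```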
HITL: Human in the Loop
Goal driven does not mean unsupervised. Production agents are designed with explicit human-in-the-loop checkpoints: moments where the agent pauses, presents its proposed action, and waits for approval before taking any irreversible step, such as sending emails, deleting records, initiating payments, or deploying code. AgentCore Runtime’s bi-directional WebSocket streaming makes these pause-and-resume flows practical within long running sessions, enabling real-time human collaboration without terminating and restarting the session.
The 4 Pillars of Every Production AI Agent
Pillar 1
Tools: How Agents Act on the Real World
Without tools, a goal driven agent has nowhere to go. Tools allow agents to reach beyond language generation into real business systems.
Read tools retrieve information: database queries, document reads, semantic search against knowledge bases, API calls to Salesforce, GitHub, Jira, Slack, and any other SaaS tool.
Write tools create or modify data: email senders, database writers, file generators, CRM updaters, ticket creators, calendar schedulers.
Execution tools run processes: code interpreters, browser automation for web based applications that have no API, and shell command runners.
The production challenge: A prototype might hard-code three tools. An enterprise deployment often needs fifty tools across ten SaaS platforms, each with its own authentication scheme, error patterns, and schema. Tool management becomes a major engineering project on its own.
Pillar 2
Memory: How Agents Remember
Language models (LLMs) are stateless. Every API call starts blank. For an agent serving the same user across weeks of ongoing work, statelessness is a fundamental blocker.
Short term memory covers the active session: conversation history, task state, intermediate tool results, and reasoning steps. It requires intelligent summarization to manage the LLM’s context window limits without losing the critical thread.
Long term memory persists across sessions. User preferences, past project outcomes, accumulated domain knowledge, and learned patterns must survive session end and be retrievable in future sessions. This requires extraction logic, persistent storage, and semantic retrieval.
Episodic memory is the most powerful form: storing specific past experiences (what the agent tried, what worked, what failed, and what the outcome was) so the agent can recall and apply successful strategies in similar future situations. This is the mechanism by which agents genuinely improve over time.
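A toy illustration of the three tiers helps fix the distinctions. Real systems back long-term and episodic memory with vector stores and semantic retrieval; here plain dicts and substring matching stand in for that, and the extraction rule is deliberately simplistic.

```python
# Toy model of the three memory tiers: short-term (session), long-term
# (persists across sessions), and episodic (past experiences and outcomes).

class AgentMemory:
    def __init__(self):
        self.short_term = []      # active-session turns, cleared on session end
        self.long_term = {}       # facts/preferences that survive sessions
        self.episodes = []        # (situation, action, outcome) experiences

    def remember_turn(self, turn):
        self.short_term.append(turn)

    def end_session(self):
        # Extraction step: promote durable facts, then drop session context.
        for turn in self.short_term:
            if turn.startswith("preference:"):
                key, value = turn.split(":", 2)[1:]
                self.long_term[key] = value
        self.short_term.clear()

    def record_episode(self, situation, action, outcome):
        self.episodes.append((situation, action, outcome))

    def recall_strategy(self, situation):
        # Episodic recall: reuse what worked before in a similar situation.
        for past, action, outcome in self.episodes:
            if outcome == "success" and past in situation:
                return action
        return None

mem = AgentMemory()
mem.remember_turn("preference:format:markdown")
mem.end_session()
mem.record_episode("api timeout", "retry with backoff", "success")
print(mem.long_term, mem.recall_strategy("api timeout on /orders"))
```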
Pillar 3
Observability: How Agents Are Understood and Governed
When an AI agent produces a wrong output after twelve reasoning steps and seven tool calls, traditional logs tell you almost nothing useful. You cannot search for ‘sessions where the agent called the wrong tool’ in standard APM tools.
“You cannot safely govern what you cannot observe. For AI agents in enterprise production, observability is not optional it is the difference between a system you can audit and a black box waiting to cause a compliance incident.”
Agent native observability must capture:
- The full reasoning chain in step by step order
- Every tool invocation with exact inputs and outputs
- Every LLM prompt and response with token counts
- Decision points where the agent chose between alternatives
- Failure attribution pinpointing which specific step caused a wrong downstream output
- Token consumption per step for cost control
Without this, AI assisted decisions in regulated environments cannot be explained, investigated, or defended.
Pillar 4
MCP: Bridging Agents to External Data Sources and Solving the M×N Integration Problem
MCP: The Universal Connectivity Standard (USB-C)
For years, every team connecting agents to external services built bespoke adapters: custom code per tool, per framework, per model. This created the classic M×N integration problem: if there are M agent frameworks and N external services, teams end up building M × N separate integrations.
A LangChain Salesforce connector did not work with a Strands agent. Every framework switch meant rewriting all integrations. As the number of models, frameworks, and enterprise systems grew, the integration burden multiplied.
MCP, the Model Context Protocol, is the open standard that ended this fragmentation. Published by Anthropic in 2024 and now adopted across the industry by AWS, Microsoft, Google, and others, MCP defines a universal language for agent-to-tool communication.
Instead of building M × N bespoke connectors, developers can build one MCP server per data source, and any MCP compatible agent, regardless of framework or model, can connect to it immediately. In effect, MCP transforms the integration landscape from M × N complexity to roughly M + N reusable components, much like USB-C standardized hardware connectivity across devices.
The MCP architecture is built around three roles:
MCP Host: the agent framework that initiates connections and sends tool requests
MCP Server: the lightweight connector process wrapping an external service
MCP Resources and Tools: the capabilities exposed, including actions the agent can invoke, data sources it can read, and prompt templates it can use
By introducing a standard protocol layer, MCP removes the need to repeatedly rebuild integrations and enables true interoperability across agent frameworks, models, and enterprise systems.
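The arithmetic behind the M×N claim, and the discovery idea behind MCP, can both be shown in a few lines. Note this toy registry mimics MCP-style runtime tool discovery in spirit only; it is not the actual MCP wire protocol, and all names are illustrative.

```python
# Why a shared protocol collapses the integration count: M frameworks times
# N services means M*N bespoke connectors, versus one MCP client per
# framework plus one MCP server per service (M + N).

frameworks = ["strands", "langgraph", "crewai"]
services = ["salesforce", "github", "slack", "jira"]
bespoke = len(frameworks) * len(services)   # 12 connectors to build and maintain
with_mcp = len(frameworks) + len(services)  # 7: one client or server each

class ToolRegistry:
    """Register a tool once; any compatible agent can discover it at runtime."""
    def __init__(self):
        self._tools = {}

    def register(self, name, schema, fn):
        self._tools[name] = {"schema": schema, "fn": fn}

    def list_tools(self):          # runtime discovery instead of hardcoding
        return sorted(self._tools)

    def invoke(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()
registry.register("github_search", {"repo": "str"}, lambda repo: f"issues in {repo}")
print(bespoke, with_mcp, registry.list_tools())
```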
The Production Gap: Why Building Enterprise AI Agents Is Mostly an Infrastructure Problem
Across nearly every enterprise agent project, the same pattern appears. Before the agent logic can even be written, engineering teams must build a large amount of supporting infrastructure, including:
- Session routing
- Credential vaults
- Memory extraction pipelines
- Observability wiring
- Multi tenant context isolation
- Policy enforcement
In practice, a substantial portion of early development effort goes into these foundations before the agent’s intelligence is implemented.
Let’s walk through the key engineering challenges that create this gap.
Problem 1: Infrastructure for Long Running Stateful Sessions
Traditional serverless platforms are designed for short lived, stateless workloads.
Agents behave very differently.
They often require long running, stateful execution environments that maintain context across many tool calls and reasoning steps.
Supporting this requires infrastructure for:
- Session routing
- Per user state management
- Lifecycle management
- Dynamic scaling of execution environments
Constructing this infrastructure on top of general purpose compute platforms can become a significant engineering effort before any agent logic is written.
Problem 2: Security Isolation at Scale
Enterprise agents frequently process sensitive user data.
When thousands of users run concurrent sessions, strong isolation between sessions becomes critical. Without proper safeguards, a defect could potentially expose:
- One user’s data to another user
- Information across tenants
- Privileged credentials or tokens
Achieving secure isolation at scale requires carefully designed execution environments, container isolation, and strict identity boundaries, rather than relying solely on application level safeguards.
Problem 3: Identity, OAuth, and Credential Management
Agents rarely operate in isolation.
They interact with external services on behalf of users, which introduces the need to manage authentication and authorization flows such as:
- OAuth consent processes
- Secure token storage
- Automatic token refresh
- Fine grained permission enforcement
- Audit trails for every access
When agents integrate with multiple SaaS platforms across thousands of users, credential management becomes a full platform capability, not just a small feature.
Problem 4: Memory Infrastructure
Agents depend heavily on memory systems to function effectively.
Short Term Memory
Maintaining conversation context across long interactions often requires summarization pipelines that compress earlier dialogue while preserving meaning.
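A typical compaction pipeline keeps the most recent turns verbatim and replaces older ones with a summary. The sketch below stubs out the summarizer with string concatenation; in production that step is an LLM call, and the window size would be driven by token counts rather than turn counts.

```python
# Context-window compaction: older turns are folded into a summary so the
# prompt stays bounded while recent detail is preserved. summarize() is a stub.

def summarize(turns):
    return "summary(" + "; ".join(t[:20] for t in turns) + ")"

def compact_history(turns, keep_recent=4):
    if len(turns) <= keep_recent:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}" for i in range(10)]
compacted = compact_history(history)
print(len(compacted), compacted[0])
```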
Long Term Memory
Persistent knowledge typically involves:
- Information extraction pipelines
- Vector storage
- Semantic retrieval
- Mechanisms to reconcile new information with existing knowledge
Each of these components introduces potential failure modes that can gradually degrade agent behaviour if not carefully managed, particularly in multi-tenant environments.
Problem 5: Observability for Agent Reasoning
Traditional monitoring tools measure metrics such as:
- Latency
- Error rates
- Throughput
But production AI agents require deeper visibility.
Engineers often need to understand:
- Which reasoning step produced an incorrect output
- Which tool call returned unexpected data
- Why the agent chose a particular decision path
Achieving this level of visibility requires trace level instrumentation, structured logs, and AI aware observability dashboards.
Problem 6: Policy Enforcement Outside the Agent
Early agent systems often embed governance rules directly inside prompts.
This approach is fragile.
A carefully crafted user input can sometimes influence the agent to ignore or reinterpret its own instructions.
Production systems therefore require external policy enforcement layers that evaluate permissions and constraints independently of the agent’s reasoning process.
This ensures governance cannot be bypassed.
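The structure of such an enforcement layer can be sketched simply: every tool call is routed through a policy check that runs outside the model, so no prompt injection can talk the agent past it. The specific rules below are illustrative only.

```python
# Policy enforcement OUTSIDE the agent: the model proposes tool calls, but a
# separate layer decides whether they execute. The prompt cannot override it.

POLICIES = {
    "send_email": lambda args: args.get("to", "").endswith("@example.com"),
    "delete_record": lambda args: False,   # never allowed without HITL approval
}

def enforce(tool_name, args):
    check = POLICIES.get(tool_name)
    if check is None:
        return True   # unlisted tool: allowed here; deny-by-default is safer
    return check(args)

def guarded_invoke(tool_name, args, tools):
    if not enforce(tool_name, args):
        return {"denied": True, "reason": f"policy blocks {tool_name}"}
    return {"denied": False, "result": tools[tool_name](**args)}

tools = {"send_email": lambda to: f"sent to {to}"}
print(guarded_invoke("send_email", {"to": "ops@example.com"}, tools))
print(guarded_invoke("delete_record", {"id": 7}, tools))
```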
Problem 7: Multi Agent Coordination
Real enterprise workflows rarely rely on a single agent.
Instead, they often involve multiple specialized agents working together. For example:
- A research agent to gather information
- A writing agent to generate responses
- A verification agent to validate outputs
- An approval agent to enforce governance
Supporting these workflows requires infrastructure for:
- Inter agent communication
- Shared state management
- Workflow orchestration
- Failure handling and retries
This coordination layer introduces yet another architectural component to an already complex system.
Introducing Amazon Bedrock AgentCore
Amazon Bedrock AgentCore is an agentic platform from AWS designed to build, deploy, and operate AI agents securely at scale. It provides a set of modular, enterprise grade services that handle the infrastructure required to run production grade AI agents without developers having to manage the underlying systems.
In real world deployments, building an agent is only a small part of the challenge. Production systems must manage runtime execution, memory, tool connectivity, identity, security, and observability before agents can reliably interact with enterprise data and services. These infrastructure concerns often become the primary barrier to moving from prototype agents to production systems.
Amazon Bedrock AgentCore addresses this challenge by providing fully managed services that remove the undifferentiated heavy lifting of building agent infrastructure. Developers can focus on implementing the agent’s reasoning and workflows while AgentCore manages the operational backbone required to run agents reliably in enterprise environments.
AgentCore services are modular and composable, meaning they can be used together or independently depending on the architecture of the system. The platform is also framework agnostic and model agnostic, supporting popular open source agent frameworks such as LangGraph, CrewAI, LlamaIndex, and Strands Agents, and it can work with foundation models from Amazon Bedrock or external providers.
At a high level, AgentCore provides capabilities such as:
AgentCore Runtime: A secure serverless environment for running agents and tools
AgentCore Memory: Managed short term and long term memory for context aware agents
AgentCore Gateway: A service that converts APIs and services into MCP-compatible tools for agents
AgentCore Identity: Identity and access management designed specifically for AI agents
Built in tools and observability: Including code execution, browser automation, monitoring, and evaluation capabilities
Together, these services form a production infrastructure layer for agentic systems, allowing teams to deploy AI agents that are secure, scalable, observable, and capable of interacting with real enterprise systems.
AgentCore Runtime
AgentCore Runtime is the secure, serverless execution environment for AI agents. Each user session runs inside a dedicated, hardware isolated microVM, providing strong isolation of CPU, memory, and filesystem resources.
Isolation is enforced at the virtualization layer, ensuring one user’s agent cannot access another user’s data. When a session ends (due to 15 minutes of inactivity, user termination, or the 8 hour maximum session limit), the microVM is destroyed and memory is fully sanitized, preventing cross session data leakage.
Framework Compatibility
AgentCore Runtime is framework agnostic and works with common agent frameworks such as:
- Strands Agents (AWS)
- LangChain / LangGraph
- LlamaIndex
- Microsoft Agent Framework (Autogen + Semantic Kernel)
It can also host any custom agent implementation that runs inside a container.
Minimal Integration
Existing agents can be deployed with a small wrapper:
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    return your_agent(payload.get("prompt", ""))
Deployment:
agentcore configure
agentcore deploy
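Before deploying, the entrypoint contract (a payload dict in, a response out) can be exercised locally. In the sketch below the AgentCore SDK is replaced by a stub class so it runs anywhere without dependencies; `your_agent` is a placeholder for your real agent logic, not an SDK function.

```python
# Local smoke test of the entrypoint contract. StubApp mimics what the real
# @app.entrypoint decorator does: it registers the function as the handler.

class StubApp:
    def entrypoint(self, fn):
        self.handler = fn
        return fn

def your_agent(prompt):          # stand-in for your real agent logic
    return f"agent saw: {prompt}"

app = StubApp()

@app.entrypoint
def invoke(payload):
    return your_agent(payload.get("prompt", ""))

print(app.handler({"prompt": "hello"}))
```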
Model Support
AgentCore is model agnostic and works with major foundation models including:
- Amazon Nova
- Anthropic Claude
- OpenAI GPT
- Google Gemini
- Meta Llama
- Mistral
Your agent chooses the model; AgentCore only provides the execution environment.
Communication
AgentCore supports two interaction modes:
HTTP API: standard request/response execution
Bi-directional WebSocket streaming: real-time conversational and multi-turn agents
Using a sessionId keeps requests routed to the same microVM session, preserving state.
Strands Agents
Strands Agents is AWS’s open source agent framework designed around a model first approach. A Strands agent is defined by three elements:
- Model
- Tools
- Prompt
The model drives planning and tool usage. Strands agents deploy to AgentCore Runtime using the same lightweight SDK wrapper.
Deployment Options
AgentCore supports two deployment paths.
Direct code upload
AgentCore automatically builds the container and deploys the agent — no Dockerfile required.
Container deployment
Provides full control over runtime dependencies and system configuration.
Both use the same lifecycle:
- agentcore configure
- agentcore deploy
Deployments are immutable and versioned, allowing multiple versions and canary testing before traffic promotion.
AgentCore Gateway
AgentCore Gateway converts existing APIs, AWS Lambda functions, and OpenAPI specifications into agent ready MCP tools automatically, without writing custom adapters.
From API to Agent Tool
Point Gateway to a Lambda function or OpenAPI specification and it automatically:
- Generates the MCP tool schema
- Handles protocol translation
- Exposes the API as a discoverable agent tool
What previously required weeks of custom integration can now be done in minutes.
agentcore gateway create \
--name "crm-tools" \
--lambda-arn "arn:aws:lambda:us-east-1:123:function:crm-api" \
--protocol MCP
Once registered, any MCP compatible agent can discover and invoke the tool.
MCP Native Architecture
Gateway is built around the Model Context Protocol (MCP). Registered tools become automatically usable by MCP compatible frameworks such as:
- Strands
- LangGraph
- CrewAI
Agents can dynamically discover tools at runtime rather than requiring tools to be hardcoded during initialization.
SaaS Integration
Gateway provides built in connectors for common enterprise platforms such as:
- GitHub
- Salesforce
- Slack
- Google Workspace
- Microsoft 365
- Jira / Confluence
These connectors handle authentication, schema generation, and error handling automatically.
Agent-to-Agent Communication (A2A)
Gateway also supports the Agent2Agent (A2A) protocol, which standardizes how agents communicate with each other.
Agents built using different frameworks can delegate tasks across systems while communicating through standardized A2A messages.
AgentCore Identity
AgentCore Identity manages authentication and credential delegation for AI agents accessing external systems.
It controls both:
- Who can invoke the agent
- How the agent authenticates to external services
Supported authentication mechanisms include:
- AWS IAM SigV4 for internal services
- OAuth 2.0 and OpenID Connect for external users and applications
Compatible identity providers include Amazon Cognito, Okta, Microsoft Entra ID, and Auth0.
Machine-to-Machine Access (2LO)
For system-level tasks, agents authenticate using OAuth Client Credentials without a user involved.
Common scenarios:
- Scheduled workflows
- Background analytics
- System integrations
User Delegated Access (3LO)
When agents act on behalf of a user, AgentCore manages the full OAuth lifecycle:
- User consent flow
- Encrypted token storage
- Token refresh
- Access auditing
All credentials are stored in an encrypted vault protected by customer managed KMS keys.
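The shape of such a vault is worth sketching: tokens are keyed by (user, service), refresh happens inside the vault, the agent only ever receives a short-lived access token, and every access is audited. Encryption and real OAuth refresh are stubbed; this is a conceptual model, not the AgentCore Identity API.

```python
# Sketch of a delegated-credential vault: the agent asks for an access token
# and never sees the refresh token; expiry triggers an internal refresh.

import time

class TokenVault:
    def __init__(self):
        self._store = {}   # (user, service) -> {access, refresh, expires_at}
        self.audit = []    # every access is recorded for auditing

    def put(self, user, service, access, refresh, ttl=3600):
        self._store[(user, service)] = {
            "access": access, "refresh": refresh,
            "expires_at": time.time() + ttl,
        }

    def get_access_token(self, user, service):
        entry = self._store[(user, service)]
        if entry["expires_at"] <= time.time():            # expired -> refresh
            entry["access"] = f"refreshed-{entry['refresh']}"
            entry["expires_at"] = time.time() + 3600
        self.audit.append((user, service, time.time()))   # audit trail
        return entry["access"]                            # never the refresh token

vault = TokenVault()
vault.put("alice", "salesforce", "tok-1", "ref-1", ttl=-1)  # already expired
print(vault.get_access_token("alice", "salesforce"))
```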
AgentCore Memory
AgentCore Memory provides built in memory management for agents without requiring developers to build custom vector pipelines.
It supports three types of memory:
Short Term Memory
Maintains session context, including conversation history, tool outputs, and reasoning state.
Long Term Memory
Stores extracted knowledge such as user preferences, decisions, and discovered facts so future sessions begin with relevant context.
Episodic Memory
Stores past experiences (what actions were attempted and which strategies succeeded), enabling agents to improve behavior over time.
AgentCore Browser
Some enterprise systems can only be accessed through a web interface.
AgentCore Browser provides isolated browser instances that agents can use to interact with websites and web applications.
Agents can:
- Navigate multi step workflows
- Fill forms
- Extract information from dynamic pages
- Interact with internal portals
Each session runs in a sandboxed browser environment, which is destroyed when the session ends.
AgentCore Code Interpreter
When agents generate code for analysis or computation, that code must execute safely.
AgentCore Code Interpreter provides an isolated execution sandbox where generated code can run securely.
Agents can use it to:
- Analyze datasets
- Run calculations
- Generate charts and files
- Validate generated code
Each execution occurs in a separate ephemeral sandbox with no access to other sessions or infrastructure.
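To see why process-level isolation matters, consider this minimal illustration (emphatically NOT the AgentCore API): generated code runs in a separate short-lived process with a hard timeout, so a hang or crash cannot take down the agent itself. Real sandboxes add filesystem, network, and resource isolation on top of this.

```python
# Conceptual illustration only: run generated code in a separate process
# with a timeout, capturing its output and exit status.

import subprocess
import sys

def run_generated_code(code, timeout=5):
    proc = subprocess.run(
        [sys.executable, "-c", code],      # fresh interpreter per execution
        capture_output=True, text=True, timeout=timeout,
    )
    return {"stdout": proc.stdout.strip(), "returncode": proc.returncode}

result = run_generated_code("print(sum(range(10)))")
print(result)
```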
Conclusion
The Platform for the Production Agent Era
Having architected agentic systems across Azure Web Apps, Azure Container Apps, and custom infrastructure, I know how much engineering effort goes into the layers that production agents require.
Session routing, credential management, memory pipelines, observability, governance policies, and multi tenant isolation are all necessary pieces of a reliable agent system. None of them is impossible to build, but they consume time that should be spent improving the reasoning, behavior, and usefulness of the agent itself.
This is the problem Amazon Bedrock AgentCore is designed to solve.
AgentCore provides 7 purpose built services that handle the production infrastructure required for agent systems:
Runtime: Secure microVM execution for agents
Gateway: MCP-native tool integration and API exposure
Identity: OAuth credential lifecycle and delegated access
Memory: Short term and long term persistent memory for agents
Browser: Managed browser automation for web interactions
Code Interpreter: Isolated sandbox for executing generated code
Observability: CloudWatch native tracing with OpenTelemetry support
AgentCore is framework agnostic and works with common agent frameworks such as Strands, LangChain, LangGraph, LlamaIndex, CrewAI, and AutoGen, as well as custom implementations.
It is also model agnostic, allowing agents to use foundation models including Amazon Nova, Anthropic Claude, OpenAI GPT models, Google Gemini, Meta Llama, and Mistral, or any model accessible through an API.
The question is no longer whether a production AI agent can be built.
With AgentCore, the real question becomes what agent you want to build and how quickly you can deliver it to the people who need it.
Getting Started
pip install bedrock-agentcore bedrock-agentcore-starter-toolkit
Thanks
Sreeni Ramadorai


