Introduction
Over the past couple of years, I have architected and delivered a significant number of agentic AI applications across enterprise environments. Many of these deployments ran on Azure infrastructure, using Azure Web Apps for lightweight agent endpoints and Azure Container Apps for more sophisticated multi-agent systems that required orchestration, scaling, and reliable session routing.
In building these systems, I have repeatedly implemented the underlying foundations myself: credential vaults, memory pipelines, observability layers, and isolation mechanisms. After doing this enough times, you develop a clear understanding of both how long these pieces take to build and where the real production challenges tend to surface.
When I first evaluated Amazon Bedrock AgentCore, it was the first platform I encountered that appeared to address many of these challenges holistically. Not just through surface level abstractions, but with production grade depth designed for real world deployments.
That practical experience is the perspective I bring to this blog.
Before we talk about Amazon Bedrock AgentCore, we need to answer a more fundamental question: what exactly is an AI agent, and why is it so different from a regular chatbot or API call?
What is an AI Agent?
“An AI agent is a software system that uses a large language model not just to generate text, but to reason, plan, take actions, and work toward a goal, often across multiple steps, over time, with minimal human involvement.”
Most people encounter AI through a prompt-response loop: type something in, get something back. That model is useful, but it is fundamentally passive. The language model sits in a box, waits to be asked, generates text, and stops.
An AI agent is something entirely different. Think of a brilliant expert locked in a room with no tools. They can give extraordinary advice, but they cannot act on it. Give that same expert a phone, a laptop, access to databases, and the ability to send emails, run code, and call APIs. They no longer just advise. They act, verify, execute, and report back. That is the agentic paradigm.
“An AI agent doesn’t just answer your question. It takes on your objective, plans a path to achieve it, executes that plan, monitors its own progress, and self-corrects when things go wrong, without you directing each step.”
A Concrete Example
Ask an agent: “Find our top three open support tickets today, check each against the known issues database, draft replies, and email them to the support team.”
A plain language model cannot do this: it has no access to your ticketing system, knowledge base, or email infrastructure. An AI agent handles the entire workflow end to end.
Step 1: Query the ticketing tool for today’s open critical tickets
Step 2: Search the knowledge base for related known issues
Step 3: Reason about which tickets match which issues
Step 4: Draft personalized reply emails using the LLM
Step 5: Send those emails via the email API (this could be a tool or an MCP server)
The LLM is the reasoning engine. The tools are how the agent reaches into real systems. And it does not stop after one response: it pursues the objective through every step until the goal is met.
Agents Are Goal Driven
The most critical characteristic of an AI agent, and the one most often glossed over, is that it is goal driven, not prompt driven.
Prompt driven systems (plain LLMs) receive an input and produce an output. The interaction is complete. No awareness of a broader objective, no adaptation if the first attempt fails.
Goal driven systems (agents) receive an objective and autonomously determine the steps, tool calls, and decisions required to achieve it. They persist, adapt, retry, and self-correct until the goal is met, or they explicitly report that it cannot be.
The Agentic Loop:
Observe, Think, Act, Repeat
The mechanics of goal driven behaviour are captured in what is called the agentic loop: the cognitive cycle every agent runs until its objective is achieved. Strands Agents, AWS’s own open source framework, describes this as its core architecture. In each loop iteration, the model is invoked with the prompt, agent context, and available tools, and it decides whether to respond in natural language, plan next steps, reflect on prior results, or select one or more tools to use. This loop continues until the task is complete.
1. Observe
The agent reads its current goal and decomposed sub goals. It reviews all results from prior steps. It retrieves relevant short term memory. It incorporates new information from the environment since the last cycle.
2. Think
The LLM reasons over accumulated context and available tools to determine the single best next action. It outputs either a tool call with exact parameters or, if the goal is satisfied, a final answer. Planning capable models may first decompose the goal into an explicit sub task sequence.
3. Act
If a tool call was selected, the framework executes the real function: calling an API, querying a database, running code, navigating a browser, or invoking any registered tool. If a Human-in-the-Loop (HITL) checkpoint is configured, the agent pauses and waits for approval. The result is captured and fed back into context.
4. Loop
The action result becomes new input to the next Observe phase. Is the goal achieved? If yes, the agent produces its final answer and terminates. If no, the loop continues, potentially for dozens or hundreds of iterations for complex, long running tasks.
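The four phases above can be sketched in a few lines of framework-free Python. Here the "LLM" is a stub that walks through a canned ticket-triage plan; in a real agent that decision would be a model call, and the tools would hit real systems. All names are illustrative.

```python
# A minimal, framework-free sketch of the Observe-Think-Act loop.
# stub_llm stands in for the model: it inspects the accumulated history
# and returns either the next tool call or a final answer.

def stub_llm(goal, history, tools):
    """Decide the next step: a tool call, or a final answer."""
    if not history:
        return {"tool": "search_tickets", "args": {"status": "open"}}
    if history[-1]["tool"] == "search_tickets":
        return {"tool": "draft_reply", "args": {"ticket": history[-1]["result"][0]}}
    return {"final": f"Drafted reply for {history[-1]['result']}"}

TOOLS = {
    "search_tickets": lambda status: ["TICKET-1", "TICKET-2"],
    "draft_reply": lambda ticket: f"reply-for-{ticket}",
}

def agentic_loop(goal, max_iters=10):
    history = []                                   # Observe: accumulated context
    for _ in range(max_iters):
        decision = stub_llm(goal, history, TOOLS)  # Think: choose next action
        if "final" in decision:                    # goal satisfied -> terminate
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["args"])     # Act: run the tool
        history.append({"tool": decision["tool"], "result": result})  # Loop: feed back

    raise RuntimeError("goal not achieved within iteration budget")

print(agentic_loop("answer open tickets"))
```

The `max_iters` guard is the piece prototypes usually forget: without it, a confused model can loop forever.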
ReAct: The Reasoning Pattern Behind the Loop
ReAct (Reason + Act) is the foundational prompting pattern that implements the agentic loop. The agent produces a structured Thought describing its reasoning, followed by an Action (the chosen tool call), followed by an Observation incorporating the result. This interleaved reasoning action format makes goal pursuit explicit, traceable, and debuggable. It is the pattern that AWS Strands Agents uses internally to power its model driven approach.
Goal: Find top 3 critical GitHub bugs and draft fixes.
Thought: I need to query GitHub for open issues labelled critical.
Action: github_search(repo='company/api', label='critical', state='open', limit=3)
Observation: Returned issues #16, #5018, #18.
Thought: I have the issues. Now I need each affected code file.
Action: github_get_file(repo='company/api', path='src/auth.py')
Observation: File retrieved.
Thought: Enough context. Drafting fix for issue #16.
… [loop continues for all 3 bugs] …
Final Answer: Three bugs with proposed patches attached.
Self Correction and Adaptive Planning
Goal driven agents are not fragile pipelines. When an action fails (a tool returns an error, a query finds nothing, a generated solution does not compile), the agent observes the failure, reasons about the cause, and adapts. A failed search triggers a reformulated query. An unavailable tool triggers a fallback. A complex goal gets decomposed into smaller, achievable sub goals. When the agent determines a goal truly cannot be achieved, it surfaces a clear explanation rather than silently returning empty results.
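This observe-failure-and-adapt behaviour can be made concrete with a small sketch. The search tool and the reformulation rule below are stand-ins; in a real agent the reformulated query would come from the model reasoning about the error.

```python
# Sketch of self-correction: on failure the agent reformulates and retries,
# and only reports failure explicitly (never an empty result) once all
# adaptations are exhausted. flaky_search is a stub tool.

def flaky_search(query):
    if "site:" in query:               # pretend the narrow query finds nothing
        raise LookupError("no results")
    return [f"doc about {query}"]

def search_with_self_correction(query):
    attempts = [query, query.replace("site:internal ", "")]  # reformulations
    errors = []
    for q in attempts:
        try:
            return {"ok": True, "results": flaky_search(q)}
        except LookupError as err:
            errors.append(f"{q!r}: {err}")   # observe the failure, adapt
    # Goal truly not achievable: surface a clear explanation.
    return {"ok": False, "explanation": "; ".join(errors)}

print(search_with_self_correction("site:internal outage runbook"))
```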
HITL: Human in the Loop
Goal driven does not mean unsupervised. Production agents are designed with explicit human-in-the-loop checkpoints: moments where the agent pauses, presents its proposed action, and waits for approval before taking any irreversible step, such as sending emails, deleting records, initiating payments, or deploying code. AgentCore Runtime’s bi-directional WebSocket streaming makes these pause-and-resume flows practical within long running sessions, enabling real-time human collaboration without terminating and restarting the session.
The 4 Pillars of Every Production AI Agent
Pillar 1
Tools: How Agents Act on the Real World
Without tools, a goal driven agent has nowhere to go. Tools allow agents to reach beyond language generation into real business systems.
Read tools retrieve information: database queries, document reads, semantic search against knowledge bases, API calls to Salesforce, GitHub, Jira, Slack, and any other SaaS tool.
Write tools create or modify data: email senders, database writers, file generators, CRM updaters, ticket creators, calendar schedulers.
Execution tools run processes: code interpreters, browser automation for web based applications that have no API, and shell command runners.
The production challenge: A prototype might hard-code three tools. An enterprise deployment often needs fifty tools across ten SaaS platforms, each with its own authentication scheme, error patterns, and schema. Tool management becomes a major engineering project on its own.
Pillar 2
Memory: How Agents Remember
Language models (LLMs) are stateless. Every API call starts blank. For an agent serving the same user across weeks of ongoing work, statelessness is a fundamental blocker.
Short term memory covers the active session: conversation history, task state, intermediate tool results, and reasoning steps. It requires intelligent summarization to manage the LLM’s context window limits without losing the critical thread.
Long term memory persists across sessions. User preferences, past project outcomes, accumulated domain knowledge, and learned patterns must survive session end and be retrievable in future sessions. This requires extraction logic, persistent storage, and semantic retrieval.
Episodic memory is the most powerful form: storing specific past experiences (what the agent tried, what worked, what failed, and what the outcome was) so the agent can recall and apply successful strategies in similar future situations. This is the mechanism by which agents genuinely improve over time.
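A toy illustration of the three tiers helps fix the distinctions. Real systems back long-term and episodic memory with vector stores and semantic retrieval; here plain dicts and substring matching stand in for that, and the extraction rule is deliberately simplistic.

```python
# Toy model of the three memory tiers: short-term (session), long-term
# (persists across sessions), and episodic (past experiences and outcomes).

class AgentMemory:
    def __init__(self):
        self.short_term = []      # active-session turns, cleared on session end
        self.long_term = {}       # facts/preferences that survive sessions
        self.episodes = []        # (situation, action, outcome) experiences

    def remember_turn(self, turn):
        self.short_term.append(turn)

    def end_session(self):
        # Extraction step: promote durable facts, then drop session context.
        for turn in self.short_term:
            if turn.startswith("preference:"):
                key, value = turn.split(":", 2)[1:]
                self.long_term[key] = value
        self.short_term.clear()

    def record_episode(self, situation, action, outcome):
        self.episodes.append((situation, action, outcome))

    def recall_strategy(self, situation):
        # Episodic recall: reuse what worked before in a similar situation.
        for past, action, outcome in self.episodes:
            if outcome == "success" and past in situation:
                return action
        return None

mem = AgentMemory()
mem.remember_turn("preference:format:markdown")
mem.end_session()
mem.record_episode("api timeout", "retry with backoff", "success")
print(mem.long_term, mem.recall_strategy("api timeout on /orders"))
```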
Pillar 3
Observability: How Agents Are Understood and Governed
When an AI agent produces a wrong output after twelve reasoning steps and seven tool calls, traditional logs tell you almost nothing useful. You cannot search for ‘sessions where the agent called the wrong tool’ in standard APM tools.
“You cannot safely govern what you cannot observe. For AI agents in enterprise production, observability is not optional it is the difference between a system you can audit and a black box waiting to cause a compliance incident.”
Agent native observability must capture:
- The full reasoning chain in step by step order
- Every tool invocation with exact inputs and outputs
- Every LLM prompt and response with token counts
- Decision points where the agent chose between alternatives
- Failure attribution pinpointing which specific step caused a wrong downstream output
- Token consumption per step for cost control
Without this, AI assisted decisions in regulated environments cannot be explained, investigated, or defended.
Pillar 4
MCP: Bridging Agents to External Data Sources and Solving the M×N Integration Problem
MCP: The Universal Connectivity Standard (USB-C)
For years, every team connecting agents to external services built bespoke adapters: custom code per tool, per framework, per model. This created the classic M×N integration problem: if there are M agent frameworks and N external services, teams end up building M × N separate integrations.
A LangChain Salesforce connector did not work with a Strands agent. Every framework switch meant rewriting all integrations. As the number of models, frameworks, and enterprise systems grew, the integration burden multiplied.
MCP, the Model Context Protocol, is the open standard that ended this fragmentation. Published by Anthropic in 2024 and now adopted across the industry by AWS, Microsoft, Google, and others, MCP defines a universal language for agent-to-tool communication.
Instead of building M × N bespoke connectors, developers can build one MCP server per data source, and any MCP compatible agent, regardless of framework or model, can connect to it immediately. In effect, MCP transforms the integration landscape from M × N complexity to roughly M + N reusable components, much like USB-C standardized hardware connectivity across devices.
The MCP architecture is built around three roles:
MCP Host: the agent framework that initiates connections and sends tool requests
MCP Server: the lightweight connector process wrapping an external service
MCP Resources and Tools: the capabilities exposed, including actions the agent can invoke, data sources it can read, and prompt templates it can use
By introducing a standard protocol layer, MCP removes the need to repeatedly rebuild integrations and enables true interoperability across agent frameworks, models, and enterprise systems.
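The arithmetic behind the M×N claim, and the discovery idea behind MCP, can both be shown in a few lines. Note this toy registry mimics MCP-style runtime tool discovery in spirit only; it is not the actual MCP wire protocol, and all names are illustrative.

```python
# Why a shared protocol collapses the integration count: M frameworks times
# N services means M*N bespoke connectors, versus one MCP client per
# framework plus one MCP server per service (M + N).

frameworks = ["strands", "langgraph", "crewai"]
services = ["salesforce", "github", "slack", "jira"]
bespoke = len(frameworks) * len(services)   # 12 connectors to build and maintain
with_mcp = len(frameworks) + len(services)  # 7: one client or server each

class ToolRegistry:
    """Register a tool once; any compatible agent can discover it at runtime."""
    def __init__(self):
        self._tools = {}

    def register(self, name, schema, fn):
        self._tools[name] = {"schema": schema, "fn": fn}

    def list_tools(self):          # runtime discovery instead of hardcoding
        return sorted(self._tools)

    def invoke(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()
registry.register("github_search", {"repo": "str"}, lambda repo: f"issues in {repo}")
print(bespoke, with_mcp, registry.list_tools())
```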
The Production Gap: Why Building Enterprise AI Agents Is Mostly an Infrastructure Problem
Across nearly every enterprise agent project, the same pattern appears. Before the agent logic can even be written, engineering teams must build a large amount of supporting infrastructure, including:
- Session routing
- Credential vaults
- Memory extraction pipelines
- Observability wiring
- Multi tenant context isolation
- Policy enforcement
In practice, a substantial portion of early development effort goes into these foundations before the agent’s intelligence is implemented.
Let’s walk through the key engineering challenges that create this gap.
Problem 1: Infrastructure for Long Running Stateful Sessions
Traditional serverless platforms are designed for short lived, stateless workloads.
Agents behave very differently.
They often require long running, stateful execution environments that maintain context across many tool calls and reasoning steps.
Supporting this requires infrastructure for:
- Session routing
- Per user state management
- Lifecycle management
- Dynamic scaling of execution environments
Constructing this infrastructure on top of general purpose compute platforms can become a significant engineering effort before any agent logic is written.
Problem 2: Security Isolation at Scale
Enterprise agents frequently process sensitive user data.
When thousands of users run concurrent sessions, strong isolation between sessions becomes critical. Without proper safeguards, a defect could potentially expose:
- One user’s data to another user
- Information across tenants
- Privileged credentials or tokens
Achieving secure isolation at scale requires carefully designed execution environments, container isolation, and strict identity boundaries, rather than relying solely on application level safeguards.
Problem 3: Identity, OAuth, and Credential Management
Agents rarely operate in isolation.
They interact with external services on behalf of users, which introduces the need to manage authentication and authorization flows such as:
- OAuth consent processes
- Secure token storage
- Automatic token refresh
- Fine grained permission enforcement
- Audit trails for every access
When agents integrate with multiple SaaS platforms across thousands of users, credential management becomes a full platform capability, not just a small feature.
Problem 4: Memory Infrastructure
Agents depend heavily on memory systems to function effectively.
Short Term Memory
Maintaining conversation context across long interactions often requires summarization pipelines that compress earlier dialogue while preserving meaning.
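A typical compaction pipeline keeps the most recent turns verbatim and replaces older ones with a summary. The sketch below stubs out the summarizer with string concatenation; in production that step is an LLM call, and the window size would be driven by token counts rather than turn counts.

```python
# Context-window compaction: older turns are folded into a summary so the
# prompt stays bounded while recent detail is preserved. summarize() is a stub.

def summarize(turns):
    return "summary(" + "; ".join(t[:20] for t in turns) + ")"

def compact_history(turns, keep_recent=4):
    if len(turns) <= keep_recent:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}" for i in range(10)]
compacted = compact_history(history)
print(len(compacted), compacted[0])
```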
Long Term Memory
Persistent knowledge typically involves:
- Information extraction pipelines
- Vector storage
- Semantic retrieval
- Mechanisms to reconcile new information with existing knowledge
Each of these components introduces potential failure modes that can gradually degrade agent behaviour if not carefully managed, particularly in multi-tenant environments.
Problem 5: Observability for Agent Reasoning
Traditional monitoring tools measure metrics such as:
- Latency
- Error rates
- Throughput
But production AI agents require deeper visibility.
Engineers often need to understand:
- Which reasoning step produced an incorrect output
- Which tool call returned unexpected data
- Why the agent chose a particular decision path
Achieving this level of visibility requires trace level instrumentation, structured logs, and AI aware observability dashboards.
Problem 6: Policy Enforcement Outside the Agent
Early agent systems often embed governance rules directly inside prompts.
This approach is fragile.
A carefully crafted user input can sometimes influence the agent to ignore or reinterpret its own instructions.
Production systems therefore require external policy enforcement layers that evaluate permissions and constraints independently of the agent’s reasoning process.
This ensures governance cannot be bypassed.
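The structure of such an enforcement layer can be sketched simply: every tool call is routed through a policy check that runs outside the model, so no prompt injection can talk the agent past it. The specific rules below are illustrative only.

```python
# Policy enforcement OUTSIDE the agent: the model proposes tool calls, but a
# separate layer decides whether they execute. The prompt cannot override it.

POLICIES = {
    "send_email": lambda args: args.get("to", "").endswith("@example.com"),
    "delete_record": lambda args: False,   # never allowed without HITL approval
}

def enforce(tool_name, args):
    check = POLICIES.get(tool_name)
    if check is None:
        return True   # unlisted tool: allowed here; deny-by-default is safer
    return check(args)

def guarded_invoke(tool_name, args, tools):
    if not enforce(tool_name, args):
        return {"denied": True, "reason": f"policy blocks {tool_name}"}
    return {"denied": False, "result": tools[tool_name](**args)}

tools = {"send_email": lambda to: f"sent to {to}"}
print(guarded_invoke("send_email", {"to": "ops@example.com"}, tools))
print(guarded_invoke("delete_record", {"id": 7}, tools))
```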
Problem 7: Multi Agent Coordination
Real enterprise workflows rarely rely on a single agent.
Instead, they often involve multiple specialized agents working together. For example:
- A research agent to gather information
- A writing agent to generate responses
- A verification agent to validate outputs
- An approval agent to enforce governance
Supporting these workflows requires infrastructure for:
- Inter agent communication
- Shared state management
- Workflow orchestration
- Failure handling and retries
This coordination layer introduces yet another architectural component to an already complex system.
Introducing Amazon Bedrock AgentCore
Amazon Bedrock AgentCore is an agentic platform from AWS designed to build, deploy, and operate AI agents securely at scale. It provides a set of modular, enterprise grade services that handle the infrastructure required to run production grade AI agents without developers having to manage the underlying systems.
In real world deployments, building an agent is only a small part of the challenge. Production systems must manage runtime execution, memory, tool connectivity, identity, security, and observability before agents can reliably interact with enterprise data and services. These infrastructure concerns often become the primary barrier to moving from prototype agents to production systems.
Amazon Bedrock AgentCore addresses this challenge by providing fully managed services that remove the undifferentiated heavy lifting of building agent infrastructure. Developers can focus on implementing the agent’s reasoning and workflows while AgentCore manages the operational backbone required to run agents reliably in enterprise environments.
AgentCore services are modular and composable, meaning they can be used together or independently depending on the architecture of the system. The platform is also framework agnostic and model agnostic, supporting popular open source agent frameworks such as LangGraph, CrewAI, LlamaIndex, and Strands Agents, and it can work with foundation models from Amazon Bedrock or external providers.
At a high level, AgentCore provides capabilities such as:
AgentCore Runtime: A secure serverless environment for running agents and tools
AgentCore Memory: Managed short term and long term memory for context aware agents
AgentCore Gateway: A service that converts APIs and services into MCP-compatible tools for agents
AgentCore Identity: Identity and access management designed specifically for AI agents
Built in tools and observability: Including code execution, browser automation, monitoring, and evaluation capabilities
Together, these services form a production infrastructure layer for agentic systems, allowing teams to deploy AI agents that are secure, scalable, observable, and capable of interacting with real enterprise systems.
AgentCore Runtime
AgentCore Runtime is the secure, serverless execution environment for AI agents. Each user session runs inside a dedicated, hardware isolated microVM, providing strong isolation of CPU, memory, and filesystem resources.
Isolation is enforced at the virtualization layer, ensuring one user’s agent cannot access another user’s data. When a session ends (due to 15 minutes of inactivity, user termination, or the 8 hour maximum session limit), the microVM is destroyed and memory is fully sanitized, preventing cross session data leakage.
Framework Compatibility
AgentCore Runtime is framework agnostic and works with common agent frameworks such as:
- Strands Agents (AWS)
- LangChain / LangGraph
- LlamaIndex
- Microsoft Agent Framework (Autogen + Semantic Kernel)
It can also host any custom agent implementation that runs inside a container.
Minimal Integration
Existing agents can be deployed with a small wrapper:
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    return your_agent(payload.get("prompt", ""))
Deployment:
agentcore configure
agentcore deploy
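Before deploying, the entrypoint contract (a payload dict in, a response out) can be exercised locally. In the sketch below the AgentCore SDK is replaced by a stub class so it runs anywhere without dependencies; `your_agent` is a placeholder for your real agent logic, not an SDK function.

```python
# Local smoke test of the entrypoint contract. StubApp mimics what the real
# @app.entrypoint decorator does: it registers the function as the handler.

class StubApp:
    def entrypoint(self, fn):
        self.handler = fn
        return fn

def your_agent(prompt):          # stand-in for your real agent logic
    return f"agent saw: {prompt}"

app = StubApp()

@app.entrypoint
def invoke(payload):
    return your_agent(payload.get("prompt", ""))

print(app.handler({"prompt": "hello"}))
```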
Model Support
AgentCore is model agnostic and works with major foundation models including:
- Amazon Nova
- Anthropic Claude
- OpenAI GPT
- Google Gemini
- Meta Llama
- Mistral
Your agent chooses the model; AgentCore only provides the execution environment.
Communication
AgentCore supports two interaction modes:
HTTP API: standard request/response execution
Bi-directional WebSocket streaming: real-time conversational and multi-turn agents
Using a sessionId keeps requests routed to the same microVM session, preserving state.
Strands Agents
Strands Agents is AWS’s open source agent framework designed around a model first approach. A Strands agent is defined by three elements:
- Model
- Tools
- Prompt
The model drives planning and tool usage. Strands agents deploy to AgentCore Runtime using the same lightweight SDK wrapper.
Deployment Options
AgentCore supports two deployment paths.
Direct code upload
AgentCore automatically builds the container and deploys the agent — no Dockerfile required.
Container deployment
Provides full control over runtime dependencies and system configuration.
Both use the same lifecycle:
- agentcore configure
- agentcore deploy
Deployments are immutable and versioned, allowing multiple versions and canary testing before traffic promotion.
AgentCore Gateway
AgentCore Gateway converts existing APIs, AWS Lambda functions, and OpenAPI specifications into agent ready MCP tools automatically, without writing custom adapters.
From API to Agent Tool
Point Gateway to a Lambda function or OpenAPI specification and it automatically:
- Generates the MCP tool schema
- Handles protocol translation
- Exposes the API as a discoverable agent tool
What previously required weeks of custom integration can now be done in minutes.
agentcore gateway create \
--name "crm-tools" \
--lambda-arn "arn:aws:lambda:us-east-1:123:function:crm-api" \
--protocol MCP
Once registered, any MCP compatible agent can discover and invoke the tool.
MCP Native Architecture
Gateway is built around the Model Context Protocol (MCP). Registered tools become automatically usable by MCP compatible frameworks such as:
- Strands
- LangGraph
- CrewAI
Agents can dynamically discover tools at runtime rather than requiring tools to be hardcoded during initialization.
SaaS Integration
Gateway provides built in connectors for common enterprise platforms such as:
- GitHub
- Salesforce
- Slack
- Google Workspace
- Microsoft 365
- Jira / Confluence
These connectors handle authentication, schema generation, and error handling automatically.
Agent-to-Agent Communication (A2A)
Gateway also supports the Agent2Agent (A2A) protocol, which standardizes how agents communicate with each other.
Agents built using different frameworks can delegate tasks across systems while communicating through standardized A2A messages.
AgentCore Identity
AgentCore Identity manages authentication and credential delegation for AI agents accessing external systems.
It controls both:
- Who can invoke the agent
- How the agent authenticates to external services
Supported authentication mechanisms include:
- AWS IAM SigV4 for internal services
- OAuth 2.0 and OpenID Connect for external users and applications
Compatible identity providers include Amazon Cognito, Okta, Microsoft Entra ID, and Auth0.
Machine-to-Machine Access (2LO)
For system-level tasks, agents authenticate using OAuth Client Credentials without a user involved.
Common scenarios:
- Scheduled workflows
- Background analytics
- System integrations
User Delegated Access (3LO)
When agents act on behalf of a user, AgentCore manages the full OAuth lifecycle:
- User consent flow
- Encrypted token storage
- Token refresh
- Access auditing
All credentials are stored in an encrypted vault protected by customer managed KMS keys.
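The shape of such a vault is worth sketching: tokens are keyed by (user, service), refresh happens inside the vault, the agent only ever receives a short-lived access token, and every access is audited. Encryption and real OAuth refresh are stubbed; this is a conceptual model, not the AgentCore Identity API.

```python
# Sketch of a delegated-credential vault: the agent asks for an access token
# and never sees the refresh token; expiry triggers an internal refresh.

import time

class TokenVault:
    def __init__(self):
        self._store = {}   # (user, service) -> {access, refresh, expires_at}
        self.audit = []    # every access is recorded for auditing

    def put(self, user, service, access, refresh, ttl=3600):
        self._store[(user, service)] = {
            "access": access, "refresh": refresh,
            "expires_at": time.time() + ttl,
        }

    def get_access_token(self, user, service):
        entry = self._store[(user, service)]
        if entry["expires_at"] <= time.time():            # expired -> refresh
            entry["access"] = f"refreshed-{entry['refresh']}"
            entry["expires_at"] = time.time() + 3600
        self.audit.append((user, service, time.time()))   # audit trail
        return entry["access"]                            # never the refresh token

vault = TokenVault()
vault.put("alice", "salesforce", "tok-1", "ref-1", ttl=-1)  # already expired
print(vault.get_access_token("alice", "salesforce"))
```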
AgentCore Memory
AgentCore Memory provides built in memory management for agents without requiring developers to build custom vector pipelines.
It supports three types of memory:
Short Term Memory
Maintains session context, including conversation history, tool outputs, and reasoning state.
Long Term Memory
Stores extracted knowledge such as user preferences, decisions, and discovered facts so future sessions begin with relevant context.
Episodic Memory
Stores past experiences (what actions were attempted and which strategies succeeded), enabling agents to improve behavior over time.
AgentCore Browser
Some enterprise systems can only be accessed through a web interface.
AgentCore Browser provides isolated browser instances that agents can use to interact with websites and web applications.
Agents can:
- Navigate multi step workflows
- Fill forms
- Extract information from dynamic pages
- Interact with internal portals
Each session runs in a sandboxed browser environment, which is destroyed when the session ends.
AgentCore Code Interpreter
When agents generate code for analysis or computation, that code must execute safely.
AgentCore Code Interpreter provides an isolated execution sandbox where generated code can run securely.
Agents can use it to:
- Analyze datasets
- Run calculations
- Generate charts and files
- Validate generated code
Each execution occurs in a separate ephemeral sandbox with no access to other sessions or infrastructure.
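To see why process-level isolation matters, consider this minimal illustration (emphatically NOT the AgentCore API): generated code runs in a separate short-lived process with a hard timeout, so a hang or crash cannot take down the agent itself. Real sandboxes add filesystem, network, and resource isolation on top of this.

```python
# Conceptual illustration only: run generated code in a separate process
# with a timeout, capturing its output and exit status.

import subprocess
import sys

def run_generated_code(code, timeout=5):
    proc = subprocess.run(
        [sys.executable, "-c", code],      # fresh interpreter per execution
        capture_output=True, text=True, timeout=timeout,
    )
    return {"stdout": proc.stdout.strip(), "returncode": proc.returncode}

result = run_generated_code("print(sum(range(10)))")
print(result)
```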
Conclusion
The Platform for the Production Agent Era
Having architected agentic systems across Azure Web Apps, Azure Container Apps, and custom infrastructure, I know how much engineering effort goes into the layers that production agents require.
Session routing, credential management, memory pipelines, observability, governance policies, and multi tenant isolation are all necessary pieces of a reliable agent system. None of them is impossible to build, but they consume time that should be spent improving the reasoning, behavior, and usefulness of the agent itself.
This is the problem Amazon Bedrock AgentCore is designed to solve.
AgentCore provides 7 purpose built services that handle the production infrastructure required for agent systems:
Runtime: Secure microVM execution for agents
Gateway: MCP-native tool integration and API exposure
Identity: OAuth credential lifecycle and delegated access
Memory: Short term and long term persistent memory for agents
Browser: Managed browser automation for web interactions
Code Interpreter: Isolated sandbox for executing generated code
Observability: CloudWatch native tracing with OpenTelemetry support
AgentCore is framework agnostic and works with common agent frameworks such as Strands, LangChain, LangGraph, LlamaIndex, CrewAI, and AutoGen, as well as custom implementations.
It is also model agnostic, allowing agents to use foundation models including Amazon Nova, Anthropic Claude, OpenAI GPT models, Google Gemini, Meta Llama, and Mistral, or any model accessible through an API.
The question is no longer whether a production AI agent can be built.
With AgentCore, the real question becomes what agent you want to build and how quickly you can deliver it to the people who need it.
Getting Started
pip install bedrock-agentcore bedrock-agentcore-starter-toolkit
Thanks
Sreeni Ramadorai


