I've spent a good chunk of 2025 building with AWS's Gen AI services - from simple Bedrock invocations to multi-agent systems with AgentCore. This article is what I wish I'd had when I started: a practitioner's guide to the AWS Gen AI ecosystem, focused on actually building things rather than marketing bullet points.
Fair warning - this is aimed at developers who are past the introductory phase. I'm assuming you understand what LLMs are, have probably used the Bedrock console, and want to know how these pieces fit together for production systems.
1. The Shift: From "Using AI" to "Orchestrating Agents"
Before we get into specific services, it's worth understanding where AWS is heading conceptually.
The 2024 approach to Gen AI was largely: call an LLM, get a response, display it to the user. Maybe add some RAG for context. The 2025 approach is fundamentally different: build autonomous agents that plan, execute, learn, and operate independently.
This isn't just philosophical. It changes how you architect systems. Instead of "user request → LLM → response," you're now designing "user goal → agent network → coordinated actions → outcome."
AgentCore, Nova Act, and even Q Developer's agentic features all reflect this shift. Whether this model survives contact with production workloads at scale... well, I'm still forming opinions on that.
2. The AWS Gen AI Landscape
Here's how I think about the ecosystem:
Foundation Layer:
- Amazon Bedrock - Multi-model access and orchestration
- Amazon SageMaker AI - Custom training and deployment
Agent Infrastructure:
- Amazon Bedrock AgentCore - The full stack for building, deploying, and operating agents
- Nova Act - Specialised browser automation agents
Models:
- Amazon Nova 2 family - AWS's own frontier models
- Third-party models - Claude, Llama, Mistral, and 100+ others via Bedrock
Development Tools:
- Amazon Q Developer - AI-assisted coding in your IDE
- Kiro - Full agentic IDE with spec-driven development
- PartyRock - No-code Bedrock playground
Supporting Services:
- S3 Vectors - Native vector storage for RAG
- CloudWatch - Agent observability
Let's go through each in detail.
3. Amazon Bedrock: The Multi-Model Foundation
Bedrock is the central hub for accessing foundation models on AWS. If you've been away for a while, here's what changed in 2025:
Model Expansion
Bedrock now offers nearly 100 serverless foundation models, with over 100 additional models available through Bedrock Marketplace. The December 2025 expansion added 18 open-weight models including:
- Gemma 3 (Google)
- MiniMax M2 (MiniMax AI)
- Mistral Large 3 and Ministral series
- Kimi K2 (Moonshot AI)
- Nemotron Nano 2 (NVIDIA)
The Claude 4.5 models from Anthropic arrived in November 2025, including the most capable Claude yet.
Reinforcement Fine-Tuning
This is the capability I've found most useful for production. Instead of traditional fine-tuning with labelled datasets, you provide feedback signals and let the model learn through reinforcement. AWS claims 66% accuracy gains over base models without deep ML expertise.
The practical upside: you can customise model behaviour using your existing evaluation criteria rather than creating massive training datasets.
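Conceptually, the feedback signal is just a scoring function over model outputs. Here's a purely illustrative sketch - nothing below is a Bedrock API, it's just the shape of reusing an existing evaluation criterion as a reward:

# Purely conceptual: reuse existing evaluation criteria as the feedback
# signal instead of building a labelled fine-tuning dataset.
def reward(prompt: str, completion: str) -> float:
    score = 0.0
    if "per our refund policy" in completion.lower():  # stand-in for a real compliance check
        score += 0.5
    if len(completion.split()) < 200:  # e.g. an existing brevity criterion
        score += 0.5
    return score  # higher scores reinforce this behaviour during RL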
Cross-Region Inference
Bedrock now supports intelligent routing across regions for high-availability scenarios. If your primary region is under load, requests automatically route to secondary regions. You configure this in the model access settings.
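At call time, one way this surfaces is geo-prefixed inference profile IDs: you invoke the profile rather than a region-pinned model, and Bedrock routes within that geography. A minimal sketch - the profile ID is illustrative, so check what's actually enabled in your account:

import boto3

client = boto3.client("bedrock-runtime")

# Invoking a geo-prefixed inference profile (here "eu.") lets Bedrock
# route the request across regions in that geography under load.
response = client.converse(
    modelId="eu.amazon.nova-lite-v1:0",  # illustrative profile ID
    messages=[{"role": "user", "content": [{"text": "Ping"}]}],
)
print(response["output"]["message"]["content"][0]["text"])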
4. Amazon Bedrock AgentCore: The Deep Dive
AgentCore is where I've spent most of my time this year. It went from preview (July) to GA (October) to significantly expanded (December). Here's what each component does and when you'd use it.
4.1 AgentCore Runtime
The Runtime provides the execution environment for agents. Key capabilities:
Session Isolation
Each agent session runs in complete isolation with low latency. This matters when you're running agents that handle sensitive data or need guaranteed resource allocation.
# Sessions are isolated automatically
# Each invocation gets its own execution context
from bedrock_agentcore import AgentRuntime
runtime = AgentRuntime()
session = runtime.create_session(
agent_id="my-agent",
session_config={
"isolation_level": "full",
"timeout_seconds": 28800 # 8 hours max
}
)
Long-Running Workloads
Sessions can run for up to 8 hours. This is crucial for agents that need to wait for external events, poll systems, or handle multi-step workflows that span hours rather than seconds.
Bidirectional Streaming
Added in December 2025, this enables natural voice interactions where agents can listen and respond simultaneously, handling interruptions mid-conversation. If you're building voice agents, this is a significant improvement over the previous request-response model.
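I haven't shipped a voice agent on this yet, so take the following as a hypothetical sketch of the shape rather than the real SDK surface - connect_stream and the event names are mine:

import asyncio
from bedrock_agentcore import AgentRuntime  # illustrative, as in the snippet above

async def voice_loop():
    runtime = AgentRuntime()
    # Hypothetical duplex API: one open connection carries audio both ways,
    # so the agent can respond while still receiving user speech.
    async with runtime.connect_stream(agent_id="my-agent") as stream:
        async for event in stream:
            if event.type == "agent_audio":
                await stream.play(event.audio)  # hypothetical playback
            elif event.type == "user_interrupt":
                await stream.cancel_current_response()  # hypothetical

asyncio.run(voice_loop())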
4.2 AgentCore Memory
Memory gives agents the ability to remember context across interactions.
Episodic Memory
The December update added episodic memory - agents that learn from experiences and build knowledge over time. Instead of treating each session as independent, the agent develops understanding of patterns and preferences.
from bedrock_agentcore import AgentMemory
memory = AgentMemory(
memory_type="episodic",
retention_policy={
"max_episodes": 1000,
"decay_factor": 0.95
}
)
# Agent learns from each interaction; session_context, agent_action,
# result, and user_feedback are placeholders for your own session state
memory.record_episode(
context=session_context,
action_taken=agent_action,
outcome=result,
feedback=user_feedback
)
This is still early days for me - I need more production time to understand how episodic memory behaves at scale. The promise is agents that get better over time. The risk is agents that develop unexpected behaviours.
4.3 AgentCore Gateway
Gateway handles tool integration. The killer feature: it converts existing APIs into Model Context Protocol (MCP)-compatible tools with minimal code.
MCP Integration
MCP is becoming the standard for how LLMs interact with external tools. If you have existing REST APIs, Gateway can expose them as MCP tools that any agent can discover and use.
from bedrock_agentcore import Gateway
gateway = Gateway()
# Convert existing API to MCP tool
gateway.register_api(
name="customer_lookup",
endpoint="https://api.mycompany.com/customers",
schema=openapi_spec,  # your API's OpenAPI definition, loaded elsewhere
authentication={
"type": "oauth2",
"credentials_vault": "my-vault"
}
)
Tool Discovery
Agents can query the Gateway to discover available tools dynamically. This matters for multi-agent systems where you don't want to hardcode tool availability.
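The flow I'd expect looks like this - list_tools and invoke_tool are my hypothetical method names, but the point is that tools are enumerated at runtime rather than hardcoded:

# Hypothetical discovery flow: enumerate what the Gateway exposes,
# then invoke by name, so new tools appear without redeploying agents.
for tool in gateway.list_tools():
    print(tool.name, tool.description)

result = gateway.invoke_tool(
    "customer_lookup",  # registered above
    arguments={"customer_id": "cust-42"},  # shaped by the OpenAPI schema
)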
4.4 AgentCore Identity
Identity handles authentication and authorisation for agent actions.
OAuth Integration
Agents can authenticate with external services on behalf of users. The Identity service manages refresh tokens securely - you don't handle credentials directly.
Secure Vault Storage
Credentials are stored in vaults with proper encryption and access controls. The December update added native integration with additional OAuth-enabled services.
from bedrock_agentcore import Identity
identity = Identity()
# Agent acts on behalf of user
user_context = identity.establish_user_context(
user_id="user-123",
oauth_provider="google",
scopes=["calendar.read", "calendar.write"]
)
# Agent can now access the user's calendar ("agent" here is whatever
# agent handle your framework gives you)
calendar_response = agent.invoke_tool(
"google_calendar",
action="list_events",
user_context=user_context
)
4.5 AgentCore Observability
Observability plugs into CloudWatch for comprehensive monitoring.
What You Get:
- End-to-end agent execution traces
- Latency metrics per component
- Token usage tracking
- Error rates and patterns
- Custom dashboards
The integration works with open-source frameworks too - LangChain, LangGraph, and CrewAI are all supported.
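As I understand it, the telemetry is OpenTelemetry-based, which is why the framework support is broad. A minimal sketch using the standard OTel Python SDK - the span and attribute names are my own conventions, not an AgentCore schema:

from opentelemetry import trace

tracer = trace.get_tracer("my-agent")

# Spans emitted here flow to CloudWatch via whichever OTel exporter you
# configure; the attribute names below are my convention, not a standard.
with tracer.start_as_current_span("agent-invocation") as span:
    span.set_attribute("agent.id", "my-agent")
    span.set_attribute("tokens.input", 1200)  # pulled from the model response
    span.set_attribute("tokens.output", 340)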
4.6 Policy and Evaluations
Added in December 2025, these are the guardrails for production deployment.
Policy (Preview)
Policy intercepts every tool call in real-time. You define boundaries in natural language, and they're converted to Cedar - AWS's open-source policy language.
// Natural-language policy, as you write it:
// "Agent can only process refunds under $500 without human approval"

// Converted to Cedar automatically:
permit(
    principal,
    action == Action::"process_refund",
    resource
) when {
    resource.amount < 500
};
This is powerful for compliance and risk management. The agent can't accidentally authorise actions outside its permitted scope.
Evaluations (Preview)
13 built-in evaluators for quality dimensions:
- Helpfulness
- Tool selection accuracy
- Response accuracy
- Task completion rate
- And more...
You can also create custom evaluators using your own scoring criteria.
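I'd expect a custom evaluator to reduce to a scoring function over an execution trace. The registration API below is my hypothetical - the scorer's shape is the part that matters:

from bedrock_agentcore import Evaluations  # illustrative, like the classes above

evaluations = Evaluations()

# The scorer is just a function from an execution trace to a 0-1 score;
# register_evaluator is my hypothetical name for the hook.
def refund_tone_score(trace: dict) -> float:
    response = trace["final_response"]
    return 1.0 if "sorry" in response.lower() else 0.0

evaluations.register_evaluator(name="refund_tone", scorer=refund_tone_score)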
5. Amazon Nova 2 Models: When to Use Each
The Nova 2 family includes four models. Here's my thinking on when to use each:
| Model | Best For | Key Capability |
|---|---|---|
| Nova 2 Lite | High-volume, cost-sensitive tasks | Fast reasoning, adjustable thinking depth |
| Nova 2 Pro | Complex multi-step analysis | Highest capability, in preview |
| Nova 2 Sonic | Voice interactions | Speech-to-speech, 7 languages |
| Nova 2 Omni | Multimodal workflows | Text + image generation, video understanding |
Adjustable Thinking Depth
All Nova 2 models let you control how much reasoning happens before responding. Three intensity levels: low, medium, high. This is genuinely useful - you can dial down for simple queries (faster, cheaper) and dial up when you need careful analysis.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="amazon.nova-2-lite",  # illustrative model ID
    body=json.dumps({
        "prompt": "Analyse this contract for risks",
        "thinking_intensity": "high"  # more reasoning before responding
    })
)
Built-in Tools
Nova 2 models include:
- Code interpreter (executes Python)
- Web grounding (searches for current information)
- MCP tool support
1 Million Token Context
All models support up to 1 million tokens of context. That's roughly 750,000 words - enough to process entire codebases or lengthy document sets.
6. Nova Forge: Build Your Own Frontier Model
Nova Forge is AWS's answer to "I want a model that knows my business."
How It Works:
- Start from Nova model checkpoints (pre-trained, mid-trained, or post-trained)
- Blend your proprietary data with Nova's curated training data
- Use reinforcement learning with your own reward functions
- Get a custom model hosted securely on AWS
The result is what AWS calls a "Novella" - a customised variant that combines Nova's general capabilities with your specific domain knowledge.
When Would You Use This?
- You have substantial proprietary data that would improve model performance
- You need capabilities that fine-tuning can't achieve
- You're willing to invest in a longer-term customisation effort
I haven't used Forge myself yet - it requires a deeper commitment than I've had time for. But the approach is interesting, especially the multi-turn RL training for complex agent workflows.
7. Nova Act: Browser Automation Agents
Nova Act is a specialised service for building agents that automate browser-based tasks.
Key Stats:
- 90% reliability on customer workflows
- Powered by a custom Nova 2 Lite model
- Outperforms competing models on browser automation benchmarks
Use Cases:
- Web scraping at scale
- Automated testing
- Form filling and data entry
- Competitive monitoring
If you've been building browser automation with Playwright or Puppeteer, Nova Act handles the intelligence layer - deciding what to click, how to navigate, and how to recover from unexpected states.
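The Nova Act SDK keeps the ergonomics close to natural language. A minimal sketch based on the SDK's documented shape - verify against the current docs before relying on it:

from nova_act import NovaAct

# You describe the task; the agent decides what to click and type.
with NovaAct(starting_page="https://www.example.com") as nova:
    nova.act("search for wireless headphones and open the first result")
    nova.act("extract the product name and price")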
8. Amazon Q Developer: Agentic Coding
Q Developer evolved significantly in 2025. It's no longer just autocomplete - it's an agent that can plan, execute, and iterate.
Agentic Capabilities
Q Developer can:
- Read and write files autonomously
- Generate code diffs
- Run shell commands
- Incorporate feedback in real-time
- Plan multi-step implementations
It achieved the highest scores on the SWE-bench leaderboard, which measures an agent's ability to resolve real GitHub issues.
2025 Updates:
- Language expansion: C# and C++, plus Dart, Go, Kotlin, PHP, Ruby, Rust, Scala, Bash, PowerShell, CloudFormation, and Terraform
- GitLab Duo integration (GA)
- GitHub integration (preview, no AWS account needed)
- MCP support in CLI
- Persistent conversation history
- Frankfurt region for EU data residency
The Free Tier
50 agentic chat interactions per month, plus up to 1,000 lines of code transformation. For getting started, this is generous.
9. Kiro IDE: Spec-Driven Development
Kiro is AWS's full agentic IDE, launched in July and GA since November.
The Core Concept
Write requirements as specs using EARS (Easy Approach to Requirements Syntax - see the example after this list), and the agent scaffolds everything:
- Design artefacts
- Task breakdown
- Code implementation
- Tests
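For a flavour of EARS, requirements follow fixed templates like "When <trigger>, the <system> shall <response>". Two spec lines of the kind Kiro works from (my examples, not Kiro output):

When the user submits the signup form, the auth service shall send a verification email within 30 seconds.
While a payment is pending, the orders page shall display a processing indicator.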
Internal Adoption
Amazon made Kiro their standard AI development environment. One reported case: a 30-developer, 18-month project became 6 developers over 76 days. Take that with appropriate scepticism, but the direction is clear.
Getting Started
You can sign in with GitHub, Google, AWS Builder ID, or IAM Identity Center. No AWS account required.
I've been using Kiro for a side project and find the spec-driven approach interesting. The autonomous agent (in preview) maintains context across sessions and learns from feedback. It's not perfect, but it's changing how I think about development workflows.
10. Supporting Services
A few other services worth knowing:
Amazon SageMaker AI
- Serverless MLflow for zero-infrastructure experimentation
- HyperPod with checkpointless training (automatic recovery from failures)
- Up to 95% training cluster efficiency
PartyRock
The no-code Bedrock playground. Free daily usage, no credit card required. Good for quick prototyping before you write real code.
S3 Vectors
Native vector storage in S3:
- 2 billion vectors per index
- 20 trillion vectors per bucket
- 100ms query latency
- Up to 90% cost reduction vs. specialised vector databases
For RAG applications, S3 Vectors removes the need for a separate vector database. The cost savings alone make it worth investigating.
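A minimal sketch of the boto3 surface as I understand it from the launch docs - treat the client and parameter names as things to verify, since the service is new:

import boto3

s3vectors = boto3.client("s3vectors")

embedding = [0.1] * 1024  # placeholder; use a real embedding model (e.g. via Bedrock)

# Query an existing index with the embedding and read back matches.
response = s3vectors.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": embedding},
    topK=5,
    returnMetadata=True,
)
for match in response["vectors"]:
    print(match["key"], match.get("metadata"))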
11. Production Patterns
Some observations from building with these services:
Start with Bedrock, Add AgentCore When Needed
Don't reach for AgentCore immediately. Simple Bedrock invocations handle most use cases. AgentCore makes sense when you need:
- Multi-step workflows with tool usage
- Session isolation for concurrent users
- Episodic memory across interactions
- Production-grade observability
Policy Before Production
If you're deploying agents that take real actions, set up Policy guardrails early. It's much easier to define boundaries upfront than to add them after an incident.
Monitor Token Usage
Agentic workflows consume more tokens than single-shot invocations. The agent's internal reasoning, tool calls, and iterative refinement all add up. Build cost monitoring into your architecture from the start.
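The Converse API reports usage counts on every call, which gives you the raw numbers; the metric plumbing is up to you. A minimal sketch:

import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Summarise this ticket"}]}],
)

# Feed these into your cost dashboards (CloudWatch custom metrics, say).
usage = response["usage"]
print(usage["inputTokens"], usage["outputTokens"], usage["totalTokens"])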
MCP is the Standard
Model Context Protocol is becoming ubiquitous. When building new APIs or integrations, consider MCP compatibility from the beginning. It'll make your tools accessible to a broader range of agent frameworks.
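To see how small the surface area is, here's a minimal tool server using the open-source MCP Python SDK's FastMCP helper - the tool itself is a toy:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

# Any decorated function becomes a tool that MCP clients - including
# agent frameworks that speak the protocol - can discover and call.
@mcp.tool()
def check_stock(sku: str) -> int:
    """Return the units in stock for a SKU (toy implementation)."""
    return {"ABC-1": 12}.get(sku, 0)

if __name__ == "__main__":
    mcp.run()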
12. Where Does This Leave Us?
The AWS Gen AI ecosystem in 2025 is comprehensive - arguably too comprehensive. There are multiple overlapping ways to achieve similar goals, and the "right" approach depends heavily on your specific requirements.
My current mental model:
- Simple interactions: Bedrock direct invocation
- Complex workflows: AgentCore
- Browser automation: Nova Act
- Development: Q Developer for inline assistance, Kiro for spec-driven projects
- Custom models: Forge if you have the data and commitment
- RAG: S3 Vectors + Bedrock Knowledge Bases
Is agentic AI the future of software development? Probably, in some form. Are these specific services the lasting implementations? That's less certain. AWS has deprecated services before, and the AI landscape moves fast.
What I can say is that building with these tools today is genuinely productive. The developer experience has improved dramatically over the past year. Whether you're building customer-facing agents, internal automation, or AI-assisted development tools, AWS has the pieces you need.
The question is how you put them together. And honestly, that's the fun part.
What are you building with these services? I'd love to hear about your use cases and any patterns you've discovered.