I've spent a good chunk of 2025 building with AWS's Gen AI services - from simple Bedrock invocations to multi-agent systems with AgentCore. This article is what I wish I'd had when I started: a practitioner's guide to the AWS Gen AI ecosystem, focused on actually building things rather than marketing bullet points.
Fair warning - this is aimed at developers who are past the introductory phase. I'm assuming you understand what LLMs are, have probably used the Bedrock console, and want to know how these pieces fit together for production systems.
1. The Shift: From "Using AI" to "Orchestrating Agents"
Before we get into specific services, it's worth understanding where AWS is heading conceptually.
The 2024 approach to Gen AI was largely: call an LLM, get a response, display it to the user. Maybe add some RAG for context. The 2025 approach is fundamentally different: build autonomous agents that plan, execute, learn, and operate independently.
This isn't just philosophical. It changes how you architect systems. Instead of "user request → LLM → response," you're now designing "user goal → agent network → coordinated actions → outcome."
AgentCore, Nova Act, and even Q Developer's agentic features all reflect this shift. Whether this model survives contact with production workloads at scale... well, I'm still forming opinions on that.
2. The AWS Gen AI Landscape
Here's how I think about the ecosystem:
Foundation Layer:
- Amazon Bedrock - Multi-model access and orchestration
- Amazon SageMaker AI - Custom training and deployment
Agent Infrastructure:
- Amazon Bedrock AgentCore - The full stack for building, deploying, and operating agents
- Nova Act - Specialised browser automation agents
Models:
- Amazon Nova 2 family - AWS's own frontier models
- Third-party models - Claude, Llama, Mistral, and 100+ others via Bedrock
Development Tools:
- Amazon Q Developer - AI-assisted coding in your IDE
- Kiro - Full agentic IDE with spec-driven development
- PartyRock - No-code Bedrock playground
Supporting Services:
- S3 Vectors - Native vector storage for RAG
- CloudWatch - Agent observability
Let's go through each in detail.
3. Amazon Bedrock: The Multi-Model Foundation
Bedrock is the central hub for accessing foundation models on AWS. If you've been away for a while, here's what changed in 2025:
Model Expansion
Bedrock now offers nearly 100 serverless foundation models, with over 100 additional models available through Bedrock Marketplace. The December 2025 expansion added 18 open-weight models including:
- Gemma 3 (Google)
- MiniMax M2 (MiniMax AI)
- Mistral Large 3 and Ministral series
- Kimi K2 (Moonshot AI)
- Nemotron Nano 2 (NVIDIA)
The Claude 4.5 models from Anthropic arrived in November 2025, including the most capable Claude yet.
Reinforcement Fine-Tuning
This is the capability I've found most useful for production. Instead of traditional fine-tuning with labelled datasets, you provide feedback signals and let the model learn through reinforcement. AWS claims 66% accuracy gains over base models without deep ML expertise.
The practical upside: you can customise model behaviour using your existing evaluation criteria rather than creating massive training datasets.
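Conceptually, the feedback signal is just a scoring function over model outputs. Here's a purely illustrative sketch - nothing below is a Bedrock API, it's just the shape of reusing an existing evaluation criterion as a reward:

# Purely conceptual: reuse existing evaluation criteria as the feedback
# signal instead of building a labelled fine-tuning dataset.
def reward(prompt: str, completion: str) -> float:
    score = 0.0
    if "per our refund policy" in completion.lower():  # stand-in for a real compliance check
        score += 0.5
    if len(completion.split()) < 200:  # e.g. an existing brevity criterion
        score += 0.5
    return score  # higher scores reinforce this behaviour during RL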
Cross-Region Inference
Bedrock now supports intelligent routing across regions for high-availability scenarios. If your primary region is under load, requests automatically route to secondary regions. You configure this in the model access settings.
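At call time, one way this surfaces is geo-prefixed inference profile IDs: you invoke the profile rather than a region-pinned model, and Bedrock routes within that geography. A minimal sketch - the profile ID is illustrative, so check what's actually enabled in your account:

import boto3

client = boto3.client("bedrock-runtime")

# Invoking a geo-prefixed inference profile (here "eu.") lets Bedrock
# route the request across regions in that geography under load.
response = client.converse(
    modelId="eu.amazon.nova-lite-v1:0",  # illustrative profile ID
    messages=[{"role": "user", "content": [{"text": "Ping"}]}],
)
print(response["output"]["message"]["content"][0]["text"])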
4. Amazon Bedrock AgentCore: The Deep Dive
AgentCore is where I've spent most of my time this year. It went from preview (July) to GA (October) to significantly expanded (December). Here's what each component does and when you'd use it.
4.1 AgentCore Runtime
The Runtime provides the execution environment for agents. Key capabilities:
Session Isolation
Each agent session runs in complete isolation with low latency. This matters when you're running agents that handle sensitive data or need guaranteed resource allocation.
# Sessions are isolated automatically
# Each invocation gets its own execution context
from bedrock_agentcore import AgentRuntime
runtime = AgentRuntime()
session = runtime.create_session(
agent_id="my-agent",
session_config={
"isolation_level": "full",
"timeout_seconds": 28800 # 8 hours max
}
)
Long-Running Workloads
Sessions can run for up to 8 hours. This is crucial for agents that need to wait for external events, poll systems, or handle multi-step workflows that span hours rather than seconds.
Bidirectional Streaming
Added in December 2025, this enables natural voice interactions where agents can listen and respond simultaneously, handling interruptions mid-conversation. If you're building voice agents, this is a significant improvement over the previous request-response model.
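I haven't shipped a voice agent on this yet, so take the following as a hypothetical sketch of the shape rather than the real SDK surface - connect_stream and the event names are mine:

import asyncio
from bedrock_agentcore import AgentRuntime  # illustrative, as in the snippet above

async def voice_loop():
    runtime = AgentRuntime()
    # Hypothetical duplex API: one open connection carries audio both ways,
    # so the agent can respond while still receiving user speech.
    async with runtime.connect_stream(agent_id="my-agent") as stream:
        async for event in stream:
            if event.type == "agent_audio":
                await stream.play(event.audio)  # hypothetical playback
            elif event.type == "user_interrupt":
                await stream.cancel_current_response()  # hypothetical

asyncio.run(voice_loop())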
4.2 AgentCore Memory
Memory gives agents the ability to remember context across interactions.
Episodic Memory
The December update added episodic memory - agents that learn from experiences and build knowledge over time. Instead of treating each session as independent, the agent develops understanding of patterns and preferences.
from bedrock_agentcore import AgentMemory
memory = AgentMemory(
memory_type="episodic",
retention_policy={
"max_episodes": 1000,
"decay_factor": 0.95
}
)
# Agent learns from each interaction; session_context, agent_action,
# result, and user_feedback are placeholders for your own session state
memory.record_episode(
context=session_context,
action_taken=agent_action,
outcome=result,
feedback=user_feedback
)
This is still early days for me - I need more production time to understand how episodic memory behaves at scale. The promise is agents that get better over time. The risk is agents that develop unexpected behaviours.
4.3 AgentCore Gateway
Gateway handles tool integration. The killer feature: it converts existing APIs into Model Context Protocol (MCP)-compatible tools with minimal code.
MCP Integration
MCP is becoming the standard for how LLMs interact with external tools. If you have existing REST APIs, Gateway can expose them as MCP tools that any agent can discover and use.
from bedrock_agentcore import Gateway
gateway = Gateway()
# Convert existing API to MCP tool
gateway.register_api(
name="customer_lookup",
endpoint="https://api.mycompany.com/customers",
schema=openapi_spec,  # your API's OpenAPI definition, loaded elsewhere
authentication={
"type": "oauth2",
"credentials_vault": "my-vault"
}
)
Tool Discovery
Agents can query the Gateway to discover available tools dynamically. This matters for multi-agent systems where you don't want to hardcode tool availability.
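The flow I'd expect looks like this - list_tools and invoke_tool are my hypothetical method names, but the point is that tools are enumerated at runtime rather than hardcoded:

# Hypothetical discovery flow: enumerate what the Gateway exposes,
# then invoke by name, so new tools appear without redeploying agents.
for tool in gateway.list_tools():
    print(tool.name, tool.description)

result = gateway.invoke_tool(
    "customer_lookup",  # registered above
    arguments={"customer_id": "cust-42"},  # shaped by the OpenAPI schema
)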
4.4 AgentCore Identity
Identity handles authentication and authorisation for agent actions.
OAuth Integration
Agents can authenticate with external services on behalf of users. The Identity service manages refresh tokens securely - you don't handle credentials directly.
Secure Vault Storage
Credentials are stored in vaults with proper encryption and access controls. The December update added native integration with additional OAuth-enabled services.
from bedrock_agentcore import Identity
identity = Identity()
# Agent acts on behalf of user
user_context = identity.establish_user_context(
user_id="user-123",
oauth_provider="google",
scopes=["calendar.read", "calendar.write"]
)
# Agent can now access the user's calendar ("agent" here is whatever
# agent handle your framework gives you)
calendar_response = agent.invoke_tool(
"google_calendar",
action="list_events",
user_context=user_context
)
4.5 AgentCore Observability
Observability plugs into CloudWatch for comprehensive monitoring.
What You Get:
- End-to-end agent execution traces
- Latency metrics per component
- Token usage tracking
- Error rates and patterns
- Custom dashboards
The integration works with open-source frameworks too - LangChain, LangGraph, and CrewAI are all supported.
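As I understand it, the telemetry is OpenTelemetry-based, which is why the framework support is broad. A minimal sketch using the standard OTel Python SDK - the span and attribute names are my own conventions, not an AgentCore schema:

from opentelemetry import trace

tracer = trace.get_tracer("my-agent")

# Spans emitted here flow to CloudWatch via whichever OTel exporter you
# configure; the attribute names below are my convention, not a standard.
with tracer.start_as_current_span("agent-invocation") as span:
    span.set_attribute("agent.id", "my-agent")
    span.set_attribute("tokens.input", 1200)  # pulled from the model response
    span.set_attribute("tokens.output", 340)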
4.6 Policy and Evaluations
Added in December 2025, these are the guardrails for production deployment.
Policy (Preview)
Policy intercepts every tool call in real-time. You define boundaries in natural language, and they're converted to Cedar - AWS's open-source policy language.
// Natural-language policy, as you write it:
// "Agent can only process refunds under $500 without human approval"

// Converted to Cedar automatically:
permit(
    principal,
    action == Action::"process_refund",
    resource
) when {
    resource.amount < 500
};
This is powerful for compliance and risk management. The agent can't accidentally authorise actions outside its permitted scope.
Evaluations (Preview)
13 built-in evaluators for quality dimensions:
- Helpfulness
- Tool selection accuracy
- Response accuracy
- Task completion rate
- And more...
You can also create custom evaluators using your own scoring criteria.
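I'd expect a custom evaluator to reduce to a scoring function over an execution trace. The registration API below is my hypothetical - the scorer's shape is the part that matters:

from bedrock_agentcore import Evaluations  # illustrative, like the classes above

evaluations = Evaluations()

# The scorer is just a function from an execution trace to a 0-1 score;
# register_evaluator is my hypothetical name for the hook.
def refund_tone_score(trace: dict) -> float:
    response = trace["final_response"]
    return 1.0 if "sorry" in response.lower() else 0.0

evaluations.register_evaluator(name="refund_tone", scorer=refund_tone_score)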
5. Amazon Nova 2 Models: When to Use Each
The Nova 2 family includes four models. Here's my thinking on when to use each:
| Model | Best For | Key Capability |
|---|---|---|
| Nova 2 Lite | High-volume, cost-sensitive tasks | Fast reasoning, adjustable thinking depth |
| Nova 2 Pro | Complex multi-step analysis | Highest capability, in preview |
| Nova 2 Sonic | Voice interactions | Speech-to-speech, 7 languages |
| Nova 2 Omni | Multimodal workflows | Text + image generation, video understanding |
Adjustable Thinking Depth
All Nova 2 models let you control how much reasoning happens before responding. Three intensity levels: low, medium, high. This is genuinely useful - you can dial down for simple queries (faster, cheaper) and dial up when you need careful analysis.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="amazon.nova-2-lite",  # illustrative model ID
    body=json.dumps({
        "prompt": "Analyse this contract for risks",
        "thinking_intensity": "high"  # more reasoning before responding
    })
)
Built-in Tools
Nova 2 models include:
- Code interpreter (executes Python)
- Web grounding (searches for current information)
- MCP tool support
1 Million Token Context
All models support up to 1 million tokens of context. That's roughly 750,000 words - enough to process entire codebases or lengthy document sets.
6. Nova Forge: Build Your Own Frontier Model
Nova Forge is AWS's answer to "I want a model that knows my business."
How It Works:
- Start from Nova model checkpoints (pre-trained, mid-trained, or post-trained)
- Blend your proprietary data with Nova's curated training data
- Use reinforcement learning with your own reward functions
- Get a custom model hosted securely on AWS
The result is what AWS calls a "Novella" - a customised variant that combines Nova's general capabilities with your specific domain knowledge.
When Would You Use This?
- You have substantial proprietary data that would improve model performance
- You need capabilities that fine-tuning can't achieve
- You're willing to invest in a longer-term customisation effort
I haven't used Forge myself yet - it requires a deeper commitment than I've had time for. But the approach is interesting, especially the multi-turn RL training for complex agent workflows.
7. Nova Act: Browser Automation Agents
Nova Act is a specialised service for building agents that automate browser-based tasks.
Key Stats:
- 90% reliability on customer workflows
- Powered by a custom Nova 2 Lite model
- Outperforms competing models on browser automation benchmarks
Use Cases:
- Web scraping at scale
- Automated testing
- Form filling and data entry
- Competitive monitoring
If you've been building browser automation with Playwright or Puppeteer, Nova Act handles the intelligence layer - deciding what to click, how to navigate, and how to recover from unexpected states.
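The Nova Act SDK keeps the ergonomics close to natural language. A minimal sketch based on the SDK's documented shape - verify against the current docs before relying on it:

from nova_act import NovaAct

# You describe the task; the agent decides what to click and type.
with NovaAct(starting_page="https://www.example.com") as nova:
    nova.act("search for wireless headphones and open the first result")
    nova.act("extract the product name and price")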
8. Amazon Q Developer: Agentic Coding
Q Developer evolved significantly in 2025. It's no longer just autocomplete - it's an agent that can plan, execute, and iterate.
Agentic Capabilities
Q Developer can:
- Read and write files autonomously
- Generate code diffs
- Run shell commands
- Incorporate feedback in real-time
- Plan multi-step implementations
It achieved the highest scores on the SWE-bench leaderboard, which measures an agent's ability to resolve real GitHub issues.
2025 Updates:
- Language expansion: C# and C++, plus Dart, Go, Kotlin, PHP, Ruby, Rust, Scala, Bash, PowerShell, CloudFormation, and Terraform
- GitLab Duo integration (GA)
- GitHub integration (preview, no AWS account needed)
- MCP support in CLI
- Persistent conversation history
- Frankfurt region for EU data residency
The Free Tier
50 agentic chat interactions per month, plus up to 1,000 lines of code transformation. For getting started, this is generous.
9. Kiro IDE: Spec-Driven Development
Kiro is AWS's full agentic IDE, launched in July and GA since November.
The Core Concept
Write requirements as specs using EARS (Easy Approach to Requirements Syntax - see the example after this list), and the agent scaffolds everything:
- Design artefacts
- Task breakdown
- Code implementation
- Tests
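For a flavour of EARS, requirements follow fixed templates like "When <trigger>, the <system> shall <response>". Two spec lines of the kind Kiro works from (my examples, not Kiro output):

When the user submits the signup form, the auth service shall send a verification email within 30 seconds.
While a payment is pending, the orders page shall display a processing indicator.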
Internal Adoption
Amazon made Kiro their standard AI development environment. One reported case: a 30-developer, 18-month project became 6 developers over 76 days. Take that with appropriate scepticism, but the direction is clear.
Getting Started
You can sign in with GitHub, Google, AWS Builder ID, or IAM Identity Center. No AWS account required.
I've been using Kiro for a side project and find the spec-driven approach interesting. The autonomous agent (in preview) maintains context across sessions and learns from feedback. It's not perfect, but it's changing how I think about development workflows.
10. Supporting Services
A few other services worth knowing:
Amazon SageMaker AI
- Serverless MLflow for zero-infrastructure experimentation
- HyperPod with checkpointless training (automatic recovery from failures)
- Up to 95% training cluster efficiency
PartyRock
The no-code Bedrock playground. Free daily usage, no credit card required. Good for quick prototyping before you write real code.
S3 Vectors
Native vector storage in S3:
- 2 billion vectors per index
- 20 trillion vectors per bucket
- 100ms query latency
- Up to 90% cost reduction vs. specialised vector databases
For RAG applications, S3 Vectors removes the need for a separate vector database. The cost savings alone make it worth investigating.
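A minimal sketch of the boto3 surface as I understand it from the launch docs - treat the client and parameter names as things to verify, since the service is new:

import boto3

s3vectors = boto3.client("s3vectors")

embedding = [0.1] * 1024  # placeholder; use a real embedding model (e.g. via Bedrock)

# Query an existing index with the embedding and read back matches.
response = s3vectors.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": embedding},
    topK=5,
    returnMetadata=True,
)
for match in response["vectors"]:
    print(match["key"], match.get("metadata"))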
11. Production Patterns
Some observations from building with these services:
Start with Bedrock, Add AgentCore When Needed
Don't reach for AgentCore immediately. Simple Bedrock invocations handle most use cases. AgentCore makes sense when you need:
- Multi-step workflows with tool usage
- Session isolation for concurrent users
- Episodic memory across interactions
- Production-grade observability
Policy Before Production
If you're deploying agents that take real actions, set up Policy guardrails early. It's much easier to define boundaries upfront than to add them after an incident.
Monitor Token Usage
Agentic workflows consume more tokens than single-shot invocations. The agent's internal reasoning, tool calls, and iterative refinement all add up. Build cost monitoring into your architecture from the start.
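The Converse API reports usage counts on every call, which gives you the raw numbers; the metric plumbing is up to you. A minimal sketch:

import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Summarise this ticket"}]}],
)

# Feed these into your cost dashboards (CloudWatch custom metrics, say).
usage = response["usage"]
print(usage["inputTokens"], usage["outputTokens"], usage["totalTokens"])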
MCP is the Standard
Model Context Protocol is becoming ubiquitous. When building new APIs or integrations, consider MCP compatibility from the beginning. It'll make your tools accessible to a broader range of agent frameworks.
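To see how small the surface area is, here's a minimal tool server using the open-source MCP Python SDK's FastMCP helper - the tool itself is a toy:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

# Any decorated function becomes a tool that MCP clients - including
# agent frameworks that speak the protocol - can discover and call.
@mcp.tool()
def check_stock(sku: str) -> int:
    """Return the units in stock for a SKU (toy implementation)."""
    return {"ABC-1": 12}.get(sku, 0)

if __name__ == "__main__":
    mcp.run()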
12. Where Does This Leave Us?
The AWS Gen AI ecosystem in 2025 is comprehensive - arguably too comprehensive. There are multiple overlapping ways to achieve similar goals, and the "right" approach depends heavily on your specific requirements.
My current mental model:
- Simple interactions: Bedrock direct invocation
- Complex workflows: AgentCore
- Browser automation: Nova Act
- Development: Q Developer for inline assistance, Kiro for spec-driven projects
- Custom models: Forge if you have the data and commitment
- RAG: S3 Vectors + Bedrock Knowledge Bases
Is agentic AI the future of software development? Probably, in some form. Are these specific services the lasting implementations? That's less certain. AWS has deprecated services before, and the AI landscape moves fast.
What I can say is that building with these tools today is genuinely productive. The developer experience has improved dramatically over the past year. Whether you're building customer-facing agents, internal automation, or AI-assisted development tools, AWS has the pieces you need.
The question is how you put them together. And honestly, that's the fun part.
What are you building with these services? I'd love to hear about your use cases and any patterns you've discovered.