AWS Just Defined AgentOps. Here's What Amazon Bedrock AgentCore Actually Changes About Running Agents in Production.

#ai #devops #machinelearning #discuss

DevOps gave us a shared vocabulary for shipping software reliably. MLOps did the same for models. Neither one maps cleanly onto agents: systems that don't execute predetermined workflows, that reason across multiple tools, that spawn sub-agents mid-task, and that accrue costs in non-linear ways. AWS just published a framework for what comes next, and Amazon Bedrock AgentCore is the infrastructure it runs on.

The Problem Nobody Had Good Words For

Standard software fails in predictable ways. You write tests, you catch regressions, you trace errors back to specific lines. Agents fail differently. The same input can produce different outputs depending on memory state, tool availability, and the model's own nondeterminism. When a multi-agent chain produces a wrong answer, working out which agent, which tool call, and which decision point introduced the error is genuinely hard.

The cost problem is just as structural. A single user request can fan out across hierarchical agent chains or collaborative swarms, with each agent calling tools and spinning up compute that nobody budgeted for. You can't rate-limit an agent the way you rate-limit an API endpoint, because the agent decides how many calls to make.
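To see why the curve is non-linear, here is a back-of-the-envelope sketch in Python. It assumes a uniform delegation tree in which every agent hands work to the same number of sub-agents and makes the same number of tool calls. Real chains are irregular, and the branching factor, depth, and per-agent call count below are illustrative numbers, not measurements.

```python
def agents_in_tree(branching: int, depth: int) -> int:
    """Agents involved in one request if every agent delegates to `branching` sub-agents.

    depth=0 is the root agent alone; each extra level multiplies the newest layer by `branching`.
    """
    return sum(branching ** level for level in range(depth + 1))


def tool_calls(branching: int, depth: int, calls_per_agent: int) -> int:
    """Tool calls for one user request, assuming every agent makes the same number of calls."""
    return agents_in_tree(branching, depth) * calls_per_agent


if __name__ == "__main__":
    # Illustrative parameters: 3 sub-agents per agent, 5 tool calls per agent.
    for depth in range(4):
        print(f"depth={depth}: {tool_calls(branching=3, depth=depth, calls_per_agent=5)} tool calls")
```

With three sub-agents per agent and five tool calls each, this prints 5, 20, 65, and 200 tool calls for depths 0 through 3. One more level of delegation roughly triples the bill, and the caller never chose that depth. The agents did.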

What's been missing is not better agents. It's the operational discipline around them: how you govern what they can access, how you version and deploy them, how you evaluate quality systematically, and how you trace decisions after the fact. AWS calls this AgentOps and frames it explicitly as the agentic extension of GenAIOps, in the same way MLOps extended DevOps.

How AgentCore Actually Works

Amazon Bedrock AgentCore is structured around four pillars that map directly to where agent deployments break down in practice.

Governance and Security starts from the recognition that in multi-agent systems, authorization is ambiguous by default. When Agent A calls Agent B on behalf of a user with limited permissions, Agent B should inherit those restrictions, but nothing in most agent frameworks enforces this. AgentCore Identity handles authentication between agents so that security boundaries hold as a request propagates through a chain. AgentCore Gateway turns APIs, Lambda functions, and external services into MCP-compatible tools behind a single authenticated endpoint, so agents never handle credentials directly. Cedar-based policy evaluation intercepts tool requests and validates them against deterministic rules before execution. It answers "are you allowed to do this right now?" for each call, not just at the role level.
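The inheritance rule in the Agent A to Agent B example can be stated as a set intersection: a downstream agent may act only within the permissions of the user and of every agent upstream of it. The sketch below is plain Python with hypothetical names (`CallContext`, `delegate`, and permission strings such as `claims:read`). It illustrates the rule only. It is not AgentCore Identity's API or token format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    """Who a request ultimately acts for, and what it may still do."""
    user_id: str
    chain: tuple[str, ...]      # agent ids the request has passed through, outermost first
    allowed: frozenset[str]     # permissions still in effect at this hop


def start_request(user_id: str, user_permissions: set[str]) -> CallContext:
    """The chain starts with exactly the user's own permissions."""
    return CallContext(user_id=user_id, chain=(), allowed=frozenset(user_permissions))


def delegate(ctx: CallContext, agent_id: str, agent_permissions: set[str]) -> CallContext:
    """Hand the request to another agent.

    The permission set can only shrink: it becomes the intersection of what the
    caller already had and what the receiving agent is itself granted. The user
    identity is carried through unchanged.
    """
    return CallContext(
        user_id=ctx.user_id,
        chain=ctx.chain + (agent_id,),
        allowed=ctx.allowed & frozenset(agent_permissions),
    )


def can(ctx: CallContext, permission: str) -> bool:
    return permission in ctx.allowed


if __name__ == "__main__":
    ctx = start_request("user-42", {"claims:read"})
    ctx = delegate(ctx, "agent-a", {"claims:read", "claims:write"})
    ctx = delegate(ctx, "agent-b", {"claims:read", "claims:write", "payments:issue"})
    print(ctx.chain, sorted(ctx.allowed))  # ('agent-a', 'agent-b') ['claims:read']
```

Because `delegate` only ever intersects, a highly privileged agent deep in the chain cannot widen what the user was allowed to do. Most agent frameworks leave exactly this property unenforced.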

Build and Operations treats every agent, tool, and memory configuration as a versioned, independently deployable artifact. The recommended structure is four separate repositories: infrastructure (account setup, registries, seed code), agent (solution code with shared modules for tools, guardrails, and prompts), tool (MCP servers with their own CI/CD), and application (the business layer that consumes the agent). This separation enables independent versioning and clear ownership. The AWS Agent Registry provides a centralized catalog with an approval workflow (draft, then pending, then approved) that an agent must complete before it becomes discoverable across the organization.
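The draft, pending, approved workflow behaves like a small state machine in which only approved entries are discoverable. This is a hypothetical Python sketch, not the AWS Agent Registry API, and the class and field names are invented. One transition is an added assumption: pending back to draft, standing in for a rejected review returned to its author.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PENDING = "pending"
    APPROVED = "approved"


# Allowed transitions. PENDING -> DRAFT models a rejected review; that path is an
# assumption for illustration, not documented registry behavior.
TRANSITIONS = {
    Status.DRAFT: {Status.PENDING},
    Status.PENDING: {Status.APPROVED, Status.DRAFT},
    Status.APPROVED: set(),
}


class RegistryEntry:
    """One versioned agent in a hypothetical organization-wide catalog."""

    def __init__(self, agent_name: str, version: str) -> None:
        self.agent_name = agent_name
        self.version = version
        self.status = Status.DRAFT

    def move_to(self, new_status: Status) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status

    @property
    def discoverable(self) -> bool:
        # Only approved entries are visible to the rest of the organization.
        return self.status is Status.APPROVED


if __name__ == "__main__":
    entry = RegistryEntry("claims-agent", "1.4.0")
    entry.move_to(Status.PENDING)
    print(entry.status.value, entry.discoverable)  # pending False
    entry.move_to(Status.APPROVED)
    print(entry.status.value, entry.discoverable)  # approved True
```

Approved entries have no outgoing transitions here. The assumption is that a changed agent is registered as a new version rather than edited in place, which matches the versioned-artifact model described above.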

Evaluation runs at four levels: tool accuracy, conversation-turn quality, session outcome, and full-system behavior. The CI/CD pipeline triggers evaluation in pre-production across seven test dimensions, including authentication-flow validation and authorization checks across multi-agent chains. Those checks rely on custom test setups that simulate a request propagating through several agents and verify that identity and permissions carry through at every step. This is the part most teams skip, and the part that matters most when something goes wrong in production.
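A propagation check of this kind can be sketched with stand-in agents that record what they receive. Everything here is hypothetical: `FakeAgent`, `LeakyAgent`, `check_propagation`, and the permission strings. In a real pipeline the stand-ins would wrap actual agent endpoints, and the check would run as one of the pre-production test dimensions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    user_id: str                    # the end user the chain is acting for
    permissions: frozenset[str]     # what the request may still do


@dataclass
class FakeAgent:
    """Stand-in for a real agent: records every request it receives, then forwards it."""
    name: str
    own_permissions: frozenset[str]
    seen: list[Request] = field(default_factory=list)

    def handle(self, request: Request) -> Request:
        self.seen.append(request)
        # Correct behavior: narrow to what this agent is itself allowed to do.
        return Request(request.user_id, request.permissions & self.own_permissions)


class LeakyAgent(FakeAgent):
    """Deliberately buggy agent that widens permissions, to show the check failing."""

    def handle(self, request: Request) -> Request:
        self.seen.append(request)
        return Request(request.user_id, request.permissions | self.own_permissions)


def run_chain(agents: list[FakeAgent], request: Request) -> Request:
    for agent in agents:
        request = agent.handle(request)
    return request


def check_propagation(agents: list[FakeAgent], original: Request) -> list[str]:
    """Return every violation found; an empty list means identity and permissions held."""
    problems = []
    for agent in agents:
        for received in agent.seen:
            if received.user_id != original.user_id:
                problems.append(f"{agent.name}: identity changed to {received.user_id}")
            gained = received.permissions - original.permissions
            if gained:
                problems.append(f"{agent.name}: gained {sorted(gained)}")
    return problems


if __name__ == "__main__":
    user = Request("user-42", frozenset({"claims:read"}))
    chain = [LeakyAgent("intake", frozenset({"claims:read", "claims:write"})),
             FakeAgent("claims", frozenset({"claims:read", "claims:write"}))]
    run_chain(chain, user)
    print(check_propagation(chain, user))  # ["claims: gained ['claims:write']"]
```

`LeakyAgent` exists only to show what the check catches. It forwards the union of permissions instead of the intersection. As a result, the agent after it receives `claims:write` even though the user never had it, and `check_propagation` reports that hop.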

Observability covers four telemetry layers: decision traces, tool invocation patterns, latency and error rates, and cost per interaction. AgentCore Observability dashboards surface all of this without requiring custom instrumentation for each agent deployment.
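Three of those four layers (tool invocation patterns, latency and error rates, cost per interaction) reduce to simple aggregations once each agent step emits a record. The `Span` schema below is hypothetical and is not AgentCore Observability's trace format. It only shows the shape of the roll-up a dashboard would present, and it assumes at least one span per call.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One step of an agent interaction (hypothetical schema)."""
    interaction_id: str
    agent: str
    tool: Optional[str]     # None for a pure model-reasoning step
    latency_ms: float
    error: bool
    cost_usd: float


def summarize(spans: list[Span]) -> dict:
    """Roll spans up into dashboard-style views. Expects a non-empty list."""
    cost_per_interaction: dict[str, float] = {}
    for span in spans:
        cost_per_interaction[span.interaction_id] = (
            cost_per_interaction.get(span.interaction_id, 0.0) + span.cost_usd
        )
    return {
        "cost_per_interaction": cost_per_interaction,
        "tool_calls": Counter(s.tool for s in spans if s.tool is not None),
        "median_latency_ms": median(s.latency_ms for s in spans),
        "error_rate": sum(s.error for s in spans) / len(spans),
    }


if __name__ == "__main__":
    spans = [
        Span("req-1", "router", None, 420.0, False, 0.25),
        Span("req-1", "claims-agent", "lookup_policy", 180.0, False, 0.5),
        Span("req-2", "router", None, 390.0, True, 0.25),
    ]
    print(summarize(spans))
```

Decision traces, the fourth layer, are structural rather than numeric, so they are left out of this roll-up.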

What Teams Are Actually Using This For

The reference architecture AWS published lays out a multi-account AWS setup: a shared services account, plus dedicated dev, pre-prod, and prod accounts for each line of business. AgentCore Runtime handles deployment, AgentCore Memory handles both short-term and long-term context with namespace-based scoping, and AgentCore Gateway sits in front of all tool access.

The memory architecture is worth understanding on its own. There's a distinction between data (documents and knowledge bases accessed through RAG and governed by traditional access controls) and memory (the agent's working context: conversation history, user preferences, and interaction patterns that evolve with each session). AgentCore Memory covers the memory side, both short-term and long-term, with namespaces defined at creation time to scope memory at the actor, session, or application level. In AWS's insurance example, a fraud agent and a claims agent each have dedicated memory resources for domain-specific signals while sharing a common user-details resource, all governed by IAM policies.
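The namespace idea can be sketched as templates fixed at creation time and filled in with actor and session IDs at call time. A grant table stands in for IAM policies. The resource names, template strings, and `resolve` helper are hypothetical and only loosely mirror the insurance example. They are not the AgentCore Memory API or its namespace syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryResource:
    name: str
    namespace_template: str  # fixed when the resource is created; placeholders set the scope

    def namespace(self, **ids: str) -> str:
        return self.namespace_template.format(**ids)


# Hypothetical resources loosely modeled on the insurance example.
FRAUD_SIGNALS = MemoryResource("fraud-signals", "/fraud/actors/{actor_id}/sessions/{session_id}")  # session scope
CLAIMS_HISTORY = MemoryResource("claims-history", "/claims/actors/{actor_id}")  # actor scope
USER_DETAILS = MemoryResource("user-details", "/shared/users/{actor_id}")  # shared by both agents

# Stand-in for IAM policies: which agent may use which memory resource.
GRANTS = {
    "fraud-agent": {FRAUD_SIGNALS.name, USER_DETAILS.name},
    "claims-agent": {CLAIMS_HISTORY.name, USER_DETAILS.name},
}


def resolve(agent: str, resource: MemoryResource, **ids: str) -> str:
    """Return the concrete namespace for this call, or refuse if the agent has no grant."""
    if resource.name not in GRANTS.get(agent, set()):
        raise PermissionError(f"{agent} has no access to {resource.name}")
    return resource.namespace(**ids)


if __name__ == "__main__":
    print(resolve("fraud-agent", FRAUD_SIGNALS, actor_id="cust-7", session_id="s-1"))  # /fraud/actors/cust-7/sessions/s-1
    print(resolve("claims-agent", USER_DETAILS, actor_id="cust-7"))  # /shared/users/cust-7
```

In this sketch, each resource has its own scope. The fraud agent's signals are scoped per session, the claims history per actor, and the shared user-details resource is readable by both agents. Each agent is refused the other's domain memory.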

Swisscom is named as a production reference implementation, using AgentCore for customer support and sales agents at enterprise scale.

Why This Is a Bigger Deal Than It Looks

What AWS is really doing here is laying the groundwork to make "how do you run agents in production" a solved problem rather than a per-team improvisation.

The CI/CD integration is the piece that matters most for where this space is heading. Right now, most teams treat agents as ordinary software and deploy them the way they'd deploy a web service: push code, hope it works, debug in production. The AgentOps model makes agent deployment look more like model deployment. It relies on versioned artifacts, evaluation gates, staged rollouts, and immutable runtime versions that AgentCore maintains automatically. That's the infrastructure discipline enterprise adoption actually requires.
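An evaluation gate can be as simple as per-level score thresholds that a candidate version must clear before it moves to the next account. The threshold values, the 0-to-1 score scale, and the `gate` and `promote` helpers are hypothetical. The four level names follow the evaluation levels listed earlier, and the stage names follow the dev, pre-prod, and prod accounts in the reference architecture.

```python
# Hypothetical minimum scores (0.0 to 1.0) per evaluation level.
THRESHOLDS = {"tool": 0.95, "turn": 0.85, "session": 0.80, "system": 0.90}
STAGES = ["dev", "pre-prod", "prod"]


def gate(scores: dict[str, float],
         thresholds: dict[str, float] = THRESHOLDS) -> tuple[bool, list[str]]:
    """Return (passed, reasons). A level with no score counts as a failure."""
    failures = []
    for level, minimum in thresholds.items():
        score = scores.get(level)
        if score is None:
            failures.append(f"{level}: no score")
        elif score < minimum:
            failures.append(f"{level}: {score:.2f} < {minimum:.2f}")
    return (not failures, failures)


def promote(version: str, stage: str, scores: dict[str, float]) -> str:
    """Move a versioned agent to the next stage only if every level clears its threshold."""
    passed, reasons = gate(scores)
    if not passed:
        raise RuntimeError(f"{version} blocked in {stage}: " + "; ".join(reasons))
    index = STAGES.index(stage)
    return STAGES[min(index + 1, len(STAGES) - 1)]  # prod stays prod


if __name__ == "__main__":
    scores = {"tool": 0.97, "turn": 0.88, "session": 0.76, "system": 0.93}
    print(gate(scores))  # (False, ['session: 0.76 < 0.80'])
```

This sketch deliberately treats a missing score as a failure. A level that was never evaluated should block promotion rather than pass silently.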

The Cedar-based policy layer is underrated. Policy evaluation that intercepts tool calls before execution and validates them against deterministic rules is a different category of guarantee than "the LLM was told not to do this." It's enforceable regardless of what the agent decides to attempt. Combined with the identity layer that attributes every action to a specific agent identity rather than the user's IAM role, this gives compliance and audit teams something they can actually work with.
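The difference between "enforced" and "the LLM was told" is easiest to see in code. This sketch imitates Cedar's decision rule in plain Python: requests are denied by default, and any matching `forbid` overrides any matching `permit`. It also records every decision against the calling agent's identity. It is not Cedar policy syntax, the Cedar SDK, or AgentCore's policy API, and the rules, tool name, and amount threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    effect: str                     # "permit" or "forbid"
    agent: str                      # agent identity, or "*" for any agent
    tool: str
    condition: Callable[[dict], bool] = lambda args: True  # extra check on call arguments


def is_authorized(rules: list[Rule], agent: str, tool: str, args: dict) -> bool:
    """Default deny; a matching forbid beats any matching permit."""
    matching = [
        r for r in rules
        if r.agent in (agent, "*") and r.tool == tool and r.condition(args)
    ]
    if any(r.effect == "forbid" for r in matching):
        return False
    return any(r.effect == "permit" for r in matching)


AUDIT_LOG: list[dict] = []


def guarded_call(rules: list[Rule], agent: str, tool_name: str,
                 tool_fn: Callable[..., Any], **args: Any) -> Any:
    """Evaluate policy before the tool runs; log the decision under the agent's identity."""
    allowed = is_authorized(rules, agent, tool_name, args)
    AUDIT_LOG.append({"agent": agent, "tool": tool_name, "args": args, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{agent} is not permitted to call {tool_name} with {args}")
    return tool_fn(**args)


if __name__ == "__main__":
    rules = [
        Rule("permit", "claims-agent", "issue_refund"),
        Rule("forbid", "*", "issue_refund", lambda args: args.get("amount", 0) > 500),
    ]
    print(guarded_call(rules, "claims-agent", "issue_refund",
                       lambda amount: f"refunded {amount}", amount=100))  # refunded 100
    try:
        guarded_call(rules, "claims-agent", "issue_refund",
                     lambda amount: f"refunded {amount}", amount=900)
    except PermissionError as exc:
        print("denied:", exc)
```

The check runs outside the model. However the agent phrases or retries the request, a refund above the threshold is refused, and the attempt still lands in the audit log under the agent's name.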

AgentCore is framework-agnostic and model-agnostic. The operational layer works whether you're running Strands Agents, LangGraph, CrewAI, or a custom harness. That's the bet: the operational infrastructure becomes the durable layer even as the frameworks evolve.

Availability and Access

Amazon Bedrock AgentCore is available now. The reference architecture and implementation guidance are detailed in the AWS ML blog post. AgentCore Runtime, Memory, Gateway, and Identity are available as independent components, so you can adopt any one of them without committing to the full stack. The AWS Agent Registry, which handles agent discovery and approval workflows across an organization, is part of the same release. Pricing follows Bedrock's standard consumption-based model, and the post links to supporting AWS documentation for setting up each pillar.

If you're building multi-agent systems and still running without versioned artifacts, evaluation pipelines, or per-agent identity, closing that gap is no longer a best practice you can put off. The gap itself is now a documented operational risk.

Follow for more coverage on MCP, agentic AI, and AI infrastructure.