
Morgan Willis

We Need To Talk About AI Agent Architectures

AI agents are getting easier to build and host. With agentic frameworks and cloud-based hosting environments, you can deploy an agent to the cloud in an afternoon. It is now possible to assemble a multi-agent setup with memory, observability, and MCP-connected tools without a huge amount of code or infrastructure work.

This convenience, paired with AI coding assistants making it easier than ever to ship, has created a trend that is worth talking about. Many developers are wiring UIs directly to their agents as if the agent runtime is the entire backend. It looks clean. It feels efficient. It also happens to be what most demos show, so it is understandable that teams take that pattern and run.

The diagram illustrates a direct client→agent architecture with a single entrypoint, where the agent runtime replaces the entire backend.

This works well when you are exploring ideas. Once you move beyond a demo and into a real application, that client to agent pattern may start to break down. This is not because any specific agent runtime itself is limited, but because real production systems still need the same architectural layers they have always needed.

Web applications still need input sanitization. APIs still need rate limits. Business logic still needs a home. Services still need to coordinate with other systems. As soon as those pieces enter the picture, the architecture starts to look a lot more familiar.

AI agents expand what an application can do, but they do not erase the fundamentals of good systems design. The agent itself is not the system. It is a capability inside the system.

Let’s talk about what that means and why it matters.

Why Direct Client → Agent Is an Incomplete Architecture

Terminology note: In this post, runtime refers to a managed environment that executes agent logic on the server side. I use Amazon Bedrock AgentCore Runtime as an example throughout, but the same concepts apply to other hosted environments. Agent or agent service is your deployed code containing the agent framework, prompts, and tool integration.

Upstream services or modules are all components that handle requests before they reach the agent (UI, gateways, routers, backends).

Downstream services or modules are the tools and resources the agent calls (MCP tools, APIs, databases, internal services).

When the client talks directly to the agent runtime, responsibilities that normally live in other components can either get lost entirely or end up pushed into your agent code where they do not belong.

Without the typical components of a web architecture, the agent is expected to handle:

  1. Request and security boundaries

    Input sanitization, API-level authorization rules, web traffic filtering, rate limiting, throttling, and safety checks.

  2. Application and system orchestration

    Coordinating services, enforcing business rules that span multiple systems, and managing workflow transitions that require durability outside an agent session.

  3. Resilience and operational concerns

    Retries, backoff behavior, event buffering, and behaviors that protect downstream systems.

Agents or hosted runtimes may be able to handle some of these tasks, but they were never designed to be your entire backend, your middle tier, or your web server. This is the same reason we don't point clients directly at AWS Lambda functions in most production systems without protective layers. In the same way, agents are not meant to be directly client-facing services for most use cases.

Where This Architecture Breaks Down in Practice

Here are three ways the client→agent pattern can break down in production:

  1. Traffic, cost, and load patterns become hard to control.

    When the UI talks directly to a single agent service without upstream boundaries, there is no clean place to enforce rate limits, handle noisy clients, or cap usage per user. A small bug, a retry loop, or a surge in usage can translate into a flood of LLM calls, driving unpredictable latency and inference costs without a structured way to throttle or shed load.

  2. Every change shares the same blast radius because everything ships in one deployment unit.

    When validation logic, business rules, integration code, and agent behavior all live in the same service, every small change requires touching and redeploying the entire agent app. A tweak to a business rule, a simple bug fix, or a prompt change all share the same blast radius and rollback path, which slows iteration and makes failures harder to localize.

  3. Refactoring becomes brittle as the system grows.

    When the agent service acts as the entire backend, every aspect is fused into a single deployment unit. Additionally, many agent runtimes expose a single entrypoint like POST /invoke, which means every feature, workflow, and behavior enters through the same undifferentiated path.

    Nothing distinguishes one operation from another, so you lose the natural places where you would normally enforce permissions, validate input, or apply business rules.

    With this setup, extending the architecture becomes difficult. Adding new functionality, queues, or workflow orchestration later means untangling tightly coupled logic. Adding features risks rewriting the agent, because the system never developed the separation needed to evolve cleanly.

Why Separation of Concerns Still Matters

We break systems into modules because each piece handles a specific kind of complexity so the rest of the system doesn’t have to. That separation of concerns keeps responsibilities contained, avoids logic leaking across boundaries, allows for decoupling, and makes the system more predictable at scale.

Testability also suffers when everything runs inside a single boundary. Isolating components, mocking dependencies, and doing targeted regression testing is far easier when concerns are separated into clear modules.

Experienced developers and systems engineers know this intuitively, but the rapid progress in AI tooling has lowered the barrier to agent deployment in a way that lets people ship agents before they have the architectural context to support them.

We, as a technical community, should amplify real-world patterns and lessons learned. Providing more examples of advanced use cases alongside simplified tutorials will allow us to learn together and move towards a set of guidelines for well-architected agents.

Balancing Simplicity and Structure in Agentic Systems

Just like any other solution, as you introduce more moving parts, you are now responsible for operating and maintaining them.

Additional components add complexity in the same way that having a load balancer, an API gateway, and a database connection pool adds complexity. These components exist because they absorb specific categories of risk and responsibility so your core application code does not have to. They make the entire system more reliable.

None of this means you must build a massive, highly distributed, micro-serviced architecture to use agents correctly.

You can run a simple, clean setup with a load balancer and router component in front of your agent, or add an API gateway for basic shaping and protection, and stop there. That pattern is perfectly valid for many teams, especially early on.

At the same time, companies operating at global scale or projects with complex requirements will naturally need more components. They may introduce additional services for orchestration, workflow durability, message buffering, network connectivity, or cross-system coordination. These architectures are more complex because the requirements and traffic patterns call for that complexity.

Both ends of that spectrum are reasonable. What matters is choosing the right architecture for your use case and constraints. The goal is not to chase complexity for complexity’s sake, and it is also not to flatten everything into a single module. It is to introduce the minimum number of components that meaningfully reduce risk, improve security, and enable flexibility as your system grows and changes.

That balance is what helps you start simple without boxing yourself into a corner later.

What Belongs in the Agent vs the Backend?

With all of that said, what belongs in the agent and what belongs in other components?

Agent frameworks make it easy to blur these boundaries, but keeping them clear is what prevents the system from collapsing into an expensive mess. The way you decide to build your agent heavily depends on your use case and technology choices. Agentic frameworks vary in their implementation, and so do the requirements from case to case. There is no one-size-fits-all answer. Here are some high-level guidelines for getting started.

What typically belongs upstream (UI, gateway, router, backend)

  • Input shaping, validation, rate limiting, and web traffic filtering
  • Core business logic
  • Coordinating between services or orchestrating complex workflows
  • Workflow state, retries, orchestration, and durability

Separating these concerns keeps security, validation, and business rules separate from core agent code, reduces the blast radius of changes, and lets you change agent behavior without constantly reworking the logic that keeps the system running on a basic level.
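To make the first upstream bullet concrete, here is a sketch of deterministic request validation that runs in a gateway or backend handler before the agent is ever invoked, so malformed or oversized input never costs an LLM call. The field names and limits are hypothetical; real rules depend on your use case:

```python
# Hypothetical limit; tune for your application.
MAX_PROMPT_CHARS = 4000

def validate_request(body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the
    request may proceed to the agent."""
    errors = []
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt must be a non-empty string")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if not isinstance(body.get("session_id"), str):
        errors.append("session_id must be a string")
    return errors

errs = validate_request({"prompt": "", "session_id": 42})
# Two errors: empty prompt and a non-string session_id.
```

Because this logic is deterministic and lives upstream, it can be unit tested and changed without touching prompts or redeploying the agent.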

What typically belongs inside the agent

  • Invoking LLMs using agentic frameworks
  • Tool selection and orchestration logic
  • Agent session state, context, and memory handling

An agent is generally responsible for interpreting goals, choosing actions, and reasoning over context. Decisions might come from the model, from graph-level orchestration, or from deterministic routing, depending on the framework and use case.

What typically belongs in tools

  • Reading or writing data
  • Querying systems of record
  • Triggering deterministic code
  • Invoking internal or external APIs
  • Triggering another agent to do work

Tools encapsulate actions. The model may determine when a tool is needed, while the tool controls how the underlying operation executes.
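That split of responsibility can be sketched in a framework-agnostic way. Many agentic frameworks provide decorator-based tool registration; the plain-Python registry below is a stand-in for that pattern, and the tool name and function are hypothetical:

```python
TOOLS = {}

def tool(fn):
    """Register a plain function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_order_status(order_id: str) -> dict:
    # The tool owns *how* the operation runs. In a real system this
    # would query a system of record; stubbed here for illustration.
    return {"order_id": order_id, "status": "shipped"}

def dispatch(tool_name: str, **kwargs):
    """What the agent loop does after the model decides *which* tool
    to call and with what arguments."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](**kwargs)

result = dispatch("get_order_status", order_id="ord-123")
```

The boundary is the important part: the agent reasons about which action to take, while the tool body stays deterministic, testable code.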

AWS Architecture Patterns for AI Agents

Agents fit into systems just like any other capability fits into an application. You can keep things lightweight or expand into more distributed designs as your scale and needs change.

The patterns highlighted below intentionally leave out other parts of agentic systems like memory, MCP servers, RAG, and multi-agent communication. Those are important topics, but those components sit inside the agent runtime or downstream from it, rather than in the upstream architectural components we are focusing on here.

You can extend or adapt these patterns for your use case. I will use AWS services as examples, with Amazon Bedrock AgentCore Runtime as the agent runtime, though you could swap these components with services from other providers and keep the same patterns.

Quick Amazon Bedrock AgentCore Primer

Because the following examples use AWS services, here are the basics. AgentCore Runtime is a managed serverless environment for hosting AI agents. It handles deployment, scaling, and session management, and integrates with many tools and services both inside and outside of AWS. It supports both IAM and OAuth based identity so you can plug it into existing security models. To learn more, see the Amazon Bedrock AgentCore documentation.

1. Minimal API Gateway Pattern

Client → Amazon API Gateway + AWS Web Application Firewall (WAF) → Amazon Bedrock AgentCore Runtime → Downstream services

Use this when

  • You are moving from prototype to production and want a small number of well understood layers.
  • You need basic protections like auth, rate limits, and input validation but do not yet have a large service ecosystem.

API Gateway and AWS WAF provide authentication, rate limits, routing, web traffic filtering, and a controlled boundary before the agent is invoked.

You can optionally include an AWS Lambda function between the API Gateway and the agent runtime, which lets you write custom logic when invoking the agent, including deterministic input validation.
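A minimal sketch of that optional Lambda, assuming the API Gateway proxy integration event shape. The agent invoker is injected as a callable so the boundary logic stays testable; in production it would be a boto3 call into AgentCore Runtime:

```python
import json

def make_handler(invoke_agent):
    """Build an API Gateway-style Lambda handler that validates
    deterministically, then delegates to `invoke_agent` (a stand-in
    for the real AgentCore Runtime call)."""
    def handler(event, context=None):
        try:
            body = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "invalid JSON"})}
        prompt = body.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            return {"statusCode": 400,
                    "body": json.dumps({"error": "prompt required"})}
        # Only validated input ever reaches (and pays for) the agent.
        result = invoke_agent(prompt)
        return {"statusCode": 200,
                "body": json.dumps({"reply": result})}
    return handler

# Exercising the boundary with a stubbed agent:
handler = make_handler(lambda prompt: f"echo: {prompt}")
ok = handler({"body": json.dumps({"prompt": "hello"})})
bad = handler({"body": "not json"})
```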

AgentCore Runtime handles inbound identity using OAuth or IAM.

If you later need queuing for incoming messages, you can include Amazon SQS between API Gateway and the agent and use a Lambda function that processes messages and invokes AgentCore Runtime. That lets you handle spiky traffic or ordered message processing without changing how the agent itself works.
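The SQS variant can be sketched the same way. The record shape below follows the standard SQS-to-Lambda event format, and `invoke_agent` again stands in for the AgentCore Runtime call; reporting partial batch failures (a real Lambda/SQS feature, enabled via the event source mapping) makes SQS redeliver only the records that failed:

```python
import json

def make_sqs_processor(invoke_agent):
    """Sketch of a Lambda draining an SQS queue that sits between
    API Gateway and the agent. Each record body carries one user
    request."""
    def process(event, context=None):
        failures = []
        for record in event.get("Records", []):
            try:
                payload = json.loads(record["body"])
                invoke_agent(payload["prompt"])
            except Exception:
                # Report this record as failed so SQS redelivers
                # only it, not the whole batch.
                failures.append({"itemIdentifier": record.get("messageId")})
        return {"batchItemFailures": failures}
    return process

calls = []
process = make_sqs_processor(calls.append)
out = process({"Records": [
    {"messageId": "m1", "body": json.dumps({"prompt": "hi"})},
    {"messageId": "m2", "body": "broken"},
]})
# Only "hi" reaches the agent; m2 is flagged for redelivery.
```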

2. Traditional Backend + Agent Pattern

Client → Application Load Balancer + AWS WAF → Web server(s), e.g., on Amazon EC2, Amazon ECS, or AWS Lambda → Amazon Bedrock AgentCore Runtime → Downstream services

Use this when

  • You already have a web backend that you need to integrate with, or you need a designated component for routing and business logic.
  • You have non-trivial logic or workflow orchestration requirements.

Many production workloads still run traditional web backends. Those architectures do not disappear or need a major overhaul when you add an AI agent. You extend them.

The client sends requests through an Application Load Balancer which can integrate with AWS WAF for web filtering. From there, the request is sent to a web backend on Amazon EC2, containers, or Lambda.

The backend handles business logic and system coordination. The agent is a capability it uses, invoked via a VPC endpoint so traffic remains private.

3. Deep Automation Agent Pattern

Events coming from Amazon EventBridge → AWS Step Functions → AWS Lambda → Amazon Bedrock AgentCore Runtime → Downstream Systems

Use this when

  • The value of the agent lives in backend processes, not in a chat UI.
  • You want agents to be one part of a larger workflow.
  • Work is triggered by events, schedules, or pipelines rather than direct user interaction.

Here, the agent is part of a larger workflow, pipeline, or automation task. Agents can run asynchronously, with no user-facing UI at all.

Events from Amazon EventBridge or scheduled runs can invoke the agent in AgentCore Runtime directly using IAM as the authentication method. You can optionally introduce AWS Step Functions as a way to coordinate the steps of a long-running or multi-phased workflow that mixes deterministic and nondeterministic steps.

Step Functions provides a workflow control mechanism so the agent does not need to manage retries, branching, or overall workflow state.

The agent does its work and calls downstream services or tools as needed, while coordination between steps is handled by Step Functions. This lets you run deterministic steps with AWS Lambda before the agent, notify relevant parties with Amazon Simple Notification Service (SNS) afterward, or invoke other services in parallel with your agent. Again, you could swap out Step Functions for another workflow orchestrator and the concept still applies.
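As a sketch of what that coordination looks like in Amazon States Language, here is a hypothetical state machine with a deterministic preparation step, an agent invocation via Lambda, and an SNS notification afterward. All ARNs, state names, and the `$.summary` path are placeholders:

```json
{
  "Comment": "Hypothetical workflow: deterministic prep, agent step, then notify",
  "StartAt": "PrepareInput",
  "States": {
    "PrepareInput": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:prepare-input",
      "Retry": [{ "ErrorEquals": ["States.ALL"], "MaxAttempts": 2, "BackoffRate": 2.0 }],
      "Next": "InvokeAgent"
    },
    "InvokeAgent": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:invoke-agentcore",
      "TimeoutSeconds": 300,
      "Next": "NotifyStakeholders"
    },
    "NotifyStakeholders": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:agent-results",
        "Message.$": "$.summary"
      },
      "End": true
    }
  }
}
```

Note that retries, timeouts, and the step ordering live in the workflow definition, so the agent itself never has to manage them.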

These patterns let you start simple, introduce components only when needed, and grow into more distributed or mature architectures.

The Takeaway

If you remember nothing else from this post, remember this: the question is not whether you can connect your client directly to an agent. You technically can. The question is whether you should.

In the short term, it may feel fast and simple. In the long term, it leads to a brittle system that is difficult to extend, hard to understand, and expensive to maintain.

A well-structured architecture lets agents be first-class participants in your system without being overloaded by concerns that belong elsewhere. That is how you get the best of both worlds: the power of agentic reasoning combined with the reliability of proven distributed system design.

And yes, the client→agent tutorials are still useful. They exist to teach one focused concept without burying you in use case specific and complex details. They show you how to get an agent running, not how to design the full application around it.

But once you move toward production, the question becomes: Did we build a full system or did we stop at the agent?

The agent is the brain. The architecture is the body. You need both.

If you want to learn more about agentic design patterns on AWS, visit Agentic AI patterns and workflows on AWS and stay tuned for more blog posts from the AWS team where we explore specific architectures for agentic AI use cases and advanced design patterns.

Top comments (6)

Dennis Traub

I whole-heartedly agree. We really need to think about this: which of our mental models need revisiting, and which ones still apply. The architecture of almost every AI agent I see, from simple demos to enterprise pilots, looks like it's 1993 again...

Morgan Willis

What's old is new again :P

Art light

Really enjoyed reading this! You’ve made a great point about the balance between simplicity and structure when working with AI agents

Morgan Willis

I've been thinking about how agents can be like a new type of microservice in a larger system. However, that doesn't mean we need to make everything overly complex. Building for future flexibility while not overcomplicating things is important.

Art light

Great insight—this is a clear and thoughtful way to look at agents.

Thanks for sharing your first article.
Best Wishes.✌

Ava Nichols

Thank you for this!