Most AI agent tutorials show the same problematic pattern: a front-end client directly invoking an agent backend.
I wrote a blog post, We Need to Talk About AI Agent Architectures, that explored why this pattern is a problem and highlighted a few other patterns you should use instead.
The core argument was straightforward: agents are a capability inside the system, not the system itself.
The response to that post told me the topic resonated, so I did the next logical thing. I went and built the patterns I shared and created a repo so you can try them out too.
The reference repository walks through multiple step-by-step iterations, showing how to evolve an agent architecture from a POC to secure and flexible production-ready patterns.
The repo is here: aws-samples/sample-ai-agent-architectures-agentcore
I also recorded an end-to-end video walkthrough of the solution, which you can watch here:
This post covers what I built, what I learned along the way, and what you should watch out for when building your own agent architectures.
Starting With the Anti-pattern
I started where most people start. Browser talks to agent, and that's it.
I wrote a simple LangGraph agent with a few sample tools and created a front-end I could run locally to interact with it.
I hosted this agent using Amazon Bedrock AgentCore Runtime and used Amazon Cognito to handle auth.
AgentCore Runtime validates the token before invoking the agent, and the whole thing worked pretty easily. I had my agent hosted in the cloud and only authenticated users could access it.
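The shape of that first iteration is roughly this. A minimal sketch, assuming the bedrock-agentcore Python SDK; `run_agent` is a stand-in for the real LangGraph graph and the payload shape is illustrative, not copied from the repo:

```python
def run_agent(prompt: str) -> dict:
    # Placeholder for invoking the LangGraph graph; the real app would
    # call something like graph.invoke({"messages": [...]}) here.
    return {"result": f"echo: {prompt}"}

def make_entrypoint(agent=run_agent):
    """Build the request handler; the agent is injected so the handler
    can be exercised without any AWS dependencies."""
    def invoke(payload: dict) -> dict:
        prompt = payload.get("prompt", "")
        if not prompt:
            return {"error": "prompt is required"}
        return agent(prompt)
    return invoke

if __name__ == "__main__":
    # Wire the handler into AgentCore Runtime. app.run() starts the server
    # that Runtime calls into after it has validated the inbound JWT.
    from bedrock_agentcore import BedrockAgentCoreApp
    app = BedrockAgentCoreApp()
    app.entrypoint(make_entrypoint())
    app.run()
```

Note that the agent code itself never sees a token: AgentCore Runtime rejects unauthenticated requests before the entrypoint runs.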
But then I started asking the questions I raised in the original post:
- What happens when someone hammers this endpoint?
- Where do I enforce rate limits?
- Where does input validation go?
- Where can I put the business logic needed to expand the functionality of this app?
The answer to all of those questions was: forget about them, or shove them into the agent code, because there was simply nowhere else for them to live.
So I started adding in the necessary components to tackle these issues following the patterns I laid out in the original blog.
Adding a Proper Front Door
With the first iteration of this architecture, I added a proper front door: Amazon API Gateway with AWS WAF (a web application firewall) in front of the agent.
That gave me rate limiting, web traffic filtering, and an Amazon Cognito authorizer at the API Gateway level. The user can authenticate at the API level, and the agent still uses OAuth for inbound authentication.
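To make the rate-limiting piece concrete, here is a toy token-bucket limiter in plain Python. This is an illustration of what API Gateway's throttling settings (rate plus burst) enforce for you, not code you would deploy yourself:

```python
import time

class TokenBucket:
    """Toy rate limiter: sustain `rate` requests/sec, burst up to `burst`."""
    def __init__(self, rate, burst, now=None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # API Gateway would answer 429 Too Many Requests here

bucket = TokenBucket(rate=10, burst=2, now=0.0)
# Two requests burst through; a third at the same instant is throttled.
results = [bucket.allow(now=0.0), bucket.allow(now=0.0), bucket.allow(now=0.0)]
```

The important point is where this lives: at the front door, not inside the agent, so the agent never pays for traffic it should not see.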
This is the first step away from the anti-pattern.
It felt like a solid improvement, but when a colleague of mine reviewed the solution, they found a security gap that I think a lot of us would miss.
The Authentication Bypass Problem
Let's take a step back.
The API Gateway uses OAuth to authenticate incoming requests. When a user logs in and invokes the agent, API Gateway verifies the JWT passed in from the client.
Then, API Gateway turns around and forwards that exact same token to the agent running on AgentCore Runtime to be validated. One token, used all the way through.
The problem with this is that the same token that satisfies the API Gateway also satisfies the agent directly. If a user has a valid JWT and knows the AgentCore endpoint, they can bypass the API Gateway entirely.
Your rate limits, WAF rules, and any other protections you put in front of the agent become optional. A savvy user can just go around them.
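Here is the gap in miniature, with plain Python standing in for the real services. `validate_jwt`, `gateway`, and `agent_runtime` are illustrative stand-ins, not the repo's code; the point is that both layers accept the same user token:

```python
VALID_TOKENS = {"user-jwt-abc"}  # stand-in for Cognito-issued JWTs

def validate_jwt(token):
    return token in VALID_TOKENS

def agent_runtime(token, prompt):
    # AgentCore Runtime with OAuth inbound auth: it validates the user JWT
    # itself, so it will happily serve anyone who holds a valid token.
    if not validate_jwt(token):
        return {"status": 401}
    return {"status": 200, "body": f"agent answer for: {prompt}"}

def gateway(token, prompt):
    # API Gateway: Cognito authorizer, WAF, rate limits... and then it
    # forwards the exact same token downstream.
    if not validate_jwt(token):
        return {"status": 401}
    return agent_runtime(token, prompt)

# The intended path and the bypass both succeed with the same token:
via_gateway = gateway("user-jwt-abc", "hi")
direct = agent_runtime("user-jwt-abc", "hi")  # WAF and rate limits never ran
```

Nothing in the agent's auth check distinguishes "came through the gateway" from "came straight at me", which is exactly the problem.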
This opens you up to a Denial of Wallet attack: someone floods your system with requests, the serverless backend scales up to absorb them, and you're hit with a hefty bill later down the line.
This might not be an obvious gap at first, because you might think "Well, how would anyone know my agent endpoint? You need both the token AND the endpoint to invoke it. As long as someone doesn't know the endpoint, they'll be forced to go through the API Gateway."
This is called security through obscurity. You're counting on someone not knowing the endpoint. But sometimes identifiers like ARNs, account numbers, and agent IDs can leak accidentally in various ways.
It's not enough to operate a production system using security by obscurity as your defense.
I deliberately left this gap in the examples published in the repo (with disclaimers), because I think it is the kind of thing teams will hit in practice.
Closing the Security Gap
To address this issue, I introduced a lightweight AWS Lambda function between the gateway and the agent and switched the agent to use IAM authentication instead of OAuth.
That way, the token that is used to authenticate with the API is different from what is being used to securely invoke the agent. A malicious actor can no longer invoke my agent directly.
Only the AWS Lambda function with the correct permissions attached to its IAM execution role can invoke the agent.
By separating user authentication from backend invocation permissions, we eliminate the possibility of a client bypassing the API protections.
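A sketch of that middle layer, assuming boto3's bedrock-agentcore client and its invoke_agent_runtime call; the parameter and response shapes here are from memory and worth verifying against the boto3 docs, and the environment variable, default session ID, and injected `agent_client` are my additions for illustration and testability:

```python
import json
import os

def handler(event, context=None, agent_client=None):
    """Lambda between API Gateway and the agent. API Gateway has already
    authenticated the user; this function invokes the agent using the
    Lambda execution role's IAM (SigV4) credentials, so the user's JWT
    never reaches -- and cannot directly reach -- the agent."""
    if agent_client is None:
        import boto3  # created lazily so the handler is testable with a stub
        agent_client = boto3.client("bedrock-agentcore")

    body = json.loads(event.get("body") or "{}")
    prompt = (body.get("prompt") or "").strip()
    if not prompt:
        return {"statusCode": 400,
                "body": json.dumps({"error": "prompt is required"})}

    resp = agent_client.invoke_agent_runtime(
        agentRuntimeArn=os.environ.get("AGENT_RUNTIME_ARN", ""),
        runtimeSessionId=body.get("sessionId")
        or "session-0000000000000000000000000000",
        payload=json.dumps({"prompt": prompt}).encode(),
    )
    return {"statusCode": 200, "body": resp["response"].read().decode()}
```

This is also where input validation and any pre- or post-processing business logic naturally live, which answers two of the earlier questions at once.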
This is the pattern I recommend as the starting point for most production workloads.
Cognito handles user identity, API Gateway + WAF handle traffic protection and shaping, Lambda handles request processing, and the agent handles reasoning.
This represents an application with a single endpoint. Most real-world applications have more than one endpoint.
Time for the next iteration.
Expanding Application Functionality
Next, I added conversation history to the application: persistent memory for the agent, conversation history displayed on the front-end, and the ability to pick up where you left off across sessions.
To achieve this, I introduced a second endpoint for conversations in API Gateway, a second Lambda function for the conversation retrieval logic, an Amazon DynamoDB table for conversation metadata, and I used Amazon Bedrock AgentCore Memory for storing the full conversation history.
The second endpoint and Lambda function gave me a place to run logic that does not require the agent, like retrieving past conversations from memory to display.
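The conversation-listing Lambda never touches the agent at all. A hedged sketch, with the DynamoDB table injected for testability; the table name, `userId` key, and attribute names are my placeholders, not the repo's schema:

```python
import json

def list_conversations(event, context=None, table=None):
    """Return a user's conversation metadata straight from DynamoDB --
    no LLM call involved. `table` is a boto3 Table resource in Lambda."""
    if table is None:
        import boto3  # lazy so the handler can be exercised with a stub
        table = boto3.resource("dynamodb").Table("conversations")

    # User identity comes from the Cognito authorizer claims on the
    # request, never from the request body.
    user_id = event["requestContext"]["authorizer"]["claims"]["sub"]
    resp = table.query(
        KeyConditionExpression="userId = :u",  # placeholder key schema
        ExpressionAttributeValues={":u": user_id},
    )
    return {"statusCode": 200, "body": json.dumps(resp.get("Items", []))}
```

A plain database query like this returns in milliseconds and costs a fraction of a model invocation, which is the whole argument for giving it its own endpoint.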
This reinforces a key design principle: only invoke the LLM when you actually need reasoning, and handle everything else with traditional application infrastructure.
This is where you can really start to see how to evolve this pattern to adapt to a more complex use case.
The Agent Wasn’t the Real Problem
The agent code barely changed across all iterations. What changed was everything around it. That progression is the whole point of sharing this example.
As the system needed tighter security, traffic controls, memory, and additional endpoints, the agent stayed focused on what agents do.
This is why it's so important to design agent architectures with the same systems design thinking we apply to everything else. It lets you isolate responsibilities, keep reasoning separate from traffic control and business logic, and prevent your agent from becoming an accidental "Big Ball of Mud".
You want to build an architecture around your agent that can evolve as your requirements evolve.
What Comes After the Basics
The patterns we covered here tackle foundational concerns: traffic protection, auth boundaries, separation of responsibilities. These are well-understood problems with well-understood solutions.
The design challenges that come next when deploying AI agents to production are less straightforward.
For example:
- How do you control which tools an agent can call, and under what conditions?
- How do you audit what data the agent accessed and what actions it took at scale?
- How do you prevent the agent from doing something that is perfectly valid in one context but inappropriate in another, while still allowing it when it makes sense?
- How do you ensure your agent is following the instructions you gave it end-to-end?
These are the kinds of problems teams hit as agents move from basic assistants to systems that take actions with real-world consequences on behalf of users and organizations.
The answers require new patterns and solutions that we have not yet fully worked out or adopted widely as an industry.
This post tackled the basics. You need to get the foundational architecture right first, because none of the harder problems get easier if you are also fighting your own infrastructure design choices.
Go Forth and Build Agents
In my original post, I argued that agents are a capability inside the system, not the system itself. Building these patterns reinforced that.
Every iteration made the agent more useful, more secure, and more operable, not by changing the agent, but by building the right architecture around it.
Good architecture makes your agent better without the agent needing to know about it.
Go fork the repo, deploy the iterations, and adapt the patterns to your own use cases.
If you found this useful, star the repo so others can find it too. And if you want more context on why these patterns matter, start with the original post: We Need To Talk About AI Agent Architectures.
Top comments (2)
This is exactly the evolution every serious AI agent project goes through. The "agent as the system" anti-pattern is so common because it's the fastest way to ship a demo — but it's also the fastest way to create a production nightmare.
The auth bypass gap you identified is particularly insidious. Most teams would miss that because the "happy path" works perfectly. It's only when you start thinking like an attacker that you realize one token flowing through the entire stack is a liability.
Your key insight — "only invoke the LLM when you actually need reasoning" — is something more teams need to internalize. I've seen agents invoked for simple CRUD operations that a basic Lambda could handle in 10ms. Agents should be the "expensive reasoning engine" in your architecture, not the catch-all for every operation.
The progression from direct client→agent to API Gateway → Lambda → agent is the right mental model. Each layer adds isolation, observability, and control without complicating the agent itself.
Looking forward to your thoughts on the harder problems you mentioned — tool permissioning, audit trails, and context-appropriate behavior. Those are the real challenges that separate toy agents from production systems.
Thanks for sharing the repo — will definitely dig into the iterations.
"I've seen agents invoked for simple CRUD operations that a basic Lambda could handle in 10ms" This part lol
Some things should just be deterministic! I'll be posting more blogs about the harder problems going forward as I work them out myself. I have ideas but I need to build them out and then I'll come back here and share what I learned.