What Are Agentic AI Architectures?
I won't waste your time with a long, fluffy introduction about how AI is changing the world. Let's get straight to the point: agentic AI architectures are fundamentally different from the prompt-response pattern you're probably used to with language models.
In an agentic architecture, the AI doesn't just spit out a response to your input. Instead, it functions as an autonomous agent that breaks down complex tasks into steps, executes those steps by calling the right tools, and uses the results to inform subsequent actions. Think of it as the difference between asking someone a question and hiring them to do a job: the agent actually does work on your behalf rather than just answering.
Amazon Bedrock and the Nova model family are AWS's offering in this space. Bedrock provides the managed infrastructure and orchestration, while Nova models serve as the intelligence. In this article we'll dig into how these technologies work together, the architectural patterns for implementing agentic systems, and the practical considerations for building them at scale.
Understanding Amazon Bedrock and the Nova Model Family
Amazon Bedrock is AWS's fully managed service for building generative AI applications. It provides a unified API for accessing foundation models, but it's not just a model gateway; it's a comprehensive platform for building, deploying, and running AI applications without managing infrastructure.
The Amazon Nova family is AWS's proprietary set of foundation models, with several variants optimized for different use cases:
| Model | Type | Context Window | Multimodal? | Best For | On-Demand Pricing (USD per 1K tokens) |
|---|---|---|---|---|---|
| Nova Micro | Text-only | 128K tokens | No | Simple tasks, classification, high volume | $0.000035 input, $0.00014 output |
| Nova Lite | Multimodal | 300K tokens | Yes (text, image, video input) | Balanced performance, routine agent tasks | $0.00006 input, $0.00024 output |
| Nova Pro | Multimodal | 300K tokens | Yes (text, image, video input) | Complex reasoning, sophisticated agents | $0.0008 input, $0.0032 output |
What makes these models particularly suited for agentic applications? First, they're optimized for function calling: the ability to output structured JSON requests for external tools. Second, those large context windows allow agents to maintain extensive conversation history and detailed instructions. Third, the multimodal capabilities (in Lite and Pro) let agents process images and videos alongside text.
Under the hood, Bedrock scales compute resources automatically based on demand. When your agent suddenly gets hit with a traffic spike, AWS provisions additional capacity for you, up to your account's service quotas (requests and tokens per minute); beyond those, calls get throttled. There's no infrastructure for you to manage, just APIs to call.
Agentic Architectures: Beyond Simple Prompt-Response Systems
So what exactly makes agentic architectures different from regular LLM applications? Let me break it down with a practical analogy.
A traditional LLM application is like asking someone a question at an information desk: you expect them to answer based on what they know, but they won't leave their desk to do anything for you. An agentic architecture is more like having a personal assistant: they'll not only answer your questions, but also make phone calls, look up information, and take actions on your behalf.
The foundation of this approach is what we call the Reason-Act-Observe loop:
- Reason: The agent analyzes the current state and decides what to do next 
- Act: It executes an action by calling an external tool/API 
- Observe: It processes the result from that action 
- Loop: Based on what it observed, it reasons again about the next step 
This cycle continues until the agent determines it has completed the task. It's similar to how you might approach a complex task: you don't solve problems in one leap, but through a series of steps, evaluating after each one.
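To make the loop concrete, here's a minimal sketch in plain Python. It is not how Bedrock implements orchestration internally: the decide function is a hypothetical stand-in for the model's reasoning step, get_weather is a made-up tool, and the step limit is an assumption added to keep a confused agent from looping forever.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking keyword arguments
Tools = dict[str, Callable[..., str]]


def get_weather(city: str) -> str:
    # Stand-in for a real weather API call
    return f"Sunny in {city}"


def decide(task: str, observations: list[str]) -> dict:
    """Hypothetical reasoning step; a real agent would ask the model here."""
    if not observations:
        # Nothing observed yet, so pick a tool to call
        return {"action": "call_tool", "tool": "get_weather", "args": {"city": "New York"}}
    # Enough information gathered, so finish with an answer
    return {"action": "finish", "answer": f"{task}: {observations[-1]}"}


def run_agent(task: str, tools: Tools, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        step = decide(task, observations)               # Reason
        if step["action"] == "finish":
            return step["answer"]
        result = tools[step["tool"]](**step["args"])    # Act
        observations.append(result)                     # Observe, then loop
    return "Stopped: step limit reached"


print(run_agent("Weather check", {"get_weather": get_weather}))
```

The explicit step limit is a design choice worth keeping in any real loop: it bounds both latency and token spend when the model never decides it's done.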
Here's how this translates to AWS implementations. When you build an agent on Bedrock, you're essentially defining what tools (AWS calls these "action groups") the agent can use, what data sources (knowledge bases) it can reference, and what instructions guide its behavior. The actual orchestration, deciding which tool to use when and chaining the steps together, is handled by Bedrock's agent runtime.
This approach has clear advantages. An agent can handle requests like "Find me flights to New York next weekend, check the weather forecast, and suggest some hotels near Central Park", which a single model call can't fulfill because it needs live data from several sources. By breaking the request into steps (search flights, check weather, find hotels) and calling an API for each piece of data, the agent can assemble a comprehensive response.
But this approach isn't without trade-offs. Agentic systems are more complex to configure, potentially slower (since multiple steps and API calls take time), and generally more expensive in terms of both token usage and compute costs. You're paying for the additional reasoning steps and API calls that happen behind the scenes.
Bedrock Agents: Building Blocks and Architecture
A Bedrock Agent consists of several key components:
The foundation model is the brain of your agent. For complex agents, Amazon Nova Pro is typically the best choice thanks to its stronger reasoning, combined with its 300K token context window and multimodal capabilities. For simpler tasks or cost-sensitive applications, Nova Lite (300K tokens, also multimodal) or even the text-only Nova Micro (128K tokens) might be sufficient.
The instructions define what your agent does. This is effectively a system prompt that guides the agent's behavior. For example:
You are a travel planning assistant. Your job is to help users find flights, accommodations, and plan itineraries. You have access to flight search APIs, hotel databases, and weather forecasts. Always confirm dates and locations before making any bookings. If the user's request is ambiguous, ask clarifying questions.
Action Groups (what other frameworks might call "tools") define what your agent can do in the world. Each action group contains:
- A schema (OpenAPI or function schema) describing available actions 
- A Lambda function implementing those actions 
For example, a flight search action might be defined with this schema:
openapi: "3.0.0"
info:
  title: FlightSearchAPI
  version: "1.0"
paths:
  /flights/search:
    get:
      summary: Search for flights
      description: Finds available flights between origin and destination on specified dates.
      parameters:
        - name: origin
          in: query
          required: true
          schema:
            type: string
          description: Origin airport code (e.g., "JFK")
        - name: destination
          in: query
          required: true
          schema:
            type: string
          description: Destination airport code (e.g., "LAX")
        - name: departDate
          in: query
          required: true
          schema:
            type: string
          description: Departure date (YYYY-MM-DD)
        - name: returnDate
          in: query
          required: false
          schema:
            type: string
          description: Return date for round trip (YYYY-MM-DD)
      responses:
        "200":
          description: List of matching flights
And a Lambda function to implement it:
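import json
def lambda_handler(event, context):
    # Bedrock Agents pass OpenAPI parameters as a list of
    # {"name": ..., "type": ..., "value": ...} objects, so flatten them into a dict
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    origin = params.get("origin")
    destination = params.get("destination")
    depart_date = params.get("departDate")
    return_date = params.get("returnDate")  # None for one-way trips
    # In a real implementation, you'd call your flight API
    # For this example, we'll return mock data
    flights = [
        {
            "airline": "Oceanic Airlines",
            "flightNumber": "OA815",
            "departureTime": "08:15",
            "arrivalTime": "11:30",
            "price": 299.99,
            "currency": "USD"
        },
        {
            "airline": "United Airlines",
            "flightNumber": "UA456",
            "departureTime": "13:45",
            "arrivalTime": "17:00",
            "price": 349.99,
            "currency": "USD"
        }
    ]
    body = {
        "flights": flights,
        "origin": origin,
        "destination": destination,
        "departDate": depart_date,
        "returnDate": return_date,
    }
    # Bedrock Agents expect the result wrapped in this response envelope,
    # with the body serialized as a JSON string
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {"body": json.dumps(body)}
            },
        },
    }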
Optional Knowledge Bases connect your agent to external data. These use vector embeddings (typically generated with Amazon Titan Embeddings) to find relevant information in your data sources. For instance, if you have a knowledge base of travel guides and a user asks about "things to do in Barcelona," the agent can automatically retrieve and reference the Barcelona guide.
Prompt Templates control how the agent processes information at different stages. There are four main templates:
- Pre-processing (validating user input) 
- Orchestration (driving the decision-making) 
- Knowledge Base (handling retrievals) 
- Post-processing (refining the final answer) 
The power of Bedrock Agents lies in how these components work together. When a user sends a request, the agent:
- Processes the user input 
- Enters an orchestration loop where it repeatedly: 
* Decides what to do next (answer directly or use a tool)
* If using a tool, calls the corresponding Lambda
* Processes the result and decides on next steps
- Delivers the final response once the task is complete
All of this happens automatically: your code just calls invoke_agent, and Bedrock handles the complex orchestration behind the scenes.
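Here's a minimal sketch of that call from Python, assuming client is a boto3 "bedrock-agent-runtime" client. The response's completion field is an event stream, so the helper collects the text from its chunk events. The function name ask_agent is our own, not part of the SDK.

```python
import uuid
from typing import Optional


def ask_agent(client, agent_id: str, alias_id: str, text: str,
              session_id: Optional[str] = None) -> str:
    """Send one request to a Bedrock Agent and return the full streamed answer.

    client is assumed to be a boto3 "bedrock-agent-runtime" client.
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        # Reusing the same session ID lets the agent keep conversation context
        sessionId=session_id or str(uuid.uuid4()),
        inputText=text,
    )
    parts = []
    # The completion is an event stream; answer text arrives in "chunk" events
    # (other event types, such as traces, are skipped here)
    for event in response["completion"]:
        if "chunk" in event:
            parts.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(parts)
```

With a real client, that would be ask_agent(boto3.client("bedrock-agent-runtime"), "AGENT_ID", "ALIAS_ID", "Find me flights to New York next weekend"), where the two IDs are placeholders for your agent and its alias. Passing enableTrace=True to invoke_agent adds trace events to the stream, which you could surface as intermediate progress.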
Stop copying cloud solutions, start understanding them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the Simple AWS newsletter.
Knowledge Bases and Retrieval-Augmented Generation
One of the most powerful features of Bedrock Agents is their ability to tap into your data through knowledge bases. This integration enables retrieval-augmented generation (RAG), where the agent grounds its responses in specific documents or data sources.
Setting up a knowledge base involves three steps:
- Prepare your data source. This could be documents in S3, a database, or another repository. Bedrock supports multiple file formats including PDFs, Word docs, text files, HTML, and more. 
- Create the knowledge base configuration, specifying: 
* The data source (e.g., an S3 bucket)
* An embedding model (e.g., Amazon Titan Embeddings)
* Chunk size and overlap for document splitting
* Metadata options for filtering
- Associate the knowledge base with your agent.
When a user asks a question, the agent might determine it needs external information. It then:
- Formulates a search query based on the user's question 
- Sends this query to the knowledge base 
- Receives relevant document chunks 
- Incorporates these chunks into its reasoning 
- Generates a response grounded in this information 
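If you want to run the retrieval step yourself (for example, outside an agent), the bedrock-agent-runtime client exposes it as the retrieve call. A sketch, assuming client is that boto3 client; the prompt wording and source numbering are our own, and unlike an agent, this passes the user's question through unchanged as the search query.

```python
def build_grounded_prompt(client, kb_id: str, question: str, top_k: int = 3) -> str:
    # Search the knowledge base for the chunks most similar to the question
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    chunks = [r["content"]["text"] for r in response["retrievalResults"]]
    # Number each chunk so the model can cite its sources
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the sources below. Cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

The returned prompt would then go to a model call; every retrieved chunk adds to the input token count, which is the trade-off discussed next.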
There's a trade-off to consider with knowledge bases: adding retrieved content to prompts increases token count and therefore cost. A prompt that might normally be 500 tokens could easily grow to 2,000+ tokens with retrieved content. However, the improvement in answer quality is often worth it.
The chunking strategy significantly impacts retrieval quality. If chunks are too large, they'll contain irrelevant information and waste tokens. If they're too small, they might lose important context. A good starting point is 300-500 token chunks with about 10% overlap, but you'll need to experiment based on your specific content.
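To see what chunk size and overlap mean mechanically, here's a sketch of fixed-size chunking with overlap. Whitespace-separated words stand in for tokens (an approximation, since real tokenizers split text differently), and Bedrock applies its own chunking server-side, so this only illustrates the idea:

```python
def chunk_words(text: str, chunk_size: int = 400, overlap_ratio: float = 0.10) -> list[str]:
    """Split text into chunks of ~chunk_size words, each overlapping the previous one."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")
    words = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.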
Performance and Cost Optimization
Let's talk numbers: how much will this actually cost you, and how do you keep it reasonable?
The cost of running agentic applications on Bedrock comes down to several factors:
- Model Invocation Costs: This is the primary expense. Each time the agent "thinks," it invokes the foundation model, which charges per token. For Nova models, output tokens (what the model generates) cost 4 times as much as input tokens (what you send to the model). You can view the prices on the official Bedrock pricing page. 
- Tool Execution Costs: Every tool the agent calls typically invokes a Lambda function and possibly other AWS services, each with their own costs. 
- Knowledge Base Costs: These include the initial vectorization of your data, storage of embeddings, and retrieval operations. 
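To see how model invocation costs add up, here's a back-of-the-envelope estimator using the on-demand prices from the Nova table earlier (check the pricing page for your region, since prices change). It ignores tool and knowledge base costs, and the step and token counts in the example are assumptions you'd replace with your own measurements:

```python
# USD per 1K tokens (on-demand), from the Nova table earlier in the article
PRICES = {
    "nova-micro": {"input": 0.000035, "output": 0.00014},
    "nova-lite": {"input": 0.00006, "output": 0.00024},
    "nova-pro": {"input": 0.0008, "output": 0.0032},
}


def invocation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"]


def agent_request_cost(model: str, steps: int, input_per_step: int, output_per_step: int) -> float:
    # Each orchestration step is a separate model call, so cost scales with the
    # number of steps; input_per_step is an average, since the prompt grows as
    # observations accumulate
    return steps * invocation_cost(model, input_per_step, output_per_step)


# Example: 4 reasoning steps, ~2,000 input and ~300 output tokens each
for model in PRICES:
    print(model, round(agent_request_cost(model, 4, 2000, 300), 6))
```

Multiplying the per-request figure by your expected request volume gives a rough monthly estimate to compare across models.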
Here are some strategies to optimize costs:
Use the right model for the job. Nova Micro's per-token rates are roughly 23 times lower than Nova Pro's, so consider using it for simpler tasks. You could even implement a cascading approach: try with Micro first, and only escalate to Pro for complex queries.
Optimize prompt sizes. Keep your instructions concise, trim conversation history when possible, and only include relevant information. Every token costs money.
Take advantage of prompt caching. Bedrock caches repeated portions of prompts (like instructions or tool definitions) and offers up to 90% discount on those cached tokens. This can significantly reduce costs for agents that have consistent patterns.
For high volume, consider Provisioned Throughput. Instead of paying per token, you pay an hourly rate for dedicated model units, usually in exchange for a term commitment, which can work out cheaper than on-demand when you keep that capacity consistently busy.
Monitor token usage. Set up CloudWatch alarms to alert you if usage spikes unexpectedly, which could indicate an issue with your agent's logic or potential abuse.
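As a sketch, the function below builds keyword arguments for CloudWatch's put_metric_alarm on Bedrock's OutputTokenCount metric, which Bedrock publishes in the AWS/Bedrock namespace with a ModelId dimension. The alarm name, threshold, and 5-minute period are placeholder choices, and in practice you'd also add AlarmActions pointing at an SNS topic:

```python
def token_spike_alarm(model_id: str, max_tokens_per_5min: int) -> dict:
    """Keyword arguments for boto3's cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"bedrock-output-tokens-{model_id}",
        "Namespace": "AWS/Bedrock",
        "MetricName": "OutputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",          # total tokens generated in each period
        "Period": 300,               # 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": float(max_tokens_per_5min),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic should not trigger the alarm
    }
```

You'd apply it with boto3.client("cloudwatch").put_metric_alarm(**token_spike_alarm("amazon.nova-pro-v1:0", 500000)), using the model ID your agent actually invokes and a threshold based on your observed baseline.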
As for performance, agent orchestration adds latency because of the multiple steps involved. A simple query might take 2-3 seconds, while a complex one requiring multiple tool calls could take 10+ seconds. Be upfront with users about this latency, and consider implementing a streaming interface to show intermediate progress.
Advanced Implementation Patterns
Beyond the basics, there are several advanced patterns that can enhance your agents' capabilities and efficiency.
Custom Prompt Templates: The default Bedrock templates work well, but customizing them gives you more control. For example, you might modify the orchestration template to include specific reasoning steps or decision criteria:
Given the user's request and available tools, determine the best course of action by:
1. Identifying the specific information or task the user is requesting
2. Checking if you already have all necessary information in the context
3. If not, selecting the appropriate tool or asking a clarifying question
4. Once you have all information, providing a concise answer
Remember:
- Only use tools when necessary, not for information already provided
- Always verify flight details before proceeding with any booking
- If multiple actions are needed, handle them one at a time
Model Cascading: You can implement a multi-tier approach where simple queries get handled by lightweight models and only complex ones escalate to more powerful models. This isn't built into Bedrock directly, but you can create a router function that analyzes incoming queries and dispatches them to different agents powered by different models.
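A minimal router sketch follows. The complexity heuristic (word count plus a few keywords) is a deliberately naive placeholder; a more robust option might be to use Nova Micro itself as a classifier. The agent and alias IDs are hypothetical.

```python
# Hypothetical agent aliases, each backed by a different Nova model
AGENTS = {
    "simple": {"agent_id": "MICRO_AGENT_ID", "alias_id": "MICRO_ALIAS_ID"},
    "complex": {"agent_id": "PRO_AGENT_ID", "alias_id": "PRO_ALIAS_ID"},
}

# Naive substring hints that a request is multi-step (e.g. "plan" also matches "airplane")
COMPLEX_HINTS = ("plan", "compare", "itinerary", "and then", "multiple")


def route(query: str) -> dict:
    """Pick the cheap agent unless the query looks long or multi-step."""
    q = query.lower()
    looks_complex = len(q.split()) > 40 or any(hint in q for hint in COMPLEX_HINTS)
    return AGENTS["complex" if looks_complex else "simple"]
```

The returned IDs would then be passed to invoke_agent. Misrouting a complex query to Micro costs you answer quality, while misrouting a simple one to Pro only costs money, so it's reasonable to tune the heuristic to err toward escalation.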
Chain of Agents: For complex workflows, you might create multiple specialized agents that work together. For example, a travel planning system might have separate agents for flight search, hotel recommendations, and itinerary creation. A controller coordinates between these agents, passing information between them as needed.
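Here's a sketch of the controller side, with each specialized agent modeled as a plain function that reads a shared context and returns additions to it. In a real system, each function would call invoke_agent on a different Bedrock Agent; the function names and data here are hypothetical.

```python
from typing import Callable

Context = dict[str, object]
Agent = Callable[[Context], Context]


def flight_agent(ctx: Context) -> Context:
    return {"flight": f"OA815 to {ctx['city']}"}


def hotel_agent(ctx: Context) -> Context:
    return {"hotel": f"Hotel near Central Park, {ctx['city']}"}


def itinerary_agent(ctx: Context) -> Context:
    # Depends on outputs produced by the earlier agents
    return {"itinerary": f"Fly {ctx['flight']}, stay at {ctx['hotel']}"}


def run_chain(agents: list[Agent], ctx: Context) -> Context:
    for agent in agents:
        ctx = {**ctx, **agent(ctx)}  # merge each agent's output into the shared context
    return ctx


result = run_chain([flight_agent, hotel_agent, itinerary_agent], {"city": "New York"})
print(result["itinerary"])
```

This sequential version is the simplest controller; flight and hotel search don't depend on each other, so a real controller could run those two in parallel and only wait before the itinerary step.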
Hybrid RAG Approaches: While basic RAG works well, advanced implementations might combine multiple retrieval strategies. For instance, you could implement a system that first attempts semantic search, then falls back to keyword search if the results aren't satisfactory. With Bedrock Agents, one way to do this is to expose retrieval as an action group whose Lambda function queries your data sources directly and applies the fallback logic, instead of relying only on the built-in knowledge base lookup.
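A sketch of that fallback logic, with the two search functions passed in as parameters. The score field, the 0.5 threshold, and the toy search functions are assumptions for illustration; scores from different search backends aren't directly comparable, so you'd tune the threshold against your own data.

```python
from typing import Callable

Result = dict  # expected keys: "text", "score"
Search = Callable[[str], list[Result]]


def hybrid_search(query: str, semantic: Search, keyword: Search, min_score: float = 0.5) -> list[Result]:
    good = [r for r in semantic(query) if r["score"] >= min_score]
    if good:
        return good
    # Semantic search found nothing confident enough, so fall back to keyword matching
    return keyword(query)


DOCS = ["Barcelona guide: visit the Sagrada Familia", "Rome guide: see the Colosseum"]


def toy_keyword(query: str) -> list[Result]:
    # Toy keyword search: any query term appearing in the document counts as a hit
    terms = query.lower().split()
    return [{"text": d, "score": 1.0} for d in DOCS if any(t in d.lower() for t in terms)]


def toy_semantic(query: str) -> list[Result]:
    return [{"text": DOCS[0], "score": 0.2}]  # pretend this is a weak match
```

Exact terms such as product codes or proper names are where keyword search tends to beat embeddings, which is the case this fallback is meant to catch.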
Integration with Human Workflows: For high-stakes scenarios, consider integrating human review into the agent's workflow. The agent can handle routine cases autonomously but elevate complex or risky cases to human reviewers. This requires additional orchestration logic, typically implemented through Step Functions or a similar workflow service.
Security and Access Control
Security is particularly important for agentic applications because they actively invoke services and access data. Getting this wrong means your agent could potentially do things you never intended.
The cornerstone of Bedrock Agent security is IAM. Each agent operates with an IAM service role that defines what AWS resources it can access. Follow the principle of least privilege rigidly: grant only the specific permissions needed for the agent's functions and nothing more.
Here's an example policy for the service role of an agent that only needs to invoke Nova Pro. The role itself doesn't need Lambda permissions: Bedrock's ability to call your action group functions is granted by resource-based policies on the functions themselves.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0"
        }
    ]
}
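Additionally, apply a resource-based policy to each Lambda function so that Bedrock can invoke it only on behalf of your agent (replace AGENT_ID with your agent's ID). Checking AWS:SourceAccount alone would let any agent in the account call the function; AWS:SourceArn narrows it to one agent:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:FlightSearchFunction",
            "Condition": {
                "StringEquals": {
                    "AWS:SourceAccount": "123456789012"
                },
                "ArnLike": {
                    "AWS:SourceArn": "arn:aws:bedrock:us-east-1:123456789012:agent/AGENT_ID"
                }
            }
        }
    ]
}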
For Lambda functions that access sensitive data or services, implement additional validation. Don't assume that because your agent is well-behaved, the data it passes to your functions will be well-formed or safe. Validate everything.
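For the flight search function, validation might look like the sketch below. The rules (three-letter uppercase airport codes, ISO dates, return not before departure) are assumptions about what your flight API accepts.

```python
import re
from datetime import date

AIRPORT_CODE = re.compile(r"[A-Z]{3}")


def validate_flight_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    for field in ("origin", "destination"):
        if not AIRPORT_CODE.fullmatch(str(params.get(field, ""))):
            errors.append(f"{field} must be a 3-letter airport code")
    dates = {}
    for field in ("departDate", "returnDate"):
        value = params.get(field)
        if value is None and field == "returnDate":
            continue  # optional for one-way trips
        try:
            dates[field] = date.fromisoformat(str(value))
        except ValueError:
            errors.append(f"{field} must be YYYY-MM-DD")
    if "departDate" in dates and "returnDate" in dates and dates["returnDate"] < dates["departDate"]:
        errors.append("returnDate cannot be before departDate")
    return errors
```

Returning these messages to the agent instead of calling the flight API with bad data gives the model something concrete to relay to the user, such as asking for a valid return date.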
If your agent processes personal or sensitive information, consider:
- Using Bedrock Guardrails to filter inappropriate content 
- Implementing PII detection and masking in your Lambda functions 
- Encrypting sensitive data at rest and in transit 
- Setting up comprehensive logging and auditing 
If your agent acts on behalf of specific users, ensure user identity and permissions are properly propagated. One approach is to pass user tokens through the agent's session attributes and have your Lambda functions validate these tokens before accessing user-specific resources.
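A sketch of the Lambda side: Bedrock includes the session attributes you pass to invoke_agent (in its sessionState parameter) in the event it sends to action group functions. To keep the example self-contained, an HMAC signature over the user ID stands in for a real token check; in production you'd verify a properly signed token, such as a JWT issued by Amazon Cognito. The userId and userToken attribute names and the secret are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in production, load it from a secrets store instead of hard-coding it
SECRET = b"replace-with-a-real-secret"


def sign(user_id: str) -> str:
    return hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()


def authenticated_user(event: dict) -> str:
    """Return the verified user ID from session attributes, or raise PermissionError."""
    attrs = event.get("sessionAttributes") or {}
    user_id = attrs.get("userId", "")
    token = attrs.get("userToken", "")
    # compare_digest avoids leaking information through timing differences
    if not user_id or not hmac.compare_digest(sign(user_id), token):
        raise PermissionError("invalid or missing user token")
    return user_id
```

The key design point is that the Lambda trusts only the verified identity, never a user ID the model might have produced in its parameters.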
Conclusion: The Future of Agentic Applications on AWS
Agentic applications represent a significant step forward in what's possible with AI. By combining the reasoning capabilities of foundation models with the ability to take actions in the real world, these systems can handle complex tasks that would be impossible for traditional applications.
Amazon Bedrock and the Nova model family provide a robust platform for building these applications. You get the benefit of managed infrastructure and powerful foundation models, while retaining the flexibility to integrate with your existing AWS services and data.
The patterns we've explored in this article, from action groups and knowledge bases to security controls and cost optimizations, aren't just theoretical. They're being applied today in customer service, enterprise productivity, data analysis, and many other domains.
As you start exploring this space, remember that building effective agents requires balancing several factors: technical capability, user experience, security, and cost. The most successful implementations are those that get this balance right for their specific use case.
While the technology is powerful, it's not magic. Agents have limitations: they may sometimes misunderstand requests, take longer than expected to complete tasks, or struggle with highly complex workflows. Set realistic expectations with your users, and design your applications to gracefully handle these edge cases.
Despite these challenges, the potential is enormous. As foundation models continue to improve and AWS enhances the Bedrock platform, the possibilities for intelligent, autonomous applications will only expand. The agents you build today are just the beginning of a new approach to software that's more capable, more contextual, and more helpful than ever before.
If you'd like to know more about me, you can find me on LinkedIn or at www.guilleojeda.com