Building Production-Ready AI Agents with Pydantic AI and Amazon Bedrock AgentCore

Danilo Poccia for AWS

In this third deep dive of our multi-framework series, I'll show you how to build type-safe AI agents using Pydantic AI and deploy them with Amazon Bedrock AgentCore. The complete code for this implementation, along with examples for other frameworks, is available on GitHub at agentcore-multi-framework-examples.

Pydantic AI brings the power of Python's most popular validation library to AI agent development. If you've ever struggled with unpredictable LLM outputs or spent hours debugging type mismatches in your agent code, Pydantic AI offers a refreshing solution. It enforces structure where AI tends to be chaotic, providing type safety, automatic validation, and clear data contracts.

Pydantic AI has a minimalist approach that simplifies implementations and works well for production deployments. Unlike the multi-agent orchestration of CrewAI or the model-based extensibility of Strands, Pydantic AI focuses on doing one thing exceptionally well: ensuring your agent's inputs and outputs are exactly what you expect them to be. This predictability is invaluable when building systems that other services depend on.

While Pydantic AI excels at structured outputs through Pydantic models, automatic validation of responses, and built-in error handling for type mismatches, production deployments also need to follow security and scalability best practices. Let me show you how the integration with AgentCore enables persistent memory and scalable deployments without adding complexity to the agent code.

Setting Up the Development Environment

Let's start by setting up the Pydantic AI project. If you haven't already cloned the repository:

git clone https://github.com/danilop/agentcore-multi-framework-examples.git
cd agentcore-multi-framework-examples

Now let's set up the Pydantic AI project. I'm using uv, a fast Python package installer, to manage dependencies:

cd agentcore-pydantic-ai
uv sync
source .venv/bin/activate

The project dependencies are remarkably lean:

  • pydantic-ai-slim[bedrock]: The core framework with Amazon Bedrock support
  • bedrock-agentcore: The SDK for integrating with AgentCore services
  • bedrock-agentcore-starter-toolkit: CLI tools for deployment

Notice I'm using pydantic-ai-slim rather than the full package. This gives me just the agent functionality without additional dependencies I don't need for this implementation.

Creating and Configuring AgentCore Memory

Before building our agent, let's set up AgentCore Memory. If you've already done this for previous examples, you can skip the creation steps and just copy the configuration file.

Quick Memory Setup

For those who haven't set up memory yet:

cd ../scripts
uv sync
uv run create-memory
uv run add-sample-memory
cd ../agentcore-pydantic-ai
cp ../config/memory-config.json .
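The copied memory-config.json file is what the agent loads at startup to locate the memory store. Here's a hypothetical sketch of that lookup (the key name is an assumption, so check the actual file):

import json
from pathlib import Path

# memory-config.json is written by the create-memory script
config = json.loads(Path("memory-config.json").read_text())
memory_id = config["memory_id"]  # key name is an assumption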

The memory system provides three strategies that automatically extract insights from conversations: User Preferences (behavioral patterns), Semantic Facts (domain knowledge), and Session Summaries (conversation overviews). These work together to give our agents persistent memory across sessions.
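If you're curious what the create-memory script does under the hood, here's a rough sketch using the MemoryClient from the bedrock-agentcore SDK. The strategy keys mirror the three strategies above, but the names and namespaces here are assumptions rather than the script's actual values:

from bedrock_agentcore.memory import MemoryClient

client = MemoryClient(region_name="us-east-1")  # region is illustrative

# One entry per extraction strategy; names and namespaces are assumptions
memory = client.create_memory_and_wait(
    name="MultiFrameworkAgentMemory",
    strategies=[
        {"userPreferenceMemoryStrategy": {
            "name": "UserPreferences",
            "namespaces": ["/actor/{actorId}/preferences"]}},
        {"semanticMemoryStrategy": {
            "name": "SemanticFacts",
            "namespaces": ["/actor/{actorId}/facts"]}},
        {"summaryMemoryStrategy": {
            "name": "SessionSummaries",
            "namespaces": ["/actor/{actorId}/{sessionId}/summaries"]}},
    ],
)
print(memory["id"])  # the id that ends up in memory-config.json; key name may differ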

Building the Type-Safe Agent

Pydantic AI's approach is refreshingly straightforward. Instead of complex configurations or multiple files, everything centers on the Agent class with its clean, functional API.

Understanding Message History

One of Pydantic AI's key features is its built-in message history management. Unlike other frameworks, where you might manage conversation context manually, Pydantic AI provides a structured way to maintain conversation continuity:

from typing import Any, Dict, Optional

from bedrock_agentcore.runtime.context import RequestContext
from pydantic_ai import Agent

# Session state tracking (minimal global state)
session_message_history = {}  # Dict[session_id, List[ModelMessage]]

@app.entrypoint
def invoke(payload: Dict[str, Any], context: Optional[RequestContext] = None) -> str:
    """Main entrypoint with refactored memory functionality using AgentMemoryManager."""

    prompt = payload.get('prompt')
    actor_id = payload.get("actor_id", DEFAULT_ACTOR_ID)

    # Get session_id from context (AgentCore automatically provides this)
    session_id = context.session_id if context and context.session_id else payload.get("session_id", DEFAULT_SESSION_ID)

    # Get or initialize message history for this session
    if session_id not in session_message_history:
        session_message_history[session_id] = []

    current_message_history = session_message_history[session_id]

The message history is a list of ModelMessage objects that Pydantic AI uses internally. By maintaining this history and passing it to the agent, we let the conversation flow naturally across multiple invocations.
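Here's a self-contained sketch of that continuity across two calls (the Bedrock model id is illustrative):

from pydantic_ai import Agent

# Model id is illustrative; use any Bedrock model available in your account
agent = Agent('bedrock:amazon.nova-lite-v1:0', instructions='Be concise.')

first = agent.run_sync('My favorite fruit is mango.')

# Passing the accumulated messages lets the model see the earlier turn
second = agent.run_sync('What fruit do I like?',
                        message_history=first.all_messages())
print(second.output)  # should mention mango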

AgentCore Memory Integration

Combining Pydantic AI's message history with AgentCore's long-term memory provides different levels of memory. Let me show you how the integration can be implemented:

# Get memory context to add to user prompt
memory_context = memory_manager.get_memory_context(
    user_input=prompt or "",
    actor_id=actor_id,
    session_id=session_id
)

# Create enhanced user prompt with memory context
enhanced_prompt = prompt or "Hello"
if memory_context:
    enhanced_prompt = f"{memory_context}\n\nUser: {enhanced_prompt}"
    logger.info("Added memory context to user prompt")

The MemoryManager (the same class we used in Strands Agents and CrewAI) retrieves relevant memories from past sessions. Here's how it works internally:

def get_memory_context(self, user_input: str, actor_id: str, session_id: str) -> str:
    """Get memory context as a string to be added to user input."""
    session_key = f"{actor_id}:{session_id}"
    context_parts = []

    # Load conversation history on first invocation
    if not self._initialized_sessions.get(session_key, False):
        conversations = self.memory_client.get_last_k_turns(
            memory_id=self.memory_config.memory_id,
            actor_id=actor_id,
            session_id=session_id,
            k=self.max_conversation_turns
        )

        if conversations:
            # Format as conversation history
            context_messages = []
            for turn in reversed(conversations):
                for message in turn:
                    context_messages.append(f"{message['role']}: {message['content']}")
            # Join outside the f-string (backslashes in f-string expressions
            # are a syntax error before Python 3.12)
            history = "\n".join(context_messages)
            context_parts.append(f"Recent conversation:\n{history}")

        self._initialized_sessions[session_key] = True

    # Retrieve semantically relevant memories
    if user_input:
        memories = self.memory_client.retrieve_memories(
            memory_id=self.memory_config.memory_id,
            namespace=f"/actor/{actor_id}/",
            query=user_input
        )

        if memories:
            memory_text = self._format_memories(memories)
            context_parts.append(f"Relevant long-term memory:\n{memory_text}")

    return "\n\n".join(context_parts)

This integration provides three levels of memory:

  1. Session Message History (managed by Pydantic AI): Maintains conversation flow within the current session
  2. Conversation History (via AgentCore get_last_k_turns): Loads previous conversations when a session starts
  3. Semantic Memory (via AgentCore RetrieveMemories): Searches across all memories for relevant context

By prepending this context to the user's prompt, the agent has access to historical information even in a new session, enabling truly persistent conversations.
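For example, after context retrieval, the string actually passed to the agent might look like this (the content is illustrative):

# Illustrative shape of the enhanced prompt built above
enhanced_prompt = """Recent conversation:
user: I love mangoes and papayas.
assistant: Got it, you enjoy tropical fruit.

Relevant long-term memory:
- User prefers tropical fruit over citrus.

User: What did I say about fruit?"""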

Running the Agent

With memory context prepared, I create and run the agent:

# Create agent with base instructions
agent = Agent(MODEL, instructions='Be concise, reply with one sentence.')

# Run the agent with enhanced prompt and message history
result = agent.run_sync(enhanced_prompt, message_history=current_message_history)

# The result object contains:
# - result.output: The agent's text response (what we return to the user)
# - result.all_messages(): Complete message history including this interaction

The run_sync method executes the agent synchronously, which is perfect for our serverless deployment model. The message_history parameter provides the agent with context from earlier in the conversation. The method returns a result object where result.output contains the agent's text response that we'll return to the user.
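If your application is async, Pydantic AI also offers an awaitable counterpart to run_sync. A minimal sketch (the model id is illustrative):

import asyncio

from pydantic_ai import Agent

# Illustrative model id; reuse the project's MODEL constant in practice
agent = Agent('bedrock:amazon.nova-lite-v1:0',
              instructions='Be concise, reply with one sentence.')

async def main() -> None:
    # run() is the async counterpart of run_sync()
    result = await agent.run('Hello')
    print(result.output)

asyncio.run(main())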

Storing New Messages

After the agent responds, I need to detect and store new messages in AgentCore Memory:

# Get all messages including the new interaction
all_messages_after = result.all_messages()

# Detect new messages by comparing counts
previous_message_count = len(current_message_history)
new_messages = all_messages_after[previous_message_count:]

if new_messages:
    logger.info(f"Storing {len(new_messages)} new messages in memory")
    store_pydantic_messages_in_memory(new_messages, memory_manager, actor_id, session_id)

# Update session message history (keep last NUM_MESSAGES)
session_message_history[session_id] = all_messages_after[-NUM_MESSAGES:]

This approach efficiently detects which messages are new by comparing message counts before and after the agent run. The store_pydantic_messages_in_memory function handles the Pydantic AI message format and uses the store_conversation method to store the messages in AgentCore Memory.

When store_conversation calls the create_event API internally, it not only stores the raw conversation but also triggers the memory strategies to automatically extract user preferences, semantic facts, and generate session summaries. This is how our agent builds long-term knowledge from every interaction.
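Here's a minimal sketch of what store_conversation might look like internally, assuming the create_event method of the bedrock-agentcore MemoryClient (the shared module's actual parameters may differ):

def store_conversation(self, user_input: str, response: str,
                       actor_id: str, session_id: str) -> None:
    """Persist one user/assistant turn; extraction strategies run asynchronously."""
    self.memory_client.create_event(
        memory_id=self.memory_config.memory_id,
        actor_id=actor_id,
        session_id=session_id,
        messages=[(user_input, "USER"), (response, "ASSISTANT")],
    )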

The NUM_MESSAGES limit (set to 30) prevents the message history from growing unbounded. This is important for managing token limits and ensuring consistent performance.

Integrating with AgentCore Runtime

The integration with AgentCore Runtime follows the same pattern as our other frameworks, but Pydantic AI's simplicity makes it particularly elegant:

from typing import Any, Dict, Optional

from bedrock_agentcore import BedrockAgentCoreApp
from bedrock_agentcore.runtime.context import RequestContext
from pydantic_ai import Agent

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload: Dict[str, Any], context: Optional[RequestContext] = None) -> str:
    """Main entrypoint with refactored memory functionality using AgentMemoryManager."""

    prompt = payload.get('prompt')
    actor_id = payload.get("actor_id", DEFAULT_ACTOR_ID)
    session_id = context.session_id if context and context.session_id else payload.get("session_id", DEFAULT_SESSION_ID)

    # Get or initialize message history for this session
    if session_id not in session_message_history:
        session_message_history[session_id] = []

    current_message_history = session_message_history[session_id]

    # Get memory context to enhance the prompt
    memory_context = memory_manager.get_memory_context(
        user_input=prompt or "",
        actor_id=actor_id,
        session_id=session_id
    )

    # Create enhanced prompt with memory context
    enhanced_prompt = prompt or "Hello"
    if memory_context:
        enhanced_prompt = f"{memory_context}\n\nUser: {enhanced_prompt}"

    # Create and run the agent
    agent = Agent(MODEL, instructions='Be concise, reply with one sentence.')
    result = agent.run_sync(enhanced_prompt, message_history=current_message_history)

    # Store new messages in memory
    new_messages = result.all_messages()[len(current_message_history):]
    if new_messages:
        store_pydantic_messages_in_memory(new_messages, memory_manager, actor_id, session_id)

    # Update session message history
    session_message_history[session_id] = result.all_messages()[-NUM_MESSAGES:]

    # Return the agent's string output
    return result.output

if __name__ == "__main__":
    app.run()

The BedrockAgentCoreApp provides all the infrastructure scaffolding, while the @app.entrypoint decorator marks our handler function.

The key line is result = agent.run_sync(enhanced_prompt, message_history=current_message_history), where the Pydantic AI agent processes the request. The run_sync() method returns a result object, and we return result.output, which contains the agent's text response as a string. AgentCore Runtime automatically wraps this string in the appropriate HTTP response structure, so you don't need to worry about response formatting: just return the text content you want to send back to the user.

This implementation is simple: there's no complex class hierarchy, no multiple configuration files, just a straightforward function that processes requests and returns responses. This aligns perfectly with Pydantic AI's minimalism and clarity.

The RequestContext automatically provides session management:

# Get session_id from context (AgentCore automatically provides this)
session_id = context.session_id if context and context.session_id else payload.get("session_id", DEFAULT_SESSION_ID)
logger.info(f"Using session_id from context: {session_id}")

Session isolation is implemented at the infrastructure level: each user gets their own session that can persist across invocations but remains completely isolated from other users.

When deployed, AgentCore Runtime handles:

  • Request routing: incoming requests are forwarded to your agent's entrypoint
  • Session management: automatic session tracking and isolation
  • Scaling: the runtime scales automatically based on request volume
  • Error handling: built-in retry logic and error reporting
  • Logging: CloudWatch integration for monitoring and debugging

Your Pydantic AI agent doesn't need to know about any of this infrastructure—it just focuses on processing messages and returning responses.

Memory Manager Architecture

The Pydantic AI implementation uses the store_conversation() method from the shared memory module, with framework-specific message conversion logic in the main application file:

def convert_pydantic_messages_for_storage(messages: List[Any]) -> List[tuple]:
    """Convert Pydantic AI message objects to memory storage format."""
    messages_to_store = []

    for msg in messages:
        # Handle Pydantic AI ModelMessage objects
        if hasattr(msg, 'parts') and msg.parts:
            for part in msg.parts:
                if hasattr(part, 'content'):
                    content = part.content

                    # Default to ASSISTANT so role is always defined,
                    # even for messages without a 'kind' attribute
                    role = 'ASSISTANT'
                    if getattr(msg, 'kind', None) == 'request':
                        role = 'USER' if part.part_kind == 'user-prompt' else 'SYSTEM'

                    messages_to_store.append((content, role))

    return messages_to_store

def store_pydantic_messages_in_memory(new_messages, memory_manager, actor_id, session_id):
    """Store Pydantic AI messages using the unified store_conversation method."""
    messages_to_store = convert_pydantic_messages_for_storage(new_messages)

    # Store each message pair using store_conversation
    for i in range(0, len(messages_to_store), 2):
        if i + 1 < len(messages_to_store):
            user_content = messages_to_store[i][0]
            assistant_content = messages_to_store[i + 1][0]

            memory_manager.store_conversation(
                user_input=user_content,
                response=assistant_content,
                actor_id=actor_id,
                session_id=session_id
            )

This approach keeps the shared memory module clean and unified while handling the Pydantic AI message format in the application-specific code.

Shared Memory Architecture

Like the other frameworks in this series, Pydantic AI uses the same unified memory management module. This architectural decision allows complete portability of memories across frameworks.

The shared memory.py module provides:

Core Components:

  • MemoryConfig class for centralized configuration management
  • MemoryManager class with framework-agnostic memory operations
  • retrieve_memories_for_actor() function for semantic search
  • format_memory_context() function for consistent formatting

Unified Interface:
All frameworks call the same store_conversation() method; each application first converts its framework's message objects into (content, role) tuples. For frameworks whose messages expose simple role and content attributes, that conversion can be as small as:

def _convert_messages_for_storage(self, messages: List[Any]) -> List[Tuple[str, str]]:
    """Convert Pydantic AI messages to AgentCore format."""
    converted = []
    for msg in messages:
        if hasattr(msg, 'role') and hasattr(msg, 'content'):
            role = 'USER' if msg.role == 'user' else 'ASSISTANT'
            converted.append((str(msg.content), role))
    return converted

This demonstrates how the shared architecture adapts to each framework's specific needs while maintaining the same core functionality. Memories created by a Pydantic AI agent can be accessed by agents built with Strands, CrewAI, or any other framework in the series.
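Concretely, an agent in any framework can query memories written by this one, as long as it uses the same memory_id and namespace. A sketch, reusing the calls shown earlier (memory_client and memory_config come from the shared module's setup):

# Query memories created by the Pydantic AI agent from any other framework
memories = memory_client.retrieve_memories(
    memory_id=memory_config.memory_id,
    namespace=f"/actor/{actor_id}/",
    query="fruit preferences",
)
print(format_memory_context(memories))  # shared formatting helper from memory.py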

Testing Locally

Before deploying to production, let's test locally. Configure the agent for AgentCore:

agentcore configure -n pydanticaiagent -e main.py

Press Enter to accept all defaults. This generates the deployment configuration for your agent.

Launch the agent locally:

agentcore launch --local

Test it with:

agentcore invoke --local '{ "prompt": "What did I say about fruit?" }'

The agent should retrieve the sample memory about fruit preferences and respond concisely. Notice how the response is indeed one sentence, following the instruction we provided.

Test conversation continuity:

agentcore invoke --local '{ "prompt": "Tell me more about my preferences" }'

The agent maintains context from the previous message and can elaborate on your preferences while staying concise.

Deploying to Production

Once local testing is complete, deploy to AWS:

agentcore launch

AgentCore Runtime handles all the deployment complexity:

  • Building and pushing container images to Amazon ECR
  • Creating the AgentCore Runtime agent with proper configuration
  • Setting up the invocation endpoint
  • Configuring IAM permissions
  • Enabling CloudWatch logging

Check your deployment status:

agentcore status

Test the production deployment:

agentcore invoke '{ "prompt": "What did I say about fruit?" }'

Monitor logs in real-time:

aws logs tail /aws/bedrock-agentcore/runtimes/<AGENT_ID-ENDPOINT_ID> --follow

What's Next

This Pydantic AI implementation demonstrates how type safety and simplicity can coexist in production AI systems. The minimal API surface area makes the code easy to understand and maintain, while the integration with AgentCore Memory provides the persistence needed for meaningful conversations.

What strikes me most about Pydantic AI is how it manages to be both simple and powerful. There's no magic, no hidden complexity—just clean Python code with predictable behavior. This predictability becomes invaluable as your system grows and other services start depending on your agent's outputs.

In the next article, I'll explore LlamaIndex, showing how to build agents with advanced retrieval capabilities and knowledge management. You'll see how the same memory architecture and deployment patterns adapt to a framework designed for working with large document collections.

The complete code is available on GitHub. I encourage you to experiment with adding Pydantic models for structured outputs, implementing custom validators, or building agents that return complex data types.
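For instance, here's a minimal structured-output sketch, assuming a recent pydantic-ai version where output_type is supported (the model id is illustrative):

from pydantic import BaseModel
from pydantic_ai import Agent

class FruitPreferences(BaseModel):
    liked: list[str]
    disliked: list[str]

# output_type makes Pydantic AI validate the model's response against
# the schema and retry when validation fails
agent = Agent(
    'bedrock:amazon.nova-lite-v1:0',  # illustrative model id
    output_type=FruitPreferences,
    instructions='Extract the fruit preferences from the message.',
)

result = agent.run_sync('I love mangoes and papayas, but not durian.')
print(result.output.liked)     # ['mangoes', 'papayas']
print(result.output.disliked)  # ['durian']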

Ready to build your own type-safe AI agent? Clone the repo and start experimenting!
