In this first deep dive of our multi-framework series, I'll show you how to build a production-ready AI agent using Strands Agents and deploy it using Amazon Bedrock AgentCore. The complete code for this implementation, along with examples for other frameworks, is available on GitHub at agentcore-multi-framework-examples.
Strands Agents embodies a model-driven philosophy that aligns perfectly with the rapid improvements in foundation models. Rather than imposing complex orchestration logic, it lets the model's capabilities drive the agent's behavior, resulting in cleaner, more maintainable code.
Strands has a hook-based architecture that provides an elegant way to extend agent functionality without cluttering the main logic. This makes it perfect for integrating with AgentCore's memory system—we can handle all the complexity of conversation persistence and memory extraction in dedicated hooks while keeping our agent code focused and clean.
Setting Up the Development Environment
Let's start by cloning the complete repository with all framework examples:
git clone https://github.com/danilop/agentcore-multi-framework-examples.git
cd agentcore-multi-framework-examples
Now let's set up the Strands Agents project. I'm using uv, a fast Python package installer, to manage dependencies:
cd agentcore-strands-agents
uv sync
source .venv/bin/activate
The project dependencies include:
- strands-agents: The core framework for building our agent
- strands-agents-tools: Community-provided tools like calculator
- bedrock-agentcore: The SDK for integrating with AgentCore services
- bedrock-agentcore-starter-toolkit: CLI tools for deployment
Creating and Configuring AgentCore Memory
Before we build our agent, let's set up AgentCore Memory. This service will store our agent's conversations and extract meaningful insights that persist across sessions.
Understanding Memory Strategies
AgentCore Memory provides three built-in strategies that automatically extract different types of information from conversations:
User Preferences: Captures recurring patterns in user behavior, interaction styles, and choices. For example, if a user consistently prefers detailed explanations or always asks for code examples, this gets stored as a preference.
Semantic Facts: Maintains knowledge of facts and domain-specific information. When users mention facts like "our company has 500 employees" or "the API endpoint is api.example.com", these get extracted and stored.
Session Summaries: Creates condensed representations of conversations. After each session, the system generates a summary capturing the main topics discussed, decisions made, and action items.
Creating the Memory Instance
I'll use the provided script to create a memory instance with all three strategies:
cd scripts
uv sync
uv run create-memory
This script creates a new AgentCore Memory instance and configures it with all three strategies. Here's what happens behind the scenes:
MEMORY_STRATEGIES = [
    {
        "userPreferenceMemoryStrategy": {
            "name": "UserPreferences",
            "namespaces": ["/actor/{actorId}/strategy/{memoryStrategyId}"]
        }
    },
    {
        "semanticMemoryStrategy": {
            "name": "SemanticFacts",
            "namespaces": ["/actor/{actorId}/strategy/{memoryStrategyId}/{sessionId}"]
        }
    },
    {
        "summaryMemoryStrategy": {
            "name": "SessionSummaries",
            "namespaces": ["/actor/{actorId}/strategy/{memoryStrategyId}/{sessionId}"]
        }
    }
]
The namespaces organize memories hierarchically. Each actor (user) has their own isolated memory space, and within that, memories are organized by strategy and session. This provides data isolation between users while allowing memories to be shared across sessions for the same user.
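To make the hierarchy concrete, here's a small sketch (with made-up IDs) of how a namespace template resolves for a specific actor, strategy, and session:

```python
# Sketch: resolving an AgentCore Memory namespace template for a
# concrete actor, strategy, and session. The IDs are made-up examples.
def resolve_namespace(template: str, **values: str) -> str:
    """Substitute {placeholder} variables in a namespace template."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template

semantic_namespace = resolve_namespace(
    "/actor/{actorId}/strategy/{memoryStrategyId}/{sessionId}",
    actorId="my-user-id",
    memoryStrategyId="SemanticFacts-abc123",
    sessionId="DEFAULT",
)
print(semantic_namespace)
# /actor/my-user-id/strategy/SemanticFacts-abc123/DEFAULT
```

Because the semantic and summary namespaces include the session ID while the preferences namespace does not, preferences accumulate per user across all sessions.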
The script saves the memory configuration to ../config/memory-config.json:
{
"memory_id": "mem-abc123..."
}
Adding Sample Memory
To demonstrate how memory works, I'll add a sample memory event that we can retrieve later:
uv run add-sample-memory
This adds a simple user message to the memory: "I like apples but not bananas". The script stores this as a conversation event:
messages_to_store = [
    ("I like apples but not bananas", "USER")
]

memory_client.create_event(
    memory_id=memory_id,
    actor_id="my-user-id",
    session_id="DEFAULT",
    messages=messages_to_store
)
Now when our agent is asked about fruit preferences later, it will be able to retrieve this memory even in a completely new session. This demonstrates the power of persistent memory—the agent remembers information from past interactions.
Don't forget to copy the configuration to our project directory:
cd ..
cp config/memory-config.json agentcore-strands-agents/
cd agentcore-strands-agents
Building the Agent with Hooks
Now let's build our agent. The architecture uses the hook system in Strands to cleanly separate memory management from the main agent logic.
Understanding the Hook System
Strands provides a powerful hook system that allows us to subscribe to lifecycle events and extend agent functionality without modifying the core logic. I've created two complementary hooks for memory management:
Hook 1: Short-Term Memory (ShortMemoryHook)
This hook handles conversation persistence within and across sessions. It subscribes to two events:
AgentInitializedEvent - Loading Conversation History
When the agent starts up, this hook retrieves previous conversation history and injects it into the agent's context:
class ShortMemoryHook(HookProvider):
    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(AgentInitializedEvent, self.on_agent_initialized)
        registry.add_callback(MessageAddedEvent, self.on_message_added)

    def on_agent_initialized(self, event: AgentInitializedEvent) -> None:
        # Load conversation history when agent starts
        conversations = self.memory_client.get_last_k_turns(
            memory_id=self.memory_id,
            actor_id=event.agent.state.get("actor_id"),
            session_id=event.agent.state.get("session_id"),
            k=100  # Retrieve up to 100 previous conversation turns
        )
        if conversations:
            # Format conversation history for context
            context_messages = []
            for turn in reversed(conversations):
                for message in turn:
                    context_messages.append(f"{message['role']}: {message['content']}")
            # Inject into agent's system prompt
            history = "\n".join(context_messages)
            event.agent.system_prompt += f"\n\nRecent conversation:\n{history}"
The get_last_k_turns API retrieves previous conversation turns from AgentCore Memory. By injecting this into the system prompt, the agent maintains context even if the runtime session restarts or the user returns after a break.
MessageAddedEvent - Persisting New Messages
After each message is added to the conversation, this hook stores it in memory:
def on_message_added(self, event: MessageAddedEvent) -> None:
    # Extract the last message
    last_message = event.agent.messages[-1]
    last_message_tuple = (json.dumps(last_message["content"]), last_message["role"])
    # Store in AgentCore Memory
    self.memory_client.create_event(
        memory_id=self.memory_id,
        actor_id=event.agent.state.get("actor_id"),
        session_id=event.agent.state.get("session_id"),
        messages=[last_message_tuple]
    )
The create_event API stores each message immediately, building the conversation history in real-time.
Hook 2: Long-Term Memory (LongTermMemoryHook)
This hook retrieves relevant memories from past sessions before each model invocation:
class LongTermMemoryHook(HookProvider):
    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeInvocationEvent, self.on_before_invocation)

    def on_before_invocation(self, event: BeforeInvocationEvent) -> None:
        # Only process user messages
        last_message = event.agent.messages[-1]
        if last_message.get("role") != "USER":
            return
        user_query = last_message.get("content", "")
        # Semantic search for relevant memories
        retrieved_memories = retrieve_memories_for_actor(
            memory_id=self.memory_config.memory_id,
            actor_id=event.agent.state.get("actor_id"),
            search_query=user_query,
            memory_client=self.memory_client
        )
        if retrieved_memories:
            # Format and inject memories into context
            memory_context = format_memory_context(retrieved_memories)
            event.agent.system_prompt += f"\n\nRelevant long-term memory context:\n{memory_context}"
The RetrieveMemories operation performs semantic search across all stored memories. It finds the most relevant facts, preferences, and summaries based on the current query. This happens automatically before every model invocation, ensuring the agent always has access to relevant historical context.
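The format_memory_context helper used above lives in the shared memory module. Here's a minimal sketch of what such a helper might do, assuming each retrieved record carries its text under a content field (the record shape is an assumption for illustration, not the repository's exact code):

```python
from typing import Any, Dict, List

def format_memory_context(memories: List[Dict[str, Any]]) -> str:
    """Turn retrieved memory records into a bulleted text block
    suitable for appending to the system prompt."""
    lines = []
    for memory in memories:
        # Assumed record shape: {"content": {"text": "..."}}
        text = memory.get("content", {}).get("text", "")
        if text:
            lines.append(f"- {text}")
    return "\n".join(lines)

sample = [
    {"content": {"text": "User likes apples but not bananas"}},
    {"content": {"text": "User prefers concise answers"}},
]
print(format_memory_context(sample))
# - User likes apples but not bananas
# - User prefers concise answers
```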
The Main Agent Entry Point
import logging
from typing import Any, Dict, Optional

from bedrock_agentcore import BedrockAgentCoreApp
from bedrock_agentcore.runtime.context import RequestContext
from strands import Agent, tool
from strands_tools import calculator

logger = logging.getLogger(__name__)

app = BedrockAgentCoreApp()
agent = None

@app.entrypoint
def invoke(payload: Dict[str, Any], context: Optional[RequestContext] = None) -> Dict[str, Any]:
    """AI agent entrypoint"""
    global agent
    actor_id = payload.get("actor_id", "my-user-id")
    session_id = context.session_id if context and context.session_id else payload.get("session_id", "DEFAULT")
    if agent is None:
        agent = create_agent(actor_id, session_id)
    user_message = payload.get("prompt", "Explain what you can do for me.")
    try:
        result = agent(user_message)
        return {"result": result.message}
    except Exception as e:
        logger.error("Error during agent invocation: %s", e)
        return {"error": "An error occurred while processing your request"}

def main():
    """Main entry point for the application."""
    app.run()

if __name__ == "__main__":
    main()
The BedrockAgentCoreApp handles all infrastructure concerns—HTTP server setup, request routing, and error handling. The @entrypoint decorator marks the function that AgentCore Runtime will invoke.

The app.run() call at the bottom is crucial for local development. When executed directly (not deployed to AgentCore Runtime), it starts an HTTP server at http://localhost:8080 that listens for requests at the /invocations endpoint. This allows you to test your agent locally with the same interface it will have when deployed to production. The SDK automatically detects whether it's running locally or in a Docker container and configures itself appropriately.
Agentic Memory Retrieval: Beyond Automatic Context
While the LongTermMemoryHook provides automatic memory retrieval for every invocation (similar to standard RAG), I've also added a memory retrieval tool that enables agentic RAG capabilities:
@tool
def retrieve_memories(query: str) -> List[Dict[str, Any]]:
    """Retrieve memories from the memory client.

    Args:
        query: The search query to find relevant memories.

    Returns:
        A list of memories retrieved from the memory client.
    """
    actor_id = agent.state.get("actor_id")
    return retrieve_memories_for_actor(
        memory_id=memory_config.memory_id,
        actor_id=actor_id,
        search_query=query,
        memory_client=memory_client
    )
The @tool decorator makes this function available to the agent. But why have both automatic retrieval (the hook) and tool-based retrieval?
Understanding RAG in Our Memory System
Before diving into the differences, let's understand how our memory retrieval relates to RAG (Retrieval-Augmented Generation). RAG is a technique where an AI model retrieves relevant information from a knowledge base before generating its response, rather than relying solely on its training data. This retrieved information "augments" the generation process, providing fresh, relevant context.
In our implementation, AgentCore Memory acts as the knowledge base. When we retrieve memories—whether user preferences, semantic facts, or session summaries—we're essentially doing RAG. The memories are retrieved based on semantic similarity to the query, then injected into the agent's context (via the system prompt) to augment its response generation. This is exactly the RAG pattern, just applied to conversation memories rather than documents.
Standard RAG vs Agentic RAG
The key difference lies in who controls the retrieval process:
Standard RAG (via BeforeInvocationEvent hook):
- Automatically retrieves memories for every query
- Searches based on the user's direct input
- Static, reactive approach—always follows the same pattern
- Similar to traditional RAG systems where retrieval is hardcoded into the pipeline
- The system decides what to retrieve based on fixed rules
Agentic RAG (via retrieve_memories tool):
- The agent decides when to search memories
- The agent determines what to search for, which may differ from the initial query
- Dynamic, adaptive approach—the agent can:
- Skip retrieval if it already has sufficient context
- Search for related concepts not mentioned in the query
- Iteratively refine searches based on initial results
- Cross-reference multiple topics to build comprehensive understanding
- Reason about what information would be most helpful
For example, if a user asks "What should I cook for dinner?", the automatic retrieval might search for "dinner" memories. But with the agentic approach, the agent might decide to search for "dietary restrictions", then "favorite cuisines", and finally "ingredients on hand"—building a more complete picture through multiple targeted searches. The agent is reasoning about what information it needs, not just reacting to keywords.
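Conceptually, the agentic pattern amounts to several targeted searches whose results are merged before answering. Here's a toy sketch with a stubbed memory store (names and data are illustrative, not from the repository):

```python
from typing import Dict, List

# Stubbed memory store standing in for AgentCore Memory semantic search
FAKE_MEMORIES: Dict[str, List[str]] = {
    "dietary restrictions": ["User is vegetarian"],
    "favorite cuisines": ["User loves Italian food"],
    "ingredients on hand": ["User has tomatoes and basil"],
}

def retrieve(query: str) -> List[str]:
    """Stub for the retrieve_memories tool."""
    return FAKE_MEMORIES.get(query, [])

def gather_context(queries: List[str]) -> List[str]:
    """Merge the results of several targeted searches."""
    results: List[str] = []
    for query in queries:
        results.extend(retrieve(query))
    return results

# The agent, not the pipeline, chooses which searches to run:
context = gather_context(
    ["dietary restrictions", "favorite cuisines", "ingredients on hand"]
)
print(context)
# ['User is vegetarian', 'User loves Italian food', 'User has tomatoes and basil']
```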
This combination gives us the best of both worlds: guaranteed context from automatic retrieval (ensuring we never miss obvious relevant memories) plus the flexibility for the agent to explore memories strategically for complex reasoning tasks.
I also include the calculator tool from the strands-agents-tools package, demonstrating how easy it is to combine custom and pre-built tools.
Bringing It All Together
The agent creation combines hooks and tools:
def create_agent(actor_id: str, session_id: str) -> Agent:
    """Create and configure the agent with hooks and tools."""
    agent = Agent(
        hooks=[
            ShortMemoryHook(memory_id=memory_config.memory_id),
            LongTermMemoryHook(memory_id=memory_config.memory_id)
        ],
        tools=[calculator, retrieve_memories],
        state={"actor_id": actor_id, "session_id": session_id}
    )
    return agent
Testing Locally
Before deploying to the cloud, let's test our agent locally. First, configure it for AgentCore:
agentcore configure -n strandsagent -e src/agentcore_strands_agents/agent.py
Press Enter to accept the defaults. This creates the necessary AWS resources like IAM roles and ECR repositories.
Now launch the agent locally:
agentcore launch --local
This starts a local container running your agent. In another terminal, test it:
agentcore invoke --local '{ "prompt": "What did I say about fruit?" }'
The agent should retrieve the sample memory we added earlier and respond with something like "You mentioned that you like apples but not bananas." This confirms that our memory system is working!
Try a follow-up question:
agentcore invoke --local '{ "prompt": "Based on my preferences, would I enjoy apple pie?" }'
The agent maintains context from the previous interaction and can reason about your preferences.
Deploying to Production
Once you're satisfied with local testing, deploying to AWS is simple:
agentcore launch
AgentCore Runtime handles all the complexity:
- Building and pushing container images
- Creating the AgentCore Runtime and its invocation endpoint
- Configuring IAM permissions
- Enabling CloudWatch logging
Check the deployment status:
agentcore status
This shows your endpoint ARN, CloudWatch logs location, and other deployment details.
Test the production deployment:
agentcore invoke '{ "prompt": "What did I say about fruit?" }'
The AgentCore starter toolkit automatically preserves the session, so further invocations continue the conversation:
agentcore invoke '{ "prompt": "Thanks for remembering that!" }'
Shared Memory Architecture
One key architectural decision I made was creating a unified memory management module that's identical across all framework implementations in this series. This memory.py module contains two classes and two standalone functions:

Classes:
- MemoryConfig: Manages centralized configuration
  - __init__() method: Loads the memory configuration from the JSON file
  - memory_id property: Returns the configured memory ID
- MemoryManager: High-level interface for all memory operations
  - get_memory_context() method: Retrieves both conversation history and relevant memories
  - store_conversation() method: Saves user input and agent responses
  - Additional helper methods for managing session state

Standalone Functions:
- retrieve_memories_for_actor(): Performs semantic search across memory namespaces for a specific actor
- format_memory_context(): Formats retrieved memories into consistent text for injection into prompts

By sharing this module across Strands Agents, CrewAI, Pydantic AI, LlamaIndex, and LangGraph implementations, I ensure consistency and portability. Memory created by one framework can be used by another, and improvements benefit all implementations. Each framework uses these components slightly differently—for example, Strands Agents primarily uses the standalone functions within its hooks, while other frameworks might instantiate the MemoryManager class directly.
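As one illustration, a minimal MemoryConfig could look like the sketch below, built only from the JSON file shown earlier (the module in the repository may differ):

```python
# Hypothetical sketch of MemoryConfig. The JSON file name and the
# "memory_id" key match the config shown earlier in this post; the
# rest is an illustrative assumption, not the repository's exact code.
import json
import os
import tempfile

class MemoryConfig:
    """Loads the AgentCore Memory ID from memory-config.json."""

    def __init__(self, config_path: str = "memory-config.json") -> None:
        with open(config_path, "r", encoding="utf-8") as f:
            self._config = json.load(f)

    @property
    def memory_id(self) -> str:
        return self._config["memory_id"]

# Demo with a throwaway config file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "memory-config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"memory_id": "mem-abc123"}, f)
    print(MemoryConfig(path).memory_id)
# mem-abc123
```

Centralizing this lookup means no framework implementation hardcodes a memory ID; they all read the file the create-memory script wrote.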
Production Considerations
The combination of Strands Agents and AgentCore provides several production-ready features:
Security: Each session runs in an isolated microVM, preventing data leakage between users. Session isolation at the infrastructure level strengthens data privacy.
Scalability: AgentCore Runtime automatically scales based on demand, handling everything from a few requests to thousands of concurrent sessions.
Observability: Built-in CloudWatch integration provides logs, metrics, and traces for monitoring agent behavior and debugging issues.
Memory Persistence: Conversations and extracted insights persist beyond session boundaries, enabling truly personalized experiences.
Framework Flexibility: The clean separation between agent logic and infrastructure means you can evolve your agent implementation without changing deployment configuration.
What's Next
This Strands Agents implementation demonstrates how to build a clean, maintainable agent with persistent memory and production-ready deployment. The hook-based architecture keeps concerns separated, making the code easy to test and evolve.
In the next article, I'll show how to build collaborative multi-agent systems with CrewAI, using the same AgentCore infrastructure and memory configuration. You'll see how different frameworks can leverage the same deployment patterns while bringing their unique strengths to the table.
The complete code is available on GitHub. I encourage you to explore the repository, experiment with the implementation, and see how AgentCore simplifies the journey from prototype to production.
Ready to build your own production AI agent? Clone the repo and start experimenting!