Build production-ready agents with the Strands Agents SDK, an open-source framework from AWS
Step 6: Safety at Scale with Amazon Bedrock Guardrails
Enterprise agents need guardrails: not optional, not nice-to-have, but required. This step integrates Amazon Bedrock Guardrails directly into a Strands agent to build a customer support assistant that cannot be coerced into giving financial advice, responding to hate speech, or leaking PII.
Guardrail configuration covers four pillars:
- Topic policies — deny specific conversation topics (e.g., fiduciary advice)
- Content policies — filter hate, violence, sexual content, prompt injection
- Word policies — block specific phrases and managed profanity lists
- Blocked messaging — custom messages shown when guardrails fire
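The four pillars map onto the payload of the Bedrock `create_guardrail` API. A minimal sketch of that payload is below; field names follow the boto3 Bedrock API at the time of writing, so verify them against the current documentation before use.

```python
# Illustrative create_guardrail payload covering the four pillars.
guardrail_config = {
    "name": "support-assistant-guardrail",
    # Topic policy: deny fiduciary advice outright
    "topicPolicyConfig": {
        "topicsConfig": [{
            "name": "Fiduciary Advice",
            "definition": "Providing personalized financial or investment advice.",
            "type": "DENY",
        }]
    },
    # Content policy: filter harmful categories and prompt attacks
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    # Word policy: explicit phrases plus the managed profanity list
    "wordPolicyConfig": {
        "wordsConfig": [{"text": "guaranteed returns"}],
        "managedWordListsConfig": [{"type": "PROFANITY"}],
    },
    # Blocked messaging: what the user sees when the guardrail fires
    "blockedInputMessaging": "Sorry, I can't help with that request.",
    "blockedOutputsMessaging": "Sorry, I can't provide that information.",
}

# A real deployment would then call:
#   bedrock = boto3.client("bedrock")
#   response = bedrock.create_guardrail(**guardrail_config)
```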
Integration with Strands requires only wiring the guardrail to the BedrockModel:
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    guardrail_id=guardrail_id,
    guardrail_version="DRAFT",
    guardrail_trace="enabled",
    guardrail_redact_input=True,
    guardrail_redact_input_message="Guardrail Intervened and Redacted",
)
agent = Agent(model=bedrock_model, tools=[...])
A clever SDK behavior: when a guardrail fires, the offending user input is automatically overwritten in conversation history with a neutral placeholder, so follow-up turns are not contaminated by the blocked content.
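To make the behavior concrete, here is a standalone simulation of that redaction step. This is not the actual Strands internals, just an illustration of the effect on conversation history:

```python
# Simulation of guardrail input redaction: when a guardrail intervenes,
# the offending user turn is overwritten in history with a neutral
# placeholder so later turns never see the blocked content.
REDACTION_MESSAGE = "Guardrail Intervened and Redacted"

def redact_last_user_turn(messages):
    """Overwrite the most recent user message with a neutral placeholder."""
    for msg in reversed(messages):
        if msg["role"] == "user":
            msg["content"] = [{"text": REDACTION_MESSAGE}]
            break
    return messages

history = [
    {"role": "user", "content": [{"text": "Tell me which stocks to buy."}]},
]
redact_last_user_turn(history)
print(history[0]["content"][0]["text"])  # Guardrail Intervened and Redacted
```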
Key Takeaway: Guardrails are a first-class citizen in Strands. A single BedrockModel configuration change wraps your entire agent with enterprise-grade safety. The automatic input redaction in conversation history is a subtle but critical feature for stateful chat applications.
Step 7: Persistent Memory Across Sessions
Stateless agents forget everything between conversations. This step adds persistent memory using Mem0, wiring it as a built-in tool (mem0_memory) from strands-agents-tools. The agent can store, retrieve, and list memories across sessions, enabling genuine personalization.
from strands import Agent
from strands_tools import mem0_memory

memory_agent = Agent(
    system_prompt=SYSTEM_PROMPT,
    tools=[mem0_memory, websearch],
)
# Store a preference
memory_agent.tool.mem0_memory(action="store", content="I prefer tea over coffee.", user_id=USER_ID)
# Retrieve it later
memory_agent.tool.mem0_memory(action="retrieve", query="drink preferences", user_id=USER_ID)
The memory backend is configurable: OpenSearch Serverless (recommended for AWS deployments), FAISS (for local development with no external dependencies), or the Mem0 Platform API (fully managed SaaS).
Key Takeaway: Long-term personalization doesn’t require a custom database schema. The mem0_memory built-in tool handles storage, semantic retrieval, and listing with three actions: store, retrieve, and list. Pair it with OpenSearch Serverless for a scalable, serverless memory layer.
Step 8: Observability and Evaluation with LangFuse and RAGAS
You can’t improve what you can’t measure. This step builds a full observability and evaluation pipeline around the restaurant agent, combining LangFuse for distributed tracing and RAGAS for LLM-as-a-judge evaluation, all closing the loop by pushing scores back into LangFuse.
Tracing is enabled via OpenTelemetry — just set a few environment variables and Strands emits structured traces automatically:
import os

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = langfuse_endpoint
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth_token}"

agent = Agent(
    model=model,
    tools=[retrieve, current_time, ...],
    trace_attributes={
        "session.id": "abc-1234",
        "user.id": "user@domain.com",
        "langfuse.tags": ["Agent-SDK", "Observability"],
    },
)
No code changes needed inside the agent. Every tool call, model inference, and reasoning step appears as a span in LangFuse.
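For reference, the `auth_token` in the OTLP header is typically HTTP Basic auth built from the LangFuse public/secret key pair. A minimal sketch, with placeholder key values:

```python
import base64

# Placeholder keys -- substitute your real LangFuse key pair.
public_key = "pk-lf-..."
secret_key = "sk-lf-..."

# Basic auth: base64("public:secret")
auth_token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
# os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth_token}"
```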
Evaluation uses RAGAS metrics, powered by Amazon Nova Premier as the judge LLM. The lab defines two types of custom metrics:
- AspectCritic (binary, 0 or 1): evaluates whether the response satisfies a specific criterion — request completeness, brand voice, tool usage effectiveness.
- RubricsScore (discrete multi-level): evaluates nuanced behavior like recommendation quality, scoring -1 (wrong behavior), 0 (not applicable), or +1 (correct behavior).
For RAG-specific turns, two additional metrics run: ContextRelevance (are retrieved documents pertinent to the query?) and ResponseGroundedness (is the answer actually grounded in retrieved context?).
The pipeline fetches traces from LangFuse, converts them into RAGAS SingleTurnSample or MultiTurnSample objects depending on whether retrieved context is present, runs evaluation, and writes scores back to each trace:
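The routing decision can be sketched as a small predicate; this is a simplification of the lab's pipeline, not its exact code:

```python
# Illustrative routing: traces that carry retrieved context are evaluated
# as RAG samples (query + answer + contexts); the rest are evaluated as
# plain conversation samples.
def classify_trace(trace):
    """Return which RAGAS sample type a trace dict should map to."""
    if trace.get("retrieved_contexts"):
        return "SingleTurnSample"   # RAG turn: context metrics apply
    return "MultiTurnSample"        # conversational turn: no RAG context

traces = [
    {"input": "Best pasta place?", "retrieved_contexts": ["Menu: ..."]},
    {"input": "Book a table for two", "retrieved_contexts": []},
]
print([classify_trace(t) for t in traces])  # ['SingleTurnSample', 'MultiTurnSample']
```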
langfuse.create_score(
    trace_id=trace_id,
    name="rag_context_relevance",
    value=0.92,
)
The result is a closed feedback loop: every agent interaction is traced, evaluated, and scored in one place.
Key Takeaway: Strands emits OTEL traces with zero code changes — just set env vars. Pair LangFuse for tracing with RAGAS for evaluation, and you get a production-grade observability stack that tells you not just what the agent did, but how good it was.
Step 9-1: Agents as Tools, Hierarchical Multi-Agent Architecture
Single agents become unwieldy at scale. This step introduces the “Agents as Tools” pattern: specialized sub-agents are wrapped with @tool and handed to an orchestrator agent, which routes queries to whichever specialist is most appropriate.
@tool
def research_assistant(query: str) -> str:
    """Process and respond to research-related queries."""
    agent = Agent(system_prompt=RESEARCH_ASSISTANT_PROMPT)
    return str(agent(query))

@tool
def trip_planning_assistant(query: str) -> str:
    """Create travel itineraries and provide travel advice."""
    agent = Agent(system_prompt=TRAVEL_AGENT_PROMPT)
    return str(agent(query))

orchestrator = Agent(
    system_prompt=MAIN_SYSTEM_PROMPT,
    tools=[research_assistant, product_recommendation_assistant, trip_planning_assistant, file_write],
)
The orchestrator reads the user’s intent and delegates to the right specialist, or calls multiple specialists in parallel for compound queries. The lab also shows a sequential pipeline pattern: feed the output of a research agent directly into a summarization agent, chaining their reasoning step by step.
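The sequential pipeline pattern can be sketched with stub functions standing in for real `Agent` instances (the stubs here are hypothetical, just to make the data flow visible):

```python
# Sequential pipeline sketch: the research agent's output becomes the
# summarization agent's input. Stubs stand in for Agent(system_prompt=...).
def research_agent(query: str) -> str:
    return f"FINDINGS about {query}"   # stub for a real research Agent

def summarizer_agent(text: str) -> str:
    return f"SUMMARY of: {text}"       # stub for a real summarizer Agent

def pipeline(query: str) -> str:
    findings = research_agent(query)   # step 1: gather research
    return summarizer_agent(findings)  # step 2: condense it

print(pipeline("agentic AI"))  # SUMMARY of: FINDINGS about agentic AI
```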
Key Takeaway: The @tool decorator on a function that instantiates an Agent is all it takes to create a hierarchical multi-agent system. The orchestrator sees sub-agents as just another tool, and the complexity is hidden. This pattern enforces the separation of concerns and makes individual specialists independently testable and replaceable.
Step 9-2: Swarm Intelligence, Collaborative Multi-Agent Systems
The final step explores Swarm, Strands’ built-in multi-agent coordination primitive. Unlike the hierarchical pattern in Step 9-1, where an orchestrator directs specialists, a Swarm enables autonomous, peer-to-peer coordination: agents hand off to each other dynamically, based on expertise and context, without a central controller.
from strands.multiagent import Swarm

swarm = Swarm(
    [research_agent, creative_agent, critical_agent, summarizer_agent],
    max_handoffs=20,
    execution_timeout=900.0,
)

result = swarm("Create a blog post explaining Agentic AI, then a social media summary.")
print(result.status)
print(f"Agents involved: {[n.node_id for n in result.node_history]}")
Each agent in the swarm shares full task context, can see the history of prior agent contributions, and autonomously decides when to hand off. Safety mechanisms prevent ping-pong loops between agents (repetitive_handoff_detection_window). The result object exposes per-agent outputs, total iterations, execution time, and token usage.
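The ping-pong safeguard can be illustrated with a sliding-window check. The real SDK exposes this via the `repetitive_handoff_detection_window` parameter; the logic below is a plausible simplification, not the SDK's implementation:

```python
# Illustrative ping-pong detection: within a window of recent handoffs,
# too few distinct agents signals a repetitive loop.
def is_ping_pong(handoff_history, window=8, min_unique=3):
    recent = handoff_history[-window:]
    return len(recent) == window and len(set(recent)) < min_unique

looping = ["research", "critic"] * 4            # research <-> critic loop
print(is_ping_pong(looping))                     # True

healthy = ["research", "creative", "critic", "summarizer"] * 2
print(is_ping_pong(healthy))                     # False
```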
You can also use the swarm built-in tool, letting a regular agent dynamically stand up and run a swarm on demand, purely through natural language:
from strands_tools import swarm

agent = Agent(tools=[swarm])
agent("Use a swarm of 4 agents to analyze market trends in generative AI.")
Key Takeaway: Swarms enable emergent collective behavior, and on complex, multi-faceted tasks the group can outperform any single agent. The Strands Swarm class handles all coordination mechanics. Use it when tasks genuinely require diverse expertise that cannot be captured in a single system prompt.
Deployment of Strands Agents to Lambda
Let's take the restaurant agent from a notebook to a production serverless deployment using AWS Lambda and the AWS CDK (TypeScript). The pattern is straightforward: package the Strands agent as a Docker-based Lambda function, provision the infrastructure with CDK, and invoke it via the AWS SDK or CLI.
The Lambda handler reads a session_id from the event payload and uses it to load or persist agent state (the full message history) in S3:
def handler(event, _context):
    prompt = event.get('prompt')
    session_id = event.get('session_id')
    agent = get_agent_object(key=f"sessions/{session_id}.json")
    if not agent:
        agent = create_agent()
    response = agent(prompt)
    put_agent_object(key=f"sessions/{session_id}.json", agent=agent)
    return str(response)
This gives you stateful multi-turn conversations over a stateless compute platform — S3 acts as the session store. The CDK stack provisions a Docker Lambda, an S3 bucket for session state, an access-log bucket, and all necessary IAM permissions for Bedrock, DynamoDB, and SSM Parameter Store.
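The session-store helpers boil down to a JSON round-trip keyed by session ID. In the sketch below a dict stands in for S3 so the logic is self-contained; a real handler would use boto3 `get_object`/`put_object` against the session bucket:

```python
import json

_store = {}  # stand-in for the S3 session bucket

def put_agent_messages(key, messages):
    """Persist the agent's message history as JSON under the session key."""
    _store[key] = json.dumps(messages)

def get_agent_messages(key):
    """Restore a prior session's history, or None for a new session."""
    raw = _store.get(key)
    return json.loads(raw) if raw is not None else None

# Round-trip: persist one turn, then restore it on the next invocation
put_agent_messages("sessions/abc-123.json",
                   [{"role": "user", "content": "Where can I eat in SF?"}])
restored = get_agent_messages("sessions/abc-123.json")
print(restored[0]["role"])  # user
```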
Invocation is a standard Lambda call:
aws lambda invoke --function-name StrandsAgent-agent-function \
--payload '{"prompt": "Where can I eat in SF?", "session_id": "abc-123"}' \
output.json
Key Takeaway: AWS Lambda is the fastest path to a serverless Strands agent in production. Serialize agent.messages to S3 keyed by session ID and you get multi-turn memory for free. CDK handles all infrastructure, and the Docker Lambda runtime sidesteps dependency conflicts with a clean container image.
Deployment of Strands Agents to Fargate
This section shows the container-native alternative: deploying the same restaurant agent as a long-running FastAPI service on AWS Fargate, fronted by an Application Load Balancer. This pattern suits agents that need persistent connections, streaming responses over HTTP, or higher throughput than Lambda’s concurrency model allows.
The FastAPI app exposes two endpoints:
- POST /invoke/{session_id}: standard request/response
- POST /invoke-streaming/{session_id}: streams tokens to the client as they arrive, using agent.stream_async and FastAPI's StreamingResponse
@app.post('/invoke-streaming/{session_id}')
async def get_invoke_streaming(session_id: str, request: PromptRequest):
    return StreamingResponse(
        run_agent_and_stream_response(request.prompt, session_id),
        media_type="text/plain",
    )

async def run_agent_and_stream_response(prompt, session_id):
    agent = get_agent_object(key=f"sessions/{session_id}.json")
    if not agent:
        agent = create_agent()
    async for item in agent.stream_async(prompt):
        if "data" in item:
            yield item['data']
    put_agent_object(key=f"sessions/{session_id}.json", agent=agent)
The CDK stack provisions a VPC across two AZs, an ECS Fargate cluster, a task definition built from your local Dockerfile (ARM64), an ALB with health checks, VPC flow logs, and IAM roles scoped to Bedrock, DynamoDB, SSM, and S3. The service runs two replicas by default for high availability.
Key Takeaway: Fargate is the right deployment target when you need real-time streaming, persistent HTTP connections, or fine-grained container control. The /invoke-streaming endpoint pairs perfectly with the stream_async pattern from Lab 5 — what you learned about streaming callbacks becomes a production HTTP API. Session state in S3 keeps the architecture stateless at the compute layer while preserving conversation history.
Closing Thoughts
From a 5-line “Hello World” agent to a swarm of collaborating specialists, all within the same SDK and the same Agent abstraction. A few things stand out after going through the full spectrum:
The model provider abstraction is real. Swapping Bedrock for Ollama, or Claude for Nova, is genuinely a one-line change. This is not just marketing.
MCP changes the tooling story. The ability to connect any MCP-compatible server as a tool without writing glue code is a significant productivity multiplier.
Safety is not bolted on. Guardrails, memory redaction, and structured tool schemas are all first-class SDK features, not afterthoughts.
Multi-agent patterns are accessible. The same @tool decorator that wraps a calculator also wraps a fully featured specialist agent. Hierarchical orchestration and swarm intelligence are within reach of any developer who can write a Python function.