Three Ways to Build Multi-Agent Systems on AWS

When building multi-agent AI systems, the architectural choices you make can dramatically impact performance, maintainability, and scalability. While Strands Agents is the "shiny new thing", we need to ask whether it’s always the best choice.

I set out to test different multi-agent architectural patterns by building the same system three different ways on AWS. I chose an HR Agent to evaluate resumes because it provides a perfect multi-step workflow that requires different types of AI reasoning and coordination.

The goal wasn’t to build the perfect HR system, but to understand the trade-offs between different multi-agent orchestration approaches.

I needed a use case that would effectively showcase different multi-agent coordination patterns. HR resume evaluation turned out to be ideal because it requires:

  • Multiple specialized tasks that benefit from different AI reasoning approaches (we could even use different LLM versions/providers)
  • Sequential and parallel processing opportunities
  • Complex data transformation from unstructured to structured formats
  • Coordination between agents with different areas of expertise
  • Real-world complexity without being overly domain-specific

The system needs to:

  • Parse resumes and extract structured information
  • Analyze job requirements and match them against candidates
  • Identify skill gaps and areas for development
  • Rate candidates numerically with detailed justification
  • Generate interview questions tailored to each candidate
  • Store everything in a structured, queryable format

Think of it as multiple AI specialists collaborating like a hiring committee, but the real focus is on how they coordinate and communicate.

Architecture #1: Step Functions – The Orchestrated Pipeline

Step Functions Architecture Diagram

Best for: Complex workflows with detailed monitoring needs and low latency

The Step Functions approach treats resume evaluation like a manufacturing pipeline. Each step is a specialized Lambda function that performs one specific task, with AWS Step Functions orchestrating the entire workflow.
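
To make the pipeline idea concrete, here is a minimal sketch of what one such step might look like: a Lambda function that calls Bedrock's Converse API and returns JSON for the next state. The field names (`resume_text`, `parsed_resume`) and the model ID are illustrative assumptions, not the repository's actual code.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    """One pipeline step: parse a resume into structured JSON.

    Step Functions passes the previous state's output in `event`
    and forwards this function's return value to the next state.
    """
    resume_text = event["resume_text"]  # injected by an earlier state

    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # hypothetical choice
        system=[{"text": "Extract skills, experience, and education as JSON."}],
        messages=[{"role": "user", "content": [{"text": resume_text}]}],
    )
    parsed = response["output"]["message"]["content"][0]["text"]

    # Assumes the model returns bare JSON; a real step would validate this.
    # Whatever we return becomes the input of the next state.
    return {**event, "parsed_resume": json.loads(parsed)}
```

Each of the six-plus functions in the pipeline follows this shape, which is exactly why debugging is so pleasant: every state's input and output is visible in the console.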

Why I Love This Approach

  • Low Latency: Step Functions provides a tiny, lightweight agent-management layer that reduces complexity and processing time.
  • Crystal clear workflow: You can see each step executing in the AWS console
  • Easy debugging: When something breaks, you know exactly which step failed
  • Granular monitoring: Each function can be optimized independently
  • Familiar patterns: While most developers won't consider Step Functions for building an AI agent, it is a good fit precisely because so many of them already know the tool.

The Trade-offs

  • More infrastructure: 6+ Lambda functions to manage
  • State management: Data flows between functions via JSON
  • Less flexibility: The workflow is relatively rigid

Perfect for: Low-latency use cases and teams that want predictable, monitorable workflows and don't mind managing multiple functions.

Architecture #2: Bedrock Agents – The AI-Native Approach

Bedrock Agents Architecture Diagram

Best for: AI-first teams who want Amazon's managed AI collaboration

This approach uses Amazon Bedrock Agents with a supervisor-collaborator pattern. A supervisor agent coordinates with specialized agents (Resume Parser, Job Analyzer, Skills Evaluator, etc.) to complete the evaluation.
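
For orientation, here is a hedged sketch of how a client might invoke such a supervisor agent with boto3's `bedrock-agent-runtime` client. The agent IDs are placeholders and the prompt format is an assumption; the supervisor itself decides which collaborators to involve.

```python
import uuid
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

def evaluate_resume(resume_text: str, job_description: str) -> str:
    """Ask the supervisor agent to run the full evaluation."""
    response = agent_runtime.invoke_agent(
        agentId="XXXXXXXXXX",       # placeholder supervisor agent ID
        agentAliasId="YYYYYYYYYY",  # placeholder alias ID
        sessionId=str(uuid.uuid4()),
        inputText=(
            "Evaluate this resume against the job:\n"
            f"{job_description}\n---\n{resume_text}"
        ),
    )

    # invoke_agent streams the answer back as chunked events.
    chunks = [
        event["chunk"]["bytes"].decode("utf-8")
        for event in response["completion"]
        if "chunk" in event
    ]
    return "".join(chunks)
```

Notice how little orchestration code there is: the coordination between the parser, analyzer, and evaluator agents happens inside Bedrock, which is both the appeal and the limitation of this approach.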

Why This Feels Like the Future

  • AI-native design: Built specifically for multi-agent AI workflows
  • Managed complexity: Amazon handles agent coordination and communication
  • Rich agent interactions: Agents can have sophisticated conversations
  • Built-in monitoring: Bedrock console shows agent traces and interactions

The Reality Check

  • AWS-specific: You're locked into Amazon's agent framework
  • Learning curve: New concepts and debugging approaches
  • Cost considerations: Bedrock usage can add up with complex workflows

Perfect for: Teams building AI-first applications that want to leverage Amazon's managed AI services.

Architecture #3: Strands Agents – The Powerhouse Framework

Strands Agent Architecture Diagram

Best for: Maximum flexibility and advanced multi-agent capabilities

The Strands approach uses the open-source Strands Agents SDK, where agents communicate in natural language and dynamically adapt their collaboration patterns. While it takes longer to process, this is where the real multi-agent magic happens. In this example I used Lambda, but given the long processing times, ECS would be a more appropriate service to run it.
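
Here is a minimal sketch of the idea, assuming the Strands SDK's "agents as tools" pattern; the prompts and agent split are illustrative rather than the repository's exact code, and the default model provider (Bedrock) requires AWS credentials.

```python
from strands import Agent, tool

# Specialist agents -- each gets its own system prompt (and could even
# get its own model or provider).
resume_parser = Agent(system_prompt="Extract structured facts from resumes.")
skills_evaluator = Agent(system_prompt="Compare a candidate's skills against job requirements.")

@tool
def parse_resume(resume_text: str) -> str:
    """Return the structured facts found in a resume."""
    return str(resume_parser(resume_text))

@tool
def evaluate_skills(parsed_resume: str, job_description: str) -> str:
    """Return a skills-gap analysis for the candidate."""
    return str(skills_evaluator(f"Job:\n{job_description}\n\nCandidate:\n{parsed_resume}"))

# The supervisor decides on its own when and in which order to call the
# specialists -- the coordination happens in natural language.
supervisor = Agent(
    system_prompt="You are an HR evaluation supervisor. Parse the resume, "
                  "then evaluate skill gaps, then summarize your verdict.",
    tools=[parse_resume, evaluate_skills],
)

resume = "..."           # resume text, e.g. loaded from S3
job = "AI Engineer ..."  # job description
verdict = supervisor(f"Evaluate this candidate:\n{resume}\n\nFor this job:\n{job}")
```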

What Makes This Special

  • Natural communication: Agents talk to each other like humans
  • Adaptive workflows: The system adjusts based on what it finds
  • Deep dives: Extended processing time enables sophisticated reasoning
  • Framework agnostic: Not tied to any specific cloud provider
  • Simplified architecture: Typically runs in a single Lambda function or ECS task

The Hidden Power

We're only using a fraction of Strands' capabilities in this implementation. The framework supports:

  • Dynamic tool integration during runtime (see the sketch after this list)
  • Complex multi-agent negotiations and decision-making
  • Adaptive workflow modification based on intermediate results
  • Advanced memory and context management across agent interactions
  • Custom agent personalities and specialized reasoning patterns
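
As a taste of the first item, here is a sketch of defining a new `@tool` in code and constructing an agent that immediately has that capability. Strands' actual runtime tool registration goes further than this, so treat it as illustrative only.

```python
from strands import Agent, tool

@tool
def salary_band(years_experience: int) -> str:
    """A capability defined on the fly and handed to an agent."""
    return "junior" if years_experience < 3 else "senior"

# A fresh agent picks up the new tool without redeploying anything else.
agent = Agent(tools=[salary_band])
agent("Which band is a candidate with 2 years of experience?")
```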

The Considerations

  • Longer processing: Takes 5–15 minutes, but enables deeper analysis. Not a good choice if you want to build a low-latency agent.
  • More complex setup: Requires proper dependency management and layer configuration
  • Resource intensive: Needs more memory and processing time
  • Learning curve: Understanding the full framework takes time

Perfect for: Teams that want to push the boundaries of multi-agent systems and are comfortable with complexity.

The Results: Architecture Matters More Than You Think

What surprised me most: all three approaches produce identical evaluation quality when using the same prompts and AI model.

The real differences are in:

  • Development experience and debugging
  • Operational complexity and monitoring
  • Processing time and resource utilization
  • Flexibility for future changes and integrations
  • Vendor lock-in and LLM portability

LLM Flexibility: A Critical Consideration

One of the most important considerations is LLM portability.

Strands and Step Functions: LLM Agnostic

Both implementations can easily pivot to different LLMs:

  • OpenAI GPT models via API calls
  • Anthropic Claude via direct API
  • Open-source models like Llama, Mistral, or custom fine-tuned models
  • Local models running on your infrastructure

This flexibility allows you to choose the best model for your use case, optimize costs, or even run workloads offline.
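
A sketch of what that pivot looks like in practice: the same prompt routed to Bedrock or OpenAI behind one function. The model IDs are assumptions, and a real system would also normalize parameters like temperature and token limits.

```python
import boto3
from openai import OpenAI

def complete(prompt: str, provider: str = "bedrock") -> str:
    """Route the same prompt to different LLM providers."""
    if provider == "bedrock":
        client = boto3.client("bedrock-runtime")
        resp = client.converse(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return resp["output"]["message"]["content"][0]["text"]
    if provider == "openai":
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumed model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    raise ValueError(f"Unknown provider: {provider}")
```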

Bedrock Agents: AWS Ecosystem Limitation

The Bedrock Agents implementation is tied to AWS-supported models:

  • Limited to the Bedrock catalog – can't use models AWS doesn't support
  • Regional limitations based on model availability
  • Manual guardrail configuration: you must explicitly set up guardrails, unlike other platforms that enforce some defaults automatically

This flexibility gap becomes crucial for:

  • Cost optimization across providers
  • Performance tuning with specialized models
  • Future-proofing against vendor changes or pricing shifts

Performance Comparison

| Aspect | Step Functions | Bedrock Agents | Strands Agents |
| --- | --- | --- | --- |
| Processing Time | < 1 minute | 2–5 minutes | 5–15 minutes* |
| Setup Complexity | Medium | Low | High |
| Debugging | Excellent | Not Great | Good |
| Multi-Agent Flexibility | Low | Medium | Very High |
| LLM Portability | High | Low | High |
| Cost | Low | Medium | Medium |
| Architecture Complexity | High | Medium | Low |

*Strands' longer processing time enables sophisticated multi-agent reasoning that we're not fully utilizing in this implementation.

Real-World Lessons Learned

1. Architecture choice impacts more than performance

The biggest differences aren't in output quality, but in operational characteristics, vendor flexibility, and future extensibility.

2. Strands is a sleeping giant

We’re using maybe 30% of what Strands can do. The framework supports dynamic tool integration, complex agent negotiations, and adaptive workflows.

3. Bedrock Agents trade flexibility for simplicity

Great for getting started quickly, but debugging is hard when something fails.

4. Step Functions remain the reliable choice

When you need low-latency, predictable, debuggable workflows and don't mind managing multiple functions, it's hard to beat.

Which Multi-Agent Architecture Should You Choose?

Choose Step Functions if:

  • You want low-latency, predictable, monitorable workflows
  • Your team is comfortable with traditional serverless patterns
  • You need fast processing times and clear debugging
  • LLM flexibility is important for your use case
  • You prefer proven, stable architectural patterns

Choose Bedrock Agents if:

  • You're building AI-first applications within the AWS ecosystem
  • You want Amazon to handle multi-agent complexity
  • You're already using other Bedrock services
  • You prefer managed services over custom implementations

Choose Strands Agents if:

  • You need maximum multi-agent capabilities and flexibility
  • You want to explore cutting-edge AI coordination patterns
  • LLM portability and vendor independence are priorities
  • You're okay with longer processing times for deeper reasoning
  • You just want your agent to run in a Lambda function or an ECS task

The Code

All three implementations are available in my AWS Agents repository, including complete deployment scripts, sample data, and documentation. Each approach includes:

  • Complete SAM templates for one-click deployment
  • Sample resumes and job descriptions for testing
  • Comprehensive monitoring and logging
  • Identical evaluation quality across all approaches

Sample Output

All three implementations store the outputs in DynamoDB. A sample output looks like this:

```json
{
  "id": "a5ec67f5-3e3f-4db3-8e68-d7127b9131f3",
  "name": "Sarah Smith Data Scientist.Txt",
  "resume_key": "resumes/sarah_smith_data_scientist.txt",
  "status": "completed",
  "job_title": "AI Engineer Position",
  "completed_at": "2025-06-29T18:05:58.708418",

  "candidate_rating": {
    "rating": 2,
    "job_fit": "Sarah would be a fair fit for a junior or mid-level AI engineering role but does not meet the requirements for the senior position. Her statistical background and basic ML experience provide a foundation to build upon, but she would need significant mentoring and development in deep learning, MLOps, containerization, and production deployment before being ready for a senior AI Engineer role.",
    "strengths": [
      "Strong educational background in Statistics and Mathematics with ML coursework",
      "Solid foundation in Python programming and data analysis",
      "Experience with basic ML algorithms and statistical modeling",
      "Some database knowledge (PostgreSQL) as required",
      "Collaborative experience working with product teams",
      "Good data visualization skills that would be useful for stakeholder communication"
    ],
    "weaknesses": [
      "Insufficient experience (2 years vs. required 3+ years)",
      "Limited experience with required deep learning frameworks (only project-level TensorFlow, no PyTorch)",
      "No experience with containerization (Docker, Kubernetes) or MLOps practices",
      "Limited cloud platform expertise beyond basic AWS knowledge",
      "No production-level AI system deployment experience",
      "Lack of experience with big data technologies (Spark, Hadoop)",
      "No demonstrated experience in model monitoring or CI/CD pipelines"
    ]
  },

  "evaluation_results": {
    "job_match_analysis": {
      "overall_fit": "Partial match - Junior to mid-level candidate applying for senior role",
      "recommendation": "Consider for a mid-level AI Engineer position rather than senior role. The candidate shows promise but lacks the depth of experience and technical breadth required for a senior position. Would benefit from mentorship and exposure to production ML systems, containerization, and MLOps practices."
    },
    "technical_expertise": {
      "programming": {
        "alignment": "Partial match - Strong in Python and SQL as required, but no Java",
        "depth": "Moderate - 2 years professional experience with Python"
      },
      "ml_frameworks": {
        "alignment": "Partial match - Experience with Scikit-learn but limited exposure to TensorFlow and no PyTorch mentioned",
        "depth": "Basic - Primary experience with traditional ML algorithms rather than deep learning"
      },
      "cloud_platforms": {
        "alignment": "Minimal match - Only basic AWS knowledge mentioned",
        "depth": "Limited - Only mentions S3 and EC2, no SageMaker or other ML-specific services"
      }
    }
  },

  "gaps_analysis": {
    "skill_mismatches": {
      "issues": [
        "Claims to be a Machine Learning Engineer but experience seems more aligned with Data Analyst/Scientist role",
        "Lists TensorFlow in project but not in skills section",
        "Claims 'Basic AWS knowledge' but doesn't demonstrate cloud implementation experience"
      ]
    },
    "overall_concerns": {
      "potential_under_qualification": "Limited professional experience (2 years) for roles requiring more extensive background. Experience appears more aligned with junior data scientist rather than machine learning engineer positions."
    }
  },

  "interview_notes": {
    "technical_questions": [
      "Can you walk me through your experience with TensorFlow beyond the stock price prediction project? What specific neural network architectures have you implemented?",
      "How would you approach deploying a machine learning model to a production environment? What tools and practices would you use for model monitoring and maintenance?",
      "What experience do you have with containerization technologies like Docker and Kubernetes? How have you used them in ML workflows?"
    ],
    "concerns_to_address": [
      "Experience gap (2 years vs. required 3+ years for senior role)",
      "Limited experience with deep learning frameworks beyond project work",
      "No demonstrated experience with containerization or MLOps",
      "Limited cloud platform expertise beyond basic AWS services"
    ],
    "general_notes": [
      "Candidate has strong educational background but lacks senior-level experience",
      "Consider for mid-level position rather than senior role",
      "Strong in statistics and traditional ML but gaps in deep learning and MLOps",
      "Would benefit from mentorship in production ML systems and DevOps practices"
    ]
  }
}

```
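
Since each record is keyed by `id`, fetching it back for a dashboard or a follow-up job is a one-liner with boto3. The table name here is an assumption; substitute whatever your SAM template creates.

```python
import boto3

# Table name is illustrative -- use the one from your deployment.
table = boto3.resource("dynamodb").Table("resume-evaluations")

item = table.get_item(Key={"id": "a5ec67f5-3e3f-4db3-8e68-d7127b9131f3"})["Item"]
print(item["candidate_rating"]["rating"])  # -> 2
print(item["status"])                      # -> "completed"
```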

Conclusion

The future of multi-agent AI systems is incredibly exciting. After testing these three approaches, I'm convinced that the choice of coordination pattern matters more than the specific use case. Whether you're building HR automation, document processing, or any other multi-step AI workflow, these patterns offer strong foundations for production-ready systems.

The key takeaway? Each architecture comes with trade-offs that go far beyond performance. Step Functions is not an obvious choice, but it is a highly reliable approach for orchestrating multi-agent flows, especially when clarity and debugging are important. Bedrock Agents provide a managed experience that is great for fast prototyping, although troubleshooting can be difficult when issues arise. Strands offers unmatched reasoning and flexibility, but its longer processing time and higher resource requirements often lead to running it in ECS, where scaling may become a different challenge.

Bottom line, choosing the right approach is not just about the technology itself; it is about how much control, flexibility, and complexity your team is prepared to handle.

Top comments (2)

Mahdi Azarboon

Informative. Where did you host your Strand agents?

Matias Kreder (AWS Community Builders)

After trying them locally I hosted them on AWS Lambda but I recommend ECS/Fargate for production usage.