Three Ways to Build Multi-Agent Systems on AWS

When building multi-agent AI systems, the architectural choices you make can dramatically impact performance, maintainability, and scalability. While Strands Agents is the "shiny new thing", we need to ask whether it’s always the best choice.

I set out to test different multi-agent architectural patterns by building the same system three different ways on AWS. I chose an HR Agent to evaluate resumes because it provides a perfect multi-step workflow that requires different types of AI reasoning and coordination.

The goal wasn’t to build the perfect HR system, but to understand the trade-offs between different multi-agent orchestration approaches.

I needed a use case that would effectively showcase different multi-agent coordination patterns. HR resume evaluation turned out to be ideal because it requires:

  • Multiple specialized tasks that benefit from different AI reasoning approaches (we could even use different LLM versions/providers)
  • Sequential and parallel processing opportunities
  • Complex data transformation from unstructured to structured formats
  • Coordination between agents with different areas of expertise
  • Real-world complexity without being overly domain-specific

The system needs to:

  • Parse resumes and extract structured information
  • Analyze job requirements and match them against candidates
  • Identify skill gaps and areas for development
  • Rate candidates numerically with detailed justification
  • Generate interview questions tailored to each candidate
  • Store everything in a structured, queryable format

Think of it as multiple AI specialists collaborating like a hiring committee, but the real focus is on how they coordinate and communicate.

Architecture #1: Step Functions – The Orchestrated Pipeline

Step Functions Architecture Diagram

Best for: Complex workflows with detailed monitoring needs and low latency

The Step Functions approach treats resume evaluation like a manufacturing pipeline. Each step is a specialized Lambda function that performs one specific task, with AWS Step Functions orchestrating the entire workflow.
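
To make the pipeline idea concrete, here is a minimal sketch of what one such step might look like: a Lambda function that calls Bedrock's Converse API and returns JSON for the next state. The field names (`resume_text`, `parsed_resume`) and the model ID are illustrative assumptions, not the repository's actual code.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    """One pipeline step: parse a resume into structured JSON.

    Step Functions passes the previous state's output in `event`
    and forwards this function's return value to the next state.
    """
    resume_text = event["resume_text"]  # injected by an earlier state

    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # hypothetical choice
        system=[{"text": "Extract skills, experience, and education as JSON."}],
        messages=[{"role": "user", "content": [{"text": resume_text}]}],
    )
    parsed = response["output"]["message"]["content"][0]["text"]

    # Assumes the model returns bare JSON; a real step would validate this.
    # Whatever we return becomes the input of the next state.
    return {**event, "parsed_resume": json.loads(parsed)}
```

Each of the six-plus functions in the pipeline follows this shape, which is exactly why debugging is so pleasant: every state's input and output is visible in the console.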

Why I Love This Approach

  • Low Latency: Step Functions provides a tiny, lightweight agent-management layer that reduces complexity and processing time.
  • Crystal clear workflow: You can see each step executing in the AWS console
  • Easy debugging: When something breaks, you know exactly which step failed
  • Granular monitoring: Each function can be optimized independently
  • Familiar patterns: While most developers won't consider Step Functions for building an AI agent, it is a good fit precisely because so many of them already know the tool.

The Trade-offs

  • More infrastructure: 6+ Lambda functions to manage
  • State management: Data flows between functions via JSON
  • Less flexibility: The workflow is relatively rigid

Perfect for: Low-latency use cases and teams that want predictable, monitorable workflows and don't mind managing multiple functions.

Architecture #2: Bedrock Agents – The AI-Native Approach

Bedrock Agents Architecture Diagram

Best for: AI-first teams who want Amazon's managed AI collaboration

This approach uses Amazon Bedrock Agents with a supervisor-collaborator pattern. A supervisor agent coordinates with specialized agents (Resume Parser, Job Analyzer, Skills Evaluator, etc.) to complete the evaluation.
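
For orientation, here is a hedged sketch of how a client might invoke such a supervisor agent with boto3's `bedrock-agent-runtime` client. The agent IDs are placeholders and the prompt format is an assumption; the supervisor itself decides which collaborators to involve.

```python
import uuid
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

def evaluate_resume(resume_text: str, job_description: str) -> str:
    """Ask the supervisor agent to run the full evaluation."""
    response = agent_runtime.invoke_agent(
        agentId="XXXXXXXXXX",       # placeholder supervisor agent ID
        agentAliasId="YYYYYYYYYY",  # placeholder alias ID
        sessionId=str(uuid.uuid4()),
        inputText=(
            "Evaluate this resume against the job:\n"
            f"{job_description}\n---\n{resume_text}"
        ),
    )

    # invoke_agent streams the answer back as chunked events.
    chunks = [
        event["chunk"]["bytes"].decode("utf-8")
        for event in response["completion"]
        if "chunk" in event
    ]
    return "".join(chunks)
```

Notice how little orchestration code there is: the coordination between the parser, analyzer, and evaluator agents happens inside Bedrock, which is both the appeal and the limitation of this approach.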

Why This Feels Like the Future

  • AI-native design: Built specifically for multi-agent AI workflows
  • Managed complexity: Amazon handles agent coordination and communication
  • Rich agent interactions: Agents can have sophisticated conversations
  • Built-in monitoring: Bedrock console shows agent traces and interactions

The Reality Check

  • AWS-specific: You're locked into Amazon's agent framework
  • Learning curve: New concepts and debugging approaches
  • Cost considerations: Bedrock usage can add up with complex workflows

Perfect for: Teams building AI-first applications that want to leverage Amazon's managed AI services.

Architecture #3: Strands Agents – The Powerhouse Framework

Strands Agent Architecture Diagram

Best for: Maximum flexibility and advanced multi-agent capabilities

The Strands approach uses the open-source Strands Agents SDK, where agents communicate in natural language and dynamically adapt their collaboration patterns. While it takes longer to process, this is where the real multi-agent magic happens. In this example I used Lambda, but given the long processing times, ECS would be a more appropriate service to run it.
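
Here is a minimal sketch of the idea, assuming the Strands SDK's "agents as tools" pattern; the prompts and agent split are illustrative rather than the repository's exact code, and the default model provider (Bedrock) requires AWS credentials.

```python
from strands import Agent, tool

# Specialist agents -- each gets its own system prompt (and could even
# get its own model or provider).
resume_parser = Agent(system_prompt="Extract structured facts from resumes.")
skills_evaluator = Agent(system_prompt="Compare a candidate's skills against job requirements.")

@tool
def parse_resume(resume_text: str) -> str:
    """Return the structured facts found in a resume."""
    return str(resume_parser(resume_text))

@tool
def evaluate_skills(parsed_resume: str, job_description: str) -> str:
    """Return a skills-gap analysis for the candidate."""
    return str(skills_evaluator(f"Job:\n{job_description}\n\nCandidate:\n{parsed_resume}"))

# The supervisor decides on its own when and in which order to call the
# specialists -- the coordination happens in natural language.
supervisor = Agent(
    system_prompt="You are an HR evaluation supervisor. Parse the resume, "
                  "then evaluate skill gaps, then summarize your verdict.",
    tools=[parse_resume, evaluate_skills],
)

resume = "..."           # resume text, e.g. loaded from S3
job = "AI Engineer ..."  # job description
verdict = supervisor(f"Evaluate this candidate:\n{resume}\n\nFor this job:\n{job}")
```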

What Makes This Special

  • Natural communication: Agents talk to each other like humans
  • Adaptive workflows: The system adjusts based on what it finds
  • Deep dives: Extended processing time enables sophisticated reasoning
  • Framework agnostic: Not tied to any specific cloud provider
  • Simplified architecture: Typically runs in a single Lambda function or ECS task

The Hidden Power

We're only using a fraction of Strands' capabilities in this implementation. The framework supports:

  • Dynamic tool integration during runtime (see the sketch after this list)
  • Complex multi-agent negotiations and decision-making
  • Adaptive workflow modification based on intermediate results
  • Advanced memory and context management across agent interactions
  • Custom agent personalities and specialized reasoning patterns
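
As a taste of the first item, here is a sketch of defining a new `@tool` in code and constructing an agent that immediately has that capability. Strands' actual runtime tool registration goes further than this, so treat it as illustrative only.

```python
from strands import Agent, tool

@tool
def salary_band(years_experience: int) -> str:
    """A capability defined on the fly and handed to an agent."""
    return "junior" if years_experience < 3 else "senior"

# A fresh agent picks up the new tool without redeploying anything else.
agent = Agent(tools=[salary_band])
agent("Which band is a candidate with 2 years of experience?")
```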

The Considerations

  • Longer processing: Takes 5–15 minutes, but enables deeper analysis. Not a good choice if you want to build a low-latency agent.
  • More complex setup: Requires proper dependency management and layer configuration
  • Resource intensive: Needs more memory and processing time
  • Learning curve: Understanding the full framework takes time

Perfect for: Teams that want to push the boundaries of multi-agent systems and are comfortable with complexity.

The Results: Architecture Matters More Than You Think

What surprised me most: all three approaches produce identical evaluation quality when using the same prompts and AI model.

The real differences are in:

  • Development experience and debugging
  • Operational complexity and monitoring
  • Processing time and resource utilization
  • Flexibility for future changes and integrations
  • Vendor lock-in and LLM portability

LLM Flexibility: A Critical Consideration

One of the most important considerations is LLM portability.

Strands and Step Functions: LLM Agnostic

Both implementations can easily pivot to different LLMs:

  • OpenAI GPT models via API calls
  • Anthropic Claude via direct API
  • Open-source models like Llama, Mistral, or custom fine-tuned models
  • Local models running on your infrastructure

This flexibility allows you to choose the best model for your use case, optimize costs, or even run workloads offline.
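
A sketch of what that pivot looks like in practice: the same prompt routed to Bedrock or OpenAI behind one function. The model IDs are assumptions, and a real system would also normalize parameters like temperature and token limits.

```python
import boto3
from openai import OpenAI

def complete(prompt: str, provider: str = "bedrock") -> str:
    """Route the same prompt to different LLM providers."""
    if provider == "bedrock":
        client = boto3.client("bedrock-runtime")
        resp = client.converse(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return resp["output"]["message"]["content"][0]["text"]
    if provider == "openai":
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumed model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    raise ValueError(f"Unknown provider: {provider}")
```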

Bedrock Agents: AWS Ecosystem Limitation

The Bedrock Agents implementation is tied to AWS-supported models:

  • Limited to the Bedrock catalog – can't use models AWS doesn't support
  • Regional limitations based on model availability
  • Manual guardrail configuration: you must explicitly set up guardrails, unlike other platforms that enforce some defaults automatically

This flexibility gap becomes crucial for:

  • Cost optimization across providers
  • Performance tuning with specialized models
  • Future-proofing against vendor changes or pricing shifts

Performance Comparison

| Aspect | Step Functions | Bedrock Agents | Strands Agents |
| --- | --- | --- | --- |
| Processing Time | < 1 minute | 2–5 minutes | 5–15 minutes* |
| Setup Complexity | Medium | Low | High |
| Debugging | Excellent | Not Great | Good |
| Multi-Agent Flexibility | Low | Medium | Very High |
| LLM Portability | High | Low | High |
| Cost | Low | Medium | Medium |
| Architecture Complexity | High | Medium | Low |

*Strands' longer processing time enables sophisticated multi-agent reasoning that we're not fully utilizing in this implementation.

Real-World Lessons Learned

1. Architecture choice impacts more than performance

The biggest differences aren't in output quality, but in operational characteristics, vendor flexibility, and future extensibility.

2. Strands is a sleeping giant

We’re using maybe 30% of what Strands can do. The framework supports dynamic tool integration, complex agent negotiations, and adaptive workflows.

3. Bedrock Agents trade flexibility for simplicity

Great for getting started quickly, but debugging is hard when something fails.

4. Step Functions remain the reliable choice

When you need low-latency, predictable, debuggable workflows and don't mind managing multiple functions, it's hard to beat.

Which Multi-Agent Architecture Should You Choose?

Choose Step Functions if:

  • You want low-latency, predictable, monitorable workflows
  • Your team is comfortable with traditional serverless patterns
  • You need fast processing times and clear debugging
  • LLM flexibility is important for your use case
  • You prefer proven, stable architectural patterns

Choose Bedrock Agents if:

  • You're building AI-first applications within the AWS ecosystem
  • You want Amazon to handle multi-agent complexity
  • You're already using other Bedrock services
  • You prefer managed services over custom implementations

Choose Strands Agents if:

  • You need maximum multi-agent capabilities and flexibility
  • You want to explore cutting-edge AI coordination patterns
  • LLM portability and vendor independence are priorities
  • You're okay with longer processing times for deeper reasoning
  • You just want your agent to run in a Lambda function or an ECS task

The Code

All three implementations are available in my AWS Agents repository, including complete deployment scripts, sample data, and documentation. Each approach includes:

  • Complete SAM templates for one-click deployment
  • Sample resumes and job descriptions for testing
  • Comprehensive monitoring and logging
  • Identical evaluation quality across all approaches

Sample Output

All three implementations store the outputs in DynamoDB. A sample output looks like this:

```json
{
  "id": "a5ec67f5-3e3f-4db3-8e68-d7127b9131f3",
  "name": "Sarah Smith Data Scientist.Txt",
  "resume_key": "resumes/sarah_smith_data_scientist.txt",
  "status": "completed",
  "job_title": "AI Engineer Position",
  "completed_at": "2025-06-29T18:05:58.708418",

  "candidate_rating": {
    "rating": 2,
    "job_fit": "Sarah would be a fair fit for a junior or mid-level AI engineering role but does not meet the requirements for the senior position. Her statistical background and basic ML experience provide a foundation to build upon, but she would need significant mentoring and development in deep learning, MLOps, containerization, and production deployment before being ready for a senior AI Engineer role.",
    "strengths": [
      "Strong educational background in Statistics and Mathematics with ML coursework",
      "Solid foundation in Python programming and data analysis",
      "Experience with basic ML algorithms and statistical modeling",
      "Some database knowledge (PostgreSQL) as required",
      "Collaborative experience working with product teams",
      "Good data visualization skills that would be useful for stakeholder communication"
    ],
    "weaknesses": [
      "Insufficient experience (2 years vs. required 3+ years)",
      "Limited experience with required deep learning frameworks (only project-level TensorFlow, no PyTorch)",
      "No experience with containerization (Docker, Kubernetes) or MLOps practices",
      "Limited cloud platform expertise beyond basic AWS knowledge",
      "No production-level AI system deployment experience",
      "Lack of experience with big data technologies (Spark, Hadoop)",
      "No demonstrated experience in model monitoring or CI/CD pipelines"
    ]
  },

  "evaluation_results": {
    "job_match_analysis": {
      "overall_fit": "Partial match - Junior to mid-level candidate applying for senior role",
      "recommendation": "Consider for a mid-level AI Engineer position rather than senior role. The candidate shows promise but lacks the depth of experience and technical breadth required for a senior position. Would benefit from mentorship and exposure to production ML systems, containerization, and MLOps practices."
    },
    "technical_expertise": {
      "programming": {
        "alignment": "Partial match - Strong in Python and SQL as required, but no Java",
        "depth": "Moderate - 2 years professional experience with Python"
      },
      "ml_frameworks": {
        "alignment": "Partial match - Experience with Scikit-learn but limited exposure to TensorFlow and no PyTorch mentioned",
        "depth": "Basic - Primary experience with traditional ML algorithms rather than deep learning"
      },
      "cloud_platforms": {
        "alignment": "Minimal match - Only basic AWS knowledge mentioned",
        "depth": "Limited - Only mentions S3 and EC2, no SageMaker or other ML-specific services"
      }
    }
  },

  "gaps_analysis": {
    "skill_mismatches": {
      "issues": [
        "Claims to be a Machine Learning Engineer but experience seems more aligned with Data Analyst/Scientist role",
        "Lists TensorFlow in project but not in skills section",
        "Claims 'Basic AWS knowledge' but doesn't demonstrate cloud implementation experience"
      ]
    },
    "overall_concerns": {
      "potential_under_qualification": "Limited professional experience (2 years) for roles requiring more extensive background. Experience appears more aligned with junior data scientist rather than machine learning engineer positions."
    }
  },

  "interview_notes": {
    "technical_questions": [
      "Can you walk me through your experience with TensorFlow beyond the stock price prediction project? What specific neural network architectures have you implemented?",
      "How would you approach deploying a machine learning model to a production environment? What tools and practices would you use for model monitoring and maintenance?",
      "What experience do you have with containerization technologies like Docker and Kubernetes? How have you used them in ML workflows?"
    ],
    "concerns_to_address": [
      "Experience gap (2 years vs. required 3+ years for senior role)",
      "Limited experience with deep learning frameworks beyond project work",
      "No demonstrated experience with containerization or MLOps",
      "Limited cloud platform expertise beyond basic AWS services"
    ],
    "general_notes": [
      "Candidate has strong educational background but lacks senior-level experience",
      "Consider for mid-level position rather than senior role",
      "Strong in statistics and traditional ML but gaps in deep learning and MLOps",
      "Would benefit from mentorship in production ML systems and DevOps practices"
    ]
  }
}

```
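
Since each record is keyed by `id`, fetching it back for a dashboard or a follow-up job is a one-liner with boto3. The table name here is an assumption; substitute whatever your SAM template creates.

```python
import boto3

# Table name is illustrative -- use the one from your deployment.
table = boto3.resource("dynamodb").Table("resume-evaluations")

item = table.get_item(Key={"id": "a5ec67f5-3e3f-4db3-8e68-d7127b9131f3"})["Item"]
print(item["candidate_rating"]["rating"])  # -> 2
print(item["status"])                      # -> "completed"
```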

Conclusion

The future of multi-agent AI systems is incredibly exciting. After testing these three approaches, I'm convinced that the choice of coordination pattern matters more than the specific use case. Whether you're building HR automation, document processing, or any other multi-step AI workflow, these patterns offer strong foundations for production-ready systems.

The key takeaway? Each architecture comes with trade-offs that go far beyond performance. Step Functions is not an obvious choice, but it is a highly reliable approach for orchestrating multi-agent flows, especially when clarity and debugging are important. Bedrock Agents provide a managed experience that is great for fast prototyping, although troubleshooting can be difficult when issues arise. Strands offers unmatched reasoning and flexibility, but its longer processing time and higher resource requirements often lead to running it in ECS, where scaling may become a different challenge.

Bottom line, choosing the right approach is not just about the technology itself; it is about how much control, flexibility, and complexity your team is prepared to handle.

Top comments (2)

Mahdi Azarboon

Informative. Where did you host your Strand agents?

Matias Kreder (AWS Community Builders)

After trying them locally I hosted them on AWS Lambda but I recommend ECS/Fargate for production usage.