DEV Community

Volodymyr Marynychev for AWS Community Builders


Orchestrating AI: Dynamic LLM Routing based on AWS Step Functions

By expanding this simple architectural pattern, you can significantly reduce your LLM costs while maintaining high-quality responses across different use cases.

Architecture Diagram

🚨 Important Disclaimer: Proof of Concept 🚨

This project is a demonstration of the dynamic AI model routing concept and should NOT be considered a production-ready solution.

Key Limitations:

  • Experimental architecture
  • Prototype-level implementation
  • Minimal error handling
  • Requires significant enhancement for enterprise use

Use at Your Own Risk

  • Not recommended for mission-critical applications
  • Potential unexpected behaviors
  • May incur unexpected cloud service costs

The goal of this project is to demonstrate a technical concept and provide a starting point for building intelligent, cost-effective AI routing systems. It's an educational resource and a blueprint for building more sophisticated solutions.

The Evolution of LLM Usage

The landscape of Large Language Models (LLMs) has evolved dramatically over the past few years. What started with GPT-3 has expanded into a diverse ecosystem of models, each with its own strengths and cost structures. You now have access to various options:

  • OpenAI's GPT-4 and GPT-3.5
  • Anthropic's Claude series
  • Open-source models like Llama 2
  • Cloud provider solutions like Amazon Bedrock
  • Budget-friendly models like DeepSeek

This diversity brings both opportunities and challenges. While having multiple options provides flexibility, it also complicates the decision-making process. How do you choose the right model for each specific use case? How do you balance cost against performance? These questions become increasingly important as you scale your AI implementations.

Core Components and Resources

Complexity Analyzer

The first step in our routing system is analyzing the complexity of incoming queries. For this demonstration, we've implemented a simple classifier that categorizes inputs based on their characteristics. While we're using Claude 3 Sonnet in this example, you could easily swap it for a more cost-effective model like GPT-3.5 or DeepSeek-R1, or even a simpler rule-based system, depending on your specific needs and budget constraints.

The complexity analyzer categorizes inputs into three basic levels, which helps determine the most appropriate model for handling each request:

import json

import boto3

def analyze_complexity(input_text):
    # Note: This is a demonstration using Claude 3 Sonnet
    # Consider using more cost-effective alternatives like DeepSeek
    # or implementing a custom rule-based classifier for production
    bedrock_client = boto3.client('bedrock-runtime')

    prompt = f"""
Analyze the complexity of the following input:
"{input_text}"

Classify it into one of these categories:
1. SIMPLE: Basic questions, straightforward tasks
2. CALCULATION: Mathematical operations, data analysis
3. COMPLEX: Multi-step reasoning, creative problem-solving

Return ONLY the classification (SIMPLE/CALCULATION/COMPLEX)
"""

    response = bedrock_client.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 10
        })
    )

    # Parse the model's reply; fall back to COMPLEX on anything unexpected
    body = json.loads(response["body"].read())
    label = body["content"][0]["text"].strip().upper()
    return label if label in ("SIMPLE", "CALCULATION", "COMPLEX") else "COMPLEX"

AWS Step Functions State Machine

  • Orchestrates the entire workflow
  • Handles model selection logic based on complexity analysis
  • Manages error handling and retries
  • Integrates with various AWS services and external APIs
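
To make the model-selection logic concrete, here is a minimal sketch of what the routing Choice state in the state machine definition might look like, written as a Python dict for readability. The state and target names ("RouteByComplexity", "InvokeClaudeInstant", and so on) and the `$.complexity` path are hypothetical; adapt them to the state names in your own definition.

```python
import json

# Hypothetical sketch of the routing Choice state. It assumes the complexity
# analyzer step placed its label at $.complexity in the execution state.
route_state = {
    "RouteByComplexity": {
        "Type": "Choice",
        "Choices": [
            {"Variable": "$.complexity", "StringEquals": "SIMPLE",
             "Next": "InvokeClaudeInstant"},
            {"Variable": "$.complexity", "StringEquals": "CALCULATION",
             "Next": "InvokeGPT35"},
            {"Variable": "$.complexity", "StringEquals": "COMPLEX",
             "Next": "InvokeClaudeSonnet"}
        ],
        # Fail safe: route unrecognized labels to the strongest model
        "Default": "InvokeClaudeSonnet"
    }
}

print(json.dumps(route_state, indent=2))
```

Defaulting to the strongest model trades a little cost for safety: a misclassified query degrades to a more expensive answer, never a worse one.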

Step Functions Workflow

Lambda Functions

Complexity Analyzer Lambda

  • Uses Amazon Bedrock (Claude 3 Sonnet) to analyze input complexity
  • Classifies inputs into three categories: SIMPLE, CALCULATION, COMPLEX
  • Helps in optimal model selection

Bedrock Lambda (Instant & Sonnet)

  • Handles requests to Amazon Bedrock models
  • Claude Instant for simple queries
  • Claude Sonnet for complex analysis
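
Inside the Bedrock Lambda, the selection step can be as small as a lookup from the analyzer's label to a Bedrock model ID. This is a sketch, not the repository's exact code; verify the model IDs against the models enabled in your region.

```python
# Map complexity labels to Bedrock model IDs (verify against your region).
MODEL_BY_COMPLEXITY = {
    "SIMPLE": "anthropic.claude-instant-v1",
    "COMPLEX": "anthropic.claude-3-sonnet-20240229-v1:0",
}

def select_bedrock_model(complexity):
    """Return the model ID for a complexity label, defaulting to Sonnet."""
    return MODEL_BY_COMPLEXITY.get(
        complexity, "anthropic.claude-3-sonnet-20240229-v1:0"
    )
```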

Cost Calculator Lambda

  • Triggered by DynamoDB streams
  • Calculates precise costs for each model invocation
  • Updates cost information in DynamoDB
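
The stream-triggered part of that Lambda might look roughly like this. The attribute names (`model_used`, `tokens_used`) are assumptions about the table schema, and a real handler would also write the computed cost back to DynamoDB rather than just returning it.

```python
# Hedged sketch of the cost calculation a DynamoDB-stream Lambda could run.
COST_PER_1K = {
    "gpt-3.5-turbo-1106": 0.002,
    "bedrock-instant": 0.0003,
    "bedrock-sonnet": 0.015,
}

def cost_from_stream_record(record):
    """Compute the invocation cost (USD) from a stream record's NewImage."""
    image = record["dynamodb"]["NewImage"]
    model = image["model_used"]["S"]
    tokens = int(image["tokens_used"]["N"])
    return tokens * COST_PER_1K.get(model, 0) / 1000

def handler(event, context):
    # Process only newly inserted execution records
    return [cost_from_stream_record(r) for r in event["Records"]
            if r["eventName"] == "INSERT"]
```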

Storage and Database

DynamoDB Table

  • Stores execution results and metadata
  • Uses stream processing for cost calculations
  • Encrypted at rest using KMS

Security Components

KMS (Key Management Service)

  • Manages encryption keys for sensitive data
  • Used for DynamoDB encryption
  • Secures CloudWatch logs

SSM Parameter Store

  • Securely stores API keys
  • Manages configuration values
  • Encrypted using KMS
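
Reading the key back inside a Lambda is a single `get_parameter` call with decryption enabled. The sketch below takes an injectable client so it can be unit-tested without AWS credentials; the parameter name matches the one created in the setup steps later in this post.

```python
def load_openai_key(ssm_client=None):
    """Fetch the OpenAI API key from SSM Parameter Store."""
    if ssm_client is None:
        import boto3  # imported lazily so the helper is testable offline
        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(
        Name="/ai-orchestration/openai-api-key",
        WithDecryption=True,  # KMS decrypts the SecureString transparently
    )
    return response["Parameter"]["Value"]
```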

Access Control

  • Fine-grained IAM permissions
  • Service-to-service authentication
  • Secure parameter management

Integration Points

EventBridge API Destination

  • Manages OpenAI API integration
  • Handles API key authentication
  • Provides secure HTTP endpoints

The Cost-Effectiveness Dilemma: Is Dynamic Routing Worth It? 🤔

One of the most critical questions when designing any sophisticated system is: "Does the complexity come with a meaningful benefit?" In our dynamic AI model routing approach, we need to carefully analyze whether the overhead of complexity analysis justifies the potential cost savings.

The Hidden Cost of Complexity Analysis

Let's break down the economics of our approach:

# Complexity Analysis Cost Calculation
# Prices are illustrative USD-per-token figures; verify current provider rates
complexity_cost_per_token = 0.015 / 1000  # Using Claude Sonnet as analyzer

model_costs = {
    "gpt-3.5": 0.002 / 1000,          # Cheapest model
    "claude-instant": 0.0003 / 1000,  # Mid-range model
    "claude-sonnet": 0.015 / 1000     # Most expensive model
}

def is_routing_cost_effective(input_text):
    # Complexity check costs ~50-100 tokens
    complexity_check_cost = complexity_cost_per_token * 100  # ~$0.0015

    # Potential savings by choosing optimal model
    potential_savings = calculate_model_cost_difference(input_text)

    return potential_savings > complexity_check_cost

When Dynamic Routing Makes Sense

Dynamic model routing is most beneficial in scenarios with:

  • High-volume systems (1000+ daily requests)
  • Significant cost variation between models
  • Diverse input complexity
  • Large token count differences
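
A quick back-of-the-envelope check, using this post's illustrative prices, shows where the break-even point sits: routing pays for itself once the expected saving per request exceeds the ~$0.0015 analyzer overhead.

```python
# Break-even estimate using the illustrative prices from this post.
ANALYZER_COST = 0.015 / 1000 * 100        # ~100 tokens of Claude Sonnet
SONNET_PER_1K, INSTANT_PER_1K = 0.015, 0.0003

def break_even_tokens():
    """Tokens per request at which downgrading Sonnet -> Instant
    covers the analyzer's own cost."""
    saving_per_1k = SONNET_PER_1K - INSTANT_PER_1K  # $0.0147 per 1K tokens
    return ANALYZER_COST / saving_per_1k * 1000

print(round(break_even_tokens()))  # roughly 100 tokens
```

In other words, any Sonnet-eligible request longer than about 100 tokens already justifies the analysis step, which is why the pattern pays off fastest on high-volume, mixed-complexity workloads.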

When to Reconsider

You might want to skip complexity analysis if:

  • Your system has low request volume
  • Input complexity is relatively uniform
  • Model pricing is similar
  • You have strict latency requirements

Cost Comparison

Our solution doesn't just route requests - it meticulously tracks and calculates the cost of every single AI interaction. We've implemented a dedicated cost calculator Lambda function that processes each request's details and stores comprehensive cost information in DynamoDB. This approach allows for:

  • Granular cost tracking per request
  • Historical cost analysis
  • Insights into model usage patterns

import uuid
from datetime import datetime

import boto3

dynamodb = boto3.client('dynamodb')

def calculate_cost(model_used, tokens):
    MODEL_COSTS = {
        "gpt-3.5-turbo-1106": 0.002,  # Approximate cost per 1K tokens (USD)
        "bedrock-instant": 0.0003,
        "bedrock-sonnet": 0.015
    }

    # Calculate cost based on tokens used
    cost = (tokens * MODEL_COSTS.get(model_used, 0)) / 1000

    # Store detailed cost information in DynamoDB
    dynamodb.put_item(
        TableName='ai-usage-costs',
        Item={
            'execution_id': {'S': str(uuid.uuid4())},
            'model_used': {'S': model_used},
            'tokens_used': {'N': str(tokens)},
            'calculated_cost': {'N': str(cost)},
            'timestamp': {'S': datetime.now().isoformat()}
        }
    )

    return cost

Terraform: Infrastructure as Code 🏗️

The entire solution is implemented as a modular Terraform project, making it easy to deploy and customize:

  • Supports multiple AWS regions
  • Easily configurable through variables
  • Manages all AWS resources declaratively
  • Includes security best practices
    • KMS encryption
    • IAM least-privilege roles
    • Secure parameter management

Getting Started 🚀

Want to try it out? Here's how:

  1. Prerequisites:
# Ensure you have Terraform installed
brew install terraform  # macOS
# or (requires the HashiCorp apt repository)
sudo apt-get install terraform  # Debian/Ubuntu

# Install AWS CLI
pip install awscli

# Configure AWS credentials
aws configure
  2. Clone the Repository:
git clone https://github.com/requix/aws-step-functions-ai-orchestration.git
cd aws-step-functions-ai-orchestration/terraform
  3. Set Up OpenAI API Key:
aws ssm put-parameter \
    --name "/ai-orchestration/openai-api-key" \
    --type "SecureString" \
    --value "your-openai-api-key"
  4. Deploy Infrastructure:
terraform init
terraform plan
terraform apply
  5. Run Your First Execution:
# Use the output from terraform apply
aws stepfunctions start-execution \
    --state-machine-arn YOUR_STATE_MACHINE_ARN \
    --input '{"input": "What is the capital of France?"}'

Open Source and Community 🌐

The entire project is open-source and available on GitHub:
🔗 https://github.com/requix/aws-step-functions-ai-orchestration

We welcome contributions, issue reports, and feature suggestions!

Conclusion

By expanding this architectural pattern, you can create an intelligent, cost-effective AI routing system that adapts to different use cases. The key is flexibility, continuous monitoring, and a willingness to iterate.

Remember:

  • This is a proof of concept
  • Always test thoroughly
  • Monitor and optimize continuously

Happy routing! 🤖✨
