DEV Community

Volodymyr Marynychev for AWS Community Builders


Orchestrating AI: Dynamic LLM Routing based on AWS Step Functions

By expanding this simple architectural pattern, you can significantly reduce your LLM costs while maintaining high-quality responses across different use cases.

Architecture Diagram

🚨 Important Disclaimer: Proof of Concept 🚨

This project is a demonstration of the dynamic AI model routing concept and should NOT be considered a production-ready solution.

Key Limitations:

  • Experimental architecture
  • Prototype-level implementation
  • Minimal error handling
  • Requires significant enhancement for enterprise use

Use at Your Own Risk

  • Not recommended for mission-critical applications
  • Potential unexpected behaviors
  • May incur unexpected cloud service costs

The goal of this project is to demonstrate a technical concept and provide a starting point for building intelligent, cost-effective AI routing systems. It's an educational resource and a blueprint for building more sophisticated solutions.

The Evolution of LLM Usage

The landscape of Large Language Models (LLMs) has evolved dramatically over the past few years. What started with GPT-3 has expanded into a diverse ecosystem of models, each with its own strengths and cost structures. You now have access to various options:

  • OpenAI's GPT-4 and GPT-3.5
  • Anthropic's Claude series
  • Open-source models like Llama 2
  • Cloud provider solutions like Amazon Bedrock
  • Budget-friendly models like DeepSeek

This diversity brings both opportunities and challenges. While having multiple options provides flexibility, it also complicates the decision-making process. How do you choose the right model for each specific use case? How do you balance cost against performance? These questions become increasingly important as you scale your AI implementations.

Core Components and Resources

Complexity Analyzer

The first step in our routing system is analyzing the complexity of incoming queries. For this demonstration, we've implemented a simple classifier that categorizes inputs based on their characteristics. While we're using Claude 3 Sonnet in this example, you could easily swap it for a more cost-effective model like GPT-3.5 or DeepSeek-R1, or even a simpler rule-based system, depending on your specific needs and budget constraints.

The complexity analyzer categorizes inputs into three basic levels, which helps determine the most appropriate model for handling each request:

import json

import boto3

def analyze_complexity(input_text):
    # Note: This is a demonstration using Claude 3 Sonnet
    # Consider using more cost-effective alternatives like DeepSeek
    # or implementing a custom rule-based classifier for production
    bedrock_client = boto3.client('bedrock-runtime')

    prompt = f"""
Analyze the complexity of the following input:
"{input_text}"

Classify it into one of these categories:
1. SIMPLE: Basic questions, straightforward tasks
2. CALCULATION: Mathematical operations, data analysis
3. COMPLEX: Multi-step reasoning, creative problem-solving

Return ONLY the classification (SIMPLE/CALCULATION/COMPLEX)
"""

    response = bedrock_client.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 10
        })
    )

    # Parse the model's reply; fall back to COMPLEX on anything unexpected
    body = json.loads(response["body"].read())
    label = body["content"][0]["text"].strip().upper()
    return label if label in ("SIMPLE", "CALCULATION", "COMPLEX") else "COMPLEX"

AWS Step Functions State Machine

  • Orchestrates the entire workflow
  • Handles model selection logic based on complexity analysis
  • Manages error handling and retries
  • Integrates with various AWS services and external APIs
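
To make the model-selection logic concrete, here is a minimal sketch of what the routing Choice state in the state machine definition might look like, written as a Python dict for readability. The state and target names ("RouteByComplexity", "InvokeClaudeInstant", and so on) and the `$.complexity` path are hypothetical; adapt them to the state names in your own definition.

```python
import json

# Hypothetical sketch of the routing Choice state. It assumes the complexity
# analyzer step placed its label at $.complexity in the execution state.
route_state = {
    "RouteByComplexity": {
        "Type": "Choice",
        "Choices": [
            {"Variable": "$.complexity", "StringEquals": "SIMPLE",
             "Next": "InvokeClaudeInstant"},
            {"Variable": "$.complexity", "StringEquals": "CALCULATION",
             "Next": "InvokeGPT35"},
            {"Variable": "$.complexity", "StringEquals": "COMPLEX",
             "Next": "InvokeClaudeSonnet"}
        ],
        # Fail safe: route unrecognized labels to the strongest model
        "Default": "InvokeClaudeSonnet"
    }
}

print(json.dumps(route_state, indent=2))
```

Defaulting to the strongest model trades a little cost for safety: a misclassified query degrades to a more expensive answer, never a worse one.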

Step Functions Workflow

Lambda Functions

Complexity Analyzer Lambda

  • Uses Amazon Bedrock (Claude 3 Sonnet) to analyze input complexity
  • Classifies inputs into three categories: SIMPLE, CALCULATION, COMPLEX
  • Helps in optimal model selection

Bedrock Lambda (Instant & Sonnet)

  • Handles requests to Amazon Bedrock models
  • Claude Instant for simple queries
  • Claude Sonnet for complex analysis
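
Inside the Bedrock Lambda, the selection step can be as small as a lookup from the analyzer's label to a Bedrock model ID. This is a sketch, not the repository's exact code; verify the model IDs against the models enabled in your region.

```python
# Map complexity labels to Bedrock model IDs (verify against your region).
MODEL_BY_COMPLEXITY = {
    "SIMPLE": "anthropic.claude-instant-v1",
    "COMPLEX": "anthropic.claude-3-sonnet-20240229-v1:0",
}

def select_bedrock_model(complexity):
    """Return the model ID for a complexity label, defaulting to Sonnet."""
    return MODEL_BY_COMPLEXITY.get(
        complexity, "anthropic.claude-3-sonnet-20240229-v1:0"
    )
```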

Cost Calculator Lambda

  • Triggered by DynamoDB streams
  • Calculates precise costs for each model invocation
  • Updates cost information in DynamoDB
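
The stream-triggered part of that Lambda might look roughly like this. The attribute names (`model_used`, `tokens_used`) are assumptions about the table schema, and a real handler would also write the computed cost back to DynamoDB rather than just returning it.

```python
# Hedged sketch of the cost calculation a DynamoDB-stream Lambda could run.
COST_PER_1K = {
    "gpt-3.5-turbo-1106": 0.002,
    "bedrock-instant": 0.0003,
    "bedrock-sonnet": 0.015,
}

def cost_from_stream_record(record):
    """Compute the invocation cost (USD) from a stream record's NewImage."""
    image = record["dynamodb"]["NewImage"]
    model = image["model_used"]["S"]
    tokens = int(image["tokens_used"]["N"])
    return tokens * COST_PER_1K.get(model, 0) / 1000

def handler(event, context):
    # Process only newly inserted execution records
    return [cost_from_stream_record(r) for r in event["Records"]
            if r["eventName"] == "INSERT"]
```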

Storage and Database

DynamoDB Table

  • Stores execution results and metadata
  • Uses stream processing for cost calculations
  • Encrypted at rest using KMS

Security Components

KMS (Key Management Service)

  • Manages encryption keys for sensitive data
  • Used for DynamoDB encryption
  • Secures CloudWatch logs

SSM Parameter Store

  • Securely stores API keys
  • Manages configuration values
  • Encrypted using KMS
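
Reading the key back inside a Lambda is a single `get_parameter` call with decryption enabled. The sketch below takes an injectable client so it can be unit-tested without AWS credentials; the parameter name matches the one created in the setup steps later in this post.

```python
def load_openai_key(ssm_client=None):
    """Fetch the OpenAI API key from SSM Parameter Store."""
    if ssm_client is None:
        import boto3  # imported lazily so the helper is testable offline
        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(
        Name="/ai-orchestration/openai-api-key",
        WithDecryption=True,  # KMS decrypts the SecureString transparently
    )
    return response["Parameter"]["Value"]
```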

Access Control

  • Fine-grained IAM permissions
  • Service-to-service authentication
  • Secure parameter management

Integration Points

EventBridge API Destination

  • Manages OpenAI API integration
  • Handles API key authentication
  • Provides secure HTTP endpoints

The Cost-Effectiveness Dilemma: Is Dynamic Routing Worth It? 🤔

One of the most critical questions when designing any sophisticated system is: "Does the complexity come with a meaningful benefit?" In our dynamic AI model routing approach, we need to carefully analyze whether the overhead of complexity analysis justifies the potential cost savings.

The Hidden Cost of Complexity Analysis

Let's break down the economics of our approach:

# Complexity Analysis Cost Calculation
# Prices are illustrative USD-per-token figures; verify current provider rates
complexity_cost_per_token = 0.015 / 1000  # Using Claude Sonnet as analyzer

model_costs = {
    "gpt-3.5": 0.002 / 1000,          # Cheapest model
    "claude-instant": 0.0003 / 1000,  # Mid-range model
    "claude-sonnet": 0.015 / 1000     # Most expensive model
}

def is_routing_cost_effective(input_text):
    # Complexity check costs ~50-100 tokens
    complexity_check_cost = complexity_cost_per_token * 100  # ~$0.0015

    # Potential savings by choosing optimal model
    potential_savings = calculate_model_cost_difference(input_text)

    return potential_savings > complexity_check_cost

When Dynamic Routing Makes Sense

Dynamic model routing is most beneficial in scenarios with:

  • High-volume systems (1000+ daily requests)
  • Significant cost variation between models
  • Diverse input complexity
  • Large token count differences
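
A quick back-of-the-envelope check, using this post's illustrative prices, shows where the break-even point sits: routing pays for itself once the expected saving per request exceeds the ~$0.0015 analyzer overhead.

```python
# Break-even estimate using the illustrative prices from this post.
ANALYZER_COST = 0.015 / 1000 * 100        # ~100 tokens of Claude Sonnet
SONNET_PER_1K, INSTANT_PER_1K = 0.015, 0.0003

def break_even_tokens():
    """Tokens per request at which downgrading Sonnet -> Instant
    covers the analyzer's own cost."""
    saving_per_1k = SONNET_PER_1K - INSTANT_PER_1K  # $0.0147 per 1K tokens
    return ANALYZER_COST / saving_per_1k * 1000

print(round(break_even_tokens()))  # roughly 100 tokens
```

In other words, any Sonnet-eligible request longer than about 100 tokens already justifies the analysis step, which is why the pattern pays off fastest on high-volume, mixed-complexity workloads.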

When to Reconsider

You might want to skip complexity analysis if:

  • Your system has low request volume
  • Input complexity is relatively uniform
  • Model pricing is similar
  • You have strict latency requirements

Cost Comparison

Our solution doesn't just route requests - it meticulously tracks and calculates the cost of every single AI interaction. We've implemented a dedicated cost calculator Lambda function that processes each request's details and stores comprehensive cost information in DynamoDB. This approach allows for:

  • Granular cost tracking per request
  • Historical cost analysis
  • Insights into model usage patterns

import uuid
from datetime import datetime

import boto3

dynamodb = boto3.client('dynamodb')

def calculate_cost(model_used, tokens):
    MODEL_COSTS = {
        "gpt-3.5-turbo-1106": 0.002,  # Approximate cost per 1K tokens (USD)
        "bedrock-instant": 0.0003,
        "bedrock-sonnet": 0.015
    }

    # Calculate cost based on tokens used
    cost = (tokens * MODEL_COSTS.get(model_used, 0)) / 1000

    # Store detailed cost information in DynamoDB
    dynamodb.put_item(
        TableName='ai-usage-costs',
        Item={
            'execution_id': {'S': str(uuid.uuid4())},
            'model_used': {'S': model_used},
            'tokens_used': {'N': str(tokens)},
            'calculated_cost': {'N': str(cost)},
            'timestamp': {'S': datetime.now().isoformat()}
        }
    )

    return cost

Terraform: Infrastructure as Code 🏗️

The entire solution is implemented as a modular Terraform project, making it easy to deploy and customize:

  • Supports multiple AWS regions
  • Easily configurable through variables
  • Manages all AWS resources declaratively
  • Includes security best practices
    • KMS encryption
    • IAM least-privilege roles
    • Secure parameter management

Getting Started 🚀

Want to try it out? Here's how:

  1. Prerequisites:
# Ensure you have Terraform installed
brew install terraform  # macOS
# or (requires the HashiCorp apt repository)
sudo apt-get install terraform  # Debian/Ubuntu

# Install AWS CLI
pip install awscli

# Configure AWS credentials
aws configure
  2. Clone the Repository:
git clone https://github.com/requix/aws-step-functions-ai-orchestration.git
cd aws-step-functions-ai-orchestration/terraform
  3. Set Up OpenAI API Key:
aws ssm put-parameter \
    --name "/ai-orchestration/openai-api-key" \
    --type "SecureString" \
    --value "your-openai-api-key"
  4. Deploy Infrastructure:
terraform init
terraform plan
terraform apply
  5. Run Your First Execution:
# Use the output from terraform apply
aws stepfunctions start-execution \
    --state-machine-arn YOUR_STATE_MACHINE_ARN \
    --input '{"input": "What is the capital of France?"}'

Open Source and Community 🌐

The entire project is open-source and available on GitHub:
🔗 https://github.com/requix/aws-step-functions-ai-orchestration

We welcome contributions, issue reports, and feature suggestions!

Conclusion

By expanding this architectural pattern, you can create an intelligent, cost-effective AI routing system that adapts to different use cases. The key is flexibility, continuous monitoring, and a willingness to iterate.

Remember:

  • This is a proof of concept
  • Always test thoroughly
  • Monitor and optimize continuously

Happy routing! 🤖✨
