By expanding this simple architectural pattern, you can significantly reduce your LLM costs while maintaining high-quality responses across different use cases.
🚨 Important Disclaimer: Proof of Concept 🚨
This project is a demonstration of the dynamic AI model routing concept and should NOT be considered a production-ready solution.
Key Limitations:
- Experimental architecture
- Prototype-level implementation
- Minimal error handling
- Requires significant enhancement for enterprise use
Use at Your Own Risk
- Not recommended for mission-critical applications
- Potential unexpected behaviors
- May incur unexpected cloud service costs
The goal of this project is to demonstrate a technical concept and provide a starting point for building intelligent, cost-effective AI routing systems. It's an educational resource and a blueprint for building more sophisticated solutions.
The Evolution of LLM Usage
The landscape of Large Language Models (LLMs) has evolved dramatically over the past few years. What started with GPT-3 has expanded into a diverse ecosystem of models, each with its own strengths and cost structures. You now have access to various options:
- OpenAI's GPT-4 and GPT-3.5
- Anthropic's Claude series
- Open-source models like Llama 2
- Cloud provider solutions like Amazon Bedrock
- Budget-friendly models like DeepSeek
This diversity brings both opportunities and challenges. While having multiple options provides flexibility, it also complicates the decision-making process. How do you choose the right model for each specific use case? How do you balance cost against performance? These questions become increasingly important as you scale your AI implementations.
Core Components and Resources
Complexity Analyzer
The first step in our routing system is analyzing the complexity of incoming queries. For this demonstration, we've implemented a simple classifier that categorizes inputs based on their characteristics. While we're using Claude 3 Sonnet in this example, you could easily swap in a more cost-effective model such as GPT-3.5 or DeepSeek-R1, or even a simpler rule-based system (sketched after the code below), depending on your specific needs and budget constraints.
The complexity analyzer categorizes inputs into three basic levels, which helps determine the most appropriate model for handling each request:
import json
import boto3

def analyze_complexity(input_text):
    # Note: This is a demonstration using Claude 3 Sonnet
    # Consider using more cost-effective alternatives like DeepSeek
    # or implementing a custom rule-based classifier for production
    bedrock_client = boto3.client('bedrock-runtime')
    prompt = f"""
    Analyze the complexity of the following input:
    "{input_text}"
    Classify it into one of these categories:
    1. SIMPLE: Basic questions, straightforward tasks
    2. CALCULATION: Mathematical operations, data analysis
    3. COMPLEX: Multi-step reasoning, creative problem-solving
    Return ONLY the classification (SIMPLE/CALCULATION/COMPLEX)
    """
    response = bedrock_client.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",  # required by the Bedrock Messages API
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 10
        })
    )
    # Pull the classification text out of the response body
    result = json.loads(response["body"].read())
    return result["content"][0]["text"].strip()
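If you want to avoid the analyzer's own LLM cost entirely, the rule-based alternative mentioned above could be as simple as the sketch below. The keyword patterns and the 80-word threshold are illustrative assumptions, not part of the project.

import re

def analyze_complexity_rule_based(input_text):
    # Illustrative heuristics only - tune the patterns and thresholds for your workload
    has_math = re.search(r"\d+\s*[-+*/%]\s*\d+|calculate|average|sum of", input_text, re.IGNORECASE)
    looks_complex = re.search(r"explain why|compare|design|step[- ]by[- ]step|strategy", input_text, re.IGNORECASE)
    if looks_complex or len(input_text.split()) > 80:
        return "COMPLEX"
    if has_math:
        return "CALCULATION"
    return "SIMPLE"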
AWS Step Functions State Machine
- Orchestrates the entire workflow
- Handles model selection logic based on complexity analysis (see the Choice-state sketch after this list)
- Manages error handling and retries
- Integrates with various AWS services and external APIs
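To illustrate how that selection logic might look inside the state machine, here is a hypothetical Amazon States Language Choice state written as a Python dictionary. The state names, and the assumption that CALCULATION requests go to the OpenAI endpoint, are illustrative and not taken from the project's Terraform.

# Hypothetical ASL Choice state routing on the classifier's output
route_by_complexity = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.complexity", "StringEquals": "SIMPLE", "Next": "InvokeClaudeInstant"},
        {"Variable": "$.complexity", "StringEquals": "CALCULATION", "Next": "InvokeOpenAI"},
        {"Variable": "$.complexity", "StringEquals": "COMPLEX", "Next": "InvokeClaudeSonnet"}
    ],
    "Default": "InvokeClaudeInstant"
}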
Lambda Functions
Complexity Analyzer Lambda
- Uses Amazon Bedrock (Claude 3 Sonnet) to analyze input complexity
- Classifies inputs into three categories: SIMPLE, CALCULATION, COMPLEX
- Helps in optimal model selection
Bedrock Lambda (Instant & Sonnet)
- Handles requests to Amazon Bedrock models (a handler sketch follows this list)
- Claude Instant for simple queries
- Claude Sonnet for complex analysis
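A minimal sketch of such a handler is shown below; the event shape, default model, and token limit are assumptions rather than the repository's actual code.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Assumed mapping from complexity class to Bedrock model ID
MODEL_BY_COMPLEXITY = {
    "SIMPLE": "anthropic.claude-instant-v1",
    "COMPLEX": "anthropic.claude-3-sonnet-20240229-v1:0"
}

def handler(event, context):
    model_id = MODEL_BY_COMPLEXITY.get(event.get("complexity"), "anthropic.claude-instant-v1")
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": event["input"]}],
            "max_tokens": 512
        })
    )
    return {"model_used": model_id, "response": json.loads(response["body"].read())}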
Cost Calculator Lambda
- Triggered by DynamoDB streams (sketched below)
- Calculates precise costs for each model invocation
- Updates cost information in DynamoDB
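A rough sketch of the stream-triggered handler could look like the following; the attribute names read from the stream record are assumptions, and calculate_cost is the function shown later in the Cost Comparison section.

def handler(event, context):
    # DynamoDB streams deliver batches of change records
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        new_image = record["dynamodb"].get("NewImage", {})
        # Assumed attribute names for illustration
        model_used = new_image.get("model_used", {}).get("S", "unknown")
        tokens = int(new_image.get("tokens_used", {}).get("N", "0"))
        calculate_cost(model_used, tokens)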
Storage and Database
DynamoDB Table
- Stores execution results and metadata
- Uses stream processing for cost calculations
- Encrypted at rest using KMS
Security Components
KMS (Key Management Service)
- Manages encryption keys for sensitive data
- Used for DynamoDB encryption
- Secures CloudWatch logs
SSM Parameter Store
- Securely stores API keys (a retrieval example follows this list)
- Manages configuration values
- Encrypted using KMS
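At runtime, a Lambda could fetch the OpenAI key along these lines; the parameter name matches the one created in the Getting Started section, and no caching is shown.

import boto3

ssm = boto3.client("ssm")

def get_openai_api_key():
    # WithDecryption has SSM decrypt the SecureString using its KMS key
    response = ssm.get_parameter(
        Name="/ai-orchestration/openai-api-key",
        WithDecryption=True
    )
    return response["Parameter"]["Value"]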
Access Control
- Fine-grained IAM permissions
- Service-to-service authentication
- Secure parameter management
Integration Points
EventBridge API Destination
- Manages OpenAI API integration (see the provisioning sketch below)
- Handles API key authentication
- Provides secure HTTP endpoints
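The project provisions this with Terraform, but to show the moving parts, the boto3 equivalent might look roughly like this; the resource names and rate limit are illustrative assumptions.

import boto3

events = boto3.client("events")

def create_openai_destination(api_key):
    # A connection stores the API key and attaches it as an Authorization header
    connection = events.create_connection(
        Name="openai-connection",
        AuthorizationType="API_KEY",
        AuthParameters={
            "ApiKeyAuthParameters": {
                "ApiKeyName": "Authorization",
                "ApiKeyValue": f"Bearer {api_key}"
            }
        }
    )
    # The API destination wraps the OpenAI endpoint behind that connection
    return events.create_api_destination(
        Name="openai-chat-completions",
        ConnectionArn=connection["ConnectionArn"],
        InvocationEndpoint="https://api.openai.com/v1/chat/completions",
        HttpMethod="POST",
        InvocationRateLimitPerSecond=10
    )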
The Cost-Effectiveness Dilemma: Is Dynamic Routing Worth It? 🤔
One of the most critical questions when designing any sophisticated system is: "Does the complexity come with a meaningful benefit?" In our dynamic AI model routing approach, we need to carefully analyze whether the overhead of complexity analysis justifies the potential cost savings.
The Hidden Cost of Complexity Analysis
Let's break down the economics of our approach:
# Complexity Analysis Cost Calculation
# complexity_cost = 0.015 / 1000 * tokens  # Using Claude Sonnet as analyzer
model_costs = {
    "gpt-3.5": 0.000002 / 1000,       # Cheapest model
    "claude-instant": 0.0003 / 1000,  # Mid-range model
    "claude-sonnet": 0.015 / 1000     # Most expensive model
}

def is_routing_cost_effective(input_text):
    # The complexity check itself consumes roughly 50-100 tokens of the analyzer model
    complexity_check_cost = 0.015 / 1000 * 100  # ~$0.0015
    # Savings from picking the optimal model instead of defaulting to the most capable one
    # (calculate_model_cost_difference is assumed to be implemented elsewhere)
    potential_savings = calculate_model_cost_difference(input_text)
    return potential_savings > complexity_check_cost
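To make the trade-off concrete, here is a rough back-of-the-envelope estimate using the same per-1K-token rates; the request volume, token count, and share of simple requests are assumptions for illustration only.

# Rough break-even estimate (illustrative numbers only)
requests_per_day = 1000
avg_tokens = 500
simple_share = 0.6                    # assumed fraction of requests that are actually simple

sonnet_rate = 0.015 / 1000            # default model if we never analyze
instant_rate = 0.0003 / 1000          # cheaper model for simple requests
analysis_cost = 0.015 / 1000 * 100    # ~100 analyzer tokens per request

daily_savings = requests_per_day * simple_share * avg_tokens * (sonnet_rate - instant_rate)
daily_overhead = requests_per_day * analysis_cost

print(f"Savings: ${daily_savings:.2f}/day vs. analysis overhead: ${daily_overhead:.2f}/day")
# With these assumptions: roughly $4.41/day saved for $1.50/day of analysis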
When Dynamic Routing Makes Sense
Dynamic model routing is most beneficial in scenarios with:
- High-volume systems (1000+ daily requests)
- Significant cost variation between models
- Diverse input complexity
- Large token count differences
When to Reconsider
You might want to skip complexity analysis if:
- Your system has low request volume
- Input complexity is relatively uniform
- Model pricing is similar
- You have strict latency requirements
Cost Comparison
Our solution doesn't just route requests - it meticulously tracks and calculates the cost of every single AI interaction. We've implemented a dedicated cost calculator Lambda function that processes each request's details and stores comprehensive cost information in DynamoDB. This approach allows for:
- Granular cost tracking per request
- Historical cost analysis
- Insights into model usage patterns
import uuid
from datetime import datetime

import boto3

dynamodb = boto3.client('dynamodb')

def calculate_cost(model_used, tokens):
    MODEL_COSTS = {
        "gpt-3.5-turbo-1106": 0.000002,  # Average cost per token
        "bedrock-instant": 0.0003,
        "bedrock-sonnet": 0.015
    }
    # Calculate cost based on tokens used
    cost = (tokens * MODEL_COSTS.get(model_used, 0)) / 1000
    # Store detailed cost information in DynamoDB
    dynamodb.put_item(
        TableName='ai-usage-costs',
        Item={
            'execution_id': {'S': str(uuid.uuid4())},
            'model_used': {'S': model_used},
            'tokens_used': {'N': str(tokens)},
            'calculated_cost': {'N': str(cost)},
            'timestamp': {'S': datetime.now().isoformat()}
        }
    )
    return cost
Terraform: Infrastructure as Code 🏗️
The entire solution is implemented as a modular Terraform project, making it easy to deploy and customize:
- Supports multiple AWS regions
- Easily configurable through variables
- Manages all AWS resources declaratively
- Includes security best practices:
  - KMS encryption
  - IAM least-privilege roles
  - Secure parameter management
Getting Started 🚀
Want to try it out? Here's how:
- Prerequisites:
# Ensure you have
brew install terraform # macOS
# or
sudo apt-get install terraform # Linux (requires the HashiCorp apt repository)
# Install AWS CLI
pip install awscli
# Configure AWS credentials
aws configure
- Clone the Repository:
git clone https://github.com/requix/aws-step-functions-ai-orchestration.git
cd aws-step-functions-ai-orchestration/terraform
- Set Up OpenAI API Key:
aws ssm put-parameter \
--name "/ai-orchestration/openai-api-key" \
--type "SecureString" \
--value "your-openai-api-key"
- Deploy Infrastructure:
terraform init
terraform plan
terraform apply
- Run Your First Execution (a Python sketch for reading the result follows these steps):
# Use the output from terraform apply
aws stepfunctions start-execution \
--state-machine-arn YOUR_STATE_MACHINE_ARN \
--input '{"input": "What is the capital of France?"}'
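If you prefer Python to the CLI, a small boto3 sketch for starting an execution and waiting for the result (against the same state machine ARN output by Terraform) could look like this:

import json
import time
import boto3

sfn = boto3.client("stepfunctions")

def run_and_wait(state_machine_arn, question):
    execution = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"input": question})
    )
    # Poll until the execution is no longer RUNNING
    while True:
        result = sfn.describe_execution(executionArn=execution["executionArn"])
        if result["status"] != "RUNNING":
            return result.get("output")
        time.sleep(2)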
Open Source and Community 🌐
The entire project is open-source and available on GitHub:
🔗 https://github.com/requix/aws-step-functions-ai-orchestration
We welcome contributions, issue reports, and feature suggestions!
Conclusion
By expanding this architectural pattern, you can create an intelligent, cost-effective AI routing system that adapts to different use cases. The key is flexibility, continuous monitoring, and a willingness to iterate.
Remember:
- This is a proof of concept
- Always test thoroughly
- Monitor and optimize continuously
Happy routing! 🤖✨