$2,847 for six hours of work.
That's what I paid when a developer accidentally created an infinite loop in my AI platform. GPT-4 at $3.20 per iteration, running 891 times while I slept. The agent kept retrying a failed classification call, convinced it could eventually succeed.
It never did. But my credit card kept getting charged.
That morning taught me that cost control isn't a nice-to-have feature - it's life support for AI platforms.
The $2,847 Wake-Up Call
Here's exactly what happened. A developer was testing an agent that analyzed customer feedback. The agent was supposed to:
- Extract sentiment from reviews
- Classify issues
- Generate summary reports
But there was a bug in the ReAct loop. When the classification tool returned an empty result (which happened for non-English reviews), the agent assumed it failed and retried. Forever.
The logs told the story:
2024-01-15 14:23:15 - Agent: Classifying review text...
2024-01-15 14:23:18 - Tool: Classification failed - no result
2024-01-15 14:23:19 - Agent: Let me try classifying again...
2024-01-15 14:23:22 - Tool: Classification failed - no result
2024-01-15 14:23:23 - Agent: Let me try classifying again...
... (repeats 891 times)
Each retry: 4,000 tokens of GPT-4. 891 retries at roughly $3.20 each comes to about $2,847.
That's when I built comprehensive cost tracking and budget controls. Because if you're building an AI platform without cost guardrails, you're building a financial timebomb.
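A guard of the following shape would have stopped that loop after a handful of retries. This is a minimal sketch rather than the platform's actual code: `LoopGuard`, `maxIterations`, `maxCostUsd`, and `shouldRetry` are hypothetical names, and the per-iteration cost estimate is supplied by the caller.

```typescript
// Hypothetical guard for an agent loop: caps both iteration count and spend.
class LoopGuard {
  private iterations = 0;
  private spentUsd = 0;

  constructor(
    private readonly maxIterations: number,
    private readonly maxCostUsd: number
  ) {}

  // Call once per loop iteration, before the model call is made.
  // Throws instead of letting the loop run past either limit.
  recordIteration(estimatedCostUsd: number): void {
    if (this.iterations + 1 > this.maxIterations) {
      throw new Error(`Iteration cap of ${this.maxIterations} reached`);
    }
    if (this.spentUsd + estimatedCostUsd > this.maxCostUsd) {
      throw new Error(`Cost cap of ${this.maxCostUsd} USD reached`);
    }
    this.iterations += 1;
    this.spentUsd += estimatedCostUsd;
  }

  get totals(): { iterations: number; spentUsd: number } {
    return { iterations: this.iterations, spentUsd: this.spentUsd };
  }
}

// An empty classification is a valid answer, not a failure worth retrying.
// Assumption: the tool returns null on error and [] when nothing matched.
function shouldRetry(result: string[] | null): boolean {
  return result === null;
}
```

With a cap of, say, five iterations per task, the incident above would have ended after five calls instead of 891.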
Per-User Cost Tracking Architecture
Every request now logs detailed cost information to DynamoDB. Here's the tracking middleware:
export interface UsageRecord {
userId: string;
requestId: string;
timestamp: number;
provider: string;
model: string;
promptTokens: number;
completionTokens: number;
totalTokens: number;
estimatedCost: number;
actualCost?: number; // Updated when we get actual billing
requestType: 'completion' | 'embedding' | 'agent';
metadata: {
endpoint: string;
userAgent: string;
duration: number;
byok: boolean; // Bring Your Own Key
};
}
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb';
export class CostTracker {
private readonly pricingTable = new Map<string, TokenPricing>(); // must exist before initializePricing() runs
constructor(private dynamoClient: DynamoDBDocumentClient) {
this.initializePricing();
}
async trackUsage(
userId: string,
requestId: string,
usage: TokenUsage,
metadata: RequestMetadata
): Promise<void> {
const pricing = this.pricingTable.get(`${usage.provider}:${usage.model}`);
if (!pricing) {
throw new Error(`No pricing data for ${usage.provider}:${usage.model}`);
}
const promptCost = (usage.promptTokens / 1000) * pricing.promptPer1K;
const completionCost = (usage.completionTokens / 1000) * pricing.completionPer1K;
const estimatedCost = promptCost + completionCost;
const record: UsageRecord = {
userId,
requestId,
timestamp: Date.now(),
provider: usage.provider,
model: usage.model,
promptTokens: usage.promptTokens,
completionTokens: usage.completionTokens,
totalTokens: usage.totalTokens,
estimatedCost,
requestType: metadata.requestType,
metadata: {
endpoint: metadata.endpoint,
userAgent: metadata.userAgent,
duration: metadata.duration,
byok: metadata.byok
}
};
await this.dynamoClient.send(new PutCommand({
TableName: 'ai-platform-usage',
Item: record
}));
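// Update real-time budget tracking
// (updateUserBudget is defined on BudgetManager in the next section; CostTracker is assumed to delegate to it)
await this.updateUserBudget(userId, estimatedCost);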
}
private initializePricing(): void {
// USD per 1K tokens; hard-coded here, so refresh whenever a provider changes its pricing
this.pricingTable.set('openai:gpt-4', {
promptPer1K: 0.030,
completionPer1K: 0.060
});
this.pricingTable.set('openai:gpt-4-turbo', {
promptPer1K: 0.010,
completionPer1K: 0.030
});
this.pricingTable.set('anthropic:claude-3-sonnet', {
promptPer1K: 0.003,
completionPer1K: 0.015
});
// ... more models
}
}
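To make the per-request arithmetic concrete, here is a small standalone sketch of the same calculation `trackUsage` performs. The `estimateCost` function and the `PRICING` constant are illustrative names, not part of the class above; the prices are copied from the pricing table.

```typescript
interface TokenPricing {
  promptPer1K: number;
  completionPer1K: number;
}

// USD per 1K tokens, copied from the pricing table above
const PRICING: Record<string, TokenPricing> = {
  'openai:gpt-4': { promptPer1K: 0.03, completionPer1K: 0.06 },
  'anthropic:claude-3-sonnet': { promptPer1K: 0.003, completionPer1K: 0.015 },
};

function estimateCost(
  modelKey: string,
  promptTokens: number,
  completionTokens: number
): number {
  const pricing = PRICING[modelKey];
  if (!pricing) {
    throw new Error(`No pricing data for ${modelKey}`);
  }
  return (
    (promptTokens / 1000) * pricing.promptPer1K +
    (completionTokens / 1000) * pricing.completionPer1K
  );
}

// 1,200 prompt tokens + 300 completion tokens on GPT-4:
// 1.2 * 0.03 + 0.3 * 0.06 = 0.036 + 0.018 = 0.054 USD
const gpt4Cost = estimateCost('openai:gpt-4', 1200, 300);
```

Failing loudly on an unknown model, as the class does, is the safer default: a silently untracked model is exactly the kind of gap that hides runaway spend.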
Real-Time Budget Management
The budget system prevents runaway costs with soft and hard limits:
export interface UserBudget {
userId: string;
monthlyLimit: number;
currentSpend: number;
warningThreshold: number; // Default: 80%
lastUpdated: number;
status: 'active' | 'warning' | 'blocked';
notifications: {
warning: boolean;
limit: boolean;
lastSent: number;
};
}
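export class BudgetManager {
constructor(
private dynamoClient: DynamoDBDocumentClient,
private notificationService: NotificationService // notification client; its type is not shown in this post
) {}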
async checkBudget(userId: string, estimatedCost: number): Promise<BudgetCheckResult> {
const budget = await this.getUserBudget(userId);
const projectedSpend = budget.currentSpend + estimatedCost;
if (projectedSpend > budget.monthlyLimit) {
return {
allowed: false,
reason: 'Monthly budget exceeded',
currentSpend: budget.currentSpend,
limit: budget.monthlyLimit,
remainingBudget: Math.max(0, budget.monthlyLimit - budget.currentSpend)
};
}
if (projectedSpend > (budget.monthlyLimit * budget.warningThreshold / 100)) {
await this.sendBudgetWarning(userId, budget);
return {
allowed: true,
warning: true,
reason: 'Approaching budget limit',
currentSpend: budget.currentSpend,
limit: budget.monthlyLimit,
remainingBudget: budget.monthlyLimit - projectedSpend
};
}
return {
allowed: true,
currentSpend: budget.currentSpend,
limit: budget.monthlyLimit,
remainingBudget: budget.monthlyLimit - projectedSpend
};
}
async updateUserBudget(userId: string, cost: number): Promise<void> {
const now = Date.now();
const today = new Date(now);
const monthStart = new Date(today.getFullYear(), today.getMonth(), 1).getTime();
// Reset first if the last recorded spend belongs to a previous calendar month,
// so the new cost is counted against the new month
const budget = await this.getUserBudget(userId);
if (budget && budget.lastUpdated < monthStart) {
await this.resetMonthlyBudget(userId);
}
await this.dynamoClient.send(new UpdateCommand({
TableName: 'ai-platform-budgets',
Key: { userId },
UpdateExpression: `
SET currentSpend = if_not_exists(currentSpend, :zero) + :cost,
lastUpdated = :now,
#status = :status
`,
ExpressionAttributeNames: {
'#status': 'status'
},
ExpressionAttributeValues: {
':cost': cost,
':now': now,
':zero': 0,
':status': 'active'
}
}));
}
private async sendBudgetWarning(userId: string, budget: UserBudget): Promise<void> {
const timeSinceLastWarning = Date.now() - budget.notifications.lastSent;
const hoursSinceWarning = timeSinceLastWarning / (1000 * 60 * 60);
// Don't spam warnings - max once per 6 hours
if (hoursSinceWarning < 6) return;
const percentUsed = (budget.currentSpend / budget.monthlyLimit) * 100;
await this.notificationService.send({
userId,
type: 'budget_warning',
title: 'AI Usage Budget Warning',
message: `You've used ${percentUsed.toFixed(1)}% of your monthly AI budget ($${budget.currentSpend.toFixed(2)} of $${budget.monthlyLimit})`,
severity: 'warning'
});
await this.updateNotificationTime(userId);
}
}
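The allow/warn/block decision inside `checkBudget` can be pulled out into a pure function, which makes the thresholds easy to test without DynamoDB. The sketch below is one way to do that; `decideBudget` and `BudgetDecision` are hypothetical names, and the threshold is a percentage as in `UserBudget.warningThreshold`.

```typescript
type BudgetDecision = 'allow' | 'warn' | 'block';

// Mirrors the comparisons in checkBudget: block above the hard limit,
// warn above the soft threshold, otherwise allow.
function decideBudget(
  currentSpend: number,
  monthlyLimit: number,
  warningThresholdPct: number,
  estimatedCost: number
): BudgetDecision {
  const projectedSpend = currentSpend + estimatedCost;
  if (projectedSpend > monthlyLimit) {
    return 'block';
  }
  if (projectedSpend > monthlyLimit * (warningThresholdPct / 100)) {
    return 'warn';
  }
  return 'allow';
}

// $100 limit with an 80% warning threshold and a $10 request:
const atFifty = decideBudget(50, 100, 80, 10);       // projected 60  -> 'allow'
const atSeventyFive = decideBudget(75, 100, 80, 10); // projected 85  -> 'warn'
const atNinetyFive = decideBudget(95, 100, 80, 10);  // projected 105 -> 'block'
```

The check runs before the provider call with an estimated cost, so a request that would cross the hard limit is rejected without spending anything.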
Authentication: Three Layers Deep
I implement three authentication patterns depending on the use case:
1. API Keys for External Developers
export interface ApiKey {
keyId: string;
keyPrefix: string; // First 8 chars for display
hashedKey: string; // bcrypt hash
userId: string;
name: string;
scopes: string[];
rateLimit: {
requestsPerMinute: number;
tokensPerMinute: number;
};
budget: {
monthlyLimit: number;
currentSpend: number;
};
status: 'active' | 'suspended' | 'revoked';
createdAt: number;
lastUsed?: number;
expiresAt?: number;
}
export class ApiKeyAuth {
async validateApiKey(rawKey: string): Promise<AuthResult> {
// Extract key prefix
const keyId = rawKey.substring(0, 12);
const keyData = await this.getApiKey(keyId);
if (!keyData || keyData.status !== 'active') {
return { valid: false, reason: 'Invalid API key' };
}
// Check expiration
if (keyData.expiresAt && Date.now() > keyData.expiresAt) {
return { valid: false, reason: 'API key expired' };
}
// Verify hash
const isValid = await bcrypt.compare(rawKey, keyData.hashedKey);
if (!isValid) {
return { valid: false, reason: 'Invalid API key' };
}
// Update last used
await this.updateLastUsed(keyId);
return {
valid: true,
userId: keyData.userId,
scopes: keyData.scopes,
rateLimit: keyData.rateLimit,
budget: keyData.budget
};
}
async createApiKey(userId: string, options: CreateKeyOptions): Promise<string> {
const rawKey = `sk-${generateId(48)}`; // sk- prefix like OpenAI
const hashedKey = await bcrypt.hash(rawKey, 12);
const apiKey: ApiKey = {
keyId: rawKey.substring(0, 12),
keyPrefix: rawKey.substring(0, 8),
hashedKey,
userId,
name: options.name,
scopes: options.scopes || ['ai:complete', 'ai:embed'],
rateLimit: options.rateLimit || {
requestsPerMinute: 60,
tokensPerMinute: 100000
},
budget: options.budget || {
monthlyLimit: 100,
currentSpend: 0
},
status: 'active',
createdAt: Date.now(),
expiresAt: options.expiresAt
};
await this.storeApiKey(apiKey);
return rawKey; // Only returned once!
}
}
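The `generateId` helper used in `createApiKey` is not shown in this post. One possible implementation, assuming an alphanumeric alphabet and Node's built-in `crypto` module, looks like this:

```typescript
import { randomBytes } from 'node:crypto';

// Assumption: keys use only letters and digits (62 symbols)
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Hypothetical stand-in for the generateId helper referenced above.
function generateId(length: number): string {
  // Rejection sampling keeps the output uniform: 62 does not divide 256,
  // so bytes at or above 248 (the largest multiple of 62 below 256) are skipped.
  const limit = 256 - (256 % ALPHABET.length);
  let id = '';
  while (id.length < length) {
    const bytes = randomBytes(length);
    for (let i = 0; i < bytes.length && id.length < length; i++) {
      if (bytes[i] < limit) {
        id += ALPHABET[bytes[i] % ALPHABET.length];
      }
    }
  }
  return id;
}

// "sk-" plus 48 random characters, matching the format used in createApiKey
const exampleKey = `sk-${generateId(48)}`;
```

The important property is that the randomness comes from a cryptographically secure source; `Math.random()` is not suitable for credentials.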
2. JWT for Internal Services
Internal microservices use short-lived JWTs:
export class JWTAuth {
constructor(private secretKey: string) {}
generateServiceToken(serviceId: string, scopes: string[]): string {
return jwt.sign(
{
sub: serviceId,
aud: 'ai-platform',
iss: 'ai-platform-auth',
scopes,
type: 'service'
},
this.secretKey,
{
expiresIn: '1h',
algorithm: 'HS256'
}
);
}
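async validateJWT(token: string): Promise<JWTAuthResult> {
try {
// Pin the algorithm, audience, and issuer to the values used when signing
const decoded = jwt.verify(token, this.secretKey, {
algorithms: ['HS256'],
audience: 'ai-platform',
issuer: 'ai-platform-auth'
}) as JWTPayload;
return {
valid: true,
serviceId: decoded.sub,
scopes: decoded.scopes,
type: decoded.type
};
} catch (error) {
return {
valid: false,
reason: error instanceof Error ? error.message : 'Invalid token'
};
}
}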
}
3. IAM Roles for AWS Services
Lambda functions and ECS tasks use IAM roles for service-to-service authentication:
export class IAMAuth {
async validateAWSRequest(request: Request): Promise<IAMAuthResult> {
const authHeader = request.headers['authorization'];
if (!authHeader?.startsWith('AWS4-HMAC-SHA256')) {
return { valid: false, reason: 'Missing AWS signature' };
}
// Parse AWS Signature V4
const signature = this.parseAWSSignature(authHeader);
const isValid = await this.verifyAWSSignature(request, signature);
if (!isValid) {
return { valid: false, reason: 'Invalid AWS signature' };
}
// Get IAM role/user details
const identity = await this.getAWSIdentity(signature.accessKeyId);
return {
valid: true,
identity,
scopes: this.mapIAMToScopes(identity.policies)
};
}
}
Rate Limiting with Token Buckets
I use the token bucket algorithm for smooth rate limiting:
export class RateLimiter {
private buckets = new Map<string, TokenBucket>();
async checkRate(
userId: string,
requestType: 'request' | 'token',
amount: number = 1
): Promise<RateLimitResult> {
const bucketKey = `${userId}:${requestType}`;
let bucket = this.buckets.get(bucketKey);
if (!bucket) {
const limits = await this.getUserLimits(userId);
bucket = new TokenBucket(
limits[requestType].capacity,
limits[requestType].refillRate
);
this.buckets.set(bucketKey, bucket);
}
const allowed = bucket.consume(amount);
return {
allowed,
remainingTokens: bucket.tokens,
refillRate: bucket.refillRate,
resetTime: bucket.nextRefill
};
}
}
class TokenBucket {
private lastRefill: number;
constructor(
private capacity: number,
public refillRate: number, // tokens per second
public tokens: number = capacity
) {
this.lastRefill = Date.now();
}
consume(amount: number): boolean {
this.refill();
if (this.tokens >= amount) {
this.tokens -= amount;
return true;
}
return false;
}
private refill(): void {
const now = Date.now();
const timePassed = (now - this.lastRefill) / 1000;
const tokensToAdd = timePassed * this.refillRate;
this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
this.lastRefill = now;
}
get nextRefill(): number {
const tokensNeeded = this.capacity - this.tokens;
const timeToRefill = tokensNeeded / this.refillRate;
return this.lastRefill + (timeToRefill * 1000);
}
}
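To see the burst-then-refill behavior without waiting on a real clock, here is a self-contained variant of the bucket with an injectable time source. `ClockedTokenBucket` and the `now` parameter are additions for illustration; the refill math is the same as in `TokenBucket` above.

```typescript
class ClockedTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillRate: number, // tokens per second
    private readonly now: () => number = Date.now
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  consume(amount = 1): boolean {
    // Refill based on elapsed time, capped at capacity
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillRate
    );
    this.lastRefill = current;

    if (this.tokens < amount) {
      return false;
    }
    this.tokens -= amount;
    return true;
  }
}

// Simulated clock: capacity 60, refill 1 token/second (60 requests per minute)
let fakeNow = 0;
const demoBucket = new ClockedTokenBucket(60, 1, () => fakeNow);

let burstAllowed = 0;
for (let i = 0; i < 100; i++) {
  if (demoBucket.consume()) burstAllowed++; // the first 60 pass, the rest are rejected
}

fakeNow += 5000; // five seconds later, five tokens have been refilled
let allowedAfterWait = 0;
for (let i = 0; i < 100; i++) {
  if (demoBucket.consume()) allowedAfterWait++;
}
```

One caveat with the in-memory `Map` in `RateLimiter`: each Lambda instance or ECS task holds its own buckets, so limits are enforced per process unless the bucket state lives in a shared store.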
BYOK: Bring Your Own Key Deep Dive
Many users want to bring their own OpenAI/Anthropic API keys for cost control and compliance. This requires careful security handling:
export class BYOKManager {
private readonly encryptionKey: Buffer;
constructor(private dynamoClient: DynamoDBDocumentClient) {
this.encryptionKey = crypto.scryptSync(
process.env.BYOK_PASSWORD!,
process.env.BYOK_SALT!,
32
);
}
async storeUserKey(
userId: string,
provider: string,
apiKey: string,
metadata?: KeyMetadata
): Promise<void> {
// Validate key before storing
const isValid = await this.validateProviderKey(provider, apiKey);
if (!isValid) {
throw new Error('Invalid API key for provider');
}
// Encrypt the key with AES-256-GCM using a fresh random IV
const iv = crypto.randomBytes(12); // 96-bit IV, the recommended size for GCM
const cipher = crypto.createCipheriv('aes-256-gcm', this.encryptionKey, iv);
cipher.setAAD(Buffer.from(userId)); // Additional authenticated data: binds the ciphertext to this user
let encrypted = cipher.update(apiKey, 'utf8', 'hex');
encrypted += cipher.final('hex');
const authTag = cipher.getAuthTag();
// Store encrypted key
await this.dynamoClient.send(new PutCommand({
TableName: 'ai-platform-user-keys',
Item: {
userId,
provider,
encryptedKey: encrypted,
iv: iv.toString('hex'),
authTag: authTag.toString('hex'),
metadata,
createdAt: Date.now(),
lastValidated: Date.now(),
status: 'active'
}
}));
}
async getUserKey(userId: string, provider: string): Promise<string | null> {
// GetCommand comes from '@aws-sdk/lib-dynamodb'
const result = await this.dynamoClient.send(new GetCommand({
TableName: 'ai-platform-user-keys',
Key: { userId, provider }
}));
if (!result.Item) return null;
const { encryptedKey, iv, authTag } = result.Item;
// Decrypt the key with the stored IV; final() throws if the auth tag or AAD does not match
const decipher = crypto.createDecipheriv('aes-256-gcm', this.encryptionKey, Buffer.from(iv, 'hex'));
decipher.setAAD(Buffer.from(userId));
decipher.setAuthTag(Buffer.from(authTag, 'hex'));
let decrypted = decipher.update(encryptedKey, 'hex', 'utf8');
decrypted += decipher.final('utf8');
return decrypted;
}
private async validateProviderKey(provider: string, apiKey: string): Promise<boolean> {
try {
switch (provider) {
case 'openai':
const openai = new OpenAI({ apiKey });
await openai.models.list();
return true;
case 'anthropic':
const anthropic = new Anthropic({ apiKey });
await anthropic.messages.create({
model: 'claude-3-haiku-20240307',
messages: [{ role: 'user', content: 'test' }],
max_tokens: 1
});
return true;
default:
return false;
}
} catch (error) {
return false;
}
}
}
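The encrypt/decrypt round trip can be exercised on its own, without DynamoDB. The sketch below uses Node's `createCipheriv` and `createDecipheriv` with AES-256-GCM; `sealKey`, `openKey`, and `SealedKey` are hypothetical names, and the user ID is bound in as additional authenticated data, as in the class above.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// All fields are hex-encoded so they can be stored as plain strings
interface SealedKey {
  iv: string;
  authTag: string;
  ciphertext: string;
}

function sealKey(plaintext: string, key: Buffer, userId: string): SealedKey {
  const iv = randomBytes(12); // fresh 96-bit IV for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  cipher.setAAD(Buffer.from(userId, 'utf8'));
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, 'utf8'),
    cipher.final(),
  ]);
  return {
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
    ciphertext: ciphertext.toString('hex'),
  };
}

function openKey(sealed: SealedKey, key: Buffer, userId: string): string {
  const decipher = createDecipheriv(
    'aes-256-gcm',
    key,
    Buffer.from(sealed.iv, 'hex')
  );
  decipher.setAAD(Buffer.from(userId, 'utf8'));
  decipher.setAuthTag(Buffer.from(sealed.authTag, 'hex'));
  // final() throws if the ciphertext, auth tag, or AAD has been altered
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, 'hex')),
    decipher.final(),
  ]);
  return plaintext.toString('utf8');
}

// Round trip with a throwaway 256-bit key
const demoEncryptionKey = randomBytes(32);
const sealed = sealKey('sk-example-provider-key', demoEncryptionKey, 'user-1');
const reopened = openKey(sealed, demoEncryptionKey, 'user-1');
```

Because the user ID is authenticated data, a record copied onto another user's row fails to decrypt rather than silently handing over someone else's key.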
Monitoring and Alerting
CloudWatch dashboards show real-time platform health:
export class MonitoringService {
async createDashboard(): Promise<void> {
await this.cloudwatchClient.putDashboard({
DashboardName: 'ai-platform-production',
DashboardBody: JSON.stringify({
widgets: [
{
type: 'metric',
properties: {
metrics: [
['AWS/Lambda', 'Duration', 'FunctionName', 'ai-platform-gateway'],
['AWS/Lambda', 'Errors', 'FunctionName', 'ai-platform-gateway'],
['AWS/ECS', 'CPUUtilization', 'ServiceName', 'ai-agents'],
['AWS/ECS', 'MemoryUtilization', 'ServiceName', 'ai-agents']
],
period: 300,
stat: 'Average',
region: 'us-east-1',
title: 'Infrastructure Health'
}
},
{
type: 'metric',
properties: {
metrics: [
['ai-platform', 'RequestCount'],
['ai-platform', 'TokensProcessed'],
['ai-platform', 'CostPerHour'],
['ai-platform', 'ErrorRate']
],
period: 300,
stat: 'Sum',
title: 'Business Metrics'
}
}
]
})
}).promise();
}
async setupCostAnomalyDetection(): Promise<void> {
// Alarm when hourly cost rises above the upper edge of the anomaly detection band
await this.cloudwatchClient.putMetricAlarm({
AlarmName: 'ai-platform-cost-anomaly',
ComparisonOperator: 'GreaterThanUpperThreshold',
EvaluationPeriods: 2,
ThresholdMetricId: 'cost_band',
Metrics: [
{
Id: 'cost_per_hour',
ReturnData: true,
MetricStat: {
Metric: {
MetricName: 'CostPerHour',
Namespace: 'ai-platform'
},
Period: 3600,
Stat: 'Sum'
}
},
{
Id: 'cost_band',
ReturnData: true,
// Band of 2 standard deviations around the learned baseline
Expression: 'ANOMALY_DETECTION_BAND(cost_per_hour, 2)',
Label: 'CostPerHour (expected)'
}
],
AlarmActions: [process.env.ALERT_SNS_TOPIC!]
}).promise();
}
}
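The dashboard and alarm read custom metrics from the `ai-platform` namespace, which the gateway has to publish itself. A rough sketch of building that payload follows. The field names mirror the CloudWatch PutMetricData request, but `UsageWindow` and `buildPlatformMetrics` are hypothetical, and the actual send call is left out.

```typescript
// Shapes mirror the CloudWatch PutMetricData request; only the fields used here are declared
interface MetricDatum {
  MetricName: string;
  Value: number;
  Unit: 'Count' | 'None';
  Timestamp: Date;
}

interface PutMetricDataInput {
  Namespace: string;
  MetricData: MetricDatum[];
}

// Hypothetical aggregate that the gateway flushes once per interval
interface UsageWindow {
  requests: number;
  tokens: number;
  costUsd: number;
}

function buildPlatformMetrics(usage: UsageWindow, at: Date): PutMetricDataInput {
  return {
    Namespace: 'ai-platform',
    MetricData: [
      { MetricName: 'RequestCount', Value: usage.requests, Unit: 'Count', Timestamp: at },
      { MetricName: 'TokensProcessed', Value: usage.tokens, Unit: 'Count', Timestamp: at },
      // Summed over a 3600-second period, these datapoints give the hourly cost
      { MetricName: 'CostPerHour', Value: usage.costUsd, Unit: 'None', Timestamp: at },
    ],
  };
}

const metricsPayload = buildPlatformMetrics(
  { requests: 10, tokens: 4000, costUsd: 0.12 },
  new Date(0)
);
```

The resulting object is what would be handed to the CloudWatch client's PutMetricData call; batching per interval keeps the number of API calls low.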
Security Hardening Checklist
Here's my production security configuration:
Secrets Management:
// All secrets in AWS Systems Manager Parameter Store
const config = {
jwtSecret: await getParameter('/ai-platform/jwt-secret', true),
encryptionKey: await getParameter('/ai-platform/encryption-key', true),
openaiKey: await getParameter('/ai-platform/openai-key', true)
};
VPC Configuration for ECS:
// CDK construct for secure ECS
new ecs.FargateService(this, 'AgentService', {
cluster,
taskDefinition,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
securityGroups: [privateSecurityGroup],
assignPublicIp: false
});
// Security group: only allows outbound HTTPS
// (the group must be created with allowAllOutbound: false, otherwise CDK ignores this rule)
privateSecurityGroup.addEgressRule(
ec2.Peer.anyIpv4(),
ec2.Port.tcp(443),
'HTTPS outbound for API calls'
);
API Gateway with WAF:
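// aws-cdk-lib ships only the L1 (Cfn) construct for WAFv2
new wafv2.CfnWebACL(this, 'ApiWaf', {
scope: 'REGIONAL',
defaultAction: { allow: {} },
visibilityConfig: {
cloudWatchMetricsEnabled: true,
metricName: 'ApiWaf',
sampledRequestsEnabled: true
},
rules: [
{
name: 'RateLimitRule',
priority: 1,
action: { block: {} },
statement: {
rateBasedStatement: {
limit: 1000, // requests per 5 minutes
aggregateKeyType: 'IP'
}
},
visibilityConfig: {
cloudWatchMetricsEnabled: true,
metricName: 'RateLimitRule',
sampledRequestsEnabled: true
}
},
{
name: 'GeoRestrictRule',
priority: 2,
action: { block: {} },
statement: {
geoMatchStatement: {
countryCodes: ['CN', 'RU'] // Block certain countries
}
},
visibilityConfig: {
cloudWatchMetricsEnabled: true,
metricName: 'GeoRestrictRule',
sampledRequestsEnabled: true
}
}
]
});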
Cost Breakdown: What This Actually Costs
After 8 months in production serving 1,500 requests/day:
Fixed Infrastructure Costs (Monthly):
- API Gateway: $3.50 (1M requests)
- Lambda (Gateway): $8.20 (compute + requests)
- ECS Fargate: $15.40 (avg 2 tasks running)
- DynamoDB: $6.80 (usage tracking + budgets)
- Application Load Balancer: $16.20
- CloudWatch: $4.30
- Total Fixed: $54.40/month
Variable AI Costs (Pass-through):
- OpenAI API: $340-890/month (user-dependent)
- Anthropic API: $180-420/month
- AWS Bedrock: $45-120/month
- Total Variable: User-driven, 2% platform markup
Cost Optimization Wins:
- Moved summarization to Claude Haiku: 60% cost reduction
- Implemented response caching: 25% fewer API calls
- BYOK adoption: 70% of users, zero platform AI costs
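As a quick sanity check on the fixed costs, the snippet below sums the line items and spreads them over the stated traffic. The 30-day month is an assumption; the dollar figures are copied from the list above.

```typescript
// Monthly fixed costs in USD: API Gateway, Lambda, Fargate, DynamoDB, ALB, CloudWatch
const fixedCostsUsd: number[] = [3.5, 8.2, 15.4, 6.8, 16.2, 4.3];
const totalFixedUsd = fixedCostsUsd.reduce((sum, cost) => sum + cost, 0);

// Assumption: a 30-day month at the stated 1,500 requests/day
const requestsPerMonth = 1500 * 30;
const fixedCostPerRequestUsd = totalFixedUsd / requestsPerMonth;

console.log(totalFixedUsd.toFixed(2));          // "54.40"
console.log(fixedCostPerRequestUsd.toFixed(4)); // "0.0012"
```

At this volume the infrastructure overhead is roughly a tenth of a cent per request, which is small next to the pass-through model costs.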
Real Production Incidents
Incident 1: Memory Leak in ECS Agent
- Symptom: Tasks consuming 8GB RAM, getting OOM killed
- Root cause: Long conversations not being garbage collected
- Fix: Added conversation pruning after 50 messages
- Prevention: Memory monitoring alerts at 80% usage
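The pruning fix can be sketched as a small pure function. This is an illustration of the idea rather than the production code: `pruneConversation` and `ChatMessage` are hypothetical names, and keeping system messages while dropping the oldest turns is an assumed policy.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keep system messages plus the most recent turns, up to maxMessages in total
function pruneConversation(
  messages: ChatMessage[],
  maxMessages = 50
): ChatMessage[] {
  if (messages.length <= maxMessages) {
    return messages;
  }
  const systemMessages = messages.filter((m) => m.role === 'system');
  const otherMessages = messages.filter((m) => m.role !== 'system');
  const keepCount = Math.max(0, maxMessages - systemMessages.length);
  // slice from an explicit start index so keepCount === 0 yields an empty array
  const recent = otherMessages.slice(otherMessages.length - keepCount);
  return systemMessages.concat(recent);
}

// One system prompt followed by 60 user turns
const longConversation: ChatMessage[] = [
  { role: 'system', content: 'You analyze customer feedback.' },
];
for (let i = 0; i < 60; i++) {
  longConversation.push({ role: 'user', content: `message ${i}` });
}
const pruned = pruneConversation(longConversation); // 1 system + the last 49 user turns
```

Pruning also caps prompt size, so it limits token cost per request as well as memory.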
Incident 2: DynamoDB Throttling
- Symptom: Budget checks failing, users getting 500 errors
- Root cause: Hot partition on userId during peak traffic
- Fix: Added requestId to partition key for better distribution
- Prevention: DynamoDB on-demand billing mode
Incident 3: BYOK Key Validation Loop
- Symptom: Users unable to update API keys
- Root cause: Validation calling itself recursively
- Fix: Separate validation context from normal request context
- Prevention: Integration tests for all BYOK flows
The Security Mindset Shift
Building production AI infrastructure changed how I think about security. Traditional web apps have predictable resource usage. AI apps can consume unlimited resources with a single malicious prompt.
Every endpoint needs three guards:
- Authentication - Who are you?
- Authorization - What can you do?
- Budget control - How much can you spend?
Miss any one, and you're vulnerable.
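Put together, the three guards form a short pipeline that runs before any model call. The sketch below shows the ordering only; `checkGuards`, `Caller`, and the choice of HTTP status codes (402 for budget in particular) are assumptions for illustration.

```typescript
interface Caller {
  userId: string;
  scopes: string[];
}

type GuardResult =
  | { ok: true }
  | { ok: false; status: 401 | 402 | 403; reason: string };

function checkGuards(
  caller: Caller | null,
  requiredScope: string,
  remainingBudgetUsd: number,
  estimatedCostUsd: number
): GuardResult {
  // 1. Authentication: who are you?
  if (caller === null) {
    return { ok: false, status: 401, reason: 'Unauthenticated' };
  }
  // 2. Authorization: what can you do?
  if (caller.scopes.indexOf(requiredScope) === -1) {
    return { ok: false, status: 403, reason: `Missing scope ${requiredScope}` };
  }
  // 3. Budget control: how much can you spend?
  if (estimatedCostUsd > remainingBudgetUsd) {
    return { ok: false, status: 402, reason: 'Monthly budget exceeded' };
  }
  return { ok: true };
}

const demoCaller: Caller = { userId: 'user-1', scopes: ['ai:complete'] };
const allowedResult = checkGuards(demoCaller, 'ai:complete', 5, 0.25);
const overBudgetResult = checkGuards(demoCaller, 'ai:complete', 0.1, 0.25);
```

The cheap checks come first, so an unauthenticated request never reaches the budget lookup.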
What's Next
The complete production setup is documented in my ai-platform-aws-examples repo.
Next week, I'll tie everything together with a complete deployment walkthrough. You'll see how to go from zero to a fully operational AI platform in under an hour.
But more importantly, I'll share the real numbers: what this platform actually costs to run, performance metrics from production, and the roadmap for what's coming next.
Because the best architecture means nothing if you can't deploy it reliably.
This is part 7 of an 8-part series documenting my journey building an AI platform on AWS. Next week: the complete deployment guide and lessons from 8 months in production.