Tyson Cung

Posted on Mar 6

Cost Tracking, Auth & Production Hardening

#aws #security #production #ai

$2,847 for six hours of work.

That's what I paid when a developer accidentally created an infinite loop in my AI platform. GPT-4 at $3.20 per iteration, running 891 times while I slept. The agent kept retrying a failed classification call, convinced it could eventually succeed.

It never did. But my credit card kept getting charged.

That morning taught me that cost control isn't a nice-to-have feature - it's life support for AI platforms.

The $2,847 Wake-Up Call

Here's exactly what happened. A developer was testing an agent that analyzed customer feedback. The agent was supposed to:

Extract sentiment from reviews
Classify issues
Generate summary reports

But there was a bug in the ReAct loop. When the classification tool returned an empty result (which happened for non-English reviews), the agent assumed it failed and retried. Forever.

The logs told the story:

2024-01-15 14:23:15 - Agent: Classifying review text...
2024-01-15 14:23:18 - Tool: Classification failed - no result
2024-01-15 14:23:19 - Agent: Let me try classifying again...
2024-01-15 14:23:22 - Tool: Classification failed - no result  
2024-01-15 14:23:23 - Agent: Let me try classifying again...
... (repeats 891 times)

Each retry: 4,000 tokens of GPT-4. 891 retries at roughly $3.20 each comes to about $2,847.

That's when I built comprehensive cost tracking and budget controls. Because if you're building an AI platform without cost guardrails, you're building a financial timebomb.
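
The simplest guardrail against a loop like this is a hard cap on both iterations and spend for a single agent run. Here's a minimal sketch of that idea, not the platform's actual agent loop: `runWithGuards`, `StepResult`, and the limit values are hypothetical names picked for illustration, and a real step function would be async.

```typescript
// Hypothetical guard around an agent loop: stops on an iteration cap or a spend cap.
interface StepResult {
  done: boolean;   // true when the agent has finished its task
  costUsd: number; // estimated cost of this single step
}

interface GuardLimits {
  maxIterations: number;
  maxCostUsd: number;
}

type StopReason = 'completed' | 'max_iterations' | 'max_cost';

interface RunSummary {
  reason: StopReason;
  iterations: number;
  totalCostUsd: number;
}

function runWithGuards(step: () => StepResult, limits: GuardLimits): RunSummary {
  let iterations = 0;
  let totalCostUsd = 0;

  while (iterations < limits.maxIterations) {
    const result = step();
    iterations += 1;
    totalCostUsd += result.costUsd;

    if (result.done) {
      return { reason: 'completed', iterations, totalCostUsd };
    }
    if (totalCostUsd >= limits.maxCostUsd) {
      return { reason: 'max_cost', iterations, totalCostUsd };
    }
  }
  return { reason: 'max_iterations', iterations, totalCostUsd };
}

// A step that never succeeds, like the failed classification retry
const stuck = runWithGuards(() => ({ done: false, costUsd: 3.2 }), {
  maxIterations: 10,
  maxCostUsd: 20,
});
console.log(stuck.reason, stuck.iterations);
```

With a $20 cap the stuck run stops on the spend limit after a handful of steps instead of running 891 times.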

Per-User Cost Tracking Architecture

Every request now logs detailed cost information to DynamoDB. Here's the tracking middleware:

export interface UsageRecord {
  userId: string;
  requestId: string;
  timestamp: number;
  provider: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimatedCost: number;
  actualCost?: number; // Updated when we get actual billing
  requestType: 'completion' | 'embedding' | 'agent';
  metadata: {
    endpoint: string;
    userAgent: string;
    duration: number;
    byok: boolean; // Bring Your Own Key
  };
}

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb';

export class CostTracker {
  // Initialized here so initializePricing() has a map to populate
  private readonly pricingTable = new Map<string, TokenPricing>();

  constructor(
    private dynamoClient: DynamoDBDocumentClient,
    private budgetManager: BudgetManager // defined in the next section
  ) {
    this.initializePricing();
  }

  async trackUsage(
    userId: string, 
    requestId: string,
    usage: TokenUsage,
    metadata: RequestMetadata
  ): Promise<void> {
    const pricing = this.pricingTable.get(`${usage.provider}:${usage.model}`);
    if (!pricing) {
      throw new Error(`No pricing data for ${usage.provider}:${usage.model}`);
    }

    // Prices are quoted per 1,000 tokens
    const promptCost = (usage.promptTokens / 1000) * pricing.promptPer1K;
    const completionCost = (usage.completionTokens / 1000) * pricing.completionPer1K;
    const estimatedCost = promptCost + completionCost;

    const record: UsageRecord = {
      userId,
      requestId,
      timestamp: Date.now(),
      provider: usage.provider,
      model: usage.model,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
      totalTokens: usage.totalTokens,
      estimatedCost,
      requestType: metadata.requestType,
      metadata: {
        endpoint: metadata.endpoint,
        userAgent: metadata.userAgent,
        duration: metadata.duration,
        byok: metadata.byok
      }
    };

    await this.dynamoClient.send(new PutCommand({
      TableName: 'ai-platform-usage',
      Item: record
    }));

    // Update real-time budget tracking (the method lives on BudgetManager)
    await this.budgetManager.updateUserBudget(userId, estimatedCost);
  }

  private initializePricing(): void {
    // Updated regularly from provider APIs
    this.pricingTable.set('openai:gpt-4', {
      promptPer1K: 0.030,
      completionPer1K: 0.060
    });
    this.pricingTable.set('openai:gpt-4-turbo', {
      promptPer1K: 0.010,
      completionPer1K: 0.030  
    });
    this.pricingTable.set('anthropic:claude-3-sonnet', {
      promptPer1K: 0.003,
      completionPer1K: 0.015
    });
    // ... more models
  }
}
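
To make the per-1K arithmetic concrete, here's a small worked example using the GPT-4 prices from the table above. The 2,000/500 token split is a made-up request for illustration, and `estimateCost` is just the cost formula from `trackUsage` pulled out into a standalone function.

```typescript
// Per-1K-token prices copied from the pricing table in the post
interface TokenPricing {
  promptPer1K: number;
  completionPer1K: number;
}

const GPT4: TokenPricing = { promptPer1K: 0.03, completionPer1K: 0.06 };

function estimateCost(
  promptTokens: number,
  completionTokens: number,
  pricing: TokenPricing
): number {
  const promptCost = (promptTokens / 1000) * pricing.promptPer1K;
  const completionCost = (completionTokens / 1000) * pricing.completionPer1K;
  return promptCost + completionCost;
}

// Hypothetical request: 2,000 prompt tokens + 500 completion tokens
const cost = estimateCost(2000, 500, GPT4);
console.log(cost.toFixed(2)); // "0.09"
```

Prompt tokens contribute $0.06 and completion tokens $0.03, so output-heavy requests get expensive faster than input-heavy ones.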

Real-Time Budget Management

The budget system prevents runaway costs with soft and hard limits:

export interface UserBudget {
  userId: string;
  monthlyLimit: number;
  currentSpend: number;
  warningThreshold: number; // Default: 80%
  lastUpdated: number;
  status: 'active' | 'warning' | 'blocked';
  notifications: {
    warning: boolean;
    limit: boolean;
    lastSent: number;
  };
}

export class BudgetManager {
  async checkBudget(userId: string, estimatedCost: number): Promise<BudgetCheckResult> {
    const budget = await this.getUserBudget(userId);
    const projectedSpend = budget.currentSpend + estimatedCost;

    if (projectedSpend > budget.monthlyLimit) {
      return {
        allowed: false,
        reason: 'Monthly budget exceeded',
        currentSpend: budget.currentSpend,
        limit: budget.monthlyLimit,
        remainingBudget: Math.max(0, budget.monthlyLimit - budget.currentSpend)
      };
    }

    if (projectedSpend > (budget.monthlyLimit * budget.warningThreshold / 100)) {
      await this.sendBudgetWarning(userId, budget);
      return {
        allowed: true,
        warning: true,
        reason: 'Approaching budget limit',
        currentSpend: budget.currentSpend,
        limit: budget.monthlyLimit,
        remainingBudget: budget.monthlyLimit - projectedSpend
      };
    }

    return {
      allowed: true,
      currentSpend: budget.currentSpend,
      limit: budget.monthlyLimit,
      remainingBudget: budget.monthlyLimit - projectedSpend
    };
  }

  async updateUserBudget(userId: string, cost: number): Promise<void> {
    const now = Date.now();
    const today = new Date(now);
    const monthStart = new Date(today.getFullYear(), today.getMonth(), 1).getTime();

    // Reset before adding the new charge if the last update was in a previous month
    const budget = await this.getUserBudget(userId);
    if (budget.lastUpdated < monthStart) {
      await this.resetMonthlyBudget(userId);
    }

    // Atomic increment so concurrent requests don't overwrite each other's spend
    await this.dynamoClient.send(new UpdateCommand({
      TableName: 'ai-platform-budgets',
      Key: { userId },
      UpdateExpression: `
        SET currentSpend = if_not_exists(currentSpend, :zero) + :cost,
            lastUpdated = :now,
            #status = :status
      `,
      ExpressionAttributeNames: {
        '#status': 'status'
      },
      ExpressionAttributeValues: {
        ':cost': cost,
        ':now': now,
        ':zero': 0,
        ':status': 'active'
      }
    }));
  }

  private async sendBudgetWarning(userId: string, budget: UserBudget): Promise<void> {
    const timeSinceLastWarning = Date.now() - budget.notifications.lastSent;
    const hoursSinceWarning = timeSinceLastWarning / (1000 * 60 * 60);

    // Don't spam warnings - max once per 6 hours
    if (hoursSinceWarning < 6) return;

    const percentUsed = (budget.currentSpend / budget.monthlyLimit) * 100;

    await this.notificationService.send({
      userId,
      type: 'budget_warning',
      title: 'AI Usage Budget Warning',
      message: `You've used ${percentUsed.toFixed(1)}% of your monthly AI budget ($${budget.currentSpend.toFixed(2)} of $${budget.monthlyLimit})`,
      severity: 'warning'
    });

    await this.updateNotificationTime(userId);
  }
}
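
The soft and hard limits are easier to reason about as a pure function. This sketch mirrors the comparisons in `checkBudget` with the 80% default; `classifySpend` and `BudgetState` are hypothetical names, and the dollar amounts are made up.

```typescript
type BudgetState = 'ok' | 'warning' | 'blocked';

// Pure version of the soft/hard limit decision; the threshold is a percentage
function classifySpend(
  currentSpend: number,
  estimatedCost: number,
  monthlyLimit: number,
  warningThresholdPct: number = 80
): BudgetState {
  const projected = currentSpend + estimatedCost;
  if (projected > monthlyLimit) return 'blocked';
  if (projected > monthlyLimit * (warningThresholdPct / 100)) return 'warning';
  return 'ok';
}

console.log(classifySpend(50, 5, 100)); // ok      (55 is under the 80 warning line)
console.log(classifySpend(78, 5, 100)); // warning (83 is over 80 but under 100)
console.log(classifySpend(98, 5, 100)); // blocked (103 exceeds the 100 limit)
```

Note that the check runs against projected spend, so a request is rejected before it pushes the user over the limit rather than after.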

Authentication: Three Layers Deep

I implement three authentication patterns depending on the use case:

1. API Keys for External Developers

export interface ApiKey {
  keyId: string;
  keyPrefix: string; // First 8 chars for display
  hashedKey: string; // bcrypt hash
  userId: string;
  name: string;
  scopes: string[];
  rateLimit: {
    requestsPerMinute: number;
    tokensPerMinute: number;
  };
  budget: {
    monthlyLimit: number;
    currentSpend: number;
  };
  status: 'active' | 'suspended' | 'revoked';
  createdAt: number;
  lastUsed?: number;
  expiresAt?: number;
}

export class ApiKeyAuth {
  async validateApiKey(rawKey: string): Promise<AuthResult> {
    // The first 12 characters of the key act as its lookup ID
    const keyId = rawKey.substring(0, 12);
    const keyData = await this.getApiKey(keyId);

    if (!keyData || keyData.status !== 'active') {
      return { valid: false, reason: 'Invalid API key' };
    }

    // Check expiration
    if (keyData.expiresAt && Date.now() > keyData.expiresAt) {
      return { valid: false, reason: 'API key expired' };
    }

    // Verify hash
    const isValid = await bcrypt.compare(rawKey, keyData.hashedKey);
    if (!isValid) {
      return { valid: false, reason: 'Invalid API key' };
    }

    // Update last used
    await this.updateLastUsed(keyId);

    return {
      valid: true,
      userId: keyData.userId,
      scopes: keyData.scopes,
      rateLimit: keyData.rateLimit,
      budget: keyData.budget
    };
  }

  async createApiKey(userId: string, options: CreateKeyOptions): Promise<string> {
    const rawKey = `sk-${generateId(48)}`; // sk- prefix like OpenAI
    const hashedKey = await bcrypt.hash(rawKey, 12);

    const apiKey: ApiKey = {
      keyId: rawKey.substring(0, 12),
      keyPrefix: rawKey.substring(0, 8),
      hashedKey,
      userId,
      name: options.name,
      scopes: options.scopes || ['ai:complete', 'ai:embed'],
      rateLimit: options.rateLimit || {
        requestsPerMinute: 60,
        tokensPerMinute: 100000
      },
      budget: options.budget || {
        monthlyLimit: 100,
        currentSpend: 0
      },
      status: 'active',
      createdAt: Date.now(),
      expiresAt: options.expiresAt
    };

    await this.storeApiKey(apiKey);
    return rawKey; // Only returned once!
  }
}
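
`generateId` isn't shown above. One hypothetical way to write it, assuming Node's built-in `crypto` module and a URL-safe alphabet, looks like this:

```typescript
import { randomBytes } from 'node:crypto';

// Hypothetical stand-in for the generateId helper:
// returns `length` URL-safe characters drawn from a CSPRNG.
function generateId(length: number): string {
  // base64url yields 4 characters per 3 bytes, so request enough bytes and trim
  const bytes = randomBytes(Math.ceil((length * 3) / 4));
  return bytes.toString('base64url').slice(0, length);
}

const rawKey = `sk-${generateId(48)}`;
const keyId = rawKey.substring(0, 12);    // lookup ID stored with the hash
console.log(rawKey.length, keyId.length); // 51 12
```

At 51 characters the raw key stays under bcrypt's 72-byte input limit, so the whole key takes part in the hash comparison.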

2. JWT for Internal Services

Internal microservices use JWT tokens with short expiration:

export class JWTAuth {
  constructor(private secretKey: string) {}

  generateServiceToken(serviceId: string, scopes: string[]): string {
    return jwt.sign(
      {
        sub: serviceId,
        aud: 'ai-platform',
        iss: 'ai-platform-auth',
        scopes,
        type: 'service'
      },
      this.secretKey,
      {
        expiresIn: '1h',
        algorithm: 'HS256'
      }
    );
  }

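  async validateJWT(token: string): Promise<JWTAuthResult> {
    try {
      // Pin the algorithm and check audience/issuer so tokens signed with
      // another algorithm or minted for another audience are rejected
      const decoded = jwt.verify(token, this.secretKey, {
        algorithms: ['HS256'],
        audience: 'ai-platform',
        issuer: 'ai-platform-auth'
      }) as JWTPayload;

      return {
        valid: true,
        serviceId: decoded.sub,
        scopes: decoded.scopes,
        type: decoded.type
      };
    } catch (error) {
      return {
        valid: false,
        reason: error instanceof Error ? error.message : 'Invalid token'
      };
    }
  }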
}

3. IAM Roles for AWS Services

Lambda functions and ECS tasks use IAM roles for service-to-service authentication:

export class IAMAuth {
  async validateAWSRequest(request: Request): Promise<IAMAuthResult> {
    const authHeader = request.headers['authorization'];
    if (!authHeader?.startsWith('AWS4-HMAC-SHA256')) {
      return { valid: false, reason: 'Missing AWS signature' };
    }

    // Parse AWS Signature V4
    const signature = this.parseAWSSignature(authHeader);
    const isValid = await this.verifyAWSSignature(request, signature);

    if (!isValid) {
      return { valid: false, reason: 'Invalid AWS signature' };
    }

    // Get IAM role/user details
    const identity = await this.getAWSIdentity(signature.accessKeyId);

    return {
      valid: true,
      identity,
      scopes: this.mapIAMToScopes(identity.policies)
    };
  }
}

Rate Limiting with Token Buckets

I use the token bucket algorithm for smooth rate limiting:

export class RateLimiter {
  private buckets = new Map<string, TokenBucket>();

  async checkRate(
    userId: string, 
    requestType: 'request' | 'token',
    amount: number = 1
  ): Promise<RateLimitResult> {
    const bucketKey = `${userId}:${requestType}`;
    let bucket = this.buckets.get(bucketKey);

    if (!bucket) {
      const limits = await this.getUserLimits(userId);
      bucket = new TokenBucket(
        limits[requestType].capacity,
        limits[requestType].refillRate
      );
      this.buckets.set(bucketKey, bucket);
    }

    const allowed = bucket.consume(amount);

    return {
      allowed,
      remainingTokens: bucket.tokens,
      refillRate: bucket.refillRate,
      resetTime: bucket.nextRefill
    };
  }
}

class TokenBucket {
  private lastRefill: number;

  constructor(
    private capacity: number,
    public refillRate: number, // tokens per second
    public tokens: number = capacity
  ) {
    this.lastRefill = Date.now();
  }

  consume(amount: number): boolean {
    this.refill();

    if (this.tokens >= amount) {
      this.tokens -= amount;
      return true;
    }

    return false;
  }

  private refill(): void {
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    const tokensToAdd = timePassed * this.refillRate;

    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  get nextRefill(): number {
    const tokensNeeded = this.capacity - this.tokens;
    const timeToRefill = tokensNeeded / this.refillRate;
    return this.lastRefill + (timeToRefill * 1000);
  }
}
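
To see the burst-then-steady behavior without waiting on a real clock, here's a sketch of the same algorithm where the caller passes in the time. `DemoBucket` is a hypothetical test double, not a replacement for `TokenBucket`.

```typescript
// Same algorithm as TokenBucket, but the caller supplies the time so behavior is reproducible
class DemoBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    startMs: number
  ) {
    this.tokens = capacity;
    this.lastRefillMs = startMs;
  }

  consume(amount: number, nowMs: number): boolean {
    // Refill in proportion to elapsed time, never above capacity
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;

    if (this.tokens >= amount) {
      this.tokens -= amount;
      return true;
    }
    return false;
  }
}

// 60 requests per minute: capacity 60, refilling 1 token per second
const bucket = new DemoBucket(60, 1, 0);
let burst = 0;
while (bucket.consume(1, 0)) burst += 1; // drain the bucket at t = 0

console.log(burst);                   // 60
console.log(bucket.consume(1, 500));  // false: only 0.5 tokens after half a second
console.log(bucket.consume(1, 1500)); // true: 1.5 tokens have accumulated
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained throughput.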

BYOK: Bring Your Own Key Deep Dive

Many users want to bring their own OpenAI/Anthropic API keys for cost control and compliance. This requires careful security handling:

export class BYOKManager {
  private readonly encryptionKey: Buffer;

  constructor() {
    this.encryptionKey = crypto.scryptSync(
      process.env.BYOK_PASSWORD!, 
      process.env.BYOK_SALT!,
      32
    );
  }

  async storeUserKey(
    userId: string,
    provider: string,
    apiKey: string,
    metadata?: KeyMetadata
  ): Promise<void> {
    // Validate key before storing
    const isValid = await this.validateProviderKey(provider, apiKey);
    if (!isValid) {
      throw new Error('Invalid API key for provider');
    }

    // Encrypt the key with AES-256-GCM; the IV has to be passed to the cipher
    const iv = crypto.randomBytes(12); // 96-bit IV, the recommended size for GCM
    const cipher = crypto.createCipheriv('aes-256-gcm', this.encryptionKey, iv);
    cipher.setAAD(Buffer.from(userId)); // Additional authenticated data

    let encrypted = cipher.update(apiKey, 'utf8', 'hex');
    encrypted += cipher.final('hex');
    const authTag = cipher.getAuthTag();

    // Store encrypted key
    await this.dynamoClient.put({
      TableName: 'ai-platform-user-keys',
      Item: {
        userId,
        provider,
        encryptedKey: encrypted,
        iv: iv.toString('hex'),
        authTag: authTag.toString('hex'),
        metadata,
        createdAt: Date.now(),
        lastValidated: Date.now(),
        status: 'active'
      }
    }).promise();
  }

  async getUserKey(userId: string, provider: string): Promise<string | null> {
    const result = await this.dynamoClient.get({
      TableName: 'ai-platform-user-keys',
      Key: { userId, provider }
    }).promise();

    if (!result.Item) return null;

    const { encryptedKey, iv, authTag } = result.Item;

    // Decrypt the key using the IV stored alongside the ciphertext
    const decipher = crypto.createDecipheriv(
      'aes-256-gcm',
      this.encryptionKey,
      Buffer.from(iv, 'hex')
    );
    decipher.setAAD(Buffer.from(userId));
    decipher.setAuthTag(Buffer.from(authTag, 'hex'));

    let decrypted = decipher.update(encryptedKey, 'hex', 'utf8');
    decrypted += decipher.final('utf8'); // throws if the auth tag or AAD doesn't match

    return decrypted;
  }

  private async validateProviderKey(provider: string, apiKey: string): Promise<boolean> {
    try {
      switch (provider) {
        case 'openai':
          const openai = new OpenAI({ apiKey });
          await openai.models.list();
          return true;

        case 'anthropic':
          const anthropic = new Anthropic({ apiKey });
          await anthropic.messages.create({
            model: 'claude-3-haiku-20240307',
            messages: [{ role: 'user', content: 'test' }],
            max_tokens: 1
          });
          return true;

        default:
          return false;
      }
    } catch (error) {
      return false;
    }
  }
}
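
The encrypt/decrypt round trip is worth exercising in isolation, because binding the ciphertext to the `userId` through AAD is what stops one user's stored key from being decrypted under another user's record. This is a standalone sketch using Node's `crypto` module; `seal`, `open`, and `SealedKey` are hypothetical names, and the key is a throwaway generated for the demo.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

interface SealedKey {
  ciphertext: string; // hex
  iv: string;         // hex
  authTag: string;    // hex
}

function seal(plaintext: string, key: Buffer, userId: string): SealedKey {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  cipher.setAAD(Buffer.from(userId));
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    ciphertext: ciphertext.toString('hex'),
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
  };
}

function open(sealed: SealedKey, key: Buffer, userId: string): string {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(sealed.iv, 'hex'));
  decipher.setAAD(Buffer.from(userId));
  decipher.setAuthTag(Buffer.from(sealed.authTag, 'hex'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, 'hex')),
    decipher.final(), // throws if the tag, key, or AAD does not match
  ]);
  return plaintext.toString('utf8');
}

const demoKey = randomBytes(32); // throwaway key for the demo only
const sealed = seal('sk-example-provider-key', demoKey, 'user-123');
console.log(open(sealed, demoKey, 'user-123') === 'sk-example-provider-key'); // true

let rejected = false;
try {
  open(sealed, demoKey, 'user-456'); // different AAD
} catch {
  rejected = true;
}
console.log(rejected); // true
```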

Monitoring and Alerting

CloudWatch dashboards show real-time platform health:

export class MonitoringService {
  async createDashboard(): Promise<void> {
    await this.cloudwatchClient.putDashboard({
      DashboardName: 'ai-platform-production',
      DashboardBody: JSON.stringify({
        widgets: [
          {
            type: 'metric',
            properties: {
              metrics: [
                ['AWS/Lambda', 'Duration', 'FunctionName', 'ai-platform-gateway'],
                ['AWS/Lambda', 'Errors', 'FunctionName', 'ai-platform-gateway'],
                ['AWS/ECS', 'CPUUtilization', 'ServiceName', 'ai-agents'],
                ['AWS/ECS', 'MemoryUtilization', 'ServiceName', 'ai-agents']
              ],
              period: 300,
              stat: 'Average',
              region: 'us-east-1',
              title: 'Infrastructure Health'
            }
          },
          {
            type: 'metric',
            properties: {
              metrics: [
                ['ai-platform', 'RequestCount'],
                ['ai-platform', 'TokensProcessed'],
                ['ai-platform', 'CostPerHour'],
                ['ai-platform', 'ErrorRate']
              ],
              period: 300,
              stat: 'Sum',
              title: 'Business Metrics'
            }
          }
        ]
      })
    }).promise();
  }

  async setupCostAnomalyDetection(): Promise<void> {
    // Alarm when hourly cost rises above CloudWatch's anomaly detection band
    // (2 standard deviations around the learned baseline)
    await this.cloudwatchClient.putMetricAlarm({
      AlarmName: 'ai-platform-cost-anomaly',
      ComparisonOperator: 'GreaterThanUpperThreshold',
      EvaluationPeriods: 2,
      ThresholdMetricId: 'cost_band',
      Metrics: [
        {
          Id: 'cost_per_hour',
          ReturnData: true,
          MetricStat: {
            Metric: {
              MetricName: 'CostPerHour',
              Namespace: 'ai-platform'
            },
            Period: 3600,
            Stat: 'Sum'
          }
        },
        {
          Id: 'cost_band',
          Expression: 'ANOMALY_DETECTION_BAND(cost_per_hour, 2)',
          Label: 'CostPerHour (expected)',
          ReturnData: true
        }
      ],
      AlarmActions: [process.env.ALERT_SNS_TOPIC!]
    }).promise();
  }
}
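
The custom `ai-platform` metrics on that dashboard have to be published by the platform itself. Here's a sketch of what the payload could look like, assuming it's sent with CloudWatch's PutMetricData call (the send itself isn't shown). `buildHourlyMetrics` and the local interfaces are hypothetical; only the namespace and metric names come from the dashboard above.

```typescript
// Shape of a CloudWatch PutMetricData request, declared locally to keep the sketch SDK-free
interface MetricDatum {
  MetricName: string;
  Value: number;
  Unit: 'None' | 'Count';
  Timestamp: Date;
}

interface PutMetricDataInput {
  Namespace: string;
  MetricData: MetricDatum[];
}

function buildHourlyMetrics(
  costUsd: number,
  requests: number,
  tokens: number,
  at: Date
): PutMetricDataInput {
  return {
    Namespace: 'ai-platform',
    MetricData: [
      // CloudWatch has no currency unit, so dollars are reported as 'None'
      { MetricName: 'CostPerHour', Value: costUsd, Unit: 'None', Timestamp: at },
      { MetricName: 'RequestCount', Value: requests, Unit: 'Count', Timestamp: at },
      { MetricName: 'TokensProcessed', Value: tokens, Unit: 'Count', Timestamp: at },
    ],
  };
}

const payload = buildHourlyMetrics(1.25, 62, 48000, new Date('2024-01-15T14:00:00Z'));
console.log(payload.MetricData.map((m) => m.MetricName).join(', '));
```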

Security Hardening Checklist

Here's my production security configuration:

Secrets Management:

// All secrets in AWS Systems Manager Parameter Store
const config = {
  jwtSecret: await getParameter('/ai-platform/jwt-secret', true),
  encryptionKey: await getParameter('/ai-platform/encryption-key', true),
  openaiKey: await getParameter('/ai-platform/openai-key', true)
};

VPC Configuration for ECS:

// CDK construct for secure ECS
new ecs.FargateService(this, 'AgentService', {
  cluster,
  taskDefinition,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  securityGroups: [privateSecurityGroup],
  assignPublicIp: false
});

// Security group: only allows outbound HTTPS
// (this only restricts traffic if the group was created with allowAllOutbound: false)
privateSecurityGroup.addEgressRule(
  ec2.Peer.anyIpv4(),
  ec2.Port.tcp(443),
  'HTTPS outbound for API calls'
);

API Gateway with WAF:

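// aws-cdk-lib ships only the L1 construct for WAFv2, so rules are plain property objects
new wafv2.CfnWebACL(this, 'ApiWaf', {
  scope: 'REGIONAL',
  defaultAction: { allow: {} },
  visibilityConfig: { cloudWatchMetricsEnabled: true, metricName: 'ApiWaf', sampledRequestsEnabled: true },
  rules: [
    {
      name: 'RateLimitRule',
      priority: 1,
      action: { block: {} },
      statement: {
        rateBasedStatement: {
          limit: 1000, // requests per 5 minutes
          aggregateKeyType: 'IP'
        }
      },
      visibilityConfig: { cloudWatchMetricsEnabled: true, metricName: 'RateLimitRule', sampledRequestsEnabled: true }
    },
    {
      name: 'GeoRestrictRule', 
      priority: 2,
      action: { block: {} },
      statement: {
        geoMatchStatement: {
          countryCodes: ['CN', 'RU'] // Block certain countries
        }
      },
      visibilityConfig: { cloudWatchMetricsEnabled: true, metricName: 'GeoRestrictRule', sampledRequestsEnabled: true }
    }
  ]
});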

Cost Breakdown: What This Actually Costs

After 8 months in production serving 1,500 requests/day:

Fixed Infrastructure Costs (Monthly):

API Gateway: $3.50 (1M requests)
Lambda (Gateway): $8.20 (compute + requests)
ECS Fargate: $15.40 (avg 2 tasks running)
DynamoDB: $6.80 (usage tracking + budgets)
Application Load Balancer: $16.20
CloudWatch: $4.30
Total Fixed: $54.40/month

Variable AI Costs (Pass-through):

OpenAI API: $340-890/month (user-dependent)
Anthropic API: $180-420/month
AWS Bedrock: $45-120/month
Total Variable: User-driven, 2% platform markup

Cost Optimization Wins:

Moved summarization to Claude Haiku: 60% cost reduction
Implemented response caching: 25% fewer API calls
BYOK adoption: 70% of users, zero platform AI costs
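
For a sense of what response caching can look like, here's a minimal in-memory sketch keyed on a hash of model and prompt with a TTL. This isn't the platform's actual cache: `ResponseCache`, the key format, and the one-minute TTL are assumptions for illustration, and a multi-instance deployment would need a shared store.

```typescript
import { createHash } from 'node:crypto';

interface CacheEntry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class ResponseCache {
  private entries = new Map<string, CacheEntry>();

  constructor(private ttlMs: number) {}

  // Identical model + prompt pairs hash to the same key
  static keyFor(model: string, prompt: string): string {
    return createHash('sha256').update(`${model}\n${prompt}`).digest('hex');
  }

  get(key: string, nowMs: number = Date.now()): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, nowMs: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }
}

const cache = new ResponseCache(60_000); // 1-minute TTL
const key = ResponseCache.keyFor('claude-3-haiku', 'Summarize: great product');
cache.set(key, 'Positive review.', 0);
console.log(cache.get(key, 30_000)); // "Positive review." (still fresh)
console.log(cache.get(key, 60_000)); // undefined (expired)
```

Exact-match caching only pays off for repeated prompts, so it fits deterministic tasks like classification better than open-ended chat.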

Real Production Incidents

Incident 1: Memory Leak in ECS Agent

Symptom: Tasks consuming 8GB RAM, getting OOM killed
Root cause: Long conversations not being garbage collected
Fix: Added conversation pruning after 50 messages
Prevention: Memory monitoring alerts at 80% usage
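
The pruning fix can be sketched roughly like this. Keeping system messages is my assumption about what should survive a prune; `pruneConversation` and the `Message` shape are hypothetical, and only the 50-message cap comes from the incident above.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keeps every system message plus the most recent other messages, up to maxMessages in total
function pruneConversation(messages: Message[], maxMessages: number = 50): Message[] {
  if (messages.length <= maxMessages) return messages;

  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  const keep = Math.max(0, maxMessages - system.length);

  // slice(-0) would return the whole array, so handle keep === 0 explicitly
  return [...system, ...(keep > 0 ? rest.slice(-keep) : [])];
}

const history: Message[] = [
  { role: 'system', content: 'You analyze customer feedback.' },
  ...Array.from({ length: 60 }, (_, i): Message => ({ role: 'user', content: `msg ${i}` })),
];

const pruned = pruneConversation(history);
console.log(pruned.length, pruned[1].content); // 50 "msg 11"
```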

Incident 2: DynamoDB Throttling

Symptom: Budget checks failing, users getting 500 errors
Root cause: Hot partition on userId during peak traffic
Fix: Added requestId to partition key for better distribution
Prevention: DynamoDB on-demand billing mode

Incident 3: BYOK Key Validation Loop

Symptom: Users unable to update API keys
Root cause: Validation calling itself recursively
Fix: Separate validation context from normal request context
Prevention: Integration tests for all BYOK flows

The Security Mindset Shift

Building production AI infrastructure changed how I think about security. Traditional web apps have predictable resource usage. AI apps can consume unlimited resources with a single malicious prompt.

Every endpoint needs three guards:

Authentication - Who are you?
Authorization - What can you do?
Budget control - How much can you spend?

Miss any one, and you're vulnerable.
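
Chained together, the three guards look something like the sketch below. `authorizeRequest`, the `Guards` interface, and the status codes are hypothetical choices for illustration. In the real platform these lookups would be async calls into the auth and budget classes shown earlier.

```typescript
interface Caller {
  userId: string;
  scopes: string[];
}

interface GuardDecision {
  allowed: boolean;
  status: number;
  reason?: string;
}

interface Guards {
  authenticate(apiKey: string): Caller | null; // who are you?
  remainingBudgetUsd(userId: string): number;  // how much can you spend?
}

function authorizeRequest(
  guards: Guards,
  apiKey: string,
  requiredScope: string,
  estimatedCostUsd: number
): GuardDecision {
  const caller = guards.authenticate(apiKey);
  if (!caller) {
    return { allowed: false, status: 401, reason: 'Invalid API key' };
  }
  // What can you do?
  if (!caller.scopes.includes(requiredScope)) {
    return { allowed: false, status: 403, reason: `Missing scope ${requiredScope}` };
  }
  if (estimatedCostUsd > guards.remainingBudgetUsd(caller.userId)) {
    return { allowed: false, status: 402, reason: 'Monthly budget exceeded' };
  }
  return { allowed: true, status: 200 };
}

// In-memory fakes standing in for the real auth and budget services
const fakeGuards: Guards = {
  authenticate: (key) =>
    key === 'sk-valid' ? { userId: 'user-1', scopes: ['ai:complete'] } : null,
  remainingBudgetUsd: () => 5,
};

console.log(authorizeRequest(fakeGuards, 'sk-bad', 'ai:complete', 1).status);   // 401
console.log(authorizeRequest(fakeGuards, 'sk-valid', 'ai:embed', 1).status);    // 403
console.log(authorizeRequest(fakeGuards, 'sk-valid', 'ai:complete', 9).status); // 402
console.log(authorizeRequest(fakeGuards, 'sk-valid', 'ai:complete', 1).status); // 200
```

The order matters: the cheap identity check runs first, and the budget lookup only happens for callers who are already authenticated and authorized.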

What's Next

The complete production setup is documented in my ai-platform-aws-examples repo.

Next week, I'll tie everything together with a complete deployment walkthrough. You'll see how to go from zero to a fully operational AI platform in under an hour.

But more importantly, I'll share the real numbers: what this platform actually costs to run, performance metrics from production, and the roadmap for what's coming next.

Because the best architecture means nothing if you can't deploy it reliably.

This is part 7 of an 8-part series documenting my journey building an AI platform on AWS. Next week: the complete deployment guide and lessons from 8 months in production.
