Tyson Cung
Deploy a Production AI Platform on AWS for $100/month

From seven broken Lambda functions to a production AI platform in eight articles.

That's the journey we've taken together: functions that couldn't communicate, hit timeout walls, and left users staring at loading spinners. Now you have a complete platform that orchestrates complex workflows, streams real-time updates, and won't bankrupt your startup.

This isn't a toy example. The architecture I'm about to show you serves 1,500+ requests daily, has survived 8 months in production, and handles everything from document analysis to multi-step research tasks.

Time to deploy it.

The Complete Architecture

Before we dive into deployment, here's what we're building:

(Architecture diagram: API Gateway, Gateway Lambda, ECS agents, Lambda tools, DynamoDB, and WebSocket streaming)

The data flow:

  1. API Gateway receives requests, handles auth, enforces rate limits
  2. Gateway Lambda validates requests, checks budgets, routes to appropriate service
  3. ECS Agents orchestrate multi-step workflows using Lambda tools
  4. Lambda Tools perform specific AI tasks (summarize, extract, classify)
  5. DynamoDB tracks usage, manages budgets, stores user data
  6. WebSocket streams real-time updates back to clients
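Steps 2 and 3 above are where most of the platform's decisions happen. Here's a simplified sketch of the Gateway Lambda's budget-check-then-route logic — the names and shapes are illustrative, not the platform's actual types, and real budget state lives in DynamoDB rather than being passed in:

```typescript
// Simplified sketch of the Gateway Lambda's routing decision (step 2 above).
// In production the budget state is read from DynamoDB; it is passed in here
// so the decision logic stands on its own.

type Route = "agents" | "tools" | "rejected";

interface BudgetState {
  monthlyBudgetUsd: number;
  spentUsd: number;
}

interface PlatformRequest {
  path: string; // e.g. "/v1/agents/run" or "/v1/complete"
  estimatedCostUsd: number;
}

export function routeRequest(req: PlatformRequest, budget: BudgetState): Route {
  // Reject before doing any work if this request would exceed the budget
  if (budget.spentUsd + req.estimatedCostUsd > budget.monthlyBudgetUsd) {
    return "rejected";
  }
  // Multi-step workflows go to the ECS agents; single tasks go straight to Lambda tools
  return req.path.startsWith("/v1/agents") ? "agents" : "tools";
}
```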

Prerequisites: Bootstrap Your Environment

First, let's set up the deployment environment:

# Install AWS CDK
npm install -g aws-cdk

# Clone the platform
git clone https://github.com/tysoncung/ai-platform-aws.git
cd ai-platform-aws

# Install dependencies
npm install
npm run install:all  # Installs in all packages

# Bootstrap CDK (one time per account/region)
npx cdk bootstrap

# Create environment file
cp .env.example .env

Edit .env with your configuration:

# AWS Configuration
AWS_REGION=us-east-1
AWS_ACCOUNT_ID=123456789012

# AI Provider API Keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Platform Configuration
PLATFORM_ENVIRONMENT=production
COST_TRACKING_ENABLED=true
BUDGET_ALERTS_ENABLED=true

# Monitoring
SLACK_WEBHOOK_URL=https://hooks.slack.com/your-webhook
ALERT_EMAIL=you@company.com

# Security
JWT_SECRET_KEY=your-super-secret-jwt-key
ENCRYPTION_SALT=your-encryption-salt

Local Development Setup

Before deploying to AWS, let's run everything locally with Docker Compose:

# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    build:
      context: ./packages/gateway
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DYNAMODB_ENDPOINT=http://dynamodb:8000
      - AGENT_ENDPOINT=http://agent:3001
    depends_on:
      - dynamodb
      - agent

  agent:
    build:
      context: ./packages/agents
      dockerfile: Dockerfile.dev
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=development
      - LAMBDA_ENDPOINT=http://lambda-tools:3002
    depends_on:
      - lambda-tools

  lambda-tools:
    build:
      context: ./packages/tools
      dockerfile: Dockerfile.dev
    ports:
      - "3002:3002"
    environment:
      - NODE_ENV=development
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}

  dynamodb:
    image: amazon/dynamodb-local:latest
    ports:
      - "8000:8000"
    command: ["-jar", "DynamoDBLocal.jar", "-sharedDb", "-inMemory"]

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

Start the local environment:

# Start all services
docker-compose up -d

# Run database migrations
npm run db:migrate:local

# Seed with sample data
npm run db:seed:local

# Test the platform
curl http://localhost:3000/health

CDK Stack Composition

The platform is composed of multiple CDK stacks for better separation of concerns:

// bin/deploy.ts
import * as cdk from 'aws-cdk-lib';

import { AIGatewayStack } from '../lib/gateway-stack';
import { AIAgentsStack } from '../lib/agents-stack';
import { AIToolsStack } from '../lib/tools-stack';
import { AIMonitoringStack } from '../lib/monitoring-stack';
import { AISecurityStack } from '../lib/security-stack';

const app = new cdk.App();
const env = { 
  account: process.env.CDK_DEFAULT_ACCOUNT, 
  region: process.env.CDK_DEFAULT_REGION 
};

// Security layer (VPC, IAM, KMS)
const securityStack = new AISecurityStack(app, 'AISecurityStack', { env });

// Lambda tools layer
const toolsStack = new AIToolsStack(app, 'AIToolsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.lambdaSecurityGroup
});

// ECS agents layer
const agentsStack = new AIAgentsStack(app, 'AIAgentsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.ecsSecurityGroup,
  toolsArns: toolsStack.functionArns
});

// API Gateway layer
const gatewayStack = new AIGatewayStack(app, 'AIGatewayStack', {
  env,
  agentsCluster: agentsStack.cluster,
  agentsService: agentsStack.service,
  toolsArns: toolsStack.functionArns
});

// Monitoring and alerting
new AIMonitoringStack(app, 'AIMonitoringStack', {
  env,
  gatewayApi: gatewayStack.api,
  agentsService: agentsStack.service,
  toolsFunctions: toolsStack.functions
});

Here's the gateway stack implementation:

// lib/gateway-stack.ts
export class AIGatewayStack extends cdk.Stack {
  public readonly api: apigateway.RestApi;

  constructor(scope: Construct, id: string, props: AIGatewayStackProps) {
    super(scope, id, props);

    // DynamoDB tables
    const usageTable = new dynamodb.Table(this, 'UsageTable', {
      tableName: 'ai-platform-usage',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'timestamp', type: dynamodb.AttributeType.NUMBER },
      billingMode: dynamodb.BillingMode.ON_DEMAND,
      timeToLiveAttribute: 'ttl'
    });

    const budgetTable = new dynamodb.Table(this, 'BudgetTable', {
      tableName: 'ai-platform-budgets',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.ON_DEMAND
    });

    // Gateway Lambda function
    const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
      runtime: lambda.Runtime.NODEJS_18_X,
      code: lambda.Code.fromAsset('packages/gateway/dist'),
      handler: 'index.handler',
      timeout: cdk.Duration.seconds(30),
      memorySize: 512,
      environment: {
        USAGE_TABLE_NAME: usageTable.tableName,
        BUDGET_TABLE_NAME: budgetTable.tableName,
        AGENTS_CLUSTER_ARN: props.agentsCluster.clusterArn,
        AGENTS_SERVICE_ARN: props.agentsService.serviceArn,
        TOOLS_ARNS: JSON.stringify(props.toolsArns)
      }
    });

    // Grant permissions
    usageTable.grantReadWriteData(gatewayFunction);
    budgetTable.grantReadWriteData(gatewayFunction);

    // API Gateway
    this.api = new apigateway.RestApi(this, 'AIApi', {
      restApiName: 'AI Platform API',
      description: 'AI Platform REST API',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization']
      }
    });

    // API Gateway integration
    const lambdaIntegration = new apigateway.LambdaIntegration(gatewayFunction);

    // Routes
    const v1 = this.api.root.addResource('v1');

    v1.addResource('complete').addMethod('POST', lambdaIntegration);
    v1.addResource('embed').addMethod('POST', lambdaIntegration);
    v1.addResource('stream').addMethod('POST', lambdaIntegration);

    const agents = v1.addResource('agents');
    agents.addResource('run').addMethod('POST', lambdaIntegration);
    agents.addResource('stream').addMethod('POST', lambdaIntegration);

    // Usage and budget endpoints
    const usage = v1.addResource('usage');
    usage.addMethod('GET', lambdaIntegration); // Get usage stats

    // addResource must only be called once per path segment
    const budget = usage.addResource('budget');
    budget.addMethod('GET', lambdaIntegration);
    budget.addMethod('PUT', lambdaIntegration);

    // WebSocket API for streaming
    const webSocketApi = new apigatewayv2.WebSocketApi(this, 'StreamingAPI', {
      apiName: 'AI Platform Streaming',
      connectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'ConnectIntegration',
          gatewayFunction
        )
      },
      disconnectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DisconnectIntegration',
          gatewayFunction
        )
      },
      defaultRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DefaultIntegration',
          gatewayFunction
        )
      }
    });

    new apigatewayv2.WebSocketStage(this, 'StreamingStage', {
      webSocketApi,
      stageName: 'prod',
      autoDeploy: true
    });
  }
}
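The UsageTable above declares a `ttl` attribute so old usage records expire automatically. When the Gateway Lambda writes a record, it sets that attribute like this — note that DynamoDB TTL expects an epoch timestamp in seconds, and the 90-day retention window here is an assumption, not the platform's documented default:

```typescript
// DynamoDB TTL expects an epoch timestamp in *seconds*, not milliseconds.
// Retention window is an assumption; tune to your compliance requirements.
const RETENTION_DAYS = 90;

export function usageRecordTtl(now: Date = new Date()): number {
  return Math.floor(now.getTime() / 1000) + RETENTION_DAYS * 24 * 60 * 60;
}
```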

Step-by-Step Deployment

Now let's deploy everything:

# 1. Validate CDK configuration
npx cdk doctor

# 2. Review what will be deployed
npx cdk diff

# 3. Deploy security stack first
npx cdk deploy AISecurityStack

# 4. Deploy Lambda tools
npx cdk deploy AIToolsStack

# 5. Deploy ECS agents
npx cdk deploy AIAgentsStack

# 6. Deploy API Gateway
npx cdk deploy AIGatewayStack

# 7. Deploy monitoring
npx cdk deploy AIMonitoringStack

# Or deploy everything at once
npx cdk deploy --all

The deployment takes about 15 minutes. You'll see output like:

AIGatewayStack.APIEndpoint = https://abc123.execute-api.us-east-1.amazonaws.com/v1
AIGatewayStack.WebSocketEndpoint = wss://def456.execute-api.us-east-1.amazonaws.com/prod
AIAgentsStack.ClusterName = ai-platform-agents
AIToolsStack.SummarizeFunctionArn = arn:aws:lambda:us-east-1:123456789012:function:summarize

Configure AI Providers

Once deployed, configure your AI provider credentials:

# Store API keys in AWS Systems Manager
aws ssm put-parameter \
  --name "/ai-platform/openai-api-key" \
  --value "sk-your-openai-key" \
  --type "SecureString"

aws ssm put-parameter \
  --name "/ai-platform/anthropic-api-key" \
  --value "sk-ant-your-anthropic-key" \
  --type "SecureString"

# Update the deployed functions with the new parameter names
npx cdk deploy AIToolsStack AIGatewayStack
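At runtime, the functions should read those SecureString parameters once per container and reuse them, so each cold start pays for exactly one SSM round trip per key. Here's a minimal sketch of that caching layer — the SSM call itself is behind an injectable fetcher so the logic stands alone; in a real Lambda the default fetcher would call `GetParameter` with `WithDecryption: true` via `@aws-sdk/client-ssm`:

```typescript
// Cache SecureString parameters for the lifetime of the Lambda container.
// The fetcher is injected; in production it would wrap GetParameter
// (WithDecryption: true) from @aws-sdk/client-ssm.
type Fetcher = (name: string) => Promise<string>;

const cache = new Map<string, Promise<string>>();

export function getParameter(name: string, fetch: Fetcher): Promise<string> {
  // Cache the promise, not the value, so concurrent callers share one request
  let value = cache.get(name);
  if (value === undefined) {
    value = fetch(name);
    cache.set(name, value);
  }
  return value;
}
```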

Testing Your Deployment

Let's test the complete platform:

# 1. Health check
curl https://your-api-endpoint.execute-api.us-east-1.amazonaws.com/v1/health

# 2. Create an API key
curl -X POST https://your-api-endpoint/v1/auth/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Test Key",
    "scopes": ["ai:complete", "ai:embed", "agent:run"],
    "monthlyBudget": 50
  }'

# Returns: {"apiKey": "sk-proj-abc123...", "keyId": "sk-proj-abc"}

# 3. Test completion
curl -X POST https://your-api-endpoint/v1/complete \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a haiku about TypeScript"}],
    "model": "gpt-4",
    "temperature": 0.8
  }'

# 4. Test agent workflow
curl -X POST https://your-api-endpoint/v1/agents/run \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "type": "research",
    "input": {"topic": "renewable energy trends"},
    "tools": ["search", "summarize", "extract"]
  }'
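From TypeScript, the completion call above can be made with the built-in fetch. A small sketch mirroring the curl example — the endpoint and key are placeholders, and the request builder is my own helper, not part of the platform SDK:

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the arguments for POST /v1/complete, mirroring the curl example above
export function buildCompleteRequest(
  baseUrl: string,
  apiKey: string,
  messages: ChatMessage[],
  model = "gpt-4"
): BuiltRequest {
  return {
    url: `${baseUrl}/v1/complete`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ messages, model }),
    },
  };
}

// Usage against a deployed endpoint:
// const { url, init } = buildCompleteRequest("https://your-api-endpoint", "sk-proj-...", [
//   { role: "user", content: "Write a haiku about TypeScript" },
// ]);
// const res = await fetch(url, init);
```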

Dashboard Tour

The platform includes a built-in dashboard at /dashboard. Here's what you'll see:

Usage Overview:

  • Requests per day/hour
  • Token consumption by model
  • Cost breakdown by user
  • Success/error rates

Real-time Monitoring:

  • Active agent sessions
  • Queue depth for tools
  • Response time percentiles
  • Error alerts

Budget Management:

  • Per-user spend tracking
  • Budget utilization alerts
  • Cost projections
  • BYOK vs platform credit usage

System Health:

  • Lambda cold start metrics
  • ECS task utilization
  • DynamoDB performance
  • API Gateway latency

You can access it at: https://your-api-endpoint/dashboard

Performance Numbers from Production

Here are the real metrics from 8 months running in production:

Latency (P95):

  • Simple completion: 1.2s
  • Streaming completion: 180ms to first token
  • Agent workflow (3 tools): 12s
  • API Gateway overhead: 45ms
  • Lambda cold start: 850ms (mitigated with provisioned concurrency)

Throughput:

  • Sustained: 50 requests/second
  • Burst: 200 requests/second (before rate limiting)
  • Agent concurrency: 15 parallel workflows
  • Tool execution: 100 parallel Lambda invocations

Reliability:

  • Uptime: 99.8%
  • Error rate: 0.4%
  • P99 latency SLA: 5s (met 98.9% of the time)
  • Budget enforcement accuracy: 99.99%

Cost Optimization Wins:

  • Response caching: 25% reduction in API calls
  • Smart model selection: 40% cost reduction (Claude Haiku for summaries)
  • BYOK adoption: 70% of users, eliminating platform AI costs
  • Lambda right-sizing: 30% reduction in compute costs
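The "smart model selection" win boils down to routing cheap, high-volume task types to a cheaper model and reserving the strong one for multi-step reasoning. A hypothetical sketch — the task-to-model table and model names are illustrative, not the platform's actual routing table:

```typescript
// Route each task type to the cheapest model that handles it well.
// This table is illustrative; tune it against your own quality benchmarks.
const MODEL_BY_TASK: Record<string, string> = {
  summarize: "claude-haiku", // high volume; a cheap model is good enough
  classify: "claude-haiku",
  extract: "claude-haiku",
  research: "gpt-4",          // multi-step reasoning gets the strong model
};

export function selectModel(task: string): string {
  // Default to the strong model for unknown task types
  return MODEL_BY_TASK[task] ?? "gpt-4";
}
```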

Cost Breakdown: What This Actually Costs

Fixed Infrastructure (Monthly):

API Gateway:                $3.50   (1M requests)
Lambda (Gateway):           $8.20   (compute + requests)
ECS Fargate:               $15.40   (2 tasks avg)
DynamoDB:                   $6.80   (usage + budgets)
Application Load Balancer: $16.20
NAT Gateway:               $45.00   (data transfer)
CloudWatch:                 $4.30   (logs + metrics)
Route 53:                   $0.50   (hosted zone)
--------------------------------------
Total Fixed:               $99.90/month

Variable Costs:

  • AI API costs: Pass-through with 2% platform markup
  • Data transfer: $0.09/GB out of AWS
  • Lambda executions: $0.20 per million requests
  • DynamoDB reads/writes: $0.25 per million operations

Real customer costs (excluding AI API):

  • Light usage (500 req/month): $12/month
  • Medium usage (5K req/month): $35/month
  • Heavy usage (50K req/month): $120/month

The platform is cost-effective for most use cases. The break-even point vs building your own infrastructure is around 2,000 requests per month.
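The per-request economics of those tiers can be checked in a few lines (numbers taken from the tiers above; the rounding is mine):

```typescript
// Effective infrastructure cost per request at each usage tier above
export function costPerRequest(monthlyCostUsd: number, requestsPerMonth: number): number {
  return monthlyCostUsd / requestsPerMonth;
}

// Light:  $12  /   500 = $0.024  per request
// Medium: $35  / 5,000 = $0.007  per request
// Heavy:  $120 / 50,000 = $0.0024 per request
```

The per-request cost drops an order of magnitude between the light and heavy tiers, which is the fixed-cost amortization argument behind the break-even figure above.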

Cold Start Mitigation

Lambda cold starts were killing our performance. Here's how we solved it:

// Provisioned concurrency for critical functions. Note: the CDK prop is
// reservedConcurrentExecutions, and provisioned concurrency is configured
// on a version or alias, not on the function itself.
const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
  // ... other config
  reservedConcurrentExecutions: 10
});

new lambda.Alias(this, 'LiveAlias', {
  aliasName: 'live',
  version: gatewayFunction.currentVersion,
  provisionedConcurrentExecutions: 5
});

// Keep-warm function that pings Lambdas every 5 minutes
new events.Rule(this, 'KeepWarmRule', {
  schedule: events.Schedule.rate(cdk.Duration.minutes(5)),
  targets: [
    new targets.LambdaFunction(gatewayFunction, {
      event: events.RuleTargetInput.fromObject({ warmup: true })
    })
  ]
});

// In Lambda handler - respond quickly to warmup
export const handler = async (event: any) => {
  if (event.warmup) {
    return { statusCode: 200, body: 'warm' };
  }

  // Normal processing...
};

Result: Cold start rate dropped from 23% to 3% of requests.

Open Source Roadmap

This platform is completely open source. Here's what's coming next:

Q2 2026:

  • [ ] Multi-region deployment support
  • [ ] GraphQL API alongside REST
  • [ ] Built-in vector database (Pinecone integration)
  • [ ] Advanced agent memory management

Q3 2026:

  • [ ] Kubernetes support (alternative to ECS)
  • [ ] Multi-tenant isolation improvements
  • [ ] Advanced cost optimization (spot instances)
  • [ ] Plugin system for custom tools

Q4 2026:

  • [ ] Edge deployment (Cloudflare Workers)
  • [ ] Real-time collaboration features
  • [ ] Advanced monitoring and observability
  • [ ] Enterprise SSO integration

Community Requests:

  • Google Cloud and Azure support
  • Terraform modules (alternative to CDK)
  • Python SDK alongside TypeScript
  • Zapier/Make.com integrations

Contributing and Community

The entire platform is open source under MIT license. Everything I've built, you can use, modify, and improve.

Repositories:

  • github.com/tysoncung/ai-platform-aws — platform source, CDK stacks, and examples

How to help:

  1. Star the repositories - helps others discover the project
  2. Try the full deployment - example 07-full-stack has everything
  3. Report deployment issues - especially AWS region differences
  4. Submit improvements - see CONTRIBUTING.md for guidelines
  5. Share your experience - what are you building with it?

Connect:

  • Email: tyson@hivo.co
  • Twitter: @tysoncung

What We Built Together

Eight articles. One complete AI platform.

We started with seven broken Lambda functions. We built:

  • Agent orchestration that handles complex multi-step workflows without timeouts
  • TypeScript SDK with perfect IntelliSense, streaming support, and smart error handling
  • Cost control that prevents $2,847 surprises with budgets and rate limits
  • Production security with authentication, encryption, and monitoring
  • One-command deployment that gets you running in under an hour

The platform serves 1,500+ requests daily. It's survived 8 months in production. It's processing everything from document analysis to research workflows. And it's completely open source.

The Hard-Won Lessons

Building production AI infrastructure taught me things tutorials never mention:

Technical truths:

  • Cost control is life support, not a nice-to-have feature
  • Lambda excels at tools, fails at orchestration
  • Streaming looks simple, implementation is brutal
  • Type safety prevents expensive mistakes at 3AM

Business realities:

  • Developers pay for great experience, abandon bad APIs
  • Open source builds trust better than marketing
  • Production numbers matter more than perfect demos
  • Failure stories teach more than success posts

Personal discoveries:

  • Building in public creates accountability
  • Documentation is your product's face
  • Shipping beats perfecting every time
  • Sharing mistakes helps everyone improve

Your Turn

You have everything you need. Real code, real examples, real production lessons. The platform is MIT licensed - use it, improve it, make money with it.

Next steps:

  1. Star the repos - ai-platform-aws and examples
  2. Deploy example 07 - full platform in under an hour
  3. Build something cool - then tell me about it
  4. Share your experience - help others learn from your journey

Get stuck? Email me at tyson@hivo.co or find me on Twitter @tysoncung.

The AI revolution needs better infrastructure. You can build it.

Go.


End of series: "Building an AI Platform on AWS from Scratch". Complete platform and examples at github.com/tysoncung/ai-platform-aws.
