From seven broken Lambda functions to a production AI platform in 8 articles.
That's the journey we've taken together. We started with functions that couldn't communicate, hit timeout walls, and left users staring at loading spinners. We're ending with a complete platform that orchestrates complex workflows, streams real-time updates, and won't bankrupt your startup.
This isn't a toy example. The architecture I'm about to show you serves 1,500+ requests daily, has survived 8 months in production, and handles everything from document analysis to multi-step research tasks.
Time to deploy it.
The Complete Architecture
Before we dive into deployment, here's what we're building:

The data flow:
- API Gateway receives requests, handles auth, enforces rate limits
- Gateway Lambda validates requests, checks budgets, routes to appropriate service
- ECS Agents orchestrate multi-step workflows using Lambda tools
- Lambda Tools perform specific AI tasks (summarize, extract, classify)
- DynamoDB tracks usage, manages budgets, stores user data
- WebSocket streams real-time updates back to clients
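To make the routing step concrete, here's a minimal sketch of the decision the Gateway Lambda makes between the two execution paths. This is illustrative only — `routeRequest` and the `Route` shape are my names, not the repo's actual API:

```typescript
// Illustrative routing decision for the Gateway Lambda described above.
// routeRequest and Route are hypothetical names, not the repo's real API.
type Route = { target: 'lambda-tool' | 'ecs-agent'; name: string };

function routeRequest(path: string): Route {
  // Multi-step workflows (e.g. /v1/agents/run) go to the ECS agent service;
  // single-shot tasks (/v1/complete, /v1/embed) invoke a Lambda tool directly.
  const parts = path.split('/').filter(Boolean); // e.g. ['v1', 'agents', 'run']
  if (parts[1] === 'agents') {
    return { target: 'ecs-agent', name: parts[2] ?? 'run' };
  }
  return { target: 'lambda-tool', name: parts[1] ?? 'unknown' };
}
```

The key design point is that the split happens at one place: anything that needs orchestration across multiple tools leaves the Lambda world immediately, so no single Lambda ever owns a long-running workflow.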
Prerequisites: Bootstrap Your Environment
First, let's set up the deployment environment:
```bash
# Install AWS CDK
npm install -g aws-cdk

# Clone the platform
git clone https://github.com/tysoncung/ai-platform-aws.git
cd ai-platform-aws

# Install dependencies
npm install
npm run install:all  # Installs in all packages

# Bootstrap CDK (one time per account/region)
npx cdk bootstrap

# Create environment file
cp .env.example .env
```
Edit .env with your configuration:
```bash
# AWS Configuration
AWS_REGION=us-east-1
AWS_ACCOUNT_ID=123456789012

# AI Provider API Keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Platform Configuration
PLATFORM_ENVIRONMENT=production
COST_TRACKING_ENABLED=true
BUDGET_ALERTS_ENABLED=true

# Monitoring
SLACK_WEBHOOK_URL=https://hooks.slack.com/your-webhook
ALERT_EMAIL=you@company.com

# Security
JWT_SECRET_KEY=your-super-secret-jwt-key
ENCRYPTION_SALT=your-encryption-salt
```
Local Development Setup
Before deploying to AWS, let's run everything locally with Docker Compose:
```yaml
# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    build:
      context: ./packages/gateway
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DYNAMODB_ENDPOINT=http://dynamodb:8000
      - AGENT_ENDPOINT=http://agent:3001
    depends_on:
      - dynamodb
      - agent

  agent:
    build:
      context: ./packages/agents
      dockerfile: Dockerfile.dev
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=development
      - LAMBDA_ENDPOINT=http://lambda-tools:3002
    depends_on:
      - lambda-tools

  lambda-tools:
    build:
      context: ./packages/tools
      dockerfile: Dockerfile.dev
    ports:
      - "3002:3002"
    environment:
      - NODE_ENV=development
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}

  dynamodb:
    image: amazon/dynamodb-local:latest
    ports:
      - "8000:8000"
    command: ["-jar", "DynamoDBLocal.jar", "-sharedDb", "-inMemory"]

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```
Start the local environment:
```bash
# Start all services
docker-compose up -d

# Run database migrations
npm run db:migrate:local

# Seed with sample data
npm run db:seed:local

# Test the platform
curl http://localhost:3000/health
```
CDK Stack Composition
The platform is composed of multiple CDK stacks for better separation of concerns:
```typescript
// bin/deploy.ts
import * as cdk from 'aws-cdk-lib';
import { AIGatewayStack } from '../lib/gateway-stack';
import { AIAgentsStack } from '../lib/agents-stack';
import { AIToolsStack } from '../lib/tools-stack';
import { AIMonitoringStack } from '../lib/monitoring-stack';
import { AISecurityStack } from '../lib/security-stack';

const app = new cdk.App();

const env = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION
};

// Security layer (VPC, IAM, KMS)
const securityStack = new AISecurityStack(app, 'AISecurityStack', { env });

// Lambda tools layer
const toolsStack = new AIToolsStack(app, 'AIToolsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.lambdaSecurityGroup
});

// ECS agents layer
const agentsStack = new AIAgentsStack(app, 'AIAgentsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.ecsSecurityGroup,
  toolsArns: toolsStack.functionArns
});

// API Gateway layer
const gatewayStack = new AIGatewayStack(app, 'AIGatewayStack', {
  env,
  agentsCluster: agentsStack.cluster,
  agentsService: agentsStack.service,
  toolsArns: toolsStack.functionArns
});

// Monitoring and alerting
new AIMonitoringStack(app, 'AIMonitoringStack', {
  env,
  gatewayApi: gatewayStack.api,
  agentsService: agentsStack.service,
  toolsFunctions: toolsStack.functions
});
```
Here's the gateway stack implementation:
```typescript
// lib/gateway-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as apigatewayv2 from 'aws-cdk-lib/aws-apigatewayv2';
import * as apigatewayv2integrations from 'aws-cdk-lib/aws-apigatewayv2-integrations';
import { Construct } from 'constructs';

export class AIGatewayStack extends cdk.Stack {
  public readonly api: apigateway.RestApi;

  constructor(scope: Construct, id: string, props: AIGatewayStackProps) {
    super(scope, id, props);

    // DynamoDB tables (on-demand billing is BillingMode.PAY_PER_REQUEST in CDK)
    const usageTable = new dynamodb.Table(this, 'UsageTable', {
      tableName: 'ai-platform-usage',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'timestamp', type: dynamodb.AttributeType.NUMBER },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      timeToLiveAttribute: 'ttl'
    });

    const budgetTable = new dynamodb.Table(this, 'BudgetTable', {
      tableName: 'ai-platform-budgets',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST
    });

    // Gateway Lambda function
    const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
      runtime: lambda.Runtime.NODEJS_18_X,
      code: lambda.Code.fromAsset('packages/gateway/dist'),
      handler: 'index.handler',
      timeout: cdk.Duration.seconds(30),
      memorySize: 512,
      environment: {
        USAGE_TABLE_NAME: usageTable.tableName,
        BUDGET_TABLE_NAME: budgetTable.tableName,
        AGENTS_CLUSTER_ARN: props.agentsCluster.clusterArn,
        AGENTS_SERVICE_ARN: props.agentsService.serviceArn,
        TOOLS_ARNS: JSON.stringify(props.toolsArns)
      }
    });

    // Grant permissions
    usageTable.grantReadWriteData(gatewayFunction);
    budgetTable.grantReadWriteData(gatewayFunction);

    // API Gateway
    this.api = new apigateway.RestApi(this, 'AIApi', {
      restApiName: 'AI Platform API',
      description: 'AI Platform REST API',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization']
      }
    });

    // API Gateway integration
    const lambdaIntegration = new apigateway.LambdaIntegration(gatewayFunction);

    // Routes
    const v1 = this.api.root.addResource('v1');
    v1.addResource('complete').addMethod('POST', lambdaIntegration);
    v1.addResource('embed').addMethod('POST', lambdaIntegration);
    v1.addResource('stream').addMethod('POST', lambdaIntegration);

    const agents = v1.addResource('agents');
    agents.addResource('run').addMethod('POST', lambdaIntegration);
    agents.addResource('stream').addMethod('POST', lambdaIntegration);

    // Usage and budget endpoints. Create the 'budget' resource once and
    // attach both methods — calling addResource('budget') twice would throw.
    const usage = v1.addResource('usage');
    usage.addMethod('GET', lambdaIntegration); // Get usage stats
    const budget = usage.addResource('budget');
    budget.addMethod('GET', lambdaIntegration);
    budget.addMethod('PUT', lambdaIntegration);

    // WebSocket API for streaming
    const webSocketApi = new apigatewayv2.WebSocketApi(this, 'StreamingAPI', {
      apiName: 'AI Platform Streaming',
      connectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'ConnectIntegration',
          gatewayFunction
        )
      },
      disconnectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DisconnectIntegration',
          gatewayFunction
        )
      },
      defaultRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DefaultIntegration',
          gatewayFunction
        )
      }
    });

    new apigatewayv2.WebSocketStage(this, 'StreamingStage', {
      webSocketApi,
      stageName: 'prod',
      autoDeploy: true
    });
  }
}
```
Step-by-Step Deployment
Now let's deploy everything:
```bash
# 1. Validate CDK configuration
npx cdk doctor

# 2. Review what will be deployed
npx cdk diff

# 3. Deploy security stack first
npx cdk deploy AISecurityStack

# 4. Deploy Lambda tools
npx cdk deploy AIToolsStack

# 5. Deploy ECS agents
npx cdk deploy AIAgentsStack

# 6. Deploy API Gateway
npx cdk deploy AIGatewayStack

# 7. Deploy monitoring
npx cdk deploy AIMonitoringStack

# Or deploy everything at once
npx cdk deploy --all
```
The deployment takes about 15 minutes. You'll see output like:
```text
AIGatewayStack.APIEndpoint = https://abc123.execute-api.us-east-1.amazonaws.com/v1
AIGatewayStack.WebSocketEndpoint = wss://def456.execute-api.us-east-1.amazonaws.com/prod
AIAgentsStack.ClusterName = ai-platform-agents
AIToolsStack.SummarizeFunctionArn = arn:aws:lambda:us-east-1:123456789012:function:summarize
```
Configure AI Providers
Once deployed, configure your AI provider credentials:
```bash
# Store API keys in AWS Systems Manager Parameter Store
aws ssm put-parameter \
  --name "/ai-platform/openai-api-key" \
  --value "sk-your-openai-key" \
  --type "SecureString"

aws ssm put-parameter \
  --name "/ai-platform/anthropic-api-key" \
  --value "sk-ant-your-anthropic-key" \
  --type "SecureString"

# Redeploy so the functions pick up the new parameter names
npx cdk deploy AIToolsStack AIGatewayStack
```
Testing Your Deployment
Let's test the complete platform:
```bash
# 1. Health check
curl https://your-api-endpoint.execute-api.us-east-1.amazonaws.com/v1/health

# 2. Create an API key
curl -X POST https://your-api-endpoint/v1/auth/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Test Key",
    "scopes": ["ai:complete", "ai:embed", "agent:run"],
    "monthlyBudget": 50
  }'
# Returns: {"apiKey": "sk-proj-abc123...", "keyId": "sk-proj-abc"}

# 3. Test completion
curl -X POST https://your-api-endpoint/v1/complete \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a haiku about TypeScript"}],
    "model": "gpt-4",
    "temperature": 0.8
  }'

# 4. Test agent workflow
curl -X POST https://your-api-endpoint/v1/agents/run \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "type": "research",
    "input": {"topic": "renewable energy trends"},
    "tools": ["search", "summarize", "extract"]
  }'
```
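If you'd rather script these checks than paste curl commands, here's a minimal request-body builder in TypeScript. The field names mirror the curl example above, but the validation thresholds are my assumptions, not the platform's documented limits:

```typescript
// Build and validate a /v1/complete request body. The shape mirrors the curl
// example above; the validation rules are illustrative assumptions.
interface CompletionRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  model: string;
  temperature?: number;
}

function buildCompletionBody(req: CompletionRequest): string {
  if (req.messages.length === 0) {
    throw new Error('messages must be non-empty');
  }
  if (req.temperature !== undefined && (req.temperature < 0 || req.temperature > 2)) {
    throw new Error('temperature must be in [0, 2]');
  }
  return JSON.stringify(req);
}

// Usage: send this string as the POST body with your Authorization header.
const body = buildCompletionBody({
  messages: [{ role: 'user', content: 'Write a haiku about TypeScript' }],
  model: 'gpt-4',
  temperature: 0.8
});
```

Catching malformed requests client-side like this saves a round trip (and a billable gateway invocation) for mistakes the API would reject anyway.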
Dashboard Tour
The platform includes a built-in dashboard at /dashboard. Here's what you'll see:
Usage Overview:
- Requests per day/hour
- Token consumption by model
- Cost breakdown by user
- Success/error rates
Real-time Monitoring:
- Active agent sessions
- Queue depth for tools
- Response time percentiles
- Error alerts
Budget Management:
- Per-user spend tracking
- Budget utilization alerts
- Cost projections
- BYOK vs platform credit usage
System Health:
- Lambda cold start metrics
- ECS task utilization
- DynamoDB performance
- API Gateway latency
You can access it at: https://your-api-endpoint/dashboard
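The budget alerts in that view boil down to a simple utilization check against the per-user budget table. A sketch of the idea — the 80% warning threshold and the `Budget` field names are assumptions, not the platform's exact values:

```typescript
// Classify a user's budget utilization, as the dashboard's budget alerts do.
// The 80% warning threshold and Budget fields are illustrative assumptions.
interface Budget {
  monthlyLimitUsd: number;
  spentUsd: number;
}

function budgetStatus(b: Budget): 'ok' | 'warning' | 'exceeded' {
  const utilization = b.spentUsd / b.monthlyLimitUsd;
  if (utilization >= 1) return 'exceeded';
  if (utilization >= 0.8) return 'warning';
  return 'ok';
}
```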
Performance Numbers from Production
Here are the real metrics from 8 months running in production:
Latency (P95):
- Simple completion: 1.2s
- Streaming completion: 180ms to first token
- Agent workflow (3 tools): 12s
- API Gateway overhead: 45ms
- Lambda cold start: 850ms (mitigated with provisioned concurrency)
Throughput:
- Sustained: 50 requests/second
- Burst: 200 requests/second (before rate limiting)
- Agent concurrency: 15 parallel workflows
- Tool execution: 100 parallel Lambda invocations
Reliability:
- Uptime: 99.8%
- Error rate: 0.4%
- P99 latency SLA: 5s (met 98.9% of the time)
- Budget enforcement accuracy: 99.99%
Cost Optimization Wins:
- Response caching: 25% reduction in API calls
- Smart model selection: 40% cost reduction (Claude Haiku for summaries)
- BYOK adoption: 70% of users, eliminating platform AI costs
- Lambda right-sizing: 30% reduction in compute costs
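The response-caching win above depends on deriving a deterministic cache key, so that identical requests hit the same cache entry. One way to sketch it — hashing the normalized request; which fields the platform actually keys on is an assumption:

```typescript
// Deterministic cache key for response caching: the same model + messages +
// temperature always hash to the same key. The choice of fields to include
// in the key is an illustrative assumption.
import { createHash } from 'node:crypto';

function responseCacheKey(
  model: string,
  messages: { role: string; content: string }[],
  temperature: number
): string {
  const normalized = JSON.stringify({ model, messages, temperature });
  return createHash('sha256').update(normalized).digest('hex');
}
```

Note that any non-deterministic field (timestamps, request IDs) must stay out of the hashed payload, or every request becomes a cache miss.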
Cost Breakdown: What This Actually Costs
Fixed Infrastructure (Monthly):
- API Gateway: $3.50 (1M requests)
- Lambda (Gateway): $8.20 (compute + requests)
- ECS Fargate: $15.40 (2 tasks avg)
- DynamoDB: $6.80 (usage + budgets)
- Application Load Balancer: $16.20
- NAT Gateway: $45.00 (data transfer)
- CloudWatch: $4.30 (logs + metrics)
- Route 53: $0.50 (hosted zone)
- Total Fixed: $99.90/month
Variable Costs:
- AI API costs: Pass-through with 2% platform markup
- Data transfer: $0.09/GB out of AWS
- Lambda executions: $0.20 per million requests
- DynamoDB reads/writes: $0.25 per million operations
Real customer costs (excluding AI API):
- Light usage (500 req/month): $12/month
- Medium usage (5K req/month): $35/month
- Heavy usage (50K req/month): $120/month
The platform is cost-effective for most use cases. The break-even point vs building your own infrastructure is around 2,000 requests per month.
Cold Start Mitigation
Lambda cold starts were killing our performance. Here's how we solved it:
```typescript
// Reserved concurrency caps the function. Provisioned concurrency is
// configured on a version or alias, not on the Function construct itself.
const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
  // ... other config
  reservedConcurrentExecutions: 10
});

new lambda.Alias(this, 'GatewayLiveAlias', {
  aliasName: 'live',
  version: gatewayFunction.currentVersion,
  provisionedConcurrentExecutions: 5
});

// Keep-warm rule that pings the Lambda every 5 minutes
new events.Rule(this, 'KeepWarmRule', {
  schedule: events.Schedule.rate(cdk.Duration.minutes(5)),
  targets: [
    new targets.LambdaFunction(gatewayFunction, {
      event: events.RuleTargetInput.fromObject({ warmup: true })
    })
  ]
});

// In the Lambda handler - short-circuit warmup pings
export const handler = async (event: any) => {
  if (event.warmup) {
    return { statusCode: 200, body: 'warm' };
  }
  // Normal processing...
};
```
Result: Cold start rate dropped from 23% to 3% of requests.
Open Source Roadmap
This platform is completely open source. Here's what's coming next:
Q2 2026:
- [ ] Multi-region deployment support
- [ ] GraphQL API alongside REST
- [ ] Built-in vector database (Pinecone integration)
- [ ] Advanced agent memory management
Q3 2026:
- [ ] Kubernetes support (alternative to ECS)
- [ ] Multi-tenant isolation improvements
- [ ] Advanced cost optimization (spot instances)
- [ ] Plugin system for custom tools
Q4 2026:
- [ ] Edge deployment (CloudFlare Workers)
- [ ] Real-time collaboration features
- [ ] Advanced monitoring and observability
- [ ] Enterprise SSO integration
Community Requests:
- Google Cloud and Azure support
- Terraform modules (alternative to CDK)
- Python SDK alongside TypeScript
- Zapier/Make.com integrations
Contributing and Community
The entire platform is open source under the MIT license. Everything I've built, you can use, modify, and improve.
Repositories:
- Main platform: github.com/tysoncung/ai-platform-aws
- Working examples: github.com/tysoncung/ai-platform-aws-examples
How to help:
- Star the repositories - helps others discover the project
- Try the full deployment - example 07-full-stack has everything
- Report deployment issues - especially AWS region differences
- Submit improvements - see CONTRIBUTING.md for guidelines
- Share your experience - what are you building with it?
Connect:
- Email: tyson@hivo.co
- Twitter: @tysoncung
What We Built Together
Eight articles. One complete AI platform.
We started with seven broken Lambda functions. We built:
- Agent orchestration that handles complex multi-step workflows without timeouts
- TypeScript SDK with perfect IntelliSense, streaming support, and smart error handling
- Cost control that prevents $2,847 surprises with budgets and rate limits
- Production security with authentication, encryption, and monitoring
- One-command deployment that gets you running in under an hour
The platform serves 1,500+ requests daily. It's survived 8 months in production. It's processing everything from document analysis to research workflows. And it's completely open source.
The Hard-Won Lessons
Building production AI infrastructure taught me things tutorials never mention:
Technical truths:
- Cost control is life support, not a nice-to-have feature
- Lambda excels at tools, fails at orchestration
- Streaming looks simple, implementation is brutal
- Type safety prevents expensive mistakes at 3AM
Business realities:
- Developers pay for great experience, abandon bad APIs
- Open source builds trust better than marketing
- Production numbers matter more than perfect demos
- Failure stories teach more than success posts
Personal discoveries:
- Building in public creates accountability
- Documentation is your product's face
- Shipping beats perfecting every time
- Sharing mistakes helps everyone improve
Your Turn
You have everything you need. Real code, real examples, real production lessons. The platform is MIT licensed - use it, improve it, make money with it.
Next steps:
- Star the repos - ai-platform-aws and examples
- Deploy example 07 - full platform in under an hour
- Build something cool - then tell me about it
- Share your experience - help others learn from your journey
Get stuck? Email me at tyson@hivo.co or find me on Twitter @tysoncung.
The AI revolution needs better infrastructure. You can build it.
Go.
End of series: "Building an AI Platform on AWS from Scratch". Complete platform and examples at github.com/tysoncung/ai-platform-aws.