We wanted every employee in the company to use OpenClaw — not just engineers. Product managers writing specs, designers prototyping, ops teams automating workflows. The problem wasn't adoption — it was operations.
Hundreds of employees, each needing their own API key in a local config file. Within a week, keys were committed to git. Within a month, finance couldn't figure out who was spending what. When Anthropic rotated a key, we had to notify everyone individually. We needed infrastructure, not process.
We solved it with four AWS CDK stacks and a 100-line sidecar proxy. Here's how.
## The Architecture: Four CDK Stacks
Everything deploys in order with `nx deploy`:

```bash
DEPLOY_ENV=prod nx deploy teamclaw-foundation-infra     # VPC, EFS, ECR, Secrets Manager
DEPLOY_ENV=prod nx deploy teamclaw-cluster-infra        # ECS Fargate, ALB, CloudFront
DEPLOY_ENV=prod nx deploy teamclaw-control-plane-infra  # Cognito, DynamoDB, Lifecycle Lambda
DEPLOY_ENV=prod nx deploy teamclaw-admin-infra          # API Gateway, 44 Lambda handlers
```
After deployment, IT configures API keys in the admin panel. Employees sign up with their company email. That's it — no per-user key setup.
## Stack 1: Foundation
VPC with public/private subnets, EFS for per-user data persistence, ECR for container images, and Secrets Manager for the API key pool.
The key design decision: one Secrets Manager secret holds all provider keys as a JSON pool.
```typescript
// foundation.stack.ts
const apiKeysSecret = new aws_secretsmanager.Secret(this, 'ApiKeysSecret', {
  secretName: `${deployEnv}/teamclaw/api-keys`,
  description: 'Shared API key pool for TeamClaw',
});
```
The secret format:

```json
{
  "providers": {
    "anthropic": {
      "authType": "apiKey",
      "keys": ["sk-ant-key1...", "sk-ant-key2...", "sk-ant-key3..."]
    },
    "openai": {
      "authType": "apiKey",
      "keys": ["sk-proj-key1...", "sk-proj-key2..."]
    }
  }
}
```
Adding a key is one CLI call. No container restart — the sidecar's 60-second cache handles it.
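Concretely, that one CLI call might look like the following sketch. The secret name matches the CDK above; the key value is a placeholder, and `jq` does the JSON edit (requires AWS credentials):

```shell
# Append a new Anthropic key to the pool and push the updated JSON back.
POOL=$(aws secretsmanager get-secret-value \
  --secret-id prod/teamclaw/api-keys \
  --query SecretString --output text)

aws secretsmanager put-secret-value \
  --secret-id prod/teamclaw/api-keys \
  --secret-string "$(echo "$POOL" | jq '.providers.anthropic.keys += ["sk-ant-new..."]')"
```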
## Stack 2: Cluster
ECS Fargate cluster, ALB with per-user path-based routing, CloudFront for WSS termination, and the ECS Task Definition with two containers.
```typescript
// cluster.stack.ts
const taskDefinition = new aws_ecs.FargateTaskDefinition(this, 'UserTaskDef', {
  family: `teamclaw-user-${deployEnv}`,
  cpu: 1024,
  memoryLimitMiB: 2048,
  taskRole,
  executionRole,
});

// Container 1: Unmodified OpenClaw
taskDefinition.addContainer('teamclaw', {
  image: aws_ecs.ContainerImage.fromRegistry(`${ecrUri}:latest`),
  portMappings: [{ containerPort: 18789 }],
});

// Container 2: Sidecar proxy for credential injection
taskDefinition.addContainer('proxy-sidecar', {
  image: aws_ecs.ContainerImage.fromRegistry(`${sidecarUri}:latest`),
  portMappings: [{ containerPort: 3000 }],
});
```
The sidecar is the key piece. OpenClaw's config points all providers to `http://localhost:3000/anthropic`, `http://localhost:3000/openai`, etc. The sidecar intercepts every API call, strips the dummy auth header, injects the real key from Secrets Manager, and forwards upstream. The real API key never touches the OpenClaw process.
The task role is scoped tight — least privilege:
```typescript
// Secrets Manager: read-only on the key pool
taskRole.addToPolicy(new aws_iam.PolicyStatement({
  actions: ['secretsmanager:GetSecretValue'],
  resources: [`arn:aws:secretsmanager:${region}:${account}:secret:${deployEnv}/teamclaw/*`],
}));

// EFS: mount user data
taskRole.addToPolicy(new aws_iam.PolicyStatement({
  actions: ['elasticfilesystem:ClientMount', 'elasticfilesystem:ClientWrite'],
  resources: [`arn:aws:elasticfilesystem:${region}:${account}:file-system/*`],
}));

// DynamoDB: sidecar logs usage per request
taskRole.addToPolicy(new aws_iam.PolicyStatement({
  actions: ['dynamodb:PutItem'],
  resources: [`arn:aws:dynamodb:${region}:${account}:table/teamclaw-usage-${deployEnv}`],
}));
```
## Stack 3: Control Plane
Cognito for employee auth, DynamoDB for state, and the Lifecycle Lambda that orchestrates container lifecycle.
When an employee logs in, the user-session Lambda checks DynamoDB. If no container exists, it invokes the Lifecycle Lambda, which:

- Creates an EFS access point at `/users/{userId}`
- Calls `ecs:RunTask` to start a Fargate task
- Polls `DescribeTasks` until the container gets a private IP
- Creates a per-user ALB target group and listener rule
- Records the task ARN and IP in DynamoDB
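The polling step can be sketched as a small retry helper. `describeIp` here stands in for the real `ecs:DescribeTasks` call (an assumption, not the project's actual code), which keeps the retry logic testable:

```typescript
// Poll a describe function until the task reports a private IP.
// `describeIp` abstracts the AWS SDK call; it returns undefined
// while the ENI is still being attached.
async function waitForPrivateIp(
  describeIp: () => Promise<string | undefined>,
  { attempts = 30, delayMs = 2000 } = {},
): Promise<string> {
  for (let i = 0; i < attempts; i++) {
    const ip = await describeIp();
    if (ip) return ip;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Task never received a private IP');
}
```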
```typescript
// Lifecycle Lambda — least privilege for ECS orchestration
lifecycleLambda.addToRolePolicy(new aws_iam.PolicyStatement({
  actions: ['ecs:RunTask', 'ecs:StopTask', 'ecs:DescribeTasks'],
  resources: [
    `arn:aws:ecs:${region}:${account}:cluster/teamclaw-${deployEnv}`,
    `arn:aws:ecs:${region}:${account}:task/teamclaw-${deployEnv}/*`,
    `arn:aws:ecs:${region}:${account}:task-definition/teamclaw-*-${deployEnv}:*`,
  ],
}));
```
An EventBridge rule triggers an idle check every 15 minutes. Containers idle for 30+ minutes get stopped, a real saving since Fargate bills per second.
```typescript
new aws_events.Rule(this, 'IdleCheckRule', {
  schedule: aws_events.Schedule.rate(Duration.minutes(15)),
}).addTarget(new aws_events_targets.LambdaFunction(lifecycleLambda, {
  event: aws_events.RuleTargetInput.fromObject({ action: 'check-idle', userId: 'system' }),
}));
```
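Inside the Lambda, the idle decision reduces to a timestamp comparison. A minimal sketch, assuming DynamoDB stores a last-activity epoch per container (the field name is hypothetical):

```typescript
const IDLE_LIMIT_MS = 30 * 60 * 1000; // stop after 30 idle minutes

// True when a container has been idle long enough to stop.
function shouldStop(lastActivityAt: number, now: number = Date.now()): boolean {
  return now - lastActivityAt >= IDLE_LIMIT_MS;
}
```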
## Stack 4: Admin API
An API Gateway HTTP API with a Cognito JWT authorizer fronts 44 Lambda handlers. Users, teams, containers, API keys, integrations, analytics: all of it managed through the admin panel.
Cross-stack communication uses SSM parameters exclusively — no CDK exports:
```typescript
// Stack 2 writes:
new aws_ssm.StringParameter(this, 'ClusterNameParam', {
  parameterName: `/tc/${deployEnv}/ecs/clusterName`,
  stringValue: cluster.clusterName,
});

// Stack 3 reads:
const clusterName = aws_ssm.StringParameter.valueForStringParameter(
  this, `/tc/${deployEnv}/ecs/clusterName`
);
```
This pattern means stacks are fully independent. No deployment ordering issues, no "stack is in a failed state" nightmares.
## The Sidecar: How API Key Isolation Works
The sidecar proxy runs on `localhost:3000` inside each container. OpenClaw thinks it's talking to the AI provider. Its config has:

```js
providers: {
  anthropic: { baseUrl: 'http://localhost:3000/anthropic', apiKey: 'proxy-managed' },
  openai:    { baseUrl: 'http://localhost:3000/openai', apiKey: 'proxy-managed' },
}
```
The sidecar parses the provider from the URL, reads the key pool from Secrets Manager (cached 60s), round-robins across keys, strips the dummy auth, injects the real key, and forwards upstream. Usage gets logged to DynamoDB per request.
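The per-request rewrite can be sketched as two pure functions. This is an assumption about how the sidecar might be structured, not its actual code; the header names follow Anthropic's `x-api-key` convention, and headers are assumed lowercased as Node does:

```typescript
// Round-robin across the pool: one counter per provider.
const counters = new Map<string, number>();

function nextKey(provider: string, keys: string[]): string {
  const n = counters.get(provider) ?? 0;
  counters.set(provider, n + 1);
  return keys[n % keys.length];
}

// Replace the dummy credential with a real one before forwarding upstream.
function injectAuth(
  headers: Record<string, string>,
  provider: string,
  key: string,
): Record<string, string> {
  const out = { ...headers };
  delete out['authorization']; // strip the dummy bearer token
  delete out['x-api-key'];     // strip the dummy Anthropic-style key
  if (provider === 'anthropic') out['x-api-key'] = key;
  else out['authorization'] = `Bearer ${key}`;
  return out;
}
```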
The container entrypoint explicitly unsets all provider env vars as its first action:

```sh
#!/bin/sh
unset ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY
node /scripts/generate-config.js
exec openclaw gateway run --port 18789 --auth trusted-proxy
```
If a skill or MCP tool reads `process.env`, it finds nothing. The real key only exists in the sidecar's memory, fetched at runtime from Secrets Manager.
## Config Hierarchy: Global → Team → User
OpenClaw reads a single `openclaw.json`. We generate it at startup by merging three tiers:

```typescript
const globalConfig = loadJson('/efs/system/global-config.json');        // Admin guardrails
const teamConfig   = loadJson(`/efs/teams/${teamId}/team-config.json`); // Team standards
const userConfig   = loadJson(`/efs/users/${userId}/user-config.json`); // Personal prefs

const merged = deepMerge(deepMerge(deepMerge(baseConfig, globalConfig), teamConfig), userConfig);
```
Same for `SOUL.md`, OpenClaw's system prompt: admin sets guardrails, team sets coding standards, user sets preferences. All layered, with later tiers taking precedence.
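The post doesn't show `deepMerge` itself; a minimal recursive version consistent with the layering above (nested objects merge, arrays and scalars in the later tier replace the earlier value) might look like:

```typescript
type Json = { [key: string]: unknown };

// Recursively merge `patch` into `base`: nested objects merge,
// while arrays and scalars in `patch` replace the base value.
function deepMerge(base: Json, patch: Json): Json {
  const out: Json = { ...base };
  for (const [k, v] of Object.entries(patch)) {
    const prev = out[k];
    if (v && typeof v === 'object' && !Array.isArray(v) &&
        prev && typeof prev === 'object' && !Array.isArray(prev)) {
      out[k] = deepMerge(prev as Json, v as Json);
    } else {
      out[k] = v;
    }
  }
  return out;
}
```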
## What We Learned
ALB listener rules are capped at 100 per load balancer by default (a soft limit). Each user gets a routing rule, so at 100 concurrent users you need to request a quota increase or rearchitect.
Per-user Fargate costs ~$12/user/month running 24/7. With 30-minute idle stop, real-world cost is ~$3-4/user/month.
The SSM parameter pattern for cross-stack refs is worth it. More verbose than CDK exports, but zero coupling between stacks.
The sidecar pattern works for any container-based AI tool, not just OpenClaw. If you're running any AI agent that needs API keys, a localhost proxy that injects credentials from Secrets Manager is a clean, provider-agnostic solution.
## Source
Apache 2.0: github.com/ChenKuanSun/teamclaw — Nx monorepo, 4 CDK stacks, 992 tests.