Introduction
Over the past few months, I've been working on something that fundamentally changed how I think about building APIs: implementing the Model Context Protocol (MCP) on AWS Lambda. This isn't just another serverless project—it's about rethinking how we expose services to AI assistants.
In this post, I'll share the complete architecture, the challenges I faced, and the lessons learned while building a production-ready MCP server that lets AI assistants like Claude work with APIs directly, so users can simply talk to the assistant without worrying about what sits behind it. Whether you're considering building your own MCP server or just curious about serverless architectures for AI-native interfaces, this guide is for you.
Why Build an MCP Server?
The Problem with Traditional APIs
Traditional REST APIs are designed for human developers. They have:
- Complex authentication flows
- Verbose request/response structures
- Documentation that humans read and interpret
- SDKs that wrap HTTP calls
But AI assistants don't need SDKs. They don't read documentation the same way we do. They need structured, predictable interfaces with clear schemas and consistent error handling.
Enter the Model Context Protocol
MCP is an open protocol created by Anthropic that standardizes how AI assistants communicate with external tools. Think of it as a universal adapter between AI models and your services.
Instead of teaching an AI to make HTTP calls and parse responses, you define tools with clear input schemas and let the AI invoke them directly:
// Traditional API approach
"Make a POST to /tasks with this JSON body,
set the Authorization header, handle pagination..."
// MCP approach
"Call the create_task tool with title: "'Review PR #123'\""
The Motivation
We had a product with a REST API, and users were increasingly working through AI assistants. The question wasn't if we should support MCP, but how.
The business case wrote itself:
- Meet users where they are: AI assistants are becoming the default interface for many workflows
- Reduce integration friction: Instead of users learning our API, let the AI handle it
- Enable new use cases: Natural language interactions open doors that traditional APIs can't
The technical benefits followed:
- Standardize the interface: One protocol, multiple AI clients
- Improve AI comprehension: Structured tools instead of raw HTTP
- Enable natural language workflows: "Do X with Y" just works—the AI handles the complexity
- Future-proof the integration: As MCP evolves, so does the server
- Hide implementation details: Users don't need to know about endpoints, authentication, or payloads
A Concrete Example: Task Management API
To make this guide practical, I'll use a task management API as our running example throughout this post. Think of it as a simplified Trello or Asana API with endpoints for:
- Creating, updating, and deleting tasks
- Organizing tasks into projects
- Assigning tasks to team members
- Tracking task status and due dates
This is just one example. The same architecture and patterns apply equally well to:
- E-commerce APIs (products, orders, inventory)
- Content management systems (articles, media, categories)
- Customer relationship management (contacts, deals, activities)
- Analytics platforms (reports, dashboards, metrics)
- Any REST API you want to expose to AI assistants
The beauty of MCP is that once you understand the pattern, you can apply it to any domain. The AI assistant doesn't care whether it's managing tasks, processing orders, or analyzing data—it just needs well-defined tools with clear schemas.
Why Serverless?
When designing the architecture, I had several options:
- Traditional server (EC2, ECS)
- Containerized service (Fargate, Kubernetes)
- Serverless functions (Lambda)
I chose AWS Lambda for several compelling reasons:
1. Per-Request Billing Matches Usage Patterns
AI assistants don't make constant requests. Usage is bursty—sometimes hundreds of tool calls in a session, then nothing for hours. With Lambda:
- Pay only for actual invocations
- No idle compute costs
- Automatic scaling from 0 to thousands of concurrent requests
2. Reduced Operational Complexity
Running an MCP server isn't my core business. I don't want to:
- Manage server patches
- Configure auto-scaling groups
- Handle load balancer health checks
- Monitor container orchestration
Lambda handles all of this automatically.
3. Natural Fit for Request-Response Workloads
MCP is fundamentally request-response:
- AI sends a JSON-RPC request
- Server processes it
- Server returns a JSON-RPC response
This maps perfectly to Lambda's execution model. No websockets, no long-running connections—just clean, stateless request handling.
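To make that concrete, a single MCP exchange looks roughly like this on the wire (a tools/call request and its response, trimmed down from the spec; the IDs and task data are illustrative):
// Request from the AI client — one Lambda invocation
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_task",
    "arguments": { "title": "Review PR #123" }
  }
}
// Response from the Lambda
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{ "type": "text", "text": "Task created: Review PR #123 (ID: task_42)" }]
  }
}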
4. Built-in Integration with API Gateway
API Gateway provides:
- Custom domain management
- Request validation
- Rate limiting
- Authentication via Lambda authorizers
- CloudWatch logging
This entire infrastructure is defined in a single serverless.yml file.
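For illustration, here's roughly how those pieces get declared (a minimal sketch assuming the serverless-domain-manager plugin; the domain and runtime values are placeholders):
# serverless.yml (illustrative sketch)
provider:
  name: aws
  runtime: nodejs20.x

plugins:
  - serverless-domain-manager   # manages the custom domain for API Gateway

custom:
  customDomain:
    domainName: mcp.example.com
    basePath: ''
    createRoute53Record: true
The rate limiting and authorizer wiring show up in later snippets.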
The Architecture
Here's the high-level architecture I implemented:
┌──────────────────┐
│ AI Assistant │
│ (Claude, etc.) │
└────────┬─────────┘
│
HTTP POST /v1/mcp
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────┐ │
│ │ Custom Domain │ │ Rate Limiting │ │ Request Routing│ │
│ │ mcp.example.com │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
│
┌────────▼────────┐
│ Lambda Authorizer│
│ (API Key Auth) │
└────────┬─────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ MCP Handler Lambda │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ JSON-RPC │ │ MCP Server │ │ Tools (example) │ │
│ │ Parser │──▶│ Router │──▶│ ├── create_task │ │
│ │ │ │ │ │ ├── list_tasks │ │
│ └──────────────┘ └──────────────┘ │ ├── update_task │ │
│ │ ├── assign_task │ │
│ │ └── get_project_stats │ │
│ └────────────┬───────────┘ │
└─────────────────────────────────────────────────────┼──────────────┘
│
▼
┌───────────────────┐
│ Your Existing │
│ REST API │
└───────────────────┘
Key Components
1. Multiple Lambda Handlers
Instead of one monolithic handler, I split responsibilities:
| Handler | Path | Purpose | Auth |
|---|---|---|---|
| MCP Handler | POST /v1/mcp | Main protocol handler | Required |
| Health Check | GET /v1/health | Monitoring/uptime | None |
| MCP Config | GET /.well-known/mcp-config | Schema discovery | None |
| Server Card | GET /.well-known/mcp/server-card.json | Server metadata | None |
This separation allows independent scaling and simpler debugging.
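As a sketch, the unauthenticated endpoints sit alongside the main handler as their own functions (paths match the table above; handler file names are hypothetical):
# serverless.yml (sketch)
functions:
  health:
    handler: dist/health.handler
    events:
      - http:
          path: /v1/health
          method: get
  mcpConfig:
    handler: dist/well-known.handler
    events:
      - http:
          path: /.well-known/mcp-config
          method: get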
2. External Lambda Authorizer
Instead of validating API keys in every request, I use a shared Lambda authorizer:
functions:
mcp:
handler: dist/index.handler
events:
- http:
path: /v1/mcp
method: post
authorizer:
type: request
arn: ${env:AUTHORIZER_LAMBDA_ARN}
resultTtlInSeconds: 0
identitySource: method.request.header.apikey
This authorizer:
- Validates API keys against my authentication service
- Returns IAM policies for API Gateway
- Can be shared across multiple services
- Caches results (I disabled caching for immediate key revocation)
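A minimal sketch of what such an authorizer can look like (validateApiKey is a stand-in for a call to your own authentication service; the returned shape is the standard API Gateway REQUEST authorizer result):
// authorizer.ts — a minimal sketch, not the exact implementation
import type {
  APIGatewayRequestAuthorizerEvent,
  APIGatewayAuthorizerResult,
} from 'aws-lambda';

// Hypothetical lookup against your authentication service
declare function validateApiKey(
  key: string
): Promise<{ userId: string; accountId: string } | null>;

function policy(effect: 'Allow' | 'Deny', resource: string) {
  return {
    Version: '2012-10-17',
    Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: resource }],
  };
}

export async function handler(
  event: APIGatewayRequestAuthorizerEvent
): Promise<APIGatewayAuthorizerResult> {
  const apiKey = event.headers?.apikey;
  const account = apiKey ? await validateApiKey(apiKey) : null;

  if (!apiKey || !account) {
    return { principalId: 'anonymous', policyDocument: policy('Deny', event.methodArn) };
  }

  return {
    principalId: account.userId,
    policyDocument: policy('Allow', event.methodArn),
    // Context values are surfaced to the handler via event.requestContext.authorizer
    context: { apiKey, accountId: account.accountId },
  };
}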
3. Tool Architecture
Each MCP tool follows a consistent pattern. Here's an example using our task management domain (remember, this same pattern works for any API):
// 1. Define the input schema with Zod
const CreateTaskInputSchema = z.object({
title: z.string().min(1),
description: z.string().optional(),
project_id: z.string().optional(),
assignee_id: z.string().optional(),
due_date: z.string().datetime().optional(),
priority: z.enum(['low', 'medium', 'high']).optional(),
});
// 2. Define the tool metadata
const createTaskToolDefinition: McpTool = {
name: 'create_task',
description: 'Create a new task in the task management system',
inputSchema: zodToJsonSchema(CreateTaskInputSchema),
};
// 3. Create the handler factory (dependency injection)
function createTaskHandler(deps: Dependencies): ToolHandler {
return async (input: unknown): Promise<ToolResult> => {
const parsed = CreateTaskInputSchema.safeParse(input);
if (!parsed.success) {
return formatToolError('VALIDATION_ERROR', parsed.error.message);
}
const result = await deps.apiClient.tasks.create(parsed.data);
if (!result.success) {
return formatToolError(result.error.type, result.error.message);
}
return formatToolResponse(`Task created: ${result.data.title} (ID: ${result.data.id})`);
};
}
This pattern provides:
- Runtime validation via Zod
- Type safety throughout
- Testability via dependency injection
- Consistent error handling
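The formatToolResponse and formatToolError helpers keep tool output in MCP's tool-result shape. A minimal sketch of how they might be implemented (the isError flag comes from the MCP spec; the exact signatures are my assumption):
// tool-result.ts (illustrative sketch)
export interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

export function formatToolResponse(text: string): ToolResult {
  return { content: [{ type: 'text', text }] };
}

export function formatToolError(code: string, message: string): ToolResult {
  // Returned as a normal tool result so the AI can read and act on it
  return { content: [{ type: 'text', text: `Error (${code}): ${message}` }], isError: true };
}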
The Challenges
Building this wasn't without its difficulties. Here are the main challenges I faced:
1. Cold Start Latency
Lambda cold starts are the elephant in the room. For AI interactions, latency matters—users expect quick responses.
The Problem:
- Node.js Lambda cold starts: 300-500ms typically
- With dependencies (Zod, axios, etc.): 500-800ms
- In a VPC: add another 200-500ms
My Solution:
- Warmup Plugin: Pre-warms Lambdas every 5 minutes in production
- Minimal Dependencies: Carefully audited imports
- VPC Optimization: Used VPC endpoints where possible
- Lazy Loading: Tools only load schemas when called
# serverless.yml
warmup:
default:
enabled:
- prod
events:
- schedule: rate(5 minutes)
concurrency: 1
prewarm: true
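On the handler side, warmup invocations should be short-circuited before any MCP work happens. A sketch, assuming the plugin's default payload (which carries a source marker rather than an API Gateway event):
// index.ts (sketch) — bail out early on warmup pings
export async function handler(event: any, context: any) {
  // serverless-plugin-warmup sends a marker payload instead of an API Gateway event
  if (event?.source === 'serverless-plugin-warmup') {
    return 'warmed';
  }
  // ... normal JSON-RPC handling continues here
}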
2. Error Handling Across Layers
With MCP, errors can occur at multiple levels:
- JSON-RPC parsing errors
- Authentication failures
- Tool validation errors
- External API errors
- Rate limiting
The Challenge: Each level has different error formats and expectations.
My Solution: A two-tier error handling strategy:
// Protocol-level errors → JSON-RPC error format
{
"jsonrpc": "2.0",
"error": {
"code": -32600,
"message": "Invalid Request"
},
"id": null
}
// Tool execution errors → Tool response format (better for AI)
{
"jsonrpc": "2.0",
"result": {
"content": [{
"type": "text",
"text": "Error: Rate limit exceeded. Try again in 60 seconds."
}]
},
"id": 1
}
The key insight: Tool errors should be returned as tool responses, not protocol errors. This way, the AI can understand and act on them appropriately.
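In code, the split looks roughly like this: malformed or invalid requests become JSON-RPC errors, while everything a tool returns, including failures, is wrapped in a result (the error codes are the standard JSON-RPC ones; routeMethod is an illustrative stand-in for the MCP router):
// protocol.ts (sketch)
declare function routeMethod(method: string, params: unknown): Promise<unknown>; // illustrative router

export async function handleRequest(body: string) {
  let request: any;
  try {
    request = JSON.parse(body);
  } catch {
    // Protocol-level failure: malformed JSON → JSON-RPC Parse error
    return { jsonrpc: '2.0', error: { code: -32700, message: 'Parse error' }, id: null };
  }

  if (request?.jsonrpc !== '2.0' || typeof request?.method !== 'string') {
    // Protocol-level failure: not a valid JSON-RPC request
    return { jsonrpc: '2.0', error: { code: -32600, message: 'Invalid Request' }, id: request?.id ?? null };
  }

  // Tool-level failures come back from routeMethod as ordinary results the AI can read
  const result = await routeMethod(request.method, request.params);
  return { jsonrpc: '2.0', result, id: request.id };
}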
3. Authentication Context Propagation
The Lambda authorizer validates API keys, but the MCP handler needs that key to make downstream API calls.
The Challenge: API Gateway doesn't automatically pass the API key to the Lambda function.
My Solution: Configure the authorizer to return the API key in the authorization context:
// Authorizer returns
{
principalId: userId,
context: {
apiKey: validatedApiKey,
accountId: accountId
}
}
// Handler extracts
const apiKey = event.requestContext.authorizer?.apiKey
?? extractApiKeyFromHeaders(event.headers);
4. Testing Without Mocks
I follow a strict "no mocks" policy—no jest.fn() or jest.mock(). This forced me to build proper fakes:
// tests/__fakes__/api-client.fake.ts
// Example using task management domain - adapt to your domain
export function createFakeApiClient(): ApiClient {
const tasks = new Map<string, Task>();
return {
tasks: {
async create(input) {
const task = { id: generateId(), status: 'pending', ...input };
tasks.set(task.id, task);
return { success: true, data: task };
},
async get(id) {
const task = tasks.get(id);
return task
? { success: true, data: task }
: { success: false, error: { type: 'NOT_FOUND' } };
}
}
};
}
This approach:
- Tests real behavior, not implementation details
- Catches bugs that mocks would hide
- Makes tests more readable
- Forces better interface design
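With the fake in place, tests exercise the real handler end to end. A sketch, assuming Jest and the tool handler from earlier (import paths are illustrative):
// tests/create-task.test.ts (sketch)
import { createTaskHandler } from '../src/tools/tasks/create-task.tool';
import { createFakeApiClient } from './__fakes__/api-client.fake';

describe('create_task tool', () => {
  it('creates a task and returns a readable confirmation', async () => {
    const handler = createTaskHandler({ apiClient: createFakeApiClient() });

    const result = await handler({ title: 'Review PR' });

    expect(result.content[0].text).toContain('Task created: Review PR');
  });

  it('returns a validation error when the title is missing', async () => {
    const handler = createTaskHandler({ apiClient: createFakeApiClient() });

    const result = await handler({});

    expect(result.content[0].text).toContain('Error');
  });
});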
5. Choosing an Infrastructure as Code Tool
For deploying Lambda functions, you have several IaC options:
| Tool | Pros | Cons |
|---|---|---|
| Serverless Framework | Simple YAML config, great plugins ecosystem, fast to get started | Less control over fine-grained AWS resources |
| AWS SAM | Native AWS support, good for pure Lambda workloads | Smaller plugin ecosystem |
| AWS CDK | Full programming language, type safety, maximum flexibility | Steeper learning curve, more verbose |
| Terraform | Cloud-agnostic, mature ecosystem | More complex for Lambda-specific patterns |
We went with Serverless Framework for its simplicity and plugin ecosystem. Tools like serverless-offline for local development and serverless-domain-manager for custom domains made the developer experience smooth. The entire infrastructure fits in a single serverless.yml file that's easy to read and modify.
That said, if you're already invested in CDK or Terraform, there's no strong reason to switch—the architecture patterns in this post apply regardless of your IaC choice.
The Benefits
After running this in production, here are the tangible benefits:
1. True Pay-Per-Use Economics
Lambda pricing is based on two factors:
- Requests: Number of invocations
- Compute time: GB-seconds (memory allocated × execution duration)
The key insight: keep your functions fast and right-sized. A 5ms function costs 100x less than a 500ms function doing the same work. MCP tools that just proxy to an existing API are naturally fast.
The free tier (1M requests + 400K GB-seconds monthly) covers most development and low-traffic production scenarios. Compare that to a traditional server sitting idle at $50+/month waiting for traffic.
Use the AWS Lambda Pricing Calculator to estimate costs for your specific configuration.
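As a rough back-of-the-envelope example (using the published us-east-1 list prices at the time of writing, about $0.20 per million requests and roughly $0.0000167 per GB-second; verify current numbers in the calculator):
1M requests × 256 MB (0.25 GB) × 100 ms average duration
  = 1,000,000 × 0.25 GB × 0.1 s = 25,000 GB-seconds
  ≈ 25,000 × $0.0000167 + 1 × $0.20 ≈ $0.62/month before the free tier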
2. Operational Simplicity
Things I don't manage:
- Server provisioning
- OS patching
- Load balancer configuration
- Auto-scaling policies
- Health check infrastructure
Things I do manage:
- Business logic
- Tool definitions
- Error handling
3. Deployment Velocity
Deploying a new tool:
- Write the tool handler
- Register it in the tool index
- Run npm run deploy:dev
- Test
- Run npm run deploy:prod
Total time: ~5 minutes for the deployment itself.
4. Observability for Free (Almost)
Lambda provides built-in:
- CloudWatch Logs (automatic)
- CloudWatch Metrics (invocations, errors, duration)
- X-Ray tracing (optional)
- API Gateway access logs
I added OpenTelemetry on top for distributed tracing, but the baseline is already useful.
5. Natural Disaster Recovery
Lambda functions are:
- Deployed across multiple availability zones
- Automatically retried on infrastructure failures
- Stateless (easy to reason about)
No manual DR planning needed.
The Cons (Let's Be Honest)
No architecture is perfect. Here's what I'd do differently or what remains challenging:
1. Cold Starts Are Real
Despite warmup plugins, cold starts happen. When traffic spikes after idle periods, the first few requests are slow. For AI interactions where users expect sub-second responses, this can be jarring.
Mitigation: Provisioned Concurrency (but adds cost and complexity).
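If cold starts matter enough to pay for it, Provisioned Concurrency can be switched on per function in serverless.yml. A sketch (the value of 2 is illustrative and is billed whether or not traffic arrives):
# serverless.yml (sketch)
functions:
  mcp:
    handler: dist/index.handler
    provisionedConcurrency: 2   # keeps two execution environments initialized and warm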
2. Debugging is Harder
When something fails in Lambda:
- You can't SSH in
- Logs are the only visibility
- Reproducing issues requires understanding the exact event
Mitigation: Extensive structured logging and local testing with serverless-offline.
3. Vendor Lock-In
This architecture is deeply tied to AWS:
- Lambda function handlers
- API Gateway events
- CloudWatch logging
- IAM for authorization
Moving to GCP or Azure would require significant rewrites.
Mitigation: Keep business logic in pure functions, isolate AWS-specific code.
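Concretely, that isolation means the Lambda entry point stays a thin adapter over cloud-agnostic logic. A sketch (handleMcpRequest stands in for the pure, AWS-free protocol logic):
// index.ts — the only file that knows about AWS (sketch)
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { handleMcpRequest } from './mcp/server'; // pure logic: no AWS types inside

export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  const response = await handleMcpRequest(event.body ?? '');
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(response),
  };
}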
4. Costs Can Spike from Unexpected Places
Lambda is often the cheapest part of your bill. Watch out for:
- API Gateway: Charges per request, can exceed Lambda costs at scale
- CloudWatch Logs: Verbose logging adds up fast—log wisely
- Retry loops: A bug causing infinite retries can burn budget in hours
- Long-running functions: Cost scales linearly with duration—optimize slow tools
Mitigation: Set billing alerts from day one, implement rate limiting, and keep functions fast. Review your AWS Cost Explorer monthly to catch surprises early.
5. Complex Local Development
Testing locally with serverless-offline isn't the same as production:
- Authorizers aren't actually invoked
- Cold starts don't happen
- VPC networking differs
Mitigation: Deploy to a dev stage frequently, automate integration tests.
Tips for Building Your Own MCP Server on Lambda
Based on my experience, here are actionable tips. I'll continue using the task management example, but remember: replace "task" with your domain concept (product, order, article, contact, etc.) and the patterns remain exactly the same.
1. Start with the Tool Schema
Before writing any code, define your tool schemas:
// Define WHAT the tool does before HOW
// Example: task management (adapt to your domain)
const toolDefinition = {
name: 'create_task',
description: `Create a new task in the project.
Returns the task ID and creation timestamp.
Requires a title. Optionally accepts project_id, assignee, and due_date.`,
inputSchema: {
type: 'object',
properties: {
title: {
type: 'string',
description: 'The title of the task',
},
project_id: {
type: 'string',
description: 'The project to add this task to',
},
// ...
},
required: ['title'],
},
};
The description matters—it's what the AI uses to understand when to call your tool.
2. Use Zod for Runtime Validation
TypeScript types vanish at runtime. Use Zod for actual validation:
// Example: validating task input (adapt schema to your domain)
const CreateTaskSchema = z.object({
title: z.string().min(1, 'Title is required'),
due_date: z.string().datetime().optional(),
priority: z.enum(['low', 'medium', 'high']).default('medium'),
});
// In handler
const result = CreateTaskSchema.safeParse(input);
if (!result.success) {
return formatToolError('VALIDATION_ERROR', result.error.message);
}
This catches bad input before it causes problems downstream.
3. Return AI-Friendly Error Messages
Don't return technical errors to the AI:
// Bad
return { error: 'ECONNREFUSED 127.0.0.1:5432' };
// Good
return formatToolError(
'SERVICE_UNAVAILABLE',
'Unable to create task. The service is temporarily unavailable. Please try again.'
);
The AI will relay this to the user—make it understandable.
4. Use Factory Functions for Dependency Injection
Avoid global state. Use factories:
// Don't do this
const apiClient = new ApiClient(process.env.API_KEY);
export function handleCreateTask(input) {
return apiClient.tasks.create(input); // Hard to test!
}
// Do this
export function createTaskHandler(deps: { apiClient: ApiClient }) {
return async function handleCreateTask(input) {
return deps.apiClient.tasks.create(input); // Testable!
};
}
5. Implement Graceful Degradation
When external services fail, handle it gracefully:
async function handleToolCall(tool: string, input: unknown) {
try {
const handler = tools.get(tool);
if (!handler) {
return formatToolError('UNKNOWN_TOOL', `Tool "${tool}" not found`);
}
const result = await handler(input);
return result;
} catch (error) {
// Log the real error
logger.error('Tool execution failed', { tool, error });
// Return a user-friendly message
return formatToolError(
'INTERNAL_ERROR',
'Something went wrong. Please try again.'
);
}
}
6. Add Structured Logging from Day One
You'll need it for debugging:
const logger = createLogger({
service: 'mcp-server',
level: process.env.LOG_LEVEL || 'info',
});
// In handler
logger.info('Tool invoked', {
tool: toolName,
input: sanitizeInput(input),
requestId: context.awsRequestId,
});
7. Separate Protocol Handling from Business Logic
Keep the MCP protocol handling separate:
src/
├── index.ts # Lambda handler (thin)
├── mcp/
│ ├── protocol.ts # JSON-RPC parsing
│ └── server.ts # MCP routing
└── tools/
├── tasks/ # Example: task management
│ ├── create-task.tool.ts
│ ├── list-tasks.tool.ts
│ └── update-task.tool.ts
└── projects/ # Group tools by domain concept
└── get-project-stats.tool.ts
This makes testing easier and keeps concerns separated.
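The tools.get(tool) lookup from the graceful-degradation snippet above implies a small registry that wires definitions to handlers. A sketch of how that index might look (imports and type paths are illustrative):
// src/tools/index.ts (sketch)
import type { Dependencies, ToolHandler, McpTool } from '../types'; // illustrative path
import { createTaskToolDefinition, createTaskHandler } from './tasks/create-task.tool';
import { listTasksToolDefinition, listTasksHandler } from './tasks/list-tasks.tool';

export function buildToolRegistry(deps: Dependencies) {
  // Definitions feed tools/list; handlers feed tools/call
  const tools = new Map<string, ToolHandler>([
    [createTaskToolDefinition.name, createTaskHandler(deps)],
    [listTasksToolDefinition.name, listTasksHandler(deps)],
  ]);

  const definitions: McpTool[] = [createTaskToolDefinition, listTasksToolDefinition];

  return { tools, definitions };
}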
8. Test with Real AI Assistants Early
Don't wait until the end to test with actual AI:
- Deploy to dev
- Configure your AI assistant to use it
- Try natural language requests
- Iterate on tool descriptions
You'll discover that tool descriptions matter more than you think.
9. Implement Rate Limiting at Multiple Levels
Protect yourself:
- API Gateway: Request throttling
- Authorizer: Per-key rate limits
- Tools: Per-operation limits
# serverless.yml
provider:
apiGateway:
throttle:
burstLimit: 200
rateLimit: 100
10. Monitor Cold Starts
Track them explicitly:
let isColdStart = true;
export async function handler(event, context) {
if (isColdStart) {
logger.info('Cold start', { requestId: context.awsRequestId });
isColdStart = false;
}
// ...
}
Use this data to tune warmup settings.
Key Metrics After Running in Production
Here's what I'm seeing after some time:
| Metric | Value |
|---|---|
| Average latency (warm) | ~100ms |
| Average latency (cold) | ~600ms |
| Cold start rate | <5% (with warmup) |
| Error rate | <0.1% |
Conclusion
Building an MCP server on AWS Lambda has been a rewarding journey. The combination of:
- MCP's standardized protocol for AI interactions
- Lambda's serverless model for operational simplicity
- Functional programming principles for maintainability
...creates an architecture that's both powerful and practical.
Is it perfect? No. Cold starts are annoying, debugging requires discipline, and you're locked into AWS. But for the use case of exposing your existing APIs to AI assistants—letting users interact naturally without worrying about the technical details—the benefits far outweigh the drawbacks.
The real magic happens when you see someone ask an AI assistant to "do something" and watch it seamlessly translate that into the right API calls. No documentation lookup, no curl commands, no debugging JSON payloads. Just natural conversation that gets things done.
If you're considering building your own MCP server, I hope this guide helps you avoid some of my early mistakes and gives you a solid foundation to build on.
Resources
- Model Context Protocol Specification
- AWS Lambda Documentation
- Serverless Framework
- Zod - TypeScript-first Schema Validation
Have questions or want to share your own MCP journey? Find me on X @brognilogs or drop a comment below.