Introduction
Over the past few months, I've been working on something that fundamentally changed how I think about building APIs: implementing the Model Context Protocol (MCP) on AWS Lambda. This isn't just another serverless project—it's about rethinking how we expose services to AI assistants.
In this post, I'll share the complete architecture, the challenges I faced, and the lessons learned while building a production-ready MCP server that lets AI assistants like Claude work with APIs directly, so users can simply talk to the assistant without worrying about what sits behind it. Whether you're considering building your own MCP server or just curious about serverless architectures for AI-native interfaces, this guide is for you.
Why Build an MCP Server?
The Problem with Traditional APIs
Traditional REST APIs are designed for human developers. They have:
- Complex authentication flows
- Verbose request/response structures
- Documentation that humans read and interpret
- SDKs that wrap HTTP calls
But AI assistants don't need SDKs. They don't read documentation the same way we do. They need structured, predictable interfaces with clear schemas and consistent error handling.
Enter the Model Context Protocol
MCP is an open protocol created by Anthropic that standardizes how AI assistants communicate with external tools. Think of it as a universal adapter between AI models and your services.
Instead of teaching an AI to make HTTP calls and parse responses, you define tools with clear input schemas and let the AI invoke them directly:
// Traditional API approach
"Make a POST to /tasks with this JSON body,
set the Authorization header, handle pagination..."
// MCP approach
"Call the create_task tool with title: "'Review PR #123'\""
The Motivation
We had a product with a REST API, and users were increasingly working through AI assistants. The question wasn't if we should support MCP, but how.
The business case wrote itself:
- Meet users where they are: AI assistants are becoming the default interface for many workflows
- Reduce integration friction: Instead of users learning our API, let the AI handle it
- Enable new use cases: Natural language interactions open doors that traditional APIs can't
The technical benefits followed:
- Standardize the interface: One protocol, multiple AI clients
- Improve AI comprehension: Structured tools instead of raw HTTP
- Enable natural language workflows: "Do X with Y" just works—the AI handles the complexity
- Future-proof the integration: As MCP evolves, so does the server
- Hide implementation details: Users don't need to know about endpoints, authentication, or payloads
A Concrete Example: Task Management API
To make this guide practical, I'll use a task management API as our running example throughout this post. Think of it as a simplified Trello or Asana API with endpoints for:
- Creating, updating, and deleting tasks
- Organizing tasks into projects
- Assigning tasks to team members
- Tracking task status and due dates
This is just one example. The same architecture and patterns apply equally well to:
- E-commerce APIs (products, orders, inventory)
- Content management systems (articles, media, categories)
- Customer relationship management (contacts, deals, activities)
- Analytics platforms (reports, dashboards, metrics)
- Any REST API you want to expose to AI assistants
The beauty of MCP is that once you understand the pattern, you can apply it to any domain. The AI assistant doesn't care whether it's managing tasks, processing orders, or analyzing data—it just needs well-defined tools with clear schemas.
Why Serverless?
When designing the architecture, I had several options:
- Traditional server (EC2, ECS)
- Containerized service (Fargate, Kubernetes)
- Serverless functions (Lambda)
I chose AWS Lambda for several compelling reasons:
1. Per-Request Billing Matches Usage Patterns
AI assistants don't make constant requests. Usage is bursty—sometimes hundreds of tool calls in a session, then nothing for hours. With Lambda:
- Pay only for actual invocations
- No idle compute costs
- Automatic scaling from 0 to thousands of concurrent requests
2. Reduced Operational Complexity
Running an MCP server isn't my core business. I don't want to:
- Manage server patches
- Configure auto-scaling groups
- Handle load balancer health checks
- Monitor container orchestration
Lambda handles all of this automatically.
3. Natural Fit for Request-Response Workloads
MCP is fundamentally request-response:
- AI sends a JSON-RPC request
- Server processes it
- Server returns a JSON-RPC response
This maps perfectly to Lambda's execution model. No websockets, no long-running connections—just clean, stateless request handling.
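To make that concrete, a single MCP exchange looks roughly like this on the wire (a tools/call request and its response, trimmed down from the spec; the IDs and task data are illustrative):
// Request from the AI client — one Lambda invocation
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_task",
    "arguments": { "title": "Review PR #123" }
  }
}
// Response from the Lambda
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{ "type": "text", "text": "Task created: Review PR #123 (ID: task_42)" }]
  }
}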
4. Built-in Integration with API Gateway
API Gateway provides:
- Custom domain management
- Request validation
- Rate limiting
- Authentication via Lambda authorizers
- CloudWatch logging
This entire infrastructure is defined in a single serverless.yml file.
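For illustration, here's roughly how those pieces get declared (a minimal sketch assuming the serverless-domain-manager plugin; the domain and runtime values are placeholders):
# serverless.yml (illustrative sketch)
provider:
  name: aws
  runtime: nodejs20.x

plugins:
  - serverless-domain-manager   # manages the custom domain for API Gateway

custom:
  customDomain:
    domainName: mcp.example.com
    basePath: ''
    createRoute53Record: true
The rate limiting and authorizer wiring show up in later snippets.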
The Architecture
Here's the high-level architecture I implemented:
┌──────────────────┐
│ AI Assistant │
│ (Claude, etc.) │
└────────┬─────────┘
│
HTTP POST /v1/mcp
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────┐ │
│ │ Custom Domain │ │ Rate Limiting │ │ Request Routing│ │
│ │ mcp.example.com │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
│
┌────────▼────────┐
│ Lambda Authorizer│
│ (API Key Auth) │
└────────┬─────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ MCP Handler Lambda │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ JSON-RPC │ │ MCP Server │ │ Tools (example) │ │
│ │ Parser │──▶│ Router │──▶│ ├── create_task │ │
│ │ │ │ │ │ ├── list_tasks │ │
│ └──────────────┘ └──────────────┘ │ ├── update_task │ │
│ │ ├── assign_task │ │
│ │ └── get_project_stats │ │
│ └────────────┬───────────┘ │
└─────────────────────────────────────────────────────┼──────────────┘
│
▼
┌───────────────────┐
│ Your Existing │
│ REST API │
└───────────────────┘
Key Components
1. Multiple Lambda Handlers
Instead of one monolithic handler, I split responsibilities:
| Handler | Path | Purpose | Auth |
|---|---|---|---|
| MCP Handler | POST /v1/mcp | Main protocol handler | Required |
| Health Check | GET /v1/health | Monitoring/uptime | None |
| MCP Config | GET /.well-known/mcp-config | Schema discovery | None |
| Server Card | GET /.well-known/mcp/server-card.json | Server metadata | None |
This separation allows independent scaling and simpler debugging.
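As a sketch, the unauthenticated endpoints sit alongside the main handler as their own functions (paths match the table above; handler file names are hypothetical):
# serverless.yml (sketch)
functions:
  health:
    handler: dist/health.handler
    events:
      - http:
          path: /v1/health
          method: get
  mcpConfig:
    handler: dist/well-known.handler
    events:
      - http:
          path: /.well-known/mcp-config
          method: get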
2. External Lambda Authorizer
Instead of validating API keys in every request, I use a shared Lambda authorizer:
functions:
mcp:
handler: dist/index.handler
events:
- http:
path: /v1/mcp
method: post
authorizer:
type: request
arn: ${env:AUTHORIZER_LAMBDA_ARN}
resultTtlInSeconds: 0
identitySource: method.request.header.apikey
This authorizer:
- Validates API keys against my authentication service
- Returns IAM policies for API Gateway
- Can be shared across multiple services
- Caches results (I disabled caching for immediate key revocation)
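A minimal sketch of what such an authorizer can look like (validateApiKey is a stand-in for a call to your own authentication service; the returned shape is the standard API Gateway REQUEST authorizer result):
// authorizer.ts — a minimal sketch, not the exact implementation
import type {
  APIGatewayRequestAuthorizerEvent,
  APIGatewayAuthorizerResult,
} from 'aws-lambda';

// Hypothetical lookup against your authentication service
declare function validateApiKey(
  key: string
): Promise<{ userId: string; accountId: string } | null>;

function policy(effect: 'Allow' | 'Deny', resource: string) {
  return {
    Version: '2012-10-17',
    Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: resource }],
  };
}

export async function handler(
  event: APIGatewayRequestAuthorizerEvent
): Promise<APIGatewayAuthorizerResult> {
  const apiKey = event.headers?.apikey;
  const account = apiKey ? await validateApiKey(apiKey) : null;

  if (!apiKey || !account) {
    return { principalId: 'anonymous', policyDocument: policy('Deny', event.methodArn) };
  }

  return {
    principalId: account.userId,
    policyDocument: policy('Allow', event.methodArn),
    // Context values are surfaced to the handler via event.requestContext.authorizer
    context: { apiKey, accountId: account.accountId },
  };
}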
3. Tool Architecture
Each MCP tool follows a consistent pattern. Here's an example using our task management domain (remember, this same pattern works for any API):
// 1. Define the input schema with Zod
const CreateTaskInputSchema = z.object({
title: z.string().min(1),
description: z.string().optional(),
project_id: z.string().optional(),
assignee_id: z.string().optional(),
due_date: z.string().datetime().optional(),
priority: z.enum(['low', 'medium', 'high']).optional(),
});
// 2. Define the tool metadata
const createTaskToolDefinition: McpTool = {
name: 'create_task',
description: 'Create a new task in the task management system',
inputSchema: zodToJsonSchema(CreateTaskInputSchema),
};
// 3. Create the handler factory (dependency injection)
function createTaskHandler(deps: Dependencies): ToolHandler {
return async (input: unknown): Promise<ToolResult> => {
const parsed = CreateTaskInputSchema.safeParse(input);
if (!parsed.success) {
return formatToolError('VALIDATION_ERROR', parsed.error.message);
}
const result = await deps.apiClient.tasks.create(parsed.data);
if (!result.success) {
return formatToolError(result.error.type, result.error.message);
}
return formatToolResponse(`Task created: ${result.data.title} (ID: ${result.data.id})`);
};
}
This pattern provides:
- Runtime validation via Zod
- Type safety throughout
- Testability via dependency injection
- Consistent error handling
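The formatToolResponse and formatToolError helpers keep tool output in MCP's tool-result shape. A minimal sketch of how they might be implemented (the isError flag comes from the MCP spec; the exact signatures are my assumption):
// tool-result.ts (illustrative sketch)
export interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

export function formatToolResponse(text: string): ToolResult {
  return { content: [{ type: 'text', text }] };
}

export function formatToolError(code: string, message: string): ToolResult {
  // Returned as a normal tool result so the AI can read and act on it
  return { content: [{ type: 'text', text: `Error (${code}): ${message}` }], isError: true };
}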
The Challenges
Building this wasn't without its difficulties. Here are the main challenges I faced:
1. Cold Start Latency
Lambda cold starts are the elephant in the room. For AI interactions, latency matters—users expect quick responses.
The Problem:
- Node.js Lambda cold starts: 300-500ms typically
- With dependencies (Zod, axios, etc.): 500-800ms
- In a VPC: add another 200-500ms
My Solution:
- Warmup Plugin: Pre-warms Lambdas every 5 minutes in production
- Minimal Dependencies: Carefully audited imports
- VPC Optimization: Used VPC endpoints where possible
- Lazy Loading: Tools only load schemas when called
# serverless.yml
warmup:
default:
enabled:
- prod
events:
- schedule: rate(5 minutes)
concurrency: 1
prewarm: true
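On the handler side, warmup invocations should be short-circuited before any MCP work happens. A sketch, assuming the plugin's default payload (which carries a source marker rather than an API Gateway event):
// index.ts (sketch) — bail out early on warmup pings
export async function handler(event: any, context: any) {
  // serverless-plugin-warmup sends a marker payload instead of an API Gateway event
  if (event?.source === 'serverless-plugin-warmup') {
    return 'warmed';
  }
  // ... normal JSON-RPC handling continues here
}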
2. Error Handling Across Layers
With MCP, errors can occur at multiple levels:
- JSON-RPC parsing errors
- Authentication failures
- Tool validation errors
- External API errors
- Rate limiting
The Challenge: Each level has different error formats and expectations.
My Solution: A two-tier error handling strategy:
// Protocol-level errors → JSON-RPC error format
{
"jsonrpc": "2.0",
"error": {
"code": -32600,
"message": "Invalid Request"
},
"id": null
}
// Tool execution errors → Tool response format (better for AI)
{
"jsonrpc": "2.0",
"result": {
"content": [{
"type": "text",
"text": "Error: Rate limit exceeded. Try again in 60 seconds."
}]
},
"id": 1
}
The key insight: Tool errors should be returned as tool responses, not protocol errors. This way, the AI can understand and act on them appropriately.
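In code, the split looks roughly like this: malformed or invalid requests become JSON-RPC errors, while everything a tool returns, including failures, is wrapped in a result (the error codes are the standard JSON-RPC ones; routeMethod is an illustrative stand-in for the MCP router):
// protocol.ts (sketch)
declare function routeMethod(method: string, params: unknown): Promise<unknown>; // illustrative router

export async function handleRequest(body: string) {
  let request: any;
  try {
    request = JSON.parse(body);
  } catch {
    // Protocol-level failure: malformed JSON → JSON-RPC Parse error
    return { jsonrpc: '2.0', error: { code: -32700, message: 'Parse error' }, id: null };
  }

  if (request?.jsonrpc !== '2.0' || typeof request?.method !== 'string') {
    // Protocol-level failure: not a valid JSON-RPC request
    return { jsonrpc: '2.0', error: { code: -32600, message: 'Invalid Request' }, id: request?.id ?? null };
  }

  // Tool-level failures come back from routeMethod as ordinary results the AI can read
  const result = await routeMethod(request.method, request.params);
  return { jsonrpc: '2.0', result, id: request.id };
}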
3. Authentication Context Propagation
The Lambda authorizer validates API keys, but the MCP handler needs that key to make downstream API calls.
The Challenge: API Gateway doesn't automatically pass the API key to the Lambda function.
My Solution: Configure the authorizer to return the API key in the authorization context:
// Authorizer returns
{
principalId: userId,
context: {
apiKey: validatedApiKey,
accountId: accountId
}
}
// Handler extracts
const apiKey = event.requestContext.authorizer?.apiKey
?? extractApiKeyFromHeaders(event.headers);
4. Testing Without Mocks
I follow a strict "no mocks" policy—no jest.fn() or jest.mock(). This forced me to build proper fakes:
// tests/__fakes__/api-client.fake.ts
// Example using task management domain - adapt to your domain
export function createFakeApiClient(): ApiClient {
const tasks = new Map<string, Task>();
return {
tasks: {
async create(input) {
const task = { id: generateId(), status: 'pending', ...input };
tasks.set(task.id, task);
return { success: true, data: task };
},
async get(id) {
const task = tasks.get(id);
return task
? { success: true, data: task }
: { success: false, error: { type: 'NOT_FOUND' } };
}
}
};
}
This approach:
- Tests real behavior, not implementation details
- Catches bugs that mocks would hide
- Makes tests more readable
- Forces better interface design
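With the fake in place, tests exercise the real handler end to end. A sketch, assuming Jest and the tool handler from earlier (import paths are illustrative):
// tests/create-task.test.ts (sketch)
import { createTaskHandler } from '../src/tools/tasks/create-task.tool';
import { createFakeApiClient } from './__fakes__/api-client.fake';

describe('create_task tool', () => {
  it('creates a task and returns a readable confirmation', async () => {
    const handler = createTaskHandler({ apiClient: createFakeApiClient() });

    const result = await handler({ title: 'Review PR' });

    expect(result.content[0].text).toContain('Task created: Review PR');
  });

  it('returns a validation error when the title is missing', async () => {
    const handler = createTaskHandler({ apiClient: createFakeApiClient() });

    const result = await handler({});

    expect(result.content[0].text).toContain('Error');
  });
});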
5. Choosing an Infrastructure as Code Tool
For deploying Lambda functions, you have several IaC options:
| Tool | Pros | Cons |
|---|---|---|
| Serverless Framework | Simple YAML config, great plugins ecosystem, fast to get started | Less control over fine-grained AWS resources |
| AWS SAM | Native AWS support, good for pure Lambda workloads | Smaller plugin ecosystem |
| AWS CDK | Full programming language, type safety, maximum flexibility | Steeper learning curve, more verbose |
| Terraform | Cloud-agnostic, mature ecosystem | More complex for Lambda-specific patterns |
We went with Serverless Framework for its simplicity and plugin ecosystem. Tools like serverless-offline for local development and serverless-domain-manager for custom domains made the developer experience smooth. The entire infrastructure fits in a single serverless.yml file that's easy to read and modify.
That said, if you're already invested in CDK or Terraform, there's no strong reason to switch—the architecture patterns in this post apply regardless of your IaC choice.
The Benefits
After running this in production, here are the tangible benefits:
1. True Pay-Per-Use Economics
Lambda pricing is based on two factors:
- Requests: Number of invocations
- Compute time: GB-seconds (memory allocated × execution duration)
The key insight: keep your functions fast and right-sized. A 5ms function costs 100x less than a 500ms function doing the same work. MCP tools that just proxy to an existing API are naturally fast.
The free tier (1M requests + 400K GB-seconds monthly) covers most development and low-traffic production scenarios. Compare that to a traditional server sitting idle at $50+/month waiting for traffic.
Use the AWS Lambda Pricing Calculator to estimate costs for your specific configuration.
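As a rough back-of-the-envelope example (using the published us-east-1 list prices at the time of writing, about $0.20 per million requests and roughly $0.0000167 per GB-second; verify current numbers in the calculator):
1M requests × 256 MB (0.25 GB) × 100 ms average duration
  = 1,000,000 × 0.25 GB × 0.1 s = 25,000 GB-seconds
  ≈ 25,000 × $0.0000167 + 1 × $0.20 ≈ $0.62/month before the free tier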
2. Operational Simplicity
Things I don't manage:
- Server provisioning
- OS patching
- Load balancer configuration
- Auto-scaling policies
- Health check infrastructure
Things I do manage:
- Business logic
- Tool definitions
- Error handling
3. Deployment Velocity
Deploying a new tool:
- Write the tool handler
- Register it in the tool index
- Run npm run deploy:dev
- Test
- Run npm run deploy:prod
Total time: ~5 minutes for the deployment itself.
4. Observability for Free (Almost)
Lambda provides built-in:
- CloudWatch Logs (automatic)
- CloudWatch Metrics (invocations, errors, duration)
- X-Ray tracing (optional)
- API Gateway access logs
I added OpenTelemetry on top for distributed tracing, but the baseline is already useful.
5. Natural Disaster Recovery
Lambda functions are:
- Deployed across multiple availability zones
- Automatically retried on infrastructure failures
- Stateless (easy to reason about)
No manual DR planning needed.
The Cons (Let's Be Honest)
No architecture is perfect. Here's what I'd do differently or what remains challenging:
1. Cold Starts Are Real
Despite warmup plugins, cold starts happen. When traffic spikes after idle periods, the first few requests are slow. For AI interactions where users expect sub-second responses, this can be jarring.
Mitigation: Provisioned Concurrency (but adds cost and complexity).
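If cold starts matter enough to pay for it, Provisioned Concurrency can be switched on per function in serverless.yml. A sketch (the value of 2 is illustrative and is billed whether or not traffic arrives):
# serverless.yml (sketch)
functions:
  mcp:
    handler: dist/index.handler
    provisionedConcurrency: 2   # keeps two execution environments initialized and warm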
2. Debugging is Harder
When something fails in Lambda:
- You can't SSH in
- Logs are the only visibility
- Reproducing issues requires understanding the exact event
Mitigation: Extensive structured logging and local testing with serverless-offline.
3. Vendor Lock-In
This architecture is deeply tied to AWS:
- Lambda function handlers
- API Gateway events
- CloudWatch logging
- IAM for authorization
Moving to GCP or Azure would require significant rewrites.
Mitigation: Keep business logic in pure functions, isolate AWS-specific code.
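Concretely, that isolation means the Lambda entry point stays a thin adapter over cloud-agnostic logic. A sketch (handleMcpRequest stands in for the pure, AWS-free protocol logic):
// index.ts — the only file that knows about AWS (sketch)
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { handleMcpRequest } from './mcp/server'; // pure logic: no AWS types inside

export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  const response = await handleMcpRequest(event.body ?? '');
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(response),
  };
}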
4. Costs Can Spike from Unexpected Places
Lambda is often the cheapest part of your bill. Watch out for:
- API Gateway: Charges per request, can exceed Lambda costs at scale
- CloudWatch Logs: Verbose logging adds up fast—log wisely
- Retry loops: A bug causing infinite retries can burn budget in hours
- Long-running functions: Cost scales linearly with duration—optimize slow tools
Mitigation: Set billing alerts from day one, implement rate limiting, and keep functions fast. Review your AWS Cost Explorer monthly to catch surprises early.
5. Complex Local Development
Testing locally with serverless-offline isn't the same as production:
- Authorizers aren't actually invoked
- Cold starts don't happen
- VPC networking differs
Mitigation: Deploy to a dev stage frequently, automate integration tests.
Tips for Building Your Own MCP Server on Lambda
Based on my experience, here are actionable tips. I'll continue using the task management example, but remember: replace "task" with your domain concept (product, order, article, contact, etc.) and the patterns remain exactly the same.
1. Start with the Tool Schema
Before writing any code, define your tool schemas:
// Define WHAT the tool does before HOW
// Example: task management (adapt to your domain)
const toolDefinition = {
name: 'create_task',
description: `Create a new task in the project.
Returns the task ID and creation timestamp.
Requires a title. Optionally accepts project_id, assignee, and due_date.`,
inputSchema: {
type: 'object',
properties: {
title: {
type: 'string',
description: 'The title of the task',
},
project_id: {
type: 'string',
description: 'The project to add this task to',
},
// ...
},
required: ['title'],
},
};
The description matters—it's what the AI uses to understand when to call your tool.
2. Use Zod for Runtime Validation
TypeScript types vanish at runtime. Use Zod for actual validation:
// Example: validating task input (adapt schema to your domain)
const CreateTaskSchema = z.object({
title: z.string().min(1, 'Title is required'),
due_date: z.string().datetime().optional(),
priority: z.enum(['low', 'medium', 'high']).default('medium'),
});
// In handler
const result = CreateTaskSchema.safeParse(input);
if (!result.success) {
return formatToolError('VALIDATION_ERROR', result.error.message);
}
This catches bad input before it causes problems downstream.
3. Return AI-Friendly Error Messages
Don't return technical errors to the AI:
// Bad
return { error: 'ECONNREFUSED 127.0.0.1:5432' };
// Good
return formatToolError(
'SERVICE_UNAVAILABLE',
'Unable to create task. The service is temporarily unavailable. Please try again.'
);
The AI will relay this to the user—make it understandable.
4. Use Factory Functions for Dependency Injection
Avoid global state. Use factories:
// Don't do this
const apiClient = new ApiClient(process.env.API_KEY);
export function handleCreateTask(input) {
return apiClient.tasks.create(input); // Hard to test!
}
// Do this
export function createTaskHandler(deps: { apiClient: ApiClient }) {
return async function handleCreateTask(input) {
return deps.apiClient.tasks.create(input); // Testable!
};
}
5. Implement Graceful Degradation
When external services fail, handle it gracefully:
async function handleToolCall(tool: string, input: unknown) {
try {
const handler = tools.get(tool);
if (!handler) {
return formatToolError('UNKNOWN_TOOL', `Tool "${tool}" not found`);
}
const result = await handler(input);
return result;
} catch (error) {
// Log the real error
logger.error('Tool execution failed', { tool, error });
// Return a user-friendly message
return formatToolError(
'INTERNAL_ERROR',
'Something went wrong. Please try again.'
);
}
}
6. Add Structured Logging from Day One
You'll need it for debugging:
const logger = createLogger({
service: 'mcp-server',
level: process.env.LOG_LEVEL || 'info',
});
// In handler
logger.info('Tool invoked', {
tool: toolName,
input: sanitizeInput(input),
requestId: context.awsRequestId,
});
7. Separate Protocol Handling from Business Logic
Keep the MCP protocol handling separate:
src/
├── index.ts # Lambda handler (thin)
├── mcp/
│ ├── protocol.ts # JSON-RPC parsing
│ └── server.ts # MCP routing
└── tools/
├── tasks/ # Example: task management
│ ├── create-task.tool.ts
│ ├── list-tasks.tool.ts
│ └── update-task.tool.ts
└── projects/ # Group tools by domain concept
└── get-project-stats.tool.ts
This makes testing easier and keeps concerns separated.
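The tools.get(tool) lookup from the graceful-degradation snippet above implies a small registry that wires definitions to handlers. A sketch of how that index might look (imports and type paths are illustrative):
// src/tools/index.ts (sketch)
import type { Dependencies, ToolHandler, McpTool } from '../types'; // illustrative path
import { createTaskToolDefinition, createTaskHandler } from './tasks/create-task.tool';
import { listTasksToolDefinition, listTasksHandler } from './tasks/list-tasks.tool';

export function buildToolRegistry(deps: Dependencies) {
  // Definitions feed tools/list; handlers feed tools/call
  const tools = new Map<string, ToolHandler>([
    [createTaskToolDefinition.name, createTaskHandler(deps)],
    [listTasksToolDefinition.name, listTasksHandler(deps)],
  ]);

  const definitions: McpTool[] = [createTaskToolDefinition, listTasksToolDefinition];

  return { tools, definitions };
}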
8. Test with Real AI Assistants Early
Don't wait until the end to test with actual AI:
- Deploy to dev
- Configure your AI assistant to use it
- Try natural language requests
- Iterate on tool descriptions
You'll discover that tool descriptions matter more than you think.
9. Implement Rate Limiting at Multiple Levels
Protect yourself:
- API Gateway: Request throttling
- Authorizer: Per-key rate limits
- Tools: Per-operation limits
# serverless.yml
provider:
apiGateway:
throttle:
burstLimit: 200
rateLimit: 100
10. Monitor Cold Starts
Track them explicitly:
let isColdStart = true;
export async function handler(event, context) {
if (isColdStart) {
logger.info('Cold start', { requestId: context.awsRequestId });
isColdStart = false;
}
// ...
}
Use this data to tune warmup settings.
Key Metrics After Running in Production
Here's what I'm seeing after some time:
| Metric | Value |
|---|---|
| Average latency (warm) | ~100ms |
| Average latency (cold) | ~600ms |
| Cold start rate | <5% (with warmup) |
| Error rate | <0.1% |
Conclusion
Building an MCP server on AWS Lambda has been a rewarding journey. The combination of:
- MCP's standardized protocol for AI interactions
- Lambda's serverless model for operational simplicity
- Functional programming principles for maintainability
...creates an architecture that's both powerful and practical.
Is it perfect? No. Cold starts are annoying, debugging requires discipline, and you're locked into AWS. But for the use case of exposing your existing APIs to AI assistants—letting users interact naturally without worrying about the technical details—the benefits far outweigh the drawbacks.
The real magic happens when you see someone ask an AI assistant to "do something" and watch it seamlessly translate that into the right API calls. No documentation lookup, no curl commands, no debugging JSON payloads. Just natural conversation that gets things done.
If you're considering building your own MCP server, I hope this guide helps you avoid some of my early mistakes and gives you a solid foundation to build on.
Resources
- Model Context Protocol Specification
- AWS Lambda Documentation
- Serverless Framework
- Zod - TypeScript-first Schema Validation
Have questions or want to share your own MCP journey? Find me on X @brognilogs or drop a comment below.