Lucas Geovani Castro Brogni

Building an MCP Server on AWS Lambda: Complete Serverless Architecture Guide

Introduction

Over the past few months, I've been working on something that fundamentally changed how I think about building APIs: implementing the Model Context Protocol (MCP) on AWS Lambda. This isn't just another serverless project—it's about rethinking how we expose services to AI assistants.

In this post, I'll share the complete architecture, the challenges I faced, and the lessons learned while building a production-ready MCP server that lets AI assistants like Claude interact with APIs naturally, so humans can talk to the assistant without worrying about what's behind it. Whether you're considering building your own MCP server or just curious about serverless architectures for AI-native interfaces, this guide is for you.


Why Build an MCP Server?

The Problem with Traditional APIs

Traditional REST APIs are designed for human developers. They have:

  • Complex authentication flows
  • Verbose request/response structures
  • Documentation that humans read and interpret
  • SDKs that wrap HTTP calls

But AI assistants don't need SDKs. They don't read documentation the same way we do. They need structured, predictable interfaces with clear schemas and consistent error handling.

Enter the Model Context Protocol

MCP is an open protocol created by Anthropic that standardizes how AI assistants communicate with external tools. Think of it as a universal adapter between AI models and your services.

Instead of teaching an AI to make HTTP calls and parse responses, you define tools with clear input schemas and let the AI invoke them directly:

// Traditional API approach
"Make a POST to /tasks with this JSON body,
 set the Authorization header, handle pagination..."

// MCP approach
"Call the create_task tool with title: "'Review PR #123'\""

The Motivation

We had a product with a REST API, and users were increasingly working through AI assistants. The question wasn't if we should support MCP, but how.

The business case wrote itself:

  • Meet users where they are: AI assistants are becoming the default interface for many workflows
  • Reduce integration friction: Instead of users learning our API, let the AI handle it
  • Enable new use cases: Natural language interactions open doors that traditional APIs can't

The technical benefits followed:

  1. Standardize the interface: One protocol, multiple AI clients
  2. Improve AI comprehension: Structured tools instead of raw HTTP
  3. Enable natural language workflows: "Do X with Y" just works—the AI handles the complexity
  4. Future-proof the integration: As MCP evolves, so does the server
  5. Hide implementation details: Users don't need to know about endpoints, authentication, or payloads

A Concrete Example: Task Management API

To make this guide practical, I'll use a task management API as our running example throughout this post. Think of it as a simplified Trello or Asana API with endpoints for:

  • Creating, updating, and deleting tasks
  • Organizing tasks into projects
  • Assigning tasks to team members
  • Tracking task status and due dates

This is just one example. The same architecture and patterns apply equally well to:

  • E-commerce APIs (products, orders, inventory)
  • Content management systems (articles, media, categories)
  • Customer relationship management (contacts, deals, activities)
  • Analytics platforms (reports, dashboards, metrics)
  • Any REST API you want to expose to AI assistants

The beauty of MCP is that once you understand the pattern, you can apply it to any domain. The AI assistant doesn't care whether it's managing tasks, processing orders, or analyzing data—it just needs well-defined tools with clear schemas.


Why Serverless?

When designing the architecture, I had several options:

  • Traditional server (EC2, ECS)
  • Containerized service (Fargate, Kubernetes)
  • Serverless functions (Lambda)

I chose AWS Lambda for several compelling reasons:

1. Per-Request Billing Matches Usage Patterns

AI assistants don't make constant requests. Usage is bursty—sometimes hundreds of tool calls in a session, then nothing for hours. With Lambda:

  • Pay only for actual invocations
  • No idle compute costs
  • Automatic scaling from 0 to thousands of concurrent requests

2. Reduced Operational Complexity

Running an MCP server isn't my core business. I don't want to:

  • Manage server patches
  • Configure auto-scaling groups
  • Handle load balancer health checks
  • Monitor container orchestration

Lambda handles all of this automatically.

3. Natural Fit for Request-Response Workloads

MCP is fundamentally request-response:

  1. AI sends a JSON-RPC request
  2. Server processes it
  3. Server returns a JSON-RPC response

This maps perfectly to Lambda's execution model. No websockets, no long-running connections—just clean, stateless request handling.
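
For concreteness, a single exchange on the wire looks roughly like this (the framing follows MCP's JSON-RPC 2.0 tools/call convention; the task values are illustrative):

// Request from the AI client to POST /v1/mcp
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_task",
    "arguments": { "title": "Review PR #123" }
  }
}

// Response from the Lambda handler
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{ "type": "text", "text": "Task created: Review PR #123 (ID: task_42)" }]
  }
}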

4. Built-in Integration with API Gateway

API Gateway provides:

  • Custom domain management
  • Request validation
  • Rate limiting
  • Authentication via Lambda authorizers
  • CloudWatch logging

This entire infrastructure is defined in a single serverless.yml file.


The Architecture

Here's the high-level architecture I implemented:

                                    ┌──────────────────┐
                                    │   AI Assistant   │
                                    │  (Claude, etc.)  │
                                    └────────┬─────────┘
                                             │
                                    HTTP POST /v1/mcp
                                             │
                                             ▼
┌────────────────────────────────────────────────────────────────────┐
│                         API Gateway                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌────────────────┐ │
│  │ Custom Domain   │    │  Rate Limiting  │    │ Request Routing│ │
│  │ mcp.example.com │    │                 │    │                │ │
│  └─────────────────┘    └─────────────────┘    └────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
                        ┌────────▼────────┐
                        │ Lambda Authorizer│
                        │  (API Key Auth)  │
                        └────────┬─────────┘
                                 │
                                 ▼
┌────────────────────────────────────────────────────────────────────┐
│                      MCP Handler Lambda                             │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────────────────┐ │
│  │ JSON-RPC     │   │ MCP Server   │   │ Tools (example)        │ │
│  │ Parser       │──▶│ Router       │──▶│ ├── create_task        │ │
│  │              │   │              │   │ ├── list_tasks         │ │
│  └──────────────┘   └──────────────┘   │ ├── update_task        │ │
│                                        │ ├── assign_task        │ │
│                                        │ └── get_project_stats  │ │
│                                        └────────────┬───────────┘ │
└─────────────────────────────────────────────────────┼──────────────┘
                                                      │
                                                      ▼
                                          ┌───────────────────┐
                                          │  Your Existing    │
                                          │    REST API       │
                                          └───────────────────┘

Key Components

1. Multiple Lambda Handlers

Instead of one monolithic handler, I split responsibilities:

| Handler | Path | Purpose | Auth |
|---------|------|---------|------|
| MCP Handler | POST /v1/mcp | Main protocol handler | Required |
| Health Check | GET /v1/health | Monitoring/uptime | None |
| MCP Config | GET /.well-known/mcp-config | Schema discovery | None |
| Server Card | GET /.well-known/mcp/server-card.json | Server metadata | None |

This separation allows independent scaling and simpler debugging.
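
Here's a sketch of how that split might look in serverless.yml (function and handler names are illustrative; the authorizer wiring for the MCP handler is shown in the next section):

# serverless.yml - illustrative function split
functions:
  mcp:
    handler: dist/index.handler
    events:
      - http: { path: /v1/mcp, method: post } # authorizer attached (see below)
  health:
    handler: dist/health.handler
    events:
      - http: { path: /v1/health, method: get }
  mcpConfig:
    handler: dist/mcp-config.handler
    events:
      - http: { path: /.well-known/mcp-config, method: get }
  serverCard:
    handler: dist/server-card.handler
    events:
      - http: { path: /.well-known/mcp/server-card.json, method: get }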

2. External Lambda Authorizer

Instead of validating API keys in every request, I use a shared Lambda authorizer:

functions:
  mcp:
    handler: dist/index.handler
    events:
      - http:
          path: /v1/mcp
          method: post
          authorizer:
            type: request
            arn: ${env:AUTHORIZER_LAMBDA_ARN}
            resultTtlInSeconds: 0
            identitySource: method.request.header.apikey

This authorizer (sketched below):

  • Validates API keys against my authentication service
  • Returns IAM policies for API Gateway
  • Can be shared across multiple services
  • Caches results (I disabled caching for immediate key revocation)
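
A request-type authorizer is just another Lambda. Here's a minimal sketch, assuming a validateKey() lookup that stands in for your real authentication service:

// Hypothetical request authorizer - validateKey() is a stand-in for your auth service
import type {
  APIGatewayRequestAuthorizerEvent,
  APIGatewayAuthorizerResult,
} from 'aws-lambda';

// Stand-in: replace with a real call to your authentication service
async function validateKey(key: string): Promise<{ userId: string; accountId: string } | null> {
  return key.length > 0 ? { userId: 'user-1', accountId: 'acct-1' } : null;
}

export async function handler(
  event: APIGatewayRequestAuthorizerEvent
): Promise<APIGatewayAuthorizerResult> {
  const apiKey = event.headers?.apikey;
  const account = apiKey ? await validateKey(apiKey) : null;

  if (!account) {
    // API Gateway turns this specific message into a 401 response
    throw new Error('Unauthorized');
  }

  return {
    principalId: account.userId,
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{ Action: 'execute-api:Invoke', Effect: 'Allow', Resource: event.methodArn }],
    },
    // Only strings, numbers, and booleans survive here; the values show up in
    // event.requestContext.authorizer on the downstream handler
    context: { apiKey, accountId: account.accountId },
  };
}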

3. Tool Architecture

Each MCP tool follows a consistent pattern. Here's an example using our task management domain (remember, this same pattern works for any API):

// Imports: zod for runtime validation, zod-to-json-schema for MCP-compatible JSON Schema
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// 1. Define the input schema with Zod
const CreateTaskInputSchema = z.object({
  title: z.string().min(1),
  description: z.string().optional(),
  project_id: z.string().optional(),
  assignee_id: z.string().optional(),
  due_date: z.string().datetime().optional(),
  priority: z.enum(['low', 'medium', 'high']).optional(),
});

// 2. Define the tool metadata
const createTaskToolDefinition: McpTool = {
  name: 'create_task',
  description: 'Create a new task in the task management system',
  inputSchema: zodToJsonSchema(CreateTaskInputSchema),
};

// 3. Create the handler factory (dependency injection)
function createTaskHandler(deps: Dependencies): ToolHandler {
  return async (input: unknown): Promise<ToolResult> => {
    const parsed = CreateTaskInputSchema.safeParse(input);
    if (!parsed.success) {
      return formatToolError('VALIDATION_ERROR', parsed.error.message);
    }

    const result = await deps.apiClient.tasks.create(parsed.data);

    if (!result.success) {
      return formatToolError(result.error.type, result.error.message);
    }

    return formatToolResponse(`Task created: ${result.data.title} (ID: ${result.data.id})`);
  };
}

This pattern provides:

  • Runtime validation via Zod
  • Type safety throughout
  • Testability via dependency injection
  • Consistent error handling
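
From there, tools get wired into a small registry that the MCP router dispatches against. A minimal sketch, reusing the names from the handler above:

// Minimal registry sketch - McpTool, ToolHandler, Dependencies as defined above
type RegisteredTool = { definition: McpTool; handler: ToolHandler };

function buildToolRegistry(deps: Dependencies): Map<string, RegisteredTool> {
  return new Map([
    ['create_task', { definition: createTaskToolDefinition, handler: createTaskHandler(deps) }],
    // ['list_tasks', ...], ['update_task', ...], etc.
  ]);
}

// tools/list returns every definition; tools/call looks up the handler by name
function listTools(registry: Map<string, RegisteredTool>): McpTool[] {
  return [...registry.values()].map((tool) => tool.definition);
}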

The Challenges

Building this wasn't without its difficulties. Here are the main challenges I faced:

1. Cold Start Latency

Lambda cold starts are the elephant in the room. For AI interactions, latency matters—users expect quick responses.

The Problem:

  • Node.js Lambda cold starts: 300-500ms typically
  • With dependencies (Zod, axios, etc.): 500-800ms
  • In a VPC: add another 200-500ms

My Solution:

  • Warmup Plugin: Pre-warms Lambdas every 5 minutes in production
  • Minimal Dependencies: Carefully audited imports
  • VPC Optimization: Used VPC endpoints where possible
  • Lazy Loading: Tools only load schemas when called
# serverless.yml (requires the serverless-plugin-warmup plugin)
custom:
  warmup:
    default:
      enabled:
        - prod
      events:
        - schedule: rate(5 minutes)
      concurrency: 1
      prewarm: true

2. Error Handling Across Layers

With MCP, errors can occur at multiple levels:

  • JSON-RPC parsing errors
  • Authentication failures
  • Tool validation errors
  • External API errors
  • Rate limiting

The Challenge: Each level has different error formats and expectations.

My Solution: A two-tier error handling strategy:

// Protocol-level errors → JSON-RPC error format
{
  "jsonrpc": "2.0",
  "error": {
    "code": -32600,
    "message": "Invalid Request"
  },
  "id": null
}

// Tool execution errors → Tool response format (better for AI)
{
  "jsonrpc": "2.0",
  "result": {
    "content": [{
      "type": "text",
      "text": "Error: Rate limit exceeded. Try again in 60 seconds."
    }]
  },
  "id": 1
}

The key insight: Tool errors should be returned as tool responses, not protocol errors. This way, the AI can understand and act on them appropriately.
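
The formatToolResponse and formatToolError helpers used throughout this post aren't magic. A minimal sketch of what they might look like (the isError flag follows the MCP tool-result convention):

// Minimal sketch of the response helpers used in this post
type ToolResult = { content: Array<{ type: 'text'; text: string }>; isError?: boolean };

function formatToolResponse(text: string): ToolResult {
  return { content: [{ type: 'text', text }] };
}

function formatToolError(code: string, message: string): ToolResult {
  // Returned as a normal tool result so the AI can read it and react
  return { content: [{ type: 'text', text: `Error (${code}): ${message}` }], isError: true };
}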

3. Authentication Context Propagation

The Lambda authorizer validates API keys, but the MCP handler needs that key to make downstream API calls.

The Challenge: API Gateway doesn't automatically pass the API key to the Lambda function.

My Solution: Configure the authorizer to return the API key in the authorization context:

// Authorizer returns
{
  principalId: userId,
  context: {
    apiKey: validatedApiKey,
    accountId: accountId
  }
}

// Handler extracts
const apiKey = event.requestContext.authorizer?.apiKey
  ?? extractApiKeyFromHeaders(event.headers);

4. Testing Without Mocks

I follow a strict "no mocks" policy—no jest.fn() or jest.mock(). This forced me to build proper fakes:

// tests/__fakes__/api-client.fake.ts
// Example using task management domain - adapt to your domain
const generateId = () => `task_${Math.random().toString(36).slice(2, 10)}`;

export function createFakeApiClient(): ApiClient {
  const tasks = new Map<string, Task>();

  return {
    tasks: {
      async create(input) {
        const task = { id: generateId(), status: 'pending', ...input };
        tasks.set(task.id, task);
        return { success: true, data: task };
      },
      async get(id) {
        const task = tasks.get(id);
        return task
          ? { success: true, data: task }
          : { success: false, error: { type: 'NOT_FOUND' } };
      }
    }
  };
}

This approach:

  • Tests real behavior, not implementation details
  • Catches bugs that mocks would hide
  • Makes tests more readable
  • Forces better interface design
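
Putting it together, a test exercises the real handler against the fake. A minimal Jest-style sketch (import paths assume the project layout shown later in this post):

// No jest.mock() anywhere - the fake is a real, working implementation
import { createFakeApiClient } from './__fakes__/api-client.fake';
import { createTaskHandler } from '../src/tools/tasks/create-task.tool';

test('create_task stores the task and reports its id', async () => {
  const handler = createTaskHandler({ apiClient: createFakeApiClient() });

  const result = await handler({ title: 'Review PR #123' });

  // The fake actually stored the task, so this asserts real behavior
  expect(result.content[0].text).toMatch(/Task created: Review PR #123/);
});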

5. Choosing an Infrastructure as Code Tool

For deploying Lambda functions, you have several IaC options:

| Tool | Pros | Cons |
|------|------|------|
| Serverless Framework | Simple YAML config, great plugin ecosystem, fast to get started | Less control over fine-grained AWS resources |
| AWS SAM | Native AWS support, good for pure Lambda workloads | Smaller plugin ecosystem |
| AWS CDK | Full programming language, type safety, maximum flexibility | Steeper learning curve, more verbose |
| Terraform | Cloud-agnostic, mature ecosystem | More complex for Lambda-specific patterns |

We went with Serverless Framework for its simplicity and plugin ecosystem. Tools like serverless-offline for local development and serverless-domain-manager for custom domains made the developer experience smooth. The entire infrastructure fits in a single serverless.yml file that's easy to read and modify.

That said, if you're already invested in CDK or Terraform, there's no strong reason to switch—the architecture patterns in this post apply regardless of your IaC choice.


The Benefits

After running this in production, here are the tangible benefits:

1. True Pay-Per-Use Economics

Lambda pricing is based on two factors:

  • Requests: Number of invocations
  • Compute time: GB-seconds (memory allocated × execution duration)

The key insight: keep your functions fast and right-sized. A 5ms function costs 100x less than a 500ms function doing the same work. MCP tools that just proxy to an existing API are naturally fast.

The free tier (1M requests + 400K GB-seconds monthly) covers most development and low-traffic production scenarios. Compare that to a traditional server sitting idle at $50+/month waiting for traffic.

Use the AWS Lambda Pricing Calculator to estimate costs for your specific configuration.
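
To make the math concrete, here's a back-of-the-envelope estimate using the published us-east-1 x86 rates at the time of writing (verify against the calculator for your region and architecture):

// Rough Lambda cost for 1M invocations at 512 MB and 100 ms average duration
const requests = 1_000_000;
const memoryGb = 0.5;    // 512 MB
const durationSec = 0.1; // 100 ms

const gbSeconds = requests * memoryGb * durationSec; // 50,000 GB-seconds
const computeCost = gbSeconds * 0.0000166667;        // ~$0.83
const requestCost = (requests / 1_000_000) * 0.2;    // $0.20

console.log((computeCost + requestCost).toFixed(2)); // ~$1.03/month, and the free tier covers all of it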

2. Operational Simplicity

Things I don't manage:

  • Server provisioning
  • OS patching
  • Load balancer configuration
  • Auto-scaling policies
  • Health check infrastructure

Things I do manage:

  • Business logic
  • Tool definitions
  • Error handling

3. Deployment Velocity

Deploying a new tool:

  1. Write the tool handler
  2. Register it in the tool index
  3. npm run deploy:dev
  4. Test
  5. npm run deploy:prod

Total time: ~5 minutes for the deployment itself.

4. Observability for Free (Almost)

Lambda provides built-in:

  • CloudWatch Logs (automatic)
  • CloudWatch Metrics (invocations, errors, duration)
  • X-Ray tracing (optional)
  • API Gateway access logs

I added OpenTelemetry on top for distributed tracing, but the baseline is already useful.

5. Natural Disaster Recovery

Lambda functions are:

  • Deployed across multiple availability zones
  • Automatically retried on infrastructure failures
  • Stateless (easy to reason about)

No manual DR planning needed.


The Cons (Let's Be Honest)

No architecture is perfect. Here's what I'd do differently or what remains challenging:

1. Cold Starts Are Real

Despite warmup plugins, cold starts happen. When traffic spikes after idle periods, the first few requests are slow. For AI interactions where users expect sub-second responses, this can be jarring.

Mitigation: Provisioned Concurrency (but adds cost and complexity).

2. Debugging is Harder

When something fails in Lambda:

  • You can't SSH in
  • Logs are the only visibility
  • Reproducing issues requires understanding the exact event

Mitigation: Extensive structured logging and local testing with serverless-offline.

3. Vendor Lock-In

This architecture is deeply tied to AWS:

  • Lambda function handlers
  • API Gateway events
  • CloudWatch logging
  • IAM for authorization

Moving to GCP or Azure would require significant rewrites.

Mitigation: Keep business logic in pure functions, isolate AWS-specific code.
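
In practice that means keeping the Lambda entry point paper-thin. A sketch of the shape (handleMcpRequest and buildDeps are placeholders for the pure core and its dependency wiring):

// index.ts - thin AWS adapter; the interesting logic lives in pure modules
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { handleMcpRequest } from './mcp/server'; // pure: (jsonRpcBody, deps) => response
import { buildDeps } from './deps';              // assembles clients from env/event

export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  // Error handling elided for brevity
  const response = await handleMcpRequest(JSON.parse(event.body ?? '{}'), buildDeps(event));
  return { statusCode: 200, body: JSON.stringify(response) };
}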

4. Costs Can Spike from Unexpected Places

Lambda is often the cheapest part of your bill. Watch out for:

  • API Gateway: Charges per request, can exceed Lambda costs at scale
  • CloudWatch Logs: Verbose logging adds up fast—log wisely
  • Retry loops: A bug causing infinite retries can burn budget in hours
  • Long-running functions: Cost scales linearly with duration—optimize slow tools

Mitigation: Set billing alerts from day one, implement rate limiting, and keep functions fast. Review your AWS Cost Explorer monthly to catch surprises early.

5. Complex Local Development

Testing locally with serverless-offline isn't the same as production:

  • Authorizers aren't actually invoked
  • Cold starts don't happen
  • VPC networking differs

Mitigation: Deploy to a dev stage frequently, automate integration tests.


Tips for Building Your Own MCP Server on Lambda

Based on my experience, here are actionable tips. I'll continue using the task management example, but remember: replace "task" with your domain concept (product, order, article, contact, etc.) and the patterns remain exactly the same.

1. Start with the Tool Schema

Before writing any code, define your tool schemas:

// Define WHAT the tool does before HOW
// Example: task management (adapt to your domain)
const toolDefinition = {
  name: 'create_task',
  description: `Create a new task in the project.
    Returns the task ID and creation timestamp.
    Requires a title. Optionally accepts project_id, assignee, and due_date.`,
  inputSchema: {
    type: 'object',
    properties: {
      title: {
        type: 'string',
        description: 'The title of the task',
      },
      project_id: {
        type: 'string',
        description: 'The project to add this task to',
      },
      // ...
    },
    required: ['title'],
  },
};

The description matters—it's what the AI uses to understand when to call your tool.

2. Use Zod for Runtime Validation

TypeScript types vanish at runtime. Use Zod for actual validation:

// Example: validating task input (adapt schema to your domain)
const CreateTaskSchema = z.object({
  title: z.string().min(1, 'Title is required'),
  due_date: z.string().datetime().optional(),
  priority: z.enum(['low', 'medium', 'high']).default('medium'),
});

// In handler
const result = CreateTaskSchema.safeParse(input);
if (!result.success) {
  return formatToolError('VALIDATION_ERROR', result.error.message);
}

This catches bad input before it causes problems downstream.

3. Return AI-Friendly Error Messages

Don't return technical errors to the AI:

// Bad
return { error: 'ECONNREFUSED 127.0.0.1:5432' };

// Good
return formatToolError(
  'SERVICE_UNAVAILABLE',
  'Unable to create task. The service is temporarily unavailable. Please try again.'
);

The AI will relay this to the user—make it understandable.

4. Use Factory Functions for Dependency Injection

Avoid global state. Use factories:

// Don't do this
const apiClient = new ApiClient(process.env.API_KEY);

export function handleCreateTask(input) {
  return apiClient.tasks.create(input); // Hard to test!
}

// Do this
export function createTaskHandler(deps: { apiClient: ApiClient }) {
  return async function handleCreateTask(input) {
    return deps.apiClient.tasks.create(input); // Testable!
  };
}

5. Implement Graceful Degradation

When external services fail, handle it gracefully:

async function handleToolCall(tool: string, input: unknown) {
  try {
    const handler = tools.get(tool);
    if (!handler) {
      return formatToolError('UNKNOWN_TOOL', `Tool "${tool}" not found`);
    }

    const result = await handler(input);
    return result;
  } catch (error) {
    // Log the real error
    logger.error('Tool execution failed', { tool, error });

    // Return a user-friendly message
    return formatToolError(
      'INTERNAL_ERROR',
      'Something went wrong. Please try again.'
    );
  }
}

6. Add Structured Logging from Day One

You'll need it for debugging:

const logger = createLogger({
  service: 'mcp-server',
  level: process.env.LOG_LEVEL || 'info',
});

// In handler
logger.info('Tool invoked', {
  tool: toolName,
  input: sanitizeInput(input),
  requestId: context.awsRequestId,
});
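
The sanitizeInput helper above is worth spelling out. A minimal sketch that redacts likely-sensitive fields before they reach the logs (the key list is an assumption; extend it for your domain):

// Minimal sketch: recursively redact sensitive fields before logging
const SENSITIVE_KEYS = ['apikey', 'api_key', 'authorization', 'password', 'token'];

function sanitizeInput(input: unknown): unknown {
  if (Array.isArray(input)) return input.map(sanitizeInput);
  if (input === null || typeof input !== 'object') return input;
  return Object.fromEntries(
    Object.entries(input as Record<string, unknown>).map(([key, value]) =>
      SENSITIVE_KEYS.includes(key.toLowerCase()) ? [key, '[REDACTED]'] : [key, sanitizeInput(value)]
    )
  );
}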

7. Separate Protocol Handling from Business Logic

Keep the MCP protocol handling separate:

src/
├── index.ts          # Lambda handler (thin)
├── mcp/
│   ├── protocol.ts   # JSON-RPC parsing
│   └── server.ts     # MCP routing
└── tools/
    ├── tasks/                      # Example: task management
    │   ├── create-task.tool.ts
    │   ├── list-tasks.tool.ts
    │   └── update-task.tool.ts
    └── projects/                   # Group tools by domain concept
        └── get-project-stats.tool.ts

This makes testing easier and keeps concerns separated.

8. Test with Real AI Assistants Early

Don't wait until the end to test with actual AI:

  1. Deploy to dev
  2. Configure your AI assistant to use it
  3. Try natural language requests
  4. Iterate on tool descriptions

You'll discover that tool descriptions matter more than you think.

9. Implement Rate Limiting at Multiple Levels

Protect yourself:

  • API Gateway: Request throttling
  • Authorizer: Per-key rate limits
  • Tools: Per-operation limits
# serverless.yml
provider:
  apiGateway:
    usagePlan:
      throttle:
        burstLimit: 200
        rateLimit: 100

10. Monitor Cold Starts

Track them explicitly:

let isColdStart = true;

export async function handler(event, context) {
  if (isColdStart) {
    logger.info('Cold start', { requestId: context.awsRequestId });
    isColdStart = false;
  }
  // ...
}

Use this data to tune warmup settings.


Key Metrics After Running in Production

Here's what I'm seeing after some time:

| Metric | Value |
|--------|-------|
| Average latency (warm) | ~100ms |
| Average latency (cold) | ~600ms |
| Cold start rate | <5% (with warmup) |
| Error rate | <0.1% |

Conclusion

Building an MCP server on AWS Lambda has been a rewarding journey. The combination of:

  • MCP's standardized protocol for AI interactions
  • Lambda's serverless model for operational simplicity
  • Functional programming principles for maintainability

...creates an architecture that's both powerful and practical.

Is it perfect? No. Cold starts are annoying, debugging requires discipline, and you're locked into AWS. But for the use case of exposing your existing APIs to AI assistants—letting users interact naturally without worrying about the technical details—the benefits far outweigh the drawbacks.

The real magic happens when you see someone ask an AI assistant to "do something" and watch it seamlessly translate that into the right API calls. No documentation lookup, no curl commands, no debugging JSON payloads. Just natural conversation that gets things done.

If you're considering building your own MCP server, I hope this guide helps you avoid some of my early mistakes and gives you a solid foundation to build on.


Resources


Have questions or want to share your own MCP journey? Find me on X @brognilogs or drop a comment below.
