James Lee

Posted on May 26

Serverless Best Practices: Production Architecture, Stateless Design & Cost Optimization

#architecture #aws #serverless #systemdesign

Over the past six articles, we've covered how Lambda works internally — cold starts, triggers, scaling, traffic routing, automation, and workflow orchestration.

This final article is different. It's not about how Lambda works — it's about how to use it well.

These are the patterns, pitfalls, and architectural decisions that separate a Lambda function that works in a demo from one that runs reliably in production at scale.

1. Function Granularity: How Much Should One Function Do?

The "Function" in FaaS is misleading. In traditional programming, a function is a small, single-purpose unit of code. In serverless, a "function" is better understood as a deployable unit — it can be a single method, a complete feature, an entire module, or even a full web framework.

This flexibility creates a real architectural decision: how much should one Lambda function do?

The Two Failure Modes

Too granular (one Lambda per API endpoint):

Hundreds of functions to manage and monitor
Repeated configuration across functions (IAM roles, VPC settings, env vars)
Higher cold start frequency — each function has its own warm pool
Debugging distributed failures becomes complex

Too coarse (one Lambda for everything):

Memory configuration is dominated by the most expensive operation
High-memory functions cost more even for lightweight requests
A single deployment updates unrelated functionality
Concurrency limits affect all operations equally

Two Practical Principles

Principle 1: Resource Similarity

Group operations that have similar resource requirements into one function. Separate operations with dramatically different requirements.

Example: Brand API with 10 endpoints
├── 9 endpoints: 128MB memory, <100ms, read-only DynamoDB
└── 1 endpoint:  2048MB memory, 30s timeout, runs ML inference

→ Split into two functions:
   brand-api-standard   (128MB, handles 9 endpoints)
   brand-api-ml         (2048MB, handles ML endpoint)

# brand-api-standard/handler.py — lightweight CRUD operations
import boto3
import json

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('brands')

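def handler(event, context):
    # HTTP API (payload v2) exposes the matched route template as routeKey,
    # e.g. "GET /brand/{id}". rawPath holds the concrete path ("/brand/42"),
    # so it would never match a templated key. This assumes each route is
    # declared on the API rather than a single catch-all route.
    route_key = event['routeKey']

    # get_brand, create_brand, etc. are defined elsewhere in this module
    routes = {
        'GET /brand/{id}':        get_brand,
        'POST /brand':            create_brand,
        'PUT /brand/{id}':        update_brand,
        'GET /brand/{id}/colors': get_colors,
        # ... 9 lightweight routes
    }

    handler_fn = routes.get(route_key)
    if not handler_fn:
        return {'statusCode': 404, 'body': json.dumps({'error': 'Not found'})}

    return handler_fn(event)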

# brand-api-ml/handler.py — memory-intensive ML operations
import boto3
import torch  # heavy dependency — justified here

# Model is loaded lazily on the first invocation in each execution
# environment, then reused across warm invocations
model = None

def get_model():
    global model
    if model is None:
        s3 = boto3.client('s3')
        # Download and load model
        model = load_logo_classifier()
    return model

def handler(event, context):
    # Only this function pays the 2048MB memory cost
    classifier = get_model()
    return classify_logo(classifier, event)

Principle 2: Functional Cohesion

Don't bundle fundamentally different concerns into one function, even if their resource requirements are similar.

❌ Bad: one function handles both:
   - WebSocket chat connections (stateful, long-lived)
   - User registration/login (stateless, short-lived)

✅ Good: separate functions:
   brand-chat-handler     (WebSocket connections)
   brand-auth-handler     (registration, login, token refresh)

Cost Impact of Right-Sizing

Memory configuration directly multiplies your bill. Here's a concrete example:

Two functions, each invoked 10,000 times/day, ~100ms duration (compute charge only, request fees excluded):

Function A: 1536MB (oversized)
  Cost = (1536/1024) × (100/1000) × 10,000 × $0.0000166667/GB-s
       = 1.5 × 0.1 × 10,000 × $0.0000166667
       ≈ $0.025/day → ~$0.75/month

Function B: 256MB (right-sized)
  Cost = (256/1024) × (100/1000) × 10,000 × $0.0000166667/GB-s
       = 0.25 × 0.1 × 10,000 × $0.0000166667
       ≈ $0.0042/day → ~$0.13/month

Right-sizing saves ~83% on that function alone. Multiply across dozens of functions and the savings compound significantly.
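
The same arithmetic can be wrapped in a small helper for comparing memory settings. This is a minimal sketch: `monthly_compute_cost` and `savings_percent` are hypothetical names, the price constant is the per-GB-second rate from the example, and it assumes duration stays the same when memory changes. That assumption may not hold, because Lambda allocates CPU in proportion to memory.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # rate used in the example; check current pricing


def monthly_compute_cost(memory_mb, duration_ms, invocations_per_day, days=30):
    """Estimate the monthly compute charge (request fees excluded)."""
    gb_seconds_per_day = (memory_mb / 1024) * (duration_ms / 1000) * invocations_per_day
    return gb_seconds_per_day * PRICE_PER_GB_SECOND * days


def savings_percent(before, after):
    """Relative saving when moving from `before` to `after`."""
    return (before - after) / before * 100


oversized = monthly_compute_cost(1536, 100, 10_000)
right_sized = monthly_compute_cost(256, 100, 10_000)
print(f"saving: {savings_percent(oversized, right_sized):.0f}%")  # saving: 83%
```

Because the saving depends only on the memory ratio here, it is worth re-measuring duration after any change before trusting the estimate.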

# Use AWS Lambda Power Tuning to find the optimal memory setting
# https://github.com/alexcasalboni/aws-lambda-power-tuning
# Run it as a Step Functions workflow — it tests multiple memory configs
# and returns a cost/performance curve

# Quick manual approach: measure actual memory usage
def handler(event, context):
    # After execution, check CloudWatch Logs for:
    # "Max Memory Used: XXX MB"
    # Set your memory config to ~1.5x the max observed usage
    pass
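
The manual approach can be scripted once the REPORT lines have been exported from CloudWatch Logs. The sketch below is one possible way to do it: the function names are hypothetical, the sample log lines are illustrative rather than captured output, and rounding up to a multiple of 64MB is an arbitrary choice for tidy settings.

```python
import math
import re

MAX_MEMORY_PATTERN = re.compile(r"Max Memory Used: (\d+) MB")


def max_memory_used_mb(report_lines):
    """Return the largest 'Max Memory Used' value found in REPORT log lines."""
    values = [
        int(match.group(1))
        for line in report_lines
        if (match := MAX_MEMORY_PATTERN.search(line))
    ]
    if not values:
        raise ValueError("no 'Max Memory Used' entries found")
    return max(values)


def suggest_memory_mb(report_lines, headroom=1.5, step=64, minimum=128):
    """Apply the ~1.5x headroom rule, rounded up to a multiple of `step`."""
    target = max_memory_used_mb(report_lines) * headroom
    return max(minimum, math.ceil(target / step) * step)


# Illustrative log lines, not captured from a real function
sample = [
    "REPORT RequestId: a1 Duration: 98.10 ms Billed Duration: 99 ms Memory Size: 1536 MB Max Memory Used: 142 MB",
    "REPORT RequestId: b2 Duration: 101.47 ms Billed Duration: 102 ms Memory Size: 1536 MB Max Memory Used: 167 MB",
]
print(suggest_memory_mb(sample))  # 256
```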

2. Stateless by Design (But Not Naive About It)

Lambda functions are stateless — execution environments are ephemeral and can be recycled at any time. But "stateless" doesn't mean "no shared state ever exists."

What Stateless Actually Means

# ❌ What stateless PREVENTS — don't do this:
request_counter = 0  # This WILL drift — multiple instances, recycled environments

def handler(event, context):
    global request_counter
    request_counter += 1      # unreliable across instances
    return {'count': request_counter}  # meaningless in distributed context

# ✅ What stateless REQUIRES — persist state externally:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('request-counters')

def handler(event, context):
    # Atomic increment in DynamoDB — correct across all instances
    response = table.update_item(
        Key={'counterId': 'global'},
        UpdateExpression='ADD #count :inc',
        ExpressionAttributeNames={'#count': 'count'},
        ExpressionAttributeValues={':inc': 1},
        ReturnValues='UPDATED_NEW'
    )
    return {'count': int(response['Attributes']['count'])}

Instance Reuse: The "Stateful Stateless" Reality

Lambda recycles execution environments — but not immediately. An environment that handled a request may handle the next one too. This is a feature (warm starts, reusable connections) and a risk (stale state from previous requests).

# ✅ Good: leverage instance reuse for connection pooling
import boto3
import json

# Initialized ONCE per execution environment (not per request)
# Reused across warm invocations — this is intentional and correct
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('brands')
ssm_client = boto3.client('ssm')

# Cache config — valid for the lifetime of this environment
_config_cache = None

def get_config():
    global _config_cache
    if _config_cache is None:
        response = ssm_client.get_parameter(Name='/brand-api/config')
        _config_cache = json.loads(response['Parameter']['Value'])
    return _config_cache

def handler(event, context):
    config = get_config()   # SSM called once, then cached
    result = table.get_item(Key={'brandId': event['brandId']})
    return result.get('Item')

# ❌ Risk: stale temporary files from previous requests
import json

def handler(event, context):
    tmp_path = '/tmp/processing_output.json'

    # ❌ If a previous request created this file and it wasn't cleaned up,
    # this open() call will read stale data from the previous request
    with open(tmp_path, 'r') as f:
        return json.load(f)

# ✅ Safe: use unique filenames per request
import json
import os

def handler(event, context):
    # Unique filename per invocation — no collision with previous requests
    tmp_path = f'/tmp/{context.aws_request_id}.json'

    try:
        # Process and write
        with open(tmp_path, 'w') as f:
            json.dump(process(event), f)

        with open(tmp_path, 'r') as f:
            return json.load(f)
    finally:
        # Always clean up — don't leave state for the next request
        if os.path.exists(tmp_path):
            os.remove(tmp_path)

The rule: Use instance reuse intentionally (connection pools, config caches). Guard against it accidentally (temp files, global mutable state).
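
One way to apply this rule to cached configuration is to give the cache an expiry, so a long-lived environment does not keep serving an old value indefinitely. This is a sketch under assumptions: `TTLCache`, `load_config`, and the 300-second TTL are hypothetical, and in a real function the loader would wrap the SSM call shown earlier.

```python
import time


class TTLCache:
    """Cache one value per execution environment, refreshed after `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader      # e.g. a function that calls SSM get_parameter
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


load_count = 0


def load_config():
    """Stand-in for the real lookup; counts how often it runs."""
    global load_count
    load_count += 1
    return {"featureFlag": True, "version": load_count}


# Module scope: created once per execution environment
config_cache = TTLCache(load_config, ttl_seconds=300)


def handler(event, context):
    config = config_cache.get()  # loader runs on the first call and after expiry
    return {"configVersion": config["version"]}
```

The trade-off is one extra lookup per environment per TTL window in exchange for a bounded staleness period.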

3. File Handling in Lambda

Lambda's stateless nature changes how you handle file uploads and storage. The traditional pattern of saving files to local disk doesn't work.

Why Local File Storage Fails

/tmp defaults to 512MB (configurable up to 10GB of ephemeral storage)
Files in /tmp are lost when the execution environment is recycled
Multiple concurrent instances each have their own /tmp — no shared filesystem

Pattern 1: S3 Pre-Signed URLs (Recommended for Large Files)

Never route large file uploads through Lambda. Instead, generate a pre-signed S3 URL and let the client upload directly to S3.

# generate_upload_url.py
import boto3
import json
import os
import uuid

s3 = boto3.client('s3')
UPLOAD_BUCKET = os.environ['UPLOAD_BUCKET']

def handler(event, context):
    """
    Client requests an upload URL.
    Lambda generates a pre-signed S3 PUT URL.
    Client uploads directly to S3 — Lambda never touches the file bytes.
    """
    brand_id = event['pathParameters']['brandId']
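    # queryStringParameters is missing or null when the request has no query string
    query = event.get('queryStringParameters') or {}
    content_type = query.get('contentType', 'image/png')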

    # Validate content type
    allowed_types = {'image/png', 'image/jpeg', 'image/svg+xml', 'image/webp'}
    if content_type not in allowed_types:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': f'Unsupported type: {content_type}'})
        }

    # Generate unique S3 key
    file_id = str(uuid.uuid4())
    s3_key = f'uploads/{brand_id}/{file_id}'

    # Generate pre-signed URL (valid for 15 minutes)
    upload_url = s3.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': UPLOAD_BUCKET,
            'Key': s3_key,
            'ContentType': content_type,
        },
        ExpiresIn=900  # 15 minutes
    )

    return {
        'statusCode': 200,
        'body': json.dumps({
            'uploadUrl': upload_url,
            'fileId': file_id,
            's3Key': s3_key,
            'expiresIn': 900
        })
    }

Upload flow:
Client → GET /upload-url → Lambda → S3 pre-signed URL → Client
Client → PUT {file bytes} → S3 directly (Lambda not involved)
S3 ObjectCreated event → Lambda (process the uploaded file)
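
The last step of the flow hands the processing function an S3 event notification. The sketch below shows one way to pull the bucket and key out of it. It assumes the `uploads/{brand_id}/{file_id}` key layout produced by the URL generator; `extract_uploads` is a hypothetical helper and the processing step is left as a placeholder.

```python
from urllib.parse import unquote_plus


def extract_uploads(event):
    """Yield (bucket, key, brand_id, file_id) for each record in an S3 event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 notifications (a space becomes '+')
        key = unquote_plus(record["s3"]["object"]["key"])

        parts = key.split("/")
        if len(parts) != 3 or parts[0] != "uploads":
            continue  # not an object written through the upload URL flow
        _, brand_id, file_id = parts
        yield bucket, key, brand_id, file_id


def handler(event, context):
    processed = []
    for bucket, key, brand_id, file_id in extract_uploads(event):
        # Real processing (resizing, scanning, metadata write) would go here
        processed.append(
            {"bucket": bucket, "key": key, "brandId": brand_id, "fileId": file_id}
        )
    return {"processed": processed}
```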

Pattern 2: Base64 for Small Files (Avatars, Icons)

For small files (<1MB), you can accept Base64-encoded content through API Gateway:

# handle_small_upload.py
import base64
import boto3
import json
import os
import uuid

s3 = boto3.client('s3')

def handler(event, context):
    """Handle small file uploads via Base64 encoding"""
    body = json.loads(event['body'])

    file_data = base64.b64decode(body['fileData'])
    content_type = body['contentType']
    brand_id = body['brandId']

    # Size check: API Gateway caps payloads at 10MB and Lambda's synchronous
    # invoke limit is 6MB, so stay under 1MB for safety
    if len(file_data) > 1 * 1024 * 1024:
        return {
            'statusCode': 413,
            'body': json.dumps({'error': 'File too large. Use pre-signed URL for files >1MB'})
        }

    s3_key = f'logos/{brand_id}/{uuid.uuid4()}'
    s3.put_object(
        Bucket=os.environ['UPLOAD_BUCKET'],
        Key=s3_key,
        Body=file_data,
        ContentType=content_type
    )

    return {
        'statusCode': 200,
        'body': json.dumps({'s3Key': s3_key})
    }
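
The handler above decodes the payload before it checks the size. A possible refinement, sketched here with a hypothetical `decode_small_file` helper, is to estimate the decoded size from the encoded length first and to reject malformed Base64 explicitly.

```python
import base64
import binascii

MAX_BYTES = 1 * 1024 * 1024  # same 1MB ceiling as the handler above


def decoded_size_upper_bound(encoded):
    """Every 4 Base64 characters decode to at most 3 bytes."""
    return ((len(encoded) + 3) // 4) * 3


def decode_small_file(encoded, max_bytes=MAX_BYTES):
    """Decode Base64 content, rejecting oversized or malformed input."""
    # Cheap pre-check: padding removes at most 2 bytes from the upper bound
    if decoded_size_upper_bound(encoded) - 2 > max_bytes:
        raise ValueError("file too large")
    try:
        data = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid Base64 content") from exc
    if len(data) > max_bytes:
        raise ValueError("file too large")
    return data
```

A handler could map the two `ValueError` cases to 413 and 400 responses respectively.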

4. WebSocket with Lambda

Lambda is stateless and request-driven — it can't maintain a persistent WebSocket connection itself. But you can implement WebSocket by combining API Gateway WebSocket API with Lambda.

API Gateway maintains the persistent connections; Lambda handles the messages.

Client ←──WebSocket──→ API Gateway ←──events──→ Lambda
         (persistent)   (manages connections)   (stateless)

Three Lambda Handlers for WebSocket

# websocket_handlers.py
import boto3
import json
import os

dynamodb = boto3.resource('dynamodb')
connections_table = dynamodb.Table('websocket-connections')

apigw = boto3.client(
    'apigatewaymanagementapi',
    endpoint_url=os.environ['WEBSOCKET_ENDPOINT']  # e.g., https://abc123.execute-api.us-east-1.amazonaws.com/prod
)


def connect_handler(event, context):
    """
    Called when a client establishes a WebSocket connection.
    Store the connection ID for later message delivery.
    """
    connection_id = event['requestContext']['connectionId']
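    # queryStringParameters is missing or null when the client sends no query string
    query = event.get('queryStringParameters') or {}
    brand_id = query.get('brandId', 'anonymous')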

    connections_table.put_item(Item={
        'connectionId': connection_id,
        'brandId': brand_id,
        'connectedAt': event['requestContext']['requestTimeEpoch']
    })

    print(f'Client connected: {connection_id} (brand: {brand_id})')
    return {'statusCode': 200}


def disconnect_handler(event, context):
    """Called when a client disconnects."""
    connection_id = event['requestContext']['connectionId']

    connections_table.delete_item(Key={'connectionId': connection_id})

    print(f'Client disconnected: {connection_id}')
    return {'statusCode': 200}


def message_handler(event, context):
    """Called when a client sends a message."""
    connection_id = event['requestContext']['connectionId']
    body = json.loads(event['body'])

    message_type = body.get('type')

    if message_type == 'subscribe_brand':
        brand_id = body['brandId']
        # Update subscription in DynamoDB
        connections_table.update_item(
            Key={'connectionId': connection_id},
            UpdateExpression='SET subscribedBrand = :brand',
            ExpressionAttributeValues={':brand': brand_id}
        )
        # Send acknowledgment back to this client
        send_message(connection_id, {
            'type': 'subscribed',
            'brandId': brand_id
        })

    return {'statusCode': 200}


def send_message(connection_id: str, message: dict):
    """Send a message to a specific connected client."""
    try:
        apigw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps(message).encode('utf-8')
        )
    except apigw.exceptions.GoneException:
        # Client disconnected — clean up stale connection
        connections_table.delete_item(Key={'connectionId': connection_id})


def broadcast_brand_update(brand_id: str, update_data: dict):
    """
    Broadcast a brand update to all subscribed clients.
    Called from other Lambda functions when brand data changes.
    """
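    # Find all connections subscribed to this brand.
    # A scan returns at most 1MB per call, so follow LastEvaluatedKey.
    # At scale, prefer a GSI on subscribedBrand and query() instead of scan().
    scan_kwargs = {
        'FilterExpression': 'subscribedBrand = :brand',
        'ExpressionAttributeValues': {':brand': brand_id},
    }
    message = {'type': 'brand_updated', 'brandId': brand_id, 'data': update_data}
    sent = 0

    while True:
        response = connections_table.scan(**scan_kwargs)
        for item in response['Items']:
            send_message(item['connectionId'], message)
            sent += 1
        if 'LastEvaluatedKey' not in response:
            break
        scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']

    print(f'Broadcast to {sent} clients for brand {brand_id}')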

# serverless.yml
functions:
  wsConnect:
    handler: websocket_handlers.connect_handler
    events:
      - websocket:
          route: $connect

  wsDisconnect:
    handler: websocket_handlers.disconnect_handler
    events:
      - websocket:
          route: $disconnect

  wsMessage:
    handler: websocket_handlers.message_handler
    events:
      - websocket:
          route: $default

5. Lambda Extensions: Graceful Lifecycle Management

AWS Lambda Extensions allow you to run code alongside your function — for flushing metrics, closing connections, and handling graceful shutdown. This is Lambda's equivalent of Kubernetes lifecycle hooks (preStop, postStart).

The Problem They Solve

# ❌ Without extensions: metrics may be lost
import datadog

def handler(event, context):
    result = process_brand(event)

    # This metric send is async — if Lambda freezes the environment
    # immediately after handler returns, the metric may never arrive
    datadog.statsd.increment('brand.processed')

    return result

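# ✅ With a Lambda extension: flush metrics before freeze/shutdown
# extensions/metrics_flusher.py: runs as a separate process alongside your function.
# Lambda does not call "/pre-freeze" or "/pre-stop" hooks on an extension.
# An external extension registers with the Extensions API, then long-polls
# for INVOKE and SHUTDOWN events. The environment is frozen only after the
# extension asks for the next event, so work done before that call completes first.

import json
import os
import urllib.request

# Must match the executable's file name under /opt/extensions/
EXTENSION_NAME = 'metrics_flusher.py'
BASE_URL = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/2020-01-01/extension"


def register():
    """Register for INVOKE and SHUTDOWN events; returns the extension ID."""
    request = urllib.request.Request(
        f'{BASE_URL}/register',
        data=json.dumps({'events': ['INVOKE', 'SHUTDOWN']}).encode('utf-8'),
        headers={'Lambda-Extension-Name': EXTENSION_NAME},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        return response.headers['Lambda-Extension-Identifier']


def next_event(extension_id):
    """Block until the next lifecycle event arrives."""
    request = urllib.request.Request(
        f'{BASE_URL}/event/next',
        headers={'Lambda-Extension-Identifier': extension_id},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def flush_metrics_to_datadog():
    """Ensure all buffered metrics are sent before the environment freezes"""
    # Implementation: flush the buffer your function handed to this process
    print('Flushing metrics buffer')
    # datadog.statsd.flush()


def close_db_connections():
    """Gracefully close connections before the environment is terminated"""
    print('Shutdown: closing database connections')
    # db_pool.close_all()


def main():
    extension_id = register()
    while True:
        event = next_event(extension_id)
        # INVOKE is delivered when an invocation starts, so this flush sends
        # whatever earlier invocations buffered; SHUTDOWN triggers the final flush
        flush_metrics_to_datadog()
        if event['eventType'] == 'SHUTDOWN':
            close_db_connections()
            break


if __name__ == '__main__':
    main()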

# serverless.yml — attach the extension
functions:
  brandApi:
    handler: handler.handler
    layers:
      - !Ref MetricsFlusherExtensionLayer  # your extension as a Lambda Layer

resources:
  Resources:
    MetricsFlusherExtensionLayer:
      Type: AWS::Lambda::LayerVersion
      Properties:
        LayerName: metrics-flusher-extension
        Content:
          S3Bucket: your-deployment-bucket
          S3Key: extensions/metrics-flusher.zip
        CompatibleRuntimes:
          - python3.12

6. Static Assets: Keep Them Out of Lambda

A common mistake when migrating existing applications to Lambda: routing static asset requests through your Lambda function.

❌ Bad architecture:
Client → API Gateway → Lambda → returns CSS/JS/images
         (every asset request consumes Lambda concurrency and costs money)

✅ Good architecture:
Client → CloudFront → S3 (static assets: CSS, JS, images)
Client → CloudFront → API Gateway → Lambda (API calls only)

# serverless.yml — separate static assets from API
resources:
  Resources:
    StaticAssetsBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: brand-platform-static

    CloudFrontDistribution:
      Type: AWS::CloudFront::Distribution
      Properties:
        DistributionConfig:
          Enabled: true
          Origins:
            - Id: StaticAssets
              DomainName: !GetAtt StaticAssetsBucket.DomainName
              S3OriginConfig: {}
            - Id: BrandApi
              DomainName: !Sub '${ApiGateway}.execute-api.${AWS::Region}.amazonaws.com'
              CustomOriginConfig:
                HTTPSPort: 443
                OriginProtocolPolicy: https-only
          CacheBehaviors:
            - PathPattern: '/api/*'
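              TargetOriginId: BrandApi
              # CloudFront allows only GET/HEAD by default; the API needs write methods too
              AllowedMethods: [GET, HEAD, OPTIONS, PUT, PATCH, POST, DELETE]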
              CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad  # CachingDisabled
              ViewerProtocolPolicy: https-only
          DefaultCacheBehavior:
            TargetOriginId: StaticAssets
            ViewerProtocolPolicy: https-only
            CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6   # CachingOptimized

Production Readiness Checklist

Before deploying a Lambda function to production, verify:

Architecture

[ ] Function granularity follows resource similarity + functional cohesion principles
[ ] Static assets served from S3 + CloudFront, not Lambda
[ ] Heavy operations (ML inference, video processing) in separate functions

Stateless Design

[ ] No persistent state stored in global variables across requests
[ ] Temp files use unique names (/tmp/{request_id}.ext) and are cleaned up
[ ] Connection pools and config caches are intentionally reused (not accidentally shared)

Cost Optimization

[ ] Memory configured based on measured usage (not default 128MB or maximum 10,240MB)
[ ] Timeout set to realistic maximum (not default 3s or maximum 15min)
[ ] Reserved concurrency set where appropriate to cap costs and protect downstream

Reliability

[ ] DLQ or failure destination configured for all async functions
[ ] Retry logic defined for all Task states (if using Step Functions)
[ ] CloudWatch alarms on error rate, throttles, and duration P99

Security

[ ] IAM role follows least-privilege (no * actions unless justified)
[ ] Secrets in Secrets Manager or Parameter Store, not environment variables
[ ] VPC only configured where genuinely needed

Observability

[ ] Structured logging (JSON) for CloudWatch Logs Insights queries
[ ] X-Ray tracing enabled for latency debugging
[ ] Custom metrics for business-level monitoring
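
For the structured logging item, a minimal sketch using only the standard library is shown below. `JsonFormatter`, `build_logger`, and the field names are assumptions for illustration; the point is that each log line is a single JSON object, so CloudWatch Logs Insights can filter on fields such as `brandId` directly.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the entry
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name="brand-api"):
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on warm invocations
        stream_handler = logging.StreamHandler()
        stream_handler.setFormatter(JsonFormatter())
        logger.addHandler(stream_handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root logger from emitting a second copy
    return logger


logger = build_logger()


def handler(event, context):
    logger.info(
        "brand processed",
        extra={"fields": {"requestId": context.aws_request_id, "brandId": event.get("brandId")}},
    )
    return {"statusCode": 200}
```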

Summary: The Mental Model

After six articles, here's the mental model that ties everything together:

Lambda function = a stateless, event-driven compute unit

Trigger     → defines invocation model (sync vs async)
             → determines retry behavior and error routing

Concurrency → scales automatically, but has limits
             → control with reserved concurrency + Provisioned Concurrency

State       → lives outside Lambda (DynamoDB, S3, ElastiCache)
             → execution environment reuse is a performance feature, not a state store

Cost        → memory × duration × invocations
             → right-size memory, minimize duration, avoid unnecessary invocations

Reliability → DLQ for async, Catch/Retry for Step Functions
             → idempotent handlers for at-least-once delivery

Deployment  → always use aliases, never $LATEST in production
             → canary + CloudWatch alarms for safe rollouts
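
The "idempotent handlers" line can be made concrete with a small sketch. It claims each event ID before doing any work and skips IDs it has already seen. Everything here is illustrative: the `eventId` field is an assumed contract with the producer, and the in-memory store is only a stand-in so the example runs locally. In a real function the claim would be a conditional write to an external store such as DynamoDB, because memory is not shared across instances.

```python
class InMemoryIdempotencyStore:
    """Stand-in for a durable store with an atomic 'insert if absent' operation."""

    def __init__(self):
        self._seen = set()

    def claim(self, event_id):
        """Return True the first time an ID is claimed, False afterwards."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


store = InMemoryIdempotencyStore()
side_effects = []  # records the work performed, for illustration only


def handler(event, context):
    event_id = event["eventId"]  # assumed: the producer supplies a stable ID
    if not store.claim(event_id):
        return {"status": "duplicate", "eventId": event_id}

    side_effects.append(event_id)  # the real, non-repeatable work goes here
    return {"status": "processed", "eventId": event_id}
```

Claiming before processing means a crash between the two steps drops the event, so a production version would also need to record completion or release the claim on failure.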

Serverless doesn't eliminate operational complexity — it relocates it. The infrastructure concerns move to AWS; the architectural concerns move to you. Understanding Lambda's internals — the cold start pipeline, the concurrency model, the invocation types, the scaling mechanics — is what lets you make those architectural decisions confidently.

Build small. Scale automatically. Fail gracefully.

This concludes the Serverless Internals: How AWS Lambda Really Works series.

If you found this series useful, consider following for more content on AWS architecture, LLM engineering, and production AI systems.
