Over the past six articles, we've covered how Lambda works internally — cold starts, triggers, scaling, traffic routing, automation, and workflow orchestration.
This final article is different. It's not about how Lambda works — it's about how to use it well.
These are the patterns, pitfalls, and architectural decisions that separate a Lambda function that works in a demo from one that runs reliably in production at scale.
1. Function Granularity: How Much Should One Function Do?
The "Function" in FaaS is misleading. In traditional programming, a function is a small, single-purpose unit of code. In serverless, a "function" is better understood as a deployable unit — it can be a single method, a complete feature, an entire module, or even a full web framework.
This flexibility creates a real architectural decision: how much should one Lambda function do?
The Two Failure Modes
Too granular (one Lambda per API endpoint):
- Hundreds of functions to manage and monitor
- Repeated configuration across functions (IAM roles, VPC settings, env vars)
- Higher cold start frequency — each function has its own warm pool
- Debugging distributed failures becomes complex
Too coarse (one Lambda for everything):
- Memory configuration is dominated by the most expensive operation
- High-memory functions cost more even for lightweight requests
- A single deployment updates unrelated functionality
- Concurrency limits affect all operations equally
Two Practical Principles
Principle 1: Resource Similarity
Group operations that have similar resource requirements into one function. Separate operations with dramatically different requirements.
Example: Brand API with 10 endpoints
├── 9 endpoints: 128MB memory, <100ms, read-only DynamoDB
└── 1 endpoint: 2048MB memory, 30s timeout, runs ML inference
→ Split into two functions:
brand-api-standard (128MB, handles 9 endpoints)
brand-api-ml (2048MB, handles ML endpoint)
# brand-api-standard/handler.py: lightweight CRUD operations
import boto3
import json

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('brands')

def handler(event, context):
    # With an HTTP API (payload format 2.0), routeKey holds the matched route
    # template, e.g. "GET /brand/{id}". rawPath holds the concrete path
    # ("/brand/123"), so it would never match a templated key.
    # get_brand, create_brand, etc. are defined elsewhere in this module.
    routes = {
        'GET /brand/{id}': get_brand,
        'POST /brand': create_brand,
        'PUT /brand/{id}': update_brand,
        'GET /brand/{id}/colors': get_colors,
        # ... 9 lightweight routes
    }
    handler_fn = routes.get(event['routeKey'])
    if not handler_fn:
        return {'statusCode': 404, 'body': json.dumps({'error': 'Not found'})}
    return handler_fn(event)
# brand-api-ml/handler.py: memory-intensive ML operations
import boto3
import torch  # heavy dependency, justified here

# Model loaded once during cold start, reused across warm invocations
model = None

def get_model():
    global model
    if model is None:
        s3 = boto3.client('s3')
        # Download and load model (load_logo_classifier is defined elsewhere)
        model = load_logo_classifier()
    return model

def handler(event, context):
    # Only this function pays the 2048MB memory cost
    classifier = get_model()
    return classify_logo(classifier, event)
Principle 2: Functional Cohesion
Don't bundle fundamentally different concerns into one function, even if their resource requirements are similar.
❌ Bad: one function handles both:
- WebSocket chat connections (stateful, long-lived)
- User registration/login (stateless, short-lived)
✅ Good: separate functions:
brand-chat-handler (WebSocket connections)
brand-auth-handler (registration, login, token refresh)
Cost Impact of Right-Sizing
Memory configuration directly multiplies your bill. Here's a concrete example:
Two functions, each invoked 10,000 times/day, ~100ms duration:
Function A: 1536MB (oversized)
Cost = (1536/1024) × (100/1000) × 10,000 × $0.0000166667/GB-s
= 1.5 × 0.1 × 10,000 × $0.0000166667
≈ $0.025/day → ~$0.75/month
Function B: 256MB (right-sized)
Cost = (256/1024) × (100/1000) × 10,000 × $0.0000166667/GB-s
= 0.25 × 0.1 × 10,000 × $0.0000166667
≈ $0.004/day → ~$0.125/month
Right-sizing saves ~83% on that function alone. Multiply across dozens of functions and the savings compound significantly.
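The arithmetic above can be reproduced with a small helper. This is a minimal sketch: the function name and the 30-day month are assumptions, the price per GB-second is the one quoted in the example (check current pricing for your region and architecture), and per-request charges and the free tier are ignored.

```python
# Price per GB-second copied from the example above; an assumption, not a lookup.
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_compute_cost(memory_mb, duration_ms, invocations_per_day, days=30):
    """Estimate monthly duration charges only (no request fees, no free tier)."""
    gb_seconds_per_day = (memory_mb / 1024) * (duration_ms / 1000) * invocations_per_day
    return gb_seconds_per_day * PRICE_PER_GB_SECOND * days


oversized = monthly_compute_cost(1536, 100, 10_000)
right_sized = monthly_compute_cost(256, 100, 10_000)
savings = 1 - right_sized / oversized

print(f"oversized:   ${oversized:.3f}/month")
print(f"right-sized: ${right_sized:.3f}/month")
print(f"savings:     {savings:.0%}")
```

Because cost is linear in memory for a fixed duration, the savings ratio depends only on the two memory settings (here 256/1536), not on the invocation count.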
# Use AWS Lambda Power Tuning to find the optimal memory setting
# https://github.com/alexcasalboni/aws-lambda-power-tuning
# Run it as a Step Functions workflow: it tests multiple memory configs
# and returns a cost/performance curve

# Quick manual approach: measure actual memory usage
def handler(event, context):
    # After execution, check CloudWatch Logs for:
    # "Max Memory Used: XXX MB"
    # Set your memory config to ~1.5x the max observed usage
    pass
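The manual approach can be partly automated by reading the REPORT lines Lambda writes to CloudWatch Logs. The sketch below is illustrative only: `suggest_memory_mb`, the 64MB rounding step, and the 128MB floor are assumptions, and the sample log lines are made up. Keep in mind that memory also determines the CPU share, so a lower setting can lengthen duration; treat the result as a starting point to verify, not a final answer.

```python
import math
import re

# Matches the memory fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")


def suggest_memory_mb(report_lines, headroom=1.5, floor_mb=128, step_mb=64):
    """Suggest ~headroom x the peak observed usage, rounded up to step_mb."""
    peaks = []
    for line in report_lines:
        match = REPORT_PATTERN.search(line)
        if match:
            peaks.append(int(match.group(2)))
    if not peaks:
        raise ValueError("no REPORT lines with memory data found")
    target = max(peaks) * headroom
    return max(floor_mb, math.ceil(target / step_mb) * step_mb)


# Hypothetical log lines for an oversized 1536MB function
sample = [
    "REPORT RequestId: 1111 Duration: 102.25 ms Billed Duration: 103 ms "
    "Memory Size: 1536 MB Max Memory Used: 141 MB",
    "REPORT RequestId: 2222 Duration: 98.10 ms Billed Duration: 99 ms "
    "Memory Size: 1536 MB Max Memory Used: 163 MB",
]
suggested = suggest_memory_mb(sample)
print(suggested)
```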
2. Stateless by Design (But Not Naive About It)
Lambda functions are stateless — execution environments are ephemeral and can be recycled at any time. But "stateless" doesn't mean "no shared state ever exists."
What Stateless Actually Means
# ❌ What stateless PREVENTS (don't do this):
request_counter = 0  # This WILL drift: multiple instances, recycled environments

def handler(event, context):
    global request_counter
    request_counter += 1  # unreliable across instances
    return {'count': request_counter}  # meaningless in distributed context

# ✅ What stateless REQUIRES (persist state externally):
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('request-counters')

def handler(event, context):
    # Atomic increment in DynamoDB, correct across all instances
    response = table.update_item(
        Key={'counterId': 'global'},
        UpdateExpression='ADD #count :inc',
        ExpressionAttributeNames={'#count': 'count'},
        ExpressionAttributeValues={':inc': 1},
        ReturnValues='UPDATED_NEW'
    )
    return {'count': int(response['Attributes']['count'])}
Instance Reuse: The "Stateful Stateless" Reality
Lambda recycles execution environments — but not immediately. An environment that handled a request may handle the next one too. This is a feature (warm starts, reusable connections) and a risk (stale state from previous requests).
# ✅ Good: leverage instance reuse for connection pooling
import json
import boto3

# Initialized ONCE per execution environment (not per request)
# Reused across warm invocations: this is intentional and correct
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('brands')
ssm_client = boto3.client('ssm')

# Cache config, valid for the lifetime of this environment
_config_cache = None

def get_config():
    global _config_cache
    if _config_cache is None:
        response = ssm_client.get_parameter(Name='/brand-api/config')
        _config_cache = json.loads(response['Parameter']['Value'])
    return _config_cache

def handler(event, context):
    config = get_config()  # SSM called once, then cached
    result = table.get_item(Key={'brandId': event['brandId']})
    return result.get('Item')
# ❌ Risk: stale temporary files from previous requests
import json

def handler(event, context):
    tmp_path = '/tmp/processing_output.json'
    # ❌ If a previous request created this file and it wasn't cleaned up,
    # this open() call will read stale data from the previous request
    with open(tmp_path, 'r') as f:
        return json.load(f)

# ✅ Safe: use unique filenames per request
import json
import os

def handler(event, context):
    # Unique filename per invocation, so no collision with previous requests
    tmp_path = f'/tmp/{context.aws_request_id}.json'
    try:
        # Process and write (process() is your own business logic)
        with open(tmp_path, 'w') as f:
            json.dump(process(event), f)
        with open(tmp_path, 'r') as f:
            return json.load(f)
    finally:
        # Always clean up: don't leave state for the next request
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
The rule: Use instance reuse intentionally (connection pools, config caches). Guard against it accidentally (temp files, global mutable state).
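One caveat with caching config for the lifetime of an environment is that a long-lived warm environment can keep serving an outdated value. A common mitigation is a time-to-live on the cache. The sketch below is an illustration under assumptions: `TTLCache`, the 300-second default, and the fake loader are all hypothetical; in a real function the loader would wrap the SSM call.

```python
import time


class TTLCache:
    """Caches one value per execution environment and reloads it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


# Demo with a fake loader and a fake clock instead of SSM and real time
calls = []

def fake_loader():
    calls.append(1)
    return {"version": len(calls)}

fake_now = [0.0]
config_cache = TTLCache(fake_loader, ttl_seconds=300, clock=lambda: fake_now[0])

first = config_cache.get()    # loads
second = config_cache.get()   # served from cache
fake_now[0] = 301.0
third = config_cache.get()    # TTL expired, reloads
print(first, third, len(calls))
```

The cache object lives at module level, so it is still created once per execution environment; the TTL only bounds how stale a warm environment can get.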
3. File Handling in Lambda
Lambda's stateless nature changes how you handle file uploads and storage. The traditional pattern of saving files to local disk doesn't work.
Why Local File Storage Fails
- /tmp is limited to 512MB by default (configurable up to 10GB of ephemeral storage)
- Files in /tmp are lost when the execution environment is recycled
- Multiple concurrent instances each have their own /tmp, with no shared filesystem
Pattern 1: S3 Pre-Signed URLs (Recommended for Large Files)
Never route large file uploads through Lambda. Instead, generate a pre-signed S3 URL and let the client upload directly to S3.
# generate_upload_url.py
import boto3
import json
import os
import uuid

s3 = boto3.client('s3')
UPLOAD_BUCKET = os.environ['UPLOAD_BUCKET']

def handler(event, context):
    """
    Client requests an upload URL.
    Lambda generates a pre-signed S3 PUT URL.
    Client uploads directly to S3; Lambda never touches the file bytes.
    """
    brand_id = event['pathParameters']['brandId']
    # queryStringParameters is missing or None when the request has no query string
    query = event.get('queryStringParameters') or {}
    content_type = query.get('contentType', 'image/png')

    # Validate content type
    allowed_types = {'image/png', 'image/jpeg', 'image/svg+xml', 'image/webp'}
    if content_type not in allowed_types:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': f'Unsupported type: {content_type}'})
        }

    # Generate unique S3 key
    file_id = str(uuid.uuid4())
    s3_key = f'uploads/{brand_id}/{file_id}'

    # Generate pre-signed URL (valid for 15 minutes).
    # The client must send the same Content-Type header when uploading.
    upload_url = s3.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': UPLOAD_BUCKET,
            'Key': s3_key,
            'ContentType': content_type,
        },
        ExpiresIn=900  # 15 minutes
    )
    return {
        'statusCode': 200,
        'body': json.dumps({
            'uploadUrl': upload_url,
            'fileId': file_id,
            's3Key': s3_key,
            'expiresIn': 900
        })
    }
Upload flow:
Client → GET /upload-url → Lambda → S3 pre-signed URL → Client
Client → PUT {file bytes} → S3 directly (Lambda not involved)
S3 ObjectCreated event → Lambda (process the uploaded file)
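The last step of this flow, the function triggered by the S3 event, might look like the sketch below. It assumes the standard S3 notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); `extract_uploaded_objects` is a hypothetical helper name, and the actual processing is left as a placeholder. Object keys arrive URL-encoded in the event, so they need decoding before being passed to S3 calls.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return (bucket, key) pairs for each ObjectCreated record in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (spaces arrive as "+")
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    objects = extract_uploaded_objects(event)
    for bucket, key in objects:
        print(f"processing s3://{bucket}/{key}")  # placeholder for real processing
    return {"processed": len(objects)}


# Hypothetical event for local testing
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "uploads"},
                "object": {"key": "uploads/brand+1/logo%2B2.png"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "uploads"}, "object": {"key": "old.png"}},
        },
    ]
}
result = handler(sample_event, None)
```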
Pattern 2: Base64 for Small Files (Avatars, Icons)
For small files (<1MB), you can accept Base64-encoded content through API Gateway:
# handle_small_upload.py
import base64
import boto3
import json
import os
import uuid

s3 = boto3.client('s3')

def handler(event, context):
    """Handle small file uploads via Base64 encoding"""
    body = json.loads(event['body'])
    file_data = base64.b64decode(body['fileData'])
    content_type = body['contentType']
    brand_id = body['brandId']

    # Size check: API Gateway accepts payloads up to 10MB, but Lambda's
    # synchronous invocation payload limit is 6MB and Base64 adds ~33%,
    # so keep files under 1MB for safety
    if len(file_data) > 1 * 1024 * 1024:
        return {
            'statusCode': 413,
            'body': json.dumps({'error': 'File too large. Use pre-signed URL for files >1MB'})
        }

    s3_key = f'logos/{brand_id}/{uuid.uuid4()}'
    s3.put_object(
        Bucket=os.environ['UPLOAD_BUCKET'],
        Key=s3_key,
        Body=file_data,
        ContentType=content_type
    )
    return {
        'statusCode': 200,
        'body': json.dumps({'s3Key': s3_key})
    }
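The size check in the handler runs after decoding, so an oversized payload is already fully in memory by then. A possible refinement is to compute the decoded size from the Base64 text first and reject early. This is a sketch under assumptions: `decoded_size` and `decode_if_small` are hypothetical helpers, and the formula assumes standard padded Base64.

```python
import base64
import binascii

MAX_BYTES = 1 * 1024 * 1024  # same 1MB limit as the handler above


def decoded_size(b64_text):
    """Decoded size of a padded Base64 string, computed without decoding it."""
    stripped = b64_text.strip()
    padding = len(stripped) - len(stripped.rstrip("="))
    return (len(stripped) * 3) // 4 - padding


def decode_if_small(b64_text, max_bytes=MAX_BYTES):
    """Decode only when the payload fits the limit; raise ValueError otherwise."""
    stripped = b64_text.strip()
    if decoded_size(stripped) > max_bytes:
        raise ValueError("file too large")
    try:
        return base64.b64decode(stripped, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid Base64 payload") from exc


print(decoded_size("aGVsbG8="))
print(decode_if_small("aGVsbG8="))
```

In the handler, a `ValueError` from the size check would map to the existing 413 response, and one from invalid input to a 400.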
4. WebSocket with Lambda
Lambda is stateless and request-driven — it can't maintain a persistent WebSocket connection itself. But you can implement WebSocket by combining API Gateway WebSocket API with Lambda.
API Gateway maintains the persistent connections; Lambda handles the messages.
Client ←──WebSocket──→ API Gateway ←──events──→ Lambda
(persistent) (manages connections) (stateless)
Three Lambda Handlers for WebSocket
# websocket_handlers.py
import boto3
import json
import os

dynamodb = boto3.resource('dynamodb')
connections_table = dynamodb.Table('websocket-connections')
apigw = boto3.client(
    'apigatewaymanagementapi',
    endpoint_url=os.environ['WEBSOCKET_ENDPOINT']  # e.g., https://abc123.execute-api.us-east-1.amazonaws.com/prod
)

def connect_handler(event, context):
    """
    Called when a client establishes a WebSocket connection.
    Store the connection ID for later message delivery.
    """
    connection_id = event['requestContext']['connectionId']
    # queryStringParameters is absent when the client connects without a query string
    query = event.get('queryStringParameters') or {}
    brand_id = query.get('brandId', 'anonymous')
    connections_table.put_item(Item={
        'connectionId': connection_id,
        'brandId': brand_id,
        'connectedAt': event['requestContext']['requestTimeEpoch']
    })
    print(f'Client connected: {connection_id} (brand: {brand_id})')
    return {'statusCode': 200}

def disconnect_handler(event, context):
    """Called when a client disconnects."""
    connection_id = event['requestContext']['connectionId']
    connections_table.delete_item(Key={'connectionId': connection_id})
    print(f'Client disconnected: {connection_id}')
    return {'statusCode': 200}

def message_handler(event, context):
    """Called when a client sends a message."""
    connection_id = event['requestContext']['connectionId']
    body = json.loads(event['body'])
    message_type = body.get('type')
    if message_type == 'subscribe_brand':
        brand_id = body['brandId']
        # Update subscription in DynamoDB
        connections_table.update_item(
            Key={'connectionId': connection_id},
            UpdateExpression='SET subscribedBrand = :brand',
            ExpressionAttributeValues={':brand': brand_id}
        )
        # Send acknowledgment back to this client
        send_message(connection_id, {
            'type': 'subscribed',
            'brandId': brand_id
        })
    return {'statusCode': 200}

def send_message(connection_id: str, message: dict):
    """Send a message to a specific connected client."""
    try:
        apigw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps(message).encode('utf-8')
        )
    except apigw.exceptions.GoneException:
        # Client disconnected: clean up stale connection
        connections_table.delete_item(Key={'connectionId': connection_id})

def broadcast_brand_update(brand_id: str, update_data: dict):
    """
    Broadcast a brand update to all subscribed clients.
    Called from other Lambda functions when brand data changes.
    """
    # Find all connections subscribed to this brand.
    # Note: a single scan call reads at most 1MB of the table, so large tables
    # need pagination (or, better, a secondary index on subscribedBrand).
    response = connections_table.scan(
        FilterExpression='subscribedBrand = :brand',
        ExpressionAttributeValues={':brand': brand_id}
    )
    message = {'type': 'brand_updated', 'brandId': brand_id, 'data': update_data}
    for item in response['Items']:
        send_message(item['connectionId'], message)
    print(f'Broadcast to {len(response["Items"])} clients for brand {brand_id}')
# serverless.yml
functions:
  wsConnect:
    handler: websocket_handlers.connect_handler
    events:
      - websocket:
          route: $connect
  wsDisconnect:
    handler: websocket_handlers.disconnect_handler
    events:
      - websocket:
          route: $disconnect
  wsMessage:
    handler: websocket_handlers.message_handler
    events:
      - websocket:
          route: $default
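The broadcast helper in the handlers calls scan once, and a single DynamoDB scan call reads at most 1MB of data before returning a LastEvaluatedKey. With many connections, some subscribers could be missed. Below is a minimal sketch of a paginated lookup; `find_subscribers` is a hypothetical helper and `FakeTable` exists only so the loop can be exercised without AWS. At larger scale, a secondary index keyed on the subscribed brand would avoid scanning altogether.

```python
def find_subscribers(table, brand_id):
    """Collect connection IDs across all scan pages for one brand."""
    connection_ids = []
    kwargs = {
        "FilterExpression": "subscribedBrand = :brand",
        "ExpressionAttributeValues": {":brand": brand_id},
    }
    while True:
        page = table.scan(**kwargs)
        connection_ids.extend(item["connectionId"] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return connection_ids
        # Resume the scan where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that serves pre-built pages (local testing only)."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = []

    def scan(self, **kwargs):
        self.calls.append(dict(kwargs))
        index = kwargs.get("ExclusiveStartKey", {}).get("page", 0)
        return self._pages[index]


fake_table = FakeTable([
    {"Items": [{"connectionId": "conn-a"}], "LastEvaluatedKey": {"page": 1}},
    {"Items": [{"connectionId": "conn-b"}]},
])
subscribers = find_subscribers(fake_table, "brand-1")
print(subscribers)
```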
5. Lambda Extensions: Graceful Lifecycle Management
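AWS Lambda Extensions let you run code alongside your function, as a separate process in the same execution environment, for flushing metrics, releasing resources, and handling graceful shutdown. An extension registers with the Extensions API for INVOKE and SHUTDOWN events; the SHUTDOWN event is the closest Lambda gets to a Kubernetes preStop lifecycle hook.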
The Problem They Solve
# ❌ Without extensions: metrics may be lost
import datadog

def handler(event, context):
    result = process_brand(event)
    # This metric send is async: if Lambda freezes the environment
    # immediately after handler returns, the metric may never arrive
    datadog.statsd.increment('brand.processed')
    return result
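# ✅ With a Lambda extension: flush metrics on lifecycle events
# extensions/metrics_flusher.py: runs as a separate process alongside your function
# (packaged in a layer as an executable under /opt/extensions/)
# Extensions do not expose HTTP hooks; they register with the Extensions API
# and then poll it for INVOKE and SHUTDOWN events.
import json
import os
import urllib.request

BASE_URL = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/2020-01-01/extension"

def register():
    """Register for INVOKE and SHUTDOWN events; returns the extension identifier."""
    request = urllib.request.Request(
        f'{BASE_URL}/register',
        data=json.dumps({'events': ['INVOKE', 'SHUTDOWN']}).encode('utf-8'),
        headers={'Lambda-Extension-Name': os.path.basename(__file__)},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        return response.headers['Lambda-Extension-Identifier']

def next_event(extension_id):
    """Block until Lambda delivers the next lifecycle event."""
    request = urllib.request.Request(
        f'{BASE_URL}/event/next',
        headers={'Lambda-Extension-Identifier': extension_id},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

def flush_metrics_to_datadog():
    """Send whatever metrics this extension process has buffered."""
    # Implementation: flush the buffer held by this extension
    print('Flushing metrics buffer')

def close_connections():
    """Release resources owned by this extension before the environment goes away."""
    print('Shutdown: closing connections')

def main():
    extension_id = register()
    while True:
        event = next_event(extension_id)
        if event['eventType'] == 'SHUTDOWN':
            # Last chance before the environment is terminated
            flush_metrics_to_datadog()
            close_connections()
            break
        # INVOKE arrives when an invocation starts. Lambda does not freeze the
        # environment until the runtime and every extension have asked for the
        # next event, so this flush completes before the freeze.
        flush_metrics_to_datadog()

if __name__ == '__main__':
    main()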
# serverless.yml: attach the extension
functions:
  brandApi:
    handler: handler.handler
    layers:
      - !Ref MetricsFlusherExtensionLayer  # your extension as a Lambda Layer
resources:
  Resources:
    MetricsFlusherExtensionLayer:
      Type: AWS::Lambda::LayerVersion
      Properties:
        LayerName: metrics-flusher-extension
        Content:
          S3Bucket: your-deployment-bucket
          S3Key: extensions/metrics-flusher.zip
        CompatibleRuntimes:
          - python3.12
6. Static Assets: Keep Them Out of Lambda
A common mistake when migrating existing applications to Lambda: routing static asset requests through your Lambda function.
❌ Bad architecture:
Client → API Gateway → Lambda → returns CSS/JS/images
(every asset request consumes Lambda concurrency and costs money)
✅ Good architecture:
Client → CloudFront → S3 (static assets: CSS, JS, images)
Client → CloudFront → API Gateway → Lambda (API calls only)
# serverless.yml: separate static assets from API
resources:
  Resources:
    StaticAssetsBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: brand-platform-static
    CloudFrontDistribution:
      Type: AWS::CloudFront::Distribution
      Properties:
        DistributionConfig:
          Enabled: true  # required property
          Origins:
            - Id: StaticAssets
              DomainName: !GetAtt StaticAssetsBucket.DomainName
              # The bucket also needs a policy that lets CloudFront read from it
              S3OriginConfig: {}
            - Id: BrandApi
              DomainName: !Sub '${ApiGateway}.execute-api.${AWS::Region}.amazonaws.com'
              CustomOriginConfig:
                HTTPSPort: 443
                OriginProtocolPolicy: https-only
          CacheBehaviors:
            - PathPattern: '/api/*'
              TargetOriginId: BrandApi
              CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad  # CachingDisabled
              ViewerProtocolPolicy: https-only
          DefaultCacheBehavior:
            TargetOriginId: StaticAssets
            ViewerProtocolPolicy: https-only
            CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6  # CachingOptimized
Production Readiness Checklist
Before deploying a Lambda function to production, verify:
Architecture
- [ ] Function granularity follows resource similarity + functional cohesion principles
- [ ] Static assets served from S3 + CloudFront, not Lambda
- [ ] Heavy operations (ML inference, video processing) in separate functions
Stateless Design
- [ ] No persistent state stored in global variables across requests
- [ ] Temp files use unique names (/tmp/{request_id}.ext) and are cleaned up
- [ ] Connection pools and config caches are intentionally reused (not accidentally shared)
Cost Optimization
- [ ] Memory configured based on measured usage (not default 128MB or maximum 10,240MB)
- [ ] Timeout set to realistic maximum (not default 3s or maximum 15min)
- [ ] Reserved concurrency set where appropriate to cap costs and protect downstream
Reliability
- [ ] DLQ or failure destination configured for all async functions
- [ ] Retry logic defined for all Task states (if using Step Functions)
- [ ] CloudWatch alarms on error rate, throttles, and duration P99
Security
- [ ] IAM role follows least-privilege (no * actions unless justified)
- [ ] Secrets in Secrets Manager or Parameter Store, not environment variables
- [ ] VPC only configured where genuinely needed
Observability
- [ ] Structured logging (JSON) for CloudWatch Logs Insights queries
- [ ] X-Ray tracing enabled for latency debugging
- [ ] Custom metrics for business-level monitoring
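As an illustration of the structured-logging item, the sketch below emits one JSON object per log line using only the standard library. `log_event` and the field names are assumptions chosen for this example, not a prescribed schema; a logging library with a JSON formatter would serve the same purpose.

```python
import json
import time


def log_event(level, message, **fields):
    """Print one JSON object per line so individual fields can be queried later."""
    record = {
        "level": level,
        "message": message,
        "timestamp": round(time.time(), 3),
        **fields,
    }
    line = json.dumps(record, default=str)  # default=str keeps odd types loggable
    print(line)
    return line


# Hypothetical usage inside a handler
line = log_event("INFO", "brand processed", brandId="b-123", durationMs=42)
```

With logs in this shape, a Logs Insights query can filter on a field such as `brandId` directly instead of parsing free text.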
Summary: The Mental Model
After six articles, here's the mental model that ties everything together:
Lambda function = a stateless, event-driven compute unit
Trigger → defines invocation model (sync vs async)
→ determines retry behavior and error routing
Concurrency → scales automatically, but has limits
→ control with reserved concurrency + Provisioned Concurrency
State → lives outside Lambda (DynamoDB, S3, ElastiCache)
→ execution environment reuse is a performance feature, not a state store
Cost → memory × duration × invocations
→ right-size memory, minimize duration, avoid unnecessary invocations
Reliability → DLQ for async, Catch/Retry for Step Functions
→ idempotent handlers for at-least-once delivery
Deployment → always use aliases, never $LATEST in production
→ canary + CloudWatch alarms for safe rollouts
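The "idempotent handlers for at-least-once delivery" line deserves a concrete shape. One common approach is to record each message ID with a conditional write and skip IDs already seen. The sketch below is an illustration under assumptions: `InMemoryStore` is a hypothetical stand-in for a table with conditional writes (for example a DynamoDB put guarded by `attribute_not_exists`), and the event mimics the SQS record shape with a `messageId` field.

```python
class InMemoryStore:
    """Hypothetical stand-in for a datastore that supports put-if-absent."""

    def __init__(self):
        self._seen = set()

    def put_if_absent(self, key):
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def make_idempotent(store, process):
    """Wrap process() so each messageId is handled at most once."""
    def handler(event, context=None):
        results = []
        for record in event["Records"]:
            message_id = record["messageId"]
            if not store.put_if_absent(message_id):
                results.append((message_id, "skipped"))  # duplicate delivery
                continue
            process(record)
            results.append((message_id, "processed"))
        return results
    return handler


processed = []
handler = make_idempotent(InMemoryStore(), processed.append)
event = {"Records": [{"messageId": "m1", "body": "hello"}]}

first_run = handler(event)
second_run = handler(event)  # simulated redelivery of the same message
print(first_run, second_run)
```

This version claims the ID before processing, so a crash mid-processing would leave the message marked as seen. Whether to claim first or record completion afterwards depends on whether duplicates or missed messages are the worse failure for the workload.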
Serverless doesn't eliminate operational complexity — it relocates it. The infrastructure concerns move to AWS; the architectural concerns move to you. Understanding Lambda's internals — the cold start pipeline, the concurrency model, the invocation types, the scaling mechanics — is what lets you make those architectural decisions confidently.
Build small. Scale automatically. Fail gracefully.
This concludes the Serverless Internals: How AWS Lambda Really Works series.