TL;DR: Here's the math that pushed me off EC2: a t3.micro instance runs you roughly $8-10/month whether it serves zero requests or ten thousand.
📖 Reading time: ~30 min
What's in this article
- The Problem: Your Flask App Works Locally, Now What?
- How Lambda Actually Runs Your Flask App (Understand This First)
- Prerequisites and Local Setup
- Step 1 — Build a Minimal Flask App Worth Deploying
- Step 2 — Package Dependencies Correctly (Where Most Tutorials Fail)
- Step 3 — Write the SAM Template (template.yaml)
- Step 4 — Deploy and Test
- Gotchas I Hit That Aren't in the AWS Docs
The Problem: Your Flask App Works Locally, Now What?
Here's the math that pushed me off EC2: a t3.micro instance runs you roughly $8-10/month whether it serves zero requests or ten thousand. Lambda charges $0.20 per million invocations plus compute time billed in 1ms increments. If your API gets sporadic traffic — a webhook receiver, an internal tool, a side project that spikes once a week — you're essentially paying an idle tax on EC2. I had a Flask API sitting on a t3.small handling maybe 500 requests a day. Moved it to Lambda, my monthly bill dropped to cents. That's not a hypothetical — that's the bill I actually saw.
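If you want to sanity-check that math yourself, the billing model reduces to two terms: per-request and per-GB-second. A back-of-envelope sketch (duration and memory are assumed values, and the prices are the public us-east-1 rates at the time of writing, so they may drift):

```python
# Back-of-envelope version of the bill comparison above.
requests_per_month = 500 * 30          # the article's 500 req/day workload
price_per_million_requests = 0.20
price_per_gb_second = 0.0000166667
memory_gb = 0.5                        # assumed: a 512MB function
avg_duration_s = 0.2                   # assumed: 200ms average duration

request_cost = requests_per_month / 1_000_000 * price_per_million_requests
compute_cost = requests_per_month * avg_duration_s * memory_gb * price_per_gb_second
monthly = request_cost + compute_cost
print(f"${monthly:.2f}/month")  # literal cents, vs ~$8-10 for an idle t3.micro
```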
The catch nobody tells you upfront: Flask expects a persistent server loop. It binds to a port, listens, handles requests one after another. Lambda doesn't work that way. Lambda receives an event dict, runs your handler function, returns a response, and shuts down. There's no port, no persistent socket, no app.run(). So you can't just upload your Flask app and call it done — you need something in between that translates Lambda's event format into WSGI-compatible HTTP requests that Flask actually understands. That adapter is Mangum, and it's the piece most tutorials either skip over or treat as magic. It's not magic, and understanding what it does will save you hours of debugging.
Here's exactly what this guide walks through, in order:
- Wrapping your existing Flask app with Mangum — three lines of code, but the config matters
- Packaging your dependencies correctly so Lambda doesn't throw ModuleNotFoundError at runtime
- Deploying with AWS SAM (not the console, not raw CloudFormation — SAM is the right tool here)
- Handling the gotchas I hit that aren't in the official docs: binary responses, cold starts with large packages, and the IAM permissions that silently fail
One honest trade-off before you commit to this path: Lambda has a 15-minute execution limit and a 6MB response payload cap for synchronous invocations (API Gateway adds its own 10MB payload limit on top). If your Flask app streams large files, processes long-running jobs, or maintains WebSocket connections — Lambda is the wrong tool. Full stop. But for REST APIs, webhooks, and request-response patterns under a few seconds of compute, it's genuinely excellent and the operational overhead is close to zero. No patching, no SSH, no "who restarted nginx at 3am" incidents.
The thing that caught me off guard the first time was the cold start behavior with heavier Flask apps. A fresh Lambda container — one that hasn't been invoked recently — has to import your entire application before handling the first request. With a lean Flask app, that's 200-400ms. Add SQLAlchemy, boto3, and a few other dependencies and you can hit 2-3 seconds on cold starts. That's not acceptable for a user-facing API. There are real ways to mitigate it (provisioned concurrency, keeping your dependency tree lean, lazy imports) and I'll show you exactly where in the deployment config to set those up.
How Lambda Actually Runs Your Flask App (Understand This First)
Flask's built-in dev server — the one that starts when you call app.run() — is completely irrelevant on Lambda. Lambda doesn't start a long-running server process. It calls a single Python function. That's it. Your entire mental model of "a server listening on port 5000" needs to go out the window before you write a single line of deployment config.
What Lambda actually expects is a handler function with this exact signature:
def handler(event, context):
    # event is a dict — API Gateway packs the HTTP request in here
    # context has runtime metadata (time remaining, function name, etc.)
    return {
        "statusCode": 200,
        "body": "Hello from Lambda"
    }
API Gateway translates an incoming HTTP request into that event dict — it contains the path, method, headers, query params, body, everything. Your job is to return a dict with at minimum a statusCode and a body. Flask has no idea this format exists. That's where Mangum comes in. Mangum is an ASGI adapter that sits between Lambda's handler interface and your application. One wrinkle: Flask speaks WSGI, not ASGI, and Mangum no longer ships WSGI support of its own, so you bridge the two with the WsgiToAsgi wrapper from asgiref. You wrap your Flask app like this:
from asgiref.wsgi import WsgiToAsgi
from flask import Flask
from mangum import Mangum

app = Flask(__name__)

@app.route("/")
def index():
    return {"message": "ok"}

# Mangum speaks ASGI; WsgiToAsgi bridges Flask's WSGI interface to it
handler = Mangum(WsgiToAsgi(app))
Mangum takes the raw event and context from Lambda, translates them into a request your Flask app can process (through the WSGI-to-ASGI bridge), runs the app against that synthetic request, then translates Flask's response back into the API Gateway response format. The thing that caught me off guard the first time was that Mangum handles both API Gateway v1 (REST APIs) and v2 (HTTP APIs) payload formats — and they're different. If you're getting 502 errors with no useful log output, the first thing I'd check is whether your API Gateway payload format version matches what Mangum expects. You can also pass config explicitly, e.g. Mangum(app, lifespan="off", api_gateway_base_path="/").
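If "translates the event into a request" still feels like magic, here's a toy version of what an adapter does for a v1-style event. This is illustrative only (real events carry far more fields, and toy_wsgi_handler is a hypothetical name); use Mangum in practice:

```python
import io

def toy_wsgi_handler(event, context, wsgi_app):
    """Translate a minimal API Gateway v1-style event into one WSGI call."""
    # Build the WSGI environ dict from the event fields
    environ = {
        "REQUEST_METHOD": event["httpMethod"],
        "PATH_INFO": event["path"],
        "QUERY_STRING": "",
        "SERVER_NAME": "lambda",
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO((event.get("body") or "").encode()),
        "wsgi.errors": io.StringIO(),
        "wsgi.version": (1, 0),
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    captured = {}

    def start_response(status, headers):
        # WSGI hands us "200 OK" plus a header list; capture both
        captured["status"] = int(status.split()[0])
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    # Package the WSGI response back into the API Gateway response shape
    return {
        "statusCode": captured["status"],
        "headers": captured["headers"],
        "body": body.decode(),
    }
```

Mangum does the same dance with far more care (binary bodies, multi-value headers, both payload versions), which is why you use it instead of rolling your own.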
Cold starts deserve a concrete explanation rather than hand-waving. When Lambda hasn't run your function recently, it spins up a new execution environment from scratch: it pulls your deployment package, unzips it, starts the Python runtime, then runs all your top-level imports. Every import statement at module level adds latency here. A bare Flask app with Mangum cold-starts in roughly 400-800ms. Add SQLAlchemy, Boto3, Pandas, and a few other libraries and you can push past 3-4 seconds easily. I've seen teams accidentally import their entire ORM and connection pool at the top of the file, then wonder why every first request feels broken. The fix is lazy imports — defer anything heavy until it's actually needed inside the handler, not at module load time.
The execution environment has one more constraint that bites people: almost everything in your deployment package is mounted read-only. If your Flask app tries to write a log file, cache something to disk, or generate a temp file somewhere other than /tmp, it will crash with a permission error. /tmp is the only writable directory, and you get 512MB there by default (configurable up to 10GB if you need it). That limit applies per-execution-environment — not per invocation. If Lambda reuses a warm container, whatever you wrote to /tmp last time might still be there, which can be a feature or a bug depending on what you're doing. Use it for caching things like downloaded model weights or compiled templates, but never assume it's empty at the start of a request.
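Here's that /tmp caching pattern as a minimal sketch. load_cached and the fetch callable are hypothetical names; the check-before-download is the part that matters:

```python
import os

CACHE_PATH = "/tmp/model_weights.bin"  # /tmp is the only writable directory

def load_cached(fetch, path=CACHE_PATH):
    """Return cached bytes from /tmp, downloading once per container."""
    # A warm container may still have the file from a previous invocation,
    # so check before re-downloading — but never assume it's there either.
    if not os.path.exists(path):
        data = fetch()  # hypothetical callable, e.g. an S3 download
        with open(path, "wb") as f:
            f.write(data)
    with open(path, "rb") as f:
        return f.read()
```

On a cold container the fetch runs once; every warm invocation after that reads straight from disk.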
Prerequisites and Local Setup
The SAM CLI install is the first place people trip up. Do not install it via pip — the PyPI version lags months behind the official releases and you'll hit bugs that are already fixed upstream. On Mac, use Homebrew:
brew install aws-sam-cli
On Windows, grab the MSI installer from the official GitHub releases page. After installing, verify you're on a recent version:
sam --version
# Should output something like: SAM CLI, version 1.120.0
If sam --version reports a build from years back, uninstall whatever you have and start fresh. I've wasted hours debugging sam local invoke failures that turned out to be version mismatches between the CLI and the CloudFormation transform.
The Tool Stack You Actually Need
- Python 3.11+ — Lambda supports 3.12 now. Use 3.11 minimum; it's noticeably faster than 3.9 for cold starts due to interpreter improvements. Avoid 3.8 and 3.9, they're near end-of-life on Lambda.
- AWS CLI v2 — not v1. Run aws --version and confirm it says aws-cli/2.x.x. The v1 and v2 config formats differ slightly and you'll get confusing auth errors if you mix them.
- AWS SAM CLI — this handles packaging, local emulation, and deployment. It wraps CloudFormation under the hood.
- Docker Desktop — required for sam local invoke and sam local start-api. SAM spins up a Lambda-like container locally. Without Docker running, local testing just fails silently in some versions.
- An AWS account with IAM permissions covering Lambda, API Gateway, S3, CloudFormation, and IAM role creation. Note that PowerUserAccess alone won't cut it: it blocks IAM role creation, which SAM needs for the execution role. Pair it with a policy that allows creating and attaching roles while you're learning the workflow, then scope it down for production.
Configuring AWS Credentials
Run aws configure and provide your IAM user's access key ID, secret, default region (use us-east-1 if you have no preference — it gets new features first and has the best pricing documentation), and output format (json). This writes to ~/.aws/credentials and ~/.aws/config. One thing that caught me off guard early on: SAM uses the same credential chain as the CLI, so if you have multiple AWS profiles, be explicit — prefix every SAM command with AWS_PROFILE=myprofile sam deploy or set export AWS_PROFILE=myprofile in your shell session. Forgetting this deploys to the wrong account and you won't notice until you check the console and see nothing there.
aws configure
# AWS Access Key ID: AKIA...
# AWS Secret Access Key: ****
# Default region name: us-east-1
# Default output format: json
# Verify it works
aws sts get-caller-identity
If get-caller-identity returns your account ID and IAM user ARN, you're connected. If it throws an auth error, fix credentials before touching SAM — everything downstream depends on this working cleanly.
Project Structure Before You Deploy
Get your local structure right before touching any AWS resources. Here's what you're aiming for:
my-flask-app/
├── template.yaml # SAM/CloudFormation template
├── samconfig.toml # SAM deployment config (auto-generated on first deploy)
├── app/
│ ├── __init__.py
│ ├── app.py # Flask app entry point
│ └── requirements.txt # App dependencies only
└── events/
└── event.json # Sample API Gateway event for local testing
Keep your requirements.txt inside the app/ directory, not the project root. SAM's build process looks for it relative to the CodeUri you specify in template.yaml, and if you put it at the root you'll either package unnecessary dev dependencies or watch the build silently skip your deps. The events/ folder is optional but useful — store sample API Gateway payloads there so you can test with sam local invoke MyFunction -e events/event.json without constructing the JSON by hand every time.
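For reference, here's the rough shape of a minimal HTTP API (payload v2) event you might drop into events/event.json. This is a sketch; real events carry many more fields, and any values here are placeholders:

```json
{
  "version": "2.0",
  "routeKey": "ANY /{proxy+}",
  "rawPath": "/health",
  "rawQueryString": "",
  "headers": { "accept": "application/json" },
  "requestContext": {
    "http": { "method": "GET", "path": "/health", "sourceIp": "127.0.0.1" }
  },
  "body": null,
  "isBase64Encoded": false
}
```

You can also generate realistic payloads with sam local generate-event and tweak the path and method from there.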
Step 1 — Build a Minimal Flask App Worth Deploying
The line that makes Flask work on Lambda is handler = Mangum(WsgiToAsgi(app)). Mangum translates API Gateway's event format into a request Flask understands, then packages the response back up — with asgiref's WsgiToAsgi wrapper bridging Flask's WSGI interface to Mangum's ASGI side, since Mangum itself only speaks ASGI. Without this pair, Lambda gets an HTTP event object and your Flask app has no idea what to do with it. I wasted two hours trying to write that translation layer myself before I found Mangum — don't repeat that mistake.
Here's the actual app.py you're deploying. Two routes: one trivial, one that hits a database, because that's what forces your packaging to matter:
import os

import psycopg2
from asgiref.wsgi import WsgiToAsgi
from flask import Flask, jsonify
from mangum import Mangum

app = Flask(__name__)

@app.route("/health")
def health():
    return jsonify({"status": "ok"})

@app.route("/users/<int:user_id>")
def get_user(user_id):
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    cur = conn.cursor()
    cur.execute("SELECT id, email FROM users WHERE id = %s", (user_id,))
    row = cur.fetchone()
    conn.close()
    if not row:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": row[0], "email": row[1]})

handler = Mangum(WsgiToAsgi(app))
Notice the DATABASE_URL comes from an environment variable, not a config file checked into git. Lambda lets you set env vars directly in the function configuration — that's where secrets live, not in your deployment package. If you're hardcoding credentials in app.py, stop and fix that before anything else.
Your requirements.txt should look exactly like this — pinned, nothing extra:
Flask==3.0.3
mangum==0.17.0
asgiref==3.8.1
psycopg2-binary==2.9.9
Pin everything. Lambda's runtime environment doesn't auto-update your dependencies, but the thing that catches people is drift in the other direction — you update locally without thinking, re-deploy, and suddenly you're running Flask 3.1 in Lambda with a requirements.txt that says 3.0.3. Lambda will not warn you. It just runs whatever's in the package you uploaded. Explicit version pins mean your local environment and Lambda are identical, which is the only state worth being in.
The most expensive mistake I see in Lambda deployments is including boto3 and botocore in requirements.txt. Both are already baked into the Lambda runtime — you're adding roughly 30MB to your zip file for zero benefit. That matters because Lambda has a 250MB unzipped deployment limit, and psycopg2 plus your other dependencies will eat into that headroom fast. Keep boto3 out of the file entirely. If your linter complains that it's not declared, add a comment explaining why — future you will appreciate it.
- Do include: Flask, Mangum, your DB driver, any HTTP clients your app actually calls
- Do not include: boto3, botocore, awscli — these are runtime freebies
- Do not include: pytest, black, mypy, or any dev tooling — those are for your local environment, not the function
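Pin drift is easy to catch mechanically. Here's a small sketch (check_pins is a hypothetical helper that only understands bare name==version lines) comparing your local environment against requirements.txt before you build:

```python
from importlib import metadata

def check_pins(requirements_path):
    """Report packages whose installed version differs from the pin."""
    mismatches = []
    with open(requirements_path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and anything that isn't a simple pin
            if not line or line.startswith("#") or "==" not in line:
                continue
            name, pinned = line.split("==", 1)
            try:
                installed = metadata.version(name)
            except metadata.PackageNotFoundError:
                mismatches.append(f"{name}: pinned {pinned} but not installed")
                continue
            if installed != pinned:
                mismatches.append(f"{name}: pinned {pinned}, installed {installed}")
    return mismatches
```

Run it in CI before sam build and fail the pipeline on any mismatch; it's a five-line guard against the "Flask 3.1 locally, 3.0.3 in the zip" failure mode.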
Step 2 — Package Dependencies Correctly (Where Most Tutorials Fail)
Most Lambda deployment tutorials are written on Macs. The author zips up their site-packages, pushes it, and everything works — because they're using pure-Python libraries. Then you add psycopg2 or Pillow and suddenly Lambda throws ImportError: /lib64/libc.so.6: version GLIBC_2.28 not found at you. This isn't a config problem. It's a binary compatibility problem. Your Mac compiled native extensions against macOS system libraries, and Lambda runs Amazon Linux 2. Those binaries are not interchangeable.
The correct fix is sam build --use-container. This spins up a Docker container that mirrors the exact Lambda execution environment — same OS, same glibc version, same architecture — and runs pip install inside it. Your compiled extensions get built against the right libraries the first time.
# Make sure Docker is running first, then:
sam build --use-container
The first run is slow because SAM pulls the Amazon Linux 2 base image. Subsequent builds are faster since Docker caches the layer. I switched to this after wasting two hours debugging a cryptography package failure in staging — the kind of failure that doesn't reproduce locally no matter what you try. The Docker approach costs you maybe 90 seconds per build. It's worth every second.
If you genuinely can't use Docker — some CI environments block it, some developers refuse to install it — there's a workaround using pip's --platform flag:
pip install -r requirements.txt \
-t ./package \
--platform manylinux2014_x86_64 \
--only-binary=:all:
This tells pip to download pre-built wheels compiled for manylinux2014, which Lambda supports. It works reliably for packages that publish manylinux wheels on PyPI — numpy, pandas, Pillow all do. Where it breaks down: any package that only ships source distributions, or obscure C extensions that don't publish manylinux wheels. You'll get a No matching distribution found error and have to fall back to the container approach anyway. Use this as a fallback, not a first choice.
Before you even attempt a deployment, check your package size. Lambda's hard limits are 50MB zipped and 250MB unzipped for a direct upload — if you're pushing via S3, the zipped limit goes up to 250MB, but unzipped still caps at 250MB. After sam build finishes, run this before touching sam deploy:
du -sh .aws-sam/build/YourFunctionName/
If you're already over 100MB unzipped, you have a problem before you've even started. The usual offenders are boto3 (don't bundle it — Lambda includes it in the runtime), pandas (swap for polars if you can, it's smaller and faster), or accidentally including dev dependencies. Strip dev packages from your requirements.txt, and if you're still pushing 200MB+, look into Lambda Layers to offload heavy dependencies like numpy or the AWS SDK into a separately managed layer that gets reused across functions.
Step 3 — Write the SAM Template (template.yaml)
The template.yaml is the entire brain of your deployment — get it wrong and you'll spend an afternoon debugging why your function deploys but returns 502s. Here's a full working template you can drop into your project root right now:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Flask app on Lambda

Globals:
  Function:
    Timeout: 29
    Runtime: python3.11
    MemorySize: 512
    Environment:
      Variables:
        DATABASE_URL: '{{resolve:ssm:/myapp/prod/database_url}}'
        SECRET_KEY: '{{resolve:ssm:/myapp/prod/secret_key}}'

Resources:
  FlaskFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      CodeUri: app/
      Description: Flask WSGI application
      Events:
        Api:
          Type: HttpApi
          Properties:
            Path: /{proxy+}
            Method: ANY
        RootPath:
          Type: HttpApi
          Properties:
            Path: /
            Method: ANY

Outputs:
  ApiUrl:
    Description: API Gateway endpoint URL
    Value: !Sub "https://${ServerlessHttpApi}.execute-api.${AWS::Region}.amazonaws.com/"
Two things in that template will bite you if you miss them. First, you need both event entries — /{proxy+} catches everything like /api/users/123, but the bare / root path won't match that pattern. I lost 45 minutes to this once because my health check endpoint was at / and kept returning 403. Second, Handler: app.handler means Lambda is looking for a variable called handler inside your app.py file — not a function, a variable. That's the Mangum adapter instance you created in the previous step. If you named your file main.py or your adapter app_handler, update this accordingly.
On memory: don't let anyone talk you into 128MB to save pennies. Lambda's CPU allocation is proportional to memory — 128MB gets you roughly 1/16th of a vCPU. The practical result is that cold starts that take ~800ms at 512MB can stretch past 3 seconds at 128MB, because Python itself is slow to initialize at low CPU allocation. The cost difference between 128MB and 512MB is almost nothing at realistic traffic volumes. Lambda pricing is per GB-second, so 512MB running for 100ms costs the same as 128MB running for 400ms. Pick 512MB, move on.
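The GB-second claim above checks out arithmetically. A quick sketch, using the approximate public x86 rate at the time of writing:

```python
# Lambda compute pricing is memory (GB) x duration (s) x per-GB-second rate.
PRICE_PER_GB_SECOND = 0.0000166667  # approximate x86 rate, us-east-1

def invocation_cost(memory_mb, duration_ms):
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND

# 512MB for 100ms and 128MB for 400ms burn the same GB-seconds,
# so the faster configuration costs the same per request
print(invocation_cost(512, 100), invocation_cost(128, 400))
```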
The {{resolve:ssm:...}} syntax is CloudFormation's native way to pull values from Parameter Store at deploy time. One caveat worth knowing: plain String parameters resolve fine, but CloudFormation will not inject SecureString values into Lambda environment variables (the {{resolve:ssm-secure:...}} form isn't supported for that resource), so for SecureStrings you either fetch the value at runtime with boto3 or accept String storage for non-critical config. Either way, you need to actually put your secrets in Parameter Store first. Do it once with the AWS CLI:
# Store as SecureString so it's encrypted at rest
aws ssm put-parameter \
--name "/myapp/prod/database_url" \
--value "postgresql://user:pass@host:5432/dbname" \
--type SecureString
aws ssm put-parameter \
--name "/myapp/prod/secret_key" \
--value "your-actual-secret-key-here" \
--type SecureString
The gotcha here is IAM permissions. Deploy-time resolution runs under the credentials of whoever executes sam deploy, but if your function reads parameters at runtime, its execution role needs ssm:GetParameter on those specific paths — and for SecureString values, kms:Decrypt on the KMS key used to encrypt them. SAM creates an execution role automatically, but it won't have these permissions unless you add them. Add this block inside your FlaskFunction properties:
Policies:
  - SSMParameterReadPolicy:
      ParameterName: "myapp/prod/*"
SAM has a managed policy connector for this exact use case — SSMParameterReadPolicy is a SAM policy template that expands into the correct IAM policy with least-privilege access. Using myapp/prod/* as the parameter name pattern means you can add new secrets later without touching the IAM config. One last thing: the Timeout: 29 isn't arbitrary caution — API Gateway HTTP APIs have a hard maximum integration timeout of 29 seconds and will return a 504 before Lambda finishes if you set Lambda higher. Setting them equal means Lambda's timeout error message gets returned instead of a silent gateway timeout, which makes debugging significantly less painful.
Step 4 — Deploy and Test
Before you push anything to AWS, run sam local start-api. This spins up a local API Gateway emulator backed by a Docker container running your actual Lambda runtime. It's not a perfect simulation — cold starts behave differently, IAM permissions aren't enforced, and environment variables from Parameter Store won't resolve — but it catches 80% of packaging mistakes before they waste a deploy cycle. Here's what your terminal should look like when it's working:
$ sam local start-api
Mounting FlaskFunction at http://127.0.0.1:3000/ [GET, POST, DELETE, PATCH, PUT]
You can now browse to the above endpoints to invoke your function.
Press CTRL+C to quit.
START RequestId: a4f3c821-... Version: $LATEST
END RequestId: a4f3c821-...
REPORT RequestId: a4f3c821-... Duration: 312.45 ms Billed Duration: 313 ms
Hit http://127.0.0.1:3000/your-route with curl or Postman. If you get a 502, it's almost always a missing dependency in your layer or a broken import — not a routing problem. The thing that caught me off guard the first time was that sam local uses the built artifact from .aws-sam/build/, not your live source files. So if you change code, you need to rebuild before testing locally again. Run sam build --use-container first, every time.
Building with --use-container
sam build --use-container pulls a Docker image that mirrors the actual Lambda execution environment (Amazon Linux 2 for the python3.11 runtime) and installs your dependencies inside it. This matters because packages like cryptography or psycopg2 have compiled C extensions. If you build on macOS and deploy to Lambda, those binaries fail at import time, with errors that never reproduce locally. The container build adds 30–60 seconds to your build time, but it eliminates an entire class of runtime errors that are genuinely painful to debug. I don't skip it.
First Deploy with sam deploy --guided
Run sam deploy --guided exactly once. SAM walks you through a series of prompts — here's what each one actually means:
- Stack Name — The CloudFormation stack name. Use something like flask-lambda-prod. This is what you'll reference everywhere, including in log commands.
- AWS Region — Where your Lambda and API Gateway live. Pick the region closest to your users. us-east-1 gets new features first but is also the noisiest region for outages.
- Confirm changes before deploy — Say yes. You want to see the CloudFormation changeset before it deletes something you didn't mean to touch.
- Allow SAM CLI IAM role creation — Yes. SAM needs to create the execution role for your Lambda. If you say no, you have to supply a pre-created role ARN.
- FlaskFunction may not have authorization defined, Is this okay? — This is SAM warning you your API endpoint has no auth. Fine for now, but remember to revisit this before you share the URL publicly.
- Save arguments to samconfig.toml — Say yes. This is the whole point.
After that first deploy, SAM writes a samconfig.toml that looks like this:
version = 0.1
[default]
[default.deploy]
[default.deploy.parameters]
stack_name = "flask-lambda-prod"
s3_bucket = "aws-sam-cli-managed-default-samclisourcebucket-xxxxxxxxxxxx"
s3_prefix = "flask-lambda-prod"
region = "us-east-1"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
From this point on, every future deploy is just sam deploy — no flags, no prompts. That S3 bucket SAM created is where it stages your packaged Lambda code before CloudFormation picks it up. Don't delete it manually or your deploys will break with a confusing error about a missing bucket.
Tailing Logs While You Debug
The moment your first real deploy goes out, open a second terminal and run:
$ sam logs -n FlaskFunction --stack-name flask-lambda-prod --tail
2024/01/15/[$LATEST]a3f8... START RequestId: b2c4d... Version: $LATEST
2024/01/15/[$LATEST]a3f8... [ERROR] Runtime.ImportModuleError: Unable to import module 'app': No module named 'flask'
2024/01/15/[$LATEST]a3f8... END RequestId: b2c4d...
The --tail flag keeps the connection open and streams new log events as they arrive — no refreshing CloudWatch console, no guessing if your fix worked. That specific error above (No module named 'flask') means your dependencies didn't package correctly; double-check your requirements.txt is in the same directory as your app.py and rebuild. Without --tail, you'd be running the command repeatedly and hoping you caught the right log window. Use the flag. Also note that CloudWatch log delivery has a few seconds of latency, so if you hit your endpoint and see nothing, wait 5 seconds before assuming the logs are missing.
Gotchas I Hit That Aren't in the AWS Docs
Static Files Will Silently Break and You Won't Know Why
Lambda's filesystem is read-only everywhere except /tmp. Flask's url_for('static', filename='style.css') assumes Flask itself is serving files from disk — and while it technically can read files bundled into the deployment package, every static asset request burns a full Lambda invocation, which is wasteful since you're paying per-millisecond for compute to ship a CSS file. The right move is to stop using Flask to serve static assets entirely. Move everything to S3 and front it with CloudFront. Then put the CDN base URL in an environment variable and reference it directly in your templates:
# In your config — put it on app.config so Jinja's `config` proxy sees it
app.config["STATIC_BASE_URL"] = os.environ.get(
    "STATIC_BASE_URL", "https://d1abc123.cloudfront.net"
)

# In your Jinja template, instead of url_for('static', ...)
<link rel="stylesheet" href="{{ config.STATIC_BASE_URL }}/css/style.css">
Yes, this means you lose the fingerprinting that url_for gives you. Deal with it by setting a cache-busting query param tied to your deploy version, or configure CloudFront to respect S3 ETags. This is not optional — it's just how Lambda works.
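One way to get the cache-busting back is a tiny URL helper keyed on your deploy version. A sketch: static_url is a hypothetical helper, and DEPLOY_VERSION is an assumed environment variable you'd set at deploy time (a git SHA works well):

```python
import os

STATIC_BASE_URL = os.environ.get("STATIC_BASE_URL", "https://d1abc123.cloudfront.net")
DEPLOY_VERSION = os.environ.get("DEPLOY_VERSION", "dev")  # assumed env var

def static_url(path):
    """Build a CDN URL with a version query param for cache busting."""
    return f"{STATIC_BASE_URL}/{path.lstrip('/')}?v={DEPLOY_VERSION}"
```

Register it as a Jinja global (app.jinja_env.globals["static_url"] = static_url) and every deploy automatically invalidates browser caches without touching CloudFront config.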
DEBUG=True Doesn't Start a Dev Server, But It Does Leak Your Stack Traces
app.run(debug=True) does nothing in Lambda — as long as it sits behind the usual if __name__ == "__main__": guard, that call never executes, because Lambda imports your module and calls the handler instead of running the file as a script. The dangerous part is thinking you're safe because of that. If DEBUG = True is sitting in your Flask config (or FLASK_DEBUG=1 is in your environment), Flask will still return full stack traces in API error responses. I tested this myself: throw a ZeroDivisionError anywhere in a route, and the response will include your full traceback, local variable values, and file paths. That's your Lambda function's directory structure and potentially sensitive variable names going straight to whoever hit that endpoint. Set this explicitly:
app = Flask(__name__)
app.config["DEBUG"] = False
app.config["PROPAGATE_EXCEPTIONS"] = False
And double-check your Lambda environment variables in the console. I've seen staging configs slip into production with FLASK_DEBUG=1 still set because someone copy-pasted the env block.
Binary Responses Need Two Config Changes, Not One
This one cost me two hours. If you're returning images, PDFs, or handling file downloads, you need to configure binary_media_types in both Mangum and API Gateway — they don't sync automatically. Mangum's config alone isn't enough:
# This is necessary but not sufficient. Note this kwarg has changed across
# Mangum releases — recent versions infer binary handling from the
# Content-Type and expose text_mime_types instead, so check the docs for
# your pinned version.
handler = Mangum(app, lifespan="off", binary_media_types=["image/*", "application/pdf"])
You also need to go into API Gateway → your API → Binary Media Types and add the same entries there. If you're using the AWS console, it's under Settings. Note this setting exists on REST APIs (AWS::Serverless::Api); HTTP APIs infer binary handling from the isBase64Encoded flag instead. If you're deploying a REST API with SAM or CDK, add it to your API resource definition:
# SAM template snippet (AWS::Serverless::Api / REST API)
Properties:
  BinaryMediaTypes:
    - "image/*"
    - "application/pdf"
    - "multipart/form-data"
Miss the API Gateway side and you'll get base64 garbage in your response body, or the browser will try to render binary data as text. The symptom looks like a broken image tag with no network error, which makes it extra confusing to debug.
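For context on why the broken case looks like "base64 garbage": API Gateway expects binary bodies base64-encoded with isBase64Encoded set, which is what the adapter produces for configured media types. A hand-rolled sketch of that response shape (binary_response is a hypothetical helper, shown for illustration):

```python
import base64

def binary_response(data: bytes, content_type: str):
    """Build the API Gateway response shape for a binary body."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": content_type},
        # API Gateway decodes the body back to raw bytes only when this
        # flag is set AND the media type is configured as binary
        "isBase64Encoded": True,
        "body": base64.b64encode(data).decode(),
    }
```

When the media type isn't registered on the API Gateway side, the flag is ignored and the client receives the base64 text verbatim — hence the "broken image with no network error" symptom.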
SQLAlchemy's Connection Pool Will Destroy Your RDS Under Load
SQLAlchemy's default pool holds up to five connections per process and keeps them alive. The problem with Lambda is that AWS reuses execution environments, so a thawed container can come back with stale pooled connections still open, and every concurrently warm container holds a pool of its own. Under any real traffic, you'll blow past RDS's connection limit and start getting too many connections errors that look completely random because they depend on how many warm Lambda instances exist at that moment. The fix is to treat each Lambda invocation as disposable and use the most conservative pool settings:
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=1,
    max_overflow=0,
    pool_pre_ping=True,
    pool_recycle=300,
)
pool_pre_ping=True is important too — it validates the connection before using it, which catches the case where Lambda reuses a container whose database connection timed out server-side. If you're hitting RDS at scale, also look at RDS Proxy, which is specifically designed to handle Lambda's bursty connection behavior. It adds ~2ms latency but saves you from hitting connection limits.
The /tmp Directory Persists and You're Responsible for Cleaning It Up
Lambda gives you 512MB in /tmp — configurable up to 10GB now, but you pay for it. If your endpoint processes file uploads, you need to write to /tmp and then explicitly delete the file when you're done. Lambda containers get reused across invocations, meaning files you wrote to /tmp in the last request are still there when the next one hits. This causes two problems: your storage fills up silently until you hit the limit and start getting OSError: [Errno 28] No space left on device, and you might accidentally serve one user's uploaded file to another if your naming isn't unique per-request.
import os
import tempfile

from flask import request, jsonify

@app.route("/process", methods=["POST"])
def process_upload():
    file = request.files["upload"]
    # Use a unique temp path, not a hardcoded filename
    with tempfile.NamedTemporaryFile(dir="/tmp", delete=False, suffix=".pdf") as tmp:
        tmp_path = tmp.name
    file.save(tmp_path)
    try:
        result = do_processing(tmp_path)
        return jsonify(result)
    finally:
        os.unlink(tmp_path)  # Always clean up, even on exception
The finally block is non-negotiable. An unhandled exception mid-processing will leave the file behind otherwise, and you'll hit storage exhaustion on a Lambda container that handles a lot of traffic. If you're processing large files, also check the actual compressed size of what you're working with — 512MB sounds like a lot until you're unzipping archives or converting video.
Handling Cold Starts in Practice
Flask adds somewhere between 200-400ms to your first invocation, and that's before your own code does anything. The cold start sequence goes: AWS spins up a new container (runtime overhead you can't control), Python initializes the interpreter, then your module-level imports run. That last part is where I've seen the most variation — a bare Flask app is annoying but survivable, but the second you add SQLAlchemy, Pandas, or anything with a C extension, you're looking at cold starts that can push past a full second. What caught me off guard the first time was realizing that every import at the top of your file runs every single time a new container boots, not just once for the lifetime of the function.
The fix that actually moves the needle is lazy imports. If only two of your fifteen endpoints touch Pandas, don't import it at the module level. Move it inside the function:
@app.route("/reports/generate", methods=["POST"])
def generate_report():
    import pandas as pd  # only runs when this endpoint is hit
    import numpy as np

    df = pd.DataFrame(request.json["data"])
    # ... rest of handler
Yes, subsequent calls to that endpoint will also run the import — but Python caches modules in sys.modules, so after the first hit within a warm container it's essentially a dictionary lookup. You pay the import cost once per container lifetime, not once per request. I've cut cold starts from ~1.4 seconds to ~350ms doing nothing else but moving heavy imports inside functions on a data pipeline API.
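You can watch the sys.modules cache do its job in isolation. The second import of the same module is a lookup, not a re-execution:

```python
import sys
import time

def timed_import(name):
    """Time a single import of the named module."""
    start = time.perf_counter()
    __import__(name)
    return time.perf_counter() - start

first = timed_import("decimal")   # may execute the module for real
second = timed_import("decimal")  # sys.modules cache hit
print(f"first: {first*1000:.2f}ms, second: {second*1000:.4f}ms")
```

The same mechanics apply inside a warm Lambda container: the pandas import in the route above is only expensive on the first hit per container.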
Provisioned concurrency is AWS's official answer to cold starts: you tell Lambda to keep N containers initialized and ready. You configure it like this:
aws lambda put-provisioned-concurrency-config \
--function-name my-flask-app \
--qualifier prod \
--provisioned-concurrent-executions 3
The honest trade-off: provisioned concurrency is priced separately from regular Lambda invocations. At the time of writing, you're paying roughly $0.015 per GB-hour for provisioned concurrency on top of the execution cost. For a 512MB function, keeping 3 containers warm 24/7 runs about $16/month — which sounds cheap until you realize that's $16/month even at 3am when nobody is using your app. Use it only for latency-sensitive endpoints where a 1-second delay causes real business pain, like checkout flows or auth endpoints. Don't blanket-apply it to your whole API because it felt like the right thing to do.
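The $16/month figure falls straight out of the quoted rate. Here's the arithmetic (rates as of writing; check current AWS pricing before relying on them):

```python
memory_gb = 512 / 1024        # 512MB function expressed in GB
containers = 3                # provisioned concurrent executions
hours_per_month = 730         # average hours in a month
rate_per_gb_hour = 0.015      # USD, provisioned concurrency rate quoted above

monthly = memory_gb * containers * hours_per_month * rate_per_gb_hour
# Roughly $16/month, before the per-invocation execution charges.
```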
The cheap alternative that most people skip over: a CloudWatch Events rule that pings your Lambda every 5 minutes costs almost nothing and keeps one container warm. Set it up in your serverless.yml or SAM template:
# serverless.yml
functions:
  app:
    handler: wsgi_handler.handler
    events:
      - http: ANY /
      - http: 'ANY /{proxy+}'
      - schedule:
          rate: rate(5 minutes)
          enabled: true
          input:
            source: "warm-up-ping"
Then in your Flask app, short-circuit the warmup request before it touches any real logic:
@app.before_request
def handle_warmup():
    # get_json(silent=True) returns None instead of raising on non-JSON bodies
    payload = request.get_json(silent=True) or {}
    if request.headers.get("X-Source") == "warm-up-ping" or \
            payload.get("source") == "warm-up-ping":
        return "ok", 200
This keeps exactly one container warm. The moment you get a traffic spike that needs more than one concurrent execution, those new containers will still cold-start. So: use the warmup ping for low-traffic apps where one warm container handles most requests, use provisioned concurrency for anything running at scale with strict latency SLAs, and fix your imports regardless, because any cold start that slips through is far cheaper when it isn't importing a 40MB library up front.
When Flask on Lambda Makes Sense (and When It Doesn't)
Where This Setup Actually Shines
The pattern that makes me reach for Flask on Lambda is an internal API that gets hammered for 20 minutes after the daily standup and then sits completely idle for hours. You pay for exactly those 20 minutes. A t3.small running 24/7 to handle that same workload costs you real money while sleeping. The same logic applies to webhook receivers: Stripe, GitHub, and Twilio all send bursts of events that are completely unpredictable. Wrapping a webhook handler in Flask and deploying it to Lambda means you never think about capacity planning for that endpoint again. Prototypes are another obvious win: you can ship something to a real URL in under an hour without touching EC2, security groups, or load balancers.
Scheduled jobs wrapped in Flask routes are a sneaky good use case. Instead of a separate Lambda function with its own deployment pipeline, you expose a /run-nightly-report route and trigger it from EventBridge on a cron schedule. You get logging, error handling, and local testing almost for free because it's just Flask. The thing that caught me off guard was how clean this pattern is compared to maintaining separate Lambda handler files for every background job.
Where You'll Regret This Choice
If your app needs WebSockets, walk away. API Gateway does have a WebSocket API, but connecting it to a Flask app running through Mangum (the ASGI/WSGI adapter most people use) is a special kind of pain. You end up with connection management state stored in DynamoDB just to simulate what a single long-running process does for free. I've seen teams spend two weeks on this and end up with something more fragile than just running a small EC2 instance with Nginx.
Sub-100ms latency requirements at scale are also a hard no. Cold starts on Lambda with a Flask app — after you've packaged in your dependencies — routinely hit 800ms to 2 seconds. Provisioned concurrency fixes this but costs almost as much as a dedicated instance at that point, which defeats the whole argument. If you're building a customer-facing product where p99 latency matters, this architecture will cause you pain. Heavy persistent state is the other killer: Flask on Lambda should be completely stateless. If you're reaching for anything that lives in memory between requests, you've already broken the model.
Run the Cost Math Before You Commit
Lambda pricing is $0.20 per million requests plus compute time billed in 1ms increments at $0.0000166667 per GB-second. That sounds cheap until you're doing 50 million requests a month with a 512MB function averaging 200ms per invocation. Run those numbers and compare against a t3.small at roughly $15/month or a t3.medium at $30. Lambda gets expensive faster than people expect. The break-even point depends heavily on your average function duration — a fast, lightweight route stays economical much longer than a slow one hitting a database. My rule of thumb: if you're consistently above a few million invocations per month with non-trivial compute time, price out a small container before assuming Lambda is cheaper.
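To make the comparison concrete, here's the arithmetic for the scenario above (free tier ignored, rates as quoted in the text):

```python
requests = 50_000_000         # invocations per month
avg_duration_s = 0.2          # 200ms average invocation
memory_gb = 512 / 1024        # 512MB function expressed in GB

request_cost = requests * (0.20 / 1_000_000)  # $0.20 per million requests
compute_cost = requests * avg_duration_s * memory_gb * 0.0000166667  # per GB-second

lambda_monthly = request_cost + compute_cost  # roughly $93/month
t3_small_monthly = 15.0                       # rough on-demand figure from the text
```

At this volume Lambda runs roughly six times the price of the instance, which is the point of the paragraph: the economics flip well before "web scale".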
The Alternative Worth Knowing About
If you want the operational simplicity of serverless but Flask on Lambda feels like you're fighting the framework, look at AWS App Runner or Fargate with auto-scaling tuned down to minimal capacity. You deploy a container image, let idle capacity scale down, and pay very little when traffic is quiet (App Runner pauses idle instances and bills a reduced provisioned-memory rate, so it's near-zero rather than literally zero). In exchange you're running a real Flask server with persistent connections, proper WebSocket support, and no cold start gymnastics. App Runner in particular is almost as fast to set up as a Lambda deployment and doesn't require you to think about WSGI adapters, package size limits (Lambda caps your deployment package at 250MB unzipped), or layer management. The packaging constraints alone have pushed me toward App Runner on two projects where Flask on Lambda was the original plan.
Originally published on techdigestor.com. Follow for more developer-focused tooling reviews and productivity guides.