As a CTO, your job during a Proof of Concept (POC) is deceptively simple: don’t over-engineer, and don’t overspend.
You don’t need the perfect ML infrastructure—you need the cheapest architecture that works well enough.
Here’s the pipeline we built for our ML POC:
Audio file → S3 → Compute → Prediction → DynamoDB
The real question isn’t how to run inference—it’s:
At what point does Lambda stop being the smartest choice?
This post compares three serverless approaches to running inference with an already trained machine learning model:
- AWS Lambda with the standard configuration
- AWS Lambda with SnapStart enabled
- AWS Lambda used as a proxy in front of an Amazon SageMaker endpoint
To access the full project code, you can click the link here.
Experiment setup
The architecture is nearly identical across the stacks: every compute resource that runs the model (the two Lambdas and the SageMaker instance) has the same 4 GB RAM configuration.
There is a subtle difference in vCPU allocation. AWS Lambda assigns CPU in proportion to memory, with 1,769 MB corresponding to the equivalent of one vCPU, so our Lambdas get roughly 2.31 vCPUs (based on AWS docs here), while our SageMaker instance (ml.t2.medium) has 2 vCPUs.
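The vCPU figure is a simple ratio. Here is a minimal sketch, assuming the 1,769 MB-per-vCPU proportion scales linearly with the memory setting (the function name is only illustrative):

```python
# Lambda allocates CPU in proportion to configured memory;
# 1,769 MB corresponds to the equivalent of one vCPU.
MB_PER_VCPU = 1769


def lambda_vcpu_equivalent(memory_mb: int) -> float:
    """Approximate vCPU share for a given Lambda memory setting (assumed linear)."""
    return memory_mb / MB_PER_VCPU


print(f"4096 MB -> {lambda_vcpu_equivalent(4096):.3f} vCPU equivalents")
```

The same helper shows how little CPU the 128 MB proxy Lambda gets, which is acceptable because it only forwards requests.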
In addition, the SageMaker stack has a proxy Lambda with 128 MB of RAM. It takes the information about the file uploaded to S3, forwards it to SageMaker, and saves the results to DynamoDB.
None of the stacks use GPU instances, which keeps the playing field as level as possible.
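The proxy Lambda described above could look roughly like the sketch below. This is not the project's actual handler: the `make_handler` factory, the JSON payload sent to the endpoint, and the DynamoDB item attributes are all assumptions made for illustration. The boto3 calls it expects (`invoke_endpoint` on a `sagemaker-runtime` client and `put_item` on a DynamoDB `Table` resource) are real, and the clients are injected so the logic can be exercised without AWS access.

```python
import json
from urllib.parse import unquote_plus


def make_handler(runtime_client, table, endpoint_name):
    """Build a Lambda handler around injected clients (hypothetical helper)."""

    def handler(event, context=None):
        saved = []
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            # Object keys in S3 event notifications are URL-encoded
            key = unquote_plus(record["s3"]["object"]["key"])

            # Assumed contract: the endpoint receives the file location as JSON
            response = runtime_client.invoke_endpoint(
                EndpointName=endpoint_name,
                ContentType="application/json",
                Body=json.dumps({"bucket": bucket, "key": key}),
            )
            prediction = json.loads(response["Body"].read())

            # Assumed item shape; the real table schema may differ
            item = {
                "file_key": key,
                "predictor": "sagemaker",
                "prediction": json.dumps(prediction),
            }
            table.put_item(Item=item)
            saved.append(item)
        return saved

    return handler
```

Inside the deployed function, the handler would be built once at import time, for example with `boto3.client("sagemaker-runtime")`, `boto3.resource("dynamodb").Table(os.environ["PREDICTIONS_TABLE"])`, and `os.environ["ENDPOINT_NAME"]`.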
Here are some other choices made to keep the benchmark fair:
- Same CPU architecture everywhere (x86_64): Lambda functions use the x86_64 architecture, dependencies are bundled with the SAM x86_64 Python 3.12 image, and the SageMaker container image is built for linux/amd64 so ONNX and wheels behave the same across paths.
- Same language runtime: All Lambda handlers run Python 3.12 with the same packaged lambda_src layout (only the handler and SnapStart wiring differ).
- Same model, with an intentional container vs zip trade-off: One shared ONNX artifact from S3; the standard and SnapStart Lambdas load it inside the function, while SageMaker serves it from a dedicated container behind an endpoint.
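Loading the shared ONNX artifact inside a Lambda could be sketched as follows. This is an assumption about how such loading is typically done, not the project's code: `split_s3_uri`, `get_session`, and the `/tmp/model.onnx` path are hypothetical, while `download_file` (boto3) and `InferenceSession` (onnxruntime) are existing calls. The heavy imports are deferred so the URI helper works on its own.

```python
from functools import lru_cache
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/object' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


@lru_cache(maxsize=1)
def get_session(model_uri: str, local_path: str = "/tmp/model.onnx"):
    """Download the model once per execution environment and cache the session."""
    # Imported lazily; both packages must be bundled with the function
    import boto3
    import onnxruntime as ort

    bucket, key = split_s3_uri(model_uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    return ort.InferenceSession(local_path, providers=["CPUExecutionProvider"])
```

Calling `get_session(os.environ["MODEL_S3_URI"])` at module import time moves the download and model load into the initialization phase. For the SnapStart variant, that is the phase captured in the snapshot, so a restored environment would start with the model already in memory.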
SageMaker Serverless Inference was intentionally excluded from the benchmark, both to keep the cost of running the ML model as low as possible and to keep performance comparable across the stacks.
The architecture diagram for this benchmark can be seen in the following picture:
Here is an overview of all stacks in this experiment:
| Option | Setup | Cost Model | Pros | Cons |
|---|---|---|---|---|
| Lambda (4GB) | Model runs directly inside Lambda, ~2.31 vCPU, 4GB of RAM | Pay-per-request | Scales to zero, no idle cost, fast per request | High memory cost, not ideal at very high traffic |
| Lambda with SnapStart Enabled | Model runs directly inside Lambda, ~2.31 vCPU, 4GB of RAM | Pay-per-request | Predictable performance, cost-efficient at scale, faster cold starts thanks to SnapStart | High memory cost, not ideal at very high traffic, additional SnapStart cost if traffic is sporadic |
| SageMaker Endpoint | Model hosted on ml.t2.medium (2 vCPU with 4GB of RAM), invoked via 128MB Lambda | Fixed monthly | Predictable performance, cost-efficient at scale | Always-on, pays even when idle, slightly higher latency |
Configuration in CDK code
Here are the code snippets of all compute resources used in this experiment. All Lambdas are created with the following method, to reduce code duplication.
```python
from aws_cdk import Duration, aws_lambda as _lambda
from constructs import Construct

# _DEFAULT_RUNTIME_ENV and bundled_lambda_code() are defined elsewhere in the project.


def create_python_function(
    *,
    scope: Construct,
    function_name: str,
    handler: str,
    environment: dict[str, str] | None = None,
    timeout: Duration = Duration.seconds(60),
    memory_size: int = 4096,
    architecture: _lambda.Architecture = _lambda.Architecture.X86_64,
    snapstart: _lambda.SnapStartConf | None = None,
) -> _lambda.Function:
    """Create a Python 3.12 Lambda with the defaults shared by every stack."""
    # Start from the shared defaults, then apply per-function overrides
    runtime_environment: dict[str, str] = {**_DEFAULT_RUNTIME_ENV}
    if environment:
        runtime_environment.update(environment)
    return _lambda.Function(
        scope,
        function_name,
        function_name=function_name,
        runtime=_lambda.Runtime.PYTHON_3_12,
        architecture=architecture,
        handler=handler,
        code=bundled_lambda_code(),
        timeout=timeout,
        memory_size=memory_size,
        environment=runtime_environment,
        snap_start=snapstart,
    )
```
Standard Lambda:
```python
# Initialize a Lambda with the standard configuration
standard_lambda = create_python_function(
    scope=self,
    function_name="standard-predictor",
    handler="standard_handler.handler",
    timeout=Duration.seconds(90),
    environment={
        "PREDICTIONS_TABLE": self.predictions_table.table_name,
        "PREDICTOR": "standard",
        "MODEL_S3_URI": f"s3://{self.model_asset.s3_bucket_name}/{self.model_asset.s3_object_key}",
    },
)
```
SnapStart Lambda:
```python
# Initialize the SnapStart Lambda
snapstart_lambda = create_python_function(
    scope=self,
    function_name="snapstart-predictor",
    handler="snapstart_handler.handler",
    timeout=Duration.seconds(90),
    snapstart=_lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,  # Very important to configure this parameter!
    environment={
        "PREDICTIONS_TABLE": predictions_table.table_name,
        "PREDICTOR": "snapstart",
        "MODEL_S3_URI": f"s3://{model_asset.s3_bucket_name}/{model_asset.s3_object_key}",
    },
)

# SnapStart only applies to published versions, so expose the current
# version through an alias and invoke the function via that alias
live_alias = _lambda.Alias(
    self,
    "SnapStartLiveAlias",
    alias_name="live",
    version=snapstart_lambda.current_version,
)
```
SageMaker endpoint + Lambda proxy:
```python
from aws_cdk import aws_sagemaker as sagemaker

# Configure the SageMaker endpoint
endpoint_config = sagemaker.CfnEndpointConfig(
    self,
    "AudioPredictorEndpointConfig",
    production_variants=[
        sagemaker.CfnEndpointConfig.ProductionVariantProperty(
            variant_name="AllTraffic",
            model_name=model.attr_model_name,
            initial_instance_count=1,
            instance_type="ml.t2.medium",
            initial_variant_weight=1.0,
        )
    ],
)

# Define the endpoint
endpoint = sagemaker.CfnEndpoint(
    self,
    "AudioPredictorEndpoint",
    endpoint_config_name=endpoint_config.attr_endpoint_config_name,
)

# Initialize the SageMaker Lambda proxy
sagemaker_trigger = create_python_function(
    scope=self,
    function_name="SageMaker-predictor",
    handler="SageMaker_trigger_handler.handler",
    timeout=Duration.seconds(90),
    memory_size=128,  # the proxy only forwards requests, so it uses the 128 MB described earlier
    environment={
        "PREDICTIONS_TABLE": predictions_table.table_name,
        "PREDICTOR": "SageMaker",
        "MODEL_S3_URI": f"s3://{model_asset.s3_bucket_name}/{model_asset.s3_object_key}",
        "ENDPOINT_NAME": endpoint.attr_endpoint_name,
    },
)
```
Results
The following image shows the execution duration for each stack, and the table below it summarizes the numbers. Note that the SnapStart Lambda was run once beforehand to save the environment, and we then waited 10 minutes so that the Lambda would have a cold start again:
| Method | Mean Latency | Median | Stability (Std) |
|---|---|---|---|
| Standard Lambda | 280.72 ms | 127.65 ms | 443.29 ms |
| SnapStart | 178.60 ms | 124.69 ms | 166.35 ms |
| SageMaker | 339.18 ms | 226.91 ms | 350.76 ms |
From the graph, we can see that warm Lambda invocations are faster than the SageMaker endpoint, mostly staying under the 200 ms mark (the medians are about 125 to 128 ms). The circles represent the cold starts, and the SnapStart Lambda's cold start was at least 2x faster than those of the other stacks. The SageMaker stack performed the worst, though not by a lot, with most invocations landing just above the 200 ms mark and the cold start taking almost 1.4 seconds.
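The gap between mean and median in the table is what a single slow cold start does to a small sample. The sketch below illustrates the effect with made-up latencies (nine warm invocations and one cold start); these values are hypothetical and are not the benchmark's raw data.

```python
import statistics

# Hypothetical latencies in ms: nine warm invocations plus one cold start
samples = [120, 125, 130, 118, 127, 122, 131, 126, 124, 1400]

mean = statistics.mean(samples)
median = statistics.median(samples)
std = statistics.stdev(samples)  # sample standard deviation

print(f"mean={mean:.1f} ms, median={median:.1f} ms, std={std:.1f} ms")
```

The median stays close to the warm latency while the mean and the standard deviation are pulled up by the outlier. This is why the median is the better indicator of steady-state performance, and the standard deviation the better indicator of cold-start impact.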
Cost Breakdown
Lambda Cost (per request)
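Formula: Cost = Duration (s) × Memory (GB) × $0.0000166667 per GB-second
- Duration: ~200 ms (0.2 s)
- Memory: 4 GB
Cost per request: ~$0.0000133 (compute only)
Cost per 1M requests: ~$13.80 (about $13.33 of that is compute; the remainder is consistent with Lambda's $0.20 per 1M request charge and a billed duration slightly above 200 ms)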
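The formula above translates directly into code. This is a minimal sketch covering the compute (GB-second) part only; the function name is illustrative, and the rate is the one quoted in the formula.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # USD, the rate used in the formula above


def lambda_compute_cost(duration_ms: float, memory_gb: float) -> float:
    """Compute-only cost of one invocation in USD (excludes the per-request fee)."""
    return (duration_ms / 1000) * memory_gb * PRICE_PER_GB_SECOND


per_request = lambda_compute_cost(duration_ms=200, memory_gb=4)
per_million = per_request * 1_000_000

print(f"per request: ${per_request:.7f}")
print(f"per 1M requests (compute only): ${per_million:.2f}")
```

Because cost is linear in both duration and memory, halving either one halves the bill, which makes right-sizing the memory setting the first lever to pull before switching services.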
SageMaker Cost (fixed)
- ml.t2.medium ≈ $40.88/month (the fixed monthly cost used in the comparison table)
- Runs 24/7, even when idle
Takeaway:
Lambda has variable costs that scale with usage. SageMaker has fixed costs, making the tradeoff clear when requests grow.
The main question is:
When does SageMaker become the better option?
I’ve done the math.
SageMaker becomes a better option at ~72 requests per minute — take a look at the following graph:
At low traffic, Lambda is cheaper because it scales to zero, while the SageMaker endpoint has a fixed price whether or not it is used. As traffic grows, SageMaker handles each additional request more cheaply.
Notice that the green line, representing the SageMaker endpoint, rises with traffic as well. That is expected, since every request still goes through a Lambda invocation. The increase stays manageable because the proxy Lambda mentioned earlier uses the smallest memory configuration.
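The ~72 RPM crossover can be reproduced from three numbers quoted in this post's cost table: the SageMaker fixed monthly cost ($40.88) and the variable cost per 1M requests for standard Lambda ($13.80) and for the SageMaker path ($0.81). The sketch assumes a 730-hour month, which matches the ~438k requests per month quoted for 10 RPM; the function name is illustrative.

```python
MINUTES_PER_MONTH = 730 * 60  # 43,800 minutes, i.e. ~438k requests at 10 RPM


def break_even_rpm(
    fixed_monthly: float,
    lambda_per_million: float,
    sagemaker_per_million: float,
) -> float:
    """Requests per minute at which both options cost the same per month."""
    saving_per_request = (lambda_per_million - sagemaker_per_million) / 1_000_000
    if saving_per_request <= 0:
        raise ValueError("SageMaker path must be cheaper per request to break even")
    monthly_requests = fixed_monthly / saving_per_request
    return monthly_requests / MINUTES_PER_MONTH


rpm = break_even_rpm(40.88, 13.80, 0.81)
print(f"break-even at about {rpm:.1f} requests per minute")
```

Plugging in the SnapStart rate ($16.82 per 1M) instead of the standard one gives a lower threshold of roughly 58 RPM, since each SnapStart request costs more.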
Here is a broader view of the expected cost at different traffic levels, based on the latest pricing:
| Traffic Volume | Standard Lambda (4GB) | SnapStart Lambda (4GB) | SageMaker (ml.t2.medium + 128MB Caller) |
|---|---|---|---|
| Price per 1M Req (Variable) | $13.80 | $16.82 | $0.81 |
| Fixed Monthly Cost | $0.00 | $0.00 | $40.88 |
| Total: 10 RPM (~438k req/mo) | $6.05 | $7.37 | $41.24 |
| Total: 50 RPM (~2.1M req/mo) | $30.24 | $36.87 | $42.66 |
| Total: 72 RPM (~3.1M req/mo) | $43.51 | $53.05 | $43.43 (Crossover point) |
| Total: 200 RPM (~8.7M req/mo) | $120.84 | $147.32 | $47.97 |
| Total: 1000 RPM (~43.8M req/mo) | $604.22 | $736.62 | $76.38 |
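The rows of the table follow one pattern: fixed monthly cost plus requests times the variable rate. A small sketch, using the rates from the first two rows of the table and assuming a 43,800-minute month (the names `RATES`, `monthly_cost`, and `cheapest` are illustrative), makes it easy to test other traffic levels:

```python
MINUTES_PER_MONTH = 43_800  # 730 hours, assumed from the ~438k req/mo at 10 RPM

# (fixed monthly USD, variable USD per 1M requests), taken from the table above
RATES = {
    "standard": (0.00, 13.80),
    "snapstart": (0.00, 16.82),
    "sagemaker": (40.88, 0.81),
}


def monthly_cost(option: str, rpm: float) -> float:
    """Estimated monthly cost in USD for a steady request rate."""
    fixed, per_million = RATES[option]
    requests = rpm * MINUTES_PER_MONTH
    return fixed + requests / 1_000_000 * per_million


def cheapest(rpm: float) -> str:
    """Name of the option with the lowest estimated monthly cost."""
    return min(RATES, key=lambda option: monthly_cost(option, rpm))


for rpm in (10, 50, 72, 200, 1000):
    row = ", ".join(f"{name}=${monthly_cost(name, rpm):.2f}" for name in RATES)
    print(f"{rpm:>5} RPM: {row} -> {cheapest(rpm)}")
```

The output lands within a few cents of the table. The small differences come from rounding in the per-1M rates.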
CTO Verdict: A Decision Framework
Think in thresholds, not services.
Use Standard Lambda when:
- You’re in POC or early stage
- Traffic is low or unpredictable
- You want zero idle cost
Use Lambda with SnapStart when:
- Traffic is low and sporadic
- You are willing to pay for the SnapStart snapshot restoration
- You also want zero idle cost
Use SageMaker when:
- You consistently exceed the ~72 requests/minute crossover
- Traffic is steady
- You want predictable cost
Final Rule:
- Lambda is the default
- SageMaker is the optimization


