At Adaptive Recognition, we run Carmen Cloud for ANPR & MMR recognition. Our neural-network engines have been evolving for 30+ years – for us, "AI" is business as usual.
The real challenge was scaling: on ECS (Fargate and EC2), container startup plus engine initialization took ~60 seconds. Not acceptable.
The Fargate/ECS Approach
Our first deployment ran on ECS Fargate.
For predictable workloads, this is often good enough. You can define scaling rules or even schedule tasks so that capacity matches traffic (e.g. higher during business hours, lower at night).
But our workload is the opposite of predictable. Recognition requests come from many customers, across multiple regions, at almost random times. Bursts can hit at 2 a.m. from one continent and spike again an hour later from another.
With that traffic pattern, Fargate's trade-offs became painful:
- Scale-up lag: EC2 boot + engine init ~60s.
- Idle cost: keeping containers pre-warmed all the time.
- Ops overhead: building/pushing images, patching, managing ECR.
That's when we started looking for a new approach.
Early Adoption of SnapStart
When AWS Lambda SnapStart (Java 21) was released, we migrated immediately.
That made us among the first to see just how well it scales – and how much money it saves.
Years later, those benefits are still holding true.
The Shift: API Gateway + Lambda SnapStart + CRaC
Key trick: CRaC (Coordinated Restore at Checkpoint) pre-initializes our engines at checkpoint time.
But SnapStart has a serious limitation: it supports neither more than 512 MB of ephemeral storage nor EFS. Our recognition engines for even a single region (EUR, NAM) are much larger than that.
So we built a staged engine loading mechanism:
- During handler initialization, we download a batch of engine data from S3.
- Engines are initialized into memory (via our `VehicleHandler` class).
- Temporary `.dat` files are deleted to free up space.
- The next batch is downloaded and initialized.
This keeps us under the 512 MB limit while still giving us full coverage. Initialization is more complex, but the scalability and cost benefits make it well worth it.
```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.crac.Core;
import org.crac.Resource;

public class VehicleHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {

    static {
        // Load engines in batches at checkpoint time from S3
    }

    public VehicleHandler() {
        // Register with CRaC so the checkpoint/restore callbacks below fire
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> ctx) throws Exception {
        // Close network connections – CRaC cannot persist them
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> ctx) throws Exception {
        // Re-open connections on restore
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent input, Context context) {
        // Process image with the chosen engine
        return new APIGatewayProxyResponseEvent();
    }
}
```
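The staged loading loop itself can be sketched as below. This is a self-contained simplification: the S3 download and the native engine initialization are stubbed out, and the batch/file names are made up for illustration; the real code uses the AWS SDK and our engine API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StagedEngineLoader {
    // Illustrative batches; each batch must fit in Lambda's 512 MB ephemeral storage
    static final List<List<String>> BATCHES = List.of(
            List.of("eur_plate.dat", "eur_mmr.dat"),
            List.of("nam_plate.dat", "nam_mmr.dat"));

    public static int loadAll(Path tmpDir) throws IOException {
        int loaded = 0;
        for (List<String> batch : BATCHES) {
            // 1. Download one batch of engine data (stubbed: create empty files;
            //    real code issues S3 GetObject calls)
            for (String name : batch) {
                Files.createFile(tmpDir.resolve(name));
            }
            // 2. Initialize engines into memory from the downloaded files
            //    (stubbed: just count; real code calls the native engine init)
            loaded += batch.size();
            // 3. Delete the temporary .dat files to stay under the 512 MB limit
            for (String name : batch) {
                Files.delete(tmpDir.resolve(name));
            }
        }
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("engines");
        System.out.println("engines loaded: " + loadAll(tmp));
    }
}
```

Because each batch's files are deleted before the next download, peak disk usage is bounded by the largest single batch rather than the total engine size.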
Supporting AWS Stack
- API Gateway – entry point
- Lambda (SnapStart) – recognition engines
- Cognito – user auth
- DynamoDB – API keys, billing records
- SNS/EventBridge – async billing + subscription events
- S3, SSM, CloudFront, WAF, Route53 – storage, config, delivery, security
Takeaways
- Even heavy neural engines can scale serverless.
- CRaC makes SnapStart viable by restoring pre-initialized state.
- Closing & reopening network connections is essential.
- Staged loading solves ephemeral storage/EFS limits.
- Being an early adopter of SnapStart saved us both time and cost from day one.
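To make the connection-handling point concrete: the pattern is simply "drop clients before the snapshot, rebuild them after restore". A minimal standalone sketch (the org.crac interfaces are omitted here so it compiles on its own; in the real handler these methods are the `beforeCheckpoint`/`afterRestore` callbacks):

```java
import java.net.http.HttpClient;

public class ConnectionLifecycle {
    private HttpClient client = HttpClient.newHttpClient();

    // In the real handler: beforeCheckpoint(org.crac.Context<...>)
    public void beforeCheckpoint() {
        client = null; // drop the client – open sockets cannot be persisted in a snapshot
    }

    // In the real handler: afterRestore(org.crac.Context<...>)
    public void afterRestore() {
        client = HttpClient.newHttpClient(); // rebuild connections fresh in the new sandbox
    }

    public boolean hasClient() {
        return client != null;
    }
}
```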
👉 More about Carmen Cloud: carmencloud.com
👉 Corporate homepage: adaptiverecognition.com
👉 My side project for WordPress + Cognito login: Gatey