At Adaptive Recognition, we run Carmen Cloud for ANPR & MMR recognition. Our neural-network engines have been evolving for 30+ years – for us, "AI" is business as usual.
The real challenge was scaling: on ECS (Fargate and EC2), container startup plus engine initialization took ~60 seconds. Not acceptable.
The Fargate/ECS Approach
Our first deployment ran on ECS Fargate.
For predictable workloads, this is often good enough. You can define scaling rules or even schedule tasks so that capacity matches traffic (e.g. higher during business hours, lower at night).
But our workload is the opposite of predictable. Recognition requests come from many customers, across multiple regions, at almost random times. Bursts can hit at 2 a.m. from one continent and spike again an hour later from another.
With that traffic pattern, Fargate's trade-offs became painful:
- Scale-up lag: EC2 boot + engine init ~60s.
- Idle cost: keeping containers pre-warmed all the time.
- Ops overhead: building/pushing images, patching, managing ECR.
That's when we started looking for a new approach.
Early Adoption of SnapStart
When AWS Lambda SnapStart (Java 21) was released, we migrated immediately.
That made us among the first to see just how well it scales – and how much money it saves.
Years later, those benefits are still holding true.
The Shift: API Gateway + Lambda SnapStart + CRaC
Key trick: CRaC (Coordinated Restore at Checkpoint) pre-initializes our engines at checkpoint time.
But SnapStart has a serious limitation: it supports neither more than 512 MB of ephemeral storage nor EFS. Our recognition engines for even a single region (EUR, NAM) are much larger than that.
So we built a staged engine loading mechanism:
- During handler initialization, we download a batch of engine data from S3.
- Engines are initialized into memory (via our `VehicleHandler` class).
- Temporary `.dat` files are deleted to free up space.
- The next batch is downloaded and initialized.
This keeps us under the 512 MB limit while still giving us full coverage. Initialization is more complex, but the scalability and cost benefits make it well worth it.
```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.crac.Core;
import org.crac.Resource;

public class VehicleHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {

    static {
        // Load engines in batches at checkpoint time from S3
    }

    public VehicleHandler() {
        // Register with CRaC so the checkpoint/restore callbacks below fire
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> ctx) throws Exception {
        // Close network connections – CRaC cannot persist them
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> ctx) throws Exception {
        // Re-open connections on restore
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent input, Context context) {
        // Process image with the chosen engine
        return new APIGatewayProxyResponseEvent();
    }
}
```
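The staged loading loop itself can be sketched as below. This is a self-contained simplification: the S3 download and the native engine initialization are stubbed out, and the batch/file names are made up for illustration; the real code uses the AWS SDK and our engine API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StagedEngineLoader {
    // Illustrative batches; each batch must fit in Lambda's 512 MB ephemeral storage
    static final List<List<String>> BATCHES = List.of(
            List.of("eur_plate.dat", "eur_mmr.dat"),
            List.of("nam_plate.dat", "nam_mmr.dat"));

    public static int loadAll(Path tmpDir) throws IOException {
        int loaded = 0;
        for (List<String> batch : BATCHES) {
            // 1. Download one batch of engine data (stubbed: create empty files;
            //    real code issues S3 GetObject calls)
            for (String name : batch) {
                Files.createFile(tmpDir.resolve(name));
            }
            // 2. Initialize engines into memory from the downloaded files
            //    (stubbed: just count; real code calls the native engine init)
            loaded += batch.size();
            // 3. Delete the temporary .dat files to stay under the 512 MB limit
            for (String name : batch) {
                Files.delete(tmpDir.resolve(name));
            }
        }
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("engines");
        System.out.println("engines loaded: " + loadAll(tmp));
    }
}
```

Because each batch's files are deleted before the next download, peak disk usage is bounded by the largest single batch rather than the total engine size.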
Supporting AWS Stack
- API Gateway – entry point
- Lambda (SnapStart) – recognition engines
- Cognito – user auth
- DynamoDB – API keys, billing records
- SNS/EventBridge – async billing + subscription events
- S3, SSM, CloudFront, WAF, Route53 – storage, config, delivery, security
Takeaways
- Even heavy neural engines can scale serverless.
- CRaC makes SnapStart viable by restoring pre-initialized state.
- Closing & reopening network connections is essential.
- Staged loading solves ephemeral storage/EFS limits.
- Being an early adopter of SnapStart saved us both time and cost from day one.
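To make the connection-handling point concrete: the pattern is simply "drop clients before the snapshot, rebuild them after restore". A minimal standalone sketch (the org.crac interfaces are omitted here so it compiles on its own; in the real handler these methods are the `beforeCheckpoint`/`afterRestore` callbacks):

```java
import java.net.http.HttpClient;

public class ConnectionLifecycle {
    private HttpClient client = HttpClient.newHttpClient();

    // In the real handler: beforeCheckpoint(org.crac.Context<...>)
    public void beforeCheckpoint() {
        client = null; // drop the client – open sockets cannot be persisted in a snapshot
    }

    // In the real handler: afterRestore(org.crac.Context<...>)
    public void afterRestore() {
        client = HttpClient.newHttpClient(); // rebuild connections fresh in the new sandbox
    }

    public boolean hasClient() {
        return client != null;
    }
}
```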
👉 More about Carmen Cloud: carmencloud.com
👉 Corporate homepage: adaptiverecognition.com
👉 My side project for WordPress + Cognito login: Gatey