We’ve all been there. Everything works flawlessly in your SQA environment, but the moment your code hits UAT, it turns sluggish.
Recently, we ran into a bizarre ghost in our AWS infrastructure: a Lambda function, triggered on a regular 10-minute interval by Amazon EventBridge, was consistently taking 37 seconds to write its very first log line.
Here is how we hunted down the lag, stripped away 29 seconds of it, and realized the rest wasn't a bug at all: it was just AWS being AWS.
The Problem: The 37-Second Wall
In SQA, the Lambda was invoked within 2 to 3 seconds of its scheduled time. In UAT, it took 37 seconds.
Initially, we assumed it was a standard container cold start issue. But when we changed the EventBridge trigger to run every single minute, frequent enough that we expected a warm container to be reused, the 37-second delay still happened. Even weirder: looking at CloudWatch, our very first console.log inside the handler didn't show up until that 37-second mark.
The environment wasn't just slow; it was completely stalled before our code even kicked off.
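The one-minute experiment above comes down to changing the rule's schedule expression. Here is a minimal sketch, assuming the rule is managed through the EventBridge PutRule API. The helper names rateExpression and scheduledRuleInput are hypothetical, and the rule name is only a placeholder.

```javascript
// Build an EventBridge rate expression. The unit is singular when the value
// is 1 ("rate(1 minute)") and plural otherwise ("rate(10 minutes)").
function rateExpression(value, unit) {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError("rate value must be a positive integer");
  }
  return `rate(${value} ${value === 1 ? unit : unit + "s"})`;
}

// Hypothetical helper: builds the input object for the PutRule API.
function scheduledRuleInput(ruleName, everyMinutes) {
  return {
    Name: ruleName,
    ScheduleExpression: rateExpression(everyMinutes, "minute"),
    State: "ENABLED",
  };
}

console.log(scheduledRuleInput("ten-min-cron", 10).ScheduleExpression); // rate(10 minutes)
console.log(scheduledRuleInput("ten-min-cron", 1).ScheduleExpression);  // rate(1 minute)
```

The resulting object could be passed to PutRuleCommand from @aws-sdk/client-eventbridge, or the same expression pasted into the console or your infrastructure template.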
How We Debugged It
Step 1: Bringing out the big guns (Provisioned Concurrency)
To rule out the AWS infrastructure taking its sweet time assigning network interfaces (ENIs) inside our private UAT VPC, we turned on Provisioned Concurrency. This forces AWS to keep the Lambda containers pre-warmed and ready to rock.
The result: The delay plummeted from 37 seconds down to exactly 8 seconds.
A massive win, but it left us scratching our heads. If the container was already warm and waiting, why was it still taking 8 seconds for the START RequestId log line to appear?
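For reference, turning on Provisioned Concurrency as in Step 1 can be sketched with the AWS SDK for JavaScript v3. This is an illustration under assumptions, not our exact setup: the function name, alias, and instance count are placeholders, and the two helper functions are hypothetical.

```javascript
// Build the input for Lambda's PutProvisionedConcurrencyConfig API.
// Provisioned Concurrency attaches to a published version or an alias,
// never to $LATEST.
function provisionedConcurrencyInput(functionName, qualifier, instances) {
  if (!qualifier || qualifier === "$LATEST") {
    throw new Error("Provisioned Concurrency needs a version number or alias");
  }
  if (!Number.isInteger(instances) || instances < 1) {
    throw new RangeError("instances must be a positive integer");
  }
  return {
    FunctionName: functionName,
    Qualifier: qualifier,
    ProvisionedConcurrentExecutions: instances,
  };
}

// Sends the request. The SDK is loaded lazily so the builder above can be
// exercised without AWS credentials or the SDK installed.
async function enableProvisionedConcurrency(input) {
  const { LambdaClient, PutProvisionedConcurrencyConfigCommand } =
    require("@aws-sdk/client-lambda");
  const client = new LambdaClient({});
  return client.send(new PutProvisionedConcurrencyConfigCommand(input));
}

// "uat-cron-worker" and "live" are placeholder names.
console.log(provisionedConcurrencyInput("uat-cron-worker", "live", 1));
```

One detail worth checking: the pre-warmed instances only serve invocations addressed to that version or alias, so the EventBridge rule's target has to point at the qualified ARN rather than the unqualified function.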
Step 2: Peeking inside the payload
We compared this background cron job to our web microservices sitting behind an Application Load Balancer (ALB). The ALB-triggered Lambdas fired instantly. Why was EventBridge lagging?
To find out, we printed out the raw event object passed into the Lambda handler by EventBridge and looked at the metadata:
{
  "source": "aws.events",
  "time": "2026-06-07T18:00:08Z",
  "resources": ["arn:aws:events:...:rule/ten-min-cron"]
}
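To make this check repeatable, the handler can log how far the event's "time" field sits from the minute boundary and how long the event took to reach the code. The following is a rough sketch rather than our production handler; describeDelay and its field names are hypothetical.

```javascript
// Split the observed delay into two parts:
//  - schedulerOffsetMs: how far past the minute EventBridge stamped the event
//  - deliveryLagMs: time between that stamp and the handler starting
function describeDelay(event, handlerStartMs) {
  const emittedMs = Date.parse(event.time);
  const minuteStartMs = Math.floor(emittedMs / 60000) * 60000;
  return {
    schedulerOffsetMs: emittedMs - minuteStartMs,
    deliveryLagMs: handlerStartMs - emittedMs,
  };
}

async function handler(event) {
  // Capture the start time first, before any other work.
  const report = describeDelay(event, Date.now());
  console.log(JSON.stringify({ event, report }));
  return report;
}

// Lambda entry point (CommonJS).
exports.handler = handler;
```

For the payload above, schedulerOffsetMs comes out to 8000. A large deliveryLagMs would point at infrastructure or cold-start time instead, although the "time" field only has one-second resolution, so small values are not meaningful.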
The Epiphany: EventBridge Jitter
When we looked at the "time" field generated by EventBridge, the lightbulb finally went on. The timestamp read 18:00:08Z, a full 8 seconds past the minute.
EventBridge wasn't even sending the event to our Lambda until the 8th second. Our Lambda wasn't lagging; it started executing as soon as AWS handed it the job.
As it turns out, AWS documents that EventBridge scheduled rules are only precise to the minute, and that the target can be invoked several seconds after the scheduled time. Our understanding is that this stagger keeps millions of customer crons from firing at exactly 12:00:00.000 and melting downstream databases worldwide (the "thundering herd" problem), so executions end up spread across those first few seconds.
Our UAT cron rule just happened to get bucketed into an 8-second delay slot.
How We Resolved It (The Math Adds Up)
Once we understood the architecture, the pieces of the original 37-second mystery fell right into place:
- The Old Flow (No Provisioned Concurrency): 8 seconds of intentional EventBridge scheduling jitter + 29 seconds of heavy VPC network/cold boot setup = 37 seconds total delay.
- The New Flow (With Provisioned Concurrency): 8 seconds of EventBridge jitter + 0 seconds of infrastructure lag = 8 seconds total delay.
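The breakdown above is plain subtraction, and it can be written down as a tiny sanity check. The numbers are the ones we observed; the function name infrastructureLag is just illustrative.

```javascript
// Observed numbers from this post, in seconds.
const observed = { withoutPC: 37, withPC: 8, eventTimeOffset: 8 };

// Whatever remains after subtracting the EventBridge offset is infrastructure lag.
function infrastructureLag(totalSeconds, schedulerOffsetSeconds) {
  return totalSeconds - schedulerOffsetSeconds;
}

console.log(infrastructureLag(observed.withoutPC, observed.eventTimeOffset)); // 29
console.log(infrastructureLag(observed.withPC, observed.eventTimeOffset));    // 0
```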
The Final Takeaway
If you are triggering Lambdas via an ALB or API Gateway, AWS treats it as real-time, synchronous traffic and routes it in milliseconds. But if you are using EventBridge crons, expect a few seconds of intentional padding.
Our system wasn't broken, and our code was fine. By keeping Provisioned Concurrency active, we permanently killed the 29-second infrastructure lag, and we accepted the remaining 8 seconds as standard operating procedure for AWS scheduled automation.