If you've spent any time managing infrastructure at scale, you've probably written a cron job that polls for something. Maybe it checks for untagged resources every hour, or scans for missing CloudWatch alarms on a schedule. It works. It's simple. And it's almost always the wrong long-term answer.
I recently rebuilt one of these systems — a compliance remediation tool that ensures every EC2 instance in our multi-account AWS organisation has CloudWatch CPU alarms — and the shift from scheduled polling to event-driven architecture made a surprising difference.
The cron approach
The original setup ran a Lambda on a CloudWatch Events schedule every 30 minutes. It would:
- Assume a role into each member account
- List all EC2 instances
- Check for the existence of CloudWatch alarms
- Create any that were missing
This worked, but had problems:
- Latency: A new instance could run for up to 30 minutes without monitoring
- Cost: Every run scanned every instance, even if nothing had changed
- Complexity: The Lambda needed to handle pagination across dozens of accounts, manage rate limiting, and deal with partial failures gracefully
- Noise: CloudWatch Logs filled up with successful "nothing to do" runs
The event-driven approach
The replacement uses EventBridge rules deployed to each member account via StackSets. When an EC2 instance launches or has its tags modified, the event is forwarded to a central event bus where a Lambda evaluates and applies alarms.
The reconciliation Lambda still exists — it runs daily as a safety net — but it catches edge cases rather than doing the heavy lifting.
What changed
- Remediation time: From up to 30 minutes to under 60 seconds
- Lambda invocations: Dropped significantly — we only run when something actually happens
- Code complexity: The event-driven Lambda handles one instance at a time, not a full cross-account sweep
- Terraform: The module became simpler because each component has a single, clear responsibility
When cron still wins
Event-driven isn't always the answer. Use scheduled runs when:
- There's no reliable event source for the change you care about
- You need a full reconciliation sweep (drift detection, for example)
- The event volume would be higher than the polling cost
But for "react when something changes" — which is what most compliance automation is doing — EventBridge is the better tool.
Getting started
If you're currently running a polling Lambda and want to shift:
- Identify the AWS API action that triggers the change you care about
- Create an EventBridge rule matching that event pattern
- Keep your existing Lambda as a daily reconciliation fallback
- Deploy the rule to member accounts via StackSets
The two patterns complement each other. Events handle the real-time path, scheduled runs handle the "trust but verify" path.
Top comments (0)