jasonmills94

Posted on Jul 4

A Safer AWS ECR Alert Email Check With Docker in CI/CD

#aws #cicd #devops #docker

AWS ECR scan findings are easy to trust too early. The registry flags a package issue, EventBridge forwards the event, and the pipeline log says the notification step finished. Then the message lands in a stale team alias, gets filtered away, or never reaches the on-call person you thought would see it. That gap is small on paper, but it is where release confidence starts to wobble.

I handle that path like any other production-facing dependency now: one isolated inbox per run, one reproducible Docker job, and one set of assertions that proves the mail was actually delivered. If your current check stops at "the Lambda executed," you are testing plumbing, not delivery.

Why ECR alert email checks fail late

The late failures are usually boring:

the wrong SNS topic is wired in staging
the message body still points at a production console URL
two CI/CD jobs reuse the same inbox and muddy the results
a secret rotation changes sender settings and nobody notices until the next high-severity alert

That is why I like one real mailbox check next to the normal unit and integration coverage. The idea is very similar to parallel email test isolation: if every run gets its own destination, it becomes much easier to see which pipeline actually produced which message.

I also keep seeing internal notes that mention a temp mail service as if it were a stable testing strategy by itself. It is not. A fake email generator or temp mailbox only helps when it is tied to a clear assertion plan, short retention, and cleanup that the team will really follow.

The Docker workflow I actually use

The workflow that keeps working for me is pretty plain, which is probably why it survives handoffs:

A CI/CD job starts a small Docker container with AWS credentials scoped to a staging account.
The container triggers a known ECR finding event or replays a sanitized sample into the same notification route used by staging.
The job creates a short-lived inbox for that run and injects the address into the alert path.
A polling step waits for the message, validates the payload, and expires the inbox.
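For the replay option in the second step, here is a minimal sketch of building a sanitized event entry. The field names in `detail` follow the documented ECR image scan event shape. The custom source `staging.ecr-replay` and the `ci-run-id` field are assumptions for illustration: as far as I know, PutEvents rejects sources that start with `aws.`, so the staging rule would need to match the custom source as well.

```python
import json
from datetime import datetime, timezone


def build_replay_entry(repository: str, run_id: str, bus_name: str = "default") -> dict:
    """Build one PutEvents-style entry that mimics an ECR scan finding."""
    detail = {
        "scan-status": "COMPLETE",
        "repository-name": repository,
        "image-tags": [f"ci-{run_id}"],
        "finding-severity-counts": {"CRITICAL": 1, "HIGH": 0},
        # Hypothetical extra field so the alert can be traced back to this run.
        "ci-run-id": run_id,
    }
    return {
        # Custom source (assumption): replayed events should not claim the
        # reserved "aws." prefix, so the staging rule has to match this value.
        "Source": "staging.ecr-replay",
        "DetailType": "ECR Image Scan",
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
        "Time": datetime.now(timezone.utc),
    }
```

The returned dict is shaped so it could be passed to boto3 as `boto3.client("events").put_events(Entries=[entry])`. Keeping the builder free of AWS calls makes it easy to unit test without credentials.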

When I need a quick target for that last step, I use a disposable email inbox for non-production checks. The win is not the mailbox itself. The win is that each run gets a clean destination, so I can tell if the alert belongs to this pipeline execution instead of some earlier leftover test.
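The polling step can be sketched roughly like this. The `fetch` callable stands in for whatever inbox API you use, and the message shape (dicts with `subject` and `body` keys) is an assumption, not any specific provider's format.

```python
import time
from typing import Callable


def wait_for_messages(
    fetch: Callable[[], list[dict]],
    run_marker: str,
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Poll until a message mentioning run_marker arrives, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Keep only messages that carry this run's marker, so leftovers
        # from earlier tests never satisfy the check.
        matches = [
            m for m in fetch()
            if run_marker in m.get("subject", "") or run_marker in m.get("body", "")
        ]
        if matches:
            return matches
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no alert email for {run_marker} within {timeout_s}s")
        sleep(interval_s)
```

Filtering on the run marker, instead of accepting the first message in the inbox, is what ties the result to this pipeline execution. The injectable `sleep` is only there so the function can be tested without real waiting.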

That same isolation mindset shows up in OAuth recovery inbox isolation. Different workflow, same lesson: shared inboxes hide bad assumptions for way too long.

What I validate before the job can pass

I do not pass the job just because a message arrived. The check needs to prove the alert is usable:

exactly one email was delivered for the triggered ECR event
the subject mentions the expected repository or severity marker
the sender matches the staging environment, not production
links inside the body point to the correct AWS account and region
timestamps and identifiers line up with the current pipeline run
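The checks above could be expressed as one function that returns a list of problems. This is a sketch under assumptions: the message keys (`subject`, `from`, `body`), the `Expected` fields, and the idea that console links contain both the region and the account ID are illustrative, so adjust them to what your alert template really emits.

```python
import re
from dataclasses import dataclass


@dataclass
class Expected:
    repository: str
    severity: str
    sender: str
    account_id: str
    region: str
    run_id: str


def validate_alert(messages: list[dict], exp: Expected) -> list[str]:
    """Return human-readable problems; an empty list means the check passes."""
    if len(messages) != 1:
        return [f"expected exactly 1 email, got {len(messages)}"]
    msg = messages[0]
    subject, body = msg.get("subject", ""), msg.get("body", "")
    problems = []
    if exp.repository not in subject and exp.severity not in subject:
        problems.append("subject lacks repository and severity marker")
    if msg.get("from", "").lower() != exp.sender.lower():
        problems.append(f"unexpected sender: {msg.get('from')}")
    # Only AWS console links are checked; other links are ignored.
    links = re.findall(r"https://[^\s\"'<>]+", body)
    console_links = [link for link in links if "console.aws.amazon.com" in link]
    if not console_links:
        problems.append("body contains no AWS console links")
    for link in console_links:
        if exp.region not in link or exp.account_id not in link:
            problems.append(f"link points elsewhere: {link}")
    if exp.run_id not in body:
        problems.append("run ID missing from body")
    return problems
```

Returning every problem at once, instead of failing on the first, tends to make a red pipeline easier to read: one log line per broken assumption.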

I also record the Docker image tag, repository name, EventBridge rule, and CI/CD run ID in one log block. That sounds a bit fussy, but it saves time later when somebody asks why the deploy gate failed after midnight and you need to separate infra noise from a real delivery issue.
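One way to produce that log block is a single JSON line with a fixed prefix. The environment variable names and the `ALERT_CHECK_CONTEXT` prefix below are hypothetical; use whatever your CI system already exposes.

```python
import json
import os


def run_context(env=os.environ) -> dict:
    """Collect the identifiers worth having when the gate fails."""
    return {
        "image_tag": env.get("IMAGE_TAG", "unknown"),
        "repository": env.get("ECR_REPOSITORY", "unknown"),
        "eventbridge_rule": env.get("EVENTBRIDGE_RULE", "unknown"),
        "ci_run_id": env.get("CI_RUN_ID", "unknown"),
    }


def format_log_block(context: dict) -> str:
    """Render the context as one JSON line with a greppable prefix."""
    return "ALERT_CHECK_CONTEXT " + json.dumps(context, sort_keys=True)


if __name__ == "__main__":
    print(format_log_block(run_context()))
```

A single line with sorted keys is easy to grep and to diff between a passing and a failing run.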

If you rely on a temp mailbox, validate the content, not only inbox existence. I have seen teams receive the message, mark the test green, and miss that the body still referenced the wrong console host. From an ops view, that is still a failed alert path.

A few traps that create noisy results

Three traps show up over and over:

Reusing the same inbox across parallel jobs.
Letting the Docker job retry without a unique execution marker.
Treating sample payload tests as equal to the full routed notification.

The first one creates false failures. The second one makes it hard to tell whether a duplicate came from the app or the runner. The third one hides IAM, routing, and formatting problems until a real alert fires. None of these bugs are fancy, but they waste a surprising amount of weekend time.
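The first two traps can be reduced with a marker that is unique per run and per retry attempt. The sketch below is illustrative only: the marker format, the inbox naming scheme, and the assumption that the marker appears in the message body are all hypothetical.

```python
import uuid


def execution_marker(run_id: str, attempt: int) -> str:
    """Marker that is unique per pipeline run and per retry attempt."""
    return f"{run_id}-a{attempt}-{uuid.uuid4().hex[:8]}"


def inbox_local_part(marker: str) -> str:
    """Derive a per-run inbox name from the marker (hypothetical scheme)."""
    return f"ecr-alert-{marker}".lower()


def classify_messages(messages: list[dict], marker: str) -> str:
    """Separate duplicates for this attempt from leftovers of other attempts."""
    same_attempt = [m for m in messages if marker in m.get("body", "")]
    if len(same_attempt) > 1:
        return "duplicate-within-attempt"
    if len(messages) > len(same_attempt):
        return "leftover-from-other-attempt"
    if not same_attempt:
        return "missing"
    return "ok"
```

With the attempt number in the marker, two emails with the same marker point at the notification path, while two emails with different markers point at the runner retrying.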

I am also careful not to overbuild this. For every pull request, a mocked notification check is still fine. The end-to-end mailbox validation earns its keep on merge pipelines, release candidates, or scheduled safety checks where you actually need confidence that AWS, Docker, and the delivery path still agree with each other.
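That gating rule is small enough to write down. The event names (`schedule`, `push`) and the `release/` branch prefix here are assumptions borrowed from common CI conventions, not from any specific system.

```python
def should_run_mailbox_check(event: str, branch: str) -> bool:
    """Run the end-to-end mailbox check only where delivery confidence matters."""
    if event == "schedule":
        # Scheduled safety checks always run the full path.
        return True
    if event == "push" and (branch == "main" or branch.startswith("release/")):
        # Merges to main and release candidates.
        return True
    # Pull requests and feature branches rely on the mocked check.
    return False
```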

A rollout checklist for weekend deploys

Before I trust the pipeline, I want this checklist green:

the staging route can emit one known ECR alert on demand
the Docker job creates one fresh inbox for the run
the email arrives once, not zero times and not twice
repository, severity, region, and links all match the expected account
the inbox expires or is cleaned up right after the assertions finish

It is not a long checklist, but it catches the stuff that slips past config review. Clean Terraform and successful Lambda logs are nice. The message showing up in the right place, with the right content, is what really tells you the system is ready.
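The last checklist item is the easiest to forget when an assertion fails halfway through. A context manager is one way to make cleanup unconditional. In this sketch, `create` and `expire` are placeholders for your inbox provider's calls, not a real API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_inbox(
    create: Callable[[], str],
    expire: Callable[[str], None],
) -> Iterator[str]:
    """Yield a fresh inbox address and always expire it afterwards."""
    address = create()
    try:
        yield address
    finally:
        # Runs whether the assertions passed, failed, or timed out.
        expire(address)
```

Used as `with ephemeral_inbox(create, expire) as address:`, the inbox is expired even when the checks inside the block raise.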

Q&A

Should this run on every commit?

Usually no. I would keep it for merges, release branches, or a scheduled validation job. Running it on every commit can get expensive and a little noisy.

Why involve Docker if AWS already emits the alert?

Because the Docker job gives you a repeatable runner with the exact tools and credentials the pipeline expects. That makes debugging much easier when the check breaks.

Is a temp mailbox okay for staging?

Yes, if the data is non-production, the retention is short, and the team treats it like disposable infrastructure instead of a shared dumping ground.
