Lee Wynne

Posted on Mar 26 • Edited on Apr 25 • Originally published at leewynne.com

Your DEV Credentials Shouldn't Be Able to Sink PROD

#cloud #aws #security #devops

Most engineering teams think environment isolation means having a "dev" and "prod" flag somewhere in their deployment pipeline.

They're wrong.

That approach doesn't isolate anything; it just moves the risk around.

The AWS SDLC Account Pattern with Full Environment Segregation is what serious cloud architecture actually looks like. It's not just a best practice. It's the difference between teams that accidentally push breaking changes to production at 2am and teams that catch those changes before they ever leave a development branch. It's the difference between a breach in your DEV environment that stays contained, with the blast radius controlled and the damage limited, and a breach in DEV that silently walks into PROD, taking customer data with it and sinking the whole ship.

Here's how it works, and why every growing engineering team should be building this way.

The Problem With Shared AWS Accounts

If your DEV, STAGING, and PROD workloads live in the same AWS account, you have a blast radius problem.

A misconfigured IAM policy in your development environment can expose production resources. A runaway Lambda function eating through concurrency limits in staging can throttle your live API. A developer testing aggressive auto-scaling rules in dev can hit account-level service quotas that affect production before anyone notices.

But the operational blast radius is only half the story.

The security blast radius from a compromised shared account is where things get truly dangerous.

If an attacker compromises a single access key, an OIDC token, or a CI/CD pipeline role in a shared account, they could inherit access to every environment in that account, including production data, production databases, and production secrets. There is no hard boundary between environments within the same account. One breach, total exposure.

The insider threat picture is equally concerning. In a shared account, an engineer with dev access and enough determination can reach production. You can write IAM policies that try to prevent it, but a boundary built only from IAM policies inside a single account is fragile: the policies can be overly permissive, they can be misconfigured, and privileges can be escalated.

Account-level segregation makes the boundary structural rather than a matter of policy hygiene. A developer credential scoped to DEV simply cannot access PROD without an explicit cross-account IAM role grant. That grant is a deliberate architectural decision, not a policy footnote that someone forgot to update.
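
To make the "explicit grant" idea concrete, here is a minimal sketch of the trust policy that would sit on a role in the PROD account. The ARNs, account IDs, and the `is_trusted` helper are assumptions for illustration only, and the helper is a deliberately simplified model of how AWS evaluates a trust policy, not a reimplementation of it.

```python
import json

# Hypothetical ARNs; the account IDs are placeholders, not real accounts.
RELEASE_PIPELINE_ROLE = "arn:aws:iam::444444444444:role/release-pipeline"
DEV_ENGINEER_ROLE = "arn:aws:iam::111111111111:role/dev-engineer"


def build_trust_policy(trusted_principal_arns: list[str]) -> dict:
    """Build the trust policy for a role in PROD.

    Only the listed principals may call sts:AssumeRole on the role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": sorted(trusted_principal_arns)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def is_trusted(trust_policy: dict, caller_arn: str) -> bool:
    """Simplified model: is the caller explicitly named in an Allow statement?

    Real IAM evaluation also considers conditions, explicit denies, SCPs and
    the caller's own identity policies; this only illustrates the idea.
    """
    for statement in trust_policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        principals = statement["Principal"]["AWS"]
        if isinstance(principals, str):
            principals = [principals]
        if caller_arn in principals:
            return True
    return False


prod_deploy_trust = build_trust_policy([RELEASE_PIPELINE_ROLE])
print(json.dumps(prod_deploy_trust, indent=2))
print(is_trusted(prod_deploy_trust, DEV_ENGINEER_ROLE))  # the DEV role is not listed
```

The point of the sketch is that access to PROD has to be written down by name. A DEV credential that nobody added to the trust policy has nothing to escalate from.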

Shared accounts feel cheaper and simpler at first. But as your team grows and your infrastructure complexity increases, shared accounts become a liability. You can't enforce strict RBAC by environment. You can't audit cost by environment cleanly. And you certainly can't guarantee that a bad deployment in your lowest environment won't propagate upward.

The solution isn't better tagging or smarter guardrails within a shared account. The solution is account-level segregation, and AWS Organizations makes it operationally feasible to implement properly.

Three Accounts, One Purpose Each

The pattern separates workloads into three completely isolated AWS accounts, each serving a single purpose.

The DEV Account is for feature development and integration testing. Engineers deploy more freely here. Branches get their own environments when needed. Nothing here is permanent and nothing is customer-facing; access is private and zero-trust. The URL pattern (dev.domain.com) makes it clear that this is development. The cost here is intentionally low, and the account's bill should show it.

The STAGE Account is a production replica. Same resources, same network architecture, same IAM setup as production, just without real customer traffic. This is where UAT happens, where load tests run, and where you validate that what you're about to ship actually behaves like you think it will. Stage sits at stage.domain.com. As in DEV, nothing here is customer-facing and access is private and zero-trust, but it should still scare you slightly to deploy there. That fear is useful.

The PROD Account is the one that matters. Blue/green deployments run at domain.com, with traffic switching done deliberately and with rollback paths tested in stage first. Nothing reaches PROD that hasn't passed STAGE. That's not a policy suggestion; it's enforced by the pipeline architecture itself.

Account-level segregation means there's no accidental blast radius between these environments. A misconfigured resource in DEV cannot affect PROD: they're in different AWS accounts with separate IAM boundaries, separate VPCs, and separate billing.
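
The rule that nothing reaches PROD without passing STAGE can be sketched as a small gate that a pipeline might run before any deploy. This is a minimal illustration, not the article's actual pipeline: the `Release` record and `can_deploy` function are hypothetical names, and a real pipeline would read the pass/fail state from its own test and approval results.

```python
from dataclasses import dataclass, field

# Promotion order taken from the pattern: DEV, then STAGE, then PROD.
PROMOTION_ORDER = ("dev", "stage", "prod")


@dataclass
class Release:
    """A hypothetical release record tracked by the pipeline."""
    version: str
    passed: set = field(default_factory=set)  # environments whose checks passed


def can_deploy(release: Release, target: str) -> bool:
    """Allow a deploy only if every earlier environment has passed."""
    position = PROMOTION_ORDER.index(target)  # raises ValueError for unknown names
    return all(env in release.passed for env in PROMOTION_ORDER[:position])


release = Release(version="1.4.0")
release.passed.add("dev")
print(can_deploy(release, "stage"))  # DEV has passed, so STAGE is allowed
print(can_deploy(release, "prod"))   # STAGE has not passed yet
```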

One Infrastructure Repo, Three Environments

Infrastructure-as-code is non-negotiable at this scale, and the pattern uses a mono-infra approach with Terragrunt to manage it cleanly.

A single Git repository contains the infrastructure for all three accounts, organised into three folder paths: /prod, /stage, and /dev. Each folder contains the Terragrunt configuration for that account: same modules, different variable files and remote state backends.

Why Terragrunt? Because raw Terraform at this scale leads to copy-paste hell. When you need to update your VPC module, you want to change it once and have it propagate through environments in a controlled way, not hunt down three duplicated configurations and hope you got them all. Terragrunt's include blocks and dependency management let you DRY up your infrastructure code the same way you'd DRY up application code.

The mono-repo structure also enforces a natural promotion pattern: infrastructure changes move from dev to stage to prod via pull requests, with review and approval gates at each stage. No environment drifts silently. Every difference between PROD and DEV is intentional and reviewable.
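
The "same modules, different inputs" idea, and the claim that every difference between environments is reviewable, can be modelled in a few lines. In a real repo this lives in Terragrunt HCL; the Python below is only a sketch of the logic, and every input name, value, and state path in it is an assumption made up for illustration.

```python
# Inputs shared by every environment (hypothetical values).
COMMON_INPUTS = {"vpc_module_version": "v2.3.0", "az_count": 3, "nat_gateways": 3}

# Per-environment overrides, mirroring the /dev, /stage and /prod folders.
OVERRIDES = {
    "dev": {"az_count": 2, "nat_gateways": 1},
    "stage": {},
    "prod": {},
}


def resolve_inputs(env: str) -> dict:
    """Merge shared inputs with one environment's overrides."""
    return {**COMMON_INPUTS, **OVERRIDES[env]}


def state_key(env: str) -> str:
    """Each environment gets its own remote state location."""
    return f"{env}/network/terraform.tfstate"


def diff_environments(a: str, b: str) -> dict:
    """Return {input_name: (value_in_a, value_in_b)} for every differing input."""
    left, right = resolve_inputs(a), resolve_inputs(b)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(left.keys() | right.keys())
        if left.get(key) != right.get(key)
    }


print(diff_environments("dev", "prod"))
print(diff_environments("stage", "prod"))
```

Because overrides are the only place environments can diverge, a reviewer can see the full list of differences in one diff, and an empty diff between STAGE and PROD is what "production replica" means in practice.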

Two PR Workflows, One Direction

The pattern distinguishes between two distinct flows, and most teams conflate them at their peril.

The infra workflow governs changes to the infrastructure itself: the Terraform modules, the VPC configs, the account baselines. These PRs are raised against the infrastructure mono-repo and require review from the people who understand the blast radius of an IAM policy change or a security group modification. Infrastructure changes flow from DEV account config to STAGE account config to PROD account config.

The app workflow governs application deployments: the code that runs inside the infrastructure. A feature branch gets deployed to DEV, a PR merge to main triggers a STAGE deployment and automated test suite, and a deliberate promotion step moves the release to PROD. These workflows live in the application repos, not the infra repo, but they reference the same account structure and environment naming conventions.

Keeping these workflows separate matters because they operate at different risk levels and different cadences. Application code might deploy to production daily, while infrastructure changes might go through a full change management process. Mixing them creates confusion about who reviews what and what the rollback options are.
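
The app workflow described above boils down to a mapping from pipeline triggers to target environments. Here is a rough sketch of that mapping; the event names ("push", "merge", "promote") and the `approved` flag are hypothetical labels, since the article doesn't name a specific CI system.

```python
from typing import Optional


def target_environment(event: str, branch: str, approved: bool = False) -> Optional[str]:
    """Map a pipeline trigger to the environment it may deploy to.

    The event names ("push", "merge", "promote") are hypothetical labels.
    """
    if event == "push" and branch != "main":
        return "dev"      # feature branches deploy to DEV
    if event == "merge" and branch == "main":
        return "stage"    # merging to main deploys to STAGE and runs the tests
    if event == "promote" and branch == "main" and approved:
        return "prod"     # PROD needs an explicit, approved promotion step
    return None           # anything else deploys nowhere


print(target_environment("push", "feature/login"))
print(target_environment("promote", "main", approved=False))
```

The useful property is the default: any trigger the function doesn't recognise deploys nowhere, so PROD is only reachable through the one deliberate path.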

Shared Services: The Fourth Account You Don't Often See

Here's where many AWS multi-account architectures fall short: they model the application environments well but neglect the shared services problem.

Every environment needs to pull container images. Every environment needs identity and access management for users. If you solve that independently per account, you end up with three separate ECR registries, three separate Cognito user pools, and a synchronisation problem that compounds over time.

The SDLC Account Pattern includes a Shared Services Account that sits alongside the three environment accounts.

ECR and Cognito are good examples:

ECR: All three environments pull from the same registry. A single image build, tested and tagged, gets promoted across environments without rebuilding. The image you ran your integration tests on in STAGE is exactly the image that ships to PROD, no surprises, no environment-specific build drift.

Cognito: Rather than duplicating user pools per environment, Shared Services hosts the identity plane. Environment-specific configurations reference the shared pool, with appropriate user segmentation between DEV, STAGE (UAT), and PROD user groups.
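
The ECR point, build once and promote the same image, comes down to promoting by digest rather than rebuilding. Registries identify an image by the SHA-256 of its manifest, so if the digest is unchanged, the image is unchanged. The sketch below models that with an in-memory tag table; the tag names and the stand-in manifest are assumptions, and a real setup would do the re-tagging through the registry itself.

```python
import hashlib


def image_digest(manifest: bytes) -> str:
    """Content-address an image manifest the way registries do: sha256 of its bytes."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


def promote(tags: dict, source_tag: str, target_tag: str) -> dict:
    """Point target_tag at the digest source_tag already refers to.

    Nothing is rebuilt; only a new tag is attached to the same digest.
    """
    updated = dict(tags)
    updated[target_tag] = tags[source_tag]
    return updated


# A stand-in for a real manifest; the tag names are hypothetical.
manifest = b'{"schemaVersion": 2, "layers": ["..."]}'
tags = {"stage-1.4.0": image_digest(manifest)}
tags = promote(tags, "stage-1.4.0", "prod-1.4.0")
print(tags["prod-1.4.0"] == tags["stage-1.4.0"])
```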

The Shared Services Account uses Zero Trust connectivity to each environment account: explicit, least-privilege cross-account IAM roles with no implicit trust assumptions. A resource in PROD pulling an image from ECR gets exactly the permissions needed for that operation, scoped to that resource, with no ambient access to anything else in the Shared Services Account.
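
As a sketch of what "exactly the permissions needed" could look like for image pulls, here is a pull-only ECR repository policy generated in Python. The three `ecr:` actions are the ones ECR uses for pulling; the role ARN and account ID are hypothetical. The pulling role would also need `ecr:GetAuthorizationToken` in its own identity policy, which is not shown here.

```python
import json


def ecr_pull_policy(consumer_role_arns: list[str]) -> dict:
    """Repository policy granting pull-only access to the named roles.

    No push or delete actions are included, so a consumer account can read
    images from the shared registry but never change them.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",
                "Effect": "Allow",
                "Principal": {"AWS": sorted(consumer_role_arns)},
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
            }
        ],
    }


# Hypothetical role in the PROD account that runs the workload.
policy = ecr_pull_policy(["arn:aws:iam::333333333333:role/prod-task-execution"])
print(json.dumps(policy, indent=2))
```

Naming a specific role as the principal, rather than the whole PROD account, is the design choice that keeps the grant scoped to the resource that actually pulls.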

The Provider to Consumer Model

The governance philosophy underpinning this entire pattern is closer to a shared responsibility model than a simple owner/user split, and it's worth being precise about what each side is responsible for.

The Provider is the platform team. They own the AWS Landing Zone, manage core transit networking, and maintain the version-controlled VPC configuration that defines how accounts connect and communicate. They vend new AWS accounts on request, applying a security baseline before a single line of application code is ever written: SCPs, GuardDuty, centralised logging, IAM boundaries, the works. When a new team needs an environment, the Provider provisions the account, attaches the consumer's mono repo, and hands over something that's ready to build in.

The Consumer is the workload build team. They inherit a fully baselined, network-connected AWS account with a repo already wired up. Their job is to build product inside the guardrails the Provider has established, not to manage networking, not to configure security tooling, not to think about account structures. That separation is the whole point.

This clean division is what makes the pattern scale. The platform team can iterate on the landing zone and security baseline without touching application code. Application teams can ship fast without becoming AWS infrastructure experts. And when something goes wrong in a consumer account, the Provider's controls mean the blast radius stops at the account boundary.
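
To give a flavour of the security baseline the Provider applies, here is a minimal sketch of a service control policy built in Python. The specific statements are assumptions chosen as common examples, not the article's actual baseline: they stop a consumer account from leaving the organisation or switching off CloudTrail and GuardDuty.

```python
import json


def baseline_scp() -> dict:
    """A small deny-only SCP a platform team might attach to vended accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": ["organizations:LeaveOrganization"],
                "Resource": "*",
            },
            {
                "Sid": "ProtectAuditTooling",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:StopLogging",
                    "guardduty:DeleteDetector",
                ],
                "Resource": "*",
            },
        ],
    }


scp = baseline_scp()
print(json.dumps(scp, indent=2))
```

Because an SCP caps what any identity in the account can do, including its administrators, the Consumer team can hold broad permissions inside their account without being able to remove the guardrails around it.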

Why This Matters Now

Multi-account AWS architecture used to be something only large enterprises could afford to implement. AWS Organizations made it accessible to teams of any size. Terragrunt has made it maintainable. And several high-profile production incidents, caused by developers with too much access in too few accounts, have made the case for account-level segregation non-negotiable.

The AWS SDLC Account Pattern isn't about adding complexity. It's about moving complexity from the runtime, where a mistake becomes an incident, to the design phase, where a mistake becomes a PR comment.

Build environments that are safe to experiment in. Build a staging environment that earns its name. Build production isolation that actually works.

The principles this pattern encodes are the right ones. Your implementation will vary, but if you're not operating with account-level segregation today, start planning the migration.

The 2am call, the compromised key, the rogue credential: they all get a lot quieter with a bulkhead between them and production.