<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matt</title>
    <description>The latest articles on DEV Community by Matt (@dspv).</description>
    <link>https://dev.to/dspv</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3782713%2Fa6b49eb1-89f2-4142-97df-c0dc96c281e1.png</url>
      <title>DEV Community: Matt</title>
      <link>https://dev.to/dspv</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dspv"/>
    <language>en</language>
    <item>
      <title>Why Can't You See Per-Environment AWS Costs?</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Mon, 08 Jun 2026 09:00:46 +0000</pubDate>
      <link>https://dev.to/dspv/why-cant-you-see-per-environment-aws-costs-3e1l</link>
      <guid>https://dev.to/dspv/why-cant-you-see-per-environment-aws-costs-3e1l</guid>
      <description>&lt;h1&gt;
  
  
  Why Can't You See Per-Environment AWS Costs?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-fargate-cost-visibility" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-fargate-cost-visibility&lt;/a&gt;&lt;br&gt;
Cost Explorer shows the total. Tags cover Fargate but miss the $90/mo per environment of ALB, NAT, and CloudWatch overhead. Here's why per-env cost is structurally hard on ECS Fargate — and what works.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Cost Explorer actually shows you
&lt;/h2&gt;

&lt;p&gt;AWS Cost Explorer is a billing tool, not an environment tool. It groups your charges by &lt;em&gt;service&lt;/em&gt;, &lt;em&gt;tag&lt;/em&gt;, &lt;em&gt;linked account&lt;/em&gt;, and &lt;em&gt;region&lt;/em&gt;. It does not know what an "environment" is. That concept lives in your head and in your Terraform code — not in the AWS data model.&lt;/p&gt;

&lt;p&gt;When you open Cost Explorer and try to answer the per-environment question, here's what you actually get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Total spend by service&lt;/strong&gt; — "$4,200/mo in Fargate compute, $890 in data transfer, $660 in CloudWatch." Useful, but it's not an environment number.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;By cost allocation tag&lt;/strong&gt; — only if you've activated the tag in the Billing console AND tagged every resource. Even then, you get rows for each unique tag value, not grouped environments.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;By linked account&lt;/strong&gt; — useful for multi-account setups, but most teams run multiple environments in a single account.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "staging environment" you have in your head is a logical bundle: 3-15 Fargate services + an ALB + target groups + a NAT Gateway + CloudWatch log groups + an ECR repo + Secrets Manager entries + an S3 bucket or two. Cost Explorer sees 10-20 unrelated line items. None of them say "staging."&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/blog/aws-fargate-pricing-real-costs/"&gt;real Fargate pricing breakdown&lt;/a&gt; gets you the components. It doesn't get you the per-environment grouping. That's the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why per-environment cost is hard on ECS Fargate specifically
&lt;/h2&gt;

&lt;p&gt;Per-environment cost is a problem on any AWS service. On ECS Fargate it's particularly bad because an environment is a &lt;em&gt;bundle&lt;/em&gt; of services, not a single resource. And the bundle crosses the boundaries of how AWS models tagging.&lt;/p&gt;

&lt;p&gt;Here's what tag coverage looks like for a typical 8-service staging environment:&lt;/p&gt;

&lt;ul&gt;
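&lt;li&gt;  &lt;strong&gt;Fargate tasks&lt;/strong&gt; (8 services) — the tag has to reach the tasks themselves. Works if the ECS service propagates its tags to tasks (&lt;code&gt;propagateTags&lt;/code&gt;); tagging the cluster alone is not enough.&lt;/li&gt;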
&lt;li&gt;  &lt;strong&gt;ALB&lt;/strong&gt; (1) — usually tagged at cluster level. Sometimes missed.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Target groups&lt;/strong&gt; (8) — tag propagates from the ALB, not always consistent.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;NAT Gateway&lt;/strong&gt; (1-2) — usually shared at the VPC level, so no single env tag fits.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CloudWatch log groups&lt;/strong&gt; (8-10) — tagged separately, often forgotten.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;ECR repositories&lt;/strong&gt; (3-5) — created by ECS, sometimes tagged, sometimes not.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Secrets Manager entries&lt;/strong&gt; (5-10) — often tagged, sometimes managed at account level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with a perfect tagging convention, the NAT Gateway line item is ungroupable. CloudWatch log groups might be. ECR is hit-or-miss. So your "staging" tag covers maybe 60% of the actual cost of running staging.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; An ECS environment is a bundle of services you provision together, not a single resource. AWS pricing is per-resource. The mismatch is the root of the per-environment cost problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The $90/mo/env blind spot: shared overhead tags can't see
&lt;/h2&gt;

&lt;p&gt;Here's the number that should be in every platform engineer's head: every ECS environment carries &lt;strong&gt;~$90/mo of fixed overhead&lt;/strong&gt; that is never captured in a tag-based report. The full breakdown is in our &lt;a href="https://dev.to/blog/aws-fargate-pricing-real-costs/"&gt;real Fargate pricing post&lt;/a&gt;; the short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Application Load Balancer:&lt;/strong&gt; ~$22/mo base + LCU charges. Required for any HTTP service that needs a stable URL.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;NAT Gateway:&lt;/strong&gt; ~$33/mo per AZ (~$66/mo at 2 AZs for HA). Charges for outbound traffic from private subnets.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CloudWatch logs:&lt;/strong&gt; $0.50/GB ingest + $0.03/GB-mo storage. Most staging envs process 20-50 GB/mo.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Secrets Manager + ECR + SSM:&lt;/strong&gt; ~$5-10/mo combined per env. Usually small, but tags don't always propagate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The $90 is real and unavoidable.&lt;/strong&gt; It bills whether your tasks are running or stopped. It bills on Saturday at 3am when nobody's looking. For 20 environments, that's &lt;strong&gt;$1,800/mo&lt;/strong&gt; — $21,600/yr — sitting in your bill, untagged, unattributable to any single environment.&lt;/p&gt;
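&lt;p&gt;To see where a figure like $90 comes from, here is a minimal sketch that adds up the components from the list above. The function name &lt;code&gt;env_overhead&lt;/code&gt; and its three inputs (number of NAT AZs, GB of logs ingested, and a flat Secrets Manager/ECR/SSM amount) are assumptions for illustration; the rates are the approximate ones quoted in this section, with ALB LCU charges and log storage left out.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough monthly fixed overhead for ONE environment.
# Rates are the approximate figures quoted in the list above.
env_overhead() {
  local nat_azs="$1" log_gb="$2" misc="$3"
  awk -v az="$nat_azs" -v gb="$log_gb" -v misc="$misc" 'BEGIN {
    alb  = 22          # ALB base charge per month, LCU charges excluded
    nat  = 33 * az     # NAT Gateway per month, per AZ
    logs = gb * 0.50   # CloudWatch ingest per GB, storage excluded
    printf "%.2f\n", alb + nat + logs + misc
  }'
}

env_overhead 1 50 10   # one NAT AZ, 50 GB of logs, 10 dollars misc
env_overhead 2 20 5    # two NAT AZs, 20 GB of logs, 5 dollars misc
```

&lt;p&gt;With one NAT AZ, 50 GB of logs, and $10 of miscellaneous charges the sketch lands on 90.00; a second NAT AZ pushes the total past $100 even with lighter logging, so treat $90 as a floor rather than a ceiling.&lt;/p&gt;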

&lt;p&gt;This is the number that the CFO question is really asking about. When she says "what does staging cost," she's not asking about Fargate tasks. She's asking about the full bundle. The tag-based approach gives her an answer that's missing 30-40% of the real number.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why cost allocation tags fail in practice
&lt;/h2&gt;

&lt;p&gt;Cost allocation tags are AWS's official answer to the per-resource attribution problem. They work in theory. In practice, they fail in five predictable ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tags must be activated separately
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Environment&lt;/code&gt; tag on your ECS services does not show up in Cost Explorer until you activate it in the Billing console. This is a separate, one-time-per-tag step. Most teams forget. Even &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html" rel="noopener noreferrer"&gt;AWS's own documentation&lt;/a&gt; notes it takes up to 24 hours for tags to appear after activation.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tags only work from the moment of activation
&lt;/h3&gt;

&lt;p&gt;No automatic backfill. If you activated your &lt;code&gt;Environment&lt;/code&gt; tag on June 1, your May cost data has no environment breakdown unless you explicitly request a backfill, which AWS limits to the previous 12 months. Activation alone never recomputes historical attribution.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Inconsistencies split data silently
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Env=Prod&lt;/code&gt; and &lt;code&gt;env=prod&lt;/code&gt; are two different tag values in AWS's view. &lt;code&gt;Environment=staging&lt;/code&gt; and &lt;code&gt;Environment=staging-1&lt;/code&gt; are two different tag values. A team that evolves its naming convention ends up with three rows of "staging" in Cost Explorer, none of which are the full picture.&lt;/p&gt;
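&lt;p&gt;A quick way to catch the casing half of this problem is to group tag values case-insensitively and flag any group with more than one spelling. The sketch below is one possible approach, not an AWS feature: &lt;code&gt;find_tag_variants&lt;/code&gt; is a hypothetical helper that reads one tag value per line on stdin. It does not catch renames such as &lt;code&gt;staging&lt;/code&gt; versus &lt;code&gt;staging-1&lt;/code&gt;.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tag values (one per line) on stdin and prints
# every case-insensitive group that appears under more than one spelling.
find_tag_variants() {
  LC_ALL=C sort -u | awk '{
    key = tolower($0)
    count[key]++
    spellings[key] = spellings[key] " " $0
  }
  END {
    for (key in count)
      if (count[key] != 1) print key ":" spellings[key]
  }' | LC_ALL=C sort
}

# Sample input; "dev" has a single spelling and is not reported.
printf '%s\n' Prod prod staging Staging STAGING dev | find_tag_variants
```

&lt;p&gt;In a real account the input could come from &lt;code&gt;aws resourcegroupstaggingapi get-tag-values --key Environment&lt;/code&gt;, reshaped to one value per line. Tag keys are case-sensitive too, so the same check is worth running on the keys.&lt;/p&gt;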

&lt;h3&gt;
  
  
  4. Shared services can't be tagged per-env
&lt;/h3&gt;

&lt;p&gt;A NAT Gateway is a property of a VPC, not of an environment. A CloudWatch Logs group is per-service, not per-env. ECR repos are often shared across envs. These costs are &lt;em&gt;real&lt;/em&gt; and &lt;em&gt;per-env-influenced&lt;/em&gt;, but no tag will ever attribute them to a specific environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Multi-account setups compound the problem
&lt;/h3&gt;

&lt;p&gt;If staging runs in account A and prod runs in account B, you can't aggregate them in a single Cost Explorer view without enabling Cross-Account Cost Allocation. Even when enabled, you get a per-account view — not a per-environment view if both accounts host multiple envs.&lt;/p&gt;

&lt;p&gt;As &lt;a href="https://www.cloudzero.com/blog/aws-cost-allocation/" rel="noopener noreferrer"&gt;CloudZero's own tagging analysis&lt;/a&gt; puts it: "Tagging fails in shared or multi-tenant environments." The fact that CloudZero (a competitor to Fortem in some ways) writes this so plainly is telling.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually works: a hybrid model
&lt;/h2&gt;

&lt;p&gt;The answer is not "tag harder." It's &lt;em&gt;stop trying to make one mechanism solve a multi-mechanism problem&lt;/em&gt;. Here's the hybrid model that actually produces a usable per-environment number.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Tags for Fargate compute (the easy 60%)
&lt;/h3&gt;

&lt;p&gt;Cost allocation tags work fine for Fargate tasks. Activate the tag, apply it consistently, and you get per-env Fargate cost. This is roughly 60% of the total per-env cost on a typical fleet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: A fixed overhead model for shared services (the missing 30%)
&lt;/h3&gt;

&lt;p&gt;For the $90/mo/env of NAT + ALB + CW that tags can't see, allocate by a simple model: &lt;em&gt;each environment gets $X fixed overhead + $Y per Fargate service running in it&lt;/em&gt;. The exact numbers depend on your architecture (one shared ALB vs many; one VPC NAT vs per-env NAT), but the model is straightforward and doesn't require any AWS-side changes.&lt;/p&gt;
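&lt;p&gt;The model is small enough to express in a few lines. The sketch below assumes you already have the tagged Fargate cost for an environment from Layer 1; the function name &lt;code&gt;allocate_env_cost&lt;/code&gt; and the sample values for $X and $Y are placeholders, not recommended numbers.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: total = tagged Fargate cost + X fixed + Y per service.
# All four inputs are monthly dollar amounts or counts that you supply.
allocate_env_cost() {
  local fargate_cost="$1" services="$2" fixed="$3" per_service="$4"
  awk -v f="$fargate_cost" -v n="$services" -v x="$fixed" -v y="$per_service" \
    'BEGIN { printf "%.2f\n", f + x + y * n }'
}

# 210.00 of tagged Fargate spend, 8 services, X = 70, Y = 2.50 (placeholders)
allocate_env_cost 210.00 8 70 2.50
```

&lt;p&gt;Keeping $X and $Y as explicit inputs makes the methodology easy to explain: when the architecture changes (a shared ALB, a second NAT AZ), you adjust two numbers instead of re-tagging anything.&lt;/p&gt;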

&lt;h3&gt;
  
  
  Layer 3: Read-time calculation for real-time answers (the remaining 10%)
&lt;/h3&gt;

&lt;p&gt;Cost Explorer data lags 24 hours. For some questions, such as "what did that scheduling change save us this week", you need real-time. A read-time calculation reads your ECS task definitions, sums vCPU × $0.04048 and GB × $0.004445 (hourly on-demand rates for us-east-1, Linux/x86), and multiplies by hours running. It's how Fortem's AI skill computes the per-environment number on the fly. You can also do it with a short bash script; see the next section.&lt;/p&gt;

&lt;p&gt;None of the three layers is sufficient alone. Together, they give you a per-environment cost number that holds up to the CFO question.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DIY version (a short script)
&lt;/h2&gt;

&lt;p&gt;If you have fewer than 10 environments and you just need a one-time per-env cost number, this bash script does the Fargate-only calculation. It reads every ECS task in every cluster, looks up vCPU and memory from the task definition, multiplies by Fargate pricing, and prints a table.&lt;/p&gt;

&lt;p&gt;It does not include ALB, NAT, or CloudWatch — those are the "blind spot" already covered above. Add your $90/env fixed overhead to the result for a rough total.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# Requires: aws cli, jq, bc. Pricing: us-east-1 Linux/x86 on-demand (May 2026).&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail
&lt;span class="nv"&gt;VCPU_RATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.04048   &lt;span class="c"&gt;# $/vCPU-hour&lt;/span&gt;
&lt;span class="nv"&gt;MEM_RATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.004445   &lt;span class="c"&gt;# $/GB-hour&lt;/span&gt;
&lt;span class="nv"&gt;HOURS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;730           &lt;span class="c"&gt;# hours per month&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;cluster &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;aws ecs list-clusters &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'clusterArns[]'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cluster&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
  &lt;span class="k"&gt;for &lt;/span&gt;task &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;aws ecs list-tasks &lt;span class="nt"&gt;--cluster&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cluster&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'taskArns[]'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nv"&gt;td&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ecs describe-tasks &lt;span class="nt"&gt;--cluster&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cluster&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--tasks&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$task&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'tasks[0].taskDefinitionArn'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;cpu_mem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ecs describe-task-definition &lt;span class="nt"&gt;--task-definition&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$td&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'taskDefinition.{cpu:cpu,memory:memory}'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;cpu_units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cpu_mem&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;mem_mib&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cpu_mem&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $2}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;vcpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"scale=4; &lt;/span&gt;&lt;span class="nv"&gt;$cpu_units&lt;/span&gt;&lt;span class="s2"&gt; / 1024"&lt;/span&gt; | bc&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;gb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"scale=4; &lt;/span&gt;&lt;span class="nv"&gt;$mem_mib&lt;/span&gt;&lt;span class="s2"&gt; / 1024"&lt;/span&gt; | bc&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"scale=4; &lt;/span&gt;&lt;span class="nv"&gt;$vcpu&lt;/span&gt;&lt;span class="s2"&gt; * &lt;/span&gt;&lt;span class="nv"&gt;$VCPU_RATE&lt;/span&gt;&lt;span class="s2"&gt; + &lt;/span&gt;&lt;span class="nv"&gt;$gb&lt;/span&gt;&lt;span class="s2"&gt; * &lt;/span&gt;&lt;span class="nv"&gt;$MEM_RATE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | bc&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"scale=2; &lt;/span&gt;&lt;span class="nv"&gt;$cost&lt;/span&gt;&lt;span class="s2"&gt; + &lt;/span&gt;&lt;span class="nv"&gt;$rate&lt;/span&gt;&lt;span class="s2"&gt; * &lt;/span&gt;&lt;span class="nv"&gt;$HOURS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | bc&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;done
  &lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-40s &lt;/span&gt;&lt;span class="nv"&gt;$%&lt;/span&gt;&lt;span class="s2"&gt;s/mo&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$cost&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it. Get a table. Add $90/env for the overhead you can't see. That's your per-environment number. It works for fleets up to ~10 envs.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to graduate from DIY
&lt;/h2&gt;

&lt;p&gt;The bash script is honest and useful for small fleets. It starts to hurt when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  You have 15+ environments and the script takes 20+ minutes to run&lt;/li&gt;
&lt;li&gt;  You need real-time cost (the script is a snapshot, not continuous)&lt;/li&gt;
&lt;li&gt;  You have Fargate Spot in some envs and on-demand in others (Spot needs separate calculation)&lt;/li&gt;
&lt;li&gt;  You need to track cost over time, not just right now&lt;/li&gt;
&lt;li&gt;  You run a multi-account fleet and need the numbers aggregated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, you need a system that runs continuously, not a script. Fortem is one option (it does the same calculation as the script, but continuously, with Fargate Spot handled, with multi-account support, and with a real UI). Vantage and CloudZero are others (different approaches — they focus on total AWS spend and add tags, where Fortem starts from the ECS environment and computes what tags can't see).&lt;/p&gt;

&lt;p&gt;Whichever you pick, the question you should be asking is: &lt;em&gt;does it answer the CFO question in 5 seconds, or do I have to export a CSV and explain the methodology first?&lt;/em&gt; If the latter, the tool isn't doing the job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions you probably have next
&lt;/h2&gt;

&lt;p&gt;Not product FAQ. The things you actually wonder about after reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does AWS Cost Categories solve the per-environment problem?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Partially. Cost Categories let you group resources by tag combinations and account structure into named buckets (like "Production" or "All-Staging"). They work on top of tags — so they don't fix the untaggable shared services. Useful, but not a full replacement for the hybrid model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I backfill tags to see historical per-environment cost?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not by default. Activating a tag only affects cost data from that point on: if you activated your &lt;code&gt;Environment&lt;/code&gt; tag on June 1, your May bill has no per-env breakdown, even if every resource had the tag. AWS does let you request a backfill of up to 12 months from the Billing console, but it is a manual step. Activate early, before you actually need the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I split NAT Gateway cost across environments that share a VPC?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't, not perfectly. Best approximations: (1) by GB data processed per env, if you have VPC Flow Logs and can attribute traffic; (2) evenly, by number of envs sharing the VPC. Neither is exact, but both are better than ignoring the cost. The fixed-overhead model (Layer 2 above) sidesteps this by treating shared services as a single per-env allocation, not as something to split.&lt;/p&gt;
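&lt;p&gt;Both approximations fit in one small function. The sketch below assumes you have already derived GB processed per environment (for example from VPC Flow Logs); &lt;code&gt;split_nat_cost&lt;/code&gt; and the sample figures are hypothetical. It splits proportionally by GB and falls back to an even split when no traffic data is available.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a shared NAT Gateway bill across environments.
# stdin: one "env_name gb_processed" pair per line; $1: total monthly NAT cost.
split_nat_cost() {
  local total="$1"
  awk -v total="$total" '{
    env[NR] = $1; gb[NR] = $2; sum += $2
  }
  END {
    for (i = 1; i != NR + 1; i++) {
      # Proportional to traffic, or an even split when all GB values are zero.
      share = (sum == 0) ? total / NR : total * gb[i] / sum
      printf "%s %.2f\n", env[i], share
    }
  }'
}

# Sample data: a 66-dollar NAT bill shared by three environments.
printf '%s\n' 'staging 120' 'qa 60' 'dev 20' | split_nat_cost 66
```

&lt;p&gt;The shares always add back up to the NAT total (give or take rounding), which is what makes the number defensible when someone reconciles it against the bill.&lt;/p&gt;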

&lt;p&gt;&lt;strong&gt;What's the difference between Cost Categories and Cost Allocation Tags?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cost allocation tags are resource-level labels you apply to individual resources. Cost Categories are higher-level groupings that combine tag values and account structure into named buckets. Tags are the raw input; Categories are derived views. Both depend on the underlying tag discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;p&gt;Specific to Fortem and the DIY approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Fortem replace AWS Cost Explorer?
&lt;/h3&gt;

&lt;p&gt;No. Fortem reads your AWS account and computes per-environment cost in real time from Fargate pricing, ECS task definitions, and running task counts. It doesn't ingest your Cost &amp;amp; Usage Reports. Cost Explorer remains the source of truth for total AWS spend and historical billing data. Fortem answers the question Cost Explorer can't: what does each environment cost right now, and how does that change when you turn environments on and off?&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does it take to set up per-environment cost reporting?
&lt;/h3&gt;

&lt;p&gt;Reading your fleet and computing cost from Fargate pricing takes about 5 minutes via Fortem's AI skill (download .md, run in your AI agent, get the report). Manual DIY with the bash script in this article takes 30 minutes if you have AWS CLI configured. Full Fortem onboarding with continuous cost tracking is 7 business days — but the first fleet report is instant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I export per-environment cost to a spreadsheet?
&lt;/h3&gt;

&lt;p&gt;Yes. Fortem's discovery report is a self-contained HTML file you can open in any browser — copy the table into Sheets or Excel. The YAML file is also importable. Cost Explorer data is exportable to CSV. The bash script in this article outputs to the terminal, but piping to &lt;code&gt;column -t -s $'\t'&lt;/code&gt; gives a paste-ready table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Fortem work without any cost allocation tags?
&lt;/h3&gt;

&lt;p&gt;Yes. Fortem reads cluster names, service names, and task definitions — not tags. It parses naming conventions like &lt;code&gt;use1-prod-main&lt;/code&gt; and &lt;code&gt;staging-cluster-1&lt;/code&gt; to infer stages. Tagging helps with edge cases (orphaned resources, unusual names), but a clean naming convention is enough for most fleets.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Map your fleet in 5 min:&lt;/strong&gt; &lt;a href="https://fortem.dev/audit" rel="noopener noreferrer"&gt;fortem.dev/audit&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>fargate</category>
      <category>cost</category>
    </item>
    <item>
      <title>How to Clone an ECS Environment Without Rewriting Terraform?</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Sun, 07 Jun 2026 21:09:24 +0000</pubDate>
      <link>https://dev.to/dspv/how-to-clone-an-ecs-environment-without-rewriting-terraform-4ief</link>
      <guid>https://dev.to/dspv/how-to-clone-an-ecs-environment-without-rewriting-terraform-4ief</guid>
      <description>&lt;h1&gt;
  
  
  How to Clone an ECS Environment Without Rewriting Terraform?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-environment-clone" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-environment-clone&lt;/a&gt;&lt;br&gt;
Cloning 15 ECS services, an ALB, RDS, and SSM params is a 12-step manual process. Terraform workspaces break at 10+ services. Here's the template approach — and a working Terraform module.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What "clone an environment" actually means
&lt;/h2&gt;

&lt;p&gt;The phrase "clone this environment" lands on you from three different directions, each with its own urgency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Compliance audit:&lt;/strong&gt; "We need an isolated clone of EU production to test the GDPR flows." Need it by Friday. Cannot share production data.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;New engineer onboarding:&lt;/strong&gt; "Can you spin up a copy of staging so the new hire can break things in private?" Need it today. Prefer it by lunch.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;QA isolation:&lt;/strong&gt; "QA needs a clone of staging, but with the test Stripe key and a read-only RDS replica." Need it now and three more times this week.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In each case, "clone" does not mean copy one ECS service. It means copy &lt;strong&gt;the ensemble&lt;/strong&gt;: 12-18 services, an Application Load Balancer with listener rules, target groups, RDS instances (or snapshots), SSM parameter paths, ECR repos, NAT Gateway routing, CloudWatch log groups, Secrets Manager entries. 15 things, each with its own copy strategy, no orchestration layer between them.&lt;/p&gt;

&lt;p&gt;The AWS CLI has no command that clones a service, let alone an environment. The closest you get is &lt;code&gt;aws ecs describe-services&lt;/code&gt; followed by &lt;code&gt;aws ecs create-service&lt;/code&gt;, and that copies ONE service. Not the ALB. Not the RDS. Not the SSM params. The rest is a &lt;a href="https://dev.to/blog/ecs-multi-environment-strategy/"&gt;bundle of heterogeneous services&lt;/a&gt; that you hold in your head. No tool knows what belongs together except you and your Terraform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; An ECS environment is a bundle of services you deploy together, not a single resource. Copying a service copies one brick. Cloning a building means replicating the wiring, plumbing, and foundation too — and those live in different AWS namespaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The manual approach — 12 steps, 4 hours
&lt;/h2&gt;

&lt;p&gt;Here's the full walkthrough. Real steps, real commands. This is what happens every time a clone request hits your Slack. The times are from doing this 20+ times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1. Copy Terraform, find-replace env names&lt;/strong&gt; (15-20 minutes)&lt;/p&gt;

&lt;p&gt;Open your Terraform repo. Copy the environment root module. Find &lt;code&gt;environment = "production"&lt;/code&gt; in &lt;code&gt;terraform.tfvars&lt;/code&gt; — change to &lt;code&gt;"clone-gdpr"&lt;/code&gt;. Then go through 8 files and change every reference to the old env name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2. Register cloned task definitions&lt;/strong&gt; (10 minutes)&lt;/p&gt;

&lt;p&gt;For each of the 15 services: &lt;code&gt;aws ecs describe-task-definition&lt;/code&gt; on the source, &lt;code&gt;aws ecs register-task-definition&lt;/code&gt; with a new family name. The family must include the clone env name so you don't accidentally deploy to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3. Create the ECS cluster&lt;/strong&gt; (2 minutes)&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws ecs create-cluster --cluster-name clone-gdpr&lt;/code&gt;. Trivial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4. Create 15 services&lt;/strong&gt; (20-25 minutes)&lt;/p&gt;

&lt;p&gt;For each service: &lt;code&gt;aws ecs create-service&lt;/code&gt; with the cloned task def, target group ARN, subnets, security group, and service discovery namespace. If you get the VPC config wrong, the service launches in the wrong network. If you get the IAM role wrong, it launches and fails 10 minutes later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5. Copy ALB listener rules&lt;/strong&gt; (25-30 minutes — hardest part)&lt;/p&gt;

&lt;p&gt;The ALB has listener rules that route traffic by host header: &lt;code&gt;production.api.example.com&lt;/code&gt;. The clone needs &lt;code&gt;clone-gdpr.api.example.com&lt;/code&gt;. You need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Add a certificate for the new subdomain in ACM (20 min for DNS validation)&lt;/li&gt;
&lt;li&gt;  Create host-based routing rules for each service&lt;/li&gt;
&lt;li&gt;  Point each rule at the cloned target group&lt;/li&gt;
&lt;li&gt;  Verify priority ordering (rules are evaluated in priority order, lowest number first; a wrong priority sends traffic to the wrong target group)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps 6-8. RDS, SSM params, Secrets Manager&lt;/strong&gt; (20-30 minutes)&lt;/p&gt;

&lt;p&gt;RDS: restore a snapshot or create a new instance with the same config. SSM: copy all &lt;code&gt;/production/*&lt;/code&gt; params to &lt;code&gt;/clone-gdpr/*&lt;/code&gt;. This is the step everyone forgets on the first attempt — the cloned services start but the ECS tasks can't read their config. They fail silently and report "running" for 5 minutes before cycling.&lt;/p&gt;
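&lt;p&gt;The SSM part of this step is mostly a path rewrite, and getting the rewrite wrong is how a clone ends up reading production config. As a minimal sketch, the hypothetical helper &lt;code&gt;remap_param&lt;/code&gt; below maps a source parameter name to its clone name and refuses anything outside the source path. The paths are the examples from this walkthrough.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a parameter name under the source path to the
# matching name under the clone path. Returns 1 for names outside the source.
remap_param() {
  local src="$1" dst="$2" name="$3"
  case "$name" in
    "$src"/*) printf '%s\n' "$dst/${name#"$src"/}" ;;
    *) return 1 ;;   # not under the source path: refuse to copy
  esac
}

remap_param /production /clone-gdpr /production/api/DB_HOST
```

&lt;p&gt;To drive it for real, the names could come from &lt;code&gt;aws ssm get-parameters-by-path --path /production --recursive&lt;/code&gt; and each remapped name could be written with &lt;code&gt;aws ssm put-parameter&lt;/code&gt;. Printing the old and new names as a dry run first makes forgotten or misrouted parameters visible before any service starts.&lt;/p&gt;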

&lt;p&gt;&lt;strong&gt;Steps 9-11. ECR, log groups, testing&lt;/strong&gt; (15-20 minutes)&lt;/p&gt;

&lt;p&gt;Update ECR repo policy if you use per-env repos. Create CloudWatch log groups for each service (ECS can auto-create them when &lt;code&gt;awslogs-create-group&lt;/code&gt; is enabled, but you want the right retention policy). Test: does service A connect to the right RDS? Does service B hit the right ElastiCache? Does the Stripe webhook fire in test mode, not production? The answer to one of these is usually "no" on the first try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 12. Document the clone&lt;/strong&gt; (10 minutes)&lt;/p&gt;

&lt;p&gt;Write the env name, purpose, and expiry date in the team wiki. 50/50 chance this actually happens. Three months later, the CTO asks "what's clone-gdpr?" and nobody remembers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total:&lt;/strong&gt; 4-8 hours depending on how many things break, whether ACM decides to take 45 minutes to validate DNS, and how many SSM params you forget on the first pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terraform workspaces — when they work, when they break
&lt;/h2&gt;

&lt;p&gt;Terraform workspaces are the official answer for "same infrastructure, different instances." Each workspace has its own state file. When you run &lt;code&gt;terraform apply&lt;/code&gt;, it provisions a separate set of resources for the selected workspace, and your configuration can embed &lt;code&gt;terraform.workspace&lt;/code&gt; in resource names and tags to keep them apart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workspaces work when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  All your services are identical across environments (same task definitions, same desired counts)&lt;/li&gt;
&lt;li&gt;  You use a single VPC for non-production&lt;/li&gt;
&lt;li&gt;  You don't have external services (MongoDB Atlas, Vercel, Firebase) with environment-specific connection strings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workspaces break when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;You have 10+ services with different configurations per env.&lt;/strong&gt; A staging service might run 2 replicas on 0.5 vCPU. A production clone of the same service runs 4 replicas on 2 vCPU. Workspaces reuse the same Terraform code, so you're writing conditional logic inside &lt;code&gt;locals&lt;/code&gt; blocks — which defeats the purpose.&lt;/li&gt;
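&lt;li&gt;  &lt;strong&gt;You have 20+ environments.&lt;/strong&gt; Terraform doesn't enforce a hard cap on workspaces, but they all share one backend and one set of code, and &lt;code&gt;terraform workspace list&lt;/code&gt; gives you names with no record of what each one is for. A team with 15 environments and 5 compliance clones is at the point where that stops being manageable.&lt;/li&gt;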
&lt;li&gt;  &lt;strong&gt;RDS cloning isn't a Terraform operation.&lt;/strong&gt; It's an AWS operation that happens outside of Terraform's lifecycle. You run it separately, then update your Terraform variables to point at the clone. The orchestration gap is still there.&lt;/li&gt;
&lt;/ul&gt;
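
&lt;p&gt;To make the first failure mode concrete, this is roughly what per-workspace conditional logic looks like once sizing differs between environments. It is a sketch under assumptions: the workspace names and the memory figures are invented for illustration, and the CPU values are just the 0.5 vCPU and 2 vCPU from the example above expressed in Fargate CPU units.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;locals {
  # Hypothetical per-workspace sizing table. Every new environment
  # that deviates from the default needs another entry here.
  sizing = {
    "staging"    = { cpu = 512, memory = 1024, desired_count = 2 }
    "clone-prod" = { cpu = 2048, memory = 4096, desired_count = 4 }
  }

  # Fall back to the staging profile for workspaces without an entry.
  service = lookup(local.sizing, terraform.workspace, local.sizing["staging"])
}

output "selected_sizing" {
  value = local.service
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Multiply that table by 10+ services and the "same code, different instances" promise turns into a configuration database written in HCL.&lt;/p&gt;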

&lt;p&gt;For the full picture of what &lt;a href="https://dev.to/blog/ecs-fargate-terraform/"&gt;Terraform handles well for ECS and what it doesn't&lt;/a&gt;, the Terraform-specific guide walks through the gaps in detail. Cloning is where the orchestration gap is most visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The template approach — 30 seconds, no Terraform
&lt;/h2&gt;

&lt;p&gt;A template defines the environment once — services, configs, dependencies, env vars, secrets references, external service connections. Cloning from a template means: pick the source env → give it a name → pick a region → done.&lt;/p&gt;

&lt;p&gt;Under the hood, the template engine calls 6 ECS APIs, copies task definitions, maps SSM parameter paths, sets up ALB listener rules, creates service connections, and points everything at the right resources. These are the same 12 manual steps from section 2. The difference is &lt;strong&gt;none of them are manual&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/proton/" rel="noopener noreferrer"&gt;AWS Proton was supposed to be this&lt;/a&gt; — AWS's environment templating service. Proton let you define a CloudFormation template for an environment and deploy instances of it. It was the right idea: define once, clone as many times as you need.&lt;/p&gt;

&lt;p&gt;Proton is deprecated October 7, 2026. The &lt;a href="https://dev.to/migrate-from-proton/"&gt;migration timeline is public&lt;/a&gt;. For teams with 15+ ECS environments and regular cloning needs, Fortem fills the gap — Proton's template engine, rebuilt for ECS Fargate specifically, without the CloudFormation layer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; The template approach works because it understands what an "environment" is — a bundle of services deployed together. The manual approach doesn't know this. You, the platform engineer, hold the bundle in your head. The script doesn't. When you clone an environment for the 30th time, the error rate converges to a function of how tired you are.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Ready-to-use: parameterized Terraform module
&lt;/h2&gt;

&lt;p&gt;If you want a DIY approach that's better than fully manual but doesn't require a template engine, here's a parameterized Terraform module. It clones one ECS service; for the full env, run it 15 times with different service names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Clones one ECS Fargate service. Instantiate it once per service&lt;/span&gt;
&lt;span class="c1"&gt;# for a full environment clone.&lt;/span&gt;
&lt;span class="c1"&gt;# Usage: module "clone_service" { source = "./ecs-service" env = "clone-gdpr" ... }&lt;/span&gt;

&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"env"&lt;/span&gt;              &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"service_name"&lt;/span&gt;     &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"cluster_name"&lt;/span&gt;     &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"task_family"&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"subnets"&lt;/span&gt;          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"security_groups"&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"container_image"&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"vpc_id"&lt;/span&gt;           &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"alb_listener_arn"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;container_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="c1"&gt;# port the container listens on; shared by the task, service, and target group&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_service"&lt;/span&gt; &lt;span class="s2"&gt;"service"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;cluster&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cluster_name&lt;/span&gt;
  &lt;span class="nx"&gt;task_definition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_task_definition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;desired_count&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;launch_type&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;
  &lt;span class="nx"&gt;network_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnets&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subnets&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;security_groups&lt;/span&gt;
    &lt;span class="nx"&gt;assign_public_ip&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;# Register the service's tasks with the target group defined below&lt;/span&gt;
  &lt;span class="nx"&gt;load_balancer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;target_group_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lb_target_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
    &lt;span class="nx"&gt;container_name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;
    &lt;span class="nx"&gt;container_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;container_port&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;# ECS rejects a target group that isn't attached to a listener yet&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_listener_rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_task_definition"&lt;/span&gt; &lt;span class="s2"&gt;"task"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;family&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;task_family&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;network_mode&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"awsvpc"&lt;/span&gt;
  &lt;span class="nx"&gt;requires_compatibilities&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;cpu&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"256"&lt;/span&gt;
  &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"512"&lt;/span&gt;
  &lt;span class="nx"&gt;container_definitions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;
    &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;container_image&lt;/span&gt;
    &lt;span class="nx"&gt;portMappings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="nx"&gt;containerPort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;container_port&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ENV_PREFIX"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"SSM_PATH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ALB target group per service (names are capped at 32 characters), host-based routing by env&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lb_target_group"&lt;/span&gt; &lt;span class="s2"&gt;"tg"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;container_port&lt;/span&gt;
  &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;target_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ip"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lb_listener_rule"&lt;/span&gt; &lt;span class="s2"&gt;"rule"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;listener_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alb_listener_arn&lt;/span&gt;
  &lt;span class="c1"&gt;# priority is left unset so the provider assigns the next free one;&lt;/span&gt;
  &lt;span class="c1"&gt;# a fixed value would collide as soon as two services share the listener&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"forward"&lt;/span&gt;
    &lt;span class="nx"&gt;target_group_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lb_target_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;host_header&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;values&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.example.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This module works for standard ECS services. It requires you to parameterize your infrastructure — essentially, building a template engine by hand. For 3 services, this is clean. For 15 services, each with different vCPU/memory configs and different env vars, you end up with 15 copies of this module, each with its own values — and you've rebuilt Proton by accident.&lt;/p&gt;
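
&lt;p&gt;One way to call the module without copy-pasting the block per service is a single &lt;code&gt;module&lt;/code&gt; block with &lt;code&gt;for_each&lt;/code&gt; (Terraform 0.13 or later). This is a sketch only: the service names, image URIs, account ID, and the &lt;code&gt;clone-gdpr-cluster&lt;/code&gt; cluster name are placeholders, and the network values are assumed to arrive as root-level variables.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# Root-level inputs shared by every cloned service (assumed to exist already).
variable "subnets"          { type = list(string) }
variable "security_groups"  { type = list(string) }
variable "vpc_id"           { type = string }
variable "alb_listener_arn" { type = string }

locals {
  env = "clone-gdpr"

  # Hypothetical service list: service name =&amp;gt; container image.
  services = {
    api    = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/api:latest"
    worker = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/worker:latest"
  }
}

# One module instance per entry in local.services.
module "clone_service" {
  source   = "./ecs-service"
  for_each = local.services

  env              = local.env
  service_name     = each.key
  task_family      = each.key
  container_image  = each.value
  cluster_name     = "${local.env}-cluster"
  subnets          = var.subnets
  security_groups  = var.security_groups
  vpc_id           = var.vpc_id
  alb_listener_arn = var.alb_listener_arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This keeps the call site short, but the per-service differences still have to live somewhere. As soon as vCPU, memory, or env vars vary, the map grows a field for each of them and the module grows a variable to match.&lt;/p&gt;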

&lt;h2&gt;
  
  
  When to stop cloning manually
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;At 5 environments:&lt;/strong&gt; manual cloning is ~20 hours/year of copy-paste. The SSM params step is the bottleneck. You're still faster than setting up a template engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At 10 environments:&lt;/strong&gt; manual cloning is ~40 hours/year. Terraform workspaces are getting hard to keep track of. You're spending a work week per year on something a machine should do. The parameterized module above makes sense here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At 20+ environments:&lt;/strong&gt; you need a template engine. The module approach has become 15 near-identical copies of the same thing, each with slightly different values. You've rebuilt what Fortem ships, minus the UI, the RBAC, and the audit log of who cloned what and when.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When compliance is involved:&lt;/strong&gt; stop immediately. A single forgotten SSM parameter, one service pointing at the production DB instead of the cloned DB, means a failed audit. Not an extra hour. A failed audit. Template engines don't forget SSM params. Human beings do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions you probably have next
&lt;/h2&gt;

&lt;p&gt;Not product FAQ. The things you actually wonder about after reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I clone between AWS accounts?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, but it's not a single operation. Cross-account cloning requires copying ECR images across accounts (using cross-account pull permissions), restoring RDS snapshots into the target account, and copying SSM parameters manually. A template engine with cross-account IAM roles (like Fortem on the Scale plan) does it in one operation. Manually, it's 8-12 hours.&lt;/p&gt;
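
&lt;p&gt;The cross-account pull permission is the piece of this that fits in a few lines. The sketch below is applied in the account that owns the ECR repository. The variable names are illustrative, and granting access to the whole target account root is an assumption you may want to narrow to a specific role.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "target_account_id" { type = string } # account the clone runs in
variable "repository_name"   { type = string } # existing ECR repo in the source account

# Repository policy that lets the target account pull images.
data "aws_iam_policy_document" "cross_account_pull" {
  statement {
    sid    = "AllowPullFromCloneAccount"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.target_account_id}:root"]
    }

    actions = [
      "ecr:BatchCheckLayerAvailability",
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]
  }
}

resource "aws_ecr_repository_policy" "pull" {
  repository = var.repository_name
  policy     = data.aws_iam_policy_document.cross_account_pull.json
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The task execution role in the target account still needs its own ECR permissions, including &lt;code&gt;ecr:GetAuthorizationToken&lt;/code&gt;. The RDS snapshot and SSM steps are separate work.&lt;/p&gt;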

&lt;p&gt;&lt;strong&gt;Does the clone include RDS data, or just the schema?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RDS cloning is a snapshot operation, so it includes data. For GDPR compliance testing, you need the data to validate masking and access controls. For developer sandboxes, an empty schema is often enough. Template engines can do both (restore a snapshot OR create an empty instance with the same config). The Terraform module above does neither: it only covers the ECS service, so the database has to be restored or created separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about MongoDB Atlas / Vercel — can I clone external services too?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;External services (Atlas, Firebase, Vercel, Cloudflare) have their own cloning APIs. No cloud templating engine can touch them directly — they're outside AWS. The Fortem approach stores the external service connection details as parameters and points the cloned services at them automatically. The actual clone of the Atlas cluster or Vercel project is separate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is cloning different from a Blue/Green deployment?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Blue/Green is deploying a NEW version of an EXISTING service alongside the old version — same env, different code. Cloning is copying an ENTIRE environment — different env, same infrastructure pattern. Blue/Green is about deployment safety. Cloning is about environment replication. The tools are different, the lifecycle is different. Don't confuse them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Fortem clone RDS instances?
&lt;/h3&gt;

&lt;p&gt;Fortem clones ECS Fargate services and orchestrates the surrounding infrastructure (ALB rules, target groups, SSM parameter paths, Secrets Manager references, CloudWatch log groups). RDS cloning is a separate operation — you'd restore a snapshot or use &lt;code&gt;pg_dump&lt;/code&gt;. Fortem handles the connection strings and env vars that point the cloned services at the right database instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does a clone take in Fortem?
&lt;/h3&gt;

&lt;p&gt;The clone itself takes 30-60 seconds. Fortem copies task definitions, creates new ECS services with the same config, duplicates ALB listener rules with host-based routing for the new env name, maps SSM parameter paths, and sets IAM roles. RDS snapshots take longer (10-30 minutes depending on size) but are started by Fortem and completed by AWS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I clone into a different AWS region?
&lt;/h3&gt;

&lt;p&gt;Yes. Fortem's template engine supports region-aware parameters. The task definitions are region-agnostic. Services are created in the target region. RDS snapshots need to be copied to the target region first (Fortem initiates the copy). Cross-account cloning is also supported on the Scale and Enterprise plans.&lt;/p&gt;

&lt;h3&gt;
  
  
  What permissions does cloning need?
&lt;/h3&gt;

&lt;p&gt;Fortem needs the same 6 read-only ECS permissions for discovery, plus &lt;code&gt;ecs:RegisterTaskDefinition&lt;/code&gt;, &lt;code&gt;ecs:CreateService&lt;/code&gt;, &lt;code&gt;elasticloadbalancing:CreateRule&lt;/code&gt;, &lt;code&gt;ssm:PutParameter&lt;/code&gt;, &lt;code&gt;ecs:UpdateService&lt;/code&gt;, and &lt;code&gt;iam:PassRole&lt;/code&gt; to create the cloned services. All scoped to the target environment's resources by IAM condition keys. The exact policy is published on the security page.&lt;/p&gt;
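
&lt;p&gt;To give a feel for what the write side of such a policy can look like, here is a sketch built from the six actions named above. It is not Fortem's published policy. The &lt;code&gt;/{env}/&lt;/code&gt; SSM path prefix and the choice of condition key are assumptions, and the ECS and ELB statement is deliberately left unscoped for brevity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "env" { type = string } # target environment name, e.g. "clone-gdpr"

data "aws_iam_policy_document" "clone_write" {
  statement {
    sid = "CreateClonedServices"
    actions = [
      "ecs:RegisterTaskDefinition",
      "ecs:CreateService",
      "ecs:UpdateService",
      "elasticloadbalancing:CreateRule",
    ]
    resources = ["*"] # narrowing by ARN or tag conditions is left out of this sketch
  }

  statement {
    sid       = "WriteEnvParameters"
    actions   = ["ssm:PutParameter"]
    resources = ["arn:aws:ssm:*:*:parameter/${var.env}/*"]
  }

  statement {
    sid       = "PassRolesToEcsTasksOnly"
    actions   = ["iam:PassRole"]
    resources = ["*"]

    # Only allow roles to be handed to ECS tasks, not to other services.
    condition {
      test     = "StringEquals"
      variable = "iam:PassedToService"
      values   = ["ecs-tasks.amazonaws.com"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;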




&lt;p&gt;&lt;strong&gt;Map your fleet in 5 min:&lt;/strong&gt; &lt;a href="https://fortem.dev/audit" rel="noopener noreferrer"&gt;fortem.dev/audit&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>terraform</category>
      <category>devops</category>
    </item>
    <item>
      <title>Fortem vs Cortex: Which Tool Actually Operates Your ECS Fleet?</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 20:48:53 +0000</pubDate>
      <link>https://dev.to/dspv/fortem-vs-cortex-which-tool-actually-operates-your-ecs-fleet-4foo</link>
      <guid>https://dev.to/dspv/fortem-vs-cortex-which-tool-actually-operates-your-ecs-fleet-4foo</guid>
      <description>&lt;h1&gt;
  
  
  Fortem vs Cortex: Which Tool Actually Operates Your ECS Fleet?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/fortem-vs-cortex" rel="noopener noreferrer"&gt;https://fortem.dev/blog/fortem-vs-cortex&lt;/a&gt;&lt;br&gt;
Cortex and Fortem solve different problems. Cortex is an Engineering Operations Platform for org-wide visibility. Fortem operates your ECS Fargate fleet specifically. Here's which one you need — and when to use both.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;p&gt;Cortex and Fortem solve different problems, even though both get called “engineering platforms.” Cortex gives you visibility across your whole stack — service catalog, scorecards, AI-driven insights. Fortem operates your ECS Fargate fleet specifically — scheduling, dev self-service, cost attribution. They're not direct competitors. Here's which one you actually need — and when the answer is “both.”&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Cortex is a general Engineering Operations Platform — 50+ integrations, org-wide service catalog, AI-driven insights&lt;/li&gt;
&lt;li&gt;  Fortem is an ECS Fargate operations layer — scheduling, fleet visibility, dev self-service, cost, AI diagnostics&lt;/li&gt;
&lt;li&gt;  Cortex is the right tool when you have 20+ services across multiple runtimes; Fortem when you have 10+ ECS Fargate environments&lt;/li&gt;
&lt;li&gt;  Cortex pricing is sales-led (no public price); Fortem has self-serve tiers from $799/mo with 7-day onboarding&lt;/li&gt;
&lt;li&gt;  At 200+ engineers with mixed runtimes, many teams use both — Cortex for visibility, Fortem for ECS ops&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Cortex actually is
&lt;/h2&gt;

&lt;p&gt;Cortex calls itself an “Engineering Operations Platform” — Y Combinator-backed, SOC 2 Type 2 and ISO 27001 certified. The product is organized around three pillars: &lt;strong&gt;Catalog&lt;/strong&gt; (centralize service ownership, dependencies, and metadata across all your tools), &lt;strong&gt;Scorecard&lt;/strong&gt; (production-readiness scoring and operational metrics across teams), and &lt;strong&gt;Workflows&lt;/strong&gt; (golden paths, migrations, and self-service actions).&lt;/p&gt;

&lt;p&gt;The integrations list tells the story: 50+ tools supported — GitHub, PagerDuty, Datadog, Jira, AWS, GCP, Kubernetes, Backstage, and more. ECS Fargate is one of dozens. The product is designed to sit at the center of a multi-runtime engineering org and give engineering leaders a single pane of glass.&lt;/p&gt;

&lt;p&gt;The customer list backs this up — Affirm, Canva, O'Reilly, Skyscanner, Xero, Bumble, Tripadvisor, Outreach, BigCommerce, Rapid7, Let's Get Checked, H&amp;amp;R Block, Archer. Most are 500+ person companies with mixed runtimes. Cortex's target persona is an engineering leader — a VP Eng or Director of Platform — who needs to see and improve the org, not the platform engineer who needs to operate ECS.&lt;/p&gt;

&lt;p&gt;Their tagline: &lt;em&gt;“Code is no longer the bottleneck. Everything else is.”&lt;/em&gt; The bet is that in an AI-accelerated world, the operational layer — ownership, reliability, standards — becomes the limiting factor. Cortex sells the tooling to fix that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Fortem actually is
&lt;/h2&gt;

&lt;p&gt;Fortem is an ECS Fargate operations layer. It connects to your AWS account via read-only IAM, sees your existing ECS clusters, and adds the operational layer that's missing: per-environment scheduling, fleet-wide visibility, developer self-service for restart/redeploy/logs, cost attribution per environment, and AI diagnostics when tasks fail.&lt;/p&gt;

&lt;p&gt;Fortem doesn't provision infrastructure. It doesn't manage your CI/CD pipelines. It doesn't deploy code. It reads what you already have on ECS, adds operations on top, and gets out of the way. The typical onboarding is: grant IAM access, point Fortem at your clusters, set your schedules. No Terraform rewrite, no pipeline migration, no multi-month evaluation.&lt;/p&gt;

&lt;p&gt;The target persona is a platform engineer at a 30–200 person SaaS running 10+ ECS Fargate environments — someone who is the bottleneck for their team's dev/staging workflows and needs operational control over their specific runtime, not a tool to manage org-wide engineering metrics.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision matrix — which one do you need?
&lt;/h2&gt;

&lt;p&gt;The honest answer is that for most teams, it's not Cortex &lt;em&gt;or&lt;/em&gt; Fortem. It's one of four scenarios, and only two of them are “pick one.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which tool do you need?&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Persona&lt;/th&gt;
&lt;th&gt;Pain&lt;/th&gt;
&lt;th&gt;Pick&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20+ services across ECS, K8s, Lambda&lt;/td&gt;
&lt;td&gt;VP Eng / Director&lt;/td&gt;
&lt;td&gt;"Can't see what teams are doing"&lt;/td&gt;
&lt;td&gt;Cortex&lt;/td&gt;
&lt;td&gt;Org-wide visibility &amp;gt; per-runtime depth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15+ ECS Fargate environments&lt;/td&gt;
&lt;td&gt;Platform engineer&lt;/td&gt;
&lt;td&gt;"Platform team is a bottleneck"&lt;/td&gt;
&lt;td&gt;Fortem&lt;/td&gt;
&lt;td&gt;ECS-specific ops beats broad catalog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50+ services, mostly ECS, need both&lt;/td&gt;
&lt;td&gt;Eng leader + platform team&lt;/td&gt;
&lt;td&gt;"Both org visibility AND ECS control"&lt;/td&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Different layers, complementary, not competing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Starting from scratch, no platform team, ≤10 services&lt;/td&gt;
&lt;td&gt;Small team, no eng leadership yet&lt;/td&gt;
&lt;td&gt;"Need basic tooling"&lt;/td&gt;
&lt;td&gt;Neither&lt;/td&gt;
&lt;td&gt;Try Backstage or Humanitec first; both tools are overkill at this size&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The answer is rarely "X vs Y" — it's "what's the dominant problem you're trying to solve."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; The mistake most comparison articles make is treating Cortex and Fortem as direct competitors. They aren't. They solve different problems at different layers. Picking one over the other is a category error — the real question is “which problem is more painful for us right now?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Cortex does that Fortem doesn't
&lt;/h2&gt;

&lt;p&gt;None of this is operational ECS management — it's org-level engineering visibility, which is Cortex's category.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Service catalog across all tools. Cortex pulls ownership and metadata from GitHub, PagerDuty, Datadog, Jira, AWS, GCP, Kubernetes, Backstage, and 40+ more. Fortem sees ECS only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scorecards and maturity scoring. Production-readiness grading per service per team. Trendlines over time. Compliance with internal standards. Fortem has no equivalent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Org-wide engineering metrics. DORA metrics, MTTR across teams, deployment frequency, ownership gaps. These are Cortex's bread and butter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrations. Cortex has a dedicated “Backstage migration helper” and “Break up with Backstage” content. They help teams move from spreadsheets, homegrown tools, or failed Backstage implementations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compliance and audit features. SOC 2 and ISO 27001 reporting, audit logs for org-level reviews. Cortex is positioned for compliance teams and eng leaders doing org-wide audits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI for code review and AI adoption tracking. Cortex AI Assistant, AI Impact product, Context Graph. These target the AI-coding-tool proliferation problem — measuring whether Cursor/Copilot adoption is actually moving the needle.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Fortem does that Cortex doesn't
&lt;/h2&gt;

&lt;p&gt;Cortex is a &lt;em&gt;catalog&lt;/em&gt;. Fortem is an &lt;em&gt;operator&lt;/em&gt;. This is the gap most teams hit when they evaluate Cortex and realize it doesn't actually &lt;em&gt;do&lt;/em&gt; anything to their environments — it just tells them about them.&lt;/p&gt;

&lt;p&gt;For more on what Fortem covers day-to-day, see &lt;a href="https://dev.to/blog/ecs-fargate-best-practices/"&gt;our guide to running ECS Fargate at fleet scale&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Per-environment scheduling. Stop dev at 7pm, start at 9am. Different schedules per env, per timezone, per holiday calendar. See &lt;a href="https://dev.to/blog/ecs-environment-scheduling/"&gt;the full guide&lt;/a&gt;. Cortex doesn't operate environments — it has no scheduling concept.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real-time ECS fleet cost attribution. Per environment, per service, per task — the actual AWS bill, not a stitched-together estimate. Cortex's integrations surface AWS data but with lag and less granularity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Environment cloning across AWS accounts. Spin up a new dev or QA env from a known-good template, in a different account, with the right IAM, secrets, and service config. Cortex's catalog knows about services; it doesn't provision them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developer self-service for ECS actions. Restart, redeploy, view logs — all from a Slack command or a web UI, with RBAC scoped to the developer's own environments. No ticket to the platform team. Cortex Workflows can do this in Cortex's UI; Fortem runs these in the platform team's existing tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ECS-specific AI diagnostics. When a task fails, Fortem reads CloudWatch, walks the task definition, checks IAM, and proposes a fix in 8 seconds. State only changes on your click. Cortex's AI is for org-level questions; Fortem's is for “why is this task failing right now.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Works with your existing Terraform. No migration. No state modification. No HCL parsing. Fortem reads the result of &lt;code&gt;terraform apply&lt;/code&gt; and operates on top.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing and onboarding
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cortex:&lt;/strong&gt; custom pricing, sales-led, no public number. Expect enterprise pricing; multi-month evaluation cycles are normal. You'll talk to a sales rep, go through procurement, and do a 3–6 month rollout. That's fine for an enterprise org that's already decided; it's brutal for a 50-person SaaS that needs something this quarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fortem:&lt;/strong&gt; self-serve tiers, Starter $799/mo (up to 20 environments), Scale $2,499/mo (up to 80 environments), Enterprise custom. Managed onboarding in 7 business days. No procurement cycle unless you want Enterprise.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; For most ECS-first teams under 200 engineers, the procurement-cycle difference matters more than the feature comparison. You can be live with Fortem before Cortex has scheduled your first demo call. That's not a feature — it's a constraint on your planning horizon.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  When to use both
&lt;/h2&gt;

&lt;p&gt;The honest answer for large orgs: Cortex and Fortem are complementary, not competing. They sit at different layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cortex's role:&lt;/strong&gt; cross-team visibility, scorecards, migrations, compliance reporting, AI adoption tracking. The catalog of every service, the maturity scoring, the org-wide engineering metrics. The thing your eng leader wants to see at the all-hands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fortem's role:&lt;/strong&gt; ECS-specific operations. Scheduling, dev self-service, cost attribution per env, AI diagnostics, fleet visibility for the platform team. The thing the platform engineer uses daily to keep the ECS fleet running.&lt;/p&gt;

&lt;p&gt;They don't overlap. Cortex sees your services; Fortem operates your ECS environments. Some teams have Cortex surface Fortem-managed environments as a service in the catalog — read-only, no coupling required, both tools do their job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side-by-side at a glance
&lt;/h2&gt;

&lt;p&gt;All claims verified June 2026. Cortex pricing is not public.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Cortex&lt;/th&gt;
&lt;th&gt;Fortem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Focus&lt;/td&gt;
&lt;td&gt;Engineering org visibility&lt;/td&gt;
&lt;td&gt;ECS Fargate operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service catalog&lt;/td&gt;
&lt;td&gt;Yes (all tools)&lt;/td&gt;
&lt;td&gt;ECS only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scorecards / maturity&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environment scheduling&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECS cost attribution&lt;/td&gt;
&lt;td&gt;Limited (via integrations)&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer self-service&lt;/td&gt;
&lt;td&gt;Via Workflows&lt;/td&gt;
&lt;td&gt;RBAC-scoped ECS restart/redeploy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI for code / AI ops&lt;/td&gt;
&lt;td&gt;Yes (Cortex AI Assistant)&lt;/td&gt;
&lt;td&gt;Yes (ECS diagnostics)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Custom, sales-led&lt;/td&gt;
&lt;td&gt;Self-serve from $799/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Onboarding&lt;/td&gt;
&lt;td&gt;Multi-month&lt;/td&gt;
&lt;td&gt;7 business days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime support&lt;/td&gt;
&lt;td&gt;All (multi-runtime)&lt;/td&gt;
&lt;td&gt;ECS Fargate only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source alternative&lt;/td&gt;
&lt;td&gt;Backstage&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can Cortex replace Fortem for ECS teams?
&lt;/h3&gt;

&lt;p&gt;Partially. Cortex gives you a service catalog, scorecards, and org-wide engineering metrics. It does not give you per-environment scheduling, ECS-specific cost attribution, or developer self-service for restart/redeploy/logs. For ECS teams under 50 engineers, Fortem covers more of what you actually need day-to-day. For orgs with 200+ engineers across multiple runtimes, Cortex covers what Fortem doesn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Cortex support ECS Fargate scheduling like Fortem?
&lt;/h3&gt;

&lt;p&gt;No. Cortex is a catalog and insights platform — it surfaces information about your services but doesn't directly operate them. Per-environment scheduling (start dev at 9am, stop at 7pm) is a Fortem feature. Cortex could integrate with Fortem to surface scheduling data, but it doesn't replace it.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does Cortex take to roll out vs Fortem?
&lt;/h3&gt;

&lt;p&gt;Cortex is sales-led with custom pricing, and typical evaluation cycles run 3–6 months including procurement. Fortem has self-serve tiers starting at $799/mo with managed onboarding in 7 business days. For teams that need something working this quarter, this difference matters more than the feature comparison.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need Cortex if I already have Backstage?
&lt;/h3&gt;

&lt;p&gt;If your Backstage instance is maintained and adoption is real, you probably don't. Cortex explicitly positions against Backstage with a 'Break up with Backstage' migration offer — meaning teams who tried Backstage but found adoption low are Cortex's target. If your Backstage is alive and used, the calculus is different. If it's a ghost town, Cortex or Fortem (for ECS-only) are worth evaluating.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is there overlap between Cortex's Workflows and Fortem's self-service?
&lt;/h3&gt;

&lt;p&gt;Some overlap, but the targets differ. Cortex Workflows run inside Cortex's UI for migrations, service creation, and standards enforcement across all tools. Fortem's self-service runs in your platform engineer's existing tools (Slack, web) for ECS-specific actions like restart, redeploy, and log access. They're complementary, not competing. Many teams run both — Cortex for org-wide golden paths, Fortem for ECS-specific operational actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What if my stack is mostly ECS but I have one or two K8s clusters?
&lt;/h3&gt;

&lt;p&gt;Fortem today is ECS-only. If K8s is a small fraction of your stack (1-2 clusters out of 20+ ECS environments), you probably want Fortem for the ECS majority plus K8s tooling from somewhere else. If K8s is growing fast and will become equal-to-ECS, Cortex's multi-runtime approach is a better long-term fit. The decision is really about 'how much of my world is ECS in 12 months' — if 80%+, Fortem wins; if 50/50 and shifting, Cortex wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you read this, you might also want to know
&lt;/h2&gt;

&lt;p&gt;If I'm comparing Cortex, shouldn't I also look at Backstage?&lt;/p&gt;

&lt;p&gt;Yes. If your team has the appetite to run an OSS platform, Backstage is the most flexible option. Cortex explicitly positions against Backstage (they have a 'Break up with Backstage' page). The calculus: Backstage is free but requires significant platform engineering to operate. Cortex is paid but managed. Fortem is paid and managed but ECS-specific. A dedicated Backstage comparison is planned for a future guide; this article focuses on Cortex and Fortem.&lt;/p&gt;

&lt;p&gt;How do I convince my eng leadership that we need Cortex (or Fortem) at all?&lt;/p&gt;

&lt;p&gt;Run a one-week audit: count how many environments exist, who's responsible for each, and how the platform team spends their time. If the answer is 'we have 30 envs and the platform team spends 60% of their time on restarts and access requests' — Fortem ROI is obvious. If the answer is 'we don't know what teams are doing or whether our services are ready' — Cortex ROI is obvious. The tool choice follows from the audit, not the other way around.&lt;/p&gt;

&lt;p&gt;Does Fortem integrate with Cortex?&lt;/p&gt;

&lt;p&gt;Not directly today, but the integration is straightforward — Fortem exposes its fleet state via API, Cortex can ingest any HTTP source. A team running both typically surfaces Fortem-managed environments as services in the Cortex catalog (read-only) so the eng leader sees a single pane. We're not actively building a Cortex plugin right now, but if that's the blocker for your evaluation, talk to us — small customer requests move fast on the roadmap.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;All ECS comparisons:&lt;/strong&gt; &lt;a href="https://fortem.dev/versus" rel="noopener noreferrer"&gt;fortem.dev/versus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cortex</category>
      <category>ecs</category>
      <category>idp</category>
    </item>
    <item>
      <title>How to Cut AWS Costs Without Reserved Instances</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:00:39 +0000</pubDate>
      <link>https://dev.to/dspv/how-to-cut-aws-costs-without-reserved-instances-56p8</link>
      <guid>https://dev.to/dspv/how-to-cut-aws-costs-without-reserved-instances-56p8</guid>
      <description>&lt;h1&gt;
  
  
  How to Cut AWS Costs Without Reserved Instances
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/reduce-aws-costs-without-ri" rel="noopener noreferrer"&gt;https://fortem.dev/blog/reduce-aws-costs-without-ri&lt;/a&gt;&lt;br&gt;
RIs and Savings Plans are table stakes — they change how you pay, not what runs. Here are 5 methods that cut your actual AWS consumption, ranked by impact: scheduling, right-sizing, Spot, auto-stop, and killing orphans.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;p&gt;You've already set up Reserved Instances and Savings Plans. You checked the boxes the FinOps team sent over. Your AWS bill is still too high — and it keeps climbing. That's because RIs and Savings Plans change &lt;em&gt;how you pay&lt;/em&gt; for compute. They don't change &lt;em&gt;how much compute you actually consume&lt;/em&gt;. If your dev and staging environments run 24/7 while your team works 40 hours a week, no pricing model optimization will fix that. Here are five things that will.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  RIs and Savings Plans change your pricing model — not your consumption. They're table stakes. Get them first, then keep reading.&lt;/li&gt;
&lt;li&gt;  Scheduling non-prod environments to business hours alone cuts compute spend by 60–70%, about 1.75× what a 40% RI discount saves on the same non-prod workloads.&lt;/li&gt;
&lt;li&gt;  Right-sizing overprovisioned services costs $0 to implement and saves 10–30% immediately. Check p95 CloudWatch metrics before changing a single line of Terraform.&lt;/li&gt;
&lt;li&gt;  Fargate Spot drops compute costs ~70% for fault-tolerant workloads. Combined with scheduling, dev environments cost near-zero.&lt;/li&gt;
&lt;li&gt;  Most teams have 5–15% of environments that nobody owns. Finding and deleting 3 orphaned environments recovers $500–2,000/month.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reserved Instances are table stakes — what's next?
&lt;/h2&gt;

&lt;p&gt;If you don't have RIs or Savings Plans set up: stop reading. Go to the &lt;a href="https://aws.amazon.com/savingsplans/" rel="noopener noreferrer"&gt;AWS Savings Plans console&lt;/a&gt; and commit to a 1-year plan for your production workloads. It's a 30–50% discount on list price for zero engineering effort. This is the lowest-hanging fruit in AWS cost optimization. Do it first.&lt;/p&gt;

&lt;p&gt;Now here's the problem RIs don't solve: they change the price per unit, but not the number of units you consume. Your dev environments still run 168 hours a week. Your staging environment still sits idle at 3am on Sunday. Your three orphaned environments from last year's migration still bill by the second.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; On a $10,000/month AWS bill where 70% ($7,000) is non-production compute, a 40% RI discount on that spend saves $2,800/month. Scheduling those same non-production environments to business hours cuts about 70% of their running hours and saves $4,900/month. RIs address the pricing model. Scheduling addresses the consumption.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;$10,000/mo bill breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Non-production compute (70%): $7,000/mo&lt;/li&gt;
&lt;li&gt;  RI savings on non-prod (40%): −$2,800/mo&lt;/li&gt;
&lt;li&gt;  Scheduling savings (70% of compute hrs): −$4,900/mo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scheduling captures 1.75× the savings of RIs on non-prod, and you can do both.&lt;/p&gt;
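&lt;p&gt;The breakdown above is plain arithmetic, so a short Python sketch makes it easy to rerun with your own numbers. The function name and parameters are illustrative and not part of any AWS or Fortem API; the 70% non-prod share, 40% RI discount, and 70% cut in running hours are the figures from the example.&lt;/p&gt;

```python
def compare_savings(monthly_bill, non_prod_share, ri_discount, hours_cut):
    """Return (RI savings, scheduling savings) on non-production compute."""
    non_prod = monthly_bill * non_prod_share
    ri_savings = non_prod * ri_discount        # cheaper price, same hours
    scheduling_savings = non_prod * hours_cut  # same price, fewer hours
    return round(ri_savings, 2), round(scheduling_savings, 2)


ri, sched = compare_savings(10_000, 0.70, 0.40, 0.70)
print(f"RI: ${ri:,.0f}/mo, scheduling: ${sched:,.0f}/mo, ratio: {sched / ri:.2f}x")
```

&lt;p&gt;Swap in your own bill and non-prod share to see which lever dominates for your fleet.&lt;/p&gt;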

&lt;h2&gt;
  
  
  Method 1: Schedule environments (60–70% savings)
&lt;/h2&gt;

&lt;p&gt;There are 168 hours in a week. Your team works roughly 50 of them (Mon–Fri 9am–7pm). The other 118 hours — nights, weekends, holidays — your non-production ECS services sit idle, billing by the second. Scheduling means stopping them during off-hours and restarting them at the start of the workday.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“AWS Fargate charges $0.04048 per vCPU-hour and $0.004445 per GB-hour for Linux/x86 on-demand pricing. Every hour a dev environment runs at 3am, every minute a staging cluster spins through the weekend — that's billing against this rate.”&lt;/p&gt;

&lt;p&gt;— &lt;a href="https://aws.amazon.com/fargate/pricing/" rel="noopener noreferrer"&gt;AWS Fargate Pricing&lt;/a&gt;, verified May 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What to schedule:&lt;/strong&gt; dev environments, QA, demo environments, training sandboxes, branch preview environments. Anything that doesn't need to be available at 3am on Sunday.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What NOT to schedule:&lt;/strong&gt; production, customer-facing staging, on-call sandboxes that need 24/7 availability. Use per-environment configuration — not a single global schedule.&lt;/p&gt;

&lt;p&gt;Monthly cost for 12 environments with 8 services each: $1,730/mo running 24/7 (168 hrs/week) vs $515/mo on business hours (50 hrs/week, Mon–Fri 9am–7pm), a 70% saving.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; Scheduling costs $0 to implement — it's purely an operational change. No Terraform modifications. No new resources. Just stopping services when nobody is using them. For most teams with 10+ non-prod environments, scheduling is the single largest savings lever by a wide margin.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Implementation: tag every non-production environment, then run a scheduler (EventBridge + Lambda, or a third-party tool) that sets desired counts to 0 during off-hours and back to N at the start of the workday. Per-timezone configuration matters — your EU team starts 6 hours before your US team.&lt;/p&gt;
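&lt;p&gt;As a rough illustration of the per-timezone decision such a scheduler has to make, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name, the environment names, and the use of fixed UTC offsets (which ignore daylight saving time). A real Lambda would follow this decision with a call that sets each ECS service's desired count to 0 or back to N.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

WORKDAYS = range(0, 5)  # Monday=0 .. Friday=4


def desired_state(now_utc, utc_offset_hours, start_hour=9, stop_hour=19):
    """Return 'running' inside local business hours, else 'stopped'."""
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    in_hours = local.hour in range(start_hour, stop_hour)
    return "running" if local.weekday() in WORKDAYS and in_hours else "stopped"


# Hypothetical environments: name mapped to a fixed UTC offset for the team.
environments = {"eu-dev": 1, "us-dev": -5}
now = datetime(2026, 6, 8, 8, 30, tzinfo=timezone.utc)  # a Monday, 08:30 UTC
for name, offset in environments.items():
    # A real scheduler would set the ECS service desired count to 0 or N here.
    print(name, desired_state(now, offset))
```

&lt;p&gt;Keeping the decision in a pure function like this makes the schedule easy to test before it touches any environment.&lt;/p&gt;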

&lt;h2&gt;
  
  
  Method 2: Right-size your services (10–30% savings)
&lt;/h2&gt;

&lt;p&gt;When someone first deployed that dev API service, they picked 1 vCPU and 2 GB. It made sense at the time. Six months later, the service processes one request per minute during business hours and sits idle every other second. It's paying for capacity it never uses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to find overprovisioned services:&lt;/strong&gt; go to &lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-insights.html" rel="noopener noreferrer"&gt;CloudWatch Container Insights&lt;/a&gt; → your ECS cluster → CPU Utilization and Memory Utilization per service. Look at the p95 over the last 14 days — not the average. A service with p95 CPU at 87 units on a 1024-unit allocation is using 8.5% of its provisioned capacity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; A common pattern: task definition requests 1024 CPU units. CloudWatch p95 over 14 days shows 87 CPU units. That service is paying for roughly 12× the CPU it actually uses. Right-size the task definition to 256 (p95 87 × 3 = 261 ≈ 256) and you cut its Fargate CPU charge by 75%.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Right-sizing rule: p95 × 3, round to nearest Fargate increment&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  1 vCPU (1024) → p95 = 87 → 87 × 3 = 261 → right-size to 256 = −75% CPU cost&lt;/li&gt;
&lt;li&gt;  0.5 vCPU (512) → p95 = 120 → 120 × 3 = 360 → keep at 512 = already right-sized&lt;/li&gt;
&lt;li&gt;  2 GB memory → p95 = 310 MB → 310 × 3 = 930 MB → right-size to 1 GB = −50% memory cost&lt;/li&gt;
&lt;/ul&gt;
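&lt;p&gt;The three examples above can be reproduced with a small Python sketch. The 5% tolerance is an assumption added here so that a target of 261 still maps to 256, as in the first example; the function name and defaults are illustrative. Note that on Fargate the valid memory sizes depend on the CPU size you choose, which this sketch does not check.&lt;/p&gt;

```python
CPU_INCREMENTS = [256, 512, 1024, 2048, 4096]  # Fargate CPU units


def right_size(p95, increments=CPU_INCREMENTS, headroom=3, tolerance=0.05):
    """Pick the smallest increment covering p95 * headroom, within tolerance."""
    target = p95 * headroom
    for size in sorted(increments):
        # Accept an increment slightly under target (261 is treated as 256).
        if size >= target * (1 - tolerance):
            return size
    return max(increments)


print(right_size(87))   # 1024-unit service with p95 of 87
print(right_size(120))  # 512-unit service with p95 of 120
print(right_size(310, increments=[512, 1024, 2048]))  # memory in MB
```

&lt;p&gt;Treat the result as a starting point and confirm it against a second 14-day window before changing the task definition.&lt;/p&gt;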

&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt; traffic spikes can overwhelm a right-sized service. Mitigate with ECS Service Auto Scaling — set a target tracking policy on CPU utilization at 70%. The service starts small, scales up when needed, scales down at night. Right-sizing without autoscaling is gambling. Right-sizing with autoscaling is engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Method 3: Fargate Spot (up to 70% discount)
&lt;/h2&gt;

&lt;p&gt;Fargate Spot runs tasks on spare AWS capacity at roughly 70% off on-demand pricing, per &lt;a href="https://aws.amazon.com/fargate/pricing/" rel="noopener noreferrer"&gt;AWS Fargate pricing&lt;/a&gt; (verified May 2026). The tradeoff: AWS can reclaim that capacity with a 2-minute warning. ECS handles the drain and restart: your task gets SIGTERM, has 30 seconds by default to drain connections, and the service scheduler then starts a replacement task according to the service's capacity provider strategy. Spot tasks do not fall back to On-Demand automatically, so keep an On-Demand share in the strategy for anything that must stay up.&lt;/p&gt;

&lt;p&gt;Fargate Spot vs On-Demand (0.5 vCPU + 1 GB, Linux/x86):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  On-Demand: $0.024685/hr → $18.02/service/mo&lt;/li&gt;
&lt;li&gt;  Spot: $0.007872/hr → $5.75/service/mo&lt;/li&gt;
&lt;li&gt;  Difference: −68% per service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Good for Spot:&lt;/strong&gt; CI/CD test runners, batch jobs, dev environments for individual engineers, any workload that restarts cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad for Spot:&lt;/strong&gt; production, customer-facing staging, anything with an SLA. For workloads in between, use the capacity provider strategy to split tasks, for example 80% Spot / 20% On-Demand, so an interruption removes only part of the capacity instead of taking the whole service down.&lt;/p&gt;
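&lt;p&gt;For reference, an 80/20 split is expressed as relative weights in the ECS service's capacity provider strategy. The sketch below shows the shape of that setting as a Python structure and checks the resulting shares. The weights of 4 and 1 and the base of 1 On-Demand task are assumptions chosen for illustration; adjust them to your own availability needs.&lt;/p&gt;

```python
# Weights are relative: 4 parts Spot to 1 part On-Demand is an 80/20 split.
# "base" keeps a minimum number of tasks on the named provider (assumed: 1).
capacity_provider_strategy = [
    {"capacityProvider": "FARGATE_SPOT", "weight": 4},
    {"capacityProvider": "FARGATE", "weight": 1, "base": 1},
]

total = sum(item["weight"] for item in capacity_provider_strategy)
for item in capacity_provider_strategy:
    share = item["weight"] / total
    print(f'{item["capacityProvider"]}: {share:.0%} of tasks beyond the base')
```

&lt;p&gt;The same structure maps onto the capacity provider strategy block in a Terraform ECS service definition.&lt;/p&gt;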

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; Spot combined with scheduling creates a compound effect: a dev service on business hours (29.8% of the week) running on Spot (32% of on-demand price) costs just 9.5% of the original 24/7 on-demand cost. An $18.02/month service drops to $1.71/month. That's not a typo.&lt;/p&gt;
&lt;/blockquote&gt;
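&lt;p&gt;The per-service figures in this section can be checked with a few lines of Python. The rates are the ones quoted in this article; the 730 hours per month is an assumption (an average month) that happens to reproduce the $18.02 figure, and the names are illustrative.&lt;/p&gt;

```python
VCPU_HOUR = 0.04048     # USD per vCPU-hour, Linux/x86 on-demand (from the article)
GB_HOUR = 0.004445      # USD per GB-hour (from the article)
SPOT_HOURLY = 0.007872  # USD per hour for 0.5 vCPU + 1 GB on Spot (from the article)
HOURS_PER_MONTH = 730   # assumption: average month length in hours


def on_demand_hourly(vcpu, memory_gb):
    """Hourly on-demand price for one task of the given size."""
    return vcpu * VCPU_HOUR + memory_gb * GB_HOUR


hourly = on_demand_hourly(0.5, 1)
always_on = hourly * HOURS_PER_MONTH
spot_always_on = SPOT_HOURLY * HOURS_PER_MONTH
# Scheduling scales the hours (50 of 168); Spot scales the price per hour.
scheduled_spot = always_on * (50 / 168) * (SPOT_HOURLY / hourly)

print(f"on-demand 24/7:   ${always_on:.2f}/mo")
print(f"spot 24/7:        ${spot_always_on:.2f}/mo")
print(f"spot + scheduled: ${scheduled_spot:.2f}/mo")
```

&lt;p&gt;Because the two savings multiply rather than add, the order in which you adopt them does not change the end result.&lt;/p&gt;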

&lt;h2&gt;
  
  
  Method 4: Auto-stop idle environments
&lt;/h2&gt;

&lt;p&gt;This is different from scheduling. Scheduling is predictable: environments stop and start on a fixed calendar. Auto-stop targets environments that &lt;em&gt;should&lt;/em&gt; be in use but aren't. An environment that hasn't seen a deployment in 10 days, has zero active connections, and generates no application logs is probably abandoned, even if nobody told you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt; monitor CloudTrail for ECS service updates (deployments) and CloudWatch Logs for application activity. If an environment has zero deploy events and zero log activity for a configurable threshold (say, 6 consecutive days), automatically set its ECS service desired counts to 0. Send a Slack notification: “use1-dev-experiment stopped: idle 6 days. One-click restart here.”&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; The organizational question is harder than the technical implementation: who decides what “idle” means? 3 days? 7 days? 14 days? Define the policy with your team leads, document it, and give developers a 24-hour warning before auto-stop kicks in. The technical part is a Lambda function. The organizational part is a Slack thread.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Best practice: start conservative. 14-day idle threshold, 48-hour warning. Measure how many environments get auto-stopped and how many get immediately restarted. Tighten the threshold over time as the team builds trust in the process.&lt;/p&gt;
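&lt;p&gt;The idle check itself is small. Here is a minimal Python sketch of the policy decision, assuming you have already pulled the last deploy time (for example from CloudTrail) and the last log event time (from CloudWatch Logs) for an environment. The function name, inputs, and 14-day default are illustrative and follow the conservative starting point suggested above.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def is_idle(last_deploy, last_log_event, now, threshold_days=14):
    """Idle means no deploy and no log activity within the threshold."""
    last_activity = max(last_deploy, last_log_event)
    return now - last_activity >= timedelta(days=threshold_days)


now = datetime(2026, 6, 8, tzinfo=timezone.utc)
last_deploy = now - timedelta(days=20)
last_log_event = now - timedelta(days=15)
print(is_idle(last_deploy, last_log_event, now))                     # 14-day policy
print(is_idle(last_deploy, last_log_event, now, threshold_days=30))  # looser policy
```

&lt;p&gt;Making the threshold a parameter keeps the policy discussion with team leads separate from the code that enforces it.&lt;/p&gt;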

&lt;h2&gt;
  
  
  Method 5: Kill orphaned environments
&lt;/h2&gt;

&lt;p&gt;While auto-stop handles the recently-idle, this method handles the permanently-abandoned. Every team that's been running ECS for more than a year has environments that nobody claims. They were spun up for a migration, a hackathon, a departed engineer's experiment. Nobody deploys to them. Nobody knows who owns them. They just bill — quietly, every month.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Most teams we work with find 5–15% of their environments are completely abandoned — no deploys in 6+ months, no identifiable owner, no access logs. Three orphaned environments at $170/month each = $6,120/year of compute serving zero requests.”&lt;/p&gt;

&lt;p&gt;— Fortem fleet audit of 100+ ECS environments across 12 teams, 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Audit approach:&lt;/strong&gt; pull the last deployment timestamp per environment. Cross-reference with the team directory (who owns what?). Environments with no deploy in 30+ days and no active team owner go on a review list. The platform team reviews the list, confirms abandonment, and deletes the infrastructure.&lt;/p&gt;
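&lt;p&gt;The audit can start as a simple filter over an inventory export. The Python sketch below assumes a hypothetical list of (environment, last deploy date, owner) rows; the environment names, dates, and owners are made up for illustration. It produces the review list described above and leaves the actual deletion to a human.&lt;/p&gt;

```python
from datetime import date

# Hypothetical inventory rows: (environment, last deploy date, owner or None).
inventory = [
    ("use1-dev-payments", date(2026, 6, 1), "payments-team"),
    ("use1-dev-experiment", date(2025, 11, 3), None),
    ("euw1-qa-migration", date(2026, 2, 14), None),
    ("use1-staging", date(2026, 3, 1), "platform-team"),
]


def review_list(rows, today, max_age_days=30):
    """Environments with no recent deploy and no owner, oldest first."""
    stale = [
        (name, (today - deployed).days)
        for name, deployed, owner in rows
        if owner is None and (today - deployed).days >= max_age_days
    ]
    return sorted(stale, key=lambda row: row[1], reverse=True)


for name, age in review_list(inventory, today=date(2026, 6, 8)):
    print(f"{name}: no deploy for {age} days, no owner")
```

&lt;p&gt;An environment with an owner but no recent deploys stays off the list on purpose: that is a conversation with the owner, not a deletion candidate.&lt;/p&gt;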

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; Finding orphaned environments is a one-time audit that costs $0 and takes an afternoon. The savings compound every month. For a team with 50+ environments, the most common outcome is 2–5 orphans worth $500–$2,000/month. That's $6,000–$24,000/year — from a one-time afternoon of work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Comparing the 5 methods
&lt;/h2&gt;

&lt;p&gt;Stack these in order. Start with the highest-impact, lowest-effort method and work down. Don't try to implement all five at once — that's how cost optimization projects die in committee. Do method 1 this week. Method 2 next week. See the savings compound.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Scheduling&lt;/td&gt;
&lt;td&gt;60–70%&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Dev/staging envs stop outside business hours (50 hrs/wk instead of 168). Zero Terraform changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Right-sizing&lt;/td&gt;
&lt;td&gt;10–30%&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Drop task CPU/memory to p95 × 3, rounded to a Fargate increment. One-time TF change per service.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Fargate Spot&lt;/td&gt;
&lt;td&gt;Up to 70%&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Switch capacity provider to FARGATE_SPOT. 2-min interruption notice from AWS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Auto-stop idle&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Stop any env not deployed to or accessed in 6+ days. CloudTrail + Lambda.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Kill orphans&lt;/td&gt;
&lt;td&gt;$500–2,000/mo&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Find envs with no owner and no deploys in 30+ days. Delete them.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Combined savings: $5,765/mo · $69,180/yr&lt;/p&gt;

&lt;p&gt;Combined impact on a $10,000/mo fleet: RI (−$2,800) + Scheduling (−$4,900) + Right-sizing (−$1,050 on remaining) + Spot on eligible dev envs (−$815). Total: $10,000 → $4,235/mo. 57% reduction without touching a single Reserved Instance.&lt;/p&gt;

&lt;p&gt;The specific numbers depend on your fleet composition. A team with 80% non-prod compute will see scheduling dominate. A team where everything runs at steady utilization will see right-sizing and Spot carry the weight. The framework is the same regardless: reduce consumption first, then optimize the pricing model on what remains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do I need to change my Terraform to implement any of this?
&lt;/h3&gt;

&lt;p&gt;Not for scheduling or auto-stop — those are operational concerns handled outside Terraform. Right-sizing requires updating task definition files. Spot requires changing capacity provider strategy in your ECS service definition. Killing orphans requires no Terraform changes. Most teams start with scheduling (zero Terraform impact) and right-sizing (the small Terraform change with the second-largest impact).&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens to databases when an environment is scheduled off?
&lt;/h3&gt;

&lt;p&gt;ECS scheduling stops and starts compute tasks. It does not touch RDS, ElastiCache, or any other stateful services, so your databases keep running and billing. If you want to stop databases too, you need separate scheduling for each service type. Most teams leave databases running 24/7 and only schedule compute: the extra database cost is usually worth the operational simplicity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Fargate Spot safe for staging environments?
&lt;/h3&gt;

&lt;p&gt;It depends on what staging is used for. If staging runs automated tests and can tolerate a 2-minute interruption, Spot is fine. If staging hosts customer demos or is expected to be reliably available during business hours, use On-Demand for those specific services. The capacity provider strategy lets you split — 80% Spot / 20% On-Demand — so interruptions don't cause downtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I find which environments are idling?
&lt;/h3&gt;

&lt;p&gt;Pull the last task run timestamp from CloudWatch Logs Insights — any service with no log events in the last 14 days is a candidate. Cross-reference with your deployment records (last deploy date). Environments with no deploys in 30+ days and no active owner are safe to stop. Fortem surfaces last deploy time, last access time, and owner for every environment — turning a 2-hour audit into a 2-minute filter.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;See your real cost:&lt;/strong&gt; &lt;a href="https://fortem.dev/ecs-cost-calculator" rel="noopener noreferrer"&gt;fortem.dev/ecs-cost-calculator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cost</category>
      <category>optimization</category>
      <category>fargate</category>
    </item>
    <item>
      <title>Fortem vs Flightcontrol: ECS Fleet Management vs Single-App PaaS</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/dspv/fortem-vs-flightcontrol-ecs-fleet-management-vs-single-app-paas-1ga3</link>
      <guid>https://dev.to/dspv/fortem-vs-flightcontrol-ecs-fleet-management-vs-single-app-paas-1ga3</guid>
      <description>&lt;h1&gt;
  
  
  Which ECS Platform Should You Choose: Fortem or Flightcontrol?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/fortem-vs-flightcontrol" rel="noopener noreferrer"&gt;https://fortem.dev/blog/fortem-vs-flightcontrol&lt;/a&gt;&lt;br&gt;
Flightcontrol is the right tool for 1–3 apps on AWS. Here's exactly where it stops making sense — and where the pricing math breaks.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;p&gt;Flightcontrol and Fortem solve different problems. One of them is the wrong tool for you — and figuring out which one depends on how many environments you run, not which features look better in a comparison table. This article explains what each product actually does, where the pricing math breaks, and how to tell which side of the line you're on.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Flightcontrol is a PaaS for deploying apps to your AWS account — excellent for 1–3 services.&lt;/li&gt;
&lt;li&gt;  Per-service pricing ($30/service on Business) breaks at 10+ environments.&lt;/li&gt;
&lt;li&gt;  Flightcontrol has no environment scheduling, no fleet visibility, no developer self-service.&lt;/li&gt;
&lt;li&gt;  Fortem is not a deployment tool — it's a fleet operations layer for teams already on ECS.&lt;/li&gt;
&lt;li&gt;  At 20 environments × 8 services, Flightcontrol Business costs $4,897/mo vs Fortem's $2,499/mo flat plan.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Flightcontrol actually is (and what it's not)
&lt;/h2&gt;

&lt;p&gt;Flightcontrol is a managed PaaS that deploys your applications to your own AWS account. ECS, Lambda, static sites, RDS — they handle the infrastructure so your team doesn't have to. The value proposition is simplicity: connect your GitHub repo, define your services, and Flightcontrol manages the AWS complexity.&lt;/p&gt;

&lt;p&gt;It works well for exactly what it was designed for. A startup with 2–3 applications and a small engineering team that doesn't want a dedicated platform engineer gets a lot of value from Flightcontrol. Fast setup, good support, and AWS without the AWS complexity.&lt;/p&gt;

&lt;p&gt;What it doesn't market itself as, and isn't designed for: managing a fleet of 20+ environments across multiple teams, scheduling dev environments to stop at night, giving developers self-service access to restart their own environment without Slack messages, or showing you fleet-wide cost and activity data in one screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pricing math at scale
&lt;/h2&gt;

&lt;p&gt;Flightcontrol charges per service, not per environment. This is the right model for small deployments — you pay for what you use. It becomes the wrong model at fleet scale.&lt;/p&gt;

&lt;p&gt;Flightcontrol pricing (verified May 2026):&lt;/p&gt;

&lt;p&gt;Starter: $97/mo, 5 services included + $20/service overage&lt;/p&gt;

&lt;p&gt;Business: $397/mo, 10 services included + $30/service overage&lt;/p&gt;

&lt;p&gt;The math at 20 environments × 8 services = 160 billable services:&lt;/p&gt;

&lt;p&gt;Starter: $97 + 155 × $20 = $3,197/mo&lt;/p&gt;

&lt;p&gt;Business: $397 + 150 × $30 = $4,897/mo&lt;/p&gt;

&lt;p&gt;Fortem plan: $2,499/mo flat for up to 80 environments&lt;/p&gt;

&lt;p&gt;Monthly cost — Flightcontrol Business vs Fortem&lt;/p&gt;

&lt;p&gt;$547&lt;/p&gt;

&lt;p&gt;$2,499&lt;/p&gt;

&lt;p&gt;5 envs&lt;/p&gt;

&lt;p&gt;$2,197&lt;/p&gt;

&lt;p&gt;$2,499&lt;/p&gt;

&lt;p&gt;10 envs&lt;/p&gt;

&lt;p&gt;$4,897&lt;/p&gt;

&lt;p&gt;$2,499&lt;/p&gt;

&lt;p&gt;20 envs&lt;/p&gt;

&lt;p&gt;$11,297&lt;/p&gt;

&lt;p&gt;$2,499&lt;/p&gt;

&lt;p&gt;50 envs&lt;/p&gt;

&lt;p&gt;Flightcontrol Business&lt;/p&gt;

&lt;p&gt;Fortem plan ($2,499 flat)&lt;/p&gt;

&lt;p&gt;Crossover: ~7 environments&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; At 7 environments × 8 services = 56 services, Flightcontrol Business ($397 + 46 × $30 = $1,777/mo) and Fortem plan ($2,499/mo) are roughly equivalent. Above that breakpoint, Fortem is cheaper. Below it, Flightcontrol is.&lt;/p&gt;
&lt;/blockquote&gt;
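
&lt;p&gt;To run the same arithmetic for your own fleet, here is a minimal sketch. It uses only the Flightcontrol Business and Fortem prices quoted in this article; the function names and the 8-services-per-environment fleet shape are illustrative assumptions, not part of either product.&lt;/p&gt;

```python
# Prices as quoted in this article; treat them as inputs that may change.
BUSINESS_BASE = 397       # USD per month
BUSINESS_INCLUDED = 10    # services covered by the base price
BUSINESS_OVERAGE = 30     # USD per extra service per month
FLAT_PLAN = 2499          # USD per month, flat


def business_cost(services):
    """Monthly Flightcontrol Business cost for a given number of services."""
    extra = max(0, services - BUSINESS_INCLUDED)
    return BUSINESS_BASE + extra * BUSINESS_OVERAGE


def first_service_count_where_flat_wins():
    """Smallest service count at which the flat plan is strictly cheaper."""
    extra_at_parity = (FLAT_PLAN - BUSINESS_BASE) // BUSINESS_OVERAGE
    return BUSINESS_INCLUDED + extra_at_parity + 1


if __name__ == "__main__":
    for envs in (5, 10, 20, 50):
        services = envs * 8  # hypothetical fleet shape: 8 services per environment
        print(envs, "envs:", business_cost(services), "vs", FLAT_PLAN)
```

&lt;p&gt;For example, &lt;code&gt;business_cost(160)&lt;/code&gt; returns 4897, which matches the 20-environment figure above.&lt;/p&gt;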

&lt;p&gt;For more on managing the ECS cost side of this equation, see &lt;a href="https://dev.to/blog/ecs-fargate-cost-optimization/"&gt;How to Cut AWS ECS Fargate Costs by 65%&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Flightcontrol doesn't do (and says so)
&lt;/h2&gt;

&lt;p&gt;Flightcontrol's product is honest about its scope. The gaps below aren't criticisms — they're category differences.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No environment scheduling. Environments run 24/7. There's no built-in way to stop all services in a dev environment at 7pm and restart them at 9am. If you want this, you'd build it yourself with EventBridge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No fleet-wide visibility. If you have 15 environments across 3 teams, there's no single screen showing you all of them — their status, last deploy, activity, cost attribution, who owns them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No developer self-service. When a developer's environment gets stuck, there's no “restart my environment” button. They file a ticket or message the platform team.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No environment cloning. Spinning up a new dev or QA environment from a known-good template requires manual work, not a UI action.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
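
&lt;p&gt;For a sense of what "build it yourself with EventBridge" involves, here is a minimal sketch of the stop half: a Lambda handler, invoked by an EventBridge schedule, that scales every service in one cluster to zero. It assumes boto3, one ECS cluster per environment, and a &lt;code&gt;cluster&lt;/code&gt; key in the event payload; those conventions are assumptions of this example, not something Flightcontrol or Fortem prescribes.&lt;/p&gt;

```python
def stop_environment(ecs, cluster):
    """Scale every service in one ECS cluster to zero tasks.

    `ecs` is a boto3 ECS client (or any object with the same methods).
    Returns the service ARNs that were scaled down.
    """
    stopped = []
    paginator = ecs.get_paginator("list_services")
    for page in paginator.paginate(cluster=cluster):
        for arn in page["serviceArns"]:
            ecs.update_service(cluster=cluster, service=arn, desiredCount=0)
            stopped.append(arn)
    return stopped


def lambda_handler(event, context):
    """Entry point for a Lambda invoked by an EventBridge schedule.

    The "cluster" key is an assumption of this sketch: set it as the
    constant input of the schedule target.
    """
    import boto3  # imported lazily so stop_environment stays testable offline

    return {"stopped": stop_environment(boto3.client("ecs"), event["cluster"])}
```

&lt;p&gt;This sketch forgets each service's previous desired count, so a matching morning job would need those values stored somewhere (a tag or a parameter, for instance) before it can restore them. That bookkeeping is most of the real work.&lt;/p&gt;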

&lt;h2&gt;
  
  
  When Flightcontrol is the right choice
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  ≤3 applications, ≤5 services each — you're well within the pricing model where it works in your favor.&lt;/li&gt;
&lt;li&gt;  Moving off Heroku or Railway to AWS — Flightcontrol is the path of least resistance. You get AWS's infrastructure without AWS's complexity overhead.&lt;/li&gt;
&lt;li&gt;  Team without a dedicated platform engineer — the managed deployment model saves real time and eliminates a category of infrastructure decisions.&lt;/li&gt;
&lt;li&gt;  You need deployment management more than fleet management — your primary problem is getting code to AWS, not operating environments once they're running.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this is you, use Flightcontrol. Their product is well-designed for this use case, their support is responsive, and the setup time is genuinely low.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Fortem is the right choice
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  10+ environments across your fleet — dev, staging, QA, demo, per-team, per-feature. At this scale, per-service pricing breaks and fleet visibility becomes operationally necessary.&lt;/li&gt;
&lt;li&gt;  Dev/staging environments running 24/7 with no after-hours demand — see &lt;a href="https://dev.to/blog/aws-dev-environment-cost/"&gt;the cost breakdown&lt;/a&gt; on what that actually costs before you decide it's too small to fix.&lt;/li&gt;
&lt;li&gt;  Developers filing tickets to restart their own environment — that friction compounds. Every restart request is 15–30 minutes of a platform engineer's time that should be self-service.&lt;/li&gt;
&lt;li&gt;  You're already running your own Terraform and don't want a tool that manages your infrastructure — Fortem connects to your existing ECS clusters without taking over provisioning.&lt;/li&gt;
&lt;li&gt;  You need fleet visibility: all environments, their status, cost, owners, and last deploys in one view — something you currently piece together from CloudWatch, Cost Explorer, and Slack threads.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; Fortem is not a deployment tool. It sits on top of your existing ECS infrastructure — your Terraform, your CI/CD pipelines, your task definitions. If you're evaluating whether to replace deployments, that's a different question.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Migration path from Flightcontrol to Fortem
&lt;/h2&gt;

&lt;p&gt;Two paths depending on what you want to change.&lt;/p&gt;

&lt;p&gt;Path 1 — Run both in parallel&lt;/p&gt;

&lt;p&gt;Flightcontrol continues handling deployments. Fortem connects to the same ECS clusters and handles fleet operations — scheduling, visibility, self-service restarts. No migration of deployment pipelines needed. This is the lower-risk path for teams that are happy with Flightcontrol's deployment experience but need fleet management on top of it.&lt;/p&gt;

&lt;p&gt;Path 2 — Full migration&lt;/p&gt;

&lt;p&gt;Move deployment pipelines from Flightcontrol to GitHub Actions or your existing CI/CD. Fortem handles fleet operations.&lt;/p&gt;

&lt;p&gt;Week 1: Connect Fortem to your clusters via IAM&lt;/p&gt;

&lt;p&gt;Weeks 2–4: Migrate deployment pipelines from Flightcontrol&lt;/p&gt;

&lt;p&gt;Weeks 4–6: Parallel operation before fully deprecating Flightcontrol&lt;/p&gt;

&lt;p&gt;Fortem onboarding doesn't require changes to task definitions, ECS cluster config, or Terraform. It reads your existing infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Fortem replace Flightcontrol's deployment pipelines?
&lt;/h3&gt;

&lt;p&gt;No. Fortem is a fleet operations layer — it manages running environments (scheduling, visibility, self-service, cost tracking) but does not deploy code or manage build pipelines. Many teams run Flightcontrol for deployments and Fortem for fleet ops simultaneously. If you eventually want to replace Flightcontrol's deployment pipelines, that's a separate decision typically involving GitHub Actions or AWS CodePipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I connect Fortem to environments that Flightcontrol currently manages?
&lt;/h3&gt;

&lt;p&gt;Yes. Fortem connects to your ECS clusters directly via your AWS account. It doesn't care how those clusters were provisioned — Flightcontrol, Terraform, manual console setup. As long as the ECS cluster exists, Fortem can read and manage it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the pricing break-even between Flightcontrol Business and Fortem?
&lt;/h3&gt;

&lt;p&gt;At 8 services per environment, the break-even is around 10 environments (80 services, or $2,497/mo on Flightcontrol Business). Below that, Flightcontrol Business ($397/mo base) is cheaper. Above it, Fortem plan ($2,499/mo flat for up to 80 environments) is cheaper. At 20 environments, Flightcontrol Business comes to $4,897/mo versus Fortem's $2,499/mo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Fortem handle deployments — pushing new container images to ECS?
&lt;/h3&gt;

&lt;p&gt;No. Fortem manages environment state — starting, stopping, scheduling, fleet visibility — but image builds and deployments happen in your CI/CD pipeline (GitHub Actions, CircleCI, AWS CodePipeline). Fortem sees the results of deployments (service status, running task count, last deploy time) but doesn't initiate them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to rewrite any infrastructure to add Fortem?
&lt;/h3&gt;

&lt;p&gt;No. Fortem reads your existing ECS cluster configuration and connects via your AWS account credentials. No changes to task definitions, Terraform, or CloudFormation. The typical onboarding is: grant IAM access, point Fortem at your clusters, set your schedules.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you read this, you might also want to know
&lt;/h2&gt;

&lt;p&gt;What if I have both 1-app side projects AND a 30-env production fleet?&lt;/p&gt;

&lt;p&gt;You might use both. Flightcontrol for the 1–3 app side projects where simplicity wins, Fortem for the main fleet where per-service pricing breaks. They don't compete — they address different scales. The cost question is whether the ops overhead of two tools is worth the savings.&lt;/p&gt;

&lt;p&gt;Can I run Fortem alongside my existing Terraform CI/CD?&lt;/p&gt;

&lt;p&gt;Yes. Fortem reads AWS resources after Terraform provisions them — no HCL parsing, no repo access, no state modifications. Your terraform apply still provisions infrastructure. Fortem adds the operations layer on top without touching your pipeline.&lt;/p&gt;

&lt;p&gt;How do I calculate whether per-environment or per-service pricing is cheaper for my fleet?&lt;/p&gt;

&lt;p&gt;Count your services, subtract the included ones, multiply the remainder by the overage rate, add the base price, and compare to the flat rate. Example: 30 envs × 8 services = 240 services. On Flightcontrol Business: $397 + (240 - 10) × $30 = $7,297/mo. On the Fortem plan: $2,499/mo flat. The crossover is at roughly 80 services in total, or about 10 environments at 8 services each.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;All ECS comparisons:&lt;/strong&gt; &lt;a href="https://fortem.dev/versus" rel="noopener noreferrer"&gt;fortem.dev/versus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>fargate</category>
      <category>comparison</category>
    </item>
    <item>
      <title>AWS Copilot is Deprecated: Alternatives for ECS Fargate Teams</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:59:58 +0000</pubDate>
      <link>https://dev.to/dspv/aws-copilot-is-deprecated-alternatives-for-ecs-fargate-teams-4dfc</link>
      <guid>https://dev.to/dspv/aws-copilot-is-deprecated-alternatives-for-ecs-fargate-teams-4dfc</guid>
      <description>&lt;h1&gt;
  
  
  AWS Copilot is Deprecated: Alternatives for ECS Fargate Teams
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/fortem-vs-aws-copilot" rel="noopener noreferrer"&gt;https://fortem.dev/blog/fortem-vs-aws-copilot&lt;/a&gt;&lt;br&gt;
AWS Copilot CLI reaches end-of-support June 12, 2026. Your ECS services keep running — but here's what breaks, what to do next, and how to migrate.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;p&gt;AWS Copilot CLI reaches end-of-support on June 12, 2026. If your team uses it to deploy ECS Fargate services, here's what that actually means for you — what breaks, what doesn't, and what the migration paths look like.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  AWS Copilot CLI is end-of-support June 12, 2026 — no security patches or updates after that date&lt;/li&gt;
&lt;li&gt;  Your existing ECS services keep running — Copilot provisions them but doesn't run them&lt;/li&gt;
&lt;li&gt;  Copilot was a deployment CLI, never a fleet management tool&lt;/li&gt;
&lt;li&gt;  Two paths forward: migrate to raw Terraform + CI/CD, or add Fortem for fleet operations&lt;/li&gt;
&lt;li&gt;  AWS is deprecating both Copilot (June 12) and Proton (Oct 7) — they're exiting managed ECS tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What AWS Copilot was (and wasn't)
&lt;/h2&gt;

&lt;p&gt;AWS Copilot is an open-source CLI that simplified deploying containerized applications on ECS Fargate. You defined your workloads in a manifest file — Load Balanced Web Service, Backend Service, Scheduled Job — and Copilot generated the CloudFormation, set up the ECS cluster, configured the load balancer, and handled deployments. For a team that wanted to get on Fargate without writing raw CloudFormation, it was genuinely useful.&lt;/p&gt;

&lt;p&gt;What it didn't do: manage a fleet of environments. There was no way to see all your environments in one view, schedule dev environments to stop at night, give developers self-service access, track costs per environment, or clone an environment for QA. Copilot solved deployment. Fleet operations were always out of scope.&lt;/p&gt;

&lt;p&gt;It was also free — open source, no subscription, you paid only your AWS bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks on June 12, 2026
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No new CLI releases. No bug fixes, no feature updates. The binary you have is the last version.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No security patches. If a vulnerability is found in the Copilot CLI after June 12, it won't be patched.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No AWS support. Open a support ticket referencing Copilot and you'll be redirected to community forums.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copilot commands may drift. Copilot calls AWS APIs. As those APIs evolve, Copilot commands that work today may start failing without notice.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What doesn't break: your ECS services. Copilot creates ECS clusters, services, and task definitions — but those resources live in your AWS account. They'll keep running after Copilot is gone. The risk is operational: you lose the ability to redeploy, update, or troubleshoot using Copilot commands.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; The infrastructure Copilot provisioned belongs to your AWS account, not to AWS Copilot. Your services won't go down on June 13. What you lose is the workflow for managing them going forward.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  AWS is exiting managed ECS tooling
&lt;/h2&gt;

&lt;p&gt;Copilot isn't an isolated case. AWS Proton, a managed service for deploying multi-service applications, is also on the calendar: see &lt;a href="https://dev.to/blog/aws-proton-deprecated/"&gt;AWS Proton deprecation (October 7, 2026)&lt;/a&gt;. Two managed ECS developer tools, deprecated within months of each other.&lt;/p&gt;

&lt;p&gt;The pattern is consistent with how AWS thinks about their platform: invest in primitives (ECS, Fargate, CloudFormation, EventBridge) and let third-party tooling handle the developer experience layer. AWS isn't going to maintain a fleet management UI for you. That's not their business.&lt;/p&gt;

&lt;p&gt;For teams that relied on Copilot, this means the durable path is either raw infrastructure primitives (Terraform, CDK) or a third-party tool with a vendor committed to ECS Fargate long-term — not another AWS managed tool that might be deprecated in 18 months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration option 1 — Terraform + CI/CD
&lt;/h2&gt;

&lt;p&gt;The DIY path: take the CloudFormation that Copilot generated and convert it to Terraform modules. Replace Copilot deployments with GitHub Actions workflows or AWS CodePipeline.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What you gain: full control, no tool dependency, infrastructure you understand completely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What you lose: the convenience layer Copilot provided, such as &lt;code&gt;copilot deploy&lt;/code&gt;, &lt;code&gt;copilot svc logs&lt;/code&gt;, and &lt;code&gt;copilot env init&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Right for:&lt;/strong&gt; teams with a strong Terraform culture, fewer than 10 environments, and a platform engineer who has time for the migration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rough timeline:&lt;/strong&gt; 2–4 weeks to migrate a typical 3–5 service app. Most of the time is spent understanding what Copilot generated and translating it into Terraform idioms.&lt;/p&gt;

&lt;p&gt;For a deeper guide on the Terraform path, see &lt;a href="https://dev.to/blog/ecs-fargate-terraform/"&gt;Managing ECS Fargate environments with Terraform&lt;/a&gt; (coming soon).&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration option 2 — Fortem
&lt;/h2&gt;

&lt;p&gt;Fortem connects to your existing ECS clusters — whether Copilot provisioned them or not. No rewrite of infrastructure required. You grant IAM access, Fortem reads your clusters, and you're operational in 7 business days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Fortem adds that Copilot never had:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Environment scheduling: stop dev environments at 7pm, restart at 9am, cut idle compute costs 60–70%&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fleet visibility: all environments across accounts and regions in one view&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developer self-service: restart services, view logs, flush state without AWS Console access&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Environment cloning: spin up QA copies from a known-good template&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Fortem doesn't do:&lt;/strong&gt; replace Copilot's deployment workflow. Image builds and the equivalents of &lt;code&gt;copilot deploy&lt;/code&gt; stay in GitHub Actions or CodePipeline. Fortem is the fleet operations layer, not the deployment pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right for:&lt;/strong&gt; teams with 10+ environments that need fleet management, not just a deployment CLI replacement.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;AWS Copilot&lt;/th&gt;
&lt;th&gt;Fortem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ECS deployment&lt;/td&gt;
&lt;td&gt;✓ via copilot deploy&lt;/td&gt;
&lt;td&gt;Via your CI/CD pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environment scheduling&lt;/td&gt;
&lt;td&gt;✗ Not supported&lt;/td&gt;
&lt;td&gt;✓ Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fleet visibility&lt;/td&gt;
&lt;td&gt;✗ Not supported&lt;/td&gt;
&lt;td&gt;✓ All envs in one view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer self-service&lt;/td&gt;
&lt;td&gt;✗ Not supported&lt;/td&gt;
&lt;td&gt;✓ Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environment cloning&lt;/td&gt;
&lt;td&gt;✗ Not supported&lt;/td&gt;
&lt;td&gt;✓ Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost attribution per env&lt;/td&gt;
&lt;td&gt;✗ Not supported&lt;/td&gt;
&lt;td&gt;✓ Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool cost&lt;/td&gt;
&lt;td&gt;Free (open source)&lt;/td&gt;
&lt;td&gt;$799–$2,499/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance status&lt;/td&gt;
&lt;td&gt;Deprecated June 12, 2026&lt;/td&gt;
&lt;td&gt;Actively maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happens to my ECS services when Copilot support ends?
&lt;/h3&gt;

&lt;p&gt;Nothing, immediately. Your ECS clusters, services, and task definitions live in your AWS account — they don't depend on the Copilot binary to keep running. What you lose is the ability to use Copilot commands to redeploy, update, or debug going forward. The services run; the tooling to manage them is what becomes unsupported.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I continue using Copilot after June 12, 2026?
&lt;/h3&gt;

&lt;p&gt;Technically yes — the binary still works until AWS APIs it relies on change. But you'll be running unsupported software with no security patches and no bug fixes. For most teams, the risk grows over time rather than appearing on day one. Plan to migrate within 30–60 days of the EOL date.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I export my Copilot app configuration to Terraform?
&lt;/h3&gt;

&lt;p&gt;Copilot stores its generated CloudFormation in your AWS account. Run &lt;code&gt;aws cloudformation get-template --stack-name [your-copilot-stack]&lt;/code&gt; to retrieve the templates, then use a tool like &lt;code&gt;cf2tf&lt;/code&gt; to convert them to Terraform HCL. Expect manual cleanup — the generated Terraform won't be idiomatic but it will be functional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Fortem replace what Copilot did for deployments?
&lt;/h3&gt;

&lt;p&gt;No. Fortem is a fleet operations layer — it manages running ECS environments (scheduling, visibility, self-service, cost tracking) but doesn't handle image builds or deployments. Replace Copilot's deployment workflow with GitHub Actions or AWS CodePipeline. Fortem handles what comes after: operating the environments those deployments run in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is AWS deprecating both Copilot and Proton within months of each other?
&lt;/h3&gt;

&lt;p&gt;AWS's long-term strategy focuses on infrastructure primitives — ECS, Fargate, CloudFormation, EventBridge — not on managed developer experience tooling. Both Copilot and Proton were attempts to build convenience layers on top of those primitives. The deprecations signal that AWS expects the developer experience layer to come from third-party tools, not from AWS itself.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;All ECS comparisons:&lt;/strong&gt; &lt;a href="https://fortem.dev/versus" rel="noopener noreferrer"&gt;fortem.dev/versus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>githubcopilot</category>
      <category>ecs</category>
      <category>fargate</category>
    </item>
    <item>
      <title>It's Friday at 6pm. Your Developer Can't Restart Staging Without You.</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:59:20 +0000</pubDate>
      <link>https://dev.to/dspv/its-friday-at-6pm-your-developer-cant-restart-staging-without-you-3g1m</link>
      <guid>https://dev.to/dspv/its-friday-at-6pm-your-developer-cant-restart-staging-without-you-3g1m</guid>
      <description>&lt;h1&gt;
  
  
  It's Friday at 6pm. Your Developer Can't Restart Staging Without You
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-staging-self-service" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-staging-self-service&lt;/a&gt;&lt;br&gt;
Platform engineers become the single point of failure for staging ops when developers have no safe, scoped way to act. Here's how to fix it with ECS environment RBAC.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The 6pm Slack message
&lt;/h2&gt;

&lt;p&gt;It's Friday, 6:47pm. You're at dinner. Your phone buzzes.&lt;/p&gt;

&lt;p&gt;Jamie, 6:47 PM:&lt;/p&gt;

&lt;p&gt;hey — staging is down, orders-api won't start. i have a smoke test to finish before the monday deploy. can you take a look?&lt;/p&gt;

&lt;p&gt;You open the AWS Console on your phone. Fargate console on mobile is a special kind of awful — tiny text, nested dropdowns, a task definition ARN you have to scroll sideways to read. You find the service, stop the broken task, wait for it to restart. The new task fails to start too. You check the CloudWatch logs. Missing environment variable. You update the task definition, force a new deployment. Fifteen minutes. By the time the service is healthy, dinner is cold and you've lost the conversation.&lt;/p&gt;

&lt;p&gt;Monday, Jamie finishes the smoke test in 20 minutes and the deploy goes out fine.&lt;/p&gt;

&lt;p&gt;Jamie didn't need you to debug a config issue. Jamie needed to restart a service. The entire incident was a permission problem — and it happens on most teams with 10+ environments, at least twice a week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this keeps happening
&lt;/h2&gt;

&lt;p&gt;AWS IAM has no native concept of an environment for ECS. You can grant someone &lt;code&gt;ecs:UpdateService&lt;/code&gt;, but on a wildcard resource that's access to every ECS service in the account, including production. You can scope it by resource ARN, but when your environments have 15 services each, maintaining those policies manually becomes its own full-time job.&lt;/p&gt;

&lt;p&gt;So most platform engineers made the only rational decision available to them: they kept the keys and became the gatekeeper. Developers file a Slack request, platform engineer handles it, developers wait.&lt;/p&gt;

&lt;p&gt;The platform engineer didn't choose to be a deployment gatekeeper. They became one because the alternative — handing over AWS Console access — was genuinely risky. The right answer is a permission layer that doesn't exist in native AWS.&lt;/p&gt;

&lt;p&gt;The cost is invisible because it's spread across the week in small increments. A Slack ping here, a 15-minute console task there. But count the interruptions: 3–8 per week for a mid-sized team, which adds up to a dozen or more each month. Each one breaks a flow state that takes 20 minutes to rebuild. Each Friday or weekend message is unpaid on-call work for a non-incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5 ops that cause 80% of interruptions
&lt;/h2&gt;

&lt;p&gt;Most platform engineers, when they audit their staging-ops interruptions, find the same five actions accounting for nearly all of them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Restart a crashed or stuck service. A task died, maybe due to a failed health check or OOM. The developer knows it but can't restart it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Redeploy the latest image. A new build was pushed to ECR. The developer wants to pick it up in staging without waiting for the next CI run to trigger a deployment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Read logs. The service is behaving strangely. The developer needs to tail CloudWatch, not navigate five levels of AWS console to get there.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flush a Redis cache. Bad data got written. A key needs to be cleared so the service reads fresh state. One operation, one line of code if they had access.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run a one-off task. A database migration, a data backfill, a cleanup script. Not a deployment, just a single-run task against staging data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these require infrastructure knowledge. None of them should require a platform engineer.&lt;/p&gt;
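
&lt;p&gt;To show how small the first two operations are at the API level, here is a minimal sketch assuming boto3. Forcing a new deployment replaces the running tasks without changing the task definition, and if the task definition points at a mutable tag such as &lt;code&gt;latest&lt;/code&gt;, the new tasks pull the current image. The function name and the cluster and service names in the usage note are illustrative.&lt;/p&gt;

```python
def restart_service(ecs, cluster, service):
    """Replace a service's tasks without changing its task definition.

    `ecs` is a boto3 ECS client (or any object with the same method).
    Returns the service description from the UpdateService response.
    """
    response = ecs.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,  # start fresh tasks from the same definition
    )
    return response["service"]
```

&lt;p&gt;With real credentials this would be called as &lt;code&gt;restart_service(boto3.client("ecs"), "staging", "orders-api")&lt;/code&gt;. The call itself is trivial; the hard part is deciding who is allowed to make it.&lt;/p&gt;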

&lt;h2&gt;
  
  
  Why raw AWS Console access is the wrong answer
&lt;/h2&gt;

&lt;p&gt;The obvious first instinct is: just give them limited AWS access. Create a developer IAM role with read and restart permissions.&lt;/p&gt;

&lt;p&gt;In practice, this goes wrong in predictable ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  IAM doesn't scope ECS permissions by environment. It scopes by account, region, and ARN. Unless every resource ARN is spelled out, a policy that allows restarting staging services also allows restarting production services in the same account.&lt;/li&gt;
&lt;li&gt;  ARN-scoped policies break every time a service is renamed, a new environment is added, or an account is restructured. Someone has to maintain them.&lt;/li&gt;
&lt;li&gt;  AWS Console access gives visibility into things developers shouldn't see: secret ARNs, network config, IAM role names. Not a security catastrophe, but not ideal.&lt;/li&gt;
&lt;li&gt;  There's no audit trail per action. CloudTrail tells you which IAM user ran which API call — but not why, from what context, or what the environment state was before and after.&lt;/li&gt;
&lt;/ul&gt;
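
&lt;p&gt;For reference, this is roughly what the ARN-scoped approach looks like. The sketch builds one policy per environment and assumes each environment has its own ECS cluster and that services use the long ARN format, which includes the cluster name. The account ID, region, and cluster name are placeholders.&lt;/p&gt;

```python
import json


def environment_policy(region, account_id, cluster):
    """Build an IAM policy limited to the services of one ECS cluster."""
    service_arns = f"arn:aws:ecs:{region}:{account_id}:service/{cluster}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RestartServicesInOneCluster",
                "Effect": "Allow",
                "Action": ["ecs:UpdateService", "ecs:DescribeServices"],
                "Resource": service_arns,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and cluster name; substitute your own.
    policy = environment_policy("us-east-1", "123456789012", "staging")
    print(json.dumps(policy, indent=2))
```

&lt;p&gt;Even scoped this way, &lt;code&gt;ecs:UpdateService&lt;/code&gt; covers more than restarts: the same permission lets a user change the desired count or the task definition. And the policy only holds as long as cluster names keep matching environments.&lt;/p&gt;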

&lt;p&gt;The right answer isn't broader AWS access — it's a permission layer that understands environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Fortem solves it
&lt;/h2&gt;

&lt;p&gt;Fortem's self-service layer gives each developer a scoped view of environments they own. You assign ownership in the dashboard — takes about 15 minutes for a typical team. From that point, developers log in via SSO and see only their environments.&lt;/p&gt;

&lt;p&gt;Within their assigned environments, here's exactly what they can and cannot do:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Can do?&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Restart a service&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;td&gt;Scoped to assigned environments only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redeploy to latest image&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;td&gt;Uses the image already in the task definition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;View / tail CloudWatch logs&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;td&gt;Real-time and historical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flush Redis keys (pattern-matched)&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;td&gt;Pattern input required — no wildcard delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run one-off ECS tasks&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;td&gt;From pre-approved task definitions only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pause / resume environment schedule&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;td&gt;Operator permission required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Touch any production resource&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;Prod is a separate environment class; disabled by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access AWS credentials or secrets&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;Fortem never exposes secrets to the UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modify task definitions or IAM roles&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;Infrastructure config is platform-team only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;See environments not assigned to them&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;Environment scope is enforced server-side&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key point is the last four rows. Production is off. AWS credentials are off. Infrastructure config is off. The scope boundary is enforced server-side — it's not just UI hiding.&lt;/p&gt;

&lt;p&gt;No IAM changes required on your end. Fortem uses a cross-account role with the minimum permissions needed to perform ECS operations. Your developers authenticate via SSO — they never interact with AWS directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before and after
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Staging service crashes Friday at 6pm&lt;/td&gt;
&lt;td&gt;Developer Slacks platform team. Waits 2–14 hrs for someone to restart it.&lt;/td&gt;
&lt;td&gt;Developer clicks Restart in Fortem. Service is up in 40 seconds.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New engineer needs to read staging logs&lt;/td&gt;
&lt;td&gt;IAM ticket to security team. 1–3 business days. Maybe AWS Console access.&lt;/td&gt;
&lt;td&gt;Platform engineer assigns log-viewer role in Fortem. Done in 2 minutes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QA needs to flush Redis cache to test a bug&lt;/td&gt;
&lt;td&gt;Blocked. Can't flush Redis without console access. Creates a ticket.&lt;/td&gt;
&lt;td&gt;QA flushes specific key pattern in Fortem without touching AWS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer wants to redeploy their branch to staging&lt;/td&gt;
&lt;td&gt;Asks platform engineer. Gets queued. Usually done same day, sometimes tomorrow.&lt;/td&gt;
&lt;td&gt;Developer triggers redeploy from Fortem. 3 clicks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOC 2 auditor asks who restarted staging last Tuesday&lt;/td&gt;
&lt;td&gt;CloudTrail search, cross-reference IAM user, 2 hours of work.&lt;/td&gt;
&lt;td&gt;Filter by environment and date in Fortem audit log. 30 seconds.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most common feedback from platform engineers after turning on self-service: the first week felt like they gave something up. The second week, they realized the thing they gave up was being woken up on Friday night.&lt;/p&gt;

&lt;h2&gt;
  
  
  What gets logged
&lt;/h2&gt;

&lt;p&gt;Every action through Fortem creates an audit entry: who, what environment, what action, what time, what the service state was before and after.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit log — staging / orders-api&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fri May 17 18:52  jamie@acme.co         Service restart         staging   HEALTHY after 38s
Fri May 17 16:30  sam@acme.co           Redeploy (latest)       qa-eu     Deployed sha:a3f2b1
Thu May 16 09:14  kai@acme.co           Redis flush             staging   Pattern: session:* (4 keys deleted)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When your SOC 2 auditor asks who restarted staging last Tuesday, this is the answer — filtered and exported in under 30 seconds. You don't need to cross-reference CloudTrail against IAM users against a timezone conversion.&lt;/p&gt;

&lt;p&gt;Audit retention is configurable: 90 days on the Fortem plan, 365 days on Enterprise.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Map your fleet in 5 min:&lt;/strong&gt; &lt;a href="https://fortem.dev/ai-onboarding" rel="noopener noreferrer"&gt;fortem.dev/ai-onboarding&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>fargate</category>
      <category>selfservice</category>
    </item>
    <item>
      <title>ECS Multi-Environment Strategy: What Breaks at 10 That Worked Fine at 3</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:59:17 +0000</pubDate>
      <link>https://dev.to/dspv/ecs-multi-environment-strategy-what-breaks-at-10-that-worked-fine-at-3-2e4f</link>
      <guid>https://dev.to/dspv/ecs-multi-environment-strategy-what-breaks-at-10-that-worked-fine-at-3-2e4f</guid>
      <description>&lt;h1&gt;
  
  
  What Breaks When You Scale Past 10 ECS Environments?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-multi-environment-strategy" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-multi-environment-strategy&lt;/a&gt;&lt;br&gt;
Naming conventions, cluster structure, and the five AWS limits that surface when environments scale past 10. Written by platform engineers running 100+ ECS environments.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Three ECS environments are manageable with AWS-native tooling and reasonable discipline. Ten environments expose every naming shortcut, every IAM approximation, and every missing inventory tool. This guide covers what actually changes — and what to get right before you hit the wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  The overhead nobody puts in the spreadsheet
&lt;/h2&gt;

&lt;p&gt;When engineers estimate ECS environment costs, they calculate compute: vCPU hours, memory hours, maybe RDS. What they miss is the fixed overhead that exists before a single container runs.&lt;/p&gt;

&lt;p&gt;Every environment needs its own ALB, NAT Gateway (ideally in each AZ for HA), and CloudWatch log groups. These costs are flat — they don't scale with usage, they don't go away when you stop tasks at night, and they don't appear on the compute line in Cost Explorer.&lt;/p&gt;

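&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;td&gt;$22/mo&lt;/td&gt;
&lt;td&gt;$0.0225/hr base + $0.008/LCU-hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway (2 AZs)&lt;/td&gt;
&lt;td&gt;~$66/mo&lt;/td&gt;
&lt;td&gt;$0.045/hr × 2 AZs + $0.045/GB data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch log retention&lt;/td&gt;
&lt;td&gt;$3–15/mo&lt;/td&gt;
&lt;td&gt;Depends on log volume + retention days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM parameters, ECR storage&lt;/td&gt;
&lt;td&gt;$1–5/mo&lt;/td&gt;
&lt;td&gt;Usually negligible, adds up at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total fixed overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$85–100/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Before first task runs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;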

&lt;p&gt;At 3 environments, that's ~$300/month in overhead: noticeable but manageable. At 10 environments it's &lt;strong&gt;$850–1,000/month&lt;/strong&gt; before a single task runs. At 20 environments it's a $1,700–2,000/month line item that doesn't appear anywhere obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you can actually do about it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Share the ALB across non-prod environments using host-based routing rules (one ALB, multiple environments via different hostnames). This eliminates per-environment ALB cost for dev/staging. NAT Gateway is harder to share cleanly — teams that care about NAT cost switch non-prod environments to public subnet placement with no NAT. Slightly less secure, meaningfully cheaper. Prod always gets its own ALB and NAT.&lt;/p&gt;
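
&lt;p&gt;A minimal sketch of the shared-ALB approach, assuming the non-prod ALB and its listener are created elsewhere. The variable names, the hostname &lt;code&gt;api.dev1.nonprod.example.com&lt;/code&gt;, and the priority value are placeholders for illustration, not part of any real setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# Hypothetical inputs: the shared non-prod listener and VPC live in another stack.
variable "shared_listener_arn" {
  type = string
}

variable "vpc_id" {
  type = string
}

# One target group per service per environment. Fargate tasks register by IP.
resource "aws_lb_target_group" "dev1_api" {
  name        = "api-dev1-tg"
  port        = 3000
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id
}

# Host-based rule: requests for this hostname are forwarded to dev1's API.
resource "aws_lb_listener_rule" "dev1_api" {
  listener_arn = var.shared_listener_arn
  priority     = 110 # must be unique on the listener

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.dev1_api.arn
  }

  condition {
    host_header {
      values = ["api.dev1.nonprod.example.com"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every environment adds its rules to the same listener, so it helps to agree on a priority numbering scheme up front rather than assigning priorities ad hoc.&lt;/p&gt;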

&lt;h2&gt;
  
  
  Naming: the one convention that rules everything
&lt;/h2&gt;

&lt;p&gt;At 3 environments you can get away with ad-hoc names. At 10 you can't — because every AWS resource name is a billing dimension, an IAM scope, and a CloudWatch filter. Inconsistent names mean you can't attribute cost, can't write scoped IAM policies, and can't build dashboards without a lookup table.&lt;/p&gt;

&lt;p&gt;The convention that works at fleet scale encodes three things in every resource name: region, account (or account group), and environment name. In this order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;{&lt;span class="n"&gt;region_short&lt;/span&gt;}-{&lt;span class="n"&gt;account&lt;/span&gt;}-{&lt;span class="n"&gt;envname&lt;/span&gt;}

&lt;span class="c"&gt;# Examples
&lt;/span&gt;&lt;span class="n"&gt;use1&lt;/span&gt;-&lt;span class="n"&gt;prod&lt;/span&gt;-&lt;span class="n"&gt;main&lt;/span&gt;         &lt;span class="c"&gt;# us-east-1, prod account, primary production env
&lt;/span&gt;&lt;span class="n"&gt;use1&lt;/span&gt;-&lt;span class="n"&gt;prod&lt;/span&gt;-&lt;span class="n"&gt;stg1&lt;/span&gt;         &lt;span class="c"&gt;# us-east-1, prod account, staging env
&lt;/span&gt;&lt;span class="n"&gt;usw2&lt;/span&gt;-&lt;span class="n"&gt;dev&lt;/span&gt;-&lt;span class="n"&gt;dev1&lt;/span&gt;          &lt;span class="c"&gt;# us-west-2, dev account, first dev env
&lt;/span&gt;&lt;span class="n"&gt;usw2&lt;/span&gt;-&lt;span class="n"&gt;dev&lt;/span&gt;-&lt;span class="n"&gt;qa1&lt;/span&gt;           &lt;span class="c"&gt;# us-west-2, dev account, QA env
&lt;/span&gt;&lt;span class="n"&gt;usw2&lt;/span&gt;-&lt;span class="n"&gt;dev&lt;/span&gt;-&lt;span class="n"&gt;demo&lt;/span&gt;          &lt;span class="c"&gt;# us-west-2, dev account, demo env
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prefix becomes the &lt;strong&gt;root of every resource name&lt;/strong&gt; in that environment. One Terraform local generates everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# e.g. "use1-prod-main" or "usw2-dev-qa1"&lt;/span&gt;
  &lt;span class="nx"&gt;env_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region_short&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;envname&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ECS cluster&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;
  &lt;span class="c1"&gt;# → "use1-prod-main"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ECS service (env already in cluster name — service is just the component)&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_service"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt;
  &lt;span class="nx"&gt;cluster&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Task definition family (global per account — must carry full prefix)&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_task_definition"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;family&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-api-td"&lt;/span&gt;
  &lt;span class="c1"&gt;# → "use1-prod-main-api-td"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# SSM paths (hierarchy enables per-service IAM scoping)&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ssm_parameter"&lt;/span&gt; &lt;span class="s2"&gt;"db_host"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/api/DB_HOST"&lt;/span&gt;
  &lt;span class="c1"&gt;# → "/use1-prod-main/api/DB_HOST"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# IAM roles (global per account — carry full prefix)&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"task_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-api-task-role"&lt;/span&gt;
  &lt;span class="c1"&gt;# → "use1-prod-main-api-task-role"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# CloudWatch log group&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudwatch_log_group"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/ecs/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-api"&lt;/span&gt;
  &lt;span class="nx"&gt;retention_in_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log_retention_days&lt;/span&gt;
  &lt;span class="c1"&gt;# → "/ecs/use1-prod-main-api"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why SSM paths matter specifically: the hierarchy &lt;code&gt;/use1-prod-main/api/*&lt;/code&gt; lets you write a single IAM policy statement that gives the API task access to exactly its own secrets — nothing else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ssm:GetParameter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ssm:GetParameters"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ssm:us-east-1:123456789012:parameter/use1-prod-main/api/*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Flat SSM names (&lt;code&gt;USE1-PROD-MAIN-API-DB_HOST&lt;/code&gt;) lose this entirely. You end up with a wildcard &lt;code&gt;Resource: "*"&lt;/code&gt; or a list of 40 individual parameter ARNs. One team's migration from flat to hierarchical SSM naming took two weeks and three deployment freezes.&lt;/p&gt;

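&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ECS Cluster&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{env_prefix}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;use1-prod-main&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECS Service&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{service}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api&lt;/code&gt; (inside cluster)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Def Family&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{env_prefix}-{service}-td&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;use1-prod-main-api-td&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM Path&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/{env_prefix}/{service}/{PARAM}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/use1-prod-main/api/DB_HOST&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IAM Task Role&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{env_prefix}-{service}-task-role&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;use1-prod-main-api-task-role&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log Group&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/ecs/{env_prefix}-{service}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/ecs/use1-prod-main-api&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Target Group&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{service}-{envname}-tg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api-main-tg&lt;/code&gt; (⚠ 32-char limit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service Connect NS&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{envname}.local&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;main.local&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;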

&lt;p&gt;&lt;strong&gt;The 32-character ALB target group limit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the hardest constraint in the naming stack. A target group named &lt;code&gt;use1-prod-main-payments-api-tg&lt;/code&gt; is 30 characters — just inside the limit. Add a longer service name and you blow it. The fix: drop the region and account from target group names (they're already implied by the ALB, which lives in one region and one account), and use only envname + service + tg. Plan your abbreviation table before your first service, not after your fifteenth.&lt;/p&gt;
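
&lt;p&gt;One way to catch an over-long target group name with a readable error is a &lt;code&gt;precondition&lt;/code&gt; on the resource (Terraform 1.2 or later). This is a sketch, not a required pattern: &lt;code&gt;service_abbrev&lt;/code&gt; is a hypothetical variable standing in for an entry from your abbreviation table, and the port is a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "envname" {
  type = string # e.g. "main"
}

# Hypothetical: the short form of the service name, e.g. "pay-api".
variable "service_abbrev" {
  type = string
}

variable "vpc_id" {
  type = string
}

locals {
  # Region and account are dropped; the ALB already implies both.
  tg_name = "${var.service_abbrev}-${var.envname}-tg"
}

resource "aws_lb_target_group" "svc" {
  name        = local.tg_name
  port        = 3000
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  lifecycle {
    precondition {
      # 2 to 32 chars, lowercase alphanumerics and hyphens, no leading or trailing hyphen.
      condition     = can(regex("^[a-z0-9][a-z0-9-]{0,30}[a-z0-9]$", local.tg_name))
      error_message = "Target group name exceeds 32 characters or has invalid characters. Shorten service_abbrev."
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The check fails at plan time and points at the abbreviation rather than at the load balancer API.&lt;/p&gt;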

&lt;p&gt;Enforce naming in Terraform with a variable validation block — reject envnames that don't match your pattern before any resource gets created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"envname"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;can&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"^[a-z][a-z0-9]{1,7}&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;envname&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nx"&gt;error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"envname must be 2–8 lowercase alphanumeric chars (e.g. main, dev1, qa2)"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cluster structure at 10+ environments
&lt;/h2&gt;

&lt;p&gt;With the &lt;code&gt;{region}-{account}-{envname}&lt;/code&gt; scheme, the cluster structure decision is already mostly made: each envname gets its own ECS cluster. The cluster name &lt;em&gt;is&lt;/em&gt; the environment identifier. Everything else in that environment — services, task definitions, log groups, IAM roles — inherits from it.&lt;/p&gt;

&lt;p&gt;The practical question is how to organize these clusters across AWS accounts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One AWS account per environment group (recommended)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prod environments in one account, all non-prod in another. This is the most common pattern at 30–200 person companies. It keeps prod IAM boundaries hard, separates Fargate vCPU quota pools, and makes Cost Explorer attribution clean.&lt;/p&gt;

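&lt;p&gt;Prod account: &lt;code&gt;use1-prod-main&lt;/code&gt;, &lt;code&gt;use1-prod-stg1&lt;/code&gt;, &lt;code&gt;usw2-prod-main&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Dev account: &lt;code&gt;usw2-dev-dev1&lt;/code&gt;, &lt;code&gt;usw2-dev-qa1&lt;/code&gt;, &lt;code&gt;usw2-dev-demo&lt;/code&gt;, &lt;code&gt;usw2-dev-data1&lt;/code&gt;&lt;/p&gt;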

&lt;p&gt;&lt;strong&gt;Single account, all environments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All clusters in one AWS account. Simpler to start, but Fargate quota is shared — a dev load test can exhaust the regional quota and prevent prod from scaling. Works fine at 3 environments; becomes a risk at 10+.&lt;/p&gt;

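&lt;p&gt;Single account: &lt;code&gt;use1-prod-main&lt;/code&gt;, &lt;code&gt;use1-dev-dev1&lt;/code&gt;, &lt;code&gt;use1-dev-qa1&lt;/code&gt;, &lt;code&gt;use1-dev-stg1&lt;/code&gt;&lt;/p&gt;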

&lt;p&gt;ECS clusters are free. The cost of having more clusters is management overhead, not AWS billing. At 10+ environments that overhead is real — which is the case for using tooling that treats the environment as the unit of management, not individual services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five problems that appear at 10 environments
&lt;/h2&gt;

&lt;p&gt;These don't show up at 3 environments. They all show up, roughly simultaneously, somewhere between environment 8 and environment 12.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;01. Fargate quota exhaustion in prod&lt;/strong&gt; (quota)&lt;/p&gt;

&lt;p&gt;Fargate vCPU quota is per-region, per-account. Dev and prod share the same pool if they share an account. A developer running load tests against a dev environment can exhaust the regional Fargate quota and prevent production from scaling up during a traffic spike. AWS has no native mechanism to reserve quota for production — the only fix is account separation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;02. ENI exhaustion before compute limits&lt;/strong&gt; (networking)&lt;/p&gt;

&lt;p&gt;Every Fargate task in awsvpc mode (the only Fargate mode) gets its own ENI. A fleet of 10 environments × 8 services × 2 tasks each = 160 ENIs. Default regional ENI limits can become a hard ceiling before you hit any compute limit. File a support ticket to raise the limit before you need it — AWS processes these routinely but not instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;03. IAM role proliferation&lt;/strong&gt; (IAM)&lt;/p&gt;

&lt;p&gt;The correct pattern — one task execution role + one task role per service per environment — generates 2 × N services × M environments IAM roles. At 10 services and 4 environments that's 80 IAM roles. The temptation is to share roles across environments to reduce the number. Don't. Sharing means a misconfigured dev task can access prod secrets. Generate roles programmatically from your Terraform module; the number stops being a problem when you stop counting them manually.&lt;/p&gt;
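
&lt;p&gt;Generating the roles programmatically can be as small as a &lt;code&gt;for_each&lt;/code&gt; over the service list inside the environment module. The sketch below assumes the module receives &lt;code&gt;env_prefix&lt;/code&gt; and a set of service names; the &lt;code&gt;-exec-role&lt;/code&gt; suffix is an illustrative choice, not a convention from the table above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "env_prefix" {
  type = string # e.g. "use1-prod-main"
}

variable "services" {
  type    = set(string)
  default = ["api", "worker"]
}

# Trust policy that lets ECS tasks assume the role.
data "aws_iam_policy_document" "ecs_tasks_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# One task role per service in this environment.
resource "aws_iam_role" "task" {
  for_each           = var.services
  name               = "${var.env_prefix}-${each.key}-task-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

# One execution role per service (suffix is an assumption for this sketch).
resource "aws_iam_role" "execution" {
  for_each           = var.services
  name               = "${var.env_prefix}-${each.key}-exec-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

# AWS managed policy covering image pulls and log delivery.
resource "aws_iam_role_policy_attachment" "execution" {
  for_each   = aws_iam_role.execution
  role       = each.value.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Adding a service to the set creates both roles in every environment that instantiates the module, so the role count follows the service list instead of being maintained by hand.&lt;/p&gt;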

&lt;p&gt;&lt;strong&gt;04. Cloud Map namespace limit&lt;/strong&gt; (service discovery)&lt;/p&gt;

&lt;p&gt;AWS Cloud Map limits a single namespace to 100 ECS services. If you use ECS Service Connect and point multiple clusters at the same namespace (e.g., prod.local), you'll hit this ceiling sooner than expected. At 10 environments × 10 services = 100 services in one namespace — exactly at the limit. This is a hard limit and cannot be increased. Fix: per-cluster namespaces. Each envname gets its own: main.local, stg1.local, dev1.local.&lt;/p&gt;
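
&lt;p&gt;A per-cluster namespace can be declared next to the cluster itself. This sketch assumes Service Connect with an HTTP namespace; the resource labels are arbitrary, and the two variables mirror the naming scheme used earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "env_prefix" {
  type = string # e.g. "use1-prod-main"
}

variable "envname" {
  type = string # e.g. "main"
}

# One namespace per environment: main.local, stg1.local, dev1.local, ...
resource "aws_service_discovery_http_namespace" "env" {
  name        = "${var.envname}.local"
  description = "Service Connect namespace for ${var.env_prefix}"
}

# The cluster defaults its services to that namespace.
resource "aws_ecs_cluster" "env" {
  name = var.env_prefix

  service_connect_defaults {
    namespace = aws_service_discovery_http_namespace.env.arn
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;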

&lt;p&gt;&lt;strong&gt;05. ALB listener rule ceiling&lt;/strong&gt; (load balancing)&lt;/p&gt;

&lt;p&gt;An ALB supports 100 listener rules per listener by default. If you share one ALB across non-prod environments using host-based routing (recommended for cost), you'll have roughly N environments × M services rules. At 8 environments × 12 services = 96 rules — right at the limit. The adjustable workaround (multiple listeners, multiple ALBs) adds cost and complexity. The simpler fix is dedicated listener rules per environment namespace rather than per service.&lt;/p&gt;

&lt;h2&gt;
  
  
  The environment inventory problem
&lt;/h2&gt;

&lt;p&gt;At 3 environments everyone knows what's running. At 10, someone asks "is anyone still using &lt;code&gt;usw2-dev-data1&lt;/code&gt;?" and nobody knows for certain.&lt;/p&gt;

&lt;p&gt;There is no AWS-native tool that shows you all environments, their owners, their running task counts, their last deployment time, and their monthly cost in one view. What teams actually do — and why each falls short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;AWS Cost Explorer with tags&lt;/strong&gt;&lt;br&gt;
✓ Cost attribution if tagging is consistent&lt;br&gt;
✗ No real-time status, no task counts, 24-hour lag on cost data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;ECS console, cluster by cluster&lt;/strong&gt;&lt;br&gt;
✓ Real-time task counts&lt;br&gt;
✗ No cost, no ownership, no cross-account view&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Slack channel where people announce environments&lt;/strong&gt;&lt;br&gt;
✓ Ownership context&lt;br&gt;
✗ Immediately out of date, no automation, ignored&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Spreadsheet / wiki page&lt;/strong&gt;&lt;br&gt;
✓ Good intentions&lt;br&gt;
✗ Stale within a week, nobody updates it after incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS's ECS Split Cost Allocation Data (launched 2023) partially closes the cost visibility gap — it attributes Fargate spend per task using &lt;code&gt;aws:ecs:clusterName&lt;/code&gt; and &lt;code&gt;aws:ecs:serviceName&lt;/code&gt; system tags as billing dimensions. This works well — but only if your cluster and service names are consistent. Which is why naming comes first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real cost of invisible environments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Orphaned environments — ones nobody is actively using but nobody has turned off — are the most expensive line in any ECS bill. At $85–100/month fixed overhead plus compute, a forgotten environment running 24/7 costs $200–400/month. Teams with 10+ environments typically have 1–3 orphaned environments at any given time. The inventory problem isn't just inconvenient — it's expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scheduling at fleet scale
&lt;/h2&gt;

&lt;p&gt;Non-prod environments run 168 hours a week. Your team works ~55. Scheduling environments offline outside business hours cuts compute cost by &lt;strong&gt;60–70%&lt;/strong&gt;; for most teams it's the single largest ECS cost lever available.&lt;/p&gt;

&lt;p&gt;The problem: AWS-native scheduling operates at the service level. To schedule one environment with 8 services, you need 16 Auto Scaling actions (stop + start per service). At 10 environments that's 160 actions to create, maintain, and update when schedules change.&lt;/p&gt;

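&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environments&lt;/th&gt;
&lt;th&gt;Services each&lt;/th&gt;
&lt;th&gt;Auto Scaling actions&lt;/th&gt;
&lt;th&gt;Schedule change cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;8 updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;8–16 updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;10–20 updates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;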
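
&lt;p&gt;For reference, this is roughly what the native approach looks like for one environment, using Application Auto Scaling scheduled actions. It is a sketch under assumptions: the services already exist in the cluster, the schedule times and capacities are placeholders, and the variable names are invented for this example. The &lt;code&gt;timezone&lt;/code&gt; argument lets the cron expression follow a named time zone instead of UTC:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "cluster_name" {
  type = string # e.g. "usw2-dev-qa1"
}

variable "services" {
  type    = set(string)
  default = ["api", "worker"]
}

variable "timezone" {
  type    = string
  default = "America/Los_Angeles"
}

# Register each service's desired count as a scalable target.
resource "aws_appautoscaling_target" "svc" {
  for_each           = var.services
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${each.key}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 0
  max_capacity       = 2
}

# Stop: cap capacity at 0 on weekday evenings.
resource "aws_appautoscaling_scheduled_action" "stop" {
  for_each           = aws_appautoscaling_target.svc
  name               = "${each.key}-stop"
  service_namespace  = each.value.service_namespace
  resource_id        = each.value.resource_id
  scalable_dimension = each.value.scalable_dimension
  schedule           = "cron(0 20 ? * MON-FRI *)"
  timezone           = var.timezone

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

# Start: raise the floor back to 1 on weekday mornings.
resource "aws_appautoscaling_scheduled_action" "start" {
  for_each           = aws_appautoscaling_target.svc
  name               = "${each.key}-start"
  service_namespace  = each.value.service_namespace
  resource_id        = each.value.resource_id
  scalable_dimension = each.value.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)"
  timezone           = var.timezone

  scalable_target_action {
    min_capacity = 1
    max_capacity = 2
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;for_each&lt;/code&gt; keeps the HCL short, but AWS still holds two scheduled actions per service, which is where the counts in the table come from.&lt;/p&gt;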

&lt;p&gt;There are three additional problems that emerge specifically at fleet scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ Timezone complexity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EU teams want environments down at 18:00 CET. US East wants 20:00 EST. US West wants 20:00 PST. Each requires separate cron expressions that account for DST. At 10+ environments with multiple team timezones, maintaining these expressions is a part-time job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ No developer overrides&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A developer working late on a deadline wants to keep their environment up past the scheduled stop time. With AWS-native scheduling, that requires either platform engineer access or IAM permissions broad enough to be a security concern. The friction means developers stop requesting overrides — and start asking to remove scheduling entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ Silent failed starts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The scheduled start fires. Lambda runs. Desired count updates. But a service fails to start — image pull error, IAM issue, resource limit. The cron job succeeded; the environment didn't come up. AWS doesn't surface this. You need separate health checking or developers start their morning debugging an environment that's half-running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What teams actually end up doing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teams start with EventBridge + Lambda at 3–5 environments. By 10 environments they're maintaining a scheduling codebase. By 15–20 environments, the maintenance burden outweighs the savings — and environments quietly go back to running 24/7. The savings disappear not because scheduling doesn't work, but because the tooling to maintain it at scale doesn't exist in AWS natively.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;See what 10+ envs really cost:&lt;/strong&gt; &lt;a href="https://fortem.dev/ecs-cost-calculator" rel="noopener noreferrer"&gt;fortem.dev/ecs-cost-calculator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>fargate</category>
      <category>platform</category>
    </item>
    <item>
      <title>Managing ECS Fargate with Terraform: What Works and What Doesn't</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:58:39 +0000</pubDate>
      <link>https://dev.to/dspv/managing-ecs-fargate-with-terraform-what-works-and-what-doesnt-4280</link>
      <guid>https://dev.to/dspv/managing-ecs-fargate-with-terraform-what-works-and-what-doesnt-4280</guid>
      <description>&lt;h1&gt;
  
  
  Managing ECS Fargate with Terraform: What Works and What Doesn't
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-fargate-terraform" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-fargate-terraform&lt;/a&gt;&lt;br&gt;
Terraform is the right tool for provisioning ECS Fargate infrastructure. But at 10+ environments, state sprawl and the operations gap catch every team. Here's what to build, what to buy, and the patterns that scale.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Terraform is the correct tool for provisioning ECS Fargate infrastructure — this article won't try to replace it.&lt;/li&gt;
&lt;li&gt;  Module-per-environment works for ≤10 environments; past that, Terragrunt or a layered directory structure becomes necessary.&lt;/li&gt;
&lt;li&gt;  A consistent tagging strategy (Environment, ManagedBy, Product, ManagedWith, Component) solves cost attribution and makes automation possible at any scale.&lt;/li&gt;
&lt;li&gt;  At 50+ environments, you'll write 1,500+ lines of custom code for scheduling, cloning, and self-service — or you can accept that Terraform needs an operations partner.&lt;/li&gt;
&lt;li&gt;  Fortem reads your Terraform-provisioned resources and adds the ops layer: scheduling, cloning, fleet visibility, and developer self-service — without touching your HCL.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Terraform does well for ECS Fargate
&lt;/h2&gt;

&lt;p&gt;Terraform is the right tool for provisioning ECS Fargate infrastructure. It's declarative — you describe the desired state, and Terraform makes it happen. You get task definitions, ECS services, IAM roles, security groups, load balancers, and VPC configuration all in one place, versioned in git.&lt;/p&gt;

&lt;p&gt;What matters more than the HCL syntax is the workflow it enables. Infrastructure changes go through the same PR process as application code. Your CI pipeline runs terraform plan on every pull request. A senior engineer reviews the diff before merge. If something goes wrong, you roll back by applying the previous commit. This is the gold standard for infrastructure management, and nothing in this article suggests replacing it.&lt;/p&gt;

&lt;p&gt;Here's a realistic module definition for an ECS environment — the basic building block your team is probably using or something close to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dev_ecs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/ecs-environment"&lt;/span&gt;

  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.1.0.0/16"&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnets&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.1.1.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.1.2.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.1.10.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.1.11.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;services&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;cpu&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;
      &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
      &lt;span class="nx"&gt;image&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"123456789012.dkr.ecr.us-east-1.amazonaws.com/api:latest"&lt;/span&gt;
      &lt;span class="nx"&gt;port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
      &lt;span class="nx"&gt;env_vars&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;LOG_LEVEL&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"debug"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;worker&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;cpu&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
      &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
      &lt;span class="nx"&gt;image&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"123456789012.dkr.ecr.us-east-1.amazonaws.com/worker:latest"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;rds_instance_class&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"db.t3.micro"&lt;/span&gt;
  &lt;span class="nx"&gt;redis_node_type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cache.t3.micro"&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
    &lt;span class="nx"&gt;Team&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"backend"&lt;/span&gt;
    &lt;span class="nx"&gt;ManagedBy&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is clean, reviewable, and reproducible. One module call = one fully provisioned environment with networking, compute, and data stores. For a single environment or a handful, this is the right pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terraform patterns that actually scale
&lt;/h2&gt;

&lt;p&gt;Teams adopt one of three patterns as they grow. There's also a fourth — Terraform workspaces per environment — but the community has largely moved past it. Workspaces aren't true state isolation, the naming is fragile (apply to the wrong workspace and you provision dev where staging should be), and HashiCorp themselves recommend against using them for environment separation. We'll skip it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Module per environment
&lt;/h3&gt;

&lt;p&gt;A separate directory for each environment, each calling the same shared module with different variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform/
├── modules/
│   └── ecs-environment/     # shared module
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── dev/
│   └── main.tf             # module "dev_ecs" { ... }
├── staging/
│   └── main.tf             # module "staging_ecs" { ... }
├── qa/
│   └── main.tf
├── demo/
│   └── main.tf
└── prod/
    └── main.tf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; dead simple. Anyone on the team can open a directory and understand what's deployed. No hidden state, no Terraform workspace tricks. CI can run plan/apply independently per environment, so you can deploy dev without touching staging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; every new environment means copying a 15-line directory. At 30 environments, you have 30 almost-identical main.tf files. If you add a required variable to the shared module, you update 30 files. Teams outgrow this around 10–15 environments.&lt;/p&gt;
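
&lt;p&gt;As a rough sketch, a single environment directory under this pattern might look like the following. The bucket name, state key, and CIDR are illustrative assumptions, not values from a real setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# staging/main.tf (illustrative sketch; names and values are assumptions)
terraform {
  backend "s3" {
    bucket = "acme-terraform-state"          # hypothetical bucket
    key    = "ecs/staging/terraform.tfstate" # one state file per directory
    region = "us-east-1"
  }
}

module "staging_ecs" {
  source = "../modules/ecs-environment"

  environment = "staging"
  vpc_cidr    = "10.2.0.0/16"
  # remaining module inputs omitted
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because each directory declares its own backend key, a plan in staging/ never reads or locks the dev state.&lt;/p&gt;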

&lt;h3&gt;
  
  
  Pattern 2: Terragrunt + shared modules
&lt;/h3&gt;

&lt;p&gt;Terragrunt wraps Terraform, keeping configurations DRY while maintaining separate state per environment. Each environment directory contains only a terragrunt.hcl file with environment-specific values — the module source points to a shared Git ref.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# terragrunt.hcl in environments/dev/&lt;/span&gt;
&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"git::git@github.com:acme/terraform-modules.git//ecs-environment?ref=v2.3.0"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.1.0.0/16"&lt;/span&gt;
  &lt;span class="nx"&gt;services&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cpu&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;remote_state&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"acme-terraform-state"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs/dev/terraform.tfstate"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; explicit dependencies, multi-account-friendly, strong state isolation. Each environment has its own S3 state key — corruption stays contained. Pin modules to versioned Git tags for reproducible deploys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; another tool to learn and maintain. Your team now needs to understand both Terraform and Terragrunt. Debugging failures means tracing through two layers of indirection. Not worth it below 15 environments — the overhead outweighs the benefit.&lt;/p&gt;
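
&lt;p&gt;One common way to avoid repeating the remote_state block in every environment is to define it once in a parent file and include it from each child. The sketch below assumes a root.hcl at the top of the environments/ tree; the file name, bucket, and region are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# --- environments/root.hcl (hypothetical shared parent file) ---
remote_state {
  backend = "s3"
  config = {
    bucket  = "acme-terraform-state"
    # path_relative_to_include() resolves to the child's directory, e.g. "dev"
    key     = "ecs/${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

# --- environments/dev/terragrunt.hcl then only needs this include ---
include "root" {
  path = find_in_parent_folders("root.hcl")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this layout the state key is derived from the directory name, so a new environment cannot accidentally reuse another environment's key. Note that the Terraform module still has to declare an empty backend "s3" {} block unless remote_state is given a generate block.&lt;/p&gt;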

&lt;h3&gt;
  
  
  Pattern 3: Layered (accounts → regions → environments)
&lt;/h3&gt;

&lt;p&gt;The repo mirrors your cloud topology. Shared infrastructure lives at higher layers and cascades down. Each environment is a directory with subdirectories per resource type — datastores, ECS services, secrets — so a single environment change is a single terraform apply in one directory, not a full fleet-wide plan.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform/
├── deployment/
│   ├── accounts/
│   │   ├── dev/
│   │   │   ├── global/              # account-wide: IAM, S3, route53
│   │   │   └── regions/
│   │   │       ├── us-east-1/
│   │   │       │   ├── network/     # VPC, subnets, security groups
│   │   │       │   ├── shared/      # ECR, CloudTrail, ECS events
│   │   │       │   └── wenvs/       # environments
│   │   │       │       ├── api-dev/
│   │   │       │       │   ├── datastores/   # RDS, ElastiCache
│   │   │       │       │   ├── ecs/          # task defs, services
│   │   │       │       │   ├── secrets/      # Secrets Manager
│   │   │       │       │   └── services/     # SQS, SNS, Lambda
│   │   │       │       └── api-qa/
│   │   │       │           └── ...same layers
│   │   │       └── eu-west-2/
│   │   │           └── ...same structure
│   │   └── prod/
│   │       └── ...same structure
│   └── variables/
│       ├── accounts/{dev,prod}/     # per-account tfvars
│       └── global/                  # org-wide tfvars
└── lib/                             # shared Terraform modules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; each layer owns its resources and nothing else. terraform apply runs against a single directory, so a security group change doesn't trigger a plan across 60 environments. Adding a new environment copies a directory and overrides variables. The structure is self-documenting: anyone on the team can navigate the repo and understand the fleet topology without opening a diagram.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; the repo itself is the configuration mechanism, so there's no single file that describes what exists. New team members need to learn the directory tree. Some duplication between nearly-identical environments unless you lean on shared variables and modules. Best for 20+ environments, where the operational benefit of isolated state outweighs the duplication cost.&lt;/p&gt;
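
&lt;p&gt;Because each layer keeps its own state, lower layers need some way to read outputs from the layers above them. One option is the terraform_remote_state data source. The sketch assumes the network layer exports vpc_id and private_subnet_ids outputs and stores its state under the key shown; all of these names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# wenvs/api-dev/ecs/data.tf (illustrative; bucket, key, and output names are assumptions)
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "acme-terraform-state"
    key    = "dev/us-east-1/network/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Values published by the network layer, read without planning that layer
  vpc_id             = data.terraform_remote_state.network.outputs.vpc_id
  private_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The environment layer only reads the network state, so applying it cannot modify the VPC.&lt;/p&gt;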

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Scale limit&lt;/th&gt;
&lt;th&gt;State isolation&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Module per env&lt;/td&gt;
&lt;td&gt;~10 envs&lt;/td&gt;
&lt;td&gt;Strong (per-directory)&lt;/td&gt;
&lt;td&gt;Getting started; small fleet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terragrunt&lt;/td&gt;
&lt;td&gt;15–50 envs&lt;/td&gt;
&lt;td&gt;Strong (per-env key)&lt;/td&gt;
&lt;td&gt;Multi-account; explicit deps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layered&lt;/td&gt;
&lt;td&gt;50+ envs&lt;/td&gt;
&lt;td&gt;Strong (per-layer, per-env)&lt;/td&gt;
&lt;td&gt;Fleet scale; multi-region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workspaces&lt;/td&gt;
&lt;td&gt;~5 envs&lt;/td&gt;
&lt;td&gt;Weak (shared backend)&lt;/td&gt;
&lt;td&gt;Not recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; There's no universally correct pattern. A team of two managing 8 environments doesn't need Terragrunt. A team of eight managing 60 environments across three AWS accounts probably does. Pick the simplest structure your team can maintain at your current scale — you can refactor later when you need to.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The tagging strategy that makes everything easier
&lt;/h2&gt;

&lt;p&gt;Before scaling past 10 environments, the single highest-leverage thing you can do is standardize your tags. Tags feed AWS Cost Explorer, automation scripts, and every operations tool in the chain. If your tags are inconsistent, every downstream system that uses them produces wrong answers.&lt;/p&gt;

&lt;p&gt;The simplest way to enforce tags is through the Terraform provider itself — apply them once at the provider level and every resource inherits them automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;

  &lt;span class="nx"&gt;default_tags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
      &lt;span class="nx"&gt;ManagedBy&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"platform-team"&lt;/span&gt;
      &lt;span class="nx"&gt;Product&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"acme-saas"&lt;/span&gt;
      &lt;span class="nx"&gt;ManagedWith&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tags set here cascade to every taggable resource the provider manages: ECS services, RDS instances, ALBs, security groups. No per-resource duplication. Override at the resource level only when a specific resource genuinely needs a different value.&lt;/p&gt;
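
&lt;p&gt;A resource-level tags block is merged with the provider's default_tags, and on a key conflict the resource's value wins. A minimal sketch, assuming a hypothetical cluster name and owner:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;resource "aws_ecs_cluster" "dev" {
  name = "use1-dev-qa1" # hypothetical name

  # Merged with default_tags; a matching key here overrides the default.
  tags = {
    Component = "ecs"     # added on top of the defaults
    ManagedBy = "backend" # overrides the provider-level value
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The merged result is exposed on the resource as the tags_all attribute, which is useful when checking what actually reached AWS.&lt;/p&gt;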

&lt;p&gt;Here's the minimal set that pays for itself the first time you open a bill:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tag&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Environment&lt;/td&gt;
&lt;td&gt;dev, staging, qa, prod&lt;/td&gt;
&lt;td&gt;Cost grouping; scheduling policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ManagedBy&lt;/td&gt;
&lt;td&gt;platform-team, backend&lt;/td&gt;
&lt;td&gt;Who owns it; who to ping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product&lt;/td&gt;
&lt;td&gt;acme-saas, acme-ml&lt;/td&gt;
&lt;td&gt;Bill attribution per product&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ManagedWith&lt;/td&gt;
&lt;td&gt;terraform, pulumi, cdk&lt;/td&gt;
&lt;td&gt;IaC tool; filters what to automate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component&lt;/td&gt;
&lt;td&gt;ecs, rds, elasticache&lt;/td&gt;
&lt;td&gt;AWS service type; per-service filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With these tags applied and activated as cost allocation tags in the Billing console, Cost Explorer can break spend down per environment, per team, per product, and per component. Without them, you get one aggregate compute number and a spreadsheet nobody maintains.&lt;/p&gt;

&lt;p&gt;Naming conventions matter too. A predictable pattern like {region}-{account}-{env} (e.g. use1-dev-qa1, usw2-prod-main) is both human-readable and machine-parseable. You can grep it in logs, script it in bash, and join it with billing data. Which convention you choose matters less than applying it consistently: pick one and automate enforcement.&lt;/p&gt;
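
&lt;p&gt;One way to automate enforcement inside Terraform is to build the name from validated parts rather than typing it by hand. The variable names and regular expressions below are assumptions chosen to match the examples above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "region_code" {
  type        = string
  description = "Short region code, e.g. use1 or usw2"

  validation {
    condition     = can(regex("^[a-z]{2,4}[0-9]$", var.region_code))
    error_message = "The region_code value must look like use1 or usw2."
  }
}

variable "account" {
  type = string

  validation {
    condition     = contains(["dev", "prod"], var.account)
    error_message = "The account value must be dev or prod."
  }
}

variable "env" {
  type = string

  validation {
    condition     = can(regex("^[a-z0-9]+$", var.env))
    error_message = "The env value must be lowercase alphanumeric."
  }
}

locals {
  # e.g. use1-dev-qa1
  env_name = "${var.region_code}-${var.account}-${var.env}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A malformed part fails at plan time, before any resource is created with an off-convention name.&lt;/p&gt;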

&lt;h2&gt;
  
  
  Terraform provisions. An operations layer manages.
&lt;/h2&gt;

&lt;p&gt;Terraform provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ECS services &amp;amp; task definitions&lt;/li&gt;
&lt;li&gt;  IAM roles &amp;amp; policies&lt;/li&gt;
&lt;li&gt;  VPC, subnets, security groups&lt;/li&gt;
&lt;li&gt;  ALB, target groups, listeners&lt;/li&gt;
&lt;li&gt;  RDS, ElastiCache, S3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An operations layer adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Start/stop on a schedule&lt;/li&gt;
&lt;li&gt;  Clone to any region or account&lt;/li&gt;
&lt;li&gt;  One-screen fleet visibility&lt;/li&gt;
&lt;li&gt;  Developer self-service (RBAC)&lt;/li&gt;
&lt;li&gt;  Cost attribution per environment&lt;/li&gt;
&lt;li&gt;  AI diagnostics &amp;amp; anomaly detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where Terraform starts to break down at scale
&lt;/h2&gt;

&lt;p&gt;Around 15–20 environments, teams hit the same walls. Not because Terraform is bad — because it was designed for provisioning, not operations. The distinction matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  State sprawl
&lt;/h3&gt;

&lt;p&gt;An ECS environment with VPC, subnets, security groups, ALB, target groups, ECS services, task definitions, IAM roles, RDS, and ElastiCache clocks in at about 30 resources. At 50 environments, that's 1,500 resources in state. A terraform plan across the full fleet takes 4+ minutes. Partial applies become necessary, and state drifts out of sync with reality.&lt;/p&gt;

&lt;h3&gt;
  
  
  The operations gap
&lt;/h3&gt;

&lt;p&gt;Terraform provisions environments. It doesn't operate them. Every team eventually hits these five gaps and starts building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Start/stop environments on a schedule — Write your own Lambda + EventBridge + CloudWatch cron, per environment, per timezone. Maintain it. Debug it when the Lambda silently fails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clone an environment — Write a new module call, copy all variable values, remember which 3 things are different between the source and the clone. Hope you didn't miss an env var.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developer self-service — Build a web UI, or accept that developers will open PRs to the infra repo for restarts. Either way, you're now maintaining application code that isn't your product.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost per environment — Tag everything consistently. Wait 24 hours for Cost Explorer to update. Export to CSV. Build a spreadsheet. Repeat monthly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Orphan detection — Write Cost Explorer queries, cross-reference with your Terraform state, and hope the tags on the orphaned resources are correct. They probably aren't — that's why the environment got orphaned.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; None of this is Terraform's fault. It's not what Terraform is for. The same way you wouldn't use Terraform to monitor application health or send Slack alerts, you shouldn't expect it to operate a fleet of running environments. You need a separate operations layer — built or bought.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What the operations layer needs to do
&lt;/h2&gt;

&lt;p&gt;If you're going to build the operations layer yourself — or evaluate something that provides it — here's the concrete list of what it needs to handle. This is the specification for the layer that sits above Terraform, reads the resources it provisions, and manages what happens after terraform apply finishes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment scheduling.&lt;/strong&gt; Start and stop environments on a configurable schedule — per environment, per timezone, per team. Dev environments run Mon–Fri 9am–7pm. QA runs Mon–Fri 8am–8pm. Production ignores the scheduler. The system must handle the edge cases: what happens when someone manually starts a scheduled-off environment on a Saturday — does it auto-stop after the override period?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment cloning.&lt;/strong&gt; Take any environment and create a copy in a different region or account, with variable overrides. Not a new Terraform module — a one-click operation that copies networking, compute, data stores, and external service config, then deploys. QA needs an isolated copy of EU production to test a compliance flow. That should be a 30-second operation, not a day of writing HCL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fleet visibility.&lt;/strong&gt; One screen showing every environment: status (running/scheduled/stopped), region, service count, current monthly cost, CI/CD pipeline state, and last activity timestamp. No AWS Console tab switching. No ssh-ing into a box to find out what's running there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developer self-service.&lt;/strong&gt; Developers can restart their environments, redeploy services, and view logs — for environments they own. They cannot touch production. They cannot see secrets. They cannot change infrastructure. This requires RBAC scoped to the environment level, not the AWS account level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost attribution and savings tracking.&lt;/strong&gt; Cost per environment, cost per team, total fleet savings from scheduling. Not an estimate: actual numbers from AWS billing data, updated daily. When the CTO asks “what are we spending on staging this quarter?” you answer in under 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Fortem works with your existing Terraform
&lt;/h2&gt;

&lt;p&gt;Fortem is the operations layer described above. It reads the resources Terraform provisions — ECS services, task definitions, IAM roles, RDS instances — through AWS tags and naming conventions. No HCL parsing. No access to your Terraform repository. No state modifications.&lt;/p&gt;

&lt;p&gt;You run terraform apply. Fortem detects the new or changed resources, and the environment appears in the fleet view with its services, cost breakdown, and scheduling status. You didn't register anything — the tags your Terraform already applies are how Fortem discovers what exists.&lt;/p&gt;

&lt;p&gt;Scheduling is opt-in: add a tag like schedule = "business-hours" to an environment, and Fortem stops it outside working hours and starts it before the workday begins. Remove the tag, scheduling stops. Your Terraform state was never involved.&lt;/p&gt;

&lt;p&gt;Uninstall Fortem and everything keeps running. Your terraform apply still works. Your infrastructure was never dependent on the operations layer; the operations layer was only reading your infrastructure. &lt;a href="https://dev.to/security/"&gt;Full IAM model on the security page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Fortem modify my Terraform state?
&lt;/h3&gt;

&lt;p&gt;No. Fortem reads the resources Terraform provisions — it never writes to your state, pushes to your repo, or modifies HCL. Your infrastructure runs exactly the same whether Fortem is connected or not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I still use terraform apply to change infrastructure when using Fortem?
&lt;/h3&gt;

&lt;p&gt;Yes. terraform apply and terraform destroy work exactly as before. Fortem detects the changes and updates its view automatically. You don't need to notify Fortem of infrastructure changes — it picks them up via tags and naming conventions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Fortem need access to my Terraform repository?
&lt;/h3&gt;

&lt;p&gt;No. Fortem never touches your Terraform repo or state files. It connects to your AWS account through a cross-account IAM role and reads resources directly — the same resources your Terraform provisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens to Fortem if I destroy an environment with Terraform?
&lt;/h3&gt;

&lt;p&gt;Fortem detects the resources are gone and removes the environment from its dashboard. No stuck state, no sync errors. One environment disappearing doesn't affect anything else in the Fortem fleet view.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Fortem handle environments provisioned by Terragrunt or Pulumi instead of vanilla Terraform?
&lt;/h3&gt;

&lt;p&gt;The same way — through tags and naming conventions. Fortem doesn't care which tool provisioned the resources.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;See your fleet cost:&lt;/strong&gt; &lt;a href="https://fortem.dev/ecs-cost-calculator" rel="noopener noreferrer"&gt;fortem.dev/ecs-cost-calculator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>ecs</category>
      <category>fargate</category>
    </item>
    <item>
      <title>How to Cut AWS ECS Fargate Costs by 60–70%</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:58:37 +0000</pubDate>
      <link>https://dev.to/dspv/how-to-cut-aws-ecs-fargate-costs-by-60-70-5ale</link>
      <guid>https://dev.to/dspv/how-to-cut-aws-ecs-fargate-costs-by-60-70-5ale</guid>
      <description>&lt;h1&gt;
  
  
  How to Cut AWS ECS Fargate Costs by 60–70%
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-fargate-cost-optimization" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-fargate-cost-optimization&lt;/a&gt;&lt;br&gt;
Your ECS Fargate dev and staging environments run 168 hours a week. Your team works 40. Here's the math on what that costs — and four methods to fix it.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;p&gt;Your ECS Fargate bill is higher than it needs to be. The AWS documentation will tell you to buy Savings Plans and right-size your tasks. That's not wrong, but it misses the biggest lever by a wide margin. This guide covers four methods, starting with the one that cuts 60–70% before you touch a single task definition.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Dev/staging environments run 168 hrs/week. Your team works ~40. You're paying for the other 128.&lt;/li&gt;
&lt;li&gt;  Scheduling environments to stop during off-hours cuts dev/staging spend by 60–70% — no infrastructure changes.&lt;/li&gt;
&lt;li&gt;  Right-sizing vCPU and memory adds another 10–20% on top.&lt;/li&gt;
&lt;li&gt;  Fargate Spot gives a 40–70% discount on interruption-tolerant workloads.&lt;/li&gt;
&lt;li&gt;  Real example: 12 environments, $1,730/mo → $380/mo. 78% reduction. $16,200/yr saved.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where the money goes — Fargate pricing breakdown
&lt;/h2&gt;

&lt;p&gt;AWS Fargate charges for two resources per task: $0.04048 per vCPU-hour and $0.004445 per GB-hour (us-east-1, Linux/x86, on-demand), per the &lt;a href="https://aws.amazon.com/fargate/pricing/" rel="noopener noreferrer"&gt;AWS Fargate pricing page&lt;/a&gt; (verified May 2026).&lt;/p&gt;

&lt;p&gt;A single service running 0.5 vCPU and 1 GB costs:&lt;/p&gt;

&lt;p&gt;0.5 × $0.04048 + 1 × $0.004445 = $0.024685/hr&lt;/p&gt;

&lt;p&gt;× 730 hrs/month = $18.02/service/month&lt;/p&gt;

&lt;p&gt;× 8 services/environment = $144/environment/month&lt;/p&gt;

&lt;p&gt;× 12 environments = $1,730/month&lt;/p&gt;

&lt;p&gt;That's for a conservative fleet — 12 environments, 8 services each, half a vCPU per service. Most teams have more. The math compounds: it's not any single expensive environment causing the bill. It's 12 small ones, each billing quietly around the clock.&lt;/p&gt;
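
&lt;p&gt;The same arithmetic can be kept next to the infrastructure code so the estimate moves with the fleet definition. This is only a sketch: the rates are the us-east-1 figures quoted above, and the local and output names are made up for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;locals {
  vcpu_hour_usd = 0.04048  # per vCPU-hour, us-east-1 Linux/x86 on-demand
  gb_hour_usd   = 0.004445 # per GB-hour
  hours_month   = 730

  # One service at 0.5 vCPU and 1 GB
  service_hourly  = 0.5 * local.vcpu_hour_usd + 1 * local.gb_hour_usd
  service_monthly = local.service_hourly * local.hours_month

  env_monthly   = local.service_monthly * 8 # 8 services per environment
  fleet_monthly = local.env_monthly * 12    # 12 environments
}

output "fleet_monthly_usd" {
  value = local.fleet_monthly # roughly 1,730
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Changing the service count or the vCPU figure in one place updates the whole chain, which makes it easy to see how quickly small environments add up.&lt;/p&gt;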

&lt;h2&gt;
  
  
  The 24/7 problem — what you're actually paying for
&lt;/h2&gt;

&lt;p&gt;There are 168 hours in a week. A typical engineering team works 40–50 of them. For the rest (nights, weekends, holidays) those 12 dev and staging environments sit idle while AWS bills you by the second.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://info.flexera.com/cm-report-state-of-the-cloud" rel="noopener noreferrer"&gt;Flexera State of the Cloud 2025 report&lt;/a&gt; puts average cloud waste at 32% across organizations. For ECS Fargate development fleets, the number is higher — because dev environments are structurally different from production. Nobody's on-call for them at 3am, but they're running anyway.&lt;/p&gt;

&lt;p&gt;Figure: monthly AWS Fargate cost for 12 environments. 24/7 (always on, 168 hrs/week): $1,730/mo. Business hours only (50 hrs/week, Mon–Fri 9am–7pm): $515/mo, a 70% saving.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; The biggest Fargate cost driver for most teams isn't their largest environment. It's the 12 small ones running overnight and on weekends — each individually invisible, collectively expensive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Business-hours scheduling (Mon–Fri 9am–7pm = 50 hrs/week) reduces active compute time to 50 ÷ 168 = 29.8% of the 24/7 baseline. On our 12-environment example: $1,730 → $515/month. Before touching a single task definition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Method 1 — Environment scheduling (60–70% reduction)
&lt;/h2&gt;

&lt;p&gt;Scheduling means stopping all ECS services in an environment during off-hours and restarting them at the start of the workday. The environment is unavailable overnight and on weekends — which is fine for anything that isn't on-call.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Mon–Fri 9am–7pm = 50 hours/week = 29.8% of baseline cost. Weekend default: off. One-click override for ad-hoc work.”&lt;/p&gt;

&lt;p&gt;— Fortem scheduling model, per-environment, per-timezone&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Two implementation paths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS EventBridge Scheduler: native, no extra cost. Write a schedule per environment that invokes a Lambda function or Step Functions state machine to set each ECS service's desired count to 0 (stop) or N (start). Requires code per environment; gets tedious past 10–15 environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fortem — set a schedule per environment in the UI. Fortem stops and starts all services atomically, handles per-timezone configuration, and lets developers request one-click overrides for ad-hoc work without touching the schedule.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Per-timezone matters:&lt;/strong&gt; your EU team's workday starts 6 hours before your US team's. A single UTC schedule shuts down environments while one team is still working. Configure schedules per team, not globally.&lt;/p&gt;
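
&lt;p&gt;If you'd rather not maintain a Lambda, one alternative (not the EventBridge approach described above) is Application Auto Scaling scheduled actions, which accept a time zone per action. The sketch assumes a hypothetical cluster named dev-cluster, a service named api, and a team working in Berlin.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;resource "aws_appautoscaling_target" "api" {
  service_namespace  = "ecs"
  resource_id        = "service/dev-cluster/api" # hypothetical cluster/service
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 0
  max_capacity       = 1
}

# Stop: clamp the service to zero tasks at 7pm local time on weekdays
resource "aws_appautoscaling_scheduled_action" "api_stop" {
  name               = "api-stop-evening"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  schedule           = "cron(0 19 ? * MON-FRI *)"
  timezone           = "Europe/Berlin"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

# Start: bring the service back to one task at 9am local time on weekdays
resource "aws_appautoscaling_scheduled_action" "api_start" {
  name               = "api-start-morning"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  schedule           = "cron(0 9 ? * MON-FRI *)"
  timezone           = "Europe/Berlin"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 1
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This still has to be declared per service, so it shares the same scaling problem past 10–15 environments, but the time zone lives in the schedule itself rather than in UTC offsets you maintain by hand.&lt;/p&gt;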

&lt;h2&gt;
  
  
  Method 2 — Right-sizing vCPU and memory (10–20% additional)
&lt;/h2&gt;

&lt;p&gt;Most development services are over-provisioned. When a service was first deployed, someone picked a reasonable allocation (1 vCPU, 2 GB) and never revisited it. In production, that allocation might be justified. In a dev environment processing one request per minute from a developer doing manual testing, you may be paying for up to four times what's needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to check:&lt;/strong&gt; CloudWatch → ECS → your cluster → CPU and Memory Utilization per service. Look at the 7-day average. A service averaging under 30% CPU utilization on 1 vCPU can be safely dropped to 0.5 vCPU for dev. Under 15%: try 0.25 vCPU.&lt;/p&gt;

&lt;p&gt;Savings per environment: 1 vCPU → 0.5 vCPU, per service&lt;/p&gt;

&lt;p&gt;Before: 1 × $0.04048 × 730 hrs = $29.55/service/mo&lt;/p&gt;

&lt;p&gt;After: 0.5 × $0.04048 × 730 hrs = $14.78/service/mo&lt;/p&gt;

&lt;p&gt;8 services × $14.78 saved = $118 saved/environment/mo&lt;/p&gt;

&lt;p&gt;Apply right-sizing only to dev and staging. Keep separate task definition files for dev and prod so changes don't drift. Never right-size production without load testing under realistic traffic.&lt;/p&gt;
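
&lt;p&gt;If your task definitions are managed in Terraform, one way to keep dev and prod sizing apart is a per-environment lookup. The map below is a sketch; the service name and sizes are assumptions, using Fargate CPU units (256 = 0.25 vCPU) and memory in MiB.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "environment" {
  type = string
}

locals {
  # Hypothetical sizing table for a service called "api"
  api_size_by_env = {
    dev     = { cpu = 256, memory = 512 }
    staging = { cpu = 256, memory = 512 }
    prod    = { cpu = 1024, memory = 2048 }
  }

  # Fails at plan time if the environment has no entry in the table
  api_size = local.api_size_by_env[var.environment]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;local.api_size.cpu and local.api_size.memory can then be passed to the task definition, so a dev downsizing never touches the prod entry.&lt;/p&gt;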

&lt;h2&gt;
  
  
  Method 3 — Fargate Spot for non-production (40–70% discount)
&lt;/h2&gt;

&lt;p&gt;Fargate Spot runs tasks on spare AWS capacity at up to a 70% discount versus on-demand, per &lt;a href="https://aws.amazon.com/fargate/pricing/" rel="noopener noreferrer"&gt;AWS Fargate pricing&lt;/a&gt;. The tradeoff: AWS can interrupt your tasks with a 2-minute warning when that capacity is reclaimed.&lt;/p&gt;

&lt;p&gt;For many dev workloads, a 2-minute interruption is completely tolerable — especially combined with scheduling that already stops environments overnight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right for Spot:&lt;/strong&gt; CI/CD test runners, batch jobs, dev environments for individual engineers, any workload that restarts cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong for Spot:&lt;/strong&gt; staging used for customer demos, environments that hold state in memory, anything with a guaranteed uptime requirement during business hours.&lt;/p&gt;

&lt;p&gt;To enable: add FARGATE_SPOT to your service's capacity provider strategy. You can split the weights, for example 80% Spot / 20% On-Demand, to maintain some capacity during interruptions.&lt;/p&gt;
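
&lt;p&gt;In Terraform, an 80/20 split could be sketched as follows. The cluster name, service name, and input variables are hypothetical; the weights 4 and 1 express the 80/20 ratio.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "task_definition_arn" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

resource "aws_ecs_cluster" "dev" {
  name = "dev-cluster" # hypothetical
}

# Both providers must be associated with the cluster before a service can use them
resource "aws_ecs_cluster_capacity_providers" "dev" {
  cluster_name       = aws_ecs_cluster.dev.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.dev.id
  task_definition = var.task_definition_arn
  desired_count   = 5

  # launch_type is omitted: it conflicts with capacity_provider_strategy
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 4
  }

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 1
  }

  network_configuration {
    subnets = var.private_subnet_ids
  }

  depends_on = [aws_ecs_cluster_capacity_providers.dev]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With five desired tasks, this ratio places four on Spot and one on On-Demand, so an interruption never takes the service to zero.&lt;/p&gt;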

&lt;h2&gt;
  
  
  Method 4 — Kill orphaned environments
&lt;/h2&gt;

&lt;p&gt;Every team has them. An environment was spun up for a feature branch three quarters ago. The engineer who owned it left. The project was deprioritized. The environment is still running, billing $200–$400/month, and nobody has noticed because it doesn't appear in any deployment dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to find them:&lt;/strong&gt; pull the last task run timestamp from CloudWatch Logs Insights — any service with no log events in the last 30 days is a candidate. Cross-reference with your deployment records. No deploy in 60+ days and no active owner: safe to stop.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; In a fleet of 20+ environments, most teams find 2–3 orphaned environments when they look seriously. At $300/month each, that's $900/month — $10,800/year — for compute serving exactly zero requests.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fortem surfaces last deploy time, last access time, and environment owner for every environment in your fleet. Orphan identification goes from a 2-hour CloudWatch archaeology project to a 2-minute filter. Without tooling, most teams never do this audit — the environments just keep billing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it together — $1,730 → $380/month
&lt;/h2&gt;

&lt;p&gt;Same fleet throughout: 12 environments, 8 services each, 0.5 vCPU, 1 GB, AWS us-east-1 on-demand rates. Each method applied cumulatively.&lt;/p&gt;

&lt;p&gt;Step by step:&lt;/p&gt;

&lt;p&gt;Baseline (24/7): $1,730/mo&lt;/p&gt;

&lt;p&gt;+ Business-hours scheduling (29.8% of baseline): $515/mo −70%&lt;/p&gt;

&lt;p&gt;+ Right-sizing (0.5→0.25 vCPU on 8 dev envs): ~$440/mo −15%&lt;/p&gt;

&lt;p&gt;+ Fargate Spot on 4 eligible environments: ~$380/mo −14%&lt;/p&gt;

&lt;p&gt;Total: $380/mo · 78% reduction · $16,200/yr saved&lt;/p&gt;

&lt;p&gt;This is conservative — zero orphaned environments assumed, lowest Fargate size, Spot applied to only 4 of 12 environments. Larger fleets, bigger services, and multiple AWS accounts scale these numbers proportionally.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/#roi-calculator"&gt;Fortem ROI calculator&lt;/a&gt; lets you plug in your actual fleet size — number of environments, services, vCPU, memory — and see the number for your specific bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does stopping ECS environments lose any data?
&lt;/h3&gt;

&lt;p&gt;No. Stopping an ECS service terminates the running tasks — it does not delete your databases, volumes, or any persistent state. RDS, ElastiCache, and S3 are unaffected. The environment is exactly as you left it when it starts back up.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does an ECS environment take to start up after scheduling?
&lt;/h3&gt;

&lt;p&gt;Typically 60–120 seconds for most ECS Fargate services. Cold start time depends on your image size and application startup logic. Teams that care about fast restarts keep images small and use health check grace periods appropriately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I schedule only some services within an environment?
&lt;/h3&gt;

&lt;p&gt;Yes, but it's usually not worth the complexity. Scheduling an entire environment atomically (all services together) avoids dependency issues — if you stop only some services, others may fail waiting for dependencies. Start with full-environment scheduling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Fargate Spot safe for staging environments?
&lt;/h3&gt;

&lt;p&gt;It depends on what staging is used for. If staging is purely for automated test runs and can tolerate a 2-minute interruption, Spot is fine. If staging hosts customer demos or is expected to be reliably available during business hours, use On-Demand for those environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I find out which of my environments are costing the most?
&lt;/h3&gt;

&lt;p&gt;AWS Cost Explorer with resource-level cost allocation tags gives you per-service cost breakdowns. You need to tag your ECS services with Environment and Team tags first. Without tags, Cost Explorer shows compute costs in aggregate with no way to attribute them.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;See your fleet cost:&lt;/strong&gt; &lt;a href="https://fortem.dev/ecs-cost-calculator" rel="noopener noreferrer"&gt;fortem.dev/ecs-cost-calculator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>fargate</category>
      <category>cost</category>
      <category>ecs</category>
    </item>
    <item>
      <title>ECS Fargate Best Practices: Running a Fleet of 10+ Environments Without the Pain</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:57:42 +0000</pubDate>
      <link>https://dev.to/dspv/ecs-fargate-best-practices-running-a-fleet-of-10-environments-without-the-pain-5hka</link>
      <guid>https://dev.to/dspv/ecs-fargate-best-practices-running-a-fleet-of-10-environments-without-the-pain-5hka</guid>
      <description>&lt;h1&gt;
  
  
  ECS Fargate Best Practices: 10+ Environments Without the Pain
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-fargate-best-practices" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-fargate-best-practices&lt;/a&gt;&lt;br&gt;
Seven ECS Fargate best practices for teams running 10+ environments. Fix hidden costs, Terraform state sprawl, Fargate quota sharing, and scheduling before they break your fleet.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;p&gt;Most ECS Fargate best practices guides tell you what to do. This one tells you what breaks between environment 5 and environment 20 — and gives you the exact fix for each. The numbers come from AWS published pricing, service quotas, and patterns we've seen managing fleets at scale. If you're running fewer than 5 environments, most of this won't matter yet. Bookmark it.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Name everything consistently from day one; retrofitting naming across 10+ environments takes weeks.&lt;/li&gt;
&lt;li&gt;  Fixed overhead is $85–100/mo per environment before a single container runs — at 50 envs that's $4,250–5,000/mo invisible spend.&lt;/li&gt;
&lt;li&gt;  Schedule dev/staging off-hours first. It cuts compute cost 60–70% and requires zero infrastructure changes.&lt;/li&gt;
&lt;li&gt;  Set CloudWatch log retention before ingestion hits 15 TB/mo and you get a $7,500 bill.&lt;/li&gt;
&lt;li&gt;  Isolate Terraform state per environment before the 25 MB threshold makes plans take 30+ minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Start with naming and account structure
&lt;/h2&gt;

&lt;p&gt;At 3 environments you can get away with ad-hoc names. At 10 you can't — because every AWS resource name is simultaneously a billing dimension, an IAM scope, and a CloudWatch filter. Inconsistent names mean you can't attribute cost, can't write scoped IAM policies, and can't build dashboards without a lookup table.&lt;/p&gt;

&lt;p&gt;The convention that scales: &lt;code&gt;{region_short}-{account}-{envname}&lt;/code&gt;. Applied to every resource from day one. One Terraform local generates every downstream resource name — ECS cluster, task definition, SSM parameter path, IAM role, CloudWatch log group — all from one source.&lt;/p&gt;

&lt;p&gt;Ready to use — copy this today&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;env_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region_short&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;envname&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;  &lt;span class="c1"&gt;# → "use1-prod-main"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_task_definition"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;family&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-api-td"&lt;/span&gt;
  &lt;span class="c1"&gt;# → "use1-prod-main-api-td"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ssm_parameter"&lt;/span&gt; &lt;span class="s2"&gt;"db_host"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/api/DB_HOST"&lt;/span&gt;
  &lt;span class="c1"&gt;# → "/use1-prod-main/api/DB_HOST"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"task_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-api-task-role"&lt;/span&gt;
  &lt;span class="c1"&gt;# → "use1-prod-main-api-task-role"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudwatch_log_group"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/ecs/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-api"&lt;/span&gt;
  &lt;span class="nx"&gt;retention_in_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log_retention_days&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Map naming to account structure. The most common pattern that works at 10+ environments: one AWS account for production, one for all non-prod. This separates Fargate vCPU quota pools, hardens IAM boundaries, and makes Cost Explorer attribution clean.&lt;/p&gt;

&lt;p&gt;One constraint your naming convention must handle: ALB target group names are capped at &lt;strong&gt;32 characters&lt;/strong&gt;, and each ALB has a hard limit of &lt;strong&gt;100 target groups&lt;/strong&gt;. At 20 environments with 6 services each, you're at 120 target groups, which is past the limit. This forces per-environment ALBs sooner than you think, which increases your fixed overhead. A short naming prefix (&lt;code&gt;use1-prod-api&lt;/code&gt;, 13 chars) leaves room for the target group suffix.&lt;/p&gt;
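
&lt;p&gt;One way to keep the 32-character cap from surprising you later is to validate the prefix length up front. This is a sketch under stated assumptions: &lt;code&gt;var.env_prefix&lt;/code&gt; stands in for the &lt;code&gt;local.env_prefix&lt;/code&gt; shown earlier, the &lt;code&gt;-api-tg&lt;/code&gt; suffix and the 8-character suffix budget are hypothetical conventions, and the port is a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "env_prefix" {
  type        = string
  description = "Stand-in for local.env_prefix, e.g. use1-prod-main"

  validation {
    # 32-character target group limit minus an assumed 8-character suffix budget
    condition     = length(var.env_prefix) &amp;lt;= 24
    error_message = "env_prefix must be 24 characters or fewer so target group names stay within 32."
  }
}

variable "vpc_id" {
  type = string
}

resource "aws_lb_target_group" "api" {
  name        = "${var.env_prefix}-api-tg" # hypothetical suffix convention
  port        = 8080                       # assumed container port
  protocol    = "HTTP"
  target_type = "ip" # Fargate tasks use awsvpc networking, so targets are IPs
  vpc_id      = var.vpc_id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The validation fails at plan time with a message that points at the naming convention rather than at one specific resource.&lt;/p&gt;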

&lt;p&gt;For the full naming pattern table, including the 32-character target group constraint and per-resource examples, see the dedicated section on &lt;a href="https://dev.to/blog/ecs-multi-environment-strategy/"&gt;consistent naming conventions for ECS environments&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Know your fixed overhead per environment
&lt;/h2&gt;

&lt;p&gt;When engineers estimate ECS costs, they calculate compute: vCPU hours, memory hours, maybe RDS. What they miss is the fixed overhead that exists before a single container runs.&lt;/p&gt;

&lt;p&gt;Every environment needs its own ALB and NAT Gateway. These costs are flat — they don't scale with usage, they don't go away when you stop tasks at night, and they don't appear on the compute line in Cost Explorer.&lt;/p&gt;

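&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;td&gt;$22/mo&lt;/td&gt;
&lt;td&gt;$0.0225/hr base + $0.008/LCU-hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway (2 AZs)&lt;/td&gt;
&lt;td&gt;~$66/mo&lt;/td&gt;
&lt;td&gt;$0.045/hr × 2 + $0.045/GB data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch log basics&lt;/td&gt;
&lt;td&gt;$3–15/mo&lt;/td&gt;
&lt;td&gt;Depends on log volume + retention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM, ECR, other&lt;/td&gt;
&lt;td&gt;$1–5/mo&lt;/td&gt;
&lt;td&gt;Small but additive at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total fixed overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$85–100/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Before first task runs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;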

&lt;p&gt;At 10 environments, that's &lt;strong&gt;$850–1,000/mo&lt;/strong&gt; invisible spend. At 50 environments, it's &lt;strong&gt;$4,250–5,000/mo&lt;/strong&gt; before a single task runs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; NAT Gateway is the single most expensive fixed line item in any ECS environment — and the easiest to eliminate for non-prod. Teams that care about NAT cost switch non-prod environments to public subnet placement with strict security group rules and Network ACLs instead of private subnets with a NAT. This is meaningfully cheaper but does reduce your network boundary — regulated environments (PCI, HIPAA) and prod should keep the NAT. Evaluate your compliance posture before cutting this corner.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One more lever: VPC Endpoints. If your containers only need to reach AWS services (S3, ECR, CloudWatch, SSM), an Interface Endpoint costs &lt;strong&gt;~$7.20/mo per endpoint&lt;/strong&gt; per Availability Zone, roughly 1/5th of one NAT Gateway. ECR, CloudWatch Logs, and SSM each need an Interface Endpoint; Gateway Endpoints (S3, DynamoDB) are free, and ECR serves its image layers from S3, so pulls use the S3 Gateway Endpoint as well. Combined with the public-subnet approach above, this is the cheapest path to eliminating NAT entirely for non-prod. Strategy: use VPC Endpoints for AWS dependencies and public subnets for outbound internet, and you drop NAT from non-prod without sacrificing functionality.&lt;/p&gt;
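
&lt;p&gt;A minimal sketch of that endpoint set, assuming tasks that run in subnets without a NAT route. All variable names are illustrative, and the list of interface services is an assumption you should trim to what your containers actually call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "region" {
  type    = string
  default = "us-east-1"
}

variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "route_table_ids" {
  type = list(string)
}

variable "endpoint_security_group_id" {
  type = string
}

# Gateway endpoint for S3: no hourly charge; ECR image layers are served from S3
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.route_table_ids
}

# Interface endpoints: billed per endpoint, per Availability Zone, per hour
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(["ecr.api", "ecr.dkr", "logs", "ssm"])

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.subnet_ids
  security_group_ids  = [var.endpoint_security_group_id]
  private_dns_enabled = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because interface endpoints are billed per Availability Zone, four of them across two AZs can approach the cost of a NAT Gateway. Run the numbers for your own service list before switching.&lt;/p&gt;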

&lt;p&gt;We broke down the full per-environment cost — including ALB, NAT Gateway, CloudWatch, and data transfer — in our guide to &lt;a href="https://dev.to/blog/aws-fargate-pricing-real-costs/"&gt;how much an ECS environment actually costs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schedule dev/staging before the bill bleeds
&lt;/h2&gt;

&lt;p&gt;Your environments run 168 hours a week. Your team works 40–55. Scheduling alone cuts compute cost by &lt;strong&gt;60–70%&lt;/strong&gt;. For most teams it's the single largest ECS cost lever available, and it requires zero code changes. The spread: 70% savings on a strict 40-hour Mon–Fri schedule, 60–65% on a 55-hour week. The exact number depends on your team's working hours, but either way it's the fastest path to a lower AWS bill.&lt;/p&gt;

&lt;p&gt;The problem: AWS-native scheduling operates at the service level. To schedule one environment with 8 services, you need 16 Auto Scaling actions (stop + start per service). At 10 environments that's 160 actions to create, maintain, and update when schedules change.&lt;/p&gt;

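&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environments&lt;/th&gt;
&lt;th&gt;Services each&lt;/th&gt;
&lt;th&gt;Auto Scaling actions&lt;/th&gt;
&lt;th&gt;Schedule change cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;8 updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;8–16 updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;10–20 updates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;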
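
&lt;p&gt;To make the action count concrete, here is a sketch of AWS-native scheduling for a single environment using Application Auto Scaling. The cluster name, service names, times, and timezone are assumptions. Each service gets one scalable target plus a stop and a start action, which is where the two-actions-per-service figure comes from:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "cluster_name" {
  type = string
}

variable "service_names" {
  type = set(string) # e.g. the 8 services of one environment
}

resource "aws_appautoscaling_target" "svc" {
  for_each = var.service_names

  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${each.value}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 0
  max_capacity       = 1

  # Scheduled actions rewrite min/max, so ignore that drift on later plans
  lifecycle {
    ignore_changes = [min_capacity, max_capacity]
  }
}

# Stop every service at 19:00 on weekdays (assumed schedule)
resource "aws_appautoscaling_scheduled_action" "stop" {
  for_each = aws_appautoscaling_target.svc

  name               = "${each.key}-stop"
  service_namespace  = each.value.service_namespace
  resource_id        = each.value.resource_id
  scalable_dimension = each.value.scalable_dimension
  schedule           = "cron(0 19 ? * MON-FRI *)"
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}

# Start every service at 09:00 on weekdays (assumed schedule)
resource "aws_appautoscaling_scheduled_action" "start" {
  for_each = aws_appautoscaling_target.svc

  name               = "${each.key}-start"
  service_namespace  = each.value.service_namespace
  resource_id        = each.value.resource_id
  scalable_dimension = each.value.scalable_dimension
  schedule           = "cron(0 9 ? * MON-FRI *)"
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 1
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;for_each&lt;/code&gt; keeps the Terraform short, but AWS still holds one pair of actions per service, and any service that needs more than one task or a different schedule needs its own override.&lt;/p&gt;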

&lt;p&gt;Chart: monthly AWS Fargate cost for 12 environments. Running 24/7: $1,730/mo. Business-hours schedule (55 hrs/week): $515/mo, a 70% saving.&lt;/p&gt;

&lt;p&gt;What teams actually do&lt;/p&gt;

&lt;p&gt;Teams start with EventBridge + Lambda at 3–5 environments and it works beautifully. By 10 environments they're maintaining a scheduling codebase with a full test suite. By 15–20 environments, the maintenance burden outweighs the savings — and environments quietly drift back to 24/7. The economics of scheduling are sound; the tooling to maintain it at scale is the bottleneck.&lt;/p&gt;

&lt;p&gt;For a deep dive on the AWS-native approach and a comparison with environment-level scheduling, read &lt;a href="https://dev.to/blog/ecs-environment-scheduling/"&gt;the complete guide to ECS environment scheduling&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolate Terraform state before it isolates you
&lt;/h2&gt;

&lt;p&gt;A single Terraform state file containing all environments starts fast. At 25–50 MB, plans take 30+ minutes. At the HCP Terraform hard limit of ~100 MB (from base64 encoding), Terraform stops working entirely.&lt;/p&gt;

&lt;p&gt;The blast radius is worse than the speed problem: one module bug in a shared state file can take down every environment in a single apply. A typo in a variable that propagates to 10 environments creates 10 simultaneous incidents.&lt;/p&gt;

&lt;p&gt;The fix is per-environment state, applied independently. One folder per environment, each with its own S3 backend. No shared state files, no workspaces, no extra tooling — just directories you can see and reason about:&lt;/p&gt;

&lt;p&gt;Folder-per-environment pattern&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Directory structure — one folder per environment, independent state&lt;/span&gt;
&lt;span class="c1"&gt;# terraform/environments/&lt;/span&gt;
&lt;span class="c1"&gt;#   prod/&lt;/span&gt;
&lt;span class="c1"&gt;#     backend.tf        → prod's own S3 backend (separate state file)&lt;/span&gt;
&lt;span class="c1"&gt;#     main.tf           → calls the shared module&lt;/span&gt;
&lt;span class="c1"&gt;#     terraform.tfvars&lt;/span&gt;
&lt;span class="c1"&gt;#   staging/&lt;/span&gt;
&lt;span class="c1"&gt;#     backend.tf        → staging's own S3 backend&lt;/span&gt;
&lt;span class="c1"&gt;#     main.tf&lt;/span&gt;
&lt;span class="c1"&gt;#     terraform.tfvars&lt;/span&gt;
&lt;span class="c1"&gt;#   dev-01/&lt;/span&gt;
&lt;span class="c1"&gt;#     ...&lt;/span&gt;

&lt;span class="c1"&gt;# environments/prod/backend.tf — each environment has its own state&lt;/span&gt;
&lt;span class="k"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tfstate-org"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"envs/prod/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
    &lt;span class="nx"&gt;encrypt&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;dynamodb_table&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-locks"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# environments/prod/main.tf — thin, calls the shared module&lt;/span&gt;
&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"environment"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../modules/ecs-environment"&lt;/span&gt;

  &lt;span class="nx"&gt;env_name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt;
  &lt;span class="nx"&gt;account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"111111111111"&lt;/span&gt;
  &lt;span class="c1"&gt;# Plans run independently, blast radius is one environment&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;environments/&amp;lt;name&amp;gt;/&lt;/code&gt; folder is self-contained: its own backend, its own tfvars, its own plan/apply lifecycle. You can see the entire fleet structure by looking at the directory tree — no jumping between files to trace configuration inheritance. Adding an environment means copying one folder and changing three lines. This is the pattern teams converge on after workspaces stop scaling, and it works with vanilla Terraform — no extra tooling required.&lt;/p&gt;

&lt;p&gt;How to know when to split — check your state file size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform state pull | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under 5 MB — fine. 10–25 MB — start planning the migration. Over 25 MB — plans take 30+ minutes and locking contention becomes noticeable. The 3-minute plan threshold is also a strong signal: if a plan against one environment takes longer than 3 minutes, your state file is too large regardless of its byte count.&lt;/p&gt;

&lt;p&gt;Practical guidance: teams managing 10+ environments should move to per-environment state before hitting 25 MB, not after. The migration is mechanical — extract each environment into its own directory, run one init per directory, and verify with a plan. It takes an afternoon and prevents a week of incidents. For the full implementation guide, see &lt;a href="https://dev.to/blog/ecs-fargate-terraform/"&gt;managing ECS Fargate with Terraform: what works and what doesn't&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set CloudWatch retention on day one
&lt;/h2&gt;

&lt;p&gt;The default CloudWatch log group setting is “never expire.” Teams routinely forget to change this. At &lt;strong&gt;$0.50/GB ingested&lt;/strong&gt;, a fleet of 50 containers each writing 5 GB/day generates $3,750/mo in ingestion costs alone, before storage and before metrics.&lt;/p&gt;

&lt;p&gt;CloudWatch Logs at scale&lt;/p&gt;

&lt;p&gt;50 containers × 5 GB/day: 7,500 GB/mo × $0.50/GB = $3,750/mo&lt;/p&gt;

&lt;p&gt;Double the fleet to 100 containers: 15 TB/mo = $7,500/mo. We've seen this.&lt;/p&gt;

&lt;p&gt;Add Container Insights: $0.21/hr per cluster&lt;/p&gt;

&lt;p&gt;The fix: set &lt;code&gt;retention_in_days&lt;/code&gt; in Terraform. 30 days for dev/staging, 90 for prod. Never “never expire.”&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudwatch_log_group"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/ecs/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_prefix&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;retention_in_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_type&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

  &lt;span class="c1"&gt;# Optional: switch non-prod to Infrequent Access — 50% cheaper storage&lt;/span&gt;
  &lt;span class="c1"&gt;# for logs read less than once a week&lt;/span&gt;
  &lt;span class="nx"&gt;log_group_class&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env_type&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;"STANDARD"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"INFREQUENT_ACCESS"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also: advanced-tier SSM parameters at &lt;strong&gt;$0.05/parameter/month&lt;/strong&gt; creep unnoticed (standard-tier parameters carry no per-parameter charge). At 10 environments × 8 services × 5 advanced parameters each = 400 parameters = $20/mo. Small, but nobody accounts for it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; We've seen teams discover a $7,500/mo CloudWatch bill six months after launching their 15th environment. The Terraform was deployed with default retention, and nobody looked at the CloudWatch line in Cost Explorer until the CFO asked. Set retention in your module defaults. It costs nothing to set and thousands to miss.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CloudWatch is one piece of the ECS cost puzzle. For the full picture — Fargate compute, data transfer, load balancing, and the 65% savings playbook — see &lt;a href="https://dev.to/blog/ecs-fargate-cost-optimization/"&gt;how to cut AWS ECS Fargate costs by 65%&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Fargate Spot where it belongs
&lt;/h2&gt;

&lt;p&gt;Fargate Spot offers a &lt;strong&gt;68% discount&lt;/strong&gt; over on-demand: $0.01291/vCPU-hr vs $0.04048. The trade-off is a 2-minute interruption notice when AWS reclaims capacity, per the &lt;a href="https://aws.amazon.com/fargate/pricing/" rel="noopener noreferrer"&gt;AWS Fargate pricing page&lt;/a&gt; (verified May 2026).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Fargate Spot runs tasks on spare AWS EC2 capacity at up to a 70% discount compared to Fargate On-Demand. If AWS needs the capacity back, your running tasks will be given a two-minute warning and then stopped.”&lt;/p&gt;

&lt;p&gt;— &lt;a href="https://aws.amazon.com/fargate/pricing/" rel="noopener noreferrer"&gt;AWS Fargate Pricing&lt;/a&gt;, verified May 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Real interruption rates: large instance families see under 5% interruption; common instance types see 5–15%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt; use a capacity provider strategy with a 70/30 or 80/20 Spot/On-Demand split. Spot for CI/CD runners, staging, automated tests, and non-interactive batch jobs. On-Demand for production, customer-facing staging, and demo environments.&lt;/p&gt;

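&lt;p&gt;To enable: create a capacity provider strategy that includes both FARGATE_SPOT and FARGATE. Each entry takes a &lt;code&gt;weight&lt;/code&gt; and an optional &lt;code&gt;base&lt;/code&gt;. The &lt;code&gt;base&lt;/code&gt; is the minimum number of tasks placed on that provider before any weighting applies, and only one provider in a strategy can set it; put it on FARGATE to guarantee an On-Demand floor. The &lt;code&gt;weight&lt;/code&gt; values set the ratio in which the remaining tasks are distributed.&lt;/p&gt;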

&lt;p&gt;Capacity provider strategy with weighted split&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Define capacity providers for the ECS cluster&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_cluster_capacity_providers"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cluster_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;

  &lt;span class="nx"&gt;capacity_providers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"FARGATE_SPOT"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;default_capacity_provider_strategy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;capacity_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FARGATE_SPOT"&lt;/span&gt;
    &lt;span class="nx"&gt;weight&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="nx"&gt;base&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# 0 On-Demand tasks minimum for non-prod&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;default_capacity_provider_strategy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;capacity_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;
    &lt;span class="nx"&gt;weight&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Use On-Demand only when Spot unavailable&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Per-service: adjust weights based on workload criticality&lt;/span&gt;
&lt;span class="c1"&gt;# Prod services use base=2 + more FARGATE weight&lt;/span&gt;
&lt;span class="c1"&gt;# Non-prod services use base=0 + FARGATE_SPOT only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
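
&lt;p&gt;For services that should follow the 80/20 split instead of the all-Spot cluster default, the strategy can be set per service. This is a sketch only: the variable names, service name, and task count are assumptions, and the exact placement of each task is decided by ECS:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "cluster_arn" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "security_group_id" {
  type = string
}

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 5

  # launch_type is omitted: it cannot be combined with a capacity provider strategy

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1 # the first task always runs On-Demand
    weight            = 1
  }

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 4 # beyond the base, tasks are split 4:1 in favor of Spot
  }

  network_configuration {
    subnets         = var.subnet_ids
    security_groups = [var.security_group_id]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A service-level strategy overrides the cluster default, so prod and non-prod services can share one module with different weights.&lt;/p&gt;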



&lt;p&gt;One operational note: Fargate Spot sends SIGTERM along with the 2-minute interruption warning, and SIGKILL follows once the container's &lt;code&gt;stopTimeout&lt;/code&gt; expires (30 seconds by default, up to 120 seconds on Fargate). Your containers must handle graceful shutdown within this window: drain connections, flush buffers, checkpoint state. If your app takes 3+ minutes to shut down, Spot tasks will be force-killed mid-flight. For CI/CD runners and stateless workers this is fine; for anything with in-flight state, On-Demand is the safer choice. For more on Spot savings strategy, see &lt;a href="https://dev.to/blog/ecs-fargate-cost-optimization/"&gt;how to cut ECS Fargate costs by 65%&lt;/a&gt;.&lt;/p&gt;
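
&lt;p&gt;The shutdown window a container actually gets is set by &lt;code&gt;stopTimeout&lt;/code&gt; in its container definition, which defaults to 30 seconds and can be raised to 120 on Fargate. A minimal sketch follows; the family name, image, command, and sizes are placeholders, not a recommended configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;resource "aws_ecs_task_definition" "worker" {
  family                   = "worker" # placeholder name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"

  container_definitions = jsonencode([
    {
      name      = "worker"
      image     = "public.ecr.aws/docker/library/busybox:latest" # placeholder image
      essential = true
      command   = ["sleep", "3600"]

      # Seconds between SIGTERM and SIGKILL; 120 is the Fargate maximum
      stopTimeout = 120
    }
  ])
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Raising the timeout only helps if the application traps SIGTERM and starts draining immediately.&lt;/p&gt;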

&lt;h2&gt;
  
  
  Split your Fargate quota before dev takes down prod
&lt;/h2&gt;

&lt;p&gt;Fargate vCPU quota is per-region, per-account. If dev and prod share an account, they share the same quota pool. A developer running load tests against a dev environment can exhaust the regional Fargate quota — and production can't scale up during a traffic spike.&lt;/p&gt;

&lt;p&gt;AWS has no native mechanism to reserve quota for production. The default Fargate On-Demand vCPU quota is 6 vCPUs per region (a soft limit that can be increased to 10,000+ via support ticket). Dev and prod compete for the same pool.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;KEY INSIGHT:&lt;/strong&gt; Fargate quota sharing is invisible until it bites you. You won't know it happened until prod fails to scale during an incident. At that point, the fix takes hours — filing a support ticket and waiting for the quota increase to propagate. Account-level separation (prod in one account, non-prod in another) eliminates this class of incident.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fix: separate accounts for prod vs non-prod. If that's not immediately feasible, monitor quota utilization proactively. Go to &lt;strong&gt;Service Quotas → AWS Fargate → Running On-Demand Fargate vCPUs&lt;/strong&gt; in the AWS Console. Set a CloudWatch alarm at 70% utilization so you have time to react before hitting the limit. Quota increase requests can take 24–72 hours — at 70% you have days of runway; at 95% you have hours.&lt;/p&gt;
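
&lt;p&gt;The 70% alarm can be defined in Terraform using CloudWatch usage metrics and the &lt;code&gt;SERVICE_QUOTA()&lt;/code&gt; metric math function. Treat this as a sketch: the dimension values below are assumptions that you should confirm against the &lt;code&gt;AWS/Usage&lt;/code&gt; namespace in your own account, and &lt;code&gt;alert_topic_arn&lt;/code&gt; is a hypothetical SNS topic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "alert_topic_arn" {
  type = string # hypothetical SNS topic for notifications
}

resource "aws_cloudwatch_metric_alarm" "fargate_vcpu_quota" {
  alarm_name          = "fargate-ondemand-vcpu-quota-70pct"
  alarm_description   = "Fargate On-Demand vCPU usage above 70% of the regional quota"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = 70
  alarm_actions       = [var.alert_topic_arn]

  # Raw usage: vCPUs currently running in this account and region
  metric_query {
    id          = "usage"
    return_data = false

    metric {
      namespace   = "AWS/Usage"
      metric_name = "ResourceCount"
      period      = 300
      stat        = "Maximum"

      # Assumed dimension values; verify them before relying on this alarm
      dimensions = {
        Service  = "Fargate"
        Type     = "Resource"
        Resource = "vCPU"
        Class    = "Standard/OnDemand"
      }
    }
  }

  # Usage as a percentage of the current quota
  metric_query {
    id          = "pct"
    expression  = "(usage / SERVICE_QUOTA(usage)) * 100"
    label       = "Percent of Fargate On-Demand vCPU quota in use"
    return_data = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the expression divides by the live quota, the alarm keeps working after a quota increase without a threshold change.&lt;/p&gt;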

&lt;p&gt;Two more constraints that hit at fleet scale: (1) &lt;strong&gt;Fargate launch rate&lt;/strong&gt; — 20 tasks/second sustained in older regions, 5/second in newer ones. If your scheduler tries to start 100 tasks across 10 environments simultaneously, you'll hit the throttle. Add jitter to scheduled starts. (2) &lt;strong&gt;ECS API throttle&lt;/strong&gt; — 10 burst requests/second, 1 sustained. Scripts that poll &lt;code&gt;DescribeServices&lt;/code&gt; across 50 services will get rate-limited. Add exponential backoff and batch calls.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/blog/ecs-multi-environment-strategy/"&gt;ECS multi-environment strategy guide&lt;/a&gt; covers account structure patterns in detail, including when to split further and how to set up cross-account IAM for Fortem-style tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I track per-environment costs in AWS?
&lt;/h3&gt;

&lt;p&gt;Enable cost allocation tags in the Billing console and consistently tag every resource with an environment key. AWS Cost Explorer can then filter and group by environment. For task-level attribution, enable Split Cost Allocation Data for ECS, which attributes Fargate spend per task in the Cost and Usage Report; the data arrives with the usual billing delay rather than in real time.&lt;/p&gt;
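
&lt;p&gt;Consistent tagging is easiest to enforce at the provider level. A minimal sketch, assuming &lt;code&gt;Environment&lt;/code&gt; and &lt;code&gt;Team&lt;/code&gt; as the tag keys and two illustrative variables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;variable "envname" {
  type = string
}

variable "team" {
  type = string
}

provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = {
      Environment = var.envname
      Team        = var.team
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The tag keys still have to be activated as cost allocation tags in the Billing console before Cost Explorer can group by them. Tags on a service do not reach its tasks unless you also set &lt;code&gt;propagate_tags&lt;/code&gt; on &lt;code&gt;aws_ecs_service&lt;/code&gt;.&lt;/p&gt;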

&lt;h3&gt;
  
  
  Should I use one ECS cluster or one per environment?
&lt;/h3&gt;

&lt;p&gt;One cluster per environment is the right default. It keeps IAM boundaries clean, makes Cost Explorer filtering simple, and prevents quota sharing between environments. Shared clusters only make sense if you have strict naming discipline and no compliance requirements for isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the fastest way to start saving on ECS Fargate costs?
&lt;/h3&gt;

&lt;p&gt;Schedule non-prod environments to run only during business hours. This cuts compute spend by 60-70% without changing any code. Next: set CloudWatch log retention to 30 days for dev/staging. Third: use Fargate Spot for CI/CD runners and staging workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I prevent developers from leaving environments running 24/7?
&lt;/h3&gt;

&lt;p&gt;Use automated scheduling with a developer override. Set a default schedule (e.g. Mon-Fri 9am-7pm) and let developers extend it via a Slack command or a simple self-service UI. The AWS-native route requires maintaining EventBridge rules per service; at 10+ environments this becomes a maintenance burden.&lt;/p&gt;
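&lt;p&gt;The decision logic behind "default schedule plus override" is small. A minimal sketch, assuming the override is stored as an expiry timestamp (for example in a tag or a small table); &lt;code&gt;should_run&lt;/code&gt; and its parameters are hypothetical names, not part of any AWS API.&lt;/p&gt;

```python
from datetime import datetime


def should_run(now, override_until=None,
               workdays=range(0, 5), hours=range(9, 19)):
    """Return True if the environment should be up at `now`.

    The default window is Mon-Fri, 9am up to (not including) 7pm.
    A developer override keeps the environment up until it expires."""
    if override_until is not None and override_until > now:
        return True
    return now.weekday() in workdays and now.hour in hours


saturday = datetime(2026, 6, 13, 10, 0)
print(should_run(saturday))                                   # False
print(should_run(saturday, datetime(2026, 6, 13, 18, 0)))     # True
```

&lt;p&gt;A scheduler that evaluates this check before each stop leaves extended environments alone, and the override cleans itself up once the timestamp passes.&lt;/p&gt;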

&lt;h3&gt;
  
  
  Does Fortem replace Terraform?
&lt;/h3&gt;

&lt;p&gt;No. Fortem manages your running ECS environments — scheduling, monitoring, cloning — but does not replace your IaC tooling. Your Terraform continues to define infrastructure. Fortem operates on top of what Terraform builds.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;See your fleet cost:&lt;/strong&gt; &lt;a href="https://fortem.dev/ecs-cost-calculator" rel="noopener noreferrer"&gt;fortem.dev/ecs-cost-calculator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>fargate</category>
      <category>devops</category>
    </item>
    <item>
      <title>ECS Environment Scheduling: The Complete Guide</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:57:41 +0000</pubDate>
      <link>https://dev.to/dspv/ecs-environment-scheduling-the-complete-guide-295b</link>
      <guid>https://dev.to/dspv/ecs-environment-scheduling-the-complete-guide-295b</guid>
      <description>&lt;h1&gt;
  
  
  How Do You Stop Paying for Idle ECS Environments?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://fortem.dev/blog/ecs-environment-scheduling" rel="noopener noreferrer"&gt;https://fortem.dev/blog/ecs-environment-scheduling&lt;/a&gt;&lt;br&gt;
Stop paying for ECS dev and staging compute when nobody's using it. Every scheduling approach — AWS-native options, trade-offs, and what teams at fleet scale actually do.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Your dev and staging ECS environments run 168 hours a week. Your team works 40. The other 128 hours are pure waste. This guide covers every approach to scheduling ECS environments — from AWS-native options you can set up today to what actually works when you're managing 20+ environments across multiple accounts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The math: what you're actually paying
&lt;/h2&gt;

&lt;p&gt;A typical dev environment on ECS Fargate (8 services, 0.5 vCPU and 1GB memory each) costs around &lt;strong&gt;$144/month&lt;/strong&gt; running 24/7. That's $1,728/year for one environment that your developers use 50 hours a week at most.&lt;/p&gt;

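&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Schedule&lt;/th&gt;
&lt;th&gt;Hours/week&lt;/th&gt;
&lt;th&gt;Hours/month&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;24/7 (current)&lt;/td&gt;
&lt;td&gt;168&lt;/td&gt;
&lt;td&gt;730&lt;/td&gt;
&lt;td&gt;$144&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mon–Fri 9am–7pm&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;217&lt;/td&gt;
&lt;td&gt;$43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mon–Fri 8am–8pm&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;260&lt;/td&gt;
&lt;td&gt;$51&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mon–Sun 8am–10pm&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;td&gt;425&lt;/td&gt;
&lt;td&gt;$84&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;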

&lt;p&gt;Switching one 8-service environment from 24/7 to business hours saves &lt;strong&gt;$101/month&lt;/strong&gt;. At 10 environments that's $1,010/month — $12,120/year — without changing a single line of application code.&lt;/p&gt;
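&lt;p&gt;The figures above can be reproduced with a short calculation. The per-hour rates below are an assumption (us-east-1 Linux/x86 on-demand pricing at the time of writing); substitute the current rates for your region before relying on the output.&lt;/p&gt;

```python
# Assumed Fargate on-demand rates (us-east-1, Linux/x86). Check current pricing.
VCPU_PER_HOUR = 0.04048
GB_PER_HOUR = 0.004445


def monthly_cost(services, vcpu, memory_gb, hours_per_month):
    """Fargate compute cost for `services` identical single-task services."""
    hourly = services * (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR)
    return hourly * hours_per_month


schedules = {
    "24/7": 730,
    "Mon-Fri 9am-7pm": 217,
    "Mon-Fri 8am-8pm": 260,
    "Mon-Sun 8am-10pm": 425,
}
for name, hours in schedules.items():
    print(f"{name}: ${monthly_cost(8, 0.5, 1, hours):.0f}/month")
```

&lt;p&gt;This covers task compute only. Load balancers, NAT gateways, and log storage keep billing while the services are scaled to zero.&lt;/p&gt;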

&lt;h2&gt;
  
  
  How ECS scheduling works
&lt;/h2&gt;

&lt;p&gt;ECS doesn't have a native "scheduled environment" concept. What you're actually doing is setting the &lt;strong&gt;desired count&lt;/strong&gt; of each ECS service to 0 on a schedule (stop) and back to its normal value on another schedule (start).&lt;/p&gt;

&lt;p&gt;When desired count hits 0, ECS drains existing tasks and stops billing for vCPU and memory. Your service definition, load balancer, security groups, and networking remain intact. The environment is "off" — not deleted. Starting it is setting desired count back to 1 (or whatever your normal value is).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key principle — You pay for running tasks, not for service definitions.&lt;/strong&gt; Desired count = 0 means no tasks running means no Fargate billing. The service configuration costs nothing — only the compute does.&lt;/p&gt;

&lt;p&gt;Ready to use — copy this today&lt;/p&gt;

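&lt;p&gt;This EventBridge + Lambda setup stops and starts an ECS service on a schedule. Set the &lt;code&gt;CLUSTER_NAME&lt;/code&gt;, &lt;code&gt;SERVICE_NAME&lt;/code&gt;, and &lt;code&gt;NORMAL_COUNT&lt;/code&gt; environment variables on the function and deploy it in the same region as the cluster. It works today with zero additional tools.&lt;/p&gt;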


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;ecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ecs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;CLUSTER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;SERVICE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SERVICE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_desired_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CLUSTER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SERVICE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;desiredCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Set &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; desired count to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# "stop" or "start"
&lt;/span&gt;    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NORMAL_COUNT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;set_desired_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy this as a Lambda, then create two EventBridge Scheduler schedules: one that invokes it with &lt;code&gt;{ "action": "stop" }&lt;/code&gt; on weekdays at 7pm, another with &lt;code&gt;{ "action": "start" }&lt;/code&gt; at 9am Mon–Fri. Total cost: zero beyond the Lambda invocations.&lt;/p&gt;
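&lt;p&gt;The two schedules can be created from code as well as from the console. A sketch under stated assumptions: the schedule names &lt;code&gt;dev-stop&lt;/code&gt; and &lt;code&gt;dev-start&lt;/code&gt; are placeholders, and you supply the Lambda ARN plus an IAM role that EventBridge Scheduler is allowed to assume to invoke it.&lt;/p&gt;

```python
import json


def schedule_request(name, cron, action, lambda_arn, role_arn, timezone="UTC"):
    """Build keyword arguments for EventBridge Scheduler's CreateSchedule."""
    return {
        "Name": name,
        "ScheduleExpression": f"cron({cron})",
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,  # role Scheduler assumes to invoke the Lambda
            "Input": json.dumps({"action": action}),
        },
    }


def create_schedules(lambda_arn, role_arn, timezone="UTC"):
    """Create the weekday stop (7pm) and start (9am) schedules."""
    import boto3  # lazy import: only needed when actually calling AWS
    scheduler = boto3.client("scheduler")
    scheduler.create_schedule(**schedule_request(
        "dev-stop", "0 19 ? * MON-FRI *", "stop",
        lambda_arn, role_arn, timezone))
    scheduler.create_schedule(**schedule_request(
        "dev-start", "0 9 ? * MON-FRI *", "start",
        lambda_arn, role_arn, timezone))
```

&lt;p&gt;Passing an IANA zone such as &lt;code&gt;Europe/Berlin&lt;/code&gt; as &lt;code&gt;timezone&lt;/code&gt; makes the cron expression follow local time for that zone.&lt;/p&gt;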

&lt;h2&gt;
  
  
  Option 1: Application Auto Scaling scheduled actions
&lt;/h2&gt;

&lt;p&gt;Best for: 1–3 environments, simple schedules&lt;/p&gt;

&lt;p&gt;Application Auto Scaling supports scheduled scaling actions on ECS services. You define a cron expression and a min/max/desired capacity. AWS handles the rest — no Lambda, no EventBridge rules to manage.&lt;/p&gt;

&lt;p&gt;Register your ECS service as a scalable target, then create two scheduled actions — one to stop (desired = 0) and one to start (desired = your normal count):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Register the service as a scalable target&lt;/span&gt;
aws application-autoscaling register-scalable-target &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-namespace&lt;/span&gt; ecs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-id&lt;/span&gt; service/my-cluster/my-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scalable-dimension&lt;/span&gt; ecs:service:DesiredCount &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-capacity&lt;/span&gt; 0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-capacity&lt;/span&gt; 3

&lt;span class="c"&gt;# Stop at 7pm UTC (Mon–Fri)&lt;/span&gt;
aws application-autoscaling put-scheduled-action &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-namespace&lt;/span&gt; ecs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-id&lt;/span&gt; service/my-cluster/my-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scalable-dimension&lt;/span&gt; ecs:service:DesiredCount &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scheduled-action-name&lt;/span&gt; stop-evenings &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--schedule&lt;/span&gt; &lt;span class="s2"&gt;"cron(0 19 ? * MON-FRI *)"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scalable-target-action&lt;/span&gt; &lt;span class="nv"&gt;MinCapacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0,MaxCapacity&lt;span class="o"&gt;=&lt;/span&gt;0

&lt;span class="c"&gt;# Start at 8am UTC (Mon–Fri)&lt;/span&gt;
aws application-autoscaling put-scheduled-action &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-namespace&lt;/span&gt; ecs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-id&lt;/span&gt; service/my-cluster/my-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scalable-dimension&lt;/span&gt; ecs:service:DesiredCount &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scheduled-action-name&lt;/span&gt; start-mornings &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--schedule&lt;/span&gt; &lt;span class="s2"&gt;"cron(0 8 ? * MON-FRI *)"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scalable-target-action&lt;/span&gt; &lt;span class="nv"&gt;MinCapacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1,MaxCapacity&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Limitations&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One command per service: 8 services × 2 actions = 16 CLI calls per environment&lt;/li&gt;
&lt;li&gt;No concept of "environment": you schedule individual services&lt;/li&gt;
&lt;li&gt;Schedule changes require updating each service individually&lt;/li&gt;
&lt;li&gt;No visibility into scheduled state across services or environments&lt;/li&gt;
&lt;/ul&gt;
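&lt;p&gt;The per-service fan-out is easy to see when the same calls are generated in a loop. This sketch assumes eight hypothetical services named &lt;code&gt;svc-1&lt;/code&gt; to &lt;code&gt;svc-8&lt;/code&gt; in &lt;code&gt;my-cluster&lt;/code&gt;, each already registered as a scalable target.&lt;/p&gt;

```python
def scheduled_action_requests(cluster, services, stop_cron, start_cron,
                              normal_count=1, max_count=3):
    """Build one stop and one start PutScheduledAction request per service."""
    requests = []
    for service in services:
        common = {
            "ServiceNamespace": "ecs",
            "ResourceId": f"service/{cluster}/{service}",
            "ScalableDimension": "ecs:service:DesiredCount",
        }
        requests.append({
            **common,
            "ScheduledActionName": f"{service}-stop",
            "Schedule": f"cron({stop_cron})",
            "ScalableTargetAction": {"MinCapacity": 0, "MaxCapacity": 0},
        })
        requests.append({
            **common,
            "ScheduledActionName": f"{service}-start",
            "Schedule": f"cron({start_cron})",
            "ScalableTargetAction": {"MinCapacity": normal_count,
                                     "MaxCapacity": max_count},
        })
    return requests


def apply_requests(requests):
    """Send every request to Application Auto Scaling."""
    import boto3  # lazy import: only needed when actually calling AWS
    client = boto3.client("application-autoscaling")
    for request in requests:
        client.put_scheduled_action(**request)


services = [f"svc-{n}" for n in range(1, 9)]  # eight hypothetical services
requests = scheduled_action_requests(
    "my-cluster", services, "0 19 ? * MON-FRI *", "0 8 ? * MON-FRI *")
print(len(requests))  # 16
```

&lt;p&gt;A loop removes the typing, but not the underlying model: the schedule still lives on 16 separate objects that have to be kept in sync.&lt;/p&gt;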

&lt;h2&gt;
  
  
  Option 2: EventBridge Scheduler + Lambda
&lt;/h2&gt;

&lt;p&gt;Best for: multiple environments, custom logic, per-timezone schedules&lt;/p&gt;

&lt;p&gt;EventBridge Scheduler triggers a Lambda function on a cron schedule. The Lambda iterates over all services in an environment (identified by a tag) and sets their desired count. This is the most flexible AWS-native approach — you can handle timezones, environment grouping, and custom logic.&lt;/p&gt;

&lt;p&gt;The Lambda function itself is straightforward — iterate over tagged services and update desired count:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;ecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ecs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;desired_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;desired_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# 0 to stop, 1 to start
&lt;/span&gt;    &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;env_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;environment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;          &lt;span class="c1"&gt;# e.g. "staging"
&lt;/span&gt;
    &lt;span class="c1"&gt;# List all services in the cluster
&lt;/span&gt;    &lt;span class="n"&gt;paginator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_paginator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;list_services&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;paginator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;paginate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;arn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;serviceArns&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="c1"&gt;# Describe to get tags
&lt;/span&gt;            &lt;span class="n"&gt;svc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe_services&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TAGS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;services&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;svc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])}&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Environment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;env_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;svc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;desiredCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;desired_count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# Store current count before stopping
&lt;/span&gt;                    &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tag_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;resourceArn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ScheduledDesiredCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)}]&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;desiredCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# Restore previous count
&lt;/span&gt;                    &lt;span class="n"&gt;restore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ScheduledDesiredCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;desiredCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;restore&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create two EventBridge Scheduler schedules, one for stop and one for start, each passing &lt;code&gt;cluster&lt;/code&gt;, &lt;code&gt;environment&lt;/code&gt;, and the appropriate &lt;code&gt;desired_count&lt;/code&gt; in the input.&lt;/p&gt;
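&lt;p&gt;One refinement worth considering: the function above calls &lt;code&gt;DescribeServices&lt;/code&gt; once per service, while the API accepts up to 10 services per call. A sketch of the batched lookup, with &lt;code&gt;chunked&lt;/code&gt; and &lt;code&gt;describe_all&lt;/code&gt; as hypothetical helper names:&lt;/p&gt;

```python
def chunked(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def describe_all(ecs, cluster, service_arns):
    """Describe services in batches of 10, the DescribeServices per-call limit.

    `ecs` is a boto3 ECS client (or anything with the same method)."""
    services = []
    for batch in chunked(service_arns, 10):
        response = ecs.describe_services(
            cluster=cluster, services=batch, include=["TAGS"])
        services.extend(response["services"])
    return services
```

&lt;p&gt;For a cluster with 50 services this turns 50 describe calls into 5, which leaves more headroom under the API throttle.&lt;/p&gt;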

&lt;p&gt;What this doesn't solve&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No UI: schedule changes require code or CLI changes&lt;/li&gt;
&lt;li&gt;Per-timezone logic gets complex fast (US-east vs EU-west teams)&lt;/li&gt;
&lt;li&gt;Error handling and alerting on failed starts is your problem&lt;/li&gt;
&lt;li&gt;At 10+ environments, you're maintaining a scheduling system, not using one&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 3: Terraform-managed schedules
&lt;/h2&gt;

&lt;p&gt;Best for: teams with strong Terraform discipline and few environments&lt;/p&gt;

&lt;p&gt;You can manage scheduled scaling actions directly in Terraform using the &lt;code&gt;aws_appautoscaling_scheduled_action&lt;/code&gt; resource. This keeps scheduling configuration version-controlled alongside your infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_appautoscaling_target"&lt;/span&gt; &lt;span class="s2"&gt;"ecs_target"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;service_namespace&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs"&lt;/span&gt;
  &lt;span class="nx"&gt;resource_id&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"service/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cluster_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;scalable_dimension&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs:service:DesiredCount"&lt;/span&gt;
  &lt;span class="nx"&gt;min_capacity&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;max_capacity&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max_capacity&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_appautoscaling_scheduled_action"&lt;/span&gt; &lt;span class="s2"&gt;"stop"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-stop"&lt;/span&gt;
  &lt;span class="nx"&gt;service_namespace&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs"&lt;/span&gt;
  &lt;span class="nx"&gt;resource_id&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_appautoscaling_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resource_id&lt;/span&gt;
  &lt;span class="nx"&gt;scalable_dimension&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_appautoscaling_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scalable_dimension&lt;/span&gt;
  &lt;span class="nx"&gt;schedule&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cron(0 19 ? * MON-FRI *)"&lt;/span&gt;

  &lt;span class="nx"&gt;scalable_target_action&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;min_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;max_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_appautoscaling_scheduled_action"&lt;/span&gt; &lt;span class="s2"&gt;"start"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-start"&lt;/span&gt;
  &lt;span class="nx"&gt;service_namespace&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs"&lt;/span&gt;
  &lt;span class="nx"&gt;resource_id&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_appautoscaling_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resource_id&lt;/span&gt;
  &lt;span class="nx"&gt;scalable_dimension&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_appautoscaling_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scalable_dimension&lt;/span&gt;
  &lt;span class="nx"&gt;schedule&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cron(0 8 ? * MON-FRI *)"&lt;/span&gt;

  &lt;span class="nx"&gt;scalable_target_action&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;min_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;desired_count&lt;/span&gt;
    &lt;span class="nx"&gt;max_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max_capacity&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean and auditable — but it still operates at the service level. Changing a schedule for an environment with 8 services means updating 8 Terraform resources and running apply. For teams where schedules change rarely, this is fine. For teams where developers want to adjust their own environment hours, it becomes a bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks at fleet scale
&lt;/h2&gt;

&lt;p&gt;Every approach above works at 1–3 environments. Here's what teams discover when they try to scale it to 15–50 environments across multiple AWS accounts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ Per-service configuration doesn't scale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At 20 environments × 8 services, you have 160 individual Auto Scaling targets to manage. A schedule change for one environment touches 8 resources. A timezone change for one team requires finding and updating those 8 resources across potentially multiple accounts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ No environment-level visibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;None of the AWS-native approaches give you a view of 'which environments are running, which are scheduled, and what their current cost is.' You're looking at individual services in CloudWatch and Cost Explorer, not environments as units.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ Timezone complexity multiplies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EU teams want environments to stop at 18:00 Central European time. US East teams want 19:00 Eastern. US West teams want 19:00 Pacific. Each requires its own cron expressions, and because EventBridge rules evaluate cron in UTC, those expressions have to be adjusted whenever DST starts or ends. A single Lambda managing this across 20 environments becomes a meaningful maintenance burden.&lt;/p&gt;
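
&lt;p&gt;For the Application Auto Scaling approach specifically, the scheduled action accepts a &lt;code&gt;timezone&lt;/code&gt; argument with an IANA zone name. The cron expression can then be written in local wall-clock time, and DST shifts are handled by the service. The sketch below assumes the &lt;code&gt;aws_appautoscaling_target.ecs_target&lt;/code&gt; resource from the Terraform example above. The &lt;code&gt;schedule_timezone&lt;/code&gt; variable and the action name are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# Sketch only: the variable name and default are illustrative assumptions.
variable "schedule_timezone" {
  type    = string
  default = "Europe/Berlin" # IANA name; covers both CET and CEST
}

# Scale the service to zero at 18:00 local time on weekdays.
resource "aws_appautoscaling_scheduled_action" "stop" {
  name               = "stop-evening"
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  schedule           = "cron(0 18 ? * MON-FRI *)"
  timezone           = var.schedule_timezone

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A US East environment would set &lt;code&gt;America/New_York&lt;/code&gt; and a US West one &lt;code&gt;America/Los_Angeles&lt;/code&gt;. This removes the DST arithmetic, but each team still needs its own value, and it does not help the EventBridge rule + Lambda approach.&lt;/p&gt;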

&lt;p&gt;&lt;strong&gt;✗ Developer self-service breaks down&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Developers occasionally want to override their environment's schedule to stay late during a sprint or work a weekend. In every AWS-native approach, that override requires console access or help from a platform engineer. The friction is high enough that teams just leave environments running 24/7 to avoid the hassle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✗ Failed starts are silent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If an ECS service fails to start after a scheduled start (image pull error, IAM issue, resource limits), the EventBridge rule fires, Lambda runs, desired count updates — but nobody knows the environment didn't come up. You need separate health checking and alerting to catch this.&lt;/p&gt;
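
&lt;p&gt;One possible building block for that alerting is an EventBridge rule that matches the service action events ECS emits when a service cannot start or place tasks, and forwards them to an SNS topic. This is a sketch under assumptions, not a complete health check. The resource and topic names are hypothetical, and the rule matches every ECS service in the account and region rather than only scheduled environments.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# Sketch only: resource and topic names are illustrative assumptions.
resource "aws_sns_topic" "env_start_alerts" {
  name = "env-start-alerts"
}

# Match ECS service action events that signal tasks failing to start.
resource "aws_cloudwatch_event_rule" "ecs_start_failures" {
  name        = "ecs-start-failures"
  description = "ECS services that cannot start or place tasks"

  event_pattern = jsonencode({
    source        = ["aws.ecs"]
    "detail-type" = ["ECS Service Action"]
    detail = {
      eventName = [
        "SERVICE_TASK_START_IMPAIRED",
        "SERVICE_TASK_PLACEMENT_FAILURE",
      ]
    }
  })
}

# Forward matching events to the SNS topic.
resource "aws_cloudwatch_event_target" "notify" {
  rule = aws_cloudwatch_event_rule.ecs_start_failures.name
  arn  = aws_sns_topic.env_start_alerts.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The topic also needs a policy that lets &lt;code&gt;events.amazonaws.com&lt;/code&gt; publish to it, omitted here for brevity. This catches services that cannot start tasks, but not tasks that start and then fail health checks, so a target-group health alarm is still worth having.&lt;/p&gt;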

&lt;p&gt;&lt;strong&gt;The pattern we see&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teams start with EventBridge + Lambda at 3 environments. By 10 environments they're spending 2–4 hours a month maintaining the scheduling system. By 20 environments they've either given up and gone back to 24/7, or a platform engineer owns a growing codebase that does nothing except stop and start ECS services on a schedule.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to track
&lt;/h2&gt;

&lt;p&gt;Regardless of which approach you use, these are the metrics worth monitoring:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Baseline vs. actual spend per environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tag all ECS services with an Environment tag, activate it as a cost allocation tag, and group by it in Cost Explorer. Baseline = what you'd pay running 24/7. Actual = what you paid. The delta is your scheduling savings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schedule adherence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Watch each ECS service's desired count in CloudWatch (Container Insights publishes it as DesiredTaskCount). If an environment should be at 0 from 19:00–08:00 but the desired count is 1, your schedule isn't firing. Set an alarm on a non-zero desired count during expected off-hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start latency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The time from the scheduled start until every service is healthy, meaning the ECS running task count equals the desired count and the target group's healthy host count matches it. Anything over 3 minutes warrants investigation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failed starts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An ECS StoppedTaskCount that increases after a scheduled start usually means image pull errors or resource exhaustion. Set a CloudWatch alarm on StoppedTaskCount &amp;gt; 0 for environments inside their scheduled-start window.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;See your scheduling savings:&lt;/strong&gt; &lt;a href="https://fortem.dev/ecs-cost-calculator" rel="noopener noreferrer"&gt;fortem.dev/ecs-cost-calculator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>fargate</category>
      <category>scheduling</category>
    </item>
  </channel>
</rss>
