<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rick Wise</title>
    <description>The latest articles on DEV Community by Rick Wise (@cloudwiseteam).</description>
    <link>https://dev.to/cloudwiseteam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3582447%2Fe7a88946-c7a3-4aad-9242-6d52380c09f1.png</url>
      <title>DEV Community: Rick Wise</title>
      <link>https://dev.to/cloudwiseteam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cloudwiseteam"/>
    <language>en</language>
    <item>
      <title>ElastiCache Pricing Breakdown: Where the Money Actually Goes</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 16 Apr 2026 14:29:01 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/elasticache-pricing-breakdown-where-the-money-actually-goes-1jc5</link>
      <guid>https://dev.to/cloudwiseteam/elasticache-pricing-breakdown-where-the-money-actually-goes-1jc5</guid>
      <description>&lt;p&gt;ElastiCache looks straightforward on the bill. You pick a node type, maybe add a replica for high availability, and move on. Then the invoice arrives and the number is bigger than the mental math suggested.&lt;/p&gt;

&lt;p&gt;The gap usually comes from one of five places: engine choice, replication topology, extended support surcharges, idle clusters, or oversized nodes nobody ever right-sized. Let's break down exactly how ElastiCache charges — and where teams get surprised.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Engines, Three Price Points
&lt;/h2&gt;

&lt;p&gt;ElastiCache supports three engines: Valkey, Redis OSS, and Memcached. They don't cost the same.&lt;/p&gt;

&lt;p&gt;Valkey is &lt;strong&gt;20% cheaper&lt;/strong&gt; than Redis OSS and Memcached for node-based clusters, and &lt;strong&gt;33% cheaper&lt;/strong&gt; on ElastiCache Serverless. This isn't a promotional rate — it's the permanent pricing structure AWS launched with Valkey.&lt;/p&gt;

&lt;p&gt;For context, a cache.r7g.xlarge in us-east-1:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Hourly Rate&lt;/th&gt;
&lt;th&gt;Monthly (730 hrs)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Valkey&lt;/td&gt;
&lt;td&gt;$0.3496&lt;/td&gt;
&lt;td&gt;~$255&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redis OSS&lt;/td&gt;
&lt;td&gt;$0.437&lt;/td&gt;
&lt;td&gt;~$319&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memcached&lt;/td&gt;
&lt;td&gt;$0.437&lt;/td&gt;
&lt;td&gt;~$319&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Prices shown for us-east-1, On-Demand.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's a $64/month difference per node on a single instance type. Multiply that across a 12-node cluster and you're looking at roughly $766/month — just from engine choice. If you're running Redis OSS and don't need Redis-specific features that Valkey doesn't support, the migration saves real money.&lt;/p&gt;
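&lt;p&gt;A quick way to sanity-check that arithmetic (rates are the us-east-1 On-Demand figures from the table above):&lt;/p&gt;

```python
# Engine-choice savings for cache.r7g.xlarge, us-east-1 On-Demand.
HOURS_PER_MONTH = 730

redis_hourly = 0.437
valkey_hourly = 0.3496  # 20% below the Redis OSS rate

per_node_monthly_savings = (redis_hourly - valkey_hourly) * HOURS_PER_MONTH
cluster_monthly_savings = per_node_monthly_savings * 12  # 12-node cluster

print(f"Per node: ${per_node_monthly_savings:.2f}/month")
print(f"12-node cluster: ${cluster_monthly_savings:.2f}/month")
```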

&lt;h2&gt;
  
  
  Node-Based Pricing: You Pay Whether the Cache Is Hit or Not
&lt;/h2&gt;

&lt;p&gt;ElastiCache charges per node-hour from the moment a node is launched until it's terminated. Partial hours are billed as full hours. There is no scale-to-zero.&lt;/p&gt;

&lt;p&gt;A few common node types and what they cost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node Type&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Hourly Rate&lt;/th&gt;
&lt;th&gt;Monthly (730 hrs)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cache.t3.micro&lt;/td&gt;
&lt;td&gt;0.5 GiB&lt;/td&gt;
&lt;td&gt;$0.017&lt;/td&gt;
&lt;td&gt;~$12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache.m5.large&lt;/td&gt;
&lt;td&gt;6.38 GiB&lt;/td&gt;
&lt;td&gt;$0.156&lt;/td&gt;
&lt;td&gt;~$114&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache.r7g.xlarge&lt;/td&gt;
&lt;td&gt;26.32 GiB&lt;/td&gt;
&lt;td&gt;$0.437&lt;/td&gt;
&lt;td&gt;~$319&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache.r6g.16xlarge&lt;/td&gt;
&lt;td&gt;419.09 GiB&lt;/td&gt;
&lt;td&gt;$5.254&lt;/td&gt;
&lt;td&gt;~$3,835&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Prices shown for Redis OSS / Memcached in us-east-1, On-Demand. Valkey is 20% lower.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The important thing to internalize: a cache.t3.micro sitting idle costs the same $12/month as one handling thousands of requests per second. The meter runs on time, not usage.&lt;/p&gt;

&lt;p&gt;AWS recommends reserving 25% of a node's memory for non-data use (replication buffers, OS overhead, etc.), so the usable capacity of a cache.r7g.xlarge is roughly 19.74 GiB, not 26.32 GiB.&lt;/p&gt;
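&lt;p&gt;The 25% rule is easy to fold into capacity planning. A minimal sketch, using the memory figures quoted above:&lt;/p&gt;

```python
# Usable capacity after AWS's recommended 25% memory reservation
# (replication buffers, OS overhead). Node sizes from the table above.
def usable_gib(total_gib, reserved_fraction=0.25):
    """Return the memory actually available for cached data."""
    return total_gib * (1 - reserved_fraction)

print(f"cache.r7g.xlarge:    {usable_gib(26.32):.2f} GiB usable")
print(f"cache.r6g.16xlarge: {usable_gib(419.09):.2f} GiB usable")
```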

&lt;h2&gt;
  
  
  Replication Multiplies the Bill
&lt;/h2&gt;

&lt;p&gt;Most production deployments use replication for high availability. With Redis OSS or Valkey, you configure a replication group with a primary node and one or more replica nodes per shard.&lt;/p&gt;

&lt;p&gt;Every replica is a full node charged at the same hourly rate.&lt;/p&gt;

&lt;p&gt;A three-shard cluster with one replica per shard using cache.r7g.xlarge (Valkey):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3 shards × 2 nodes per shard = 6 nodes
6 × $0.3496/hr = $2.10/hr → ~$1,531/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a second replica for read scaling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3 shards × 3 nodes per shard = 9 nodes
9 × $0.3496/hr = $3.15/hr → ~$2,297/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus, multi-AZ replication generates cross-AZ data transfer at $0.01/GiB in each direction. For a high-throughput cache doing 100,000 requests/second with 500-byte objects, that's roughly 167 GiB/hour of traffic. If 50% crosses AZ boundaries, that's an extra $0.84/hour — about $613/month in data transfer alone.&lt;/p&gt;
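&lt;p&gt;That estimate is straightforward to reproduce. A rough sketch of the same workload (the 50% cross-AZ fraction is the assumption from above):&lt;/p&gt;

```python
# Rough cross-AZ transfer estimate: 100,000 req/s, 500-byte objects,
# half the traffic crossing AZ boundaries.
GIB = 1024 ** 3

req_per_sec = 100_000
bytes_per_req = 500
cross_az_fraction = 0.5
rate_per_gib = 0.01  # $/GiB, charged on the EC2 side

gib_per_hour = req_per_sec * bytes_per_req * 3600 / GIB
cost_per_hour = gib_per_hour * cross_az_fraction * rate_per_gib
print(f"{gib_per_hour:.1f} GiB/hr, ${cost_per_hour:.2f}/hr, ~${cost_per_hour * 730:.0f}/month")
```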

&lt;p&gt;Teams often enable multi-AZ replication on dev and staging environments where a single node would be fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serverless: Simpler, But Not Always Cheaper
&lt;/h2&gt;

&lt;p&gt;ElastiCache Serverless removes the node sizing decision entirely. You pay for two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data stored&lt;/strong&gt; — billed in GB-hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ElastiCache Processing Units (ECPUs)&lt;/strong&gt; — a unit combining vCPU time and data transferred&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Valkey&lt;/th&gt;
&lt;th&gt;Redis OSS&lt;/th&gt;
&lt;th&gt;Memcached&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data storage&lt;/td&gt;
&lt;td&gt;$0.084/GB-hr&lt;/td&gt;
&lt;td&gt;$0.125/GB-hr&lt;/td&gt;
&lt;td&gt;$0.125/GB-hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECPUs&lt;/td&gt;
&lt;td&gt;$0.0023/M&lt;/td&gt;
&lt;td&gt;$0.0034/M&lt;/td&gt;
&lt;td&gt;$0.0034/M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum data stored&lt;/td&gt;
&lt;td&gt;100 MB&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Prices shown for us-east-1.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A simple GET or SET transferring under 1 KB consumes 1 ECPU. A command transferring 3.2 KB consumes 3.2 ECPUs. Commands that use more vCPU time (like SORT or ZADD) consume proportionally more.&lt;/p&gt;
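&lt;p&gt;For back-of-envelope planning, the transfer-based part of the ECPU rule can be sketched as follows. This models data transfer only and ignores the vCPU-time component, so treat it as a lower bound:&lt;/p&gt;

```python
# Transfer-based ECPU estimate, per the rule above: 1 ECPU per KB
# transferred, with a 1 ECPU minimum per command. vCPU-heavy commands
# consume proportionally more, which this sketch does not model.
def ecpus_for_command(kb_transferred):
    return max(1.0, kb_transferred)

def monthly_ecpu_cost(requests_per_sec, kb_per_request, price_per_million=0.0023):
    """Steady-rate monthly cost; default price is the Valkey rate above."""
    ecpus = ecpus_for_command(kb_per_request) * requests_per_sec * 3600 * 730
    return ecpus / 1_000_000 * price_per_million

print(ecpus_for_command(0.8))  # small GET: 1.0 (the minimum)
print(ecpus_for_command(3.2))  # 3.2 KB transfer: 3.2
print(f"1,000 req/s of 1 KB commands: ${monthly_ecpu_cost(1000, 1.0):.2f}/month")
```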

&lt;p&gt;Serverless can be cheaper for spiky workloads because you don't over-provision for peaks. But for stable, high-throughput workloads, node-based clusters are often significantly cheaper. One of AWS's published pricing examples shows a spiky workload costing $2.92/hour serverless vs. $5.66/hour on-demand nodes — but for steady traffic, the math can flip the other way.&lt;/p&gt;

&lt;p&gt;The minimum charge matters too. A Serverless cache for Redis OSS or Memcached is metered for at least 1 GB of data stored — roughly $91/month minimum even if you're storing almost nothing. Valkey's 100 MB minimum brings that floor down to about $6/month.&lt;/p&gt;
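&lt;p&gt;The floors fall straight out of the storage rates and minimums in the table above:&lt;/p&gt;

```python
# Monthly floor from the minimum metered data stored, per engine
# (rates and minimums from the Serverless table above).
def monthly_floor(min_gb, rate_per_gb_hr):
    return min_gb * rate_per_gb_hr * 730

print(f"Redis OSS / Memcached floor: ${monthly_floor(1.0, 0.125):.2f}/month")
print(f"Valkey floor:                ${monthly_floor(0.1, 0.084):.2f}/month")
```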

&lt;h2&gt;
  
  
  Extended Support: The Surcharge Nobody Budgets For
&lt;/h2&gt;

&lt;p&gt;When a Redis OSS or Memcached engine version reaches end-of-life, AWS continues providing security patches through Extended Support — at a steep premium.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Period&lt;/th&gt;
&lt;th&gt;Surcharge&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Year 1–2 after EOL&lt;/td&gt;
&lt;td&gt;80% premium on node-hour rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Year 3 after EOL&lt;/td&gt;
&lt;td&gt;160% premium on node-hour rate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A cache.m5.large running Redis 5 (EOL January 31, 2026) at $0.156/hour becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Year 1–2:&lt;/strong&gt; $0.156 + ($0.156 × 80%) = &lt;strong&gt;$0.281/hour&lt;/strong&gt; (~$205/month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Year 3:&lt;/strong&gt; $0.156 + ($0.156 × 160%) = &lt;strong&gt;$0.406/hour&lt;/strong&gt; (~$296/month)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's nearly triple the base cost by year three. Teams that don't track engine versions can drift into Extended Support without realizing their bill just jumped 80%.&lt;/p&gt;
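&lt;p&gt;The surcharge math generalizes to any node type. A small sketch using the cache.m5.large figures above:&lt;/p&gt;

```python
# Extended Support surcharge on top of the base node-hour rate.
def extended_support_rate(base_hourly, premium):
    return base_hourly * (1 + premium)

base = 0.156  # cache.m5.large, us-east-1 On-Demand
year_1_2 = extended_support_rate(base, 0.80)  # 80% premium
year_3 = extended_support_rate(base, 1.60)    # 160% premium

print(f"Year 1-2: ${year_1_2:.3f}/hr (~${year_1_2 * 730:.0f}/month)")
print(f"Year 3:   ${year_3:.3f}/hr (~${year_3 * 730:.0f}/month)")
```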

&lt;h2&gt;
  
  
  Backup Storage and Data Transfer
&lt;/h2&gt;

&lt;p&gt;Two cost categories that don't appear under the main "ElastiCache" line:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backup storage:&lt;/strong&gt; $0.085/GiB per month for all regions. No data transfer charges for creating or restoring backups. This is generally small unless you're snapshotting large clusters frequently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data transfer:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Same AZ (EC2 ↔ ElastiCache)&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-AZ (same Region)&lt;/td&gt;
&lt;td&gt;$0.01/GiB each way&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-Region (Global Datastore)&lt;/td&gt;
&lt;td&gt;$0.02/GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The cross-AZ charge is easy to miss because it shows up as EC2 data transfer on the bill, not ElastiCache. You're only charged for the EC2 side — there's no ElastiCache data transfer charge for traffic in or out of the node itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Tiering: The Cost Saver Most Teams Don't Know About
&lt;/h2&gt;

&lt;p&gt;R6gd nodes combine memory and NVMe SSD, automatically moving least-frequently-accessed data to SSD. You get nearly 5× the total storage capacity compared to memory-only R6g nodes.&lt;/p&gt;

&lt;p&gt;AWS's example: a 1 TiB dataset needs 1 cache.r6gd.16xlarge node ($9.98/hour) vs. 4 cache.r6g.16xlarge nodes ($21.01/hour) — a 52% cost reduction.&lt;/p&gt;
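&lt;p&gt;Re-running that comparison as arithmetic:&lt;/p&gt;

```python
# AWS's data-tiering example above: one r6gd node with NVMe SSD vs.
# four memory-only r6g nodes for a ~1 TiB dataset.
tiered_hourly = 9.98            # 1 x cache.r6gd.16xlarge
memory_only_hourly = 5.254 * 4  # 4 x cache.r6g.16xlarge

savings = 1 - tiered_hourly / memory_only_hourly
print(f"Memory-only: ${memory_only_hourly:.2f}/hr")
print(f"Tiered:      ${tiered_hourly:.2f}/hr ({savings * 100:.1f}% cheaper)")
```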

&lt;p&gt;The trade-off: SSD-resident data has slightly higher latency on first access. If your workload regularly accesses less than 20% of the dataset, data tiering is worth evaluating.&lt;/p&gt;

&lt;p&gt;Data tiering is not available with ElastiCache Serverless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reserved Nodes: Up to 55% Off
&lt;/h2&gt;

&lt;p&gt;If your ElastiCache usage is stable, reserved nodes offer steep discounts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Commitment&lt;/th&gt;
&lt;th&gt;Discount vs. On-Demand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1-year, No Upfront&lt;/td&gt;
&lt;td&gt;Up to 48.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year, Partial Upfront&lt;/td&gt;
&lt;td&gt;Up to 52%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year, All Upfront&lt;/td&gt;
&lt;td&gt;Up to 55%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Reserved nodes are size-flexible — you can apply the discount across different node sizes within the same family. If you buy a reservation for cache.r7g.xlarge, it can cover cache.r7g.large nodes proportionally.&lt;/p&gt;
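&lt;p&gt;Size flexibility works on normalized capacity units. The doubling factors in this sketch reflect how AWS size flexibility generally works, but they're an assumption worth verifying against the documentation for your node family:&lt;/p&gt;

```python
# Sketch of size-flexible reservation coverage via normalized units.
# The per-size factors below (each size doubles the previous) are an
# assumption, not published figures; check your family's documentation.
SIZE_FACTORS = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}

def reservation_covers(reserved_size, reserved_count, running_size):
    """How many running nodes of running_size a reservation pool covers."""
    units = SIZE_FACTORS[reserved_size] * reserved_count
    return units // SIZE_FACTORS[running_size]

# One reserved cache.r7g.xlarge covers two cache.r7g.large nodes.
print(reservation_covers("xlarge", 1, "large"))  # 2
```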

&lt;p&gt;One useful detail: Redis OSS reservations automatically apply to Valkey nodes in the same family and region. Since Valkey is 20% cheaper, you get 20% more value from existing reservations after migrating.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem: Idle Caches
&lt;/h2&gt;

&lt;p&gt;Here's what actually burns money: caches nobody is using.&lt;/p&gt;

&lt;p&gt;ElastiCache has no scale-to-zero for node-based clusters. A cache with zero hits costs exactly the same as one handling millions of requests. This is the pattern we see most often:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A team provisions a cache for a microservice, then the service is deprecated&lt;/li&gt;
&lt;li&gt;Dev/staging caches left running after the project ends&lt;/li&gt;
&lt;li&gt;A "temporary" cache for a migration that became permanent infrastructure&lt;/li&gt;
&lt;li&gt;A replicated cluster in non-production where a single node would suffice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A three-node cache.r7g.xlarge cluster running idle for a year at Valkey on-demand rates: &lt;strong&gt;$9,186 wasted&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Over-Provisioned Caches Are Nearly as Bad
&lt;/h2&gt;

&lt;p&gt;Beyond idle caches, oversized nodes are the second biggest source of waste. Teams pick a large node type during initial setup, the workload stabilizes at a fraction of capacity, and nobody revisits the sizing.&lt;/p&gt;

&lt;p&gt;A cache.r6g.xlarge running at 6% CPU with active connections is doing real work — but it's doing it on a node that's 3–4× larger than needed. Downsizing from cache.r6g.xlarge to cache.r6g.large can cut costs by 40–50% with no performance impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Spot the Waste
&lt;/h2&gt;

&lt;p&gt;Check these CloudWatch metrics for each cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CacheHits:&lt;/strong&gt; Zero for 14+ days means nothing is reading from this cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CurrConnections:&lt;/strong&gt; Zero means nothing is even connecting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EngineCPUUtilization:&lt;/strong&gt; Consistently under 10% with active connections means the node is oversized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quick CLI inventory of all your ElastiCache clusters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws elasticache describe-cache-clusters &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--show-cache-node-info&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'CacheClusters[*].{
    ClusterId:CacheClusterId,
    Engine:Engine,
    EngineVersion:EngineVersion,
    NodeType:CacheNodeType,
    NumNodes:NumCacheNodes,
    Status:CacheClusterStatus
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any of those clusters show an engine version approaching EOL, you're on the clock for an Extended Support surcharge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloudcostwise.io/" rel="noopener noreferrer"&gt;CloudWise&lt;/a&gt; detects idle ElastiCache clusters by analyzing CloudWatch cache hit metrics over 14 days, flags oversized nodes running under 10% CPU, and alerts you when clusters are approaching or already incurring Extended Support surcharges. Three detectors, one scan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloudcostwise.io/" rel="noopener noreferrer"&gt;CloudWise&lt;/a&gt; automates AWS cost analysis across 180+ waste detectors. Try it at &lt;a href="https://cloudcostwise.io/" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>elasticache</category>
      <category>redis</category>
      <category>valkey</category>
    </item>
    <item>
      <title>How Timestream Actually Bills: A Breakdown for Engineers</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:56:59 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/how-timestream-actually-bills-a-breakdown-for-engineers-56b3</link>
      <guid>https://dev.to/cloudwiseteam/how-timestream-actually-bills-a-breakdown-for-engineers-56b3</guid>
      <description>&lt;p&gt;Timestream can look simple on the bill until you break down the line items. Most teams think in terms of "stored data," but Amazon Timestream for LiveAnalytics is billed across multiple meters that move independently.&lt;/p&gt;

&lt;p&gt;If you only watch one number, you can miss where most of the spend actually comes from.&lt;/p&gt;

&lt;h2&gt;
  
  
  First: Know Which Timestream Product You Are Using
&lt;/h2&gt;

&lt;p&gt;AWS now has two Timestream offerings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Timestream for LiveAnalytics&lt;/strong&gt; (serverless, billed by writes, query compute, memory store, magnetic store)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Timestream for InfluxDB&lt;/strong&gt; (managed InfluxDB, billed by DB instance-hours and storage)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post focuses on &lt;strong&gt;Timestream for LiveAnalytics&lt;/strong&gt;, where most billing misunderstandings happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The LiveAnalytics Billing Model (What Actually Ticks)
&lt;/h2&gt;

&lt;p&gt;For Timestream for LiveAnalytics, AWS charges separately for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Writes&lt;/strong&gt;: billed by the amount of data written, rounded up to the nearest KiB, often shown in pricing examples as a per-million 1 KiB write unit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queries&lt;/strong&gt;: billed by &lt;strong&gt;Timestream Compute Units (TCUs)&lt;/strong&gt; consumed over time (TCU-hours), not by a flat per-query fee.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory store&lt;/strong&gt;: billed by &lt;strong&gt;GB-hour&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Magnetic store&lt;/strong&gt;: billed by &lt;strong&gt;GB-month&lt;/strong&gt; (with account/region minimums for magnetic storage usage).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So the statement "Timestream is $0.10/GB-month" is not accurate for LiveAnalytics: no single storage rate captures a bill that moves across four independent meters.&lt;/p&gt;
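&lt;p&gt;The four meters can be sketched as one estimator. The rates below are placeholders, not published AWS prices; substitute the current figures from the Timestream pricing page for your region:&lt;/p&gt;

```python
# Sketch of the four independent LiveAnalytics meters. The default
# rates are PLACEHOLDERS, not published AWS prices; plug in the
# current figures from the Timestream pricing page.
def liveanalytics_monthly(writes_million_1kib, tcu_hours,
                          memory_gb_avg, magnetic_gb,
                          write_rate=0.50, tcu_rate=0.55,
                          memory_rate=0.036, magnetic_rate=0.03):
    return (writes_million_1kib * write_rate          # writes
            + tcu_hours * tcu_rate                    # query compute
            + memory_gb_avg * 730 * memory_rate       # memory store, GB-hours
            + magnetic_gb * magnetic_rate)            # magnetic store, GB-month

# Hypothetical workload: 100M 1 KiB writes, 200 TCU-hours,
# 50 GB average in memory store, 2 TB in magnetic store.
print(f"${liveanalytics_monthly(100, 200, 50, 2000):.2f}/month")
```

&lt;p&gt;Even with placeholder rates, the shape of the result is the point: memory-store retention usually dominates long before the "storage" line you were watching does.&lt;/p&gt;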

&lt;h2&gt;
  
  
  Why Teams Get Surprised
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Query charges are compute-time based, not per query count
&lt;/h3&gt;

&lt;p&gt;A dashboard running one heavy query every few seconds can cost more than many lightweight queries. Query cost follows TCU consumption and duration.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Memory retention is expensive relative to magnetic retention
&lt;/h3&gt;

&lt;p&gt;Keeping a long retention period in memory store drives GB-hour charges. Moving older data to magnetic store usually lowers storage cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) High write frequency amplifies ingestion cost fast
&lt;/h3&gt;

&lt;p&gt;Small records at high frequency still add up, especially without batching and schema optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) "Idle table" thinking can be misleading
&lt;/h3&gt;

&lt;p&gt;In LiveAnalytics, empty or unused tables are not the main cost driver. Spend usually comes from write volume, query compute, and retained data in memory/magnetic tiers.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Better Mental Model for Retention
&lt;/h2&gt;

&lt;p&gt;Retention decisions directly shape spend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short memory retention&lt;/strong&gt; for hot, low-latency workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longer magnetic retention&lt;/strong&gt; for historical analysis&lt;/li&gt;
&lt;li&gt;Keep only what is needed in memory store&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your query patterns are mostly historical and not sub-second operational reads, memory retention is often set too high.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Cost Review Checklist
&lt;/h2&gt;

&lt;p&gt;Run this monthly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Writes&lt;/strong&gt;: are records batched efficiently? Are you writing unnecessary dimensions/measures?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queries&lt;/strong&gt;: which workloads consume most TCU time? Any dashboards refreshing too frequently?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory store&lt;/strong&gt;: is hot retention longer than real operational need?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Magnetic store&lt;/strong&gt;: is long-term retention aligned with compliance and analytics requirements?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table lifecycle&lt;/strong&gt;: are stale datasets still retained without business need?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  CLI: Inventory Retention Settings Across Tables
&lt;/h2&gt;

&lt;p&gt;Use this to review memory/magnetic retention quickly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws timestream-write list-databases &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Databases[].DatabaseName'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;'\t'&lt;/span&gt; &lt;span class="s1"&gt;'\n'&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read &lt;/span&gt;db&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$db&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;continue
  &lt;/span&gt;aws timestream-write list-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$db&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Tables[].TableName'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;'\t'&lt;/span&gt; &lt;span class="s1"&gt;'\n'&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read &lt;/span&gt;tbl&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$tbl&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;continue
    &lt;/span&gt;aws timestream-write describe-table &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--database-name&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$db&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--table-name&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$tbl&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'{Database:Table.DatabaseName,Table:Table.TableName,MemoryHours:Table.RetentionProperties.MemoryStoreRetentionPeriodInHours,MagneticDays:Table.RetentionProperties.MagneticStoreRetentionPeriodInDays}'&lt;/span&gt;
  &lt;span class="k"&gt;done
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not prove usage by itself, but it gives you the retention map you need before optimizing query and write behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Timestream for LiveAnalytics billing is multi-dimensional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;writes&lt;/li&gt;
&lt;li&gt;query compute (TCUs)&lt;/li&gt;
&lt;li&gt;memory store&lt;/li&gt;
&lt;li&gt;magnetic store&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you treat it as a single storage bill, you will miss the biggest optimization levers.&lt;/p&gt;

&lt;p&gt;CloudWise helps teams surface these hidden cost patterns and prioritize the fastest savings opportunities across AWS data services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloudcostwise.io/" rel="noopener noreferrer"&gt;CloudWise&lt;/a&gt; automates AWS cost analysis and waste detection. Try it at &lt;a href="https://cloudcostwise.io/" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>cloudcostoptimization</category>
      <category>timestream</category>
    </item>
    <item>
      <title>The Hidden Costs of Idle EMR Clusters (And How to Stop the Bleed)</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 02 Apr 2026 14:36:43 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/the-hidden-costs-of-idle-emr-clusters-and-how-to-stop-the-bleed-5g2g</link>
      <guid>https://dev.to/cloudwiseteam/the-hidden-costs-of-idle-emr-clusters-and-how-to-stop-the-bleed-5g2g</guid>
      <description>&lt;p&gt;EMR looks simple on the bill. You spin up a cluster, run your Spark jobs, and shut it down. But most teams don't shut it down — and that's where the money disappears.&lt;/p&gt;

&lt;h2&gt;
  
  
  EMR Has Two Price Tags
&lt;/h2&gt;

&lt;p&gt;Every EMR instance carries &lt;strong&gt;two charges&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;EC2 instance cost&lt;/strong&gt; — the standard on-demand rate for the instance type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EMR surcharge&lt;/strong&gt; — an additional per-instance-hour fee, typically 20–25% of the EC2 price&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a common analytics instance like &lt;code&gt;m5.xlarge&lt;/code&gt; (4 vCPUs, 16 GB RAM) in us-east-1:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Hourly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EC2&lt;/td&gt;
&lt;td&gt;$0.192&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EMR surcharge&lt;/td&gt;
&lt;td&gt;$0.048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.240/hr&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A 5-node cluster of &lt;code&gt;m5.xlarge&lt;/code&gt; instances costs &lt;strong&gt;$1.20/hr&lt;/strong&gt; — roughly &lt;strong&gt;$876/month&lt;/strong&gt; if left running. That's just compute. Storage is extra.&lt;/p&gt;

&lt;p&gt;Most teams focus on the EC2 line item and completely miss the EMR surcharge. It doesn't show up as a separate line — it's bundled into the "Amazon Elastic MapReduce" charge on your bill, and it adds up fast across multiple clusters.&lt;/p&gt;
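&lt;p&gt;The two-price-tag math is trivial to model, which makes it easy to embed in cost reviews. Rates below are the m5.xlarge us-east-1 figures from the table above:&lt;/p&gt;

```python
# Total EMR node cost = EC2 rate + EMR surcharge (m5.xlarge, us-east-1).
def emr_node_hourly(ec2_rate, emr_rate):
    return ec2_rate + emr_rate

node_hourly = emr_node_hourly(0.192, 0.048)
cluster_hourly = node_hourly * 5  # 5-node cluster

print(f"Per node: ${node_hourly:.3f}/hr")
print(f"5 nodes:  ${cluster_hourly:.2f}/hr (~${cluster_hourly * 730:.0f}/month)")
```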

&lt;h2&gt;
  
  
  The EBS Trap
&lt;/h2&gt;

&lt;p&gt;Every EMR node gets EBS volumes attached. The default root volume is typically 10–15 GB, but core and task nodes often get larger volumes for HDFS or local shuffle storage.&lt;/p&gt;

&lt;p&gt;Current EBS pricing in us-east-1:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume Type&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gp3 (default for new clusters)&lt;/td&gt;
&lt;td&gt;$0.08/GB-month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gp2 (legacy default)&lt;/td&gt;
&lt;td&gt;$0.10/GB-month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;io1 (provisioned IOPS)&lt;/td&gt;
&lt;td&gt;$0.125/GB-month + $0.065/IOPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A 5-node cluster with 100 GB gp3 per node adds &lt;strong&gt;$40/month&lt;/strong&gt; in storage alone. Not huge — but it never stops charging, even when the cluster is idle.&lt;/p&gt;

&lt;p&gt;The real problem isn't the per-GB rate. It's that &lt;strong&gt;EBS charges continue as long as the cluster exists&lt;/strong&gt;, regardless of whether any jobs are running.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idle Cluster Problem
&lt;/h2&gt;

&lt;p&gt;Here's the scenario that burns money: a cluster in &lt;code&gt;WAITING&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;EMR clusters have three relevant states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RUNNING&lt;/strong&gt; — actively executing steps (Spark, Hive, Presto jobs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAITING&lt;/strong&gt; — cluster is up, all steps are complete, waiting for new work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TERMINATED&lt;/strong&gt; — shut down, no charges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;WAITING&lt;/code&gt; state is the silent budget killer. The cluster is fully provisioned — all EC2 instances running, all EBS volumes attached, EMR surcharge ticking — but doing zero work. It's an idle engine burning fuel in a parked car.&lt;/p&gt;

&lt;p&gt;This happens more often than you'd think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dev/test clusters&lt;/strong&gt; spun up for debugging, never terminated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled pipelines&lt;/strong&gt; where the cluster outlives the job by hours or days&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Keep-alive" clusters&lt;/strong&gt; left running for ad-hoc queries that happen once a week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failed termination&lt;/strong&gt; where auto-termination was configured but a step error left the cluster hanging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A 10-node &lt;code&gt;r5.2xlarge&lt;/code&gt; cluster in WAITING state costs roughly &lt;strong&gt;$4,600/month&lt;/strong&gt; — EC2 ($0.504/hr × 10 × 730 = $3,679) plus EMR surcharge ($0.126/hr × 10 × 730 = $920), plus EBS on top. For processing zero bytes of data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Actually Check
&lt;/h2&gt;

&lt;p&gt;If you want to audit your EMR spend, focus on three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Clusters in WAITING state for more than 24 hours&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws emr list-clusters &lt;span class="nt"&gt;--active&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Clusters[?Status.State==`WAITING`]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any cluster that's been waiting more than a day is almost certainly forgotten. Check the &lt;code&gt;ReadyDateTime&lt;/code&gt; in the timeline to see how long it's been idle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Long-running clusters with no recent steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some teams run "persistent" EMR clusters for interactive workloads (Jupyter, Presto). These are valid — but they should be right-sized. Check &lt;code&gt;list-steps&lt;/code&gt; to see when the last step actually ran.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws emr list-steps &lt;span class="nt"&gt;--cluster-id&lt;/span&gt; j-XXXXX &lt;span class="nt"&gt;--step-states&lt;/span&gt; COMPLETED &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Steps[0].Status.Timeline.EndDateTime'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the last step completed weeks ago, the cluster is waste.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Auto-termination configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EMR supports auto-termination after idle timeout. If your clusters don't have this enabled, you're one forgotten SSH session away from a surprise bill.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws emr describe-cluster &lt;span class="nt"&gt;--cluster-id&lt;/span&gt; j-XXXXX &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Cluster.AutoTerminationPolicy'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;For batch workloads, the answer is straightforward: use &lt;strong&gt;transient clusters&lt;/strong&gt;. Spin up, process, terminate. EMR's step execution mode does this automatically — the cluster terminates after the last step completes.&lt;/p&gt;

&lt;p&gt;For interactive workloads, set aggressive auto-termination policies (1–2 hours of idle time) and right-size instance types based on actual utilization, not peak estimates from six months ago.&lt;/p&gt;

&lt;p&gt;And tag everything. You can't optimize what you can't attribute. Use &lt;code&gt;aws:elasticmapreduce:editor-id&lt;/code&gt; and custom cost allocation tags to tie clusters back to teams and projects.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise detects idle and long-running EMR clusters automatically, flags the monthly waste, and generates remediation plans. Try it at &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>costoptimization</category>
      <category>devops</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Amazon MQ Pricing: What's Really on Your Bill</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 26 Mar 2026 12:37:23 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/amazon-mq-pricing-whats-really-on-your-bill-48j0</link>
      <guid>https://dev.to/cloudwiseteam/amazon-mq-pricing-whats-really-on-your-bill-48j0</guid>
      <description>&lt;p&gt;Amazon MQ looks simple on the bill until you pull it apart. Most teams spin up a managed ActiveMQ or RabbitMQ broker, send a few messages, and move on. Then the invoice arrives with line items they didn't expect.&lt;/p&gt;

&lt;p&gt;Let's break down exactly where the money goes — and what catches people off guard.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Broker Instance: It's Not a Flat Rate
&lt;/h2&gt;

&lt;p&gt;Amazon MQ charges per broker instance-hour based on the instance type and engine. There's no single "broker fee." The range is wide:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Type&lt;/th&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Hourly Rate&lt;/th&gt;
&lt;th&gt;Monthly (730 hrs)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mq.t3.micro&lt;/td&gt;
&lt;td&gt;ActiveMQ&lt;/td&gt;
&lt;td&gt;$0.034&lt;/td&gt;
&lt;td&gt;$24.82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mq.t3.micro&lt;/td&gt;
&lt;td&gt;RabbitMQ&lt;/td&gt;
&lt;td&gt;$0.034&lt;/td&gt;
&lt;td&gt;$24.82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mq.m5.large&lt;/td&gt;
&lt;td&gt;ActiveMQ&lt;/td&gt;
&lt;td&gt;$0.276&lt;/td&gt;
&lt;td&gt;$201.48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mq.m5.large&lt;/td&gt;
&lt;td&gt;RabbitMQ&lt;/td&gt;
&lt;td&gt;$0.276&lt;/td&gt;
&lt;td&gt;$201.48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mq.m5.xlarge&lt;/td&gt;
&lt;td&gt;ActiveMQ&lt;/td&gt;
&lt;td&gt;$0.552&lt;/td&gt;
&lt;td&gt;$402.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mq.m5.2xlarge&lt;/td&gt;
&lt;td&gt;ActiveMQ&lt;/td&gt;
&lt;td&gt;$1.104&lt;/td&gt;
&lt;td&gt;$805.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mq.m5.4xlarge&lt;/td&gt;
&lt;td&gt;ActiveMQ&lt;/td&gt;
&lt;td&gt;$2.208&lt;/td&gt;
&lt;td&gt;$1,611.84&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Prices shown for us-east-1, On-Demand.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A common mistake is assuming the smallest broker is "cheap." Even an mq.t3.micro runs 24/7, costing roughly &lt;strong&gt;$25/month&lt;/strong&gt; before anything else. An mq.m5.large — the default many teams pick — is over &lt;strong&gt;$200/month per broker&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  High Availability Doubles the Compute Cost
&lt;/h2&gt;

&lt;p&gt;ActiveMQ supports active/standby deployment for high availability. This means &lt;strong&gt;two broker instances&lt;/strong&gt; in different Availability Zones. Your compute cost doubles immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single-instance&lt;/strong&gt; mq.m5.large: $201.48/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active/standby&lt;/strong&gt; mq.m5.large: $402.96/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RabbitMQ achieves HA through a three-node cluster deployment, which means you're paying for &lt;strong&gt;three instances&lt;/strong&gt;. An mq.m5.large RabbitMQ cluster costs roughly &lt;strong&gt;$604/month&lt;/strong&gt; in compute alone.&lt;/p&gt;
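&lt;p&gt;The multiplier is easy to see in code. A quick sketch using the us-east-1 mq.m5.large rate from the table above (the helper name is ours):&lt;/p&gt;

```python
# How deployment mode multiplies Amazon MQ compute cost.
# Hourly rate is the us-east-1 On-Demand figure from the table above.
HOURS_PER_MONTH = 730

def broker_monthly_cost(hourly_rate: float, instance_count: int) -> float:
    """Compute-only monthly cost: every instance in the deployment bills per hour."""
    return hourly_rate * instance_count * HOURS_PER_MONTH

m5_large = 0.276
print(f"single instance:        ${broker_monthly_cost(m5_large, 1):,.2f}")  # $201.48
print(f"active/standby (HA):    ${broker_monthly_cost(m5_large, 2):,.2f}")  # $402.96
print(f"3-node RabbitMQ cluster: ${broker_monthly_cost(m5_large, 3):,.2f}")  # $604.44
```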

&lt;p&gt;Teams often enable HA for production brokers and forget about it on dev/staging environments — where a single instance would be fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage: EBS vs. EFS Is a 3× Price Difference
&lt;/h2&gt;

&lt;p&gt;ActiveMQ brokers on Amazon MQ offer two storage backends:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Type&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.10/GB-month&lt;/td&gt;
&lt;td&gt;Standard durability, faster throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.30/GB-month&lt;/td&gt;
&lt;td&gt;Shared storage for active/standby pairs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;EFS is required for active/standby deployments since both brokers need access to the same persistent store. That's a &lt;strong&gt;3× premium&lt;/strong&gt; over EBS.&lt;/p&gt;

&lt;p&gt;A broker with 50 GB of message storage on EFS costs $15/month in storage alone. Not huge, but it adds up across multiple brokers and environments.&lt;/p&gt;

&lt;p&gt;RabbitMQ brokers use EBS exclusively — no EFS option.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Transfer: The Silent Multiplier
&lt;/h2&gt;

&lt;p&gt;Standard AWS data transfer charges apply to Amazon MQ:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same AZ&lt;/strong&gt;: Free&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-AZ&lt;/strong&gt; (typical for HA): $0.01/GB each way&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Region&lt;/strong&gt;: $0.02/GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internet outbound&lt;/strong&gt;: $0.09/GB (first 10 TB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For active/standby deployments, replication traffic between AZs is cross-AZ data transfer. High-throughput brokers processing millions of messages per day can accumulate meaningful cross-AZ charges that don't appear under the "AmazonMQ" line item on your bill — they show up under general data transfer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem: Idle Brokers
&lt;/h2&gt;

&lt;p&gt;Here's what actually burns money with Amazon MQ: &lt;strong&gt;brokers that nobody is using.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's remarkably common. A team provisions a broker for a proof of concept, connects a few services, then the project pivots. The broker sits there at $200+/month with zero messages flowing through it. No consumers, no producers, no connections — just a running instance charging by the hour.&lt;/p&gt;

&lt;p&gt;Unlike Lambda or SQS, Amazon MQ has no scale-to-zero capability. A broker with zero messages costs the same as one processing thousands per second.&lt;/p&gt;

&lt;p&gt;The pattern we see most often:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dev/staging brokers left running after the sprint ends&lt;/li&gt;
&lt;li&gt;Migration brokers kept "just in case" after switching to SQS or EventBridge&lt;/li&gt;
&lt;li&gt;HA-enabled brokers in non-production environments&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to Spot the Waste
&lt;/h2&gt;

&lt;p&gt;Look at these CloudWatch metrics for each broker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TotalMessageCount&lt;/strong&gt;: If this is zero over 7–14 days, the broker is idle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CurrentConnectionsCount&lt;/strong&gt;: Zero connections means nothing is even trying to talk to it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TotalConsumerCount / TotalProducerCount&lt;/strong&gt;: Both at zero confirms no active clients&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If all three are zero for more than a week, you're paying for a parking spot nobody is using.&lt;/p&gt;
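&lt;p&gt;The decision rule is simple enough to codify. A hypothetical sketch (the real metric sums would come from CloudWatch &lt;code&gt;get-metric-statistics&lt;/code&gt;, which is omitted here; the values below are plain numbers):&lt;/p&gt;

```python
# Classify a broker as idle from the three CloudWatch metrics discussed above.
# In practice each argument is the metric's sum over a 7-14 day window.

def broker_is_idle(total_messages: int, connections: int,
                   consumers: int, producers: int) -> bool:
    """True when nothing has touched the broker over the observation window."""
    return (total_messages == 0 and connections == 0
            and consumers == 0 and producers == 0)

assert broker_is_idle(0, 0, 0, 0)        # forgotten PoC broker: candidate for shutdown
assert not broker_is_idle(0, 2, 0, 0)    # something is still connecting; investigate first
```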




&lt;p&gt;CloudWise detects idle Amazon MQ brokers automatically by analyzing CloudWatch connection and message metrics. If your broker has had zero activity for 14 days, CloudWise flags it with the full monthly cost so you can decide whether to keep it or shut it down.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise automates AWS cost analysis across 145+ waste detectors. Try it at &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>serverless</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your OpenSearch Bill Is Bigger Than You Think: A Technical Cost Breakdown</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 19 Mar 2026 13:02:09 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/your-opensearch-bill-is-bigger-than-you-think-a-technical-cost-breakdown-20gj</link>
      <guid>https://dev.to/cloudwiseteam/your-opensearch-bill-is-bigger-than-you-think-a-technical-cost-breakdown-20gj</guid>
      <description>&lt;p&gt;OpenSearch can look cheap at first glance, then surprise you in the monthly bill.&lt;/p&gt;

&lt;p&gt;Most teams look at one line item and assume that is “the OpenSearch cost.” In reality, OpenSearch spend is a composite of compute, storage, backups, and data movement, and each part scales differently under real workloads.&lt;/p&gt;

&lt;p&gt;If you are trying to reduce spend without breaking search performance, you need to model the full stack, not just one rate card.&lt;/p&gt;

&lt;h2&gt;
  
  
  1) Start with the right mental model
&lt;/h2&gt;

&lt;p&gt;There are two major OpenSearch consumption models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch Service domains (provisioned clusters)&lt;/li&gt;
&lt;li&gt;OpenSearch Serverless collections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post focuses on &lt;strong&gt;domain-based OpenSearch&lt;/strong&gt;, where cost generally includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data node instance-hours (often the biggest component)&lt;/li&gt;
&lt;li&gt;Dedicated master node instance-hours (if enabled)&lt;/li&gt;
&lt;li&gt;Warm/cold tier node-hours (if used)&lt;/li&gt;
&lt;li&gt;EBS storage attached to nodes&lt;/li&gt;
&lt;li&gt;EBS performance dimensions (for gp3: provisioned IOPS/throughput beyond baseline)&lt;/li&gt;
&lt;li&gt;Snapshot storage&lt;/li&gt;
&lt;li&gt;Data transfer and network-related charges&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your team uses OpenSearch Serverless, the billing dimensions are different (OCUs and storage), so avoid applying domain formulas to serverless workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  2) Why “simple price per GB” is misleading
&lt;/h2&gt;

&lt;p&gt;A common mistake is saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“OpenSearch data is $X/GB-month”&lt;/li&gt;
&lt;li&gt;“EBS is $Y/GB-month”&lt;/li&gt;
&lt;li&gt;“Snapshots are $Z/GB-month”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those values can be valid in a specific region and setup, but not as universal truths.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EBS pricing differs by volume type (gp2, gp3, io1, io2, st1, etc.)&lt;/li&gt;
&lt;li&gt;gp3 can add separate performance costs for extra IOPS/throughput&lt;/li&gt;
&lt;li&gt;snapshot charges depend on snapshot type/storage class and service context&lt;/li&gt;
&lt;li&gt;OpenSearch itself charges continuously for active domain infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: two domains with similar data size can have very different monthly costs depending on node family, node count, AZ architecture, and EBS configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  3) A practical monthly estimation formula
&lt;/h2&gt;

&lt;p&gt;For domain-based OpenSearch, a useful engineering estimate is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Monthly OpenSearch Cost ≈&lt;/code&gt;&lt;br&gt;
&lt;code&gt;(Σ node_hour_rate × node_count × 730)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;+ (EBS_GB × EBS_GB_rate)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;+ (gp3_extra_iops × iops_rate, if applicable)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;+ (gp3_extra_throughput × throughput_rate, if applicable)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;+ warm/cold tier costs&lt;/code&gt;&lt;br&gt;
&lt;code&gt;+ snapshot storage&lt;/code&gt;&lt;br&gt;
&lt;code&gt;+ data transfer&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will still be an estimate, but it is far closer to reality than a single “per-GB” assumption.&lt;/p&gt;
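&lt;p&gt;The formula translates directly into code. A sketch with placeholder inputs (every rate below is hypothetical; pull real rates from the AWS price list for your region and node family):&lt;/p&gt;

```python
# The domain-cost estimation formula above, as a function.
# All rates here are hypothetical placeholders, not published AWS prices.
HOURS_PER_MONTH = 730

def opensearch_monthly_estimate(node_groups,          # list of (hourly_rate, node_count)
                                ebs_gb, ebs_gb_rate,
                                extra_iops=0, iops_rate=0.0,
                                extra_throughput=0, throughput_rate=0.0,
                                warm_cold=0.0, snapshots=0.0, transfer=0.0):
    """Sum every billing dimension of a domain-based OpenSearch deployment."""
    compute = sum(rate * count * HOURS_PER_MONTH for rate, count in node_groups)
    storage = ebs_gb * ebs_gb_rate
    perf = extra_iops * iops_rate + extra_throughput * throughput_rate
    return compute + storage + perf + warm_cold + snapshots + transfer

# Example: 3 data nodes at a hypothetical $0.167/hr, 3 masters at $0.068/hr,
# 1.5 TB of EBS at a hypothetical $0.08/GB-month
est = opensearch_monthly_estimate([(0.167, 3), (0.068, 3)], ebs_gb=1536, ebs_gb_rate=0.08)
print(f"${est:,.2f}/month")
```

&lt;p&gt;Feeding real CUR numbers into an estimate like this beats any single per-GB rule of thumb.&lt;/p&gt;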

&lt;h2&gt;
  
  
  4) What to inspect first (high ROI checks)
&lt;/h2&gt;

&lt;p&gt;When I audit OpenSearch cost, I check these first:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Idle domains.&lt;/strong&gt; Domains with zero or near-zero search/index traffic for days or weeks. Easy wins: delete, downscale, or consolidate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Overprovisioned data nodes.&lt;/strong&gt; Low CPU plus low indexing/search rates plus high instance count. Rightsize node families and counts cautiously.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;EBS mismatch.&lt;/strong&gt; gp2 where gp3 would be cheaper for the same durability target, or oversized volumes with consistently low utilization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Snapshot sprawl.&lt;/strong&gt; Old manual snapshots with no retention policy. Define lifecycle and retention rules.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Non-production environments running 24/7.&lt;/strong&gt; Dev/test domains that do not need full-time uptime. Schedule down periods where possible.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  5) Fast validation commands
&lt;/h2&gt;

&lt;p&gt;Use CLI/API checks before changing anything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Inventory domains:&lt;br&gt;
&lt;code&gt;aws opensearch list-domain-names&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Domain config and capacity:&lt;br&gt;
&lt;code&gt;aws opensearch describe-domain --domain-name &amp;lt;domain_name&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage and utilization metrics (CloudWatch):&lt;br&gt;
check request/indexing activity, CPU utilization, memory pressure, free storage, and write/read throughput over at least 14 days.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then correlate with CUR or billing exports before taking action.&lt;/p&gt;

&lt;h2&gt;
  
  
  6) A safer way to communicate pricing
&lt;/h2&gt;

&lt;p&gt;Instead of writing:&lt;br&gt;
“OpenSearch costs $0.25/GB, EBS costs $0.10/GB, snapshots cost $0.095/GB”&lt;/p&gt;

&lt;p&gt;Say this:&lt;br&gt;
“OpenSearch spend is a combination of node-hours, EBS storage/performance, and snapshot/storage retention. Exact rates vary by region, node family, storage class, and deployment choices.”&lt;/p&gt;

&lt;p&gt;That phrasing is both technically correct and operationally useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;OpenSearch cost optimization is not about finding one wrong number. It is about identifying the dominant cost driver for your current architecture, then changing one lever at a time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;eliminate idle&lt;/li&gt;
&lt;li&gt;rightsize nodes&lt;/li&gt;
&lt;li&gt;tune EBS&lt;/li&gt;
&lt;li&gt;enforce snapshot retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you do those four consistently, you will usually reduce spend without hurting query latency or reliability.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise automates AWS cost analysis across 42+ services. Try it at &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>opensearch</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>The Silent $33/Month Charge: Understanding AWS NAT Gateway Costs</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 12 Mar 2026 13:12:47 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/the-silent-33month-charge-understanding-aws-nat-gateway-costs-19c3</link>
      <guid>https://dev.to/cloudwiseteam/the-silent-33month-charge-understanding-aws-nat-gateway-costs-19c3</guid>
      <description>&lt;p&gt;Most AWS cost conversations focus on EC2 instances and RDS databases. Meanwhile, NAT Gateways quietly burn $32.85 per month &lt;em&gt;each&lt;/em&gt; — whether they process a terabyte of data or zero bytes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How NAT Gateway Billing Works
&lt;/h2&gt;

&lt;p&gt;NAT Gateway pricing has two components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Rate (us-east-1)&lt;/th&gt;
&lt;th&gt;Monthly Estimate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hourly base charge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.045/hour&lt;/td&gt;
&lt;td&gt;$32.85/month (730 hrs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data processing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.045/GB&lt;/td&gt;
&lt;td&gt;Varies by traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The base charge is the one that catches teams off guard. It runs 24/7 from the moment the NAT Gateway is created until it's deleted. No traffic? Doesn't matter — you're still paying $0.045 for every hour it exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Real Cost Hides: Multi-AZ Deployments
&lt;/h2&gt;

&lt;p&gt;The standard Terraform pattern for a production VPC creates one NAT Gateway per Availability Zone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_nat_gateway"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;availability_zones&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;allocation_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_eip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nat&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three AZs means three NAT Gateways. That's $98.55/month in base charges alone — before a single byte of data is processed. For a staging environment that mirrors production network architecture, you're paying nearly $100/month for network redundancy that staging doesn't need.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Math at Scale
&lt;/h2&gt;

&lt;p&gt;Let's walk through realistic scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small team (1 VPC, 3 AZs):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 NAT Gateways × $32.85 = &lt;strong&gt;$98.55/month&lt;/strong&gt; base&lt;/li&gt;
&lt;li&gt;500 GB data processed × $0.045 = $22.50&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: $121.05/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mid-size company (4 VPCs across dev/staging/prod/sandbox, 3 AZs each):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12 NAT Gateways × $32.85 = &lt;strong&gt;$394.20/month&lt;/strong&gt; base&lt;/li&gt;
&lt;li&gt;Most non-prod NAT Gateways processing near-zero traffic&lt;/li&gt;
&lt;li&gt;Likely waste: 6–9 idle gateways = &lt;strong&gt;$197–$296/month wasted&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enterprise (20+ VPCs, multi-region):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;60+ NAT Gateways × $32.85 = &lt;strong&gt;$1,971/month&lt;/strong&gt; base&lt;/li&gt;
&lt;li&gt;Traffic typically concentrated in 1–2 AZs per VPC&lt;/li&gt;
&lt;li&gt;Idle NAT Gateways can easily exceed &lt;strong&gt;$500/month&lt;/strong&gt; in pure waste&lt;/li&gt;
&lt;/ul&gt;
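&lt;p&gt;All three scenarios fall out of the same two pricing components. A quick sketch (us-east-1 rates from the table above; function name is ours):&lt;/p&gt;

```python
# Monthly NAT Gateway cost from the two us-east-1 pricing components above.
HOURS_PER_MONTH = 730
HOURLY_BASE = 0.045     # per gateway, charged 24/7
PER_GB = 0.045          # data processing

def nat_monthly(gateways: int, gb_processed: float = 0.0) -> float:
    """Base charge for every gateway plus data processing for actual traffic."""
    return gateways * HOURLY_BASE * HOURS_PER_MONTH + gb_processed * PER_GB

print(f"small team (3 GW, 500 GB): ${nat_monthly(3, 500):,.2f}")  # → $121.05
print(f"mid-size   (12 GW, idle):  ${nat_monthly(12):,.2f}")      # → $394.20
print(f"enterprise (60 GW, idle):  ${nat_monthly(60):,.2f}")      # → $1,971.00
```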

&lt;h2&gt;
  
  
  How to Find Idle NAT Gateways
&lt;/h2&gt;

&lt;p&gt;Two CloudWatch metrics tell you everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;BytesOutToDestination&lt;/code&gt;&lt;/strong&gt; — total bytes sent through the NAT Gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ActiveConnectionCount&lt;/code&gt;&lt;/strong&gt; — number of concurrent active connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If both are zero for 7+ days, the NAT Gateway is idle. Here's how to check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all NAT Gateways&lt;/span&gt;
aws ec2 describe-nat-gateways &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'NatGateways[?State==`available`].{
    ID:NatGatewayId,
    SubnetId:SubnetId,
    VpcId:VpcId,
    State:State
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table

&lt;span class="c"&gt;# Check traffic for a specific NAT Gateway (last 7 days)&lt;/span&gt;
&lt;span class="c"&gt;# GNU date syntax shown; on macOS/BSD use: date -u -v-7d +%Y-%m-%dT%H:%M:%S&lt;/span&gt;
aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/NATGateway &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; BytesOutToDestination &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NatGatewayId,Value&lt;span class="o"&gt;=&lt;/span&gt;nat-0abc123def456 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'7 days ago'&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 86400 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If every daily sum is &lt;code&gt;0.0&lt;/code&gt;, that NAT Gateway is costing you $32.85/month for nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternatives for Low-Traffic Environments
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. VPC Endpoints (Gateway type — free)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your private subnets only need to reach S3 or DynamoDB, a Gateway VPC Endpoint handles it with zero hourly or data processing charges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 create-vpc-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; vpc-abc123 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-name&lt;/span&gt; com.amazonaws.us-east-1.s3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--route-table-ids&lt;/span&gt; rtb-abc123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single command can eliminate the NAT Gateway entirely for S3-only workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. NAT Instances (for dev/staging)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;t4g.nano&lt;/code&gt; instance running as a NAT instance costs ~$3.07/month — roughly 10x cheaper than a NAT Gateway. The tradeoff is no managed HA, no automatic scaling, and you manage the instance yourself. For non-production environments, that's often acceptable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Consolidate AZs in non-production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Staging doesn't need 3 NAT Gateways. Route all private subnets through a single NAT Gateway in one AZ. Cross-AZ data transfer adds $0.01/GB, but at low staging traffic volumes, that's negligible compared to saving $65.70/month in base charges.&lt;/p&gt;
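&lt;p&gt;The break-even point is easy to compute. A sketch using the rates above (it assumes all rerouted traffic crosses an AZ boundary, the worst case):&lt;/p&gt;

```python
# Break-even check for consolidating staging to a single NAT Gateway:
# base-charge savings vs. added cross-AZ data transfer.
BASE_MONTHLY = 32.85        # $0.045/hr x 730 hrs per gateway
CROSS_AZ_PER_GB = 0.01

def break_even_gb(gateways_removed: int) -> float:
    """Monthly rerouted traffic at which cross-AZ charges eat the savings."""
    return gateways_removed * BASE_MONTHLY / CROSS_AZ_PER_GB

# Dropping 2 of 3 gateways saves $65.70/month in base charges; you would need
# to push this much cross-AZ traffic per month before consolidation loses money.
print(round(break_even_gb(2)))  # → 6570  (GB/month, far above typical staging traffic)
```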

&lt;h2&gt;
  
  
  The Regional NAT Gateway Option
&lt;/h2&gt;

&lt;p&gt;AWS recently introduced Regional NAT Gateways, which span multiple AZs but are billed per AZ per hour. If your Regional NAT Gateway covers 3 AZs, you're charged $0.045 × 3 = $0.135/hour — the same as running 3 individual NAT Gateways. The advantage is operational simplicity, not cost savings.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Quick Audit Checklist
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Count your NAT Gateways&lt;/strong&gt;: &lt;code&gt;aws ec2 describe-nat-gateways --query 'NatGateways[?State==`available`]' | jq length&lt;/code&gt; — multiply by $32.85/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check for zero-traffic gateways&lt;/strong&gt;: Query CloudWatch for &lt;code&gt;BytesOutToDestination&lt;/code&gt; over the past 14 days&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review non-production VPCs&lt;/strong&gt;: Do dev/staging environments truly need NAT Gateway HA across 3 AZs?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate VPC Endpoints&lt;/strong&gt;: If traffic is primarily S3/DynamoDB, Gateway Endpoints are free&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;NAT Gateways are one of those AWS resources where the "set it and forget it" mentality costs real money. A five-minute audit can often save $100–$300/month.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise automates AWS cost analysis across 38+ services — including idle NAT Gateway detection. Try it at &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>devops</category>
      <category>cloudoptimization</category>
    </item>
    <item>
      <title>The Hidden Cost Layers of EC2 (And Why Stopped Instances Still Drain Your Budget)</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 05 Mar 2026 20:04:13 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/the-hidden-cost-layers-of-ec2-and-why-stopped-instances-still-drain-your-budget-5bmf</link>
      <guid>https://dev.to/cloudwiseteam/the-hidden-cost-layers-of-ec2-and-why-stopped-instances-still-drain-your-budget-5bmf</guid>
      <description>&lt;p&gt;EC2 looks simple on the bill — until you pull it apart. What most teams see is a single line item for instance hours. In reality, every running (and even stopped) EC2 instance generates charges across multiple dimensions, and the overlooked ones tend to accumulate quietly.&lt;/p&gt;

&lt;p&gt;Let's break down where EC2 costs actually come from, what keeps billing you after you click "Stop", and what you can do about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Obvious: Instance Hours
&lt;/h2&gt;

&lt;p&gt;On-Demand pricing ranges from roughly &lt;strong&gt;$0.0042/hr&lt;/strong&gt; for a &lt;code&gt;t4g.nano&lt;/code&gt; to over &lt;strong&gt;$30/hr&lt;/strong&gt; for GPU-accelerated instances like the &lt;code&gt;p4d.24xlarge&lt;/code&gt;. The exact rate depends on the instance family (general purpose, compute-optimized, memory-optimized, GPU, etc.), the instance size, and the region.&lt;/p&gt;

&lt;p&gt;Most teams have a reasonable handle on this cost. The real surprises come from everything else attached to those instances.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Persistent One: EBS Storage
&lt;/h2&gt;

&lt;p&gt;Every EC2 instance boots from at least one EBS volume, and most production instances have additional data volumes attached. EBS is billed &lt;strong&gt;per GB per month&lt;/strong&gt;, regardless of whether the instance is running:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume Type&lt;/th&gt;
&lt;th&gt;Price (us-east-1)&lt;/th&gt;
&lt;th&gt;Typical Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gp3 (General Purpose SSD)&lt;/td&gt;
&lt;td&gt;$0.08/GB/mo&lt;/td&gt;
&lt;td&gt;Default for most workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gp2 (Previous Gen SSD)&lt;/td&gt;
&lt;td&gt;$0.10/GB/mo&lt;/td&gt;
&lt;td&gt;Legacy — still widely used&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;io1/io2 (Provisioned IOPS SSD)&lt;/td&gt;
&lt;td&gt;$0.125/GB/mo + IOPS charges&lt;/td&gt;
&lt;td&gt;High-performance databases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;st1 (Throughput Optimized HDD)&lt;/td&gt;
&lt;td&gt;$0.045/GB/mo&lt;/td&gt;
&lt;td&gt;Big data, log processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sc1 (Cold HDD)&lt;/td&gt;
&lt;td&gt;$0.015/GB/mo&lt;/td&gt;
&lt;td&gt;Infrequent access archives&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you &lt;strong&gt;stop&lt;/strong&gt; an EC2 instance, you stop paying for compute hours. But every EBS volume attached to that instance continues to incur storage charges at the same rate. A stopped instance with 500 GB of gp3 storage still costs you &lt;strong&gt;$40/month&lt;/strong&gt; in EBS alone — indefinitely.&lt;/p&gt;

&lt;p&gt;This is one of the most common sources of invisible cloud waste. Teams spin up instances for a project, stop them "temporarily", and forget about them. Months later, those volumes are still quietly billing.&lt;/p&gt;
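&lt;p&gt;The carrying cost is simple to estimate. A sketch using the gp3 rate from the table above (the volume sizes are hypothetical):&lt;/p&gt;

```python
# Carrying cost of a stopped instance's EBS volumes (us-east-1 gp3 rate above).
GP3_PER_GB_MONTH = 0.08

def stopped_instance_monthly(volume_gbs) -> float:
    """Compute drops to $0 when stopped; storage keeps billing on every attached volume."""
    return sum(volume_gbs) * GP3_PER_GB_MONTH

# 100 GB root + 400 GB data volume, "temporarily" stopped
print(f"${stopped_instance_monthly([100, 400]):.2f}/month")  # → $40.00/month
```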

&lt;h2&gt;
  
  
  The Migration Opportunity: gp2 → gp3
&lt;/h2&gt;

&lt;p&gt;If your account still has EBS volumes running on &lt;code&gt;gp2&lt;/code&gt;, you're overpaying. AWS released &lt;code&gt;gp3&lt;/code&gt; as a direct replacement that is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;20% cheaper&lt;/strong&gt; per GB ($0.08 vs $0.10)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher baseline performance&lt;/strong&gt;: 3,000 IOPS and 125 MB/s throughput included (gp2 baseline is only 3 IOPS per GB, minimum 100)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independently tunable IOPS and throughput&lt;/strong&gt; — with gp2, you have to increase volume size to get more IOPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The migration is non-destructive and can be done live via &lt;code&gt;ModifyVolume&lt;/code&gt; with no downtime. There's almost no reason not to migrate.&lt;/p&gt;
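&lt;p&gt;As a sketch (the volume ID is a placeholder), the migration is a single CLI call, plus one more to watch progress:&lt;/p&gt;

```shell
# Convert a gp2 volume to gp3 in place -- no detach, no downtime
aws ec2 modify-volume \
  --volume-id vol-0123456789abcdef0 \
  --volume-type gp3

# Watch the modification move from "modifying" to "optimizing" to "completed"
aws ec2 describe-volumes-modifications \
  --volume-ids vol-0123456789abcdef0 \
  --query 'VolumesModifications[].{State:ModificationState,Progress:Progress}'
```

&lt;p&gt;The volume stays attached and usable throughout, including the &lt;code&gt;optimizing&lt;/code&gt; phase.&lt;/p&gt;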

&lt;h2&gt;
  
  
  The Forgotten Ones: Unattached Volumes and Old Snapshots
&lt;/h2&gt;

&lt;p&gt;When an EC2 instance is terminated, its root volume is deleted by default — but any additional EBS volumes may persist as &lt;strong&gt;unattached volumes&lt;/strong&gt; (status: &lt;code&gt;available&lt;/code&gt;). These volumes have no instance connected but are billed at the full storage rate.&lt;/p&gt;

&lt;p&gt;Similarly, &lt;strong&gt;EBS snapshots&lt;/strong&gt; accumulate over time. Each snapshot is billed based on the actual data blocks stored (not the full volume size), at $0.05/GB/month. A team that takes daily snapshots of a 500 GB volume without a retention policy can easily accumulate terabytes of snapshot storage within a year.&lt;/p&gt;
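&lt;p&gt;To gauge your exposure, sum the provisioned size of every snapshot you own. This is an upper bound, since snapshots bill on changed blocks rather than full volume size:&lt;/p&gt;

```shell
# Upper bound on snapshot storage: total provisioned GB across snapshots you own
aws ec2 describe-snapshots \
  --owner-ids self \
  --query 'sum(Snapshots[].VolumeSize)'
```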

&lt;h2&gt;
  
  
  The Sneaky Ones: Elastic IPs and Data Transfer
&lt;/h2&gt;

&lt;p&gt;Two more cost components are often overlooked:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elastic IPs&lt;/strong&gt;: A public IPv4 address attached to a running instance has historically been free, but as of February 2024, AWS charges &lt;strong&gt;$0.005/hr ($3.60/month)&lt;/strong&gt; for every public IPv4 address — whether attached to a running instance or not. An Elastic IP on a stopped instance, or one not attached to any instance at all, costs the same.&lt;/p&gt;
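&lt;p&gt;Unassociated addresses are easy to find (the backtick-quoted &lt;code&gt;null&lt;/code&gt; is JMESPath literal syntax):&lt;/p&gt;

```shell
# Elastic IPs not associated with anything -- each still bills ~$3.60/month
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].{IP:PublicIp,Alloc:AllocationId}' \
  --output table
```

&lt;p&gt;Release the ones you no longer need with &lt;code&gt;aws ec2 release-address&lt;/code&gt;.&lt;/p&gt;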

&lt;p&gt;&lt;strong&gt;Data Transfer&lt;/strong&gt;: Outbound data transfer from EC2 to the internet is $0.09/GB (first 10 TB/month tier in us-east-1). Cross-AZ traffic within the same region costs $0.01/GB in each direction. These charges don't appear under "EC2" in Cost Explorer — they show up under "Data Transfer", making them easy to miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idle Tax
&lt;/h2&gt;

&lt;p&gt;An instance that's running but not doing meaningful work is arguably the most expensive form of waste, because you're paying for both compute and storage. Common culprits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dev/staging instances&lt;/strong&gt; left running outside business hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy instances&lt;/strong&gt; that served a purpose months ago but were never decommissioned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-provisioned instances&lt;/strong&gt; where a &lt;code&gt;c5.4xlarge&lt;/code&gt; is running at 3% CPU because someone chose the instance size "just in case"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS CloudWatch metrics make it straightforward to identify instances with sustained low CPU utilization (below 5% over 14 days is a common threshold), but few teams audit this regularly.&lt;/p&gt;
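&lt;p&gt;The check itself is a single CloudWatch call per instance; the instance ID below is a placeholder:&lt;/p&gt;

```shell
# Average CPU over the last 14 days; below ~5% usually means idle or oversized
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 1209600 \
  --statistics Average \
  --query 'Datapoints[0].Average'
```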

&lt;h2&gt;
  
  
  What You Can Do
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit stopped instances&lt;/strong&gt;: If an instance has been stopped for more than 30 days, either terminate it (after snapshotting the volumes if needed) or at minimum, detach and delete unused volumes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migrate gp2 → gp3&lt;/strong&gt;: Free performance improvement and 20% cost reduction on storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set snapshot retention policies&lt;/strong&gt;: Delete snapshots older than 90 days unless compliance requires otherwise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schedule dev/staging instances&lt;/strong&gt;: Use Instance Scheduler or Lambda-based automation to stop instances outside working hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean up unattached volumes&lt;/strong&gt;: Any EBS volume with status &lt;code&gt;available&lt;/code&gt; is costing you money for nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review Elastic IPs&lt;/strong&gt;: Release any EIPs not attached to running infrastructure.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;None of these are difficult individually. The challenge is doing them consistently across every account and region. That's the kind of thing automation handles better than humans.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise detects idle EC2 instances, stopped instances with EBS volumes, unattached volumes, gp2-to-gp3 migration opportunities, old snapshots, and more — across all your regions and accounts. Try it at &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
      <category>cloudwise</category>
    </item>
    <item>
      <title>How SageMaker Actually Bills: A Breakdown for Engineers</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 26 Feb 2026 18:28:34 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/how-sagemaker-actually-bills-a-breakdown-for-engineers-1cb7</link>
      <guid>https://dev.to/cloudwiseteam/how-sagemaker-actually-bills-a-breakdown-for-engineers-1cb7</guid>
      <description>&lt;p&gt;You deployed a SageMaker notebook to prototype a model. A week later, your AWS bill has a $280 line item you can't explain.&lt;/p&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;p&gt;SageMaker is one of the most powerful ML platforms on AWS — and one of the most confusing to bill for. Unlike EC2 (one instance, one hourly rate), SageMaker has &lt;strong&gt;at least 12 independent billing dimensions&lt;/strong&gt; spread across notebooks, training, endpoints, storage, data processing, and more. Each one ticks on its own meter.&lt;/p&gt;

&lt;p&gt;This post breaks down every SageMaker billing component, shows you the real numbers, and highlights the traps that catch even experienced AWS engineers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Mental Model: SageMaker Is Not One Service
&lt;/h2&gt;

&lt;p&gt;Think of SageMaker as a &lt;strong&gt;collection of services&lt;/strong&gt; that share a console. Each has its own pricing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────┐
│                 SageMaker                       │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Notebooks│  │ Training │  │   Endpoints   │  │
│  │ (Dev)    │  │ (Build)  │  │   (Serve)     │  │
│  │ $/hr     │  │ $/hr     │  │   $/hr 24/7   │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │Processing│  │ Storage  │  │  Data Wrangler│  │
│  │ (ETL)    │  │ (EBS+S3) │  │   (Prep)      │  │
│  │ $/hr     │  │ $/GB-mo  │  │   $/hr        │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Canvas   │  │ Feature  │  │  Inference    │  │
│  │ (No-code)│  │ Store    │  │  Recommender  │  │
│  │ $/hr     │  │ $/GB+req │  │  (load test)  │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Each box bills independently.&lt;/strong&gt; You can have zero training cost but be paying hundreds for an idle endpoint. Let's walk through each.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Notebook Instances: The Silent $37/month Drain
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How it charges&lt;/strong&gt;: Per-second billing while the notebook is in &lt;code&gt;InService&lt;/code&gt; status. You pay for the instance whether or not you have a kernel running.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Type&lt;/th&gt;
&lt;th&gt;$/Hour&lt;/th&gt;
&lt;th&gt;Monthly (24/7)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ml.t3.medium&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$36.50&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.t3.large&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$73.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.large&lt;/td&gt;
&lt;td&gt;$0.115&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$83.95&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.xlarge&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$167.90&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.c5.xlarge&lt;/td&gt;
&lt;td&gt;$0.204&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$148.92&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;: Notebooks keep billing even when you close the browser tab. The instance stays &lt;code&gt;InService&lt;/code&gt; until you explicitly &lt;strong&gt;stop&lt;/strong&gt; it. There's no auto-stop by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check for running notebooks right now&lt;/span&gt;
aws sagemaker list-notebook-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--status-equals&lt;/span&gt; InService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'NotebookInstances[].{Name:NotebookInstanceName,Type:InstanceType,Created:CreationTime}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What catches teams off guard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You spin up a &lt;code&gt;ml.t3.medium&lt;/code&gt; to test a concept on Monday&lt;/li&gt;
&lt;li&gt;You use it during the week but never stop it on Friday&lt;/li&gt;
&lt;li&gt;It idles through 4 weekends = 192 extra hours = &lt;strong&gt;$9.60 wasted&lt;/strong&gt; per forgotten instance&lt;/li&gt;
&lt;li&gt;Multiply by a team of 5 data scientists doing this regularly = real money&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost-saving tip&lt;/strong&gt;: Use SageMaker &lt;strong&gt;Studio&lt;/strong&gt; notebooks with auto-shutdown instead of classic notebook instances. Or set a CloudWatch alarm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Alarm if notebook is InService for &amp;gt; 12 hours with no API activity&lt;/span&gt;
aws cloudwatch put-metric-alarm &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--alarm-name&lt;/span&gt; &lt;span class="s2"&gt;"sagemaker-notebook-idle"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; &lt;span class="s2"&gt;"AWS/SageMaker"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; &lt;span class="s2"&gt;"InvocationsPerInstance"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NotebookInstanceName,Value&lt;span class="o"&gt;=&lt;/span&gt;my-notebook &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistic&lt;/span&gt; Sum &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 43200 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threshold&lt;/span&gt; 0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--comparison-operator&lt;/span&gt; LessThanOrEqualToThreshold &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--evaluation-periods&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--alarm-actions&lt;/span&gt; arn:aws:sns:us-east-1:123456789012:alert-topic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notebook Storage (Often Overlooked)
&lt;/h3&gt;

&lt;p&gt;Each notebook instance has an &lt;strong&gt;EBS volume&lt;/strong&gt; (default 5 GB, configurable up to 16 TB). You pay for it even when the notebook is stopped:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume Size&lt;/th&gt;
&lt;th&gt;$/Month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5 GB (default)&lt;/td&gt;
&lt;td&gt;$0.58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50 GB&lt;/td&gt;
&lt;td&gt;$5.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500 GB&lt;/td&gt;
&lt;td&gt;$58.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At $0.116/GB-month (gp2 pricing), a 500 GB volume costs &lt;strong&gt;$58/month&lt;/strong&gt; just sitting there — even while the notebook is stopped.&lt;/p&gt;
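&lt;p&gt;The math is simple enough to script. A minimal sketch of the per-volume charge at the $0.116/GB-month rate above:&lt;/p&gt;

```python
# Monthly EBS charge for a notebook volume, billed even while the notebook is stopped
def notebook_storage_monthly(volume_gb, rate_per_gb=0.116):
    return round(volume_gb * rate_per_gb, 2)

print(notebook_storage_monthly(5))    # 0.58 -- the default volume
print(notebook_storage_monthly(500))  # 58.0
```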




&lt;h2&gt;
  
  
  2. Training Jobs: Pay-Per-Second, But Instance Choice Matters Enormously
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How it charges&lt;/strong&gt;: Per-second billing while the training job runs. No charge when it completes. The clock starts at instance launch and stops at job completion or failure.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Type&lt;/th&gt;
&lt;th&gt;$/Hour&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.large&lt;/td&gt;
&lt;td&gt;$0.115&lt;/td&gt;
&lt;td&gt;Tabular data, small models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.xlarge&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;Medium models, preprocessing-heavy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.c5.xlarge&lt;/td&gt;
&lt;td&gt;$0.204&lt;/td&gt;
&lt;td&gt;CPU-bound training (gradient boosting)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.p3.2xlarge&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3.825&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPU training (deep learning)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.p3.8xlarge&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$14.688&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-GPU training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.p3.16xlarge&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$28.152&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed deep learning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.p4d.24xlarge&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$37.688&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large model training (8× A100 GPUs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.g5.xlarge&lt;/td&gt;
&lt;td&gt;$1.408&lt;/td&gt;
&lt;td&gt;Cost-effective GPU (single NVIDIA A10G)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.trn1.2xlarge&lt;/td&gt;
&lt;td&gt;$1.3438&lt;/td&gt;
&lt;td&gt;AWS Trainium — optimized for training&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;: GPU instances are eye-wateringly expensive per hour. A single &lt;code&gt;ml.p3.2xlarge&lt;/code&gt; training job that takes 24 hours costs &lt;strong&gt;$91.80&lt;/strong&gt;. If your hyperparameter tuning job launches 20 variants in parallel, that's &lt;strong&gt;$1,836 in one day&lt;/strong&gt;.&lt;/p&gt;
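&lt;p&gt;Those figures fall out of a one-line estimate. The sketch below ignores storage and data transfer, which are usually noise next to GPU compute:&lt;/p&gt;

```python
# Rough training cost: instance-hours times the On-Demand rate from the table above
def training_cost(hourly_rate, hours, parallel_jobs=1):
    return round(hourly_rate * hours * parallel_jobs, 2)

print(training_cost(3.825, 24))      # 91.8  -- one 24-hour ml.p3.2xlarge job
print(training_cost(3.825, 24, 20))  # 1836.0 -- a 20-variant tuning sweep
```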

&lt;h3&gt;
  
  
  Training Cost Formula
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Training Cost = (instance_count × instance_price_per_second × training_duration_seconds)
              + (storage_gb × $0.116/GB-month × duration_fraction)
              + (data_download_from_s3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Spot Training: 60-90% Savings (With a Catch)
&lt;/h3&gt;

&lt;p&gt;SageMaker supports &lt;strong&gt;managed spot training&lt;/strong&gt; — using EC2 Spot Instances for training jobs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;estimator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Estimator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="n"&gt;use_spot_instances&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_wait&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Max time to wait for spot capacity
&lt;/span&gt;    &lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Max training time
&lt;/span&gt;    &lt;span class="c1"&gt;# checkpoint_s3_uri for spot interruption recovery
&lt;/span&gt;    &lt;span class="n"&gt;checkpoint_s3_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://my-bucket/checkpoints/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Savings&lt;/strong&gt;: Typically &lt;strong&gt;60–90% off&lt;/strong&gt; On-Demand pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch&lt;/strong&gt;: Spot instances can be interrupted. Your training job gets a 2-minute warning, then terminates. Without checkpointing, you lose all progress and pay for the time already consumed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro tip&lt;/strong&gt;: Always set &lt;code&gt;checkpoint_s3_uri&lt;/code&gt; when using spot training. This saves model checkpoints to S3 so interrupted jobs can resume instead of restarting from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managed Warm Pools (New)
&lt;/h3&gt;

&lt;p&gt;If you run many training jobs in sequence (e.g., hyperparameter tuning), each job normally provisions a new instance from scratch (2–5 minutes startup). &lt;strong&gt;Warm pools&lt;/strong&gt; keep instances running between jobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You pay for instance time during the keep-alive period&lt;/li&gt;
&lt;li&gt;But you skip the ~3 minute cold start per job&lt;/li&gt;
&lt;li&gt;Break-even: if you run enough sequential jobs that the saved startup time exceeds the keep-alive cost&lt;/li&gt;
&lt;/ul&gt;
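&lt;p&gt;That break-even can be sketched numerically. One simplifying assumption here: saved cold-start minutes are valued at the instance's own hourly rate, which treats startup purely as wasted wall-clock time:&lt;/p&gt;

```python
# Warm-pool break-even sketch: does the value of skipped cold starts exceed
# the keep-alive time you pay for between jobs?
def warm_pool_saves_money(jobs, cold_start_min, keep_alive_min, hourly_rate):
    cold_start_value = jobs * (cold_start_min / 60) * hourly_rate
    keep_alive_cost = (jobs - 1) * (keep_alive_min / 60) * hourly_rate
    return cold_start_value > keep_alive_cost

print(warm_pool_saves_money(20, 3, 2, 3.825))  # True  -- tight tuning loop
print(warm_pool_saves_money(2, 3, 60, 3.825))  # False -- long idle gaps
```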




&lt;h2&gt;
  
  
  3. Real-Time Endpoints: The Big One
&lt;/h2&gt;

&lt;p&gt;This is where most SageMaker overspend happens. &lt;strong&gt;Endpoints run 24/7 and bill continuously&lt;/strong&gt;, even with zero traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it charges&lt;/strong&gt;: Per-second billing while the endpoint is &lt;code&gt;InService&lt;/code&gt;. You pay for the full instance(s) whether they receive 0 or 10,000 requests per second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Monthly Endpoint Cost = instance_count × hourly_rate × 730 hours
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;$/Hour&lt;/th&gt;
&lt;th&gt;Monthly (1 instance)&lt;/th&gt;
&lt;th&gt;Monthly (2 instances)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ml.t2.medium&lt;/td&gt;
&lt;td&gt;$0.065&lt;/td&gt;
&lt;td&gt;$47.45&lt;/td&gt;
&lt;td&gt;$94.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.large&lt;/td&gt;
&lt;td&gt;$0.115&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$83.95&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$167.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.xlarge&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$167.90&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$335.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.c5.xlarge&lt;/td&gt;
&lt;td&gt;$0.204&lt;/td&gt;
&lt;td&gt;$148.92&lt;/td&gt;
&lt;td&gt;$297.84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.g4dn.xlarge&lt;/td&gt;
&lt;td&gt;$0.736&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$537.28&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1,074.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.p3.2xlarge&lt;/td&gt;
&lt;td&gt;$3.825&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2,792.25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5,584.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.inf1.xlarge&lt;/td&gt;
&lt;td&gt;$0.297&lt;/td&gt;
&lt;td&gt;$216.81&lt;/td&gt;
&lt;td&gt;$433.62&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Read that again&lt;/strong&gt;: A single &lt;code&gt;ml.p3.2xlarge&lt;/code&gt; endpoint costs &lt;strong&gt;$2,792/month&lt;/strong&gt;. Two instances for high availability: &lt;strong&gt;$5,585/month&lt;/strong&gt;. Many teams deploy this, see it works, and forget to right-size.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Idle Endpoint Problem
&lt;/h3&gt;

&lt;p&gt;A SageMaker endpoint with &lt;strong&gt;zero invocations&lt;/strong&gt; still costs the full hourly rate. Common scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model was deployed for a demo → demo ended → endpoint left running&lt;/li&gt;
&lt;li&gt;A/B testing: old variant endpoint wasn't deleted after the new model won&lt;/li&gt;
&lt;li&gt;Dev/staging endpoints running 24/7 when they're only used during business hours
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find endpoints with zero invocations in the last 7 days&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;endpoint &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;aws sagemaker list-endpoints &lt;span class="nt"&gt;--status-equals&lt;/span&gt; InService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Endpoints[].EndpointName'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do

  &lt;/span&gt;&lt;span class="nv"&gt;invocations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/SageMaker &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metric-name&lt;/span&gt; Invocations &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;EndpointName,Value&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$endpoint&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;VariantName,Value&lt;span class="o"&gt;=&lt;/span&gt;AllTraffic &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'7 days ago'&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--period&lt;/span&gt; 604800 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--statistics&lt;/span&gt; Sum &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Datapoints[0].Sum'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$invocations&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"None"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$invocations&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"0.0"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⚠️  IDLE: &lt;/span&gt;&lt;span class="nv"&gt;$endpoint&lt;/span&gt;&lt;span class="s2"&gt; (0 invocations in 7 days)"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Model Endpoints: Pack More Models, Pay Less
&lt;/h3&gt;

&lt;p&gt;If you have many low-traffic models, a &lt;strong&gt;Multi-Model Endpoint (MME)&lt;/strong&gt; lets you load models on-demand into a single endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Standard:  10 models × ml.m5.large × 730 hrs = $839.50/month
MME:       1 endpoint × ml.m5.xlarge × 730 hrs = $167.90/month
                                      Savings:   $671.60/month (80%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tradeoff: cold-start latency when loading a model that isn't cached. Fine for batch-like traffic; bad for latency-sensitive real-time inference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless Inference: Pay Only for What You Use
&lt;/h3&gt;

&lt;p&gt;For sporadic traffic (&amp;lt; ~1000 requests/hour), &lt;strong&gt;Serverless Inference&lt;/strong&gt; eliminates the always-on cost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pricing:
  - Memory: $0.0000016/GB-second
  - Requests: included

Example: 1000 requests/day, 500ms avg, 4GB memory
  = 1000 × 0.5s × 4GB × $0.0000016/GB-s × 30 days
  = $0.096/month  ← vs $83.95/month for ml.m5.large
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The catch&lt;/strong&gt;: Cold starts (30s–2min for the first invocation after an idle period) and a 4 MB max payload.&lt;/p&gt;
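&lt;p&gt;The comparison above can be reproduced in a few lines:&lt;/p&gt;

```python
# Serverless inference memory charge vs an always-on ml.m5.large endpoint
GB_SECOND = 0.0000016  # serverless price per GB-second of memory

def serverless_monthly(requests_per_day, avg_seconds, memory_gb, days=30):
    return round(requests_per_day * avg_seconds * memory_gb * GB_SECOND * days, 3)

print(serverless_monthly(1000, 0.5, 4))  # 0.096
print(round(0.115 * 730, 2))             # 83.95 -- always-on ml.m5.large
```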




&lt;h2&gt;
  
  
  4. Storage: Three Hidden Meters
&lt;/h2&gt;

&lt;p&gt;SageMaker storage costs come from three independent sources:&lt;/p&gt;

&lt;h3&gt;
  
  
  a) Notebook EBS Volumes
&lt;/h3&gt;

&lt;p&gt;Already covered above: $0.116/GB-month, billed even when the notebook is stopped.&lt;/p&gt;

&lt;h3&gt;
  
  
  b) Training Job Storage
&lt;/h3&gt;

&lt;p&gt;Each training job gets a temporary EBS volume for input data and model artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default: 30 GB per instance&lt;/li&gt;
&lt;li&gt;Configurable up to 16 TB&lt;/li&gt;
&lt;li&gt;Billed only during training (per-second)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSD (gp2)&lt;/strong&gt;: $0.116/GB-month, prorated to seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  c) Model Artifacts in S3
&lt;/h3&gt;

&lt;p&gt;Trained models are stored in S3 as &lt;code&gt;.tar.gz&lt;/code&gt; archives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 Standard: $0.023/GB-month&lt;/li&gt;
&lt;li&gt;A 5 GB model × 20 training runs = 100 GB = &lt;strong&gt;$2.30/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;But: large language model checkpoints can be 50–200 GB each&lt;/li&gt;
&lt;li&gt;10 checkpoints × 100 GB = 1 TB = &lt;strong&gt;$23/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro tip&lt;/strong&gt;: Set an S3 Lifecycle policy to move old model artifacts to S3 Glacier after 30 days:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Archive old SageMaker models"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Filter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Prefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sagemaker/"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Enabled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Transitions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"StorageClass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GLACIER"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Processing Jobs (ETL/Feature Engineering)
&lt;/h2&gt;

&lt;p&gt;SageMaker Processing runs containerized data processing workloads:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it charges&lt;/strong&gt;: Same as training — per-second billing for the instances used.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;$/Hour&lt;/th&gt;
&lt;th&gt;1-hour ETL job&lt;/th&gt;
&lt;th&gt;8-hour daily ETL (monthly)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.xlarge&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$55.20&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.m5.4xlarge&lt;/td&gt;
&lt;td&gt;$0.922&lt;/td&gt;
&lt;td&gt;$0.92&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$221.28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ml.r5.4xlarge&lt;/td&gt;
&lt;td&gt;$1.21&lt;/td&gt;
&lt;td&gt;$1.21&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$290.40&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;: Processing jobs often run as part of a pipeline. If your pipeline runs daily with 4 instances for 3 hours:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4 instances × ml.m5.xlarge × $0.23/hr × 3 hrs × 30 days = $82.80/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not huge — but if someone accidentally sets the pipeline to run hourly instead of daily:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4 × $0.23 × 3 × 24 × 30 = $1,987.20/month  ← oops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
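&lt;p&gt;A quick sanity check catches this class of schedule mistake before the invoice does. A minimal sketch, using the assumed on-demand rates from the table above:&lt;/p&gt;

```python
# Back-of-the-envelope cost estimator for a scheduled Processing pipeline.
# Rates are illustrative on-demand prices; check the AWS pricing page.
def pipeline_monthly_cost(instances, rate_per_hr, hours_per_run, runs_per_day, days=30):
    """Monthly cost of a scheduled SageMaker Processing pipeline."""
    return round(instances * rate_per_hr * hours_per_run * runs_per_day * days, 2)

# Daily schedule: 4 x ml.m5.xlarge (assumed $0.23/hr) for 3 hours
daily = pipeline_monthly_cost(4, 0.23, 3, runs_per_day=1)    # 82.8
# The same pipeline accidentally set to hourly
hourly = pipeline_monthly_cost(4, 0.23, 3, runs_per_day=24)  # 1987.2
```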






&lt;h2&gt;
  
  
  6. SageMaker Savings Plans
&lt;/h2&gt;

&lt;p&gt;AWS offers &lt;strong&gt;SageMaker Savings Plans&lt;/strong&gt; — commit to a $/hour spend for 1 or 3 years in exchange for a discount:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Commitment&lt;/th&gt;
&lt;th&gt;Discount vs On-Demand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1-year, no upfront&lt;/td&gt;
&lt;td&gt;~20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year, partial upfront&lt;/td&gt;
&lt;td&gt;~27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year, all upfront&lt;/td&gt;
&lt;td&gt;~30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year, all upfront&lt;/td&gt;
&lt;td&gt;~64%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What's covered&lt;/strong&gt;: Notebook instances, Studio notebooks, training, processing, batch transform, real-time inference, and serverless inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's NOT covered&lt;/strong&gt;: Data transfer, S3 storage, EBS storage, CloudWatch, and any non-SageMaker charges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Break-even&lt;/strong&gt;: If you consistently spend &amp;gt; $100/month on SageMaker compute, a 1-year no-upfront plan likely saves you money. The commitment is dollar-based (e.g., "$0.50/hour"), not instance-based — so you can shift between instance types.&lt;/p&gt;
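&lt;p&gt;The expected saving is simply discount × covered spend. A sketch, assuming the commitment fully covers your steady-state compute and the ~20% no-upfront figure holds:&lt;/p&gt;

```python
# Rough Savings Plan sizing sketch. Assumes the commitment fully covers
# steady-state SageMaker compute; discount percentages are approximate.
def savings_plan_monthly_savings(monthly_compute_spend, discount=0.20):
    """Expected monthly savings if the plan covers this much on-demand spend."""
    return round(monthly_compute_spend * discount, 2)

savings_plan_monthly_savings(1500)  # 300.0 on $1,500/month of always-on endpoints
savings_plan_monthly_savings(100)   # 20.0 -- the break-even is real even at small scale
```

&lt;p&gt;Commit to your observed baseline, not your peak: unused commitment is money spent on nothing.&lt;/p&gt;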




&lt;h2&gt;
  
  
  7. Data Transfer: The Other Hidden Cost
&lt;/h2&gt;

&lt;p&gt;SageMaker data transfer charges are identical to EC2 data transfer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S3 → SageMaker (same region)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker → S3 (same region)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internet → SageMaker&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker → Internet&lt;/td&gt;
&lt;td&gt;$0.09/GB (first 10 TB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-region S3 → SageMaker&lt;/td&gt;
&lt;td&gt;$0.01–0.02/GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-AZ (multi-instance training)&lt;/td&gt;
&lt;td&gt;$0.01/GB each way&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;: Distributed training across multiple instances in different AZs generates &lt;strong&gt;inter-AZ data transfer&lt;/strong&gt; charges for gradient synchronization. A training job with 8 &lt;code&gt;ml.p3.16xlarge&lt;/code&gt; instances exchanging 100 GB of gradients per hour across AZs can add $2/hour in data transfer alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;: Use &lt;strong&gt;SageMaker's managed instance placement&lt;/strong&gt; (it tries to co-locate instances in the same AZ). For distributed training, consider &lt;strong&gt;EFA (Elastic Fabric Adapter)&lt;/strong&gt; enabled instances (&lt;code&gt;ml.p4d.24xlarge&lt;/code&gt;, &lt;code&gt;ml.trn1.32xlarge&lt;/code&gt;) — inter-node traffic over EFA is not charged.&lt;/p&gt;
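&lt;p&gt;The $2/hour figure above falls out of simple arithmetic: each GB exchanged crosses the AZ boundary once outbound and once inbound, billed on both sides. A sketch with the assumed $0.01/GB rate:&lt;/p&gt;

```python
# Inter-AZ data transfer for distributed training.
# Assumed rate: $0.01/GB in each direction (billed on both sender and receiver).
def cross_az_cost_per_hour(gb_exchanged_per_hour, rate_per_gb=0.01):
    return round(gb_exchanged_per_hour * rate_per_gb * 2, 2)

cross_az_cost_per_hour(100)  # 2.0
```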




&lt;h2&gt;
  
  
  8. The Full Bill Breakdown — A Realistic Example
&lt;/h2&gt;

&lt;p&gt;Let's walk through a realistic monthly SageMaker bill for a mid-size ML team (3 data scientists, 2 models in production):&lt;/p&gt;

&lt;h3&gt;
  
  
  Development
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3 Notebook instances&lt;/td&gt;
&lt;td&gt;ml.m5.large, ~160 hrs/month each&lt;/td&gt;
&lt;td&gt;$55.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notebook storage&lt;/td&gt;
&lt;td&gt;3 × 50 GB&lt;/td&gt;
&lt;td&gt;$17.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subtotal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$72.60&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Training
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Training jobs (CPU)&lt;/td&gt;
&lt;td&gt;20 jobs × ml.m5.xlarge × 2 hrs&lt;/td&gt;
&lt;td&gt;$9.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training jobs (GPU)&lt;/td&gt;
&lt;td&gt;5 jobs × ml.g5.xlarge × 4 hrs&lt;/td&gt;
&lt;td&gt;$28.16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HPO tuning&lt;/td&gt;
&lt;td&gt;1 job × 50 trials × ml.g5.xlarge × 1 hr&lt;/td&gt;
&lt;td&gt;$70.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training storage&lt;/td&gt;
&lt;td&gt;20 GB per job, 25 jobs&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subtotal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$107.90&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Inference (The Biggest Line Item)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prod endpoint (Model A)&lt;/td&gt;
&lt;td&gt;2× ml.m5.xlarge × 730 hrs&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$335.80&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prod endpoint (Model B)&lt;/td&gt;
&lt;td&gt;2× ml.g4dn.xlarge × 730 hrs&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,074.56&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Staging endpoint&lt;/td&gt;
&lt;td&gt;1× ml.m5.large × 730 hrs&lt;/td&gt;
&lt;td&gt;$83.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subtotal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,494.31&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Other
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Processing (daily ETL)&lt;/td&gt;
&lt;td&gt;2× ml.m5.xlarge × 1 hr × 30 days&lt;/td&gt;
&lt;td&gt;$13.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model artifacts in S3&lt;/td&gt;
&lt;td&gt;200 GB across all experiments&lt;/td&gt;
&lt;td&gt;$4.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data transfer (internet)&lt;/td&gt;
&lt;td&gt;50 GB model serving responses&lt;/td&gt;
&lt;td&gt;$4.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch (metrics)&lt;/td&gt;
&lt;td&gt;Custom endpoint metrics&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subtotal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$25.90&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Total
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Development:   $72.60    (4.3%)
Training:      $107.90   (6.3%)
Inference:     $1,494.31 (87.9%)  ← 88% of the bill
Other:         $25.90    (1.5%)
──────────────────────────────────
TOTAL:         $1,700.71/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The punchline&lt;/strong&gt;: Nearly &lt;strong&gt;88% of this team's SageMaker spend is inference endpoints&lt;/strong&gt; running 24/7. The training — which is what the team actually thinks about and optimizes — is only 6% of the bill.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. The 7 Most Common SageMaker Billing Mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Leaving Notebook Instances Running
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: $37–$168/month per forgotten notebook&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Use SageMaker Studio with auto-shutdown, or set a lifecycle config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Auto-stop notebook after 1 hour of inactivity&lt;/span&gt;
&lt;span class="nv"&gt;IDLE_TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3600
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;jupyter notebook list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;aws sagemaker stop-notebook-instance &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--notebook-instance-name&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /opt/ml/metadata/resource-name&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Not Deleting Endpoints After Experimentation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: $84–$2,792/month per forgotten endpoint&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Tag endpoints with &lt;code&gt;environment=dev&lt;/code&gt; and run a nightly cleanup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all dev endpoints older than 3 days&lt;/span&gt;
aws sagemaker list-endpoints &lt;span class="nt"&gt;--status-equals&lt;/span&gt; InService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Endpoints[?CreationTime&amp;lt;`2026-02-18`].EndpointName'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text | xargs &lt;span class="nt"&gt;-I&lt;/span&gt;&lt;span class="o"&gt;{}&lt;/span&gt; aws sagemaker delete-endpoint &lt;span class="nt"&gt;--endpoint-name&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Over-Provisioning Instance Types
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: 2–10× the necessary spend&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Start with the smallest instance that works. Use CloudWatch to check actual utilization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check CPU utilization of an endpoint&lt;/span&gt;
aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; /aws/sagemaker/Endpoints &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; CPUUtilization &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;EndpointName,Value&lt;span class="o"&gt;=&lt;/span&gt;my-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;VariantName,Value&lt;span class="o"&gt;=&lt;/span&gt;AllTraffic &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; 2026-02-14T00:00:00 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; 2026-02-21T00:00:00 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 3600 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Average &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'sort_by(Datapoints, &amp;amp;Timestamp)[].{Time:Timestamp,CPU:Average}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If average CPU is &amp;lt; 20%, you're likely over-provisioned. An &lt;code&gt;ml.m5.xlarge&lt;/code&gt; at 15% utilization could be an &lt;code&gt;ml.m5.large&lt;/code&gt; (50% cheaper).&lt;/p&gt;
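&lt;p&gt;A crude decision rule, assuming each instance size step down roughly halves both vCPUs and price (true within the m5 family, but verify for your instance type):&lt;/p&gt;

```python
# Crude right-sizing check: if average CPU is far below 50%, one size down
# likely fits. Assumes the next size down costs ~50% of the current rate.
def downsize_candidate(avg_cpu_pct, threshold=20.0):
    """True if average CPU suggests the endpoint could drop an instance size."""
    return threshold > avg_cpu_pct

def monthly_savings_one_size_down(hourly_rate, hours=730):
    return round(hourly_rate * 0.5 * hours, 2)

downsize_candidate(15.0)                # True
monthly_savings_one_size_down(0.23)     # 83.95 -- ml.m5.xlarge down to ml.m5.large
```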

&lt;h3&gt;
  
  
  4. Running Staging Endpoints 24/7
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: $84–$538/month for endpoints used ~8 hrs/day&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Schedule endpoint creation/deletion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Scale staging endpoint to 0 instances at 7 PM, back to 1 at 8 AM
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;application_autoscaling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application-autoscaling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register the endpoint as a scalable target
&lt;/span&gt;&lt;span class="n"&gt;application_autoscaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_scalable_target&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ServiceNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ResourceId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;endpoint/staging-model/variant/AllTraffic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScalableDimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker:variant:DesiredInstanceCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MinCapacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MaxCapacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Scale to 0 at 7 PM UTC
&lt;/span&gt;&lt;span class="n"&gt;application_autoscaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_scheduled_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ServiceNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScheduledActionName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scale-down-evening&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ResourceId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;endpoint/staging-model/variant/AllTraffic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScalableDimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker:variant:DesiredInstanceCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cron(0 19 * * ? *)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScalableTargetAction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MinCapacity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MaxCapacity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Not Using Spot for Training
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: 2–10× overpayment on training jobs&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Add &lt;code&gt;use_spot_instances=True&lt;/code&gt; + &lt;code&gt;checkpoint_s3_uri&lt;/code&gt; to every training estimator.&lt;/p&gt;
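&lt;p&gt;Spot savings on SageMaker training typically land in the 60–70% range, varying by instance type and region. A back-of-the-envelope comparison (the 70% discount and the ml.g5.xlarge rate are assumptions):&lt;/p&gt;

```python
# Managed Spot Training savings sketch. The discount varies by instance
# and region; checkpointing to S3 makes interruptions survivable.
def spot_training_cost(on_demand_rate, hours, spot_discount=0.70):
    """Return (on_demand_cost, spot_cost) for a training job."""
    on_demand = on_demand_rate * hours
    spot = on_demand * (1 - spot_discount)
    return round(on_demand, 2), round(spot, 2)

spot_training_cost(1.408, 4)  # (5.63, 1.69) for a 4-hour ml.g5.xlarge job
```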
&lt;h3&gt;
  
  
  6. Ignoring Multi-Model Endpoints for Low-Traffic Models
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: $84+/month per model × N models&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Consolidate into a single MME. Works well for models with &amp;lt; 100 requests/hour.&lt;/p&gt;
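&lt;p&gt;The consolidation math is straightforward: N dedicated endpoints versus one (possibly larger) shared instance. A sketch with assumed per-endpoint costs:&lt;/p&gt;

```python
# Multi-model endpoint consolidation savings.
# Per-endpoint costs below are illustrative (ml.m5.large ~$83.95/mo assumed).
def mme_monthly_savings(n_models, per_endpoint_monthly, shared_endpoint_monthly):
    dedicated = n_models * per_endpoint_monthly
    return round(dedicated - shared_endpoint_monthly, 2)

# 10 dedicated ml.m5.large endpoints vs one shared ml.m5.xlarge
mme_monthly_savings(10, 83.95, 167.90)  # 671.6
```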
&lt;h3&gt;
  
  
  7. No SageMaker Savings Plan
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: 20–64% overpayment on steady-state compute&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fix&lt;/strong&gt;: Analyze 30 days of SageMaker usage → commit to a 1-year no-upfront Savings Plan for your baseline spend.&lt;/p&gt;


&lt;h2&gt;
  
  
  10. Quick-Reference Billing Cheat Sheet
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Billing Model&lt;/th&gt;
&lt;th&gt;Minimum Charge&lt;/th&gt;
&lt;th&gt;Always On?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Notebook Instance&lt;/td&gt;
&lt;td&gt;Per-second (InService)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;Yes, until stopped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Studio Notebook&lt;/td&gt;
&lt;td&gt;Per-second (running kernel)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;No (auto-shutdown capable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training Job&lt;/td&gt;
&lt;td&gt;Per-second (job duration)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;No (job-scoped)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Processing Job&lt;/td&gt;
&lt;td&gt;Per-second (job duration)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;No (job-scoped)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-Time Endpoint&lt;/td&gt;
&lt;td&gt;Per-second (InService)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes, 24/7&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serverless Endpoint&lt;/td&gt;
&lt;td&gt;Per-request + memory-second&lt;/td&gt;
&lt;td&gt;None (pay per use)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async Endpoint&lt;/td&gt;
&lt;td&gt;Per-second (InService)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;Yes (but can scale to 0)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch Transform&lt;/td&gt;
&lt;td&gt;Per-second (job duration)&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;No (job-scoped)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature Store&lt;/td&gt;
&lt;td&gt;Per-read/write + storage&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Depends on store type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EBS Storage&lt;/td&gt;
&lt;td&gt;Per-GB-month&lt;/td&gt;
&lt;td&gt;$0.116/GB-month&lt;/td&gt;
&lt;td&gt;Yes, even when stopped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Artifacts&lt;/td&gt;
&lt;td&gt;Per-GB-month&lt;/td&gt;
&lt;td&gt;$0.023/GB-month&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Transfer Out&lt;/td&gt;
&lt;td&gt;Per-GB&lt;/td&gt;
&lt;td&gt;$0.09/GB&lt;/td&gt;
&lt;td&gt;Only on egress&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  11. The One Metric That Matters Most
&lt;/h2&gt;

&lt;p&gt;If you track only &lt;strong&gt;one metric&lt;/strong&gt; for SageMaker cost efficiency, track this:&lt;/p&gt;

&lt;p&gt;$$\text{Cost per 1K Invocations} = \frac{\text{Monthly Endpoint Cost}}{\text{Monthly Invocations} \div 1000}$$&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Endpoint: 2× &lt;code&gt;ml.m5.xlarge&lt;/code&gt; = $335.80/month&lt;/li&gt;
&lt;li&gt;Invocations: 500,000/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;$$\frac{\$335.80}{500} = \$0.67 \text{ per 1K invocations}$$&lt;/p&gt;

&lt;p&gt;If that number is above $1.00 — you're likely over-provisioned or should consider serverless inference.&lt;/p&gt;

&lt;p&gt;If it's above $5.00 — you either have very low traffic (delete the endpoint at night) or you're burning money on GPU instances that aren't needed.&lt;/p&gt;

&lt;p&gt;If it's above $20.00 — the endpoint is effectively idle. Delete it.&lt;/p&gt;
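&lt;p&gt;The metric is trivial to compute from your bill and CloudWatch invocation counts:&lt;/p&gt;

```python
# Cost per 1K invocations, the single most useful endpoint-efficiency metric.
def cost_per_1k_invocations(monthly_endpoint_cost, monthly_invocations):
    return round(monthly_endpoint_cost / (monthly_invocations / 1000), 2)

# The example above: 2x ml.m5.xlarge at $335.80/month, 500K invocations
cost_per_1k_invocations(335.80, 500_000)  # 0.67
```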


&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;SageMaker billing boils down to three rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Endpoints are the #1 cost driver&lt;/strong&gt; — they run 24/7. Everything else is transient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If it's &lt;code&gt;InService&lt;/code&gt;, you're paying&lt;/strong&gt; — notebooks, endpoints, anything with that status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The bill you expect (training) is rarely the bill you get (inference)&lt;/strong&gt; — teams optimize training time but ignore endpoint sprawl.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The engineers who keep SageMaker costs under control aren't the ones who pick the cheapest instance type. They're the ones who have a &lt;strong&gt;process for deleting things they're not using&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The best SageMaker cost optimization is a cron job&lt;/span&gt;
&lt;span class="c"&gt;# Run weekly: find and report idle SageMaker resources&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Idle Notebook Instances ==="&lt;/span&gt;
aws sagemaker list-notebook-instances &lt;span class="nt"&gt;--status-equals&lt;/span&gt; InService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'NotebookInstances[].NotebookInstanceName'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; table

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Idle Endpoints (0 invocations, 7d) ==="&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;ep &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;aws sagemaker list-endpoints &lt;span class="nt"&gt;--status-equals&lt;/span&gt; InService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Endpoints[].EndpointName'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;inv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/SageMaker &lt;span class="nt"&gt;--metric-name&lt;/span&gt; Invocations &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;EndpointName,Value&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ep&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;VariantName,Value&lt;span class="o"&gt;=&lt;/span&gt;AllTraffic &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-v-7d&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--period&lt;/span&gt; 604800 &lt;span class="nt"&gt;--statistics&lt;/span&gt; Sum &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Datapoints[0].Sum'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$inv&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"None"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$inv&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"0.0"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  ⚠️  &lt;/span&gt;&lt;span class="nv"&gt;$ep&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;Building something to automate this? We built &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;CloudWise&lt;/a&gt; to automatically detect idle SageMaker notebooks, endpoints, and oversized instances across all your AWS accounts — including air-gapped environments with no internet access. It's one of 90+ waste detectors that scan your infrastructure so you don't have to run scripts manually.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Found this useful?&lt;/strong&gt; Drop a 🔖 bookmark — this is the reference I wish I had when I first got a surprise SageMaker bill.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How EC2 + EBS Actually Bills: A Breakdown for Engineers</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 19 Feb 2026 15:18:29 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/how-ec2-ebs-actually-bills-a-breakdown-for-engineers-2al2</link>
      <guid>https://dev.to/cloudwiseteam/how-ec2-ebs-actually-bills-a-breakdown-for-engineers-2al2</guid>
      <description>&lt;h1&gt;
  
  
  The "Stopped Instance" Trap
&lt;/h1&gt;

&lt;p&gt;Every AWS engineer has done it. You spin up an EC2 instance for a quick test, run your script, and then "Stop" the instance thinking you've stopped the bleeding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You haven't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the &lt;em&gt;Compute&lt;/em&gt; meter has stopped spinning, the &lt;em&gt;Storage&lt;/em&gt; meter is still running at full speed. And if you're using high-performance storage or have elastic IPs attached, you might be bleeding cash without realizing it.&lt;/p&gt;

&lt;p&gt;In this post, I'm going to break down exactly how an EC2 instance is billed, component by component, so you can stop leaking money on "zombie" resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Compute Layer (EC2)
&lt;/h2&gt;

&lt;p&gt;This is the part everyone understands. When the instance is &lt;code&gt;Running&lt;/code&gt;, you pay. When it's &lt;code&gt;Stopped&lt;/code&gt; or &lt;code&gt;Terminated&lt;/code&gt;, you don't.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;On-Demand:&lt;/strong&gt; You pay by the second (minimum 60 seconds).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Spot:&lt;/strong&gt; You pay the market price, which fluctuates.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Savings Plans/RIs:&lt;/strong&gt; You commit to usage in exchange for a discount.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Gotcha:&lt;/strong&gt; If you use a "Hibernate" stop instead of a regular stop, you are still paying for the RAM state stored on disk (more on that below).&lt;/p&gt;
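&lt;p&gt;The per-second model with its 60-second floor can be sketched in a few lines (using the m5.large rate from below as the assumed example):&lt;/p&gt;

```python
# On-Demand Linux billing: per-second, with a 60-second minimum per run.
def billed_seconds(runtime_seconds):
    return max(60, runtime_seconds)

def on_demand_cost(hourly_rate, runtime_seconds):
    return round(hourly_rate * billed_seconds(runtime_seconds) / 3600, 4)

on_demand_cost(0.096, 45)   # 0.0016 -- a 45-second run bills as 60 seconds
on_demand_cost(0.096, 300)  # 0.008  -- five minutes bills as five minutes
```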

&lt;h2&gt;
  
  
  2. The Storage Layer (EBS) - The Silent Killer
&lt;/h2&gt;

&lt;p&gt;This is where 90% of "phantom costs" come from.&lt;/p&gt;

&lt;p&gt;When you launch an EC2 instance, it almost always comes with an EBS volume (the root drive). &lt;strong&gt;This volume exists independently of the instance.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Scenario:&lt;/strong&gt; You launch an &lt;code&gt;m5.large&lt;/code&gt; with a 100GB &lt;code&gt;gp3&lt;/code&gt; volume.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Action:&lt;/strong&gt; You stop the instance.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Result:&lt;/strong&gt; You stop paying for the &lt;code&gt;m5.large&lt;/code&gt; ($0.096/hr), but you &lt;strong&gt;continue paying&lt;/strong&gt; for the 100GB &lt;code&gt;gp3&lt;/code&gt; volume ($0.08/GB/month).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have 100 "stopped" dev instances sitting around, that's 10TB of storage you're paying for every month. That's ~$800/month for literally nothing.&lt;/p&gt;
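&lt;p&gt;The fleet math, using the gp3 rate above:&lt;/p&gt;

```python
# Storage keeps billing while instances are stopped (gp3 at $0.08/GB-month).
def stopped_fleet_storage_cost(n_instances, gb_per_volume, rate_per_gb=0.08):
    return round(n_instances * gb_per_volume * rate_per_gb, 2)

stopped_fleet_storage_cost(100, 100)  # 800.0 per month, for zero compute
```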

&lt;h3&gt;
  
  
  The "IOPS" Trap
&lt;/h3&gt;

&lt;p&gt;With &lt;code&gt;gp3&lt;/code&gt; and &lt;code&gt;io2&lt;/code&gt; volumes, you can provision extra IOPS and Throughput. These are billed &lt;strong&gt;separately&lt;/strong&gt; from the storage capacity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Storage:&lt;/strong&gt; $0.08/GB-month&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;IOPS:&lt;/strong&gt; $0.005/provisioned IOPS-month (above 3,000)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Throughput:&lt;/strong&gt; $0.04/provisioned MB/s-month (above 125)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you provision 10,000 IOPS for a database test and then stop the instance, &lt;strong&gt;you are still paying for those 10,000 IOPS&lt;/strong&gt; even though the volume is doing zero reads/writes.&lt;/p&gt;
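&lt;p&gt;Putting the three gp3 line items together (rates as listed above; the free baselines are 3,000 IOPS and 125 MB/s):&lt;/p&gt;

```python
# gp3 bills capacity, provisioned IOPS above baseline, and provisioned
# throughput above baseline separately -- all of it keeps accruing on
# stopped instances.
def gp3_monthly_cost(size_gb, iops=3000, throughput_mbps=125):
    cost = size_gb * 0.08
    cost += max(0, iops - 3000) * 0.005
    cost += max(0, throughput_mbps - 125) * 0.04
    return round(cost, 2)

gp3_monthly_cost(100)               # 8.0  -- capacity only
gp3_monthly_cost(100, iops=10_000)  # 43.0 -- the extra 7,000 IOPS cost $35/month
```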

&lt;h2&gt;
  
  
  3. The Network Layer (Data Transfer &amp;amp; IPs)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Elastic IPs (EIPs)
&lt;/h3&gt;

&lt;p&gt;This is a classic AWS "tax."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Attached to Running Instance:&lt;/strong&gt; $0.005/hour. (Since February 2024, AWS charges for &lt;em&gt;all&lt;/em&gt; public IPv4 addresses, in use or not.)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Attached to Stopped Instance:&lt;/strong&gt; &lt;strong&gt;$0.005/hour.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Unattached:&lt;/strong&gt; &lt;strong&gt;$0.005/hour.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's roughly $3.65/month per address. An EIP parked on a stopped instance buys you nothing except a hold on a scarce IPv4 address — release it if you don't need the static IP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Transfer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Inbound:&lt;/strong&gt; Free.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Outbound (Internet):&lt;/strong&gt; Expensive (~$0.09/GB).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-AZ:&lt;/strong&gt; If your EC2 instance talks to an RDS database in a different Availability Zone, you pay &lt;strong&gt;$0.01/GB&lt;/strong&gt; in &lt;em&gt;each direction&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. The "Zombie" Snapshot
&lt;/h2&gt;

&lt;p&gt;When you terminate an instance, the root volume usually deletes with it (if "Delete on Termination" is checked). But any &lt;strong&gt;manual snapshots&lt;/strong&gt; you took of that volume remain.&lt;/p&gt;

&lt;p&gt;I've seen accounts with terabytes of snapshots from 2018 for instances that haven't existed in 5 years. At $0.05/GB-month, that adds up fast.&lt;/p&gt;
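
&lt;p&gt;Finding these zombies is a one-pager. The matching logic below is pure Python; the Boto3 wiring (sketched in the comments, assuming default credentials) just feeds it &lt;code&gt;describe_snapshots&lt;/code&gt; and &lt;code&gt;describe_volumes&lt;/code&gt; output:&lt;/p&gt;

```python
def orphaned_snapshots(snapshots, existing_volume_ids):
    """Return snapshots whose source volume no longer exists."""
    live = set(existing_volume_ids)
    return [s for s in snapshots if s.get("VolumeId") not in live]

# Boto3 wiring (assumed installed and credentialed):
#   ec2 = boto3.client("ec2")
#   snaps = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
#   vols = [v["VolumeId"] for v in ec2.describe_volumes()["Volumes"]]
#   for s in orphaned_snapshots(snaps, vols):
#       print(s["SnapshotId"], s["VolumeSize"] * 0.05)  # ~$/month at $0.05/GB
```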

&lt;h2&gt;
  
  
  The Solution: A "Clean" Shutdown Workflow
&lt;/h2&gt;

&lt;p&gt;Don't just click "Stop." If you're done with an instance for the day (or week), follow this checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Check for EIPs:&lt;/strong&gt; Release them if you don't need the static IP.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Snapshot &amp;amp; Delete:&lt;/strong&gt; If you need the data but not the compute, take a snapshot of the volume and &lt;strong&gt;delete the volume itself&lt;/strong&gt;. Snapshots are cheaper ($0.05/GB) than active volumes ($0.08/GB).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Tagging:&lt;/strong&gt; Tag everything with &lt;code&gt;Owner&lt;/code&gt; and &lt;code&gt;ExpiryDate&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Automation:&lt;/strong&gt; Use a tool (like CloudWise or a simple Lambda) to scan for "Available" volumes (volumes not attached to any instance) and delete them after 7 days.&lt;/li&gt;
&lt;/ol&gt;
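
&lt;p&gt;Step 4 is only a few lines of code. A minimal sketch of the age check (pure logic, so it's testable; the Boto3 calls are commented out, and &lt;code&gt;DryRun=True&lt;/code&gt; keeps the delete safe until you trust it):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def is_stale(volume, days=7, now=None):
    """True if an unattached ('available') EBS volume is older than `days`.

    CreateTime is a proxy for idle time; track detach events via CloudTrail
    if you need precision.
    """
    now = now or datetime.now(timezone.utc)
    return (volume["State"] == "available"
            and now - volume["CreateTime"] > timedelta(days=days))

# Boto3 wiring (assumed installed and credentialed):
#   ec2 = boto3.client("ec2")
#   vols = ec2.describe_volumes(
#       Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
#   for v in vols:
#       if is_stale(v):
#           ec2.delete_volume(VolumeId=v["VolumeId"], DryRun=True)  # flip when trusted
```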

&lt;h2&gt;
  
  
  Summary Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Billed When Running?&lt;/th&gt;
&lt;th&gt;Billed When Stopped?&lt;/th&gt;
&lt;th&gt;Billed When Terminated?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EC2 Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;YES&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;❌ No (if deleted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS IOPS/Throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;YES&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;❌ No (if deleted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Elastic IP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes (public IPv4 charge since Feb 2024)&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;YES&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;YES&lt;/strong&gt; (if not released)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Transfer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Stop paying for air. Check your "Volumes" tab today.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Rick, building &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;CloudWise&lt;/a&gt; to automate this cleanup for you. I write about AWS cost optimization and DevOps every week.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
      <category>finops</category>
    </item>
    <item>
      <title>How I Built an "Agentic" AWS Cost Optimizer (That Doesn't Break Production)</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Tue, 17 Feb 2026 19:12:08 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/how-i-built-an-agentic-aws-cost-optimizer-that-doesnt-break-production-d77</link>
      <guid>https://dev.to/cloudwiseteam/how-i-built-an-agentic-aws-cost-optimizer-that-doesnt-break-production-d77</guid>
<description>&lt;p&gt;I’ve spent 25 years in the software industry, and I’ve learned one universal truth: &lt;strong&gt;Engineers are terrified of deleting things.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We all have that one EC2 instance named &lt;code&gt;test-do-not-delete-final&lt;/code&gt; that has been running for 3 years. We know it’s probably waste. The dashboard says it’s waste. But nobody deletes it. Why?&lt;/p&gt;

&lt;p&gt;Because the &lt;strong&gt;risk&lt;/strong&gt; of breaking production feels infinite, and the &lt;strong&gt;reward&lt;/strong&gt; of saving $50/month feels like zero.&lt;/p&gt;

&lt;p&gt;This is the "Fear Tax." And it’s why most FinOps tools fail. They give you a list of 1,000 "optimization opportunities," and you ignore them all because you don't have time to manually verify safety for each one.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;CloudWise Agentic Tier&lt;/strong&gt; to solve this. It’s an agent that doesn't just &lt;em&gt;find&lt;/em&gt; waste—it safely &lt;em&gt;removes&lt;/em&gt; it after explicit approval, with a rollback guarantee.&lt;/p&gt;

&lt;p&gt;Here is the technical deep dive on how I built the safety architecture using Python, Boto3, and Cross-Account IAM Roles.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: "Safety First"
&lt;/h2&gt;

&lt;p&gt;The core design philosophy is &lt;strong&gt;Reversibility&lt;/strong&gt;. Every destructive action must be reversible. If it can't be undone, the agent isn't allowed to touch it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Scan:&lt;/strong&gt; Identify idle resources (e.g., EBS volumes unattached &amp;gt; 7 days).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Pre-Check:&lt;/strong&gt; Run read-only calls to verify the resource state and resolve dependencies.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Snapshot:&lt;/strong&gt; Take a final backup (e.g., &lt;code&gt;CreateSnapshot&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Dry Run:&lt;/strong&gt; Simulate the deletion to check for IAM permissions and dependencies.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Execute:&lt;/strong&gt; Perform the destructive action.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Rollback (Optional):&lt;/strong&gt; If anything breaks, one-click restore.&lt;/li&gt;
&lt;/ol&gt;
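
&lt;p&gt;Step 4 deserves a footnote: EC2 dry runs &lt;em&gt;always&lt;/em&gt; raise an exception, and the error code is the actual answer. A sketch of how the agent interprets it (the verdict labels are my own; the error codes are what EC2 returns):&lt;/p&gt;

```python
def dry_run_verdict(error_code):
    """Interpret the ClientError code raised by an EC2 call made with DryRun=True."""
    if error_code == "DryRunOperation":
        return "would-succeed"       # permissions and resource state are fine
    if error_code == "UnauthorizedOperation":
        return "missing-permission"  # the role can't do this for real either
    return "blocked"                 # dependency violation, bad ID, etc.

# Usage (boto3 assumed):
#   try:
#       ec2.delete_volume(VolumeId=volume_id, DryRun=True)
#   except ClientError as e:
#       verdict = dry_run_verdict(e.response["Error"]["Code"])
```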

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmulzk4ss97amaz7gutnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmulzk4ss97amaz7gutnl.png" alt="CloudWise Agentic Architecture" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Secret Sauce": Pre-Checks &amp;amp; Placeholders
&lt;/h2&gt;

&lt;p&gt;Most tools just run &lt;code&gt;boto3.client('ec2').delete_volume()&lt;/code&gt;. That’s dangerous.&lt;/p&gt;

&lt;p&gt;My agent uses a &lt;strong&gt;Pre-Check Phase&lt;/strong&gt; to verify the resource state &lt;em&gt;before&lt;/em&gt; generating the execution plan. It also resolves dynamic placeholders.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Pre-Check Logic
&lt;/h3&gt;

&lt;p&gt;Before we even think about deleting, we run a read-only probe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_execute_pre_checks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pre_checks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Run read-only API calls to verify resource state.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pre_checks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# e.g. service="ec2", action="describe_volumes", params={"VolumeIds": ["vol-123"]}
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Dynamic Placeholder Resolution
&lt;/h3&gt;

&lt;p&gt;The planner doesn't always know the ID of the snapshot it &lt;em&gt;will&lt;/em&gt; create. So I implemented a placeholder system.&lt;/p&gt;

&lt;p&gt;The plan might look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;ec2:CreateSnapshot&lt;/code&gt; (Target: &lt;code&gt;vol-123&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt; &lt;code&gt;ec2:DeleteVolume&lt;/code&gt; (Target: &lt;code&gt;vol-123&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But if we need to restore, we need the &lt;strong&gt;Snapshot ID&lt;/strong&gt; that hasn't been created yet.&lt;/p&gt;

&lt;p&gt;The system captures the output of Step 1 and injects it into the Rollback Plan using a placeholder like &lt;code&gt;SNAPSHOT_ID_FROM_PRECHECK&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_resolve_placeholders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pre_check_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Resolve dynamic placeholders like VOLUME_ID_FROM_PRECHECK
    using data from the pre-check phase.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;lookup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_build_precheck_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pre_check_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;resolved_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;api_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;params_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Replace placeholders with actual values
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;params_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;params_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params_str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params_str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;resolved_calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resolved_calls&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Security: The "2-Hop" IAM Chain
&lt;/h2&gt;

&lt;p&gt;Security is the biggest blocker for SaaS tools. I use a &lt;strong&gt;2-Hop IAM Architecture&lt;/strong&gt; to ensure strict isolation.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hop 1 (Service Role):&lt;/strong&gt; The Lambda function assumes a &lt;code&gt;CloudWiseServiceRole&lt;/code&gt; in my account. This acts as a bastion.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Hop 2 (Customer Role):&lt;/strong&gt; The Service Role assumes the &lt;code&gt;CloudWiseRemediationRole&lt;/code&gt; in the &lt;em&gt;customer's&lt;/em&gt; account.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why 2 Hops?
&lt;/h3&gt;

&lt;p&gt;It allows me to rotate the internal Lambda roles without asking 100 customers to update their Trust Policies. The customer only trusts &lt;strong&gt;one&lt;/strong&gt; static Service Role ARN.&lt;/p&gt;
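
&lt;p&gt;The two hops are just chained &lt;code&gt;sts:AssumeRole&lt;/code&gt; calls, the second one made with the first hop's temporary credentials. A sketch (the role ARNs and &lt;code&gt;ExternalId&lt;/code&gt; are placeholders):&lt;/p&gt;

```python
def session_kwargs(assume_role_response):
    """Map an STS AssumeRole response onto boto3.Session keyword arguments."""
    creds = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

# The chain itself (boto3 assumed; ARNs are placeholders):
#   hop1 = boto3.client("sts").assume_role(
#       RoleArn="arn:aws:iam::MY_PROD_ACCOUNT_ID:role/CloudWiseServiceRole",
#       RoleSessionName="cloudwise-hop1")
#   bastion = boto3.Session(**session_kwargs(hop1))
#   hop2 = bastion.client("sts").assume_role(
#       RoleArn="arn:aws:iam::CUSTOMER_ACCOUNT_ID:role/CloudWiseRemediationRole",
#       RoleSessionName="cloudwise-hop2",
#       ExternalId="CUSTOMER_UNIQUE_ID")
#   customer = boto3.Session(**session_kwargs(hop2))
```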

&lt;h3&gt;
  
  
  The Customer Trust Policy
&lt;/h3&gt;

&lt;p&gt;This is the only thing the customer installs. It trusts &lt;strong&gt;my AWS Account&lt;/strong&gt;, not a specific user, but enforces an &lt;code&gt;ExternalId&lt;/code&gt; to prevent "Confused Deputy" attacks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::MY_PROD_ACCOUNT_ID:root"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sts:ExternalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CUSTOMER_UNIQUE_ID"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: Trusting the account &lt;code&gt;root&lt;/code&gt; principal lets me rotate the specific internal role that performs the AssumeRole on my side, without the customer ever updating their trust policy.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Handling Edge Cases (The "In The Trenches" Stuff)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CloudFront is Weird
&lt;/h3&gt;

&lt;p&gt;You can't just update a CloudFront distribution. You need the current &lt;code&gt;ETag&lt;/code&gt; (version ID) to prove you aren't overwriting someone else's changes.&lt;/p&gt;

&lt;p&gt;My agent handles this automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_prepare_cloudfront_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Fetch current config to get the ETag
&lt;/span&gt;    &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_distribution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;etag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ETag&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Merge our changes
&lt;/span&gt;    &lt;span class="n"&gt;current_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distribution&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DistributionConfig&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;current_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DistributionConfig&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Return the payload with the ETag
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DistributionConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;current_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IfMatch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;etag&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The "Agentic" Future
&lt;/h2&gt;

&lt;p&gt;The term "Agentic" is getting thrown around a lot, but in infrastructure, it has a specific meaning to me: &lt;strong&gt;Software that does the work, not just the analysis.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For FinOps to mature, we have to stop treating "Cost Optimization" as a homework assignment for engineers. It should be a garbage collection process that runs in the background—safe, reversible, and automated.&lt;/p&gt;

&lt;p&gt;If you want to see this in action (or critique my code/architecture), I’m building this in public. You can check out the live tool at &lt;a href="https://cloudcostwise.io" rel="noopener noreferrer"&gt;cloudcostwise.io&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I’m Rick, a solo founder building CloudWise. I write about AWS, Python, and the psychology of engineering.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>python</category>
      <category>devops</category>
    </item>
    <item>
      <title>FinOps Implementation: A Roadmap for Cost Monitoring in 2026</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 12 Feb 2026 14:34:01 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/finops-implementation-a-roadmap-for-cost-monitoring-in-2026-4kkf</link>
      <guid>https://dev.to/cloudwiseteam/finops-implementation-a-roadmap-for-cost-monitoring-in-2026-4kkf</guid>
      <description>&lt;p&gt;As the founder of CloudWise, a focused AWS cost optimization platform, I have navigated the complexities of AWS cost management firsthand. With a commitment to helping businesses tackle their financial inefficiencies, I’ve learned that effective FinOps (Financial Operations) implementation is crucial for sustainable cloud cost management. In this article, I’ll outline a practical roadmap for cost monitoring that companies can adopt in 2026, leveraging insights gained from analyzing real AWS spending data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding FinOps
&lt;/h2&gt;

&lt;p&gt;FinOps is a cultural practice that combines finance, technology, and business to manage cloud costs effectively. It encourages collaboration between teams to ensure that spending aligns with business objectives. The rise of cloud services has made FinOps increasingly relevant as organizations migrate to the cloud and face unpredictable costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Current Landscape of AWS Costs
&lt;/h3&gt;

&lt;p&gt;AWS spending patterns reveal some striking insights. For instance, a recent analysis of hundreds of AWS accounts showed that up to &lt;strong&gt;30%&lt;/strong&gt; of cloud spending is wasted on underutilized or idle resources. This waste can stem from a lack of visibility into resource usage, poor budgeting practices, and inadequate monitoring of spending trends. &lt;/p&gt;

&lt;p&gt;In 2026, as cloud adoption continues to grow, organizations will need to refine their FinOps practices to address these challenges. Here’s a roadmap based on real-world data and experiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Establish a FinOps Culture
&lt;/h2&gt;

&lt;p&gt;Creating a FinOps culture begins with aligning teams around the shared goal of cost efficiency. This involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Education and Training&lt;/strong&gt;: Ensure that finance, engineering, and product teams understand AWS billing and how decisions impact costs. Regular workshops can help demystify AWS pricing models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Functional Collaboration&lt;/strong&gt;: Foster open communication between teams. Utilize tools that promote transparency and accessibility to cost data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Insight from CloudWise
&lt;/h3&gt;

&lt;p&gt;From our experience, organizations that prioritize a FinOps culture see a &lt;strong&gt;15-20% reduction&lt;/strong&gt; in cloud costs within the first year, driven by better resource allocation and more informed decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Implement Robust Cost Monitoring Tools
&lt;/h2&gt;

&lt;p&gt;To effectively manage AWS costs, it’s critical to have robust cost monitoring tools in place. At CloudWise, we focus on the following capabilities:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. AWS Cost Analysis
&lt;/h3&gt;

&lt;p&gt;Utilizing real-time data for AWS cost analysis allows teams to identify spending trends and anomalies quickly. By analyzing historical data, organizations can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand which services account for the majority of costs.&lt;/li&gt;
&lt;li&gt;Identify usage spikes that could indicate potential overspending.&lt;/li&gt;
&lt;/ul&gt;
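
&lt;p&gt;The Cost Explorer API makes the first point concrete. Ranking services by spend from a &lt;code&gt;GetCostAndUsage&lt;/code&gt; response (the parsing is pure Python; the API call is sketched in the comments and assumes Boto3 plus &lt;code&gt;ce:GetCostAndUsage&lt;/code&gt; permissions):&lt;/p&gt;

```python
def top_services(groups, n=5):
    """Rank GetCostAndUsage SERVICE groups by unblended cost, descending."""
    costs = [(g["Keys"][0], float(g["Metrics"]["UnblendedCost"]["Amount"]))
             for g in groups]
    return sorted(costs, key=lambda kv: kv[1], reverse=True)[:n]

# Fetching the groups (boto3 assumed; one month, grouped by service):
#   ce = boto3.client("ce")
#   resp = ce.get_cost_and_usage(
#       TimePeriod={"Start": "2026-01-01", "End": "2026-02-01"},
#       Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}])
#   for name, amount in top_services(resp["ResultsByTime"][0]["Groups"]):
#       print(f"{name}: ${amount:,.2f}")
```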

&lt;h3&gt;
  
  
  2. Budget Alerts
&lt;/h3&gt;

&lt;p&gt;Establish budget alerts to notify teams when they approach or exceed set thresholds. This proactive approach allows for timely interventions before costs spiral out of control. &lt;/p&gt;

&lt;h3&gt;
  
  
  3. Resource Optimization
&lt;/h3&gt;

&lt;p&gt;Regularly review resource utilization. Our analysis shows that organizations that actively optimize resources can save an average of &lt;strong&gt;20-30%&lt;/strong&gt; on their AWS bills. Key strategies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Right-Sizing Instances&lt;/strong&gt;: Continuously monitor instance types and sizes to ensure they match workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identifying Idle Resources&lt;/strong&gt;: Use tools to detect underutilized resources and terminate or downscale them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: Budget Alert Implementation
&lt;/h3&gt;

&lt;p&gt;Here’s an example of how to set up budget alerts using AWS Budgets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new budget&lt;/span&gt;
aws budgets create-budget &lt;span class="nt"&gt;--account-id&lt;/span&gt; &amp;lt;YOUR_ACCOUNT_ID&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--budget&lt;/span&gt; &lt;span class="s1"&gt;'{"BudgetName": "Monthly Budget", "BudgetLimit": {"Amount": "&amp;lt;YOUR_BUDGET_LIMIT&amp;gt;", "Unit": "USD"}, "BudgetType": "COST"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--notifications-with-subscribers&lt;/span&gt; &lt;span class="s1"&gt;'[{"Notification": {"NotificationType": "ACTUAL", "ComparisonOperator": "GREATER_THAN", "Threshold": 80}, "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "&amp;lt;YOUR_EMAIL&amp;gt;"}]}]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script creates a budget that alerts you when actual spending exceeds 80% of your set limit, helping you maintain control over your costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Continuous Improvement and Optimization
&lt;/h2&gt;

&lt;p&gt;FinOps is not a one-time effort; it’s a continuous process. In 2026, organizations will need to adopt a mindset of ongoing optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regular Reviews&lt;/strong&gt;: Schedule quarterly reviews of cloud spending and resource utilization. Use analytics to inform future decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly Detection&lt;/strong&gt;: Implement tools that utilize machine learning to identify unusual spending patterns. This can help catch unexpected costs before they become significant issues.&lt;/li&gt;
&lt;/ul&gt;
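
&lt;p&gt;You don't need machine learning to start. A rolling mean/standard-deviation check over daily spend catches the obvious spikes (a deliberately simple sketch; real anomaly detectors use smarter baselines and handle seasonality):&lt;/p&gt;

```python
from statistics import mean, stdev

def spend_anomalies(daily_costs, window=7, sigmas=3.0):
    """Indices of days whose cost exceeds mean + sigmas*stdev of the prior window."""
    flagged = []
    for i in range(window, len(daily_costs)):
        prior = daily_costs[i - window:i]
        if daily_costs[i] > mean(prior) + sigmas * stdev(prior):
            flagged.append(i)
    return flagged

# A flat ~$100/day baseline with a spike on the last day:
print(spend_anomalies([100, 101, 99, 100, 102, 98, 100, 101, 99, 400]))  # [9]
```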

&lt;h3&gt;
  
  
  Real Data Insight
&lt;/h3&gt;

&lt;p&gt;From data analysis, we found that companies leveraging anomaly detection tools could reduce unexpected costs by &lt;strong&gt;up to 40%&lt;/strong&gt;. These tools help identify spending spikes that may be overlooked during manual reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Leverage Advanced Analytics
&lt;/h2&gt;

&lt;p&gt;As businesses scale, the complexity of cost management increases. In 2026, advanced analytics will play a crucial role in FinOps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictive Analytics&lt;/strong&gt;: Use historical data to predict future spending patterns and make informed budgeting decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Allocation Tags&lt;/strong&gt;: Implement tagging strategies to allocate costs accurately across departments, projects, or teams. This promotes accountability and encourages responsible spending.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: Tagging Strategy
&lt;/h3&gt;

&lt;p&gt;To implement a tagging strategy, you can use the AWS CLI to apply tags to your resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tagging an EC2 instance&lt;/span&gt;
aws ec2 create-tags &lt;span class="nt"&gt;--resources&lt;/span&gt; &amp;lt;INSTANCE_ID&amp;gt; &lt;span class="nt"&gt;--tags&lt;/span&gt; &lt;span class="nv"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Department,Value&lt;span class="o"&gt;=&lt;/span&gt;Marketing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command tags your EC2 instance with the department responsible for its cost, enabling better tracking and accountability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Implementing a successful FinOps strategy requires commitment, the right tools, and a culture of collaboration and accountability. As I reflect on the journey of building CloudWise, I’ve seen how organizations that embrace these principles can significantly reduce their AWS costs and optimize their cloud spending.&lt;/p&gt;

&lt;p&gt;By following this roadmap, businesses can navigate the complexities of cloud cost management in 2026 and beyond. At CloudWise, we remain dedicated to providing insights and tools to help organizations tackle their AWS cost challenges effectively. &lt;/p&gt;

&lt;p&gt;Let's work together to make cloud spending more transparent and manageable, ensuring that your business thrives in the cloud economy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise provides instant AWS cost insights. Check it out at cloudcostwise.io&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>architecture</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>EC2 Spot vs. Reserved: Which Saved Us $5,000 Last Quarter?</title>
      <dc:creator>Rick Wise</dc:creator>
      <pubDate>Thu, 05 Feb 2026 15:44:51 +0000</pubDate>
      <link>https://dev.to/cloudwiseteam/ec2-spot-vs-reserved-which-saved-us-5000-last-quarter-481b</link>
      <guid>https://dev.to/cloudwiseteam/ec2-spot-vs-reserved-which-saved-us-5000-last-quarter-481b</guid>
      <description>&lt;p&gt;As the founder of CloudWise, an AWS cost optimization platform, I’ve spent countless hours analyzing AWS spending patterns—not just for our clients but also for our own infrastructure. I wanted to share insights from our latest findings that saved us $5,000 last quarter by optimizing our EC2 instances, specifically by comparing Spot Instances and Reserved Instances.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding EC2 Pricing Models
&lt;/h2&gt;

&lt;p&gt;AWS offers several pricing models for EC2 instances, but the two most discussed are Spot Instances and Reserved Instances. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spot Instances&lt;/strong&gt; let you use spare EC2 capacity at discounts of up to 90% off the On-Demand price. (There is no longer a bidding process; you pay the current Spot price, optionally capped by a maximum price you set.) They are incredibly cost-effective, but AWS can reclaim them with a two-minute interruption notice when it needs the capacity back.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reserved Instances&lt;/strong&gt;, on the other hand, require a commitment to a one- or three-year term. In exchange, they provide a significant discount over On-Demand prices — up to roughly 72% at the high end.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
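&lt;p&gt;To make the trade-off concrete, here is a back-of-envelope quarter-long comparison. The hourly price and discount rates below are illustrative assumptions, not AWS quotes:&lt;/p&gt;

```shell
# Illustrative numbers only -- check current AWS pricing for real figures
OD=0.096          # assumed On-Demand $/hr for a mid-size instance
SPOT_DISCOUNT=70  # assumed average Spot discount, percent
RI_DISCOUNT=40    # assumed 1-year Reserved discount, percent
HOURS=2160        # roughly one quarter of continuous use (90 days * 24 h)

awk -v od="$OD" -v sd="$SPOT_DISCOUNT" -v rd="$RI_DISCOUNT" -v h="$HOURS" 'BEGIN {
  printf "On-Demand: $%.2f/quarter\n", od * h
  printf "Spot:      $%.2f/quarter\n", od * (1 - sd / 100) * h
  printf "Reserved:  $%.2f/quarter\n", od * (1 - rd / 100) * h
}'
```

&lt;p&gt;At these assumed rates, a single always-on instance runs about $207 per quarter On-Demand versus about $62 on Spot — the gap compounds quickly across a fleet.&lt;/p&gt;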

&lt;h3&gt;
  
  
  The Dilemma
&lt;/h3&gt;

&lt;p&gt;When I started CloudWise as a solo developer, I faced a common dilemma: how to optimize costs without sacrificing performance. As we scaled, our AWS costs grew, and I needed a strategy to curb spending while ensuring our services remained stable and reliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data-Driven Insights
&lt;/h3&gt;

&lt;p&gt;Using our own cost analysis capabilities, I dove into our AWS spending data. Here’s what I found:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Workload Patterns&lt;/strong&gt;: Analyzing our usage patterns, I noticed that many of our workloads were not consistent. During peak hours, we needed reliability, but during off-peak hours, we had significant idle time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost Analysis&lt;/strong&gt;: By analyzing our EC2 spending, we could see that while Reserved Instances provided savings, they didn’t align well with our fluctuating workloads. We were locking ourselves into costs for instances we didn’t always use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Budget Alerts&lt;/strong&gt;: Our budget alerts indicated that while we were well under budget, the savings could be improved further by employing Spot Instances for non-critical workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Approach
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Step 1: Identifying Workloads
&lt;/h4&gt;

&lt;p&gt;We categorized our workloads into two buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Critical Workloads&lt;/strong&gt;: These required high availability and reliability (e.g., production databases).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Critical Workloads&lt;/strong&gt;: Tasks that could be interrupted, such as batch processing jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Step 2: Implementing Spot Instances
&lt;/h4&gt;

&lt;p&gt;For non-critical workloads, we transitioned to Spot Instances, setting a maximum price we were willing to pay and letting capacity run whenever the Spot price stayed below it. Our analysis showed that we could save an average of 70% compared to On-Demand prices for these instances.&lt;/p&gt;
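&lt;p&gt;For reference, one way to launch Spot capacity today is &lt;code&gt;run-instances&lt;/code&gt; with market options — no bid, just an optional price cap. The AMI ID and max price here are illustrative placeholders:&lt;/p&gt;

```shell
# Launch a one-time Spot instance; MaxPrice caps what you'll pay per hour.
# ami-0123456789abcdef0 and 0.05 are placeholder values, not recommendations.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m5.large \
  --instance-market-options '{"MarketType": "spot", "SpotOptions": {"MaxPrice": "0.05", "SpotInstanceType": "one-time"}}'
```

&lt;p&gt;Omitting &lt;code&gt;MaxPrice&lt;/code&gt; entirely is also valid — you then simply pay the current Spot price, capped at On-Demand.&lt;/p&gt;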

&lt;h4&gt;
  
  
  Step 3: Retaining Reserved Instances
&lt;/h4&gt;

&lt;p&gt;For our critical workloads, we maintained our Reserved Instances. This approach ensured we had the necessary capacity when needed while benefiting from the cost savings of long-term commitments.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Results
&lt;/h3&gt;

&lt;p&gt;By the end of the quarter, we analyzed our AWS spending and found that this hybrid approach—using Spot Instances for non-critical workloads and Reserved Instances for critical tasks—saved us nearly $5,000. &lt;/p&gt;

&lt;h4&gt;
  
  
  Key Metrics
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost of Spot Instances&lt;/strong&gt;: $2,000 for the quarter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost of Reserved Instances&lt;/strong&gt;: $7,000 for the quarter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Savings Compared to On-Demand&lt;/strong&gt;: $5,000&lt;/li&gt;
&lt;/ul&gt;
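&lt;p&gt;These figures can be sanity-checked in a few lines: $2,000 plus $7,000 is $9,000 of actual spend, and adding back the $5,000 saved implies the same workloads would have cost about $14,000 On-Demand — roughly a 36% reduction:&lt;/p&gt;

```shell
SPOT=2000; RESERVED=7000; SAVINGS=5000

ACTUAL=$((SPOT + RESERVED))      # what we actually paid: 9000
BASELINE=$((ACTUAL + SAVINGS))   # implied On-Demand baseline: 14000

echo "Actual spend:       \$${ACTUAL}"
echo "On-Demand baseline: \$${BASELINE}"
awk -v s="$SAVINGS" -v b="$BASELINE" 'BEGIN { printf "Savings rate: %.0f%%\n", 100 * s / b }'
```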

&lt;h3&gt;
  
  
  Challenges Faced
&lt;/h3&gt;

&lt;p&gt;Transitioning to a mixed strategy wasn’t without challenges. Here are a few hurdles we encountered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interruption Management&lt;/strong&gt;: Spot Instances can be terminated at any time. We had to implement a robust job queuing and retry mechanism to handle these interruptions gracefully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;: Keeping track of Spot Instance pricing and availability required constant vigilance. We used our own cost analysis tools to monitor these metrics effectively.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
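&lt;p&gt;The queuing-and-retry idea above hinges on the two-minute interruption notice that AWS publishes through instance metadata. Here is a minimal polling sketch — it only does anything on an actual EC2 Spot instance, and &lt;code&gt;drain_jobs&lt;/code&gt; is a hypothetical stand-in for your own shutdown hook:&lt;/p&gt;

```shell
# Poll IMDSv2 for the Spot interruption notice (404 until one is scheduled).
# drain_jobs is a placeholder for your own graceful-shutdown logic.
drain_jobs() { echo "re-queueing in-flight jobs before reclaim..."; }

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

while true; do
  CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "X-aws-ec2-metadata-token: $TOKEN" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action")
  if [ "$CODE" = "200" ]; then
    drain_jobs   # roughly 2 minutes before the instance is reclaimed
    break
  fi
  sleep 5
done
```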

&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexibility is Key&lt;/strong&gt;: The ability to adapt your infrastructure to your workload patterns can yield significant savings. It’s worth investing time in understanding your usage trends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Automation&lt;/strong&gt;: Automating instance provisioning and monitoring can minimize the overhead of managing Spot Instances and help you respond quickly to changes in pricing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analyze Regularly&lt;/strong&gt;: Regularly analyzing your AWS spending data is crucial. We rely on our platform’s insights to make informed decisions, which has proven invaluable as we grow.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Our journey toward optimizing AWS costs at CloudWise has been both challenging and rewarding. By leveraging a combination of Spot and Reserved Instances, we found a balance that not only saved us $5,000 last quarter but also provided the flexibility needed to scale our services effectively.&lt;/p&gt;

&lt;p&gt;If you're facing similar AWS cost challenges, I encourage you to take a data-driven approach. Analyze your spending patterns, identify your workload requirements, and don’t hesitate to mix and match instance types. The potential for savings is substantial, and you might be surprised at what you can achieve with the right insights.&lt;/p&gt;




&lt;p&gt;If you're interested in learning more about AWS cost optimization, stay tuned for future insights from our experiences at CloudWise!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CloudWise provides instant AWS cost insights. Check it out at cloudcostwise.io&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>architecture</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
