<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhijeet H</title>
    <description>The latest articles on DEV Community by Abhijeet H (@abhijeet_h_ea7533c94).</description>
    <link>https://dev.to/abhijeet_h_ea7533c94</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3769825%2F8c738067-9cee-4ed8-9ec5-153e95692c2d.png</url>
      <title>DEV Community: Abhijeet H</title>
      <link>https://dev.to/abhijeet_h_ea7533c94</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhijeet_h_ea7533c94"/>
    <language>en</language>
    <item>
      <title>The Math Behind VM Right-Sizing (Stop guessing your Azure SKU)</title>
      <dc:creator>Abhijeet H</dc:creator>
      <pubDate>Fri, 13 Feb 2026 00:35:04 +0000</pubDate>
      <link>https://dev.to/abhijeet_h_ea7533c94/the-math-behind-vm-right-sizing-stop-guessing-your-azure-sku-130n</link>
      <guid>https://dev.to/abhijeet_h_ea7533c94/the-math-behind-vm-right-sizing-stop-guessing-your-azure-sku-130n</guid>
      <description>&lt;p&gt;We have all done this at some point. You are deploying a new application, and the manager asks, "What size VM do we need?"&lt;/p&gt;

&lt;p&gt;You don't want to be the person who crashed the production server because it ran out of RAM. So what do you do? You take the estimated requirement and multiply it by 2 or 4. "Just to be safe."&lt;/p&gt;

&lt;p&gt;If the load test hits 60% CPU on 4 vCPUs, you request 8 vCPUs. The VM goes live, runs at 12% utilization, and nobody ever looks at it again.&lt;/p&gt;

&lt;p&gt;This "Safety-margin culture" is the single biggest reason for cloud waste.&lt;/p&gt;

&lt;p&gt;I am currently building &lt;a href="https://cloudsavvy.io" rel="noopener noreferrer"&gt;CloudSavvy.io&lt;/a&gt; to automate this problem, but today I want to share the core engineering logic and the math you need to implement right-sizing yourself without breaking production.&lt;/p&gt;

&lt;h2&gt;Problem Statement: The Cost of Static Sizing&lt;/h2&gt;

&lt;p&gt;Most organizations size VMs at deployment time and never revisit the decision. This is a structural issue.&lt;/p&gt;

&lt;p&gt;Consider a &lt;strong&gt;D8s_v5&lt;/strong&gt; (8 vCPU, 32 GiB) in East US.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$280/month.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual Usage:&lt;/strong&gt; 11% CPU, 22% Memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;D4s_v5&lt;/strong&gt; (4 vCPU, 16 GiB) costs ~$140/month. It would handle that load with plenty of buffer. If you have 200 VMs like this, the annual waste reaches six figures.&lt;/p&gt;

&lt;p&gt;The problem is not that engineers over-provision deliberately. The problem is that right-sizing requires continuous, metrics-driven evaluation—and most teams lack the instrumentation to do it systematically.&lt;/p&gt;

&lt;h2&gt;Core Metrics Required (CPU is not enough)&lt;/h2&gt;

&lt;p&gt;Many scripts just look at "Average CPU" and suggest a downsize. This is dangerous. You need to analyze four resource dimensions over a &lt;strong&gt;30-day window&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;1. CPU Utilization&lt;/h3&gt;

&lt;p&gt;Raw average is insufficient. You need three statistical views:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average:&lt;/strong&gt; If it is below 20% sustained for 30 days, it is a downsizing candidate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P95 (95th Percentile):&lt;/strong&gt; This captures the realistic peak. If P95 is below 50%, you are definitely over-provisioned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak (P99/Max):&lt;/strong&gt; If Peak is high (90%+) but P95 is low, the workload is "bursty." Do not switch to a smaller fixed SKU; consider a &lt;strong&gt;B-series (Burstable)&lt;/strong&gt; instead.&lt;/li&gt;
&lt;/ul&gt;
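&lt;p&gt;As a sketch, here is how those three views can be computed from raw hourly samples. The function and variable names are my own (not from any Azure SDK), and the percentile uses a simple nearest-rank method:&lt;/p&gt;

```python
import statistics

def cpu_stats(samples):
    """Summarize hourly CPU samples (percent) into the three views above.

    `samples` is a plain list of floats; in practice these would be
    ~720 hourly "Percentage CPU" averages from a 30-day window.
    """
    ordered = sorted(samples)
    # Nearest-rank P95: the value below which ~95% of samples fall.
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "avg": statistics.mean(samples),
        "p95": ordered[p95_index],
        "peak": max(samples),
    }

# A bursty profile: mostly idle with short spikes.
stats = cpu_stats([10.0] * 95 + [95.0] * 5)
print(stats)  # avg 14.25, p95 10.0, peak 95.0
```

&lt;p&gt;This example profile is exactly the "bursty" case: P95 sits at 10% while the peak hits 95%, so a B-series SKU fits better than a smaller fixed SKU.&lt;/p&gt;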

&lt;h3&gt;2. Memory Utilization&lt;/h3&gt;

&lt;p&gt;This is the most neglected metric. A VM can run at 10% CPU while using 85% of available memory (common for databases and caching workloads).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;memory_utilization_pct = ((total_memory - available_memory) / total_memory) * 100&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If average memory utilization exceeds 80% sustained, the VM is a candidate for &lt;strong&gt;Upsizing&lt;/strong&gt; or a family change (e.g., to E-series), regardless of CPU. If you ignore this, you risk Out Of Memory (OOM) crashes.&lt;/p&gt;
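&lt;p&gt;A minimal sketch of that formula as a helper (the 32 GiB figures are illustrative):&lt;/p&gt;

```python
def memory_utilization_pct(total_gib, available_gib):
    """The formula above: (total - available) / total, as a percentage."""
    return (total_gib - available_gib) / total_gib * 100

# A 32 GiB VM reporting only 4.8 GiB available is 85% utilized --
# past the 80% sustained threshold, so an upsize/family-change candidate.
print(round(memory_utilization_pct(32, 4.8), 1))  # 85.0
```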

&lt;h3&gt;3. Disk IOPS and Throughput&lt;/h3&gt;

&lt;p&gt;Disk performance constrains VM sizing independently of CPU. Azure VM SKUs have hard ceilings.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard_D4s_v5:&lt;/strong&gt; Max 6,400 IOPS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard_D2s_v5:&lt;/strong&gt; Max 3,200 IOPS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload sustains 5,800 IOPS and you downsize to a D2s because "CPU is low," you will hit I/O throttling and the application will lag. Always compare P95 IOPS against the target SKU limit.&lt;/p&gt;
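&lt;p&gt;That comparison can be sketched as a small guard function. The 1.2 factor anticipates the 20% headroom guardrail used later in this post, and the SKU limits are the illustrative figures from the list above:&lt;/p&gt;

```python
def iops_downsize_safe(disk_iops_p95, target_sku_max_iops, headroom=1.2):
    """Allow a downsize only if the target SKU covers observed P95 IOPS
    plus 20% headroom. SKU ceilings come from the Azure VM size docs."""
    return target_sku_max_iops >= disk_iops_p95 * headroom

# The 5,800-IOPS workload from above, against the two SKU ceilings:
print(iops_downsize_safe(5800, 6400))  # False: even D4s_v5 is too tight
print(iops_downsize_safe(2400, 3200))  # True:  D2s_v5 would be fine
```

&lt;p&gt;Note that with headroom applied, even the D4s_v5 ceiling of 6,400 fails for a sustained 5,800 IOPS workload, which is exactly why CPU-only scripts cause throttling incidents.&lt;/p&gt;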

&lt;h3&gt;4. Network Throughput&lt;/h3&gt;

&lt;p&gt;Similar to disk, network bandwidth is SKU-dependent. If sustained network throughput exceeds 60% of the target SKU's ceiling, &lt;strong&gt;block the downsize&lt;/strong&gt;. Network-bound workloads (like API gateways) often have low CPU but cannot tolerate bandwidth reduction.&lt;/p&gt;

&lt;h2&gt;Sizing Decision Logic&lt;/h2&gt;

&lt;p&gt;You cannot rely on simple thresholds. You need a decision framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here is the logic flow:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Coverage Gate&lt;/strong&gt;&lt;br&gt;
If &lt;code&gt;cpu_hours &amp;lt; 648&lt;/code&gt; (90% of the 720 hours in a 30-day window), &lt;strong&gt;BLOCK&lt;/strong&gt;. Do not guess with insufficient data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Classification&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cpu_sustained_low&lt;/code&gt; = (cpu_p95 &amp;lt; 20%) AND (cpu_avg &amp;lt; 15%)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_low&lt;/code&gt; = (memory_p95 &amp;lt; 40%)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_high&lt;/code&gt; = (memory_p95 &amp;gt;= 75%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Action Determination&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;IF&lt;/strong&gt; &lt;code&gt;cpu_sustained_low&lt;/code&gt; AND &lt;code&gt;memory_low&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; DOWNSIZE within same family.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; D8s_v5 → D4s_v5.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;IF&lt;/strong&gt; &lt;code&gt;cpu_sustained_low&lt;/code&gt; AND &lt;code&gt;memory_high&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; SWITCH FAMILY to memory-optimized (E-series).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; D8s_v5 → E4s_v5 (fewer vCPUs, same memory).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;IF&lt;/strong&gt; CPU is high (the inverse profile of &lt;code&gt;cpu_sustained_low&lt;/code&gt;, e.g. &lt;code&gt;cpu_p95 &amp;gt;= 70%&lt;/code&gt;) AND &lt;code&gt;memory_low&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; SWITCH FAMILY to compute-optimized (F-series).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; D8s_v5 → F8s_v2 (same vCPU, less memory, higher clock speed).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;IF&lt;/strong&gt; CPU variability is high (stddev/mean &amp;gt; 0.6):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; RECOMMEND BURSTABLE (B-series).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; D4s_v5 → B4ms.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Guardrails&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IOPS Safety:&lt;/strong&gt; IF &lt;code&gt;target_sku_max_iops&lt;/code&gt; &amp;lt; &lt;code&gt;current_disk_iops_p95&lt;/code&gt; * 1.2 → &lt;strong&gt;BLOCK&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Tag:&lt;/strong&gt; IF resource is tagged "Production" → apply 30% stricter headroom margins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance:&lt;/strong&gt; IF tagged "PCI-DSS" or "HIPAA" → &lt;strong&gt;BLOCK&lt;/strong&gt; automated resize.&lt;/li&gt;
&lt;/ul&gt;
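&lt;p&gt;Putting Steps 1 through 4 together, a minimal sketch of the whole decision function might look like this. The 70% &lt;code&gt;cpu_high&lt;/code&gt; threshold is an assumption for illustration, and the "Production" margin tightening is omitted for brevity:&lt;/p&gt;

```python
def recommend(m):
    """Sketch of the decision flow. `m` is a dict of 30-day metrics."""
    # Step 1: coverage gate (90% of 720 hours in a 30-day window).
    if m["cpu_hours"] < 648:
        return "BLOCK: insufficient data"

    # Step 4 guardrails run early so they can veto any action.
    # (The "Production" tag's stricter margins are omitted for brevity.)
    if m.get("compliance_tagged"):
        return "BLOCK: compliance-tagged resource"
    if m["target_sku_max_iops"] < m["disk_iops_p95"] * 1.2:
        return "BLOCK: target SKU cannot cover P95 IOPS + 20% headroom"

    # Step 2: classification.
    cpu_sustained_low = m["cpu_p95"] < 20 and m["cpu_avg"] < 15
    cpu_high = m["cpu_p95"] >= 70  # assumed threshold, tune per fleet
    memory_low = m["memory_p95"] < 40
    memory_high = m["memory_p95"] >= 75
    bursty = m["cpu_avg"] > 0 and m["cpu_stddev"] / m["cpu_avg"] > 0.6

    # Step 3: action determination.
    if cpu_sustained_low and memory_low:
        return "DOWNSIZE within family"
    if cpu_sustained_low and memory_high:
        return "SWITCH FAMILY: memory-optimized (E-series)"
    if cpu_high and memory_low:
        return "SWITCH FAMILY: compute-optimized (F-series)"
    if bursty:
        return "RECOMMEND BURSTABLE (B-series)"
    return "NO ACTION"

baseline = dict(cpu_hours=720, cpu_avg=12, cpu_p95=18, cpu_stddev=4,
                memory_p95=30, disk_iops_p95=1000, target_sku_max_iops=3200)
print(recommend(baseline))  # DOWNSIZE within family
```

&lt;p&gt;The guardrails are evaluated first on purpose: a resize that passes the utilization checks but trips an IOPS or compliance guardrail must never go out.&lt;/p&gt;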

&lt;h2&gt;Example Scenarios&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenario A: The Memory-Bound Database&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Current:&lt;/strong&gt; Standard_D8s_v5 (8 vCPU, 32 GiB) — USD 280/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics:&lt;/strong&gt; CPU avg 12%, P95 25% | Memory avg 78%, P95 89%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis:&lt;/strong&gt; CPU is underutilized, but memory is near capacity. Downsizing D-series reduces RAM, risking OOM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendation:&lt;/strong&gt; Switch to &lt;strong&gt;Standard_E4s_v5&lt;/strong&gt; (4 vCPU, 32 GiB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings:&lt;/strong&gt; USD 85/month. Memory preserved, CPU reduced to match actual utilization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario B: The GPU Mistake&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Current:&lt;/strong&gt; Standard_NC24s_v3 (24 vCPU, 4x V100 GPUs) — USD 9,204/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics:&lt;/strong&gt; GPU utilization avg 22% (single GPU active).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis:&lt;/strong&gt; Only 1 of 4 GPUs is active. The workload is a single-model inference service that does not parallelize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendation:&lt;/strong&gt; Downsize to &lt;strong&gt;Standard_NC6s_v3&lt;/strong&gt; (6 vCPU, 1x V100).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings:&lt;/strong&gt; USD 6,903/month.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Data Engineering Considerations&lt;/h2&gt;

&lt;p&gt;If you are implementing this, keep in mind:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Telemetry:&lt;/strong&gt; Use &lt;strong&gt;Azure Monitor Metrics API&lt;/strong&gt; (Microsoft.Compute/virtualMachines), not Resource Graph. Resource Graph provides metadata, not performance history.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Sampling Window:&lt;/strong&gt; 30 days is the mandatory minimum to capture monthly batch jobs. 7 days is too risky.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Missing Data:&lt;/strong&gt; Missing metric hours are not zero-utilization hours. If the agent was down, do not interpolate. Block the recommendation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;ROI Check:&lt;/strong&gt; Calculate the exact monthly cost delta. If savings &amp;lt; USD 5/month, skip it. It's not worth the engineering effort.&lt;/li&gt;
&lt;/ol&gt;
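&lt;p&gt;For point 4, the ROI filter is essentially a one-liner. The prices below are the illustrative figures from earlier in the post; in practice you would pull real ones from the Azure Retail Prices API or your negotiated rate card:&lt;/p&gt;

```python
def roi_worth_it(current_monthly_usd, target_monthly_usd, floor_usd=5.0):
    """Skip recommendations whose monthly savings fall below a floor."""
    savings = current_monthly_usd - target_monthly_usd
    return savings >= floor_usd, savings

# The D8s_v5 -> D4s_v5 example from the top of the post:
ok, savings = roi_worth_it(280.0, 140.0)
print(ok, savings)  # True 140.0
```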

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Right-sizing is not just about cost minimization—it is cost-to-performance optimization. The goal is to eliminate waste without introducing performance risk.&lt;/p&gt;

&lt;p&gt;A one-time audit is not enough because workloads change. If you automate this logic effectively, you can maintain performance while significantly reducing your Azure bill.&lt;/p&gt;

&lt;p&gt;If you are looking for a tool that automates this entire decision framework, do check out &lt;strong&gt;&lt;a href="https://cloudsavvy.io" rel="noopener noreferrer"&gt;CloudSavvy.io&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let me know in the comments if you have faced issues with IOPS throttling after resizing!&lt;/p&gt;

</description>
      <category>finops</category>
      <category>azure</category>
      <category>infrastructure</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
