Yuto Takashi

AWS EFS Emergency Response: How I Spent $69 in 26 Hours (And How to Avoid It)

TL;DR

During a Jenkins EFS incident, I switched to Provisioned Throughput (300 MiB/s) for emergency response. It cost $69 for just 26 hours. If I had known about Elastic Throughput, it would have been around $3.50. Here's what I learned about EFS throughput modes and cost optimization.

The Incident

Last week, our Jenkins CI/CD pipeline came to a halt due to EFS metadata IOPS exhaustion. As an emergency measure, I changed the EFS throughput mode to Provisioned Throughput at 300 MiB/s to keep Jenkins running while investigating the root cause.

The next day, I checked AWS Cost Explorer and saw:

$69.00

For 26 hours of usage. Ouch.

Why You Should Care

If you're running EFS for production workloads, understanding throughput modes is critical. A simple configuration choice can mean the difference between $3 and $69 for the same workload.

EFS Throughput Modes: A Quick Comparison

AWS EFS offers three throughput modes:

1. Bursting Throughput (Default)

Cost: Storage cost only

Performance scales with storage size. You get baseline throughput based on your storage capacity, plus burst credits for temporary spikes.

  • ✅ No extra cost
  • ❌ Performance degrades when credits run out (our problem)

2. Provisioned Throughput

Cost: Storage + Throughput cost

Tokyo region: ~$7.2 per MiB/s per month

For 300 MiB/s:

  • Monthly: 300 × $7.2 = $2,160
  • 26 hours: $2,160 × (26/720) ≈ $78 (actual: $69)

  • ✅ Guaranteed performance

  • ❌ Very expensive, billed even when idle
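The proration above can be reproduced with a small helper. This is only a sketch: `provisioned_cost` is a hypothetical function name, the $7.2 rate is the figure quoted above, and the 720-hour month is the same simplification used in the estimate (the actual bill came to $69, so treat the result as a rough upper estimate).

```shell
# Prorated Provisioned Throughput cost (estimate only).
# Arguments: MiB/s, USD per MiB/s-month, hours billed, hours per month (default 720).
provisioned_cost() {
  awk -v mibps="$1" -v price="$2" -v hours="$3" -v month_hours="${4:-720}" \
    'BEGIN { printf "%.2f\n", mibps * price * hours / month_hours }'
}

provisioned_cost 300 7.2 26     # the 26-hour incident -> 78.00
provisioned_cost 300 7.2 720    # a full 30-day month  -> 2160.00
```

Running this before changing the mode takes a few seconds and makes the hourly burn rate (here about $3 per hour) visible up front.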

3. Elastic Throughput

Cost: Storage + Actual usage

Tokyo region:

  • Read: $0.04/GB
  • Write: $0.07/GB

For 26 hours with ~50 GB transferred (priced entirely at the write rate, the more expensive of the two):

  • 50 GB × $0.07 = $3.50

  • ✅ Pay-per-use, auto-scales

  • ❌ Harder to predict costs
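Because Elastic Throughput is billed per gigabyte transferred, the estimate depends on the read/write mix. The sketch below assumes the Tokyo rates quoted above as defaults; `elastic_cost` is a hypothetical helper, and the even split in the last call is an illustrative assumption, not a measurement from the incident.

```shell
# Elastic Throughput data-transfer cost (estimate only).
# Arguments: GB read, GB written, USD per GB read (default 0.04),
# USD per GB written (default 0.07).
elastic_cost() {
  awk -v r="$1" -v w="$2" -v rp="${3:-0.04}" -v wp="${4:-0.07}" \
    'BEGIN { printf "%.2f\n", r * rp + w * wp }'
}

elastic_cost 0 50     # all 50 GB written -> 3.50
elastic_cost 50 0     # all 50 GB read    -> 2.00
elastic_cost 25 25    # hypothetical even split -> 2.75
```

Even in the worst case for this volume, the total stays far below the Provisioned Throughput bill.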

Cost Comparison

Mode        | 26-hour Cost                                      | When to Use
Bursting    | No throughput charge (storage only, ~$5.6/month)  | Normal operations
Provisioned | $69                                               | Constant high throughput
Elastic     | $3.50                                             | Spike handling (best for most cases)

Difference: ~$65 (about 9,500 yen)

What I Should Have Done

Instead of jumping to Provisioned Throughput, here's the better approach:

Step 1: Switch to Elastic Throughput

aws efs update-file-system \
  --file-system-id fs-xxxxxx \
  --throughput-mode elastic

This would have:

  • Auto-scaled during investigation
  • Cost only ~$3.50 for the same period
  • No manual capacity planning needed

Step 2: Investigate Root Cause

While Elastic Throughput handles the spike automatically, investigate and fix the underlying issue (in our case, Git temporary files accumulating).

Step 3: Set Up Monitoring

Set a CloudWatch alarm on:

  • PercentIOLimit > 75%, as an early warning before IOPS exhaustion
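One way to create such an alarm from the CLI is sketched below. The function name, the alarm name, the 5-minute period, and the three evaluation periods are all assumptions to adjust for your setup. Note that PercentIOLimit is reported for file systems in General Purpose performance mode, and that this sketch attaches no notification target: add `--alarm-actions` with an SNS topic ARN if you want to be paged. The `AWS_CMD` override is a hypothetical convenience for dry runs.

```shell
# Create a CloudWatch alarm on the EFS PercentIOLimit metric.
# Arguments: file system ID, threshold in percent (default 75).
# Set AWS_CMD=echo to print the arguments instead of calling AWS.
create_efs_io_alarm() (
  fs_id="$1"
  threshold="${2:-75}"
  ${AWS_CMD:-aws} cloudwatch put-metric-alarm \
    --alarm-name "efs-${fs_id}-percent-io-limit" \
    --namespace AWS/EFS \
    --metric-name PercentIOLimit \
    --dimensions "Name=FileSystemId,Value=${fs_id}" \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$threshold" \
    --comparison-operator GreaterThanThreshold
)

# Usage (requires AWS credentials):
#   create_efs_io_alarm fs-xxxxxx
# Dry run:
#   AWS_CMD=echo create_efs_io_alarm fs-xxxxxx 80
```

Requiring three consecutive periods above the threshold is a design choice that trades a slightly later alert for fewer false alarms on short spikes.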

Why I Didn't Choose Elastic Throughput

Honestly? I didn't know it existed.

Elastic Throughput was announced in 2022, but I hadn't updated my knowledge. During the emergency, my mental model was:

  1. Bursting = free but unreliable
  2. Provisioned = expensive but guaranteed

I missed the third, better option.

Was the Decision Wrong?

Not entirely. Let's look at ROI:

Cost: $69 (about 10,000 yen)

Avoided Loss:

  • 10 engineers × 3 hours waiting = 30 person-hours
  • At ~$50/hour = $1,500 in productivity loss
  • Plus deployment delays (hard to quantify)

ROI: ~20x

The decision to prioritize business continuity was correct. But knowing about Elastic Throughput would have achieved the same result for 1/20th the cost.

Lessons Learned

1. Always Research Current Options

Don't rely on old knowledge during emergencies. Take 5 minutes to check AWS documentation for the latest features.

2. Cost Estimation is Part of the Response

"Make it work first" is important, but:

  • List all options
  • Quick cost comparison
  • Choose based on data, not urgency

3. Document and Share

This $69 lesson becomes valuable when shared. Your team (and the community) can learn without paying the same price.

Action Items

If you're using EFS:

  • [ ] Check your current throughput mode
  • [ ] Consider Elastic Throughput for variable workloads
  • [ ] Set up CloudWatch alarms for PercentIOLimit
  • [ ] Document your throughput mode decision process
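For the first item on the checklist, a quick way to see the throughput mode of every file system in the current region might look like the sketch below. The function name and the `AWS_CMD` dry-run override are hypothetical; the queried fields come from the `describe-file-systems` response.

```shell
# List each EFS file system with its throughput mode and, where set,
# its provisioned throughput in MiB/s.
# Set AWS_CMD=echo to print the arguments instead of calling AWS.
list_efs_throughput_modes() (
  ${AWS_CMD:-aws} efs describe-file-systems \
    --query 'FileSystems[].[FileSystemId,ThroughputMode,ProvisionedThroughputInMibps]' \
    --output table
)

# Usage (requires AWS credentials):
#   list_efs_throughput_modes
```

Any row showing `provisioned` with a large MiB/s value is worth a second look, since it is billed whether or not the throughput is used.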

Bottom Line

Use Elastic Throughput for most production workloads.

It's the best of both worlds:

  • Handles spikes automatically
  • Pay only for what you use
  • No capacity planning required

Provisioned Throughput should be reserved for constant, predictable high-throughput scenarios.

Next time I face a similar situation, I'll reach for Elastic Throughput first.


I write more about technical decision-making and engineering practices on my blog.
Check it out: https://tielec.blog/

