DEV Community: KloudAudit

Your AWS bill is boring now. Your OpenAI bill is where the surprises live.

KloudAudit — Sat, 11 Jul 2026 18:30:32 +0000

Here's a pattern I keep seeing: a team ships an AI feature, usage grows, and three months later someone in finance asks why the Anthropic or OpenAI line item tripled. Nobody can answer immediately, because nobody was watching it the way they watch EC2 or RDS.

Cloud cost tooling has spent a decade building dashboards for compute, storage, and network. AI API spend is still mostly untracked — teams find out what happened after the invoice arrives, not before.

I added a free audit for exactly this gap. Here's what it checks and why each one matters.

Why AI spend behaves differently from cloud spend

Infrastructure waste is usually static — an idle EC2 instance costs the same whether anyone's using it or not. AI spend is dynamic: it scales with usage, model choice, and prompt design, all of which change weekly as a product evolves. A team can be doing everything "right" in January and burning 5x the necessary spend by March, purely because nobody revisited the model routing decision made at launch.

That volatility is exactly why this needs its own checklist instead of being bolted onto a generic cloud audit.

The 12 checks, grouped by category

Model Selection

Using the same (often frontier) model for every task, including ones a cheaper model handles equally well
Deprecated model versions still in production, which frequently carry legacy pricing
No tiering logic — no routing between "cheap enough" and "needs the best model available"
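
The tiering check can be pictured as a small routing function. This is a minimal sketch under stated assumptions, not the Blueprint's code: the model names (`small-model`, `frontier-model`), the task labels, and the 2,000-character cut-off are hypothetical placeholders to replace with whatever your provider and product actually use.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute the IDs your provider offers.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Task types assumed, for illustration only, to be handled well by the cheap tier.
SIMPLE_TASKS = {"classification", "extraction", "summarisation"}


@dataclass
class Request:
    task: str    # e.g. "classification" or "code_review"
    prompt: str


def pick_model(req: Request, long_prompt_chars: int = 2000) -> str:
    """Send simple, short requests to the cheap tier and everything else to the frontier tier."""
    if req.task in SIMPLE_TASKS and len(req.prompt) <= long_prompt_chars:
        return CHEAP_MODEL
    return FRONTIER_MODEL
```

The point of keeping the rule in one function is that the routing decision made at launch becomes a single place to revisit as usage changes.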

Caching

No response caching for repeated or near-identical prompts
No prompt caching for long, reused system prompts
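
For the response-caching check, one possible shape is an in-memory cache keyed on a normalised prompt. This is a sketch only: `call_model` is a hypothetical stand-in for whatever function calls the paid API, and lower-casing the prompt is an assumption that only holds when case does not change the answer.

```python
import hashlib
from typing import Callable, Dict


def cache_key(model: str, prompt: str) -> str:
    """Collapse whitespace and case so near-identical prompts share one key."""
    normalised = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{model}\n{normalised}".encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self, call_model: Callable[[str, str], str]):
        self._call_model = call_model      # hypothetical paid API call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        key = cache_key(model, prompt)
        if key in self._store:
            self.hits += 1                 # served without a paid request
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response
```

Tracking `hits` and `misses` gives a rough hit rate, which is the number that tells you whether caching is worth keeping for a given feature.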

Spending Controls

No monthly spend cap set on the provider account
No max_tokens limit, so output length is effectively unbounded
No alerting when spend crosses an expected threshold
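
The alerting check comes down to two pieces of arithmetic: what a request costs, and whether the month is on track to exceed the budget. A minimal sketch, assuming per-million-token pricing; the prices you pass in are your provider's, and nothing here is a real rate.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of one request given prices per million tokens (prices are caller-supplied)."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def projected_over_budget(month_to_date_usd: float, day_of_month: int,
                          days_in_month: int, monthly_budget_usd: float) -> bool:
    """Linear projection of month-to-date spend; True when it would exceed the budget."""
    projected = month_to_date_usd / day_of_month * days_in_month
    return projected > monthly_budget_usd
```

A linear projection is crude, but it fires before the invoice arrives, which is the whole point of the check.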

Architecture

Dev and test environments calling the same paid production endpoints
Individual request calls where a batch API would cut cost significantly

Attribution

No way to tell which feature or team is driving spend
No token usage monitoring or trend visibility over time
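
Attribution usually starts with tagging each request and summing by tag. A small sketch, assuming each logged record carries a `feature` label and a `cost_usd` value; both field names are hypothetical and depend on how your application logs usage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def spend_by_feature(records: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Sum cost per feature tag and return the tags sorted by spend, highest first."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["feature"]] += record["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```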

Each of these maps to a specific, checkable thing in your codebase or provider dashboard — not a vague "optimize your AI costs" suggestion.

What the free report shows

Same format as the existing cloud cost audit: a spend score, estimated monthly and annual recoverable cost, and the flagged issues sorted by impact. No credentials requested — it's a self-assessment, not a scan of your actual account.

Here's what one flagged issue looks like in the preview:

Using deprecated model versions
Old model IDs carry legacy pricing that runs meaningfully higher than current equivalents for the same or better output.

The paid version (Blueprint, same price as the existing Cost Blueprint) adds the exact fix for each flagged issue — model routing code, caching implementation with the right API parameters, spend cap configuration steps, batch API migration code. Written to be implementable the same afternoon, not "here are some ideas."

Try it

Free audit, 12 questions, about 10 minutes: kloudaudit.eu — select "AI APIs" as the provider.

If you've already gone through a cost audit surprise with your AI spend, I'd genuinely like to hear what caused it — replying here or on the site.

The AWS Charge Silently Eating Your K8s Budget: NAT Gateway Explained

KloudAudit — Sun, 07 Jun 2026 11:41:52 +0000

You get your AWS bill. EC2 looks reasonable. RDS looks fine. Then there's a line item called "NAT Gateway" sitting at $800 and you have no idea why.

This is the charge that catches almost every team running Kubernetes on AWS. Here is exactly what it is, why it grows silently, and how to fix it in 20 minutes.

What NAT Gateway Actually Does

A NAT Gateway lets resources in your private subnets reach the internet without exposing them directly. Every byte of traffic that flows through it costs $0.045 per GB — on top of the hourly charge of $0.045 per hour ($32/month just for existing).

For small workloads that's negligible. For a K8s cluster with 20+ pods constantly pulling images, sending logs, and calling external APIs, it compounds fast.
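
To see how the two charges combine, here is a minimal sketch using the rates quoted above. The 730-hour month is an assumption (an average month), and the rates vary slightly by region.

```python
HOURLY_RATE_USD = 0.045   # per NAT Gateway hour, as quoted above
PER_GB_RATE_USD = 0.045   # per GB processed, as quoted above


def nat_monthly_cost(gb_processed: float, gateways: int = 1, hours: int = 730) -> float:
    """Fixed hourly charge for each gateway plus the per-GB data processing charge."""
    return gateways * hours * HOURLY_RATE_USD + gb_processed * PER_GB_RATE_USD
```

With one gateway, `nat_monthly_cost(17_000)` comes to about $798, so an $800 line item corresponds to roughly 17 TB a month passing through NAT. The hourly part is small; the bill is almost entirely data processing.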

Why K8s Makes It Worse

Three patterns specific to Kubernetes that silently inflate NAT Gateway costs:

1. ECR image pulls routing through NAT

Every time a node pulls a container image from ECR, that traffic goes through NAT Gateway by default. A cluster that scales frequently — pulling images on new nodes — can generate hundreds of GB of NAT traffic per month just from image pulls.

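Fix: Create VPC interface endpoints for ECR so image pulls stay inside the AWS network. Interface endpoints are not free (an hourly charge plus roughly $0.01 per GB processed), but that is far below the $0.045 per GB NAT rate. A complete setup needs both com.amazonaws.eu-west-1.ecr.api and com.amazonaws.eu-west-1.ecr.dkr, plus the S3 gateway endpoint from step 3, because ECR stores image layers in S3. The command below creates the ecr.dkr endpoint; repeat it for ecr.api.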

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --service-name com.amazonaws.eu-west-1.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx

2. Cross-AZ pod traffic

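When a pod in eu-west-1a calls a service whose pod is scheduled in eu-west-1b, that traffic crosses availability zones. Each GB costs $0.01 in each direction. This is billed as inter-AZ data transfer rather than as NAT Gateway processing, but it grows just as quietly and adds up fast at scale.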

Fix: Use topology-aware routing to prefer same-AZ endpoints:

apiVersion: v1
kind: Service
metadata:
  name: your-service
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  ...

3. S3 traffic routing through NAT

If your pods are reading or writing to S3 without a VPC endpoint, every byte goes through NAT Gateway. At $0.045/GB this destroys any S3 cost savings from storage tiering.

Fix: Create a VPC Gateway endpoint for S3 — it is free:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --service-name com.amazonaws.eu-west-1.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-xxxxxxxx

How Much Are You Actually Paying

Check your current NAT Gateway spend:

aws ce get-cost-and-usage \
  --time-period Start=2026-05-01,End=2026-06-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Virtual Private Cloud"]}}' \
  --group-by Type=DIMENSION,Key=USAGE_TYPE \
  --output table

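Look for usage types containing NatGateway-Bytes and NatGateway-Hours (they often carry a region prefix such as EU-). Together they are your total NAT cost, split into data processed and hourly charges. If the table comes back without them, filter on the "EC2 - Other" service instead, which is where Cost Explorer often groups NAT Gateway usage.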
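
If you would rather total those lines than read them off a table, the same command with `--output json` can be summed in a few lines. A minimal sketch, assuming the standard `ResultsByTime` / `Groups` response shape of `get-cost-and-usage` grouped by usage type; the function name is illustrative.

```python
from decimal import Decimal
from typing import Dict


def nat_costs(response: Dict) -> Dict[str, Decimal]:
    """Sum BlendedCost for usage types ending in NatGateway-Bytes / NatGateway-Hours."""
    totals = {"NatGateway-Bytes": Decimal("0"), "NatGateway-Hours": Decimal("0")}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]   # e.g. "EU-NatGateway-Bytes"
            amount = Decimal(group["Metrics"]["BlendedCost"]["Amount"])
            for suffix in totals:
                if usage_type.endswith(suffix):
                    totals[suffix] += amount
    return totals
```

Feed it the parsed JSON, for example `nat_costs(json.load(open("costs.json")))` after saving the CLI output to a file. `Decimal` is used because the API returns amounts as strings and money sums are easier to trust without float rounding.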

The Fix Priority Order

Fix	Effort	Typical saving
S3 VPC Gateway endpoint	5 minutes	20-40% of NAT cost
ECR VPC Interface endpoint	15 minutes	30-50% of NAT cost
Topology-aware routing	30 minutes	10-20% of NAT cost
Review remaining traffic	Ongoing	Varies

Start with S3 — it is free, takes 5 minutes, and has zero risk. Most teams see immediate impact on their next bill.

What a Real Fix Looks Like

A team running 15 microservices on EKS, spending $1,200/month on NAT Gateway:

Added S3 VPC Gateway endpoint: saved $380/month
Added ECR VPC Interface endpoint: saved $290/month
Enabled topology-aware routing: saved $140/month

Total: $810/month recovered. 50 minutes of work.

The ECR endpoint costs $7.30/month for the interface endpoint itself. The net saving was still $283/month after that cost.

Check All 18 Cost Patterns at Once

NAT Gateway is one of 18 checks that commonly hide recoverable AWS spend. If you want a systematic view of everything — EBS volumes, RDS scheduling, Reserved Instances, security misconfigurations — run the free audit at kloudaudit.eu

No AWS credentials. No signup. 15 minutes.

Samuel Ayodele Adomeh is a Senior DevOps Engineer and Azure Solutions Architect based in Wrocław, Poland. He built KloudAudit after seven years of reviewing cloud bills and seeing the same waste patterns on every infrastructure he worked with.

How to Cut AWS Compute Costs 60% Before End of Quarter Without Migrating Anything

KloudAudit — Mon, 01 Jun 2026 13:20:06 +0000

Your manager just asked you to cut cloud costs by end of quarter.

Your first instinct is to look at migrating to a cheaper provider. Don't. That's a 3-month project minimum and you have 6 weeks.

Here's what actually works fast — three changes that show up in billing within days, not months.

1. Spot Instances for Training and Batch Jobs (saves 60–70%)

If your team is running ML training, ETL jobs, or any batch processing on on-demand EC2, you're paying full price for work that can be interrupted and restarted.

Spot instances run the exact same hardware for 60–70% less. The only requirement is that your job handles interruptions gracefully — which for training jobs with checkpointing, it already does.

For SageMaker training jobs:

estimator = sagemaker.estimator.Estimator(
    ...
    use_spot_instances=True,
    max_wait=7200,        # 2 hour max wait
    max_run=3600,         # 1 hour max run
)

For raw EC2:

aws ec2 request-spot-instances \
  --instance-count 1 \
  --type one-time \
  --launch-specification file://spec.json

Real impact: A team running p3.2xlarge on-demand at $3.06/hour switches to spot at ~$0.91/hour. 100 training hours per month = $215 saved. Every month.
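
The arithmetic behind that figure is simple enough to rerun with your own rates. A small sketch; the rates are the ones quoted above, and real spot prices move over time and by availability zone.

```python
def monthly_spot_saving(on_demand_rate: float, spot_rate: float, hours_per_month: float) -> float:
    """Difference between on-demand and spot cost for the same number of hours."""
    return (on_demand_rate - spot_rate) * hours_per_month


def spot_discount(on_demand_rate: float, spot_rate: float) -> float:
    """Fraction saved per hour, e.g. 0.70 for a 70% discount."""
    return 1 - spot_rate / on_demand_rate
```

With $3.06 and $0.91 per hour over 100 hours this gives $215, a discount of about 70%.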

2. Savings Plans for Baseline Compute (saves 30–40%)

Spot works for interruptible workloads. For everything that runs continuously — schedulers, API servers, always-on processing nodes — Reserved Instances or Savings Plans give you 30–40% off with zero changes to your infrastructure.

The commitment is financial, not technical. You're not locked into specific instance types or regions with Compute Savings Plans.

Check your on-demand baseline first:

aws ce get-cost-and-usage \
  --time-period Start=2026-05-01,End=2026-06-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE \
  --output table

Find the instance types you run every day without exception. Buy a 1-year no-upfront Savings Plan for that baseline. The discount applies immediately — it shows up in your next billing cycle.

Real impact: $5,000/month in stable EC2 spend becomes ~$3,000/month with a 1-year Compute Savings Plan. $24,000 saved per year. 10 minutes to purchase.

3. Schedule Dev and Staging Clusters (saves 60% of those environments)

Your production cluster needs to run 24/7. Your dev and staging clusters almost certainly do not.

If your team works 8am–8pm on weekdays, your dev environment sits idle for 12 hours every night and 48 hours every weekend. With about 21 working days a month, that is roughly 252 hours of use and 468 hours of idle time out of 720: about 65% of the bill for zero value.

EventBridge rule to stop EC2 instances nightly:

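# Schedule that fires at 8pm UTC on weekdays.
# A rule does nothing by itself: attach a target (for example a Lambda that
# stops instances tagged Environment=dev) with `aws events put-targets`.
aws events put-rule \
  --schedule-expression "cron(0 20 ? * MON-FRI *)" \
  --name "StopDevInstances" \
  --state ENABLED

# Matching schedule that fires at 8am UTC on weekdays
aws events put-rule \
  --schedule-expression "cron(0 8 ? * MON-FRI *)" \
  --name "StartDevInstances" \
  --state ENABLED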

For RDS dev databases:

# Stop dev RDS instance
aws rds stop-db-instance \
  --db-instance-identifier your-dev-db

# Note: RDS automatically restarts a stopped instance after 7 days,
# so run the stop on a schedule (for example from a Lambda) to keep it stopped

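Real impact: A dev environment costing $2,000/month running 24/7 costs about $700/month on a weekday 8am–8pm schedule (252 of 720 hours). That is roughly $1,300/month saved, once the schedule rules have a target that actually stops and starts the instances.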

The Combined Impact

Change	Effort	Time to implement	Monthly saving
Spot for training/batch	Low	1–2 hours	60–70% of those workloads
Savings Plans for baseline	Very low	10 minutes	30–40% of stable compute
Schedule dev/staging	Low	20 minutes	60% of non-prod environments

A team spending $20,000/month on compute that implements all three can realistically be at $10,000–$12,000/month within 30 days. No migration. No architecture changes. No new vendors.

Before You Start — Find Out What You're Actually Paying For

The three fixes above work best when you know exactly where your compute spend is going. Run this first:

aws ce get-cost-and-usage \
  --time-period Start=2026-05-01,End=2026-06-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --output table

Then drill into EC2 specifically:

aws ce get-cost-and-usage \
  --time-period Start=2026-05-01,End=2026-06-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}' \
  --output table

This shows you exactly which instance types are costing the most — which tells you where Spot and Savings Plans will have the most impact.

Want a Structured Check Across All 18 Common Patterns?

These three fixes cover compute. But most teams also have recoverable spend hiding in storage, networking, and database that they're not looking at.

If you want a systematic 15-minute check across all 18 patterns — including NAT Gateway overuse, unattached EBS volumes, missing Reserved Instances, and dev RDS running 24/7 — run the free audit at kloudaudit.eu.

No AWS credentials. No signup. Just your answers and an instant savings estimate.

KloudAudit vs AWS Cost Explorer: Why I Stopped Using Cost Explorer for Waste Detection

KloudAudit — Sun, 24 May 2026 07:47:36 +0000

It was a Tuesday morning. The AWS bill had jumped from $6,200 to $8,700 in a single month.

My first instinct was to open Cost Explorer. I filtered by service, drilled into EC2, switched to daily granularity, added a usage type dimension. Two hours later I had a beautiful set of charts showing me that yes, EC2 costs had gone up. I still had no idea which instances, why, or what to do about it.

That's the moment I realised Cost Explorer is a billing visibility tool — not a waste detection tool. There's a difference, and confusing the two costs engineering teams thousands of dollars a month.

What AWS Cost Explorer Is Actually Good At

Let me be fair before I criticise it.

Cost Explorer is genuinely excellent at what it was designed to do:

Billing trend analysis. If your bill jumps month-over-month, Cost Explorer will show you which service drove it. That's useful for finance conversations and budget forecasting.

Reserved Instance recommendations. The RI and Savings Plans recommendations in Cost Explorer are solid. If you're running stable on-demand compute, it'll tell you your potential savings and breakeven period.

Rightsizing recommendations. The rightsizing recommendations tab (under Cost Optimisation Hub) surfaces instances running at low utilisation. It's not perfect, but it's free and built-in.

Cost allocation by tag. If your tagging strategy is solid, Cost Explorer lets you slice spend by team, project, or environment. This is valuable for chargeback models.

For what it is — a billing dashboard — it's well built.

What It Misses

Here's the problem: Cost Explorer starts from your bill and asks "where is the money going?" It doesn't start from your infrastructure and ask "what is wasting money and how do I fix it?"

It requires you to already know what to look for. If you know you have a NAT Gateway problem, Cost Explorer will confirm it. If you don't know to look for NAT Gateway costs, you'll never find them — they'll just show up as "EC2 - Other" and you'll move on.

It shows spend, not waste. A $2,000/month RDS instance might be perfectly justified or completely idle. Cost Explorer shows you the $2,000. It doesn't tell you the instance has been at 2% CPU for three months.

No prioritisation. Even when Cost Explorer surfaces an issue, it doesn't tell you what to fix first. An engineering team with 15 cost issues and limited bandwidth needs to know which three to tackle this sprint. Cost Explorer gives you a list with no ranking.

No actionability. "EC2 costs increased by $400 this month" is an observation. "Your m5.2xlarge in us-east-1 has been idle for 47 days — here's the CLI command to terminate it" is an action. Cost Explorer delivers observations.

It's reactive by design. You open it after the bill arrives. By then, the waste has already happened. For the next month's bill to be lower, you need to identify and fix problems before the billing cycle closes.

How KloudAudit Approaches It Differently

I built KloudAudit after hitting this wall repeatedly across client engagements. The insight was simple: most cloud waste comes from a predictable set of patterns. You don't need real-time API access to detect them — you need a structured set of questions about your infrastructure.

The tool works as an 18-check self-assessment across five categories: compute, storage, network, database, and governance. You answer based on your own knowledge of your infrastructure. No credentials. No IAM roles. No OAuth. No security review required.

The output is a Waste Score (0-100), a savings estimate based on your actual bill size, and a prioritised list of findings sorted by ease of implementation — quick wins first, complex optimisations last.

The findings are sorted by implementation effort by default, not by savings size. This matters: a team that's never done FinOps before shouldn't start by migrating VMs across regions. They should start by deleting the unattached EBS volumes and stopping the dev RDS on weekends.

Side-by-Side Comparison

	AWS Cost Explorer	KloudAudit
Setup time	Already available in your console	15 minutes, no setup
Credentials required	AWS account login	None — ever
Time to first finding	30–120 minutes of exploration	Immediate on audit completion
Output	Charts and spend breakdowns	Prioritised fix list with savings estimates
Actionability	Low — shows spend, not fixes	High — sorted by ease of implementation
AI fix guide	No	Yes — CLI commands, Terraform, verification steps ($79)
Cost	Free (included with AWS)	Free audit, $79 for full blueprint
Best for	Billing visibility, trend analysis	Structured waste detection, first FinOps audit

When to Use Each

These tools solve different problems. You need both.

Use AWS Cost Explorer when:

You want to understand where your budget is going month-over-month
You're preparing a finance report or chargeback allocation
You want RI/Savings Plans recommendations for stable workloads
You already know what to investigate and need the data to confirm it

Use KloudAudit when:

You suspect you're overpaying but don't know where
You're starting a FinOps practice and need a structured starting point
You need to identify quick wins your team can implement this sprint
You want to audit without a security review or procurement process
You're at a company where connecting a third-party tool to your AWS account requires a 3-month approval process

The Real Cost of Using the Wrong Tool

The average engineering team running on AWS wastes 20–45% of their cloud spend. At $8,000/month — a modest bill for a Series A startup — that's $1,600–$3,600/month in recoverable spend.

Cost Explorer will show you that $8,000 bill clearly broken down by service. It won't tell you that $640 of it is a dev RDS instance running 24/7 that could be auto-stopped at 8pm each night with a 20-minute EventBridge setup.

That's the gap. And it compounds every month you leave it unfixed.

Try It

If you've been staring at your AWS bill wondering where to start, run the free KloudAudit audit at kloudaudit.eu. It takes 15 minutes, requires no credentials, and gives you a prioritised list of what to fix first.

Your Cost Explorer will still be there for the billing visibility. KloudAudit gives you the roadmap to make those numbers smaller.

How to cut your AWS RDS costs by 65% in 20 minutes (without touching production)

KloudAudit — Wed, 13 May 2026 11:37:30 +0000

This is the single most common finding in every cloud bill I audit.

Dev and staging RDS instances running 24/7 at production size.

A db.r5.xlarge costs $876/month running continuously. If your team uses it 8 hours a day on weekdays, you're paying for 720 hours and using 170. That's a 76% waste rate on a single instance.

The fix takes 20 minutes and touches nothing in production.

The problem in numbers

db.r5.xlarge on-demand:  $876/month  (720 hrs)
db.r5.xlarge scheduled:  $306/month  (252 hrs — weekdays 8am-7pm)
Monthly saving:          $570/month
Annual saving:           $6,840/year

Multiply that by 2-3 dev/staging environments and you're looking at roughly $13,700–$20,500/year on instances that sleep most of the time.
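
The scheduled figure is just the full-month price scaled by the hours the instance actually runs. A minimal sketch using the numbers from the block above; the 720-hour month matches the article's own assumption.

```python
def scheduled_cost(full_month_cost: float, hours_running: float, hours_in_month: float = 720) -> float:
    """Monthly cost when the instance only runs for hours_running out of hours_in_month."""
    return full_month_cost * hours_running / hours_in_month


def monthly_saving(full_month_cost: float, hours_running: float, hours_in_month: float = 720) -> float:
    """What the schedule saves compared with running 24/7."""
    return full_month_cost - scheduled_cost(full_month_cost, hours_running, hours_in_month)
```

`scheduled_cost(876, 252)` gives about $306.60, which is the $306 line above, and the saving works out to 65% of the original bill. Storage is still billed while an instance is stopped, so treat the result as an upper bound on the compute saving.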

Step 1: Find your idle RDS instances

# List all RDS instances with their sizes and status
aws rds describe-db-instances \
  --query 'DBInstances[*].{ID:DBInstanceIdentifier,Class:DBInstanceClass,Status:DBInstanceStatus,MultiAZ:MultiAZ}' \
  --output table

Look for anything that:

Has "dev", "staging", "test", or "qa" in the name
Is the same instance class as production
Has Multi-AZ enabled (almost never needed for dev)

Step 2: Create the IAM role

cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "lambda.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name rds-scheduler-role \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
  --role-name rds-scheduler-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonRDSFullAccess

Step 3: Create the Lambda function

# rds_scheduler.py
import boto3
import os

def handler(event, context):
    rds = boto3.client('rds')
    action = event.get('action')  # 'start' or 'stop'
    instances = os.environ.get('RDS_INSTANCES', '').split(',')

    for instance_id in instances:
        if not instance_id.strip():
            continue
        try:
            if action == 'stop':
                rds.stop_db_instance(DBInstanceIdentifier=instance_id.strip())
                print(f'Stopped: {instance_id}')
            elif action == 'start':
                rds.start_db_instance(DBInstanceIdentifier=instance_id.strip())
                print(f'Started: {instance_id}')
        except Exception as e:
            print(f'Error on {instance_id}: {e}')

    return {'status': 'done', 'action': action, 'instances': instances}

Deploy it:

zip scheduler.zip rds_scheduler.py

aws lambda create-function \
  --function-name rds-scheduler \
  --runtime python3.12 \
  --role arn:aws:iam::YOUR_ACCOUNT_ID:role/rds-scheduler-role \
  --handler rds_scheduler.handler \
  --zip-file fileb://scheduler.zip \
  --environment '{"Variables":{"RDS_INSTANCES":"your-dev-db,your-staging-db"}}'

Step 4: Create the EventBridge schedules

# Stop at 7pm every weekday
# Adjust UTC offset for your timezone — CET/Warsaw is UTC+1, so 18:00 UTC = 7pm local
aws events put-rule \
  --name "StopDevRDS" \
  --schedule-expression "cron(0 18 ? * MON-FRI *)" \
  --state ENABLED

# Start at 8am every weekday
aws events put-rule \
  --name "StartDevRDS" \
  --schedule-expression "cron(0 7 ? * MON-FRI *)" \
  --state ENABLED

# Get your Lambda ARN
LAMBDA_ARN=$(aws lambda get-function \
  --function-name rds-scheduler \
  --query 'Configuration.FunctionArn' \
  --output text)

# Add targets
aws events put-targets \
  --rule StopDevRDS \
  --targets "Id=1,Arn=$LAMBDA_ARN,Input='{\"action\":\"stop\"}'"

aws events put-targets \
  --rule StartDevRDS \
  --targets "Id=1,Arn=$LAMBDA_ARN,Input='{\"action\":\"start\"}'"

# Allow EventBridge to invoke Lambda (one permission per rule)
for RULE in StopDevRDS StartDevRDS; do
  aws lambda add-permission \
    --function-name rds-scheduler \
    --statement-id "allow-eventbridge-$RULE" \
    --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn "$(aws events describe-rule --name "$RULE" --query 'Arn' --output text)"
done

Step 5: Downsize the instance too

While you're here — dev doesn't need a db.r5.xlarge. Drop to db.t3.medium for another 70% saving:

aws rds modify-db-instance \
  --db-instance-identifier your-dev-db \
  --db-instance-class db.t3.medium \
  --apply-immediately

Combined result:

db.r5.xlarge 24/7:       $876/month  (original)
db.r5.xlarge scheduled:  $306/month  (schedule only)
db.t3.medium scheduled:  $89/month   (schedule + downsize)

Annual saving:           $9,444/year — from one dev database

Step 6: Add a wake-up Slack command (optional)

Your team will occasionally need the DB outside hours. Give them a self-service way to start it:

# Create a Lambda function URL
aws lambda create-function-url-config \
  --function-name rds-scheduler \
  --auth-type NONE

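Then in Slack: Apps → Slash Commands → /start-devdb → point it at the function URL.

Two caveats before you rely on this. With --auth-type NONE the URL is callable by anyone who finds it, so the function should verify Slack's request signature before acting. And the handler from Step 3 reads action from the top level of the event, while a function URL delivers the request as an HTTP event with the payload in event["body"] (Slack sends it form-encoded), so the handler needs a small adapter that maps the slash command to {"action": "start"}.

With that in place, engineers can start the DB themselves without waiting for someone with AWS console access.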
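
Because the function URL is public, it is worth checking that a request really came from Slack before starting anything. The sketch below is one way to do that, not the article's original handler: it assumes Slack's documented signing scheme (an HMAC-SHA256 over `v0:{timestamp}:{body}` sent in the `X-Slack-Signature` header) and the HTTP event shape Lambda function URLs deliver. The `/start-devdb` command name comes from the step above; the function names and the idea of keeping the signing secret in configuration are assumptions.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import parse_qs


def raw_body(event):
    """Return the request body as text, decoding it if Lambda base64-encoded it."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body).decode("utf-8")
    return body


def is_valid_slack_request(event, signing_secret, now=None, max_age_s=300):
    """Check Slack's v0 signature and reject requests older than max_age_s seconds."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    timestamp = headers.get("x-slack-request-timestamp", "")
    signature = headers.get("x-slack-signature", "")
    if not timestamp.isdigit():
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > max_age_s:
        return False  # stale request, possibly a replay
    base = f"v0:{timestamp}:{raw_body(event)}".encode("utf-8")
    expected = "v0=" + hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def requested_action(event):
    """Map the slash command in the form-encoded body to the scheduler's action."""
    form = parse_qs(raw_body(event))
    command = form.get("command", [""])[0]
    return "start" if command == "/start-devdb" else None
```

In the Lambda, a wrapper would call `is_valid_slack_request` first, return a 401 response on failure, and otherwise pass `{"action": requested_action(event)}` to the existing start/stop logic. Only `start` is exposed here on purpose, so the Slack command cannot stop a database.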

What this looks like in real teams

I've helped teams set this up in about 20 minutes. The conversation afterwards is always the same:

"I can't believe we were paying for that for 14 months."

Not because they didn't know. Because nobody sat down and looked.

This is one of 18 checks in KloudAudit — a free self-guided cloud cost audit that walks through all of these systematically. No AWS credentials required. Takes 15 minutes.

Quick reference

Action	Command
Find idle RDS	`aws rds describe-db-instances --query ...`
Stop instance now	`aws rds stop-db-instance --db-instance-identifier NAME`
Start instance now	`aws rds start-db-instance --db-instance-identifier NAME`
Check instance state	`aws rds describe-db-instances --db-instance-identifier NAME --query 'DBInstances[0].DBInstanceStatus'`

What's your team's current RDS setup — scheduled or running 24/7?

Drop your setup in the comments — genuinely curious how common this is.

The 5 AWS charges silently draining your budget (and how to fix each one)

KloudAudit — Sun, 03 May 2026 18:29:14 +0000

I've reviewed a lot of cloud bills over 7 years as a DevOps engineer.

The waste isn't exotic. It's not misconfigured Kubernetes clusters or obscure data transfer fees nobody knew existed. It's the same five things, over and over, on teams that genuinely believe they're managing costs well.

Here's what they are — and exactly how to fix each one.

1. Dev and staging RDS running 24/7

Your production database needs to run continuously. Your dev database does not.

A db.r5.xlarge runs ~$876/month. If your team uses it 8 hours a day on weekdays, you're paying for 720 hours and using roughly 170. That's about $670/month funding nothing.

The fix is a Lambda schedule. Takes 20 minutes to set up, runs forever:

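# Stop dev RDS at 7pm UTC every weekday (EventBridge cron runs in UTC; adjust for your timezone)
aws events put-rule \
  --name "StopDevRDS" \
  --schedule-expression "cron(0 19 ? * MON-FRI *)" \
  --state ENABLED

# Start it at 8am UTC
aws events put-rule \
  --name "StartDevRDS" \
  --schedule-expression "cron(0 8 ? * MON-FRI *)" \
  --state ENABLED

# Each rule still needs a Lambda target that calls stop-db-instance / start-db-instance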

Typical saving: 65% on affected instances.

The objection I always hear: "But what if someone needs it on the weekend?" In 7 years I've seen this come up twice. The fix: add a manual start button in your internal tooling. A single Lambda invocation.

2. Everything on On-Demand pricing

On-Demand is the rack rate. The price AWS publishes because they have to, not because they expect serious workloads to stay there.

If you've had stable compute running for 6+ months — and most teams have — you're paying a 30-45% premium for flexibility you're not using.

Compute Savings Plans are the easiest path. No instance family commitment, no region lock-in:

# See exactly how much you'd save
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS

Run that. Read the output. The "Estimated Monthly Savings" figure is money you're leaving on the table every month you wait.

Typical saving: 30-45% on covered compute. Zero architecture changes required.

3. Unattached EBS volumes

When you terminate an EC2 instance, AWS deletes the root volume by default but keeps any additional EBS data volumes. Unless you explicitly configured them to be deleted on termination, those volumes sit there, charged at $0.10/GB/month, indefinitely.

Find them:

aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime,Type:VolumeType}' \
  --output table

Review the output. Anything created more than 30 days ago from an instance that no longer exists is almost certainly orphaned. Snapshot it for $0.05/GB/month if you're not sure, then delete the volume.
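
To turn that review into a number, the same query with `--output json` can be filtered by age and priced. A minimal sketch, assuming the row keys produced by the `--query` above (`ID`, `Size`, `Created`) and the $0.10/GB/month rate quoted earlier; your volume type and region may be priced differently.

```python
from datetime import datetime, timezone

PRICE_PER_GB_MONTH = 0.10   # rate quoted above; varies by volume type and region


def parse_time(value: str) -> datetime:
    """Parse the ISO timestamps the AWS CLI prints, with either a Z or +00:00 suffix."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_volumes(rows, now: datetime, min_age_days: int = 30):
    """Return unattached volumes older than min_age_days and their combined monthly cost."""
    stale = [r for r in rows if (now - parse_time(r["Created"])).days > min_age_days]
    monthly_cost = sum(r["Size"] for r in stale) * PRICE_PER_GB_MONTH
    return stale, monthly_cost
```

Call it with the parsed CLI output and `datetime.now(timezone.utc)`. Age alone does not prove a volume is orphaned, so the list is a set of candidates to snapshot and review, not a delete list.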

I've found $200–$1,400/month from this command alone. The worst case I've seen: a team that migrated to EKS 18 months prior and left 67 volumes from their old EC2 fleet running the whole time.

Typical saving: variable, but this takes 10 minutes to audit.

4. S3 storage never tiered

S3 Standard is $0.023/GB/month. S3 Glacier Instant Retrieval is $0.004/GB/month.

Your logs from 2023 don't need Standard. Neither do your backups, your old deployment artifacts, or the compliance exports nobody has opened since they were generated.

A lifecycle policy fixes this automatically:

aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "AutoTier",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"}
      ]
    }]
  }'

Set it once. It runs forever. Data moves through tiers automatically as it ages.

Typical saving: 30-60% on storage older than 90 days.

5. NAT Gateway data processing charges

This one surprises people every time.

NAT Gateway charges $0.045 per GB processed — in both directions. If your microservices are calling AWS APIs (S3, DynamoDB, SQS) through a NAT Gateway instead of VPC endpoints, you're paying per-GB for every single request.

At scale this adds up fast. I've seen teams with $800/month NAT Gateway bills that dropped to $60 after adding two free endpoints.

The S3 and DynamoDB Gateway endpoints are free:

# Free endpoint for S3 — traffic no longer routes through NAT
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxx \
  --service-name com.amazonaws.eu-west-1.s3 \
  --route-table-ids rtb-xxxxxxxxx

# Free endpoint for DynamoDB
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxx \
  --service-name com.amazonaws.eu-west-1.dynamodb \
  --route-table-ids rtb-xxxxxxxxx

Typical saving: 10-40% of your NAT Gateway bill.

The pattern underneath all of this

None of these are obscure. Every experienced AWS engineer knows they exist.

The reason they accumulate anyway is always the same: the team is busy, the bill is complex, and nobody has sat down to look systematically. Cost Explorer shows you the numbers. It doesn't tell you which specific instances to reschedule, which volumes are orphaned, or which buckets have never had a lifecycle rule.

That's the gap.

A tool I built for exactly this

I got tired of doing this analysis manually on every engagement, so I built KloudAudit — a structured 18-checkpoint audit that walks through all five of these (and 13 more) with savings estimates specific to your bill size.

Free to run. Takes 15 minutes. No AWS account access required — you answer questions about your own setup, and it tells you what to fix and in what order.

If you write about cloud costs or DevOps, I also built a free embeddable savings calculator your readers can use directly on your articles. One script tag, no signup, no cost: kloudaudit.eu/widget/

What's the most surprising cloud cost you've discovered? I'm collecting war stories — drop it in the comments.

The 5 AWS charges silently draining your budget (and how to fix each one)

KloudAudit — Thu, 30 Apr 2026 06:39:09 +0000

Every AWS bill tells a story. After a couple of years as a DevOps engineer reviewing cloud infrastructure, I've learned to read that story, and it almost always has the same five chapters.

These aren't exotic edge cases. They show up on the bills of well-run teams with experienced engineers. They survive because cloud billing is complex, everyone is busy, and "we'll optimise later" is the most expensive phrase in engineering.

Here's what to look for, and exactly how to fix each one.

1. Dev/Staging RDS Running 24/7

What it costs: A db.r5.xlarge running continuously costs ~$876/month. If it's only used 8 hours a day on weekdays, you're paying for 720 hours and using ~170.

Why it happens: The database got created for a sprint, the sprint ended, and nobody scheduled a shutdown.

The fix — Lambda stop/start schedule (20 minutes to implement):

`# Create an IAM role for Lambda with RDS stop/start permissions
aws iam create-role --role-name RDSScheduler \
  --assume-role-policy-document file://lambda-trust-policy.json

# Stop RDS at 7pm on weekdays (EventBridge cron expressions are evaluated in UTC)
aws events put-rule \
  --name "StopDevRDS" \
  --schedule-expression "cron(0 19 ? * MON-FRI *)"

# Start RDS at 8am on weekdays (UTC)
aws events put-rule \
  --name "StartDevRDS" \
  --schedule-expression "cron(0 8 ? * MON-FRI *)"`

Savings: 65% reduction on affected instances.
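As a rough sanity check on that figure, here is a minimal sketch that computes the share of weekly hours a start/stop schedule removes. The function name `off_hours_pct` is my own, and it assumes the instance stays stopped all weekend.

```shell
#!/usr/bin/env bash
# Percentage of weekly hours eliminated by a start/stop schedule.
# Arguments: start hour, stop hour (24h clock), running days per week.
# off_hours_pct is a hypothetical helper, not part of any AWS tooling.
off_hours_pct() {
  local start=$1 stop=$2 days=$3
  local on_hours=$(( (stop - start) * days ))
  local week_hours=$(( 24 * 7 ))
  # Integer division, so the result is rounded down.
  echo $(( (week_hours - on_hours) * 100 / week_hours ))
}

off_hours_pct 8 19 5   # 8am to 7pm, Monday to Friday: prints 67
```

An 8am to 7pm weekday window leaves 55 of 168 hours running, which lands close to the 65% quoted above.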

2. Everything on On-Demand Pricing

What it costs: On-Demand is the list price, the most expensive way to run EC2. If you've had stable workloads running for 6+ months, you're paying a significant premium for flexibility you're not using.
Why it happens: Reservations require commitment and upfront analysis. On-Demand requires neither.

The fix — Compute Savings Plans:

`# Check your On-Demand spend for the last 3 months
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-04-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"PURCHASE_TYPE","Values":["On Demand"]}}' \
  --metrics BlendedCost

# Check Savings Plans recommendations (served by the Cost Explorer API)
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS`

Purchase a 1-year no-upfront Compute Savings Plan for your baseline usage. Zero architecture change. Zero risk.

Savings: 30–45% on covered compute.

3. Unattached EBS Volumes

What it costs: $0.10/GB/month sounds trivial. 50 forgotten 100GB volumes from a migration two years ago = $500/month. Every month. For nothing.

Why it happens: When you terminate an EC2 instance, AWS doesn't automatically delete the attached EBS volume unless you explicitly configured it to do so.
The fix — find and review them in 60 seconds:

`# List all unattached EBS volumes with their size and creation date
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime,Type:VolumeType}' \
  --output table

# Delete a specific volume (after verifying you don't need it)
aws ec2 delete-volume --volume-id vol-xxxxxxxxxxxxxxxxx`

Before deleting: create a snapshot if there's any chance the data is needed. Snapshots cost ~$0.05/GB/month — a fraction of the volume cost.
Savings: variable, but I've seen $200–$1,200/month from this alone.
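To put a number on what those volumes cost, a small sketch like the one below can total the sizes and price them. The helper name `unattached_cost` and the `PRICE_PER_GB` variable are my own; the default uses the $0.10/GB/month figure from above, which will differ by volume type and region.

```shell
#!/usr/bin/env bash
# Sum volume sizes (in GB) read from stdin and estimate the monthly cost.
# unattached_cost and PRICE_PER_GB are hypothetical names for illustration.
unattached_cost() {
  awk -v price="${PRICE_PER_GB:-0.10}" '
    { for (i = 1; i <= NF; i++) total += $i }   # accept one size per line or per field
    END { printf "%d GB unattached, about $%.2f/month\n", total, total * price }
  '
}

# The 50 x 100GB case from the text:
for _ in $(seq 50); do echo 100; done | unattached_cost
# prints: 5000 GB unattached, about $500.00/month
```

Against a real account, the sizes could be fed in with `aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[].Size' --output text | unattached_cost`.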

4. S3 Storage Never Tiered

What it costs: S3 Standard is $0.023/GB/month. S3 Glacier Instant Retrieval is $0.004/GB/month. Data from 2022 that nobody has accessed sitting in Standard costs 5.75x more than it needs to.
Why it happens: Lifecycle rules require deliberate configuration. Default bucket settings keep everything in Standard forever.

The fix — lifecycle policy (5 minutes):

`# Apply an age-based tiering lifecycle rule to a bucket
# (fixed transitions by object age; this is not the S3 Intelligent-Tiering storage class)
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "TierDownByAge",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"}
      ]
    }]
  }'`

Savings: 30–60% on storage older than 90 days.
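The price gap is easy to check for your own data volume. This is a sketch under the per-GB prices quoted above; `tiering_compare` is a name I made up, and it deliberately ignores Glacier Instant Retrieval's retrieval charges and minimum storage duration, so treat the result as an upper bound on the saving.

```shell
#!/usr/bin/env bash
# Compare the monthly storage cost of N GB in S3 Standard vs Glacier Instant Retrieval.
# Prices are the figures quoted in the article; tiering_compare is a hypothetical helper.
tiering_compare() {
  awk -v gb="$1" -v std=0.023 -v gir=0.004 'BEGIN {
    printf "standard=%.2f glacier_ir=%.2f ratio=%.2f\n", gb * std, gb * gir, std / gir
  }'
}

tiering_compare 10000
# prints: standard=230.00 glacier_ir=40.00 ratio=5.75
```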

5. NAT Gateway Data Processing Charges

What it costs: NAT Gateway charges $0.045 per GB processed, in both directions. Microservices calling each other through a NAT Gateway instead of VPC endpoints can generate $1,000+/month in charges that appear as "data transfer" on your bill.
Why it happens: It's invisible. The architecture diagram looks fine. Nobody knows NAT Gateway charges per-GB until they see the bill.

The fix — VPC endpoints for AWS services:

`# Create a VPC endpoint for S3 (free, Gateway type)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxx \
  --service-name com.amazonaws.eu-west-1.s3 \
  --route-table-ids rtb-xxxxxxxxx

# Create a VPC endpoint for DynamoDB (free, Gateway type)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxx \
  --service-name com.amazonaws.eu-west-1.dynamodb \
  --route-table-ids rtb-xxxxxxxxx`

S3 and DynamoDB Gateway endpoints are free. Traffic to these services no longer routes through NAT Gateway.

Savings: 10–30% of your NAT Gateway bill, potentially much more.
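To see where your own traffic would land in that range, here is a minimal estimate. It assumes the $0.045/GB rate above and that you already know roughly how many GB per month go to S3 and DynamoDB; `nat_cost_before_after` and `NAT_RATE_PER_GB` are hypothetical names.

```shell
#!/usr/bin/env bash
# Estimate NAT Gateway data processing cost before and after moving
# S3/DynamoDB traffic to Gateway endpoints.
# Arguments: total GB processed per month, GB of that bound for S3/DynamoDB.
nat_cost_before_after() {
  awk -v total="$1" -v gateway="$2" -v rate="${NAT_RATE_PER_GB:-0.045}" 'BEGIN {
    printf "before=%.2f after=%.2f\n", total * rate, (total - gateway) * rate
  }'
}

nat_cost_before_after 25000 5000
# prints: before=1125.00 after=900.00
```

In this made-up example, 5 TB of S3-bound traffic out of 25 TB works out to a 20% reduction.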

How to Find All of This in 15 Minutes

If you want to run through all five of these, plus 13 more checks across compute, storage, database, and networking, I built a free structured audit tool that does exactly this.

No account access required. No IAM roles. No agents. You answer questions about your setup and it flags issues with savings estimates.

kloudaudit.eu — free to run, takes 15 minutes.

What's the biggest AWS cost surprise you've found on your bill?
Drop it in the comments, always curious what patterns others are seeing.

How to cut your AWS bill by 20–45% without touching your architecture

KloudAudit — Mon, 27 Apr 2026 19:58:51 +0000

Cloud bills grow quietly.
You add a feature, spin up a database for testing, forget to delete it. You launch on On-Demand because you're moving fast. Your S3 bucket fills up and nobody sets lifecycle rules because "we'll do it later."
Two years later you're paying $8,000/month and your CFO is asking questions.
Here's a systematic way to audit your own bill in under an hour.
Step 1 — Compute: find your zombies
Pull a utilization report for the last 30 days. Any instance averaging under 10% CPU and 20% memory is a zombie. Either rightsize it or kill it. This alone typically saves 15–25% of compute spend.
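The Step 1 threshold can be applied mechanically once you have the report. The sketch below assumes a plain-text export with one line per instance (ID, average CPU %, average memory %); that format, the instance IDs, and the `find_zombies` name are all my own assumptions. Note that EC2 does not publish memory metrics by default, so the memory column needs the CloudWatch agent or another monitoring source.

```shell
#!/usr/bin/env bash
# Print instance IDs whose 30-day averages fall under both thresholds.
# Input format (assumed): "<instance-id> <avg-cpu-%> <avg-mem-%>" per line.
find_zombies() {
  awk -v cpu_max="${CPU_MAX:-10}" -v mem_max="${MEM_MAX:-20}" \
    '$2 < cpu_max && $3 < mem_max { print $1 }'
}

# Hypothetical sample data:
find_zombies <<'EOF'
i-aaaa 3.2 11.0
i-bbbb 45.0 60.5
i-cccc 8.9 35.0
EOF
# prints: i-aaaa
```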
Step 2 — Reservations: stop paying on-demand tax
If you've been running instances for 6+ months and they're stable, you're paying the most expensive rate possible. 1-year Reserved Instances or Compute Savings Plans cut this by 30–45% with zero architecture change.
Step 3 — Storage: tier everything
Data you haven't accessed in 90 days should not be in S3 Standard. Set lifecycle policies to move to Infrequent Access at 30 days, Glacier at 90 days. S3 Intelligent-Tiering does this automatically.
Step 4 — Databases: dev environments
Stop running your staging RDS instances 24/7. Use Lambda schedules to shut them down at 7pm and restart them at 8am on weekdays. That eliminates roughly 65% of the running hours.
Step 5 — Spot for batch
Any workload that can tolerate interruption — CI runners, ML training, data pipelines — should be on Spot. 60–80% cheaper. AWS rarely reclaims Spot for most instance types.

I built a free tool that walks you through all of this systematically — kloudaudit.eu. 18 checkpoints, takes 15 minutes, no account access needed.
What am I missing? Drop your favorite cost-saving trick in the comments.