<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hardeep Singh Tiwana</title>
    <description>The latest articles on DEV Community by Hardeep Singh Tiwana (@hstiwana).</description>
    <link>https://dev.to/hstiwana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3118021%2F910b89fb-a8da-468c-b3fd-58de5fd50d0c.jpg</url>
      <title>DEV Community: Hardeep Singh Tiwana</title>
      <link>https://dev.to/hstiwana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hstiwana"/>
    <language>en</language>
    <item>
      <title>NAT Gateways Killing Your Container Costs? Amazon ECR VPC endpoints to the Rescue</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Fri, 19 Dec 2025 21:32:27 +0000</pubDate>
      <link>https://dev.to/hstiwana/nat-gateways-killing-your-container-costs-amazon-ecr-vpc-endpoints-to-the-rescue-21k5</link>
      <guid>https://dev.to/hstiwana/nat-gateways-killing-your-container-costs-amazon-ecr-vpc-endpoints-to-the-rescue-21k5</guid>
      <description>&lt;p&gt;&lt;strong&gt;Picture this&lt;/strong&gt;. Your AWS bill hits, and there it is: &lt;strong&gt;$10K in NAT Gateway charges&lt;/strong&gt; for 3 NAT GWs in &lt;strong&gt;&lt;code&gt;us-east-1&lt;/code&gt;&lt;/strong&gt;. You started to dig in, and see &lt;strong&gt;~$8K&lt;/strong&gt; comes from &lt;strong&gt;NatGateway-Bytes&lt;/strong&gt; (&lt;strong&gt;Data Processed&lt;/strong&gt;) alone, assuming most of it tied to ECR image pulls. I've helped teams spot this exact issue using Cost Explorer and VPC Flow logs, watching container deployments quietly eat budgets. The solution? Amazon ECR VPC endpoints. They &lt;strong&gt;dropped NAT bills by &amp;gt;75%&lt;/strong&gt; in one setup I worked on. Let's walk through spotting it, the math, and the flow change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;ECR image pulls through NAT Gateways cost $0.045/GB.&lt;/li&gt;
&lt;li&gt;VPC Interface Endpoints cost $0.01/GB (&lt;strong&gt;78% cheaper&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Real example: ~$8K/month → ~$2K/month = &lt;strong&gt;~$70K annual savings&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  💡 Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; NAT Gateways charge $0.045/GB for data processing. For ECR-heavy workloads, this adds up fast: our &lt;strong&gt;example case&lt;/strong&gt; shows $8,010/month in data processing charges alone!&lt;/p&gt;
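&lt;p&gt;A quick sanity check on those numbers (a sketch using this post's assumed 178,000 GB/month and the published NAT Gateway prices):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Reproduce the example NAT Gateway bill
awk 'BEGIN {
  hours = 3 * 0.045 * 730      # 3 NAT GWs x $0.045/hr x 730 hrs/month
  data  = 178000 * 0.045       # 178,000 GB x $0.045/GB processed
  printf "hourly=%.2f data=%.2f total=%.2f\n", hours, data, hours + data
}'
# hourly=98.55 data=8010.00 total=8108.55
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;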

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Deploy three VPC endpoints to &lt;strong&gt;route ECR traffic privately&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ECR API Interface Endpoint&lt;/strong&gt; (&lt;strong&gt;&lt;code&gt;com.amazonaws.&amp;lt;region&amp;gt;.ecr.api&lt;/code&gt;&lt;/strong&gt;)

&lt;ul&gt;
&lt;li&gt;Handles authentication and image manifests&lt;/li&gt;
&lt;li&gt;Cost: ~$22/month per AZ + minimal data charges&lt;/li&gt;
&lt;li&gt;Required: Must deploy in each AZ for high availability&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECR Docker Interface Endpoint&lt;/strong&gt; (&lt;strong&gt;&lt;code&gt;com.amazonaws.&amp;lt;region&amp;gt;.ecr.dkr&lt;/code&gt;&lt;/strong&gt;)

&lt;ul&gt;
&lt;li&gt;Handles Docker pull/push commands&lt;/li&gt;
&lt;li&gt;Cost: ~$22/month per AZ + minimal data charges&lt;/li&gt;
&lt;li&gt;Required: Must deploy in each AZ for high availability&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3 Gateway Endpoint&lt;/strong&gt; (&lt;strong&gt;&lt;code&gt;com.amazonaws.&amp;lt;region&amp;gt;.s3&lt;/code&gt;&lt;/strong&gt;) &lt;strong&gt;⭐ THE MOST CRITICAL ONE&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Handles actual image layer downloads (99%+ of your data!)&lt;/li&gt;
&lt;li&gt;Cost: &lt;strong&gt;$0.00 (FREE!)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Required:&lt;/strong&gt; Without this, your image layers still hit NAT Gateways&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Savings:&lt;/strong&gt; For 178,000 GB/month of ECR traffic:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before:&lt;/strong&gt; $8,108.55/month (NAT Gateways)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After:&lt;/strong&gt; $1,823.80/month (VPC Endpoints)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings:&lt;/strong&gt; $6,284.75/month &lt;strong&gt;(77.5%) = $75,417/year&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why This Works:&lt;/strong&gt; ECR stores Docker image layers in S3. The free S3 Gateway endpoint handles 99%+ of your data transfer, while the two paid Interface endpoints handle control-plane operations. All three work together to eliminate NAT Gateway data processing charges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation Time:&lt;/strong&gt; ~30 minutes with Terraform, plus 48 hours to validate savings in Cost Explorer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical Success Factor:&lt;/strong&gt; You MUST deploy all three endpoints. Deploying only the ECR endpoints without the S3 Gateway endpoint will save you almost nothing, because the bulk of your data will still flow through NAT Gateways.&lt;/p&gt;
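&lt;p&gt;For reference, here's roughly what creating the three endpoints looks like with the AWS CLI (a sketch: the VPC, subnet, route table, and security group IDs are placeholders to swap for your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. ECR API interface endpoint (auth + manifests)
aws ec2 create-vpc-endpoint --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --subnet-ids subnet-aaa subnet-bbb subnet-ccc \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled

# 2. ECR Docker interface endpoint (docker pull/push)
aws ec2 create-vpc-endpoint --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --subnet-ids subnet-aaa subnet-bbb subnet-ccc \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled

# 3. S3 gateway endpoint (the layers themselves -- free)
aws ec2 create-vpc-endpoint --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-aaa rtb-bbb rtb-ccc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;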

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcmr1j07m4pj0mlga1wy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcmr1j07m4pj0mlga1wy.png" alt="Compare both models" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's start with the Brutal Math: NAT vs. Endpoints Head-to-Head
&lt;/h2&gt;

&lt;p&gt;Think standard 3-AZ VPC with private subnets and container workloads. NAT charges &lt;strong&gt;&lt;code&gt;$0.045&lt;/code&gt;&lt;/strong&gt; per hour per AZ plus &lt;strong&gt;&lt;code&gt;$0.045&lt;/code&gt;&lt;/strong&gt; per GB processed. Endpoints run &lt;strong&gt;&lt;code&gt;$0.01&lt;/code&gt;&lt;/strong&gt; per hour per ENI and &lt;strong&gt;&lt;code&gt;$0.01&lt;/code&gt;&lt;/strong&gt; per GB. That 4.5× per-GB difference is what matters at high volume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Complete ECR private access needs three endpoints: the &lt;strong&gt;&lt;code&gt;ecr.api&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;ecr.dkr&lt;/code&gt;&lt;/strong&gt; interface endpoints (one ENI each per AZ, so 6 ENIs total in a 3-AZ setup) plus the &lt;strong&gt;&lt;code&gt;s3&lt;/code&gt;&lt;/strong&gt; gateway endpoint for the layers. The S3 Gateway endpoint modifies route tables and creates no ENIs. If you'd like to read more on this, follow the links at the end of this post.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ecr.api&lt;/code&gt;&lt;/strong&gt; → Interface endpoint (ENI per AZ)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ecr.dkr&lt;/code&gt;&lt;/strong&gt; → Interface endpoint (ENI per AZ)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;s3&lt;/code&gt;&lt;/strong&gt; → Gateway endpoint (NO ENIs, modifies route tables)&lt;/li&gt;
&lt;/ul&gt;
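&lt;p&gt;You can confirm which of these are Interface vs Gateway services in your region before deploying (a sketch; the service names here are the standard catalog names):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the ECR and S3 endpoint services with their endpoint type
aws ec2 describe-vpc-endpoint-services --region us-east-1 \
  --query "ServiceDetails[?contains(ServiceName, '.ecr.') || ServiceName=='com.amazonaws.us-east-1.s3'].[ServiceName, ServiceType[0].ServiceType]" \
  --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;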

&lt;h3&gt;
  
  
  NAT Gateway vs VPC Endpoints Cost Comparison
&lt;/h3&gt;

&lt;p&gt;Configuration: 3 AZs with 3 NAT Gateways vs 3 VPC Endpoints&lt;/p&gt;

&lt;h4&gt;
  
  
  VPC Endpoint Configuration:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;com.amazonaws.&amp;lt;region&amp;gt;.ecr.api (Interface) - $0.01/hour per AZ + $0.01/GB&lt;/li&gt;
&lt;li&gt;com.amazonaws.&amp;lt;region&amp;gt;.ecr.dkr (Interface) - $0.01/hour per AZ + $0.01/GB&lt;/li&gt;
&lt;li&gt;com.amazonaws.&amp;lt;region&amp;gt;.s3 (Gateway) - FREE (no hourly or data charges)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  NAT Gateway Configuration:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;3 NAT Gateways (one per AZ) - $0.045/hour each + $0.045/GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the model, scaled to the $8K spend as data baseline (730 hours a month; 6 interface-endpoint ENIs: &lt;code&gt;ecr.api&lt;/code&gt; and &lt;code&gt;ecr.dkr&lt;/code&gt; in each of 3 AZs, with the free S3 gateway endpoint adding nothing):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Volume (GB/mo)&lt;/th&gt;
&lt;th&gt;NAT Cost ($)&lt;/th&gt;
&lt;th&gt;VPC Endpoint Cost ($)&lt;/th&gt;
&lt;th&gt;Monthly Savings ($)&lt;/th&gt;
&lt;th&gt;Savings %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;103.05&lt;/td&gt;
&lt;td&gt;44.80&lt;/td&gt;
&lt;td&gt;58.25&lt;/td&gt;
&lt;td&gt;56.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;121.05&lt;/td&gt;
&lt;td&gt;48.80&lt;/td&gt;
&lt;td&gt;72.25&lt;/td&gt;
&lt;td&gt;59.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;143.55&lt;/td&gt;
&lt;td&gt;53.80&lt;/td&gt;
&lt;td&gt;89.75&lt;/td&gt;
&lt;td&gt;62.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;323.55&lt;/td&gt;
&lt;td&gt;93.80&lt;/td&gt;
&lt;td&gt;229.75&lt;/td&gt;
&lt;td&gt;71.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;548.55&lt;/td&gt;
&lt;td&gt;143.80&lt;/td&gt;
&lt;td&gt;404.75&lt;/td&gt;
&lt;td&gt;73.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;2,348.55&lt;/td&gt;
&lt;td&gt;543.80&lt;/td&gt;
&lt;td&gt;1,804.75&lt;/td&gt;
&lt;td&gt;76.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;4,598.55&lt;/td&gt;
&lt;td&gt;1,043.80&lt;/td&gt;
&lt;td&gt;3,554.75&lt;/td&gt;
&lt;td&gt;77.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;178,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8,108.55&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,823.80&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6,284.75&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;77.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Total NAT spend drops like a rock; at production scale, you will see ROI in days.&lt;/p&gt;
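&lt;p&gt;Every row above follows the same two formulas (3 NAT Gateways vs. 6 interface-endpoint ENIs), so you can reproduce any row yourself:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# NAT:       3 GWs  x $0.045/hr x 730 hrs + GB x $0.045
# Endpoints: 6 ENIs x $0.01/hr  x 730 hrs + GB x $0.01  (S3 gateway: free)
awk -v gb=10000 'BEGIN {
  nat = 3 * 0.045 * 730 + gb * 0.045
  ep  = 6 * 0.01  * 730 + gb * 0.01
  printf "nat=%.2f endpoint=%.2f savings=%.1f%%\n", nat, ep, 100 * (nat - ep) / nat
}'
# nat=548.55 endpoint=143.80 savings=73.8%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;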




&lt;h2&gt;
  
  
  Example use case with assumptions
&lt;/h2&gt;

&lt;p&gt;Assume we have 3 NAT Gateways in &lt;strong&gt;&lt;code&gt;us-east-1&lt;/code&gt;&lt;/strong&gt; processing 178,000 GB of ECR traffic monthly.&lt;/p&gt;

&lt;p&gt;Cost Breakdown for Total Monthly Cost: $8,108.55&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;NAT Gateway Hourly Charges: $98.55 &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.045 per hour × 3 NAT Gateways × 730 hours/month&lt;/li&gt;
&lt;li&gt;This covers the provisioning cost for maintaining 3 NAT Gateways (one per AZ)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Data Processing Charges: $8,010.00&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.045 per GB × 178,000 GB&lt;/li&gt;
&lt;li&gt;This is the charge for processing all data flowing through the NAT Gateways&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Per NAT Gateway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hourly cost: $32.85/month per gateway&lt;/li&gt;
&lt;li&gt;Data processing (if evenly distributed): $2,670.00/month per gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; The data processing charge of $8,010 represents the vast majority (98.8%) of our assumed total NAT Gateway costs. Since we're processing ECR (Elastic Container Registry) traffic within the same region, we won't incur additional data transfer charges for the traffic itself, but the NAT Gateway data processing fee still applies.&lt;/p&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Private subnets with NAT Gateway access&lt;/li&gt;
&lt;li&gt;ECR repositories in the same region&lt;/li&gt;
&lt;li&gt;Security groups allowing HTTPS (443) from workloads&lt;/li&gt;
&lt;/ul&gt;
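&lt;p&gt;If the interface endpoints get their own security group, a minimal HTTPS rule looks like this (a sketch; the group ID and VPC CIDR are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Allow HTTPS from workloads in the VPC to the endpoint ENIs
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 \
  --cidr 10.0.0.0/16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;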




&lt;h2&gt;
  
  
  Hunt Down Those Hidden ECR Pull Fees
&lt;/h2&gt;

&lt;p&gt;Start in AWS Cost Explorer. Under Group by, select the &lt;strong&gt;Usage Type&lt;/strong&gt; dimension, then filter to &lt;strong&gt;Service:&lt;/strong&gt; &lt;code&gt;EC2 - Other&lt;/code&gt; and &lt;strong&gt;Usage type group:&lt;/strong&gt; &lt;strong&gt;&lt;code&gt;EC2: NAT Gateway - Data Processed&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;EC2: NAT Gateway - Running Hours&lt;/code&gt;&lt;/strong&gt;. You'll see &lt;strong&gt;&lt;code&gt;NatGateway-Bytes&lt;/code&gt;&lt;/strong&gt; racking up that ~$8K at $0.045 per GB, plus &lt;strong&gt;&lt;code&gt;NatGateway-Hours&lt;/code&gt;&lt;/strong&gt; for the $0.045 hourly hit per AZ.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoob6pqjjgvqj854f053.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoob6pqjjgvqj854f053.png" alt="Cost Explorer Filters" width="628" height="1388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For proof&lt;/strong&gt;, enable &lt;strong&gt;VPC Flow Logs&lt;/strong&gt; on your subnets. Filter for port &lt;strong&gt;&lt;code&gt;443&lt;/code&gt;&lt;/strong&gt; traffic to &lt;strong&gt;&lt;code&gt;ecr.api&lt;/code&gt;&lt;/strong&gt; or &lt;strong&gt;&lt;code&gt;ecr.dkr&lt;/code&gt;&lt;/strong&gt; domains (specifically, look for destination port 443 traffic to IP addresses in the ECR service IP ranges, available via the AWS IP ranges JSON). &lt;/p&gt;

&lt;p&gt;Do you see private subnet bytes flooding NAT ENIs? That's the problem. Every pull sends a small request out via NAT, fetches metadata, then hauls gigabytes back, doubling up on processing fees. (If there's an inter-AZ hop, it adds another $0.01 per GB. I caught this pattern adding ~$3,000 a month in a recent cluster review.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Using VPC Flow Logs to Track and Validate ECR Traffic Costs
&lt;/h2&gt;

&lt;p&gt;Before deploying VPC endpoints, you need proof that ECR is actually consuming your NAT Gateway bandwidth. After deployment, you need validation that traffic shifted correctly. VPC Flow Logs provide both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Enable VPC Flow Logs
&lt;/h3&gt;

&lt;p&gt;Enable Flow Logs on your private subnets where container workloads run:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Via AWS CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 create-flow-logs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-type&lt;/span&gt; Subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-ids&lt;/span&gt; subnet-xxxxx subnet-yyyyy subnet-zzzzz &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--traffic-type&lt;/span&gt; ALL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-destination-type&lt;/span&gt; cloud-watch-logs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-group-name&lt;/span&gt; /aws/vpc/flowlogs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deliver-logs-permission-arn&lt;/span&gt; arn:aws:iam::ACCOUNT_ID:role/flowlogsRole
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Via Terraform:&lt;/strong&gt; &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/flow_log" rel="noopener noreferrer"&gt;See the &lt;code&gt;aws_flow_log&lt;/code&gt; resource on the Terraform Registry&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_flow_log"&lt;/span&gt; &lt;span class="s2"&gt;"private_subnets"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;iam_role_arn&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;flow_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;log_destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cloudwatch_log_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;flow_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;traffic_type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Identify Top HTTPS Destinations
&lt;/h3&gt;

&lt;p&gt;Run this CloudWatch Logs Insights query to find your highest-volume HTTPS destinations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fields @timestamp, srcAddr, dstAddr, dstPort, bytes, action
| filter dstPort = 443
| filter interfaceId like /eni-/
| stats sum(bytes) as totalBytes by dstAddr
| sort totalBytes desc
| limit 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows which destinations consume the most bandwidth on port 443. The top destinations are likely S3 IPs (for ECR image layers).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Identify S3 and ECR Service IP Ranges
&lt;/h3&gt;

&lt;p&gt;VPC Flow Logs show IP addresses, not domain names. Download AWS's IP ranges to identify both S3 and ECR traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download AWS IP ranges&lt;/span&gt;
curl &lt;span class="nt"&gt;-o&lt;/span&gt; ip-ranges.json https://ip-ranges.amazonaws.com/ip-ranges.json

&lt;span class="c"&gt;# Inspect services for your region&lt;/span&gt;
jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.prefixes[] | select(.region=="us-east-1") | .service'&lt;/span&gt; ip-ranges.json | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you know the correct service values, narrow the query down. Since ECR doesn't have a designated service value, we use AMAZON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Once you know the correct service values, narrow it down, for example:&lt;/span&gt;
jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.prefixes[] | select((.service=="AMAZON" or .service=="S3") and .region=="us-east-1") | .ip_prefix'&lt;/span&gt; ip-ranges.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example IP ranges for &lt;code&gt;us-east-1&lt;/code&gt;&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;44.223.121.0/24
44.223.122.0/24
98.80.195.0/25
98.80.238.0/23
3.5.0.0/19
1.178.4.0/24
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;You will see &amp;gt;95% of the traffic going to S3:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 (where ECR stores image layers) - 95%+ of your traffic&lt;/li&gt;
&lt;li&gt;ECR (API and Docker registry) - &amp;lt;5% of your traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters:&lt;/strong&gt; Your 178,000 GB/month is primarily S3 traffic (image layer downloads), not ECR API calls. You must track S3 IPs to see the real cost impact!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Always check the current AWS IP ranges JSON for your specific region)&lt;/em&gt;&lt;/p&gt;
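&lt;p&gt;Since you'll need those prefixes as Logs Insights filters in the next step, you can generate the &lt;code&gt;dstAddr like&lt;/code&gt; clauses straight from the JSON (a sketch that matches on the first two octets; refine it for your actual prefixes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Turn each S3 prefix for the region into a "dstAddr like" clause
jq -r '[.prefixes[]
        | select(.service=="S3" and .region=="us-east-1")
        | .ip_prefix | split(".")[0:2] | join("\\.")]
       | unique
       | map("dstAddr like /^" + . + "\\./")
       | join(" or ")' ip-ranges.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;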

&lt;h3&gt;
  
  
  Step 4: Calculate NAT Gateway ECR+S3 Traffic
&lt;/h3&gt;

&lt;p&gt;Filter Flow Logs for traffic to BOTH S3 and ECR IPs through NAT Gateway ENIs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Do NOT copy-paste this as-is; update the &lt;code&gt;filter dstAddr like&lt;/code&gt; line to match the ranges from the previous command's output.&lt;/li&gt;
&lt;li&gt;Replace &lt;code&gt;/^3\.5\./ or dstAddr like /^52\.94\./&lt;/code&gt; with the real IP ranges you want to look for.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fields @timestamp, srcAddr, dstAddr, dstPort, bytes, interfaceId
| filter dstPort = 443
| filter interfaceId like /eni-/ and action = "ACCEPT"
| filter dstAddr like /^3\.5\./ or dstAddr like /^52\.94\./
| stats sum(bytes) as totalBytes by interfaceId, dstAddr
| sort totalBytes desc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Identify NAT Gateway ENIs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 describe-nat-gateways &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'NatGateways[].{NatGatewayId:NatGatewayId, NetworkInterfaceIds:NatGatewayAddresses[].NetworkInterfaceId}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cross-reference the ENI IDs from your query results with the NAT Gateway ENIs.&lt;br&gt;
&lt;strong&gt;💡 Pro Tip:&lt;/strong&gt; The top destination IPs by bytes will be S3 ranges, not ECR ranges. This confirms that the S3 Gateway endpoint is critical for cost savings!&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Calculate Monthly Cost Impact
&lt;/h3&gt;

&lt;p&gt;From your Flow Logs query results:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sum total bytes&lt;/strong&gt; through NAT Gateway ENIs to S3 + ECR IPs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert to GB:&lt;/strong&gt; totalBytes / 1,000,000,000 (AWS uses decimal GB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculate cost:&lt;/strong&gt; GB × $0.045&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cost Calculation Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flow Logs show:&lt;/strong&gt; 191,102,976,000 bytes to S3/ECR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert:&lt;/strong&gt; 191,102,976,000 / 1,000,000,000 = 191.10 GB&lt;/li&gt;
&lt;li&gt;Scaled up to the full month's 178,000 GB: 178,000 × $0.045 = &lt;strong&gt;$8,010/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
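&lt;p&gt;The same conversion as a one-liner, using the sample byte count from above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# bytes -&gt; decimal GB -&gt; NAT data-processing cost
awk -v bytes=191102976000 'BEGIN {
  gb = bytes / 1000000000
  printf "gb=%.2f cost=%.2f\n", gb, gb * 0.045
}'
# gb=191.10 cost=8.60
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;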

&lt;p&gt;&lt;strong&gt;Traffic Breakdown (typical):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 image layers: ~177,850 GB (99.91%)&lt;/li&gt;
&lt;li&gt;ECR API calls: ~50 GB (0.03%)&lt;/li&gt;
&lt;li&gt;ECR Docker registry: ~100 GB (0.06%)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 6: Validate After VPC Endpoint Deployment
&lt;/h3&gt;

&lt;p&gt;After deploying VPC endpoints, confirm traffic shifted to private IPs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fields @timestamp, srcAddr, dstAddr, dstPort, bytes, interfaceId
| filter dstPort = 443
| filter dstAddr like /^10\./
| filter interfaceId like /eni-/
| stats sum(bytes) as totalBytes by interfaceId
| sort totalBytes desc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Traffic now goes to private 10.x.x.x IPs (VPC endpoint ENIs)&lt;/li&gt;
&lt;li&gt;✓ NAT Gateway ENIs show minimal S3/ECR traffic&lt;/li&gt;
&lt;li&gt;✓ Total bytes shifted from NAT to VPC endpoints&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ❌ But this validation method has problems ❌
&lt;/h3&gt;

&lt;p&gt;⚠️ The filter above only matches the RFC 1918 &lt;code&gt;10.0.0.0/8&lt;/code&gt; range, but VPC endpoints use different address ranges:&lt;/p&gt;

&lt;p&gt;Gateway Endpoints (S3, DynamoDB)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use prefix list routes (&lt;code&gt;pl-xxx&lt;/code&gt;), not destination IPs in flow logs&lt;/li&gt;
&lt;li&gt;dstAddr shows the actual S3 service IP (public range like &lt;code&gt;52.x.x.x&lt;/code&gt;), not private&lt;/li&gt;
&lt;li&gt;Traffic takes the prefix list route directly, so these flow records never match a NAT &lt;code&gt;interfaceId&lt;/code&gt; filter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interface Endpoints (&lt;code&gt;ecr.api&lt;/code&gt;, &lt;code&gt;ecr.dkr&lt;/code&gt;, etc.)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;PrivateLink IPs&lt;/strong&gt; in the VPC CIDR (e.g., &lt;code&gt;10.0.x.x&lt;/code&gt; if your VPC is &lt;code&gt;10.0.0.0/16&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;dstAddr shows the endpoint ENI IP (private), but only if your VPC CIDR starts with 10.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  So what would correct validation queries look like?
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Interface Endpoints (ECR, etc.) - Check PrivateLink traffic&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;srcAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dstPort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;dstPort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="o"&gt;/^&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Your&lt;/span&gt; &lt;span class="n"&gt;VPC&lt;/span&gt; &lt;span class="n"&gt;CIDR&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;eni&lt;/span&gt;&lt;span class="o"&gt;-/&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;totalBytes&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;sort&lt;/span&gt; &lt;span class="n"&gt;totalBytes&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ Only works if your VPC CIDR is &lt;code&gt;10.x.x.x&lt;/code&gt;. &lt;strong&gt;Replace with your actual CIDR (e.g., 172.16. or 192.168.)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Gateway Endpoints (S3) - Check prefix list bypass&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;srcAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dstPort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;dstPort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;s3BucketName&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nv"&gt;""&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;S3&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;nat&lt;/span&gt;&lt;span class="o"&gt;-/&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;Not&lt;/span&gt; &lt;span class="n"&gt;NAT&lt;/span&gt; &lt;span class="n"&gt;ENIs&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;totalBytes&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;sort&lt;/span&gt; &lt;span class="n"&gt;totalBytes&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. 🎯 NAT Gateway traffic drop (The real validation)🎯&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;srcAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dstAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dstPort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;dstPort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;nat&lt;/span&gt;&lt;span class="o"&gt;-/&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;totalBytes&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;interfaceId&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;sort&lt;/span&gt; &lt;span class="n"&gt;totalBytes&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Before endpoints:&lt;/strong&gt; High bytes on NAT ENIs&lt;br&gt;
&lt;strong&gt;After endpoints:&lt;/strong&gt; Bytes drop significantly on those same ENIs.&lt;/p&gt;
&lt;h4&gt;
  
  
  🎯 What success looks like 🎯
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;BEFORE endpoints:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NAT ENI: 150 GB to s3.us-east-1.amazonaws.com&lt;/li&gt;
&lt;li&gt;NAT ENI: 25 GB to 123456789012.dkr.ecr.us-east-1.amazonaws.com&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AFTER endpoints:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NAT ENI: 5 GB (mostly external APIs)&lt;/li&gt;
&lt;li&gt;Interface ENI: 25 GB to 10.0.2.100 (ECR.dkr endpoint)&lt;/li&gt;
&lt;li&gt;S3 traffic: Prefix list route (no NAT ENI)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key metric: &lt;strong&gt;NAT ENI bytes drop&lt;/strong&gt;. That's your validation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/^10\./&lt;/code&gt; filter only catches interface endpoints and only if your VPC uses that range. Use the NAT traffic reduction query instead.&lt;/p&gt;
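&lt;p&gt;To turn a &lt;code&gt;totalBytes&lt;/code&gt; figure from the queries above into dollars, here is a minimal sketch. The $0.045/GB rate is the us-east-1 NAT Gateway data-processing price assumed throughout this article; verify it for your region.&lt;/p&gt;

```python
NAT_DATA_RATE_PER_GB = 0.045  # USD/GB, us-east-1 NAT data processing (assumed)

def nat_processing_cost(total_bytes: int) -> float:
    """Estimate monthly NAT data-processing cost from raw Flow Logs bytes.

    Billing uses decimal GB: 1 GB = 1,000,000,000 bytes.
    """
    gb = total_bytes / 1_000_000_000
    return round(gb * NAT_DATA_RATE_PER_GB, 2)

# Example: 178,000 GB/month of ECR pull traffic through NAT
print(nat_processing_cost(178_000 * 1_000_000_000))  # → 8010.0
```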

&lt;p&gt;&lt;strong&gt;Validate endpoint ENI IDs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ECR API endpoint ENIs&lt;/span&gt;
aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=service-name,Values=com.amazonaws.us-east-1.ecr.api"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoints[*].NetworkInterfaceIds'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table

&lt;span class="c"&gt;# ECR Docker endpoint ENIs&lt;/span&gt;
aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=service-name,Values=com.amazonaws.us-east-1.ecr.dkr"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoints[*].NetworkInterfaceIds'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table

&lt;span class="c"&gt;# S3 Gateway endpoint (no ENIs - modifies route tables)&lt;/span&gt;
aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=service-name,Values=com.amazonaws.us-east-1.s3"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoints[*].[VpcEndpointId,VpcEndpointType,RouteTableIds]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Correlate with Cost Explorer
&lt;/h3&gt;

&lt;p&gt;Confirm the cost impact in AWS Cost Explorer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Navigate to:&lt;/strong&gt; Billing and Cost Management → Cost Explorer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Group by:&lt;/strong&gt; Usage Type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter Service:&lt;/strong&gt; EC2 - Other&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Look for:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NatGateway-Bytes&lt;/strong&gt; (should drop ~75%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VpcEndpoint-Bytes&lt;/strong&gt; (should increase proportionally)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time range:&lt;/strong&gt; Compare 2 weeks before vs 2 weeks after deployment&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Expected results:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- NAT Gateway data processing: $8,010 → ~$2,000 (75% reduction)
- VPC Endpoint data processing: $0 → ~$1,780
- Net savings: ~$6,285/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
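&lt;p&gt;The savings arithmetic can be sanity-checked directly. Figures are the ones used in this article's example (3 NAT GWs in us-east-1, 178,000 GB/month of pulls):&lt;/p&gt;

```python
# "Before" bill attributed to ECR: NAT data processing plus NAT hourly charges
before = 8_010.00 + 98.55

# "After" bill: interface-endpoint hours plus endpoint data processing
endpoint_cost = 43.80 + 1_780.00

monthly_savings = before - endpoint_cost
print(round(monthly_savings, 2))                 # → 6284.75
print(round(monthly_savings / before * 100, 1))  # → 77.5
print(round(monthly_savings * 12, 2))            # → 75417.0
```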
&lt;h4&gt;
  
  
  Understanding the Three-Endpoint Architecture
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Why you need all three endpoints:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ECR API Interface Endpoint&lt;/strong&gt; (&lt;code&gt;com.amazonaws.us-east-1.ecr.api&lt;/code&gt;)

&lt;ul&gt;
&lt;li&gt;Handles authentication, authorization, image manifests&lt;/li&gt;
&lt;li&gt;Low data volume (~50 GB/month)&lt;/li&gt;
&lt;li&gt;Cost: $21.90/month (3 AZs × 730 hrs × $0.01) + ~$0.50 data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECR Docker Interface Endpoint&lt;/strong&gt; (&lt;code&gt;com.amazonaws.us-east-1.ecr.dkr&lt;/code&gt;)

&lt;ul&gt;
&lt;li&gt;Handles Docker pull/push commands, layer discovery&lt;/li&gt;
&lt;li&gt;Low data volume (~100 GB/month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; $21.90/month (3 AZs × 730 hrs × $0.01) + ~$1.00 data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3 Gateway Endpoint&lt;/strong&gt; (&lt;code&gt;com.amazonaws.us-east-1.s3&lt;/code&gt;) ← &lt;strong&gt;THE CRITICAL ONE&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Handles actual image layer downloads (99%+ of your data!)&lt;/li&gt;
&lt;li&gt;High data volume (~177,850 GB/month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost: $0.00 (FREE!)&lt;/strong&gt; ← This is where your savings come from!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without the S3 Gateway endpoint&lt;/strong&gt;, your image layer downloads would still hit NAT Gateways even with ECR endpoints deployed!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Pro Tips for Flow Logs Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;✓ Track S3 IPs, not just ECR IPs&lt;/strong&gt; - S3 is where 95%+ of ECR data flows &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Enable Flow Logs on private subnets only&lt;/strong&gt; - Reduces log volume and costs &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Use CloudWatch Logs Insights&lt;/strong&gt; - Best for ad-hoc queries and quick analysis &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Consider Amazon Athena&lt;/strong&gt; - Better for large-scale historical analysis &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Set up CloudWatch alarms&lt;/strong&gt; - Alert on unexpected NAT traffic spikes &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Tag your resources&lt;/strong&gt; - Makes NAT Gateways and VPC endpoints easier to identify &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Factor in Flow Logs cost&lt;/strong&gt; - Approximately $0.50/GB ingested to CloudWatch &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Aggregate by 5-minute intervals&lt;/strong&gt; - Reduces log volume without losing insights &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✓ Monitor for 2-4 weeks&lt;/strong&gt; - Ensures you capture full deployment cycles and traffic patterns&lt;/li&gt;
&lt;/ul&gt;
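&lt;p&gt;On the Flow Logs cost tip above, a quick sketch of the ingestion bill (assumption: ~$0.50/GB ingested into CloudWatch Logs; actual log volume depends on your traffic shape and aggregation interval):&lt;/p&gt;

```python
INGESTION_RATE_PER_GB = 0.50  # USD/GB into CloudWatch Logs (assumed)

def flow_logs_monthly_cost(log_gb_per_month: float) -> float:
    """Estimate monthly Flow Logs ingestion cost."""
    return round(log_gb_per_month * INGESTION_RATE_PER_GB, 2)

# e.g. 20 GB of flow-log records per month
print(flow_logs_monthly_cost(20))  # → 10.0
```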


&lt;h2&gt;
  
  
  Before and After: Understanding The Traffic Flow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before:&lt;/strong&gt; ECS Tasks → NAT Gateway → Internet → ECR/S3 (expensive)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After:&lt;/strong&gt; ECS Tasks → VPC Endpoints → AWS Private Network → ECR/S3 (optimized)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Before endpoints
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A pod in a private subnet hits the NAT Gateway for every ECR pull&lt;/li&gt;
&lt;li&gt;The request goes outbound to the internet, the ECR API replies back through NAT processing, and then gigabytes of Docker layers stream back the same way.&lt;/li&gt;
&lt;li&gt;Flow Logs show huge byte counts on NAT ENIs. Cost Explorer's &lt;code&gt;NatGateway-Bytes&lt;/code&gt; balloons to $8K.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  After: deploy endpoints
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;com.amazonaws.&amp;lt;region&amp;gt;.ecr.api&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;.ecr.dkr&lt;/code&gt;&lt;/strong&gt; endpoints in each private subnet per AZ, turn on private DNS. &lt;/li&gt;
&lt;li&gt;Pod traffic goes straight to the endpoint ENI via PrivateLink, no NAT or internet gateway. &lt;/li&gt;
&lt;li&gt;AWS backbone handles the rest, ECR layers flow free within the region.&lt;/li&gt;
&lt;li&gt;Flow Logs shift: zero NAT to ECR domains, all bytes on private 10.x endpoint IPs. &lt;/li&gt;
&lt;li&gt;In Cost Explorer, NAT usage drops like a rock. &lt;/li&gt;
&lt;li&gt;Look for usage types containing &lt;strong&gt;&lt;code&gt;VpcEndpoint-Hours&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;VpcEndpoint-Bytes&lt;/code&gt;&lt;/strong&gt; under the &lt;strong&gt;VPC&lt;/strong&gt; service to confirm the new endpoint costs appear, at much smaller amounts than NAT was showing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ngxpvlu0bvo7haec33r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ngxpvlu0bvo7haec33r.png" alt="VPC Endpoint Costs" width="652" height="1382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rolled this out on a Kubernetes fleet processing 178,000 GB/mo of ECR traffic. NAT crashed from $10K ($8K data processed) to $2K for services that still need it. Endpoints totaled $1.8K. Filter &lt;strong&gt;Data Transfer&lt;/strong&gt; + &lt;strong&gt;EC2&lt;/strong&gt; in Cost Explorer and you will see &lt;strong&gt;EC2: NAT Gateway - Data Processed&lt;/strong&gt; costs drop sharply, while &lt;strong&gt;VpcEndpoint-Hours&lt;/strong&gt; + &lt;strong&gt;VpcEndpoint-Bytes&lt;/strong&gt; take over at $0.01/GB.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cost After VPC Interface Endpoints: $1,823.80/month
&lt;/h2&gt;
&lt;h3&gt;
  
  
  New Cost Breakdown:
&lt;/h3&gt;
&lt;h4&gt;
  
  
  NAT Gateway Costs:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Hourly charges: $98.55 (gateways remain for other traffic)&lt;/li&gt;
&lt;li&gt;Data processing: $0.00 (ECR traffic now bypasses NAT entirely)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  VPC Interface Endpoint Costs:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Hourly charges: $43.80 (2 endpoints × 3 AZs × 730 hours × $0.01/hour)&lt;/li&gt;
&lt;li&gt;Data processing: $1,780.00 (178,000 GB × $0.01/GB)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Impact:
&lt;/h3&gt;

&lt;p&gt;💰 Monthly Savings: $6,284.75/month (77.5%)&lt;br&gt;
💰 Annual Savings: $75,417.00/year&lt;/p&gt;
&lt;h3&gt;
  
  
  What You Need to Deploy:
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Required Interface Endpoints (per AZ):
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✅ com.amazonaws.us-east-1.ecr.api - For ECR API calls&lt;/li&gt;
&lt;li&gt;✅ com.amazonaws.us-east-1.ecr.dkr - For Docker registry operations&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Required Gateway Endpoint (VPC-wide - For ECR image layer storage - FREE):
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✅ com.amazonaws.us-east-1.s3 - Deploy once per VPC (not per AZ)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  A quick and dirty Terraform example
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.s3"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Gateway"&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_ids&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3_ecr_access&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3-gateway"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"ecr-dkr-endpoint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.ecr.dkr"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecr-dkr"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"ecr-api-endpoint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.ecr.api"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecr-api"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Validation:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Validate with: &lt;code&gt;nslookup api.ecr.us-east-1.amazonaws.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Should resolve to private &lt;code&gt;10.x.x.x&lt;/code&gt; addresses, not public IPs.&lt;/li&gt;
&lt;/ul&gt;
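&lt;p&gt;The same check can be scripted. A small sketch, assuming it runs from a host inside the VPC (private DNS only answers privately there); the hostname follows the article's us-east-1 example:&lt;/p&gt;

```python
import ipaddress
import socket

def is_private_ip(ip: str) -> bool:
    """True for private addresses (e.g. 10.x.x.x), False for public IPs."""
    return ipaddress.ip_address(ip).is_private

def resolves_privately(hostname: str) -> bool:
    """Resolve a hostname and report whether it lands on a private address."""
    return is_private_ip(socket.gethostbyname(hostname))

if __name__ == "__main__":
    host = "api.ecr.us-east-1.amazonaws.com"
    verdict = "private" if resolves_privately(host) else "PUBLIC - check endpoint DNS"
    print(host, verdict)
```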

&lt;h2&gt;
  
  
  💡 Pro Tip: The S3 Gateway endpoint is critical but FREE.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add a free &lt;strong&gt;S3 Gateway endpoint&lt;/strong&gt; for ECR layer storage access. While ECR endpoints handle API calls, image layers are stored in S3. The Gateway endpoint ensures this traffic also bypasses NAT at zero cost, so don't skip it. ECR stores image layers in S3, and without this endpoint, your layer downloads will still hit NAT Gateways!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Does This Work So Well?
&lt;/h2&gt;

&lt;p&gt;The key is &lt;strong&gt;data processing rate difference&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NAT Gateway: $0.045/GB&lt;/li&gt;
&lt;li&gt;VPC Endpoint: $0.01/GB (78% cheaper per GB)&lt;/li&gt;
&lt;/ul&gt;
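&lt;p&gt;A quick break-even sketch makes the rate difference concrete: how much monthly ECR traffic covers the interface-endpoint hourly charges? Rates and endpoint counts as used in this article (us-east-1, 2 interface endpoints across 3 AZs):&lt;/p&gt;

```python
NAT_RATE = 0.045       # USD/GB, NAT Gateway data processing (assumed)
ENDPOINT_RATE = 0.01   # USD/GB, interface endpoint data processing (assumed)
HOURLY = 2 * 3 * 730 * 0.01  # 2 endpoints x 3 AZs x 730 hrs x $0.01 = $43.80

savings_per_gb = NAT_RATE - ENDPOINT_RATE  # $0.035 saved per GB shifted
breakeven_gb = HOURLY / savings_per_gb     # volume at which endpoints pay off

print(round(breakeven_gb, 1))  # → 1251.4
```

At roughly 1,251 GB/month the endpoints break even; at 178,000 GB/month they are over a hundred times past that point.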

&lt;p&gt;Plus, VPC endpoints provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better security - Traffic never leaves AWS network&lt;/li&gt;
&lt;li&gt;Lower latency - Direct path to ECR&lt;/li&gt;
&lt;li&gt;Higher reliability - No internet gateway dependency&lt;/li&gt;
&lt;li&gt;Simplified architecture - Private subnets can pull images directly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Another Implementation detail to keep in mind:
&lt;/h3&gt;

&lt;p&gt;Your NAT Gateways stay in place for other internet-bound traffic (software updates, external APIs, etc.), but all ECR image pulls route through the VPC endpoints instead. This is a configuration change, not a replacement, and you get the best of both worlds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DNS not resolving privately? Enable "Private DNS" on endpoints ✅&lt;/li&gt;
&lt;li&gt;Still seeing NAT charges? Check security group rules allow 443 inbound ✅&lt;/li&gt;
&lt;li&gt;Pulls timing out? Verify subnet route tables don't force internet gateway ✅&lt;/li&gt;
&lt;li&gt;Endpoint not appearing in Cost Explorer? Wait 24-48 hours for billing data to populate; check under Service: "VPC" ✅&lt;/li&gt;
&lt;li&gt;Validate endpoint status: &lt;code&gt;aws ec2 describe-vpc-endpoints --filters "Name=service-name,Values=com.amazonaws.us-east-1.ecr.api"&lt;/code&gt; ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Troubleshooting Flow Logs Analysis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Issue: Can't find NAT Gateway ENIs in Flow Logs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Verify Flow Logs are enabled on the correct subnets&lt;/li&gt;
&lt;li&gt;✅ Check that traffic-type is set to ALL (not just &lt;code&gt;ACCEPT&lt;/code&gt; or &lt;code&gt;REJECT&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;✅ Wait 10-15 minutes after enabling for data to populate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Issue: S3/ECR IP ranges don't match traffic&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ AWS IP ranges change periodically - always download the latest JSON&lt;/li&gt;
&lt;li&gt;✅ Some regions have additional IP ranges not in the main prefixes&lt;/li&gt;
&lt;li&gt;✅ Check for both IPv4 and IPv6 ranges if your VPC supports dual-stack&lt;/li&gt;
&lt;li&gt;✅ Remember: Most traffic will be to S3 IPs, not ECR IPs!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Issue: Traffic still shows NAT Gateway after endpoint deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Verify private_dns_enabled = true on Interface endpoints&lt;/li&gt;
&lt;li&gt;✅ Check security groups allow port 443 from workload subnets&lt;/li&gt;
&lt;li&gt;✅ Confirm route tables don't have explicit routes forcing internet gateway&lt;/li&gt;
&lt;li&gt;✅ Verify S3 Gateway endpoint is associated with correct route tables&lt;/li&gt;
&lt;li&gt;✅ Test DNS resolution: &lt;code&gt;nslookup api.ecr.us-east-1.amazonaws.com&lt;/code&gt; should return &lt;code&gt;10.x.x.x&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Test S3 access: nslookup &lt;code&gt;s3.us-east-1.amazonaws.com&lt;/code&gt; should resolve (Gateway endpoints don't change DNS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Issue: Cost Explorer doesn't match Flow Logs calculations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Flow Logs show raw bytes; Cost Explorer uses decimal GB (1 GB = 1,000,000,000 bytes)&lt;/li&gt;
&lt;li&gt;✅ Cost Explorer has 24-48 hour delay for billing data&lt;/li&gt;
&lt;li&gt;✅ Ensure you're comparing the same time periods&lt;/li&gt;
&lt;li&gt;✅ Check for data transfer charges vs data processing charges&lt;/li&gt;
&lt;li&gt;✅ Remember: S3 Gateway endpoint traffic is FREE, so you won't see it in VPC endpoint costs&lt;/li&gt;
&lt;/ul&gt;
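&lt;p&gt;The decimal-GB point above trips people up often enough to be worth a worked example: billing counts 1 GB as 1e9 bytes, while many tools report binary GiB (2^30 bytes), a ~7% gap.&lt;/p&gt;

```python
raw_bytes = 178_000_000_000_000  # example: 178,000 decimal GB from Flow Logs

decimal_gb = raw_bytes / 1_000_000_000  # what Cost Explorer bills against
binary_gib = raw_bytes / 2**30          # what GiB-based tools report

print(round(decimal_gb))                 # → 178000
print(round(binary_gib))                 # → 165775
print(round(decimal_gb / binary_gib, 4)) # → 1.0737 (constant ~7.4% gap)
```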

&lt;p&gt;&lt;strong&gt;Issue: Only seeing small data volumes to ECR IPs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ This is NORMAL! ECR API/Docker traffic is &amp;lt;5% of total&lt;/li&gt;
&lt;li&gt;✅ The bulk of your data goes to S3 IPs (image layers)&lt;/li&gt;
&lt;li&gt;✅ If you're only filtering for ECR IPs, you're missing 95%+ of the traffic&lt;/li&gt;
&lt;li&gt;✅ Update your query to include S3 IP ranges&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reality Check
&lt;/h2&gt;

&lt;p&gt;This assumes full traffic shift (realistic for ECR-only optimization). Background NAT persists for other internet traffic. Monitor your Cost Explorer's NAT Gateway data processing charges weekly for the first month. You should see a 75%+ drop if ECR is your primary NAT consumer. If not, investigate other high-volume services using VPC Flow Logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run Cost Explorer analysis (5 min)&lt;/li&gt;
&lt;li&gt;Deploy endpoints in non-prod (30 min)&lt;/li&gt;
&lt;li&gt;Validate with test pulls (10 min)&lt;/li&gt;
&lt;li&gt;Monitor for 48 hours&lt;/li&gt;
&lt;li&gt;Roll to production during maintenance window&lt;/li&gt;
&lt;li&gt;Track Cost Explorer for 2 weeks to confirm savings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ready to fix it? Create the endpoints in the console or Terraform, tag them (e.g. &lt;code&gt;Name: ecr-api&lt;/code&gt;) for tracking, and test &lt;code&gt;docker pull&lt;/code&gt; once private DNS propagates. Budget relief comes fast. Seen this work for you? Share in the comments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/setting-up-aws-privatelink-for-amazon-ecs-and-amazon-ecr/" rel="noopener noreferrer"&gt;Setting up AWS PrivateLink for Amazon ECS, and Amazon ECR&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/containers/using-vpc-endpoint-policies-to-control-amazon-ecr-access/" rel="noopener noreferrer"&gt;Using VPC endpoint policies to control Amazon ECR access&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>costoptimization</category>
      <category>ecr</category>
      <category>privateendpoints</category>
    </item>
    <item>
      <title>Reducing EKS cross-AZ cost using Cilium</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Tue, 14 Oct 2025 17:15:46 +0000</pubDate>
      <link>https://dev.to/hstiwana/using-cilium-to-reduce-cross-az-costs-on-aws-5138</link>
      <guid>https://dev.to/hstiwana/using-cilium-to-reduce-cross-az-costs-on-aws-5138</guid>
      <description>&lt;p&gt;As Kubernetes workloads scale on AWS across multiple Availability Zones (AZs), managing inter-AZ traffic efficiently is crucial for performance and cost savings. AWS charges for data transferred between AZs, and Kubernetes’ standard networking can inadvertently increase this cross-zone traffic. Cilium, a modern, eBPF-powered networking and security solution, offers unique capabilities to reduce these costs while improving network visibility and control. This blog merges clear explanations and official resources, providing a comprehensive overview of how Cilium helps optimize cross-AZ traffic on AWS.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Challenge of Cross-AZ Traffic on AWS
&lt;/h3&gt;

&lt;p&gt;AWS bills data transfer whenever network traffic crosses AZ boundaries within the same region (cross-region transfer costs even more; "same region" keeps the focus on the AZ boundary). Kubernetes Service types such as LoadBalancer or NodePort may distribute traffic across nodes in different AZs, leading to increased cross-zone data flow and charges. This is especially impactful at scale, where pod-to-pod communication patterns cause costly inter-AZ hops.&lt;/p&gt;
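&lt;p&gt;To ballpark what that traffic costs: intra-region, cross-AZ transfer is commonly billed at $0.01/GB in each direction, so every GB that crosses an AZ boundary effectively costs $0.02 (rate is an assumption; check current pricing for your region).&lt;/p&gt;

```python
RATE_EACH_DIRECTION = 0.01  # USD/GB, charged on both sending and receiving side

def cross_az_monthly_cost(gb_transferred: float) -> float:
    """Effective cost of cross-AZ traffic: billed in both directions."""
    return round(gb_transferred * RATE_EACH_DIRECTION * 2, 2)

# e.g. 50 TB/month of pod-to-pod chatter crossing AZ boundaries
print(cross_az_monthly_cost(50_000))  # → 1000.0
```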

&lt;h3&gt;
  
  
  How Cilium Limits Cross-AZ Transfer Costs
&lt;/h3&gt;

&lt;p&gt;Cilium employs the Linux kernel's eBPF technology to transform Kubernetes networking with efficiency and deep visibility. Its key features for reducing cross-AZ traffic include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Topology-Aware Routing:&lt;/strong&gt; Cilium supports Kubernetes topology-aware service routing, ensuring traffic stays within the same AZ whenever possible to avoid cross-zone charges. This feature relies on node labels like &lt;code&gt;topology.kubernetes.io/zone&lt;/code&gt; to guide Kubernetes service traffic locality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ENI Mode Integration:&lt;/strong&gt; Cilium's ENI IP Address Management (IPAM) mode assigns pod IPs directly to AWS Elastic Network Interfaces (ENIs) attached to nodes within the same AZ. In this setup, pod traffic routes natively through AWS networking without encapsulation, reducing latency and avoiding cross-AZ data transfers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Advanced IPAM:&lt;/strong&gt; Cilium offers IPAM modes such as ENI and ClusterPool, providing granular control over IP assignment and routing. These modes improve traffic locality by aligning pod IPs with underlying AWS subnet allocation per AZ.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Policy-Driven Traffic Control:&lt;/strong&gt; With Cilium’s rich layer 3 to layer 7 network policies, you can enforce strict AZ-local communication rules or selectively allow cross-AZ traffic only when needed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical Cilium Setup on AWS EKS
&lt;/h3&gt;

&lt;p&gt;Implementing Cilium on EKS involves options from full replacement of AWS VPC CNI to running alongside it in a secondary CNI mode. To optimize cross-AZ traffic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enable Topology-Aware Routing:&lt;/strong&gt; Use Kubernetes service annotations paired with Cilium’s kube-proxy replacement to route traffic preferentially within the same AZ.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------------------------------------------------+
|                 AWS Region (Multiple AZs)                   |
|                                                             |
|   +------------------------+   +------------------------+   |
|   | Availability Zone A    |   | Availability Zone B    |   |
|   |                        |   |                        |   |
|   | +------------------+   |   | +------------------+   |   |
|   | |  Node A1         |   |   | |  Node B1         |   |   |
|   | |  Pod(s) A        |   |   | |  Pod(s) B        |   |   |
|   | +------------------+   |   | +------------------+   |   |
|   |   |                    |   |   |                    |   |
|   |   | Service Traffic    |   |   | Service Traffic    |   |
|   |   | goes within AZ     |   |   | goes within AZ     |   |
|   |   v                    |   |   v                    |   |
|   | +------------------+   |   | +------------------+   |   |
|   | | Pod(s) A (target)|&amp;lt;--+   | | Pod(s) B (target)|&amp;lt;--+   |
|   | +------------------+   |   | +------------------+   |   |
|   |                        |   |                        |   |
|   +------------------------+   +------------------------+   |
|                                                             |
|  Kubernetes Service                                         |
|  - Annotated with topology.kubernetes.io/zone               |
|  - Cilium replaces kube-proxy, respecting topology hints    |
|  - Routes client traffic preferentially within same AZ      |
+-------------------------------------------------------------+

Legend:
- Service traffic stays within the same AZ
- If Pod targets exist in the same AZ, no cross-AZ routing occurs
- Traffic flows across AZs only if necessary (failover or no local endpoints)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deploy Cilium ENI Mode:&lt;/strong&gt; This maps pod IPs to ENIs tied to the same AZ subnet as the hosting node, enabling native AWS routing and cutting down on costly inter-AZ traffic.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------------------------------------------+
|            AWS Availability Zone A (us-west-2a)           |
|                                                           |
|   +-----------------+      +-----------------+            |
|   |   EC2 Node #1   |      |   EC2 Node #2   |            |
|   |                 |      |                 |            |
|   | +-----------+   |      | +-----------+   |            |
|   | | ENI eth0  |---|------| | ENI eth0  |---|----+       |
|   | +-----------+   |      | +-----------+   |    |       |
|   |   |             |      |   |             |    |       |
|   | +-----------+   |      | +-----------+   |    |       |
|   | | Pod A     |   |      | | Pod B     |   |    |       |
|   | +-----------+   |      | +-----------+   |    |       |
|   |                 |      |                 |    |       |
|   +-----------------+      +-----------------+    |       |
|                                                   |       |
|         Native AWS Subnet &amp;amp; Route Table (local)   |       |
+---------------------------------------------------|-------+
                                                    |
                   minimal inter-AZ traffic         |
                                                    |
+---------------------------------------------------|-------+
|            AWS Availability Zone B (us-west-2b)   |       |
|                                                   |       |
|   +-----------------+      +-----------------+    |       |
|   |   EC2 Node #3   |      |   EC2 Node #4   |    |       |
|   |                 |      |                 |    |       |
|   | +-----------+   |      | +-----------+   |    |       |
|   | | ENI eth0  |---|------| | ENI eth0  |---|----+       |
|   | +-----------+   |      | +-----------+   |            |
|   |   |             |      |   |             |            |
|   | +-----------+   |      | +-----------+   |            |
|   | | Pod C     |   |      | | Pod D     |   |            |
|   | +-----------+   |      | +-----------+   |            |
|   |                 |      |                 |            |
|   +-----------------+      +-----------------+            |
|                                                           |
+-----------------------------------------------------------+

Legend:
- ENI: AWS Elastic Network Interface
- Pod: Kubernetes Pod, with IP mapped to ENI in node's AZ subnet
- Native subnet &amp;amp; route: traffic is routed locally within AZ
- Inter-AZ traffic: minimized (only when necessary for HA or failover)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
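&lt;p&gt;A minimal sketch of what enabling ENI mode looks like for a Helm-based Cilium install. Key names below match Cilium 1.14+; check the ENI documentation for your Cilium version before applying:&lt;/p&gt;

```yaml
# Assumed Helm values for Cilium in AWS ENI mode (verify per version).
eni:
  enabled: true             # allocate pod IPs from AWS ENIs
ipam:
  mode: eni
routingMode: native         # no overlay: use the VPC route tables directly
egressMasqueradeInterfaces: eth0
```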



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Leverage Cluster Mesh:&lt;/strong&gt; For multi-cluster or multi-region scenarios, Cluster Mesh manages service endpoints to prefer local pods and restrict unnecessary data flow across zones.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------------------------------------------------------------------------+
|                                AWS Region (Multi-AZ)                             |
|                                                                                  |
|    +-------------------------------+      +-------------------------------+      |
|    | Availability Zone A (AZ-a)    |      | Availability Zone B (AZ-b)    |      |
|    |                               |      |                               |      |
|    |  +-------------------------+  |      |  +-------------------------+  |      |
|    |  |    Cluster A (in AZ-a)  |  |      |  |    Cluster B (in AZ-b)  |  |      |
|    |  |                         |  |      |  |                         |  |      |
|    |  |  +------+  +------+     |  |      |  |  +------+  +------+     |  |      |
|    |  |  | Pods |  | Pods |     |  |      |  |  | Pods |  | Pods |     |  |      |
|    |  |  +------+  +------+     |  |      |  |  +------+  +------+     |  |      |
|    |  |                         |  |      |  |                         |  |      |
|    |  |  Traffic stays local    |  |      |  |  Traffic stays local    |  |      |
|    |  |  within AZ and Cluster  |  |      |  |  within AZ and Cluster  |  |      |
|    |  +------------|------------+  |      |  +------------|------------+  |      |
|    +---------------|---------------+      +---------------|---------------+      |
|                    |                                      |                      |
|    Traffic to other clusters stays minimal                |                      |
|    for high availability &amp;amp; resiliency                     |                      |
|                    +--------------------------------------+                      |
|                    |                                                             |
|                    |                                                             |
|            +-------v-------+                                                     |
|            |  Cluster Mesh |                                                     |
|            |  Synchronizes |                                                     |
|            |  Service &amp;amp;    |                                                     |
|            |  Endpoint Info|                                                     |
|            +---------------+                                                     |
|                                                                                  |
|               Resiliency: Failover / backup cluster routes traffic across AZs    |
+----------------------------------------------------------------------------------+ 

Legend:
- Pods communicate locally within their cluster and AZ.
- Traffic to other AZs only for resiliency or failover (Cluster Mesh).
- Cluster Mesh ensures clusters share service status across AZs without unnecessary cross-AZ pod traffic.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
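&lt;p&gt;Once clusters are connected via Cluster Mesh, keeping traffic local is a per-Service choice. A sketch using Cilium's global-service annotations (the service name is illustrative):&lt;/p&gt;

```yaml
# Hypothetical global Service shared across meshed clusters.
# affinity: "local" prefers endpoints in the local cluster and only
# fails over across clusters when no local endpoints are healthy.
apiVersion: v1
kind: Service
metadata:
  name: checkout                         # illustrative name
  annotations:
    service.cilium.io/global: "true"     # expose endpoints mesh-wide
    service.cilium.io/affinity: "local"  # prefer the local cluster
spec:
  selector:
    app: checkout
  ports:
  - port: 80
```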



&lt;p&gt;Many users have reported notable savings on AWS data transfer costs by carefully tuning these settings in real deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits of Using Cilium for Cross-AZ Optimization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Reduction:&lt;/strong&gt; Keeps data transfer local to the zone, cutting AWS inter-AZ charges that are billed per GB in each direction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Availability:&lt;/strong&gt; Maintains Kubernetes service resiliency by balancing traffic intelligently but favoring locality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability with Hubble:&lt;/strong&gt; Deep, real-time visibility into pod-to-pod communication paths helps diagnose network flow and optimize topology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-Grained Security:&lt;/strong&gt; Layer 7 network policies enable precise control over permissible traffic patterns in and across AZs.&lt;/li&gt;
&lt;/ul&gt;
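&lt;p&gt;To make the cost point concrete, here is a back-of-the-envelope model. It assumes AWS's commonly published $0.01/GB inter-AZ charge in each direction (check current pricing for your region); the traffic volume is purely illustrative:&lt;/p&gt;

```python
# Back-of-the-envelope inter-AZ transfer cost model.
# ASSUMPTION: $0.01/GB charged in EACH direction (i.e. $0.02 per GB
# transferred in total); verify against current AWS pricing.
RATE_PER_GB_EACH_WAY = 0.01

def monthly_cross_az_cost(gb_per_month: float, local_fraction: float = 0.0) -> float:
    """Monthly cost of cross-AZ traffic.

    local_fraction is the share of traffic kept in-zone (e.g. by
    topology-aware routing); only the remainder crosses AZ boundaries.
    """
    cross_az_gb = gb_per_month * (1.0 - local_fraction)
    return cross_az_gb * RATE_PER_GB_EACH_WAY * 2  # billed on both sides

# Illustrative volume: 50 TB/month of east-west traffic.
before = monthly_cross_az_cost(50_000)
after = monthly_cross_az_cost(50_000, local_fraction=0.8)
print(f"before: ${before:,.2f}/mo  after: ${after:,.2f}/mo")
```

&lt;p&gt;In this illustration, keeping 80% of the traffic zone-local cuts the line item from $1,000 to $200 a month; your numbers will differ with real traffic volumes.&lt;/p&gt;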

&lt;h3&gt;
  
  
  Challenges to Consider
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex Configuration:&lt;/strong&gt; Setting up advanced IPAM modes and topology-aware routing requires deeper networking knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning Curve:&lt;/strong&gt; Teams new to eBPF and Cilium’s enhanced policy model may face an adjustment period.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Resource Limits:&lt;/strong&gt; AWS ENI attachment limits and subnet sizing must be carefully managed to avoid capacity bottlenecks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Version Dependency:&lt;/strong&gt; Some features rely on newer Kubernetes releases supporting topology hints and service routing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Optimizing cross-AZ traffic on AWS Kubernetes clusters is essential for both cost efficiency and application performance. Cilium’s eBPF-driven approach combined with AWS native networking integration offers a modern, powerful solution. While the setup is more involved than stock CNI defaults, the tradeoff is significant savings and greater control. For technical teams ready to invest in advanced networking, Cilium is a compelling choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Further Reading and Official Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cilium Topology-Aware Routing: &lt;a href="https://docs.cilium.io/en/stable/networking/topology-aware-routing/" rel="noopener noreferrer"&gt;https://docs.cilium.io/en/stable/networking/topology-aware-routing/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cilium AWS ENI Mode Documentation: &lt;a href="https://docs.cilium.io/en/stable/networking/aws-eni/" rel="noopener noreferrer"&gt;https://docs.cilium.io/en/stable/networking/aws-eni/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Installing Cilium on EKS in ENI Mode: &lt;a href="https://cilium.io/blog/2025/06/19/eks-eni-install/" rel="noopener noreferrer"&gt;https://cilium.io/blog/2025/06/19/eks-eni-install/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kubernetes Topology-Aware Service Routing: &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service-topology/" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/services-networking/service-topology/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Observability with Hubble: &lt;a href="https://docs.cilium.io/en/stable/operations/hubble/" rel="noopener noreferrer"&gt;https://docs.cilium.io/en/stable/operations/hubble/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cluster Mesh Overview: &lt;a href="https://docs.cilium.io/en/stable/networking/clustermesh/" rel="noopener noreferrer"&gt;https://docs.cilium.io/en/stable/networking/clustermesh/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Getting Started with Cilium on Amazon EKS: &lt;a href="https://aws.amazon.com/blogs/opensource/getting-started-with-cilium-service-mesh-on-amazon-eks/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/opensource/getting-started-with-cilium-service-mesh-on-amazon-eks/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these resources and a considered approach, teams can unlock the full potential of Cilium to streamline AWS Kubernetes networking and lower their cross-AZ bill. Happy networking!&lt;/p&gt;

</description>
      <category>cilium</category>
      <category>kubernetes</category>
      <category>aws</category>
      <category>containers</category>
    </item>
    <item>
      <title>Istio in Simple English: Imagine Your Apps Living in a Smart City 🚦🏙️</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Fri, 29 Aug 2025 05:02:14 +0000</pubDate>
      <link>https://dev.to/hstiwana/istio-in-simple-english-imagine-your-apps-living-in-a-smart-city-h55</link>
      <guid>https://dev.to/hstiwana/istio-in-simple-english-imagine-your-apps-living-in-a-smart-city-h55</guid>
      <description>&lt;p&gt;After &lt;a href="https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal"&gt;explaining Kubernetes in simple terms&lt;/a&gt;, many have asked about service meshes, particularly Istio. So let’s dive into &lt;strong&gt;Istio&lt;/strong&gt;, a powerful service mesh that helps manage, secure, and observe microservices in a Kubernetes environment.&lt;/p&gt;

&lt;p&gt;If Kubernetes is like a global restaurant franchise, Istio is like the &lt;strong&gt;traffic control and security system&lt;/strong&gt; of a bustling smart city filled with tons of little shops, roads, and delivery trucks all needing to communicate reliably and securely.&lt;/p&gt;

&lt;p&gt;Imagine your collection of microservices as vibrant businesses spread across this city, each handling its own specific job. Some sell bread, others deliver packages, some offer repairs; it’s a complex ecosystem that needs order to thrive.&lt;/p&gt;

&lt;p&gt;Without a city planner, traffic controller, and security patrols, this city becomes chaotic fast, with delivery crashes, wrong shipments, and security breaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Welcome to Istio City: The Smart Traffic &amp;amp; Security Authority 🚦🏙️
&lt;/h2&gt;

&lt;p&gt;Istio is the invisible infrastructure layer that sits &lt;strong&gt;between the services (shops) and their communication networks (roads)&lt;/strong&gt;, helping manage, secure, and monitor traffic moving through your city.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Istio Smart City Architecture: Two Big Departments 🏢🧠
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Control Plane: The City Hall 🏛️
&lt;/h3&gt;

&lt;p&gt;At the heart of Istio’s smart city is the &lt;strong&gt;Control Plane&lt;/strong&gt;, led by a brainy department called &lt;strong&gt;Istiod&lt;/strong&gt;. It works like city hall, responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traffic Planning and Rules&lt;/strong&gt;: Deciding which roads trucks take, who gets priority, and who must stop. (Traffic management)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security &amp;amp; Identity&lt;/strong&gt;: Issuing ID badges (certificates) to trucks and enforcing checkpoints to block unauthorized visitors. (mTLS, authentication, authorization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration Distribution&lt;/strong&gt;: Sending new laws and updates to traffic patrols and checkpoints across the city. (Proxy configuration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Discovery&lt;/strong&gt;: Keeping track of all active shops and routes in the city.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Data Plane: The Traffic Controllers on the Roads 🚓
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Data Plane&lt;/strong&gt; consists of numerous &lt;strong&gt;Envoy proxies&lt;/strong&gt; that act as local traffic cops and watchdogs stationed alongside each shop or neighborhood. They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handle the actual flow of traffic between shops (service-to-service communication)&lt;/li&gt;
&lt;li&gt;Enforce traffic rules, security policies, and routing decisions from city hall&lt;/li&gt;
&lt;li&gt;Collect data on traffic patterns to send back to the control plane.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Traffic Tools of Istio City 🛠️
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sidecar Proxies 🛺&lt;/strong&gt;: In the classic model, every shop gets its own personal traffic cop walking right beside it, guiding every visitor in or out. These “sidecar” proxies are attached to each microservice (Pod). They intercept all requests in and out, securing, routing, and monitoring communication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateways 🚪&lt;/strong&gt;: Big city gates that control traffic coming into the city from outside, handling things like securing communication from outside customers or other cities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual Services 🛣️&lt;/strong&gt;: These are the traffic plans dictating which roads should lead visitors to which shops, including fancy maneuvers like canary releases or A/B testing, sending some visitors down new paths without disrupting the flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Destination Rules 🎯&lt;/strong&gt;: Policies applied to destinations (shops) about how they want visitors handled, controlling load balancing methods, connection pools, and failure recovery behavior.&lt;/li&gt;
&lt;/ul&gt;
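&lt;p&gt;To ground the "traffic plans" idea, here is a sketch of a Virtual Service doing a canary split. The service and subset names are illustrative, and the subsets would be defined in a companion Destination Rule:&lt;/p&gt;

```yaml
# Hypothetical canary: 90% of visitors take the old road, 10% the new one.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: bakery
spec:
  hosts:
  - bakery                # in-mesh service name (illustrative)
  http:
  - route:
    - destination:
        host: bakery
        subset: v1        # subsets come from a DestinationRule
      weight: 90
    - destination:
        host: bakery
        subset: v2
      weight: 10
```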




&lt;h2&gt;
  
  
  Sidecar Mode 🛺 vs Ambient Mesh Mode 🚕
&lt;/h2&gt;

&lt;p&gt;Istio lets you choose how to deploy your traffic cops:&lt;/p&gt;

&lt;h3&gt;
  
  
  Sidecar Mode 🛺 (Each Shop Has Its Own Traffic Cop)
&lt;/h3&gt;

&lt;p&gt;In Sidecar Mode, every microservice gets its own Envoy proxy sidecar walking alongside. Think of this as assigning a personal traffic cop who manages all the incoming and outgoing traffic for that one shop.&lt;/p&gt;
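&lt;p&gt;In practice, the personal traffic cop is usually assigned automatically: labeling a namespace opts all of its pods into sidecar injection (the namespace name here is illustrative):&lt;/p&gt;

```yaml
# Pods created in this namespace get an Envoy sidecar injected automatically.
apiVersion: v1
kind: Namespace
metadata:
  name: bakery            # illustrative
  labels:
    istio-injection: enabled
```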

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Granular control&lt;/strong&gt; over traffic for every single microservice.&lt;/li&gt;
&lt;li&gt;Supports the full spectrum of Istio features (fine routing, detailed telemetry, strict security).&lt;/li&gt;
&lt;li&gt;Helps direct visitors to the closest shop for faster service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Challenges:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each sidecar uses CPU and memory; running hundreds or thousands adds real overhead.&lt;/li&gt;
&lt;li&gt;Increased complexity in managing many proxies.&lt;/li&gt;
&lt;li&gt;Slight latency increase as traffic goes through proxies one by one.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ambient Mesh Mode 🚕 (Smart Roads with Patrol Cars)
&lt;/h3&gt;

&lt;p&gt;In the newer Ambient Mode, Istio shifts from giving every shop a dedicated traffic cop to creating &lt;strong&gt;smart, shared roads&lt;/strong&gt; patrolled by a few highly efficient traffic controllers. Instead of a cop next to every shop, the roads themselves become intelligent.&lt;/p&gt;
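&lt;p&gt;The opt-in works similarly in Ambient Mode, but the namespace joins the smart roads instead of hiring per-shop cops. This uses the label from current ambient documentation; verify it for your Istio version:&lt;/p&gt;

```yaml
# Pods in this namespace are captured by ambient's shared node-level
# proxies (ztunnel) rather than per-pod sidecars.
apiVersion: v1
kind: Namespace
metadata:
  name: bakery            # illustrative
  labels:
    istio.io/dataplane-mode: ambient
```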

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower resource usage: fewer proxies mean better efficiency at scale.&lt;/li&gt;
&lt;li&gt;Easier upgrades and simplified operations since fewer proxies to manage.&lt;/li&gt;
&lt;li&gt;Works well for large-scale deployments or services where full sidecar detail isn’t needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Challenges:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less detailed control at the microservice level right now.&lt;/li&gt;
&lt;li&gt;Some advanced Istio features are still catching up in support.&lt;/li&gt;
&lt;li&gt;Larger security zones; a misconfiguration affects more services.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Choose Istio? The City’s Edge In Microservices Management 🌟
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traffic Control&lt;/strong&gt;: Manage traffic flow with retries, timeouts, canary releases, and circuit breakers so the city runs smoothly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Automatic mutual TLS, identity verification, and policies build a zero-trust city protecting shops from unauthorized visitors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Detailed logs, metrics, and tracing give city planners insights into traffic jams before shoppers complain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience &amp;amp; Flexibility&lt;/strong&gt;: Quickly redirect traffic, recover from failures, and deploy new service versions without shutting things down.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Challenges in Running Istio City 🚧
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The complexity of Istio’s infrastructure and configuration can be overwhelming for smaller teams.&lt;/li&gt;
&lt;li&gt;Managing sidecar overhead and scaling efficiently requires careful planning.&lt;/li&gt;
&lt;li&gt;Keeping policies consistent in dynamic, multi-cloud environments takes skill.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Diagram: Istio Smart City Analogy
&lt;/h2&gt;

&lt;p&gt;Here is a custom diagram representing the Istio service mesh smart city analogy, showing the difference between Sidecar Mode and Ambient Mesh Mode, with key components and their roles symbolized visually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                 +------------------------+
                 |       ISTIO CITY HALL  |
                 |     (Control Plane /   |
                 |      Istiod Controller)|
                 +-----------+------------+
                             |
           ----------------------------------------
           |                                      |
+-----------------------+             +---------------------------+
|    Sidecar Mode 🛺    |             |  Ambient Mesh Mode 🚕     |
|  (Personal Traffic    |             |  (Smart Shared Roads)     |
|   Cop for Each Shop)  |             |                           |
+-----------------------+             +---------------------------+
| +-------------------+ |             | +-----------------------+ |
| |    Shop A         | |             | |    Neighborhood A     | |
| | [App Container]   | |             | | +-------------------+ | |
| | [Envoy Sidecar]   | |             | | | Shared Patrol Car | | |
| +-------------------+ |             | | +-------------------+ | |
|                       |             | | +-------------------+ | |
| +-------------------+ |             | | |  Shop A, Shop B   | | |
| |    Shop B         | |  &amp;lt;------&amp;gt;   | | +-------------------+ | |
| | [App Container]   | |  Traffic    | +-----------------------+ |
| | [Envoy Sidecar]   | |  Flow       |                           |
| +-------------------+ |             | +-----------------------+ |
|                       |             | |    Neighborhood B     | |
| +-------------------+ |             | | +-------------------+ | |
| |   Gateway (City   | |             | | | Shared Patrol Car | | |
| |      Gate)        | |             | | +-------------------+ | |
| +-------------------+ |             | +-----------------------+ |
|                       |             |                           |
+-----------------------+             +---------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;KEY ROLES:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control Plane (City Hall): Manages traffic rules, security policies, and distributes configs.&lt;/li&gt;
&lt;li&gt;Data Plane (Sidecars or Ambient Patrols): Enforces traffic routing, security, telemetry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;FEATURES:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sidecar Mode: Proxies attached to each app handle traffic individually.&lt;/li&gt;
&lt;li&gt;Ambient Mode: Smart shared proxies manage traffic for multiple apps collectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;BENEFITS &amp;amp; CHALLENGES:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sidecar: Granular control &amp;amp; full features; resource overhead &amp;amp; complexity.&lt;/li&gt;
&lt;li&gt;Ambient: Lower overhead &amp;amp; simpler ops; less granular control currently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This diagram visually contrasts the two modes with buildings/shops representing microservices and their proxy traffic managers as individual sidecars or shared patrol cars on the roads, under the supervision of the central Control Plane city hall. It highlights the components and their roles within the smart city (Istio) analogy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts 🍞➡️🚦➡️🌆
&lt;/h2&gt;

&lt;p&gt;If Kubernetes is your global kitchen serving hundreds of dishes simultaneously, Istio is your city’s traffic and security authority ensuring every dish travels safely and smoothly from kitchen to customer. Whether you assign a dedicated traffic cop to every kitchen station with sidecars, or upgrade to smart, shared roads with ambient mesh, Istio empowers your microservices city to grow resilient and secure, freeing up your chefs and bakers to focus on cooking the best apps.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/docs/" rel="noopener noreferrer"&gt;Istio Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/docs/ops/deployment/architecture/" rel="noopener noreferrer"&gt;Istio Architecture Diagram&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.redhat.com/en/topics/microservices/what-is-a-service-mesh" rel="noopener noreferrer"&gt;Service Mesh Explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.envoyproxy.io/" rel="noopener noreferrer"&gt;Envoy Proxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/cluster-administration/networking/" rel="noopener noreferrer"&gt;Kubernetes Networking Basics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal"&gt;Understanding Kubernetes in Simple English&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Note: This article is a simplified analogy to help understand Istio concepts. Real-world implementations may vary based on specific use cases and configurations.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Let me know if you found this helpful or have any questions! :)
&lt;/h3&gt;

</description>
      <category>istio</category>
      <category>servicemesh</category>
      <category>kubernetes</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Containers in Plain English: The Shipping Container of Tech 🚢🍱</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Wed, 16 Jul 2025 17:43:07 +0000</pubDate>
      <link>https://dev.to/hstiwana/-containers-in-plain-english-the-shipping-container-of-tech-1ge</link>
      <guid>https://dev.to/hstiwana/-containers-in-plain-english-the-shipping-container-of-tech-1ge</guid>
      <description>&lt;p&gt;You asked, and I listened! After the great feedback on my &lt;strong&gt;&lt;a href="https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal"&gt;Kubernetes in plain English&lt;/a&gt;&lt;/strong&gt; explanation, many of you requested a similar breakdown for &lt;strong&gt;containers&lt;/strong&gt;. So, here's my attempt to demystify containers for you. Enjoy the read!&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Imagine you’re sending a &lt;strong&gt;meal kit&lt;/strong&gt; 🍱 (your application) from your kitchen to friends around the world. But kitchens everywhere have different equipment (hardware, operating systems), so you worry: &lt;em&gt;What if my recipe needs a special pan, or a rare spice?&lt;/em&gt; Here’s where &lt;strong&gt;containers&lt;/strong&gt; 🚢 come to the rescue.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Container? (The Bento Box of Software 🍱)
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;container&lt;/strong&gt; is like a perfectly-packed, sealed &lt;strong&gt;bento box&lt;/strong&gt; for your meal kit. Inside, you don’t just have the food (your code), but also every little thing needed to &lt;em&gt;make that meal work&lt;/em&gt;—sauces, utensils, spice packets (your dependencies), even instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With containers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You send out your &lt;strong&gt;bento box&lt;/strong&gt; and &lt;em&gt;any kitchen&lt;/em&gt; can serve your meal exactly as intended—no confusion, no missing ingredients, no awkward substitutions.&lt;/li&gt;
&lt;li&gt;The recipient opens the box and gets a self-contained meal, ready to enjoy, independent of their own pantry.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Technologies (with Friendly Analogies)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container Image 🖼️ ≈ Recipe Blueprint 📒:&lt;/strong&gt;
Like a detailed photo and recipe booklet, a &lt;strong&gt;container image&lt;/strong&gt; contains every step and all ingredients needed to construct the meal, lock it in, and ship it anywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container Engine 🔥 ≈ Chef’s Stove:&lt;/strong&gt;
The &lt;strong&gt;container engine&lt;/strong&gt; (like Docker) is the versatile stove that knows how to cook &lt;em&gt;any&lt;/em&gt; meal packed in one of these bento boxes, regardless of the local kitchen quirks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host OS 🖥️ ≈ Restaurant Floor:&lt;/strong&gt;
The kitchen floor that supports all the stoves. You can run many containers side by side, each on its own burner, cooking entirely different meals without bumping into each other (thanks to isolation features).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Registry 🗄️ ≈ Recipe Warehouse:&lt;/strong&gt;
A central &lt;em&gt;warehouse&lt;/em&gt; where all recipes (container images) are safely stored and ready to be shipped to any kitchen in seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Containers Are Different from Traditional Boxes (Virtual Machines)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Containers 🍱&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Virtual Machines 🏢&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ship only the meal, not the whole kitchen; super lightweight&lt;/td&gt;
&lt;td&gt;Each box packs not just the meal, but the &lt;em&gt;entire kitchen&lt;/em&gt; (full OS), making it heavy and bulky&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast to open, serve, and refresh&lt;/td&gt;
&lt;td&gt;Takes longer to unbox and set up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A dozen containers can share one kitchen floor&lt;/td&gt;
&lt;td&gt;Each needs its own floor space&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why People Love Containers (Benefits)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Portability 🚀:&lt;/strong&gt;
Ship your meal anywhere—laptop, cloud, or on-premise kitchen—with the &lt;em&gt;guarantee it’ll taste the same everywhere&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Efficiency 💡:&lt;/strong&gt;
No wasted space. Spin up dozens of meals (apps) on a single kitchen floor (host machine) without fighting for room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability 📈:&lt;/strong&gt;
Add more meals during rush hour, or pack them away after lunch (scale up/down instantly). Container orchestrators (think kitchen managers) like Kubernetes can automate this process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency 🔄:&lt;/strong&gt;
Every cook (developer) and diner (user) gets the same meal, every time—no nasty surprises.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Some Real Challenges (The Sour Bits)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complexity 🌀:&lt;/strong&gt;
Once you’re shipping thousands of meal kits around the globe, you need smart kitchen managers (like Kubernetes) to keep everything running smoothly. That adds a new layer of learning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security 🔐:&lt;/strong&gt;
Everyone loves easy-to-share meals, but you must guard against someone sneaking bad ingredients (vulnerabilities) into your kits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visibility 👀:&lt;/strong&gt;
So many small boxes—hard to see what’s inside all of them, making monitoring and troubleshooting tricky.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Containers&lt;/strong&gt; are your secret to stress-free, scalable, and reliable &lt;em&gt;meal delivery&lt;/em&gt;—no matter where or how you cook. They're the magic lunchbox that guarantees your creation looks and tastes the same, from your own kitchen to the cloud’s massive cafeteria 🍱➡️☁️.&lt;/p&gt;

&lt;p&gt;The next time you deploy an app in a container, picture your perfectly packed bento box, ready to delight diners everywhere—just add heat!&lt;/p&gt;




&lt;h2&gt;
  
  
  📖🧠📚Sources, Guides, and Inspiration📖🧠📚:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal"&gt;https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal&lt;/a&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/what-is/containerization/" rel="noopener noreferrer"&gt;https://aws.amazon.com/what-is/containerization/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/Containerization_(computing)" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Containerization_(computing)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.docker.com/resources/what-container/" rel="noopener noreferrer"&gt;https://www.docker.com/resources/what-container/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>containers</category>
      <category>devops</category>
      <category>tutorial</category>
      <category>makeiteasytoremember</category>
    </item>
    <item>
      <title>Understanding Kubernetes in Simple English: What would Kubernetes look like if it was a global restaurant franchise?</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Tue, 15 Jul 2025 21:47:18 +0000</pubDate>
      <link>https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal</link>
      <guid>https://dev.to/hstiwana/understanding-kubernetes-in-simple-english-what-would-kubernetes-look-like-if-it-was-a-global-1bal</guid>
      <description>&lt;p&gt;Imagine &lt;strong&gt;Kubernetes&lt;/strong&gt; as a futuristic, global restaurant franchise. Running thousands of branches reliably, efficiently, and securely needs more than good chefs and cooks—it needs an orchestrated symphony of managers, systems, and trusted recipes. Let’s cook up a story that brings Kubernetes concepts to life through the daily operations of this grand culinary operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Welcome to Kubernetes Kitchen 🍳🎛️!
&lt;/h2&gt;

&lt;p&gt;Each &lt;strong&gt;Application 🍲&lt;/strong&gt; is a Signature Dish served in your restaurant. But a modern dish is more than just the food—it comes with unique instructions, tools, and even a particular type of pan (&lt;em&gt;dependencies&lt;/em&gt;). Every time a plate is prepared, it's following a carefully packed kit: this is our &lt;strong&gt;Container 🍲&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Pod 🍳&lt;/strong&gt; is like a cooking station on the kitchen line, perhaps with several chefs working side by side on the same dish (multiple containers working together tightly).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ReplicaSet 👨‍🍳&lt;/strong&gt; is the restaurant manager responsible for ensuring you always have the right number of cooking stations making a specific dish, so you never run out during the dinner rush. If a cook falls ill or a stove breaks, the manager instantly sets up another station so that service continues uninterrupted.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Deployment 🧑‍💼&lt;/strong&gt; is like the general manager who sets the policy: “We need three pizza stations at all times, and if we ever update the recipe, do it without missing a single order!” The deployment manages the manager (&lt;em&gt;ReplicaSet&lt;/em&gt;), so if there’s a menu change (new dish version), it smoothly transitions from the old to the new without service disruption.&lt;/p&gt;

&lt;p&gt;Your dish’s secret sauce recipe? Those are &lt;strong&gt;Secrets 🔒&lt;/strong&gt;. The shared pantry list? Those are your &lt;strong&gt;ConfigMaps 🗒️&lt;/strong&gt;: detailed notes provided to each station according to your restaurant’s need for consistency.&lt;/p&gt;
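&lt;p&gt;As a small, purely illustrative sketch (the names and values are made up), the pantry list and the secret sauce might look like this in manifest form:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: pantry-list        # shared, openly readable notes
data:
  MENU_LANGUAGE: "en"
  OVEN_TEMP: "220"
---
apiVersion: v1
kind: Secret
metadata:
  name: secret-sauce       # chef-only access
type: Opaque
stringData:
  SAUCE_RECIPE: "tomato, basil, a mystery spice"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;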

&lt;p&gt;Your &lt;strong&gt;Volumes 🧊&lt;/strong&gt; are the fridge or pantry spaces shared by stations, so cooks can store their ingredients and access them anytime—perfect for special prep or long-simmering stocks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes: The Management Backbone 🍳🎛️
&lt;/h3&gt;

&lt;p&gt;Let’s tour the heart of this restaurant empire—&lt;strong&gt;the Kubernetes Cluster&lt;/strong&gt;:&lt;/p&gt;

&lt;h4&gt;
  
  
  The Control Plane: Where Strategy Happens 🍳🎛️
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Server:🛎️&lt;/strong&gt; The busy bee! This is our "head receptionist". Every instruction—from new recipe rollouts to extra cooks for a busy Saturday—passes through here. All guests (users or components) submit their requests to the API Server, who ensures every message is communicated and tracked across the cluster. It is the gatekeeper for all Kubernetes instructions and state changes: all staff must check in here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;etcd:🗄️&lt;/strong&gt; The most important component: the "master recipe safe/vault". Every table booking, pantry stock, dish recipe, and station setup is logged here—a consistent, reliable, distributed database that never loses important notes. It is the secure, central store for recipes, inventory, and reservations (cluster state and configuration).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controller Manager:🕹️&lt;/strong&gt; The franchise’s "operations manager". Ensures the kitchen floor matches the plan: adding stations when needed and retiring those not in use. If a kitchen promises three pasta stations but one disappears, the controller manager notices and brings another online. It ensures that declarations (from Deployments, ReplicaSets, etc.) match reality, constantly adjusting to maintain the desired state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler:📅&lt;/strong&gt; The "line/shift supervisor". As new orders (Pods) come in, the scheduler assigns each to the best available station (Node), making sure the workload is balanced, and no kitchen is overburdened.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Controller Manager:🌐&lt;/strong&gt; The "travel/facilities manager". Connects kitchens to new cities (cloud), coordinates services like front door access or equipment delivery. It ensures each restaurant interacts smoothly with its city—whether it’s opening in new locations on a cloud platform or requesting resources from the local infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  On the Kitchen Floor: The Worker Nodes 🍽️🏢
&lt;/h4&gt;

&lt;p&gt;Every restaurant &lt;strong&gt;Node 🍽️🏢&lt;/strong&gt; is a bustling branch, staffed with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;kubelet:👩‍🍳&lt;/strong&gt; The "sous-chef" to the control plane, ensuring every cooking station (Pod) on the node is running as ordered, checking their status, and reporting back upstairs/HQ (API server).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container Runtime:🔥&lt;/strong&gt; The "cooking appliances"—the stove, oven, and pans built for containers (Docker, containerd, etc.)—capable of cooking each dish exactly as packaged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kube Proxy:🚦&lt;/strong&gt; The "waiter and receptionist" team for networking, making sure the correct dishes (services) reach the right tables (network addresses) and handling the kitchen’s communication with guests and with other kitchens.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Special Components for Smooth Operations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod:🍳&lt;/strong&gt; The "kitchen station", prepped and loaded with ingredients (containers) for a dish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service:🏷️&lt;/strong&gt; The "front counter". Customers don’t care which kitchen made their pasta; they ask for “Pasta Al Dente,” and Service directs that request to any available, healthy station.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ConfigMap:🗒️&lt;/strong&gt; The "recipe cards" openly displayed in the kitchen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret:🔒&lt;/strong&gt; The "locked-away safe" with secret sauce recipes—chef-only access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Volume:🧊&lt;/strong&gt; The "shared fridge" for ingredients, accessible as needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🚀 Scaling the Chain: A Day in the Franchise 🚀
&lt;/h3&gt;

&lt;p&gt;Let’s say your hit app, &lt;em&gt;Pizza Deluxe&lt;/em&gt;, is going viral:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;Deployment 🧑‍💼&lt;/strong&gt; (general manager) dictates: “We need 10 pizza stations, always running the latest pizza recipe.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ReplicaSet 👨‍🍳&lt;/strong&gt; ensures exactly 10 active pizza stations (Pods) ready to cook.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Scheduler 📅&lt;/strong&gt; finds the best locations for every new station as demand rises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kubelet&lt;/strong&gt; on each node confirms, “My stations are prepped and cooking!” If a cook leaves, a new one gets hired automatically.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;API Server 🛎️&lt;/strong&gt; never misses a single order, and &lt;strong&gt;etcd 🗄️&lt;/strong&gt; ensures organizational memory is always correct and up-to-date.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services 🏷️&lt;/strong&gt; ensure that when customers (users) ask for pizza, their request is sent to the right available Pod, so no patron waits too long.&lt;/li&gt;
&lt;li&gt;Want to update the recipe? Deployment manages a rolling upgrade, introducing new containers gradually, so service never goes down.&lt;/li&gt;
&lt;li&gt;Volumes persist the dough between station rebuilds, Secrets keep the sauce recipe safe, and ConfigMaps post the menu outside each kitchen.&lt;/li&gt;
&lt;/ol&gt;
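&lt;p&gt;The general manager’s standing order above could be written as a Deployment manifest—a sketch only, with an illustrative image name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: pizza-deluxe
spec:
  replicas: 10                 # "10 pizza stations, always running"
  selector:
    matchLabels:
      app: pizza-deluxe
  strategy:
    type: RollingUpdate        # update the recipe without missing an order
  template:
    metadata:
      labels:
        app: pizza-deluxe
    spec:
      containers:
      - name: pizza
        image: registry.example.com/pizza-deluxe:v2   # hypothetical image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;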

&lt;h2&gt;
  
  
  Kubernetes Kitchen: Visualized
&lt;/h2&gt;

&lt;p&gt;Here’s my attempt to make a conceptual &lt;strong&gt;diagram&lt;/strong&gt; representing the analogy and flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------------+
|    Kubernetes API       |&amp;lt;------+
|          Server         |       |
+-----------+-------------+       |
            |                     |
            v                     |
+-----------+-----------------+   |
|        Controller Manager   |&amp;lt;--+
+----------------------------+
            |
            v
+--------------------------+
|       Scheduler          |
+-----------+--------------+
            |
            v                 +--------------+
+-----------+-------------+   |   ETCD      |
|    Nodes (Restaurants)  |---| (Recipe DB) |
+-------------------------+   +--------------+
| +---------------+      |
| |   Kubelet     |      |
| +---------------+      |
| | Kube Proxy    |      |
| +---------------+      |
| |ContainerRuntime|     |
| +---------------+      |
| | Pods          |      |
| +------|-------+       |
|        v               |
|   Containers (Dishes)  |
+------------------------+
|
+---&amp;gt; ConfigMap, Secret, Volume (pantry, safe, fridge)
|
+---&amp;gt; Service (front counter)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Visualization: Each control plane component “manages” the restaurant network, while each node is a kitchen staffed with all elements needed to make, package, and serve your signature dishes.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As your restaurant chain grows, &lt;strong&gt;Kubernetes&lt;/strong&gt; tirelessly orchestrates every kitchen—ensuring every customer gets a hot, perfectly prepped meal at scale, every time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Imagine Kubernetes as your ultimate franchise operations HQ—scaling, managing, and securing every station, recipe, and service window, so your team can focus on creating the world's best dining experience.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Want to learn what powers Kubernetes? Read my blog on &lt;a href="https://dev.to/hstiwana/-containers-in-plain-english-the-shipping-container-of-tech-1ge"&gt;Containers in Plain English: The Shipping Container of Tech 🚢🍱&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;📖🧠📚Sources, Guides, and Inspiration📖🧠📚:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kubernetes Components and Architecture&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://kubernetes.io/docs/concepts/overview/components/" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/overview/components/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://kubernetes.io/docs/concepts/architecture/" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/architecture/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://spacelift.io/blog/kubernetes-architecture" rel="noopener noreferrer"&gt;https://spacelift.io/blog/kubernetes-architecture&lt;/a&gt;&lt;br&gt;
&lt;a href="https://spot.io/resources/kubernetes-architecture/11-core-components-explained/" rel="noopener noreferrer"&gt;https://spot.io/resources/kubernetes-architecture/11-core-components-explained/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://sysdig.com/learn-cloud-native/components-of-kubernetes/" rel="noopener noreferrer"&gt;https://sysdig.com/learn-cloud-native/components-of-kubernetes/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analogy &amp;amp; Flow: Restaurant Scenario&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://kodekloud.com/blog/day-4-deployments-replicasets-how-kubernetes-runs-and-manages-your-app/" rel="noopener noreferrer"&gt;https://kodekloud.com/blog/day-4-deployments-replicasets-how-kubernetes-runs-and-manages-your-app/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deployments, ReplicaSets, and Pods: Lifecycle and Scaling&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://zeet.co/blog/kubernetes-deployment-vs-pod" rel="noopener noreferrer"&gt;https://zeet.co/blog/kubernetes-deployment-vs-pod&lt;/a&gt;&lt;br&gt;
&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/workloads/controllers/deployment/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/posts/gabrielokom_kubernetes-deployment-replicaset-activity-7249259381932920832-Bkn0" rel="noopener noreferrer"&gt;https://www.linkedin.com/posts/gabrielokom_kubernetes-deployment-replicaset-activity-7249259381932920832-Bkn0&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>tutorial</category>
      <category>makeiteasytoremember</category>
    </item>
    <item>
      <title>Part2: Kubernetes Backup on Managed Services: What Changes When You Use EKS?</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Mon, 23 Jun 2025 16:12:03 +0000</pubDate>
      <link>https://dev.to/hstiwana/part2-kubernetes-backup-on-managed-services-what-changes-when-you-use-eks-30el</link>
      <guid>https://dev.to/hstiwana/part2-kubernetes-backup-on-managed-services-what-changes-when-you-use-eks-30el</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/hstiwana/kubernetes-backup-strategies-balancing-cost-security-and-availability-3jpd"&gt;previous blog post&lt;/a&gt;, I covered Kubernetes backup strategies for self-managed clusters, highlighting cost, security, and availability. But what happens when you’re using a managed Kubernetes service like Amazon Elastic Kubernetes Service (EKS)? Let’s dive into the key differences and best practices for backing up Kubernetes on managed platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Shift: Managed Control Plane
&lt;/h2&gt;

&lt;p&gt;With managed Kubernetes services like Amazon EKS, AWS handles the control plane—including etcd, the API server, and scheduler. &lt;strong&gt;You don’t have direct access to etcd or the control plane components.&lt;/strong&gt; This means you can’t perform traditional etcd snapshots as you would on a self-managed cluster. Instead, your backup strategy must focus on what you can control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes objects&lt;/strong&gt; (Deployments, Services, ConfigMaps, Secrets, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent data&lt;/strong&gt; (EBS volumes used by your workloads)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Application configurations and manifests&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to Back Up on EKS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Objects:&lt;/strong&gt; Anything you create or manage via the Kubernetes API—workloads, configurations, and policies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Volumes:&lt;/strong&gt; Data stored on EBS volumes attached to your pods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Networking and Security:&lt;/strong&gt; Ingress, Network Policies, and RBAC rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Data:&lt;/strong&gt; For databases or stateful apps, use application-aware backups for consistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Back Up on EKS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Velero for Kubernetes Object Backup
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://velero.io/" rel="noopener noreferrer"&gt;Velero&lt;/a&gt; is the go-to tool for backing up and restoring Kubernetes resources on EKS. It works directly with the Kubernetes API, so it’s perfect for managed services where you can’t access etcd. Velero can back up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All resources in a namespace or across the cluster&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Persistent volumes (with the right configuration)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom resources and configurations&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Velero supports scheduling, retention policies, and can store backups in S3, which integrates well with AWS security and cost controls.&lt;/p&gt;
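&lt;p&gt;As a sketch, a Velero &lt;code&gt;Schedule&lt;/code&gt; resource combining a cron schedule, namespace scope, and retention might look like this (the namespace and schedule values are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: prod-daily
  namespace: velero
spec:
  schedule: "0 3 * * *"        # daily at 03:00 UTC
  template:
    includedNamespaces:
    - prod
    ttl: 720h0m0s              # keep backups for 30 days
    snapshotVolumes: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;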

&lt;h3&gt;
  
  
  Back Up Persistent Data
&lt;/h3&gt;

&lt;p&gt;For stateful applications, use Velero’s volume snapshot feature to back up EBS volumes. This ensures your data is protected and can be restored if needed. You can also use application-specific backup tools for databases (e.g., pg_dump, mysqldump) and store the output in S3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automate and Test
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schedule regular backups&lt;/strong&gt; to minimize data loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate retention&lt;/strong&gt; to delete old backups and control costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test restores&lt;/strong&gt; to ensure your backups are valid and your recovery process works.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security and Availability
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encryption:&lt;/strong&gt; Use &lt;a href="https://aws.amazon.com/kms/" rel="noopener noreferrer"&gt;AWS KMS&lt;/a&gt; to encrypt backups at rest and in transit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable Backups:&lt;/strong&gt; Store backups in S3 with &lt;a href="https://aws.amazon.com/s3/features/object-lock/" rel="noopener noreferrer"&gt;Object Lock&lt;/a&gt; to prevent tampering or deletion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region Storage:&lt;/strong&gt; Replicate backups across regions for disaster recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Control:&lt;/strong&gt; Use IAM and RBAC to restrict who can create, delete, or restore backups.&lt;/li&gt;
&lt;/ul&gt;
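&lt;p&gt;As one hedged sketch of tying these together, Velero’s AWS plugin can point a &lt;code&gt;BackupStorageLocation&lt;/code&gt; at an Object Lock-enabled bucket and request SSE-KMS encryption (the bucket name, region, and key alias below are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: secure-backups
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-eks-backups          # bucket with S3 Object Lock enabled
  config:
    region: us-east-1
    serverSideEncryption: aws:kms   # encrypt objects at rest with KMS
    kmsKeyId: alias/backup-key      # hypothetical key alias
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;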

&lt;h2&gt;
  
  
  Cost Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage Tiering:&lt;/strong&gt; Move older backups to cheaper storage like &lt;a href="https://aws.amazon.com/s3/storage-classes/glacier/" rel="noopener noreferrer"&gt;S3 Glacier&lt;/a&gt; to save money.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Backups:&lt;/strong&gt; Only back up changed data to reduce storage and bandwidth costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retention Policies:&lt;/strong&gt; Automatically delete old backups to avoid unnecessary charges.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What You Can’t Back Up
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control Plane/etcd:&lt;/strong&gt; Managed by AWS, not accessible for direct backup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node-Level State:&lt;/strong&gt; Unless you use custom tools or scripts, node-level state is typically not backed up by default.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Backup Target&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Self-Managed Kubernetes&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;EKS (Managed Kubernetes)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;etcd/Control Plane&lt;/td&gt;
&lt;td&gt;Yes (manual snapshots)&lt;/td&gt;
&lt;td&gt;No (managed by AWS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes Objects&lt;/td&gt;
&lt;td&gt;Yes (Velero, etcdctl)&lt;/td&gt;
&lt;td&gt;Yes (Velero via API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistent Volumes&lt;/td&gt;
&lt;td&gt;Yes (Velero, volume snapshots)&lt;/td&gt;
&lt;td&gt;Yes (Velero, EBS snapshots)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application Data&lt;/td&gt;
&lt;td&gt;Yes (app-aware tools)&lt;/td&gt;
&lt;td&gt;Yes (app-aware tools, S3 storage)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Networking/Security&lt;/td&gt;
&lt;td&gt;Yes (Velero, GitOps)&lt;/td&gt;
&lt;td&gt;Yes (Velero, GitOps)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use Velero for disaster recovery and migration.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automate backups and retention to control costs.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Encrypt and protect backups with AWS security features.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test your restore process regularly.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Store backups in multiple regions for resilience.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://repost.aws/knowledge-center/eks-cluster-back-up-restore" rel="noopener noreferrer"&gt;AWS re:Post: Back up and restore an Amazon EKS cluster&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://trilio.io/kubernetes-disaster-recovery/eks-backup/" rel="noopener noreferrer"&gt;Trilio: EKS Backup Tutorial &amp;amp; Best Practices&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/blogs/containers/backup-and-restore-your-amazon-eks-cluster-resources-using-velero/" rel="noopener noreferrer"&gt;AWS Blog: Backup and restore your Amazon EKS cluster resources using Velero&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;In summary:&lt;/strong&gt;&lt;br&gt;
When using managed Kubernetes services like EKS, your backup strategy shifts to focus on Kubernetes objects, persistent data, and application configurations—leveraging tools like Velero and AWS storage features for a robust, cost-effective, and secure approach.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>backup</category>
      <category>recovery</category>
      <category>availability</category>
    </item>
    <item>
      <title>Part1: Kubernetes Backup Strategies: Balancing Cost, Security, and Availability</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Mon, 23 Jun 2025 15:54:17 +0000</pubDate>
      <link>https://dev.to/hstiwana/kubernetes-backup-strategies-balancing-cost-security-and-availability-3jpd</link>
      <guid>https://dev.to/hstiwana/kubernetes-backup-strategies-balancing-cost-security-and-availability-3jpd</guid>
      <description>&lt;p&gt;Backing up a Kubernetes cluster is a critical task for any organization running containerized workloads. However, it’s not just about what you back up—it’s also about how you do it, how much it costs, and how you ensure your backups are secure and available when needed. This post brings together best practices for Kubernetes backups, with a focus on cost efficiency, robust security, and high availability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Back Up in Kubernetes
&lt;/h2&gt;

&lt;p&gt;A comprehensive backup strategy for Kubernetes should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster Configuration and State&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;etcd database:&lt;/strong&gt; Stores all cluster data and is essential for disaster recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes objects:&lt;/strong&gt; Deployments, StatefulSets, Services, ConfigMaps, Secrets, and custom resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manifests:&lt;/strong&gt; Store in version control (e.g., Git) for easy recovery and versioning.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Persistent Data&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Volumes (PVs) and Persistent Volume Claims (PVCs):&lt;/strong&gt; Critical for stateful applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application data:&lt;/strong&gt; Use application-aware backups for databases and other stateful workloads.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Networking and Security&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Services, Ingress, Network Policies:&lt;/strong&gt; Ensure consistent access and security post-restore.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Back Up Kubernetes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tools and Methods
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;etcd Snapshots:&lt;/strong&gt; Use &lt;code&gt;etcdctl&lt;/code&gt; to create and restore snapshots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Velero:&lt;/strong&gt; Open-source tool for backup, restore, and disaster recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Volume Snapshots:&lt;/strong&gt; Use Kubernetes’ &lt;code&gt;VolumeSnapshot&lt;/code&gt; API for point-in-time backups of persistent data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitOps:&lt;/strong&gt; Store manifests and configuration in Git for declarative management.&lt;/li&gt;
&lt;/ul&gt;
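&lt;p&gt;For example, a point-in-time snapshot of a PVC via the &lt;code&gt;VolumeSnapshot&lt;/code&gt; API looks roughly like this (the snapshot class and claim names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical snapshot class
  source:
    persistentVolumeClaimName: db-data     # the PVC to snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;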

&lt;h3&gt;
  
  
  Example: Velero Backup Command
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;velero backup create my-backup &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--include-namespaces&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--storage-location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ttl&lt;/span&gt; 720h &lt;span class="se"&gt;\ &lt;/span&gt;     &lt;span class="c"&gt;# 30-day retention&lt;/span&gt;
  &lt;span class="nt"&gt;--snapshot-volumes&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--volume-snapshot-locations&lt;/span&gt; aws-us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Optimization Strategies
&lt;/h2&gt;

&lt;p&gt;Backing up persistent data can become expensive if not managed carefully. Here are ways to reduce costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage Tiering:&lt;/strong&gt; Move older backups to cheaper storage tiers (e.g., AWS S3 Glacier).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Backups:&lt;/strong&gt; Only back up changed data to minimize storage and network costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retention Automation:&lt;/strong&gt; Automatically delete outdated backups using tools like Velero’s &lt;code&gt;ttl&lt;/code&gt; parameter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplication &amp;amp; Compression:&lt;/strong&gt; Reduce backup size with tools like Kasten K10 or TrilioVault.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequency Tuning:&lt;/strong&gt; Align backup schedules with business needs—daily instead of hourly for non-critical workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Cost Factor&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;High-Cost Approach&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Optimized Approach&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Premium SSD ($$)&lt;/td&gt;
&lt;td&gt;Tiered + compressed ($)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retention&lt;/td&gt;
&lt;td&gt;Manual ($$)&lt;/td&gt;
&lt;td&gt;Automated (free/low)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backup Frequency&lt;/td&gt;
&lt;td&gt;Hourly ($$)&lt;/td&gt;
&lt;td&gt;Daily/weekly ($)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Security Best Practices
&lt;/h2&gt;

&lt;p&gt;Security is a critical aspect of backup management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encryption:&lt;/strong&gt; Enable AES-256 encryption in transit (TLS) and at rest (e.g., AWS KMS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable Backups:&lt;/strong&gt; Use WORM-compliant storage (e.g., AWS S3 Object Lock) to prevent tampering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Control:&lt;/strong&gt; Apply RBAC and IAM policies to restrict backup access; audit with CloudTrail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrity Checks:&lt;/strong&gt; Validate backups with checksums and periodic test restores.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Security Measure&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Encryption&lt;/td&gt;
&lt;td&gt;Data encrypted in transit and at rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Immutable Backups&lt;/td&gt;
&lt;td&gt;Backups cannot be altered or deleted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access Control&lt;/td&gt;
&lt;td&gt;Only authorized users can access backups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrity Checks&lt;/td&gt;
&lt;td&gt;Regular validation and test restores&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Availability Considerations
&lt;/h2&gt;

&lt;p&gt;Ensuring backups are available when needed is just as important as creating them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region Replication:&lt;/strong&gt; Store backups across multiple regions or availability zones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disaster Recovery Drills:&lt;/strong&gt; Regularly test restore procedures to ensure backups are valid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable Infrastructure:&lt;/strong&gt; Use Velero with etcd snapshots for cluster-state recovery.&lt;/li&gt;
&lt;/ul&gt;
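&lt;p&gt;For instance, a secondary Velero &lt;code&gt;BackupStorageLocation&lt;/code&gt; in another region can hold a replicated copy of your backups (names below are illustrative; the cross-region copying itself is typically handled by S3 replication):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: dr-us-west-2
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: k8s-backups-us-west-2   # replica bucket in the DR region
  config:
    region: us-west-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;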

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Availability Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Region Storage&lt;/td&gt;
&lt;td&gt;Backups stored in multiple geographic locations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regular Test Restores&lt;/td&gt;
&lt;td&gt;Ensures recoverability and backup integrity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Immutable Infrastructure&lt;/td&gt;
&lt;td&gt;Prevents accidental or malicious changes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cost-Security-Availability Tradeoff Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Goal&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;High-Cost Approach&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Optimized Approach&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Premium SSD ($$)&lt;/td&gt;
&lt;td&gt;Tiered + compressed ($)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Custom encryption ($$)&lt;/td&gt;
&lt;td&gt;Cloud-managed KMS + IAM ($)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Real-time replication ($$)&lt;/td&gt;
&lt;td&gt;Multi-region + weekly snaps ($)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Back up both cluster state (etcd) and persistent data (PVs/PVCs).&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use tools like Velero and Kubernetes’ VolumeSnapshot API for automation.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimize costs with storage tiering, incremental backups, and automated retention + Storage Lifecycle Management policies.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ensure security with encryption, immutable backups, and strict access control.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Guarantee availability with multi-region storage and regular test restores.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://velero.io/docs/v1.9/cost-optimization/" rel="noopener noreferrer"&gt;Velero Cost Optimization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/backup/pricing/" rel="noopener noreferrer"&gt;AWS Backup Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/" rel="noopener noreferrer"&gt;Kubernetes Availability Configs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By following these guidelines, you can create a robust, cost-effective, and secure backup strategy for your Kubernetes clusters—ensuring your workloads are always protected and recoverable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/hstiwana/part2-kubernetes-backup-on-managed-services-what-changes-when-you-use-eks-30el"&gt;Continue to Part2&lt;/a&gt; &lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>backup</category>
      <category>recovery</category>
      <category>availability</category>
    </item>
    <item>
      <title>Kubernetes Scheduling: podAntiAffinity vs. topologySpreadConstraints</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Wed, 18 Jun 2025 18:37:37 +0000</pubDate>
      <link>https://dev.to/hstiwana/kubernetes-scheduling-podantiaffinity-vs-topologyspreadconstraints-41j4</link>
      <guid>https://dev.to/hstiwana/kubernetes-scheduling-podantiaffinity-vs-topologyspreadconstraints-41j4</guid>
      <description>&lt;p&gt;When it comes to deploying resilient and highly available applications in Kubernetes, scheduling constraints are key. Two powerful tools for controlling pod placement are podAntiAffinity and topologySpreadConstraints. While both help manage pod distribution, they serve different purposes and offer distinct advantages. Let’s break down what each does, how they differ, and when to use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Imagine you’re running a critical application on Kubernetes. You want to ensure that your pods are spread across different nodes or zones to avoid downtime if a single node fails. This is where scheduling constraints come into play. Kubernetes offers several mechanisms for this, but two of the most important are &lt;strong&gt;&lt;code&gt;podAntiAffinity&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;topologySpreadConstraints&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Is &lt;code&gt;podAntiAffinity&lt;/code&gt;?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;podAntiAffinity&lt;/strong&gt; is a scheduling rule that prevents certain pods from being co-located on the same node or topology domain (like a zone). It’s designed for scenarios where you want to maximize fault tolerance by ensuring that no two instances of your application run on the same node or zone.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How it works:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strict separation&lt;/strong&gt;: You can specify that a pod should not run on the same node as another pod with a certain label.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topology key&lt;/strong&gt;: Uses a topologyKey (e.g., &lt;code&gt;kubernetes.io/hostname&lt;/code&gt; for nodes, &lt;code&gt;topology.kubernetes.io/zone&lt;/code&gt; for zones).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforcement&lt;/strong&gt;: Can be set as &lt;code&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/code&gt; (strict) or &lt;code&gt;preferredDuringSchedulingIgnoredDuringExecution&lt;/code&gt; (best effort).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Use case:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;podAntiAffinity&lt;/code&gt; when you absolutely must prevent pods from being on the same node or zone—such as for database replicas or critical microservices.&lt;/p&gt;
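&lt;p&gt;As a minimal sketch (the Deployment name and labels are illustrative), the following rule keeps any two replicas of the same app off a shared node:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Strict: scheduling blocks if no compliant node exists
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname  # at most one replica per node
      containers:
        - name: my-app
          image: nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Swap the &lt;code&gt;topologyKey&lt;/code&gt; for &lt;code&gt;topology.kubernetes.io/zone&lt;/code&gt; to enforce separation at the zone level instead.&lt;/p&gt;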

&lt;h2&gt;
  
  
  &lt;strong&gt;What Are &lt;code&gt;topologySpreadConstraints&lt;/code&gt;?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;topologySpreadConstraints&lt;/code&gt;&lt;/strong&gt; are a more flexible way to control pod distribution. Instead of just preventing co-location, they allow you to specify how evenly pods should be distributed across topology domains (nodes, zones, regions).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How it works:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Even distribution&lt;/strong&gt;: You can define a &lt;code&gt;maxSkew&lt;/code&gt; to set the maximum allowable difference in pod count between domains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topology key&lt;/strong&gt;: Uses &lt;code&gt;topologyKey&lt;/code&gt; to specify the domain (node, zone, etc.).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: You can configure whether scheduling is still allowed when the constraint can’t be met, via &lt;code&gt;whenUnsatisfiable&lt;/code&gt; (&lt;code&gt;DoNotSchedule&lt;/code&gt; or &lt;code&gt;ScheduleAnyway&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hierarchical control&lt;/strong&gt;: Works across multiple levels (nodes within zones, zones within regions).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Use case:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;topologySpreadConstraints&lt;/code&gt; when you want to balance pod distribution for high availability, load balancing, or cost optimization, and are willing to tolerate some imbalance if necessary.&lt;/p&gt;
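&lt;p&gt;As a sketch (again with illustrative names), the same Deployment can spread its replicas evenly across zones with a skew of at most one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                         # zone counts may differ by at most 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway  # best effort; DoNotSchedule for strict
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;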

&lt;h3&gt;
  
  
  &lt;strong&gt;Comparison Table&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;podAntiAffinity&lt;/th&gt;
&lt;th&gt;topologySpreadConstraints&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Strict separation&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (but can be close)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Even distribution&lt;/td&gt;
&lt;td&gt;Not guaranteed&lt;/td&gt;
&lt;td&gt;Yes (configurable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Topology flexibility&lt;/td&gt;
&lt;td&gt;Specific (node, zone)&lt;/td&gt;
&lt;td&gt;Hierarchical or flat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling flexibility&lt;/td&gt;
&lt;td&gt;No (can block scheduling)&lt;/td&gt;
&lt;td&gt;Yes (can allow skew)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can You Use Both?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Absolutely! Combining &lt;code&gt;podAntiAffinity&lt;/code&gt; and &lt;code&gt;topologySpreadConstraints&lt;/code&gt; gives you the best of both worlds: strict separation where needed, and balanced distribution for overall resilience.&lt;/p&gt;
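&lt;p&gt;As a sketch (labels are illustrative), a single Pod template can carry both: strict node-level separation from &lt;code&gt;podAntiAffinity&lt;/code&gt; plus best-effort zone balancing from &lt;code&gt;topologySpreadConstraints&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname  # never two replicas on one node
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway  # tolerate zone imbalance rather than block
      labelSelector:
        matchLabels:
          app: my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;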

&lt;h3&gt;
  
  
  &lt;strong&gt;When to Use Each&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;podAntiAffinity&lt;/strong&gt;: When you must prevent pods from being on the same node or zone (e.g., to avoid single points of failure).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;topologySpreadConstraints&lt;/strong&gt;: When you want to balance pod distribution across your cluster for high availability, load balancing, or cost optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Understanding the differences between &lt;code&gt;podAntiAffinity&lt;/code&gt; and &lt;code&gt;topologySpreadConstraints&lt;/code&gt; is crucial for designing robust Kubernetes deployments. Use &lt;code&gt;podAntiAffinity&lt;/code&gt; for strict separation and &lt;code&gt;topologySpreadConstraints&lt;/code&gt; for flexible, balanced distribution. Together, they help you build resilient, highly available applications.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>k8s</category>
      <category>deepdive</category>
      <category>kubernetesinternals</category>
    </item>
    <item>
      <title>How does VolumeMount work in Kubernetes? What happens in the backend?</title>
      <dc:creator>Hardeep Singh Tiwana</dc:creator>
      <pubDate>Sun, 04 May 2025 23:34:06 +0000</pubDate>
      <link>https://dev.to/hstiwana/how-does-volumemount-work-in-kubernetes-what-happens-in-the-backend-3k0n</link>
      <guid>https://dev.to/hstiwana/how-does-volumemount-work-in-kubernetes-what-happens-in-the-backend-3k0n</guid>
      <description>&lt;h2&gt;
  
  
  Understanding Kubernetes VolumeMounts, PersistentVolumeClaims, and StorageClasses (with YAML Examples)
&lt;/h2&gt;

&lt;p&gt;Persistent storage is essential for stateful applications in Kubernetes. To manage storage dynamically and reliably, Kubernetes uses a combination of &lt;code&gt;VolumeMounts&lt;/code&gt;, &lt;code&gt;PersistentVolumeClaims&lt;/code&gt; (PVCs), and &lt;code&gt;StorageClasses&lt;/code&gt;. In this post, we’ll demystify how these components work together, and provide practical YAML examples to help you implement them in your clusters.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. VolumeMounts: Connecting Storage to Containers
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;VolumeMount&lt;/code&gt; specifies where a volume should appear inside a container. It references a volume defined at the Pod level and maps it to a directory inside the container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;volume-mount-example&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-storage&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/usr/share/nginx/html&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-storage&lt;/span&gt;
      &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# Ephemeral storage for demonstration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The volume &lt;code&gt;my-storage&lt;/code&gt; is mounted inside the &lt;code&gt;nginx&lt;/code&gt; container at &lt;code&gt;/usr/share/nginx/html&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Any files written to that path are stored in the &lt;code&gt;emptyDir&lt;/code&gt; volume, which is deleted when the Pod is removed.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. PersistentVolumeClaims and PersistentVolumes: Decoupling Storage from Pods
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PersistentVolume (PV):&lt;/strong&gt; A cluster-wide resource representing a piece of storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PersistentVolumeClaim (PVC):&lt;/strong&gt; A request for storage by a user or application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# PersistentVolume (PV)&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolume&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pv-example&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;capacity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;hostPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/mnt/data&lt;/span&gt;  &lt;span class="c1"&gt;# For demo purposes; use real storage in production&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# PersistentVolumeClaim (PVC)&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pvc-example&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The PV is a piece of storage available in the cluster.&lt;/li&gt;
&lt;li&gt;The PVC requests 1Gi of storage with &lt;code&gt;ReadWriteOnce&lt;/code&gt; access.&lt;/li&gt;
&lt;li&gt;Kubernetes binds the PVC to the PV if their requirements match.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. StorageClasses: Automating and Tiering Storage
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;StorageClass&lt;/code&gt; defines how to provision storage dynamically. It specifies the provisioner (such as a cloud provider’s CSI driver) and parameters for the storage backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example StorageClass for AWS EBS&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;StorageClass&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ebs-csi-storageclass&lt;/span&gt; &lt;span class="c1"&gt;# This name is used by PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;provisioner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ebs.csi.aws.com&lt;/span&gt; &lt;span class="c1"&gt;# Example for AWS EBS CSI provisioner&lt;/span&gt;
&lt;span class="na"&gt;volumeBindingMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WaitForFirstConsumer&lt;/span&gt;
&lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gp3&lt;/span&gt; &lt;span class="c1"&gt;# Or your preferred EBS volume type&lt;/span&gt;
  &lt;span class="na"&gt;iops&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt; &lt;span class="c1"&gt;# Example IOPS&lt;/span&gt;
  &lt;span class="na"&gt;throughput&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250"&lt;/span&gt; &lt;span class="c1"&gt;# Example Throughput&lt;/span&gt;
  &lt;span class="na"&gt;encrypted&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt; &lt;span class="c1"&gt;# Example encryption&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a PVC requests &lt;code&gt;storageClassName: ebs-csi-storageclass&lt;/code&gt;, Kubernetes uses this StorageClass to dynamically provision a new PV.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4. Bringing It All Together: Pod Using a PVC and StorageClass
&lt;/h3&gt;

&lt;p&gt;Here’s how you’d use a PVC (which in turn uses a StorageClass) in a Pod:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# PersistentVolumeClaim using a StorageClass&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dynamic-pvc&lt;/span&gt; &lt;span class="c1"&gt;# This name is used by POD in "volumes" section&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10Gi&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ebs-csi-storageclass&lt;/span&gt; &lt;span class="c1"&gt;# see reference to StorageClass name above&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Pod mounting the PVC&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-pod&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-container&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:latest&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-ebs-volume&lt;/span&gt; &lt;span class="c1"&gt;#This name and name in volumes section should match&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/mnt/data&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-ebs-volume&lt;/span&gt; &lt;span class="c1"&gt;# This name and name in volumeMounts section should match&lt;/span&gt;
      &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dynamic-pvc&lt;/span&gt; &lt;span class="c1"&gt;#see reference in PersistentVolumeClaim above&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The PVC (&lt;code&gt;dynamic-pvc&lt;/code&gt;) requests 10Gi of storage using the &lt;code&gt;ebs-csi-storageclass&lt;/code&gt; StorageClass.&lt;/li&gt;
&lt;li&gt;Kubernetes dynamically provisions a PV using the StorageClass.&lt;/li&gt;
&lt;li&gt;The Pod references the PVC in its &lt;code&gt;volumes&lt;/code&gt; section.&lt;/li&gt;
&lt;li&gt;The PVC is mounted inside the container at &lt;code&gt;/mnt/data&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  5. Lifecycle and Backend Process
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provisioning&lt;/strong&gt;: If dynamic provisioning is used, the StorageClass’s provisioner creates the storage when a PVC is created.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Binding&lt;/strong&gt;: The PVC is bound to a PV (either static or dynamically provisioned).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mounting&lt;/strong&gt;: When the Pod is scheduled, the kubelet mounts the storage to the container’s filesystem at the specified &lt;code&gt;mountPath&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reclaiming&lt;/strong&gt;: When the PVC is deleted, the PV’s &lt;code&gt;persistentVolumeReclaimPolicy&lt;/code&gt; determines whether the storage is deleted, retained, or recycled (&lt;code&gt;Recycle&lt;/code&gt; is deprecated).&lt;/li&gt;
&lt;/ul&gt;
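&lt;p&gt;For example, to keep the data after its PVC is deleted, set the PV’s reclaim policy to &lt;code&gt;Retain&lt;/code&gt; (the PV name below is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# In the PV spec (dynamically provisioned PVs default to Delete):
spec:
  persistentVolumeReclaimPolicy: Retain

# Or patch an existing PV in place:
# kubectl patch pv pv-example -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;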




&lt;h2&gt;
  
  
  Summary Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Defined By&lt;/th&gt;
&lt;th&gt;Key Fields&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VolumeMount&lt;/td&gt;
&lt;td&gt;Mounts storage inside a container&lt;/td&gt;
&lt;td&gt;Pod spec (user)&lt;/td&gt;
&lt;td&gt;name, mountPath&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PersistentVolume&lt;/td&gt;
&lt;td&gt;Cluster storage resource&lt;/td&gt;
&lt;td&gt;Admin/Kubernetes&lt;/td&gt;
&lt;td&gt;capacity, accessModes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PersistentVolumeClaim&lt;/td&gt;
&lt;td&gt;Request for storage&lt;/td&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;resources, accessModes, storageClassName&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StorageClass&lt;/td&gt;
&lt;td&gt;Storage “profile” for dynamic provisioning&lt;/td&gt;
&lt;td&gt;Admin&lt;/td&gt;
&lt;td&gt;provisioner, parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Kubernetes Storage: Common Issues, Troubleshooting, and Limitations
&lt;/h2&gt;

&lt;p&gt;Building on the fundamentals of VolumeMounts, PersistentVolumeClaims, and StorageClasses, it’s crucial to understand the real-world challenges that teams face when running stateful workloads in Kubernetes. Here, we’ll cover common issues, troubleshooting strategies, and key limitations you should be aware of.&lt;/p&gt;




&lt;h3&gt;
  
  
  Common Issues with Kubernetes Storage
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. PersistentVolumeClaim (PVC) Not Bound&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The PVC remains in a &lt;code&gt;Pending&lt;/code&gt; state and is not bound to any PersistentVolume (PV).&lt;/li&gt;
&lt;li&gt;Causes include mismatched storage size, access modes, or StorageClass between the PVC and available PVs, or insufficient underlying storage resources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Volume Mount Failures&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods may fail to start with errors related to mounting the volume.&lt;/li&gt;
&lt;li&gt;This can be due to incorrect volume definitions, unavailable storage backends, or node failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Storage Plugin/CSI Issues&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problems with the Container Storage Interface (CSI) driver, such as outdated versions or plugin crashes, can prevent volumes from being provisioned or mounted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Pod Stuck in Pending or CrashLoopBackOff&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage-related configuration errors can cause pods to remain in &lt;code&gt;Pending&lt;/code&gt; or repeatedly crash (&lt;code&gt;CrashLoopBackOff&lt;/code&gt;), especially when required volumes are not available or properly mounted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Deletion and Reclaim Policy Problems&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PVs may not be deleted or released as expected due to misconfigured reclaim policies, leading to orphaned resources and wasted storage.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Troubleshooting Kubernetes Storage Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step-by-Step Troubleshooting:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Check Pod Events and Status&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;kubectl describe pod&lt;/code&gt; to view events and error messages related to volume mounting or PVC binding.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Inspect PVC and PV Status&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;kubectl get pvc&lt;/code&gt; and &lt;code&gt;kubectl describe pvc&lt;/code&gt; to check if the PVC is bound and to see any error messages.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;kubectl get pv&lt;/code&gt; to examine the status and properties of PersistentVolumes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Verify StorageClass and CSI Driver&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Ensure the correct StorageClass is referenced and the CSI driver is running and up to date (&lt;code&gt;kubectl get pod -n kube-system | grep csi&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Review Node and Network Health&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Check node status and network connectivity to the storage backend, as network issues can prevent volume attachment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Check for Resource Constraints&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Ensure there are enough resources (CPU, memory, storage) available on nodes to support the requested volumes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Zone-aware Auto Scaling&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;If your workloads are zone-specific, you’ll need to create separate node groups for each zone, because the &lt;code&gt;cluster-autoscaler&lt;/code&gt; assumes that all nodes in a group are exactly equivalent. If a scale-up event is triggered by a pod that needs a zone-specific PVC (e.g. an EBS volume), the new node may come up in the wrong AZ and the pod will fail to start.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Logs and Observability&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Examine logs from the affected pod and CSI driver for detailed error information. Use monitoring tools to track resource usage and events.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Manual Remediation&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;If a node fails, remove it with &lt;code&gt;kubectl delete node&lt;/code&gt; to trigger pod rescheduling and volume reattachment.&lt;/li&gt;
&lt;li&gt;Deleting and recreating pods or PVCs can sometimes resolve transient issues.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  Limitations of Kubernetes Storage
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Storage Backend Compatibility&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not all storage solutions support every Kubernetes feature (e.g., ReadWriteMany access mode is not available on many block storage backends).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Dynamic Provisioning Constraints&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic provisioning relies on properly configured StorageClasses and CSI drivers. Misconfiguration or lack of support for certain features can lead to failed provisioning. Also see the &lt;strong&gt;Zone-aware Auto Scaling&lt;/strong&gt; note in the &lt;strong&gt;Troubleshooting&lt;/strong&gt; section above.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Data Durability and Redundancy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes itself does not provide data replication or backup; this is the responsibility of the storage backend or external tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Performance Overheads&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage performance depends on the underlying infrastructure. Network-attached storage may introduce latency compared to local disks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Scaling and Resource Quotas&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage scalability is limited by the backend and resource quotas. Over-provisioning or lack of quotas can lead to resource contention and degraded performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. Security and Access Controls&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-grained access controls for storage resources may be limited, especially when using some legacy or simple backends.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Example: Troubleshooting a PVC Not Bound
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check PVC status&lt;/span&gt;
kubectl get pvc

&lt;span class="c"&gt;# Describe the PVC for detailed events and errors&lt;/span&gt;
kubectl describe pvc dynamic-pvc

&lt;span class="c"&gt;# Check available PVs and their properties&lt;/span&gt;
kubectl get pv

&lt;span class="c"&gt;# If needed, check StorageClass and CSI driver status&lt;/span&gt;
kubectl get storageclass
kubectl get pod &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system | &lt;span class="nb"&gt;grep &lt;/span&gt;csi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Always match PVC requests (size, access mode, StorageClass) to available PVs.&lt;/li&gt;
&lt;li&gt;Monitor pod, PVC, and PV events regularly.&lt;/li&gt;
&lt;li&gt;Keep CSI drivers up to date and monitor their health.&lt;/li&gt;
&lt;li&gt;Configure storage quotas and limits to avoid resource exhaustion.&lt;/li&gt;
&lt;li&gt;Choose storage backends that align with your application’s durability, performance, and scalability needs.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;When a Kubernetes Pod that uses a PersistentVolumeClaim (PVC) is deleted, whether due to failure, scaling, or an update, the underlying PersistentVolume (PV) and its data are preserved. Here’s how Kubernetes reattaches the volume to a new Pod:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PVC and PV Binding:&lt;/strong&gt; The PVC remains bound to the PV as long as the PVC resource exists. The binding is tracked by the &lt;code&gt;claimRef&lt;/code&gt; field in the PV, which references the PVC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pod Replacement:&lt;/strong&gt; When a new Pod is created (for example, by a Deployment or StatefulSet), and it references the same PVC in its &lt;code&gt;.spec.volumes&lt;/code&gt;, Kubernetes schedules the Pod and ensures the volume is reattached and mounted at the specified path inside the new container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Volume Attachment:&lt;/strong&gt; The kubelet on the target node coordinates with the storage backend (via the CSI driver or in-tree volume plugin) to attach the PV to the node where the new Pod is scheduled. Once attached, the volume is mounted into the container at the path specified in &lt;code&gt;volumeMounts&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Persistence:&lt;/strong&gt; Because the PV is persistent and not tied to any single Pod, the new Pod sees the same data as the previous Pod. This enables seamless failover or rolling updates without data loss.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Kubernetes storage is powerful and flexible. By combining &lt;code&gt;VolumeMounts&lt;/code&gt;, &lt;code&gt;PersistentVolumeClaims&lt;/code&gt;, and &lt;code&gt;StorageClasses&lt;/code&gt;, you can decouple your applications from the underlying storage, automate provisioning, and ensure your workloads have the storage performance and reliability they need.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>volumemounts</category>
      <category>underthehood</category>
      <category>linux</category>
    </item>
  </channel>
</rss>
