Is AWS Down? How to Check AWS Status and Fix Issues
Last Updated: February 13, 2026
Amazon Web Services (AWS) powers approximately 33% of the global cloud infrastructure market and supports millions of businesses worldwide, from startups to Fortune 500 companies. When AWS goes down, the impact is catastrophic: e-commerce sites lose revenue, SaaS applications become unavailable, APIs stop responding, mobile apps crash, and critical business operations grind to a halt. Whether you're a DevOps engineer managing production infrastructure, a developer deploying applications, a CTO monitoring business-critical systems, or a business owner relying on AWS for operations, this guide helps you quickly determine if AWS is experiencing an outage and what to do about it.
Quick Status Check: Is AWS Down Right Now?
Before troubleshooting, verify AWS's current status:
- API Status Check: Visit apistatuscheck.com/api/aws for real-time AWS monitoring across all major services
- AWS Service Health Dashboard: Check AWS's official status at status.aws.amazon.com
- Test Multiple Regions: AWS operates in 30+ regions — check if your specific region (us-east-1, eu-west-1, etc.) is affected
- Try Different Services: Test EC2 vs. S3 vs. Lambda independently — partial outages are common
- Check DownDetector: downdetector.com/status/amazon-web-services for user-reported issues
If multiple sources confirm problems, AWS is likely experiencing an outage. If only you're affected, follow the troubleshooting steps below.
💡 Pro tip: Sign up for free alerts to get notified the moment AWS goes down — critical for SRE teams running production workloads, e-commerce during peak sales, and financial services with strict SLAs.
AWS Infrastructure: What Can Break
AWS is composed of 200+ services that can fail independently. The most critical services include:
| Service | What It Does | Impact When Down |
|---|---|---|
| EC2 | Virtual servers and compute instances | Applications can't start, scale, or process requests |
| S3 | Object storage for files, backups, static assets | Websites break, file uploads fail, data inaccessible |
| Lambda | Serverless function execution | APIs fail, automation stops, event processing halts |
| RDS | Managed relational databases (MySQL, PostgreSQL, etc.) | Data layer fails, read/write operations timeout |
| CloudFront | Global CDN for content delivery | Websites slow down or show 5xx errors globally |
| Route 53 | DNS and domain management | Domain names stop resolving, traffic can't reach apps |
| API Gateway | API management and routing | REST/WebSocket APIs return errors or timeout |
| DynamoDB | NoSQL database | Real-time data access fails, high-traffic apps crash |
| ECS/EKS | Container orchestration (Docker/Kubernetes) | Containers can't start, scale, or communicate |
| SQS/SNS | Message queuing and notifications | Async processing stops, notifications don't send |
| Elastic Load Balancer | Traffic distribution across instances | Traffic routing fails, health checks stop working |
| IAM | Identity and access management | Authentication fails, API calls get permission errors |
| CloudWatch | Monitoring and logging | Can't see metrics, alarms don't trigger, blind to issues |
| VPC | Virtual private cloud networking | Network connectivity lost between resources |
Key insight: AWS often has region-specific outages where one region (e.g., us-east-1) fails while others remain operational. The us-east-1 region (Northern Virginia) is AWS's largest and most prone to cascading failures. Multi-region architectures are critical for high availability.
Common AWS Error Messages and What They Mean
| Error | Meaning | Fix |
|---|---|---|
| EC2: InsufficientInstanceCapacity | No available capacity in availability zone | Try different AZ or instance type, or wait for capacity |
| S3: 503 SlowDown | Too many requests to S3 bucket | Implement exponential backoff, use CloudFront CDN |
| S3: Access Denied | IAM permissions issue or bucket policy | Verify IAM policy, bucket ACL, and Block Public Access settings |
| Lambda: Service Unavailable | Lambda control plane outage | Check AWS status, retry with exponential backoff |
| Lambda: Task timed out after X seconds | Function exceeded timeout limit | Increase timeout in function config or optimize code |
| RDS: could not connect to server | Database unreachable or failed over | Check security groups, verify RDS is running, check region status |
| CloudFront: 502 Bad Gateway | Origin (EC2/S3/ALB) not responding | Check origin health, verify security groups allow CloudFront |
| CloudFront: 503 Service Unavailable | CloudFront distribution or origin issue | Check origin status, verify custom headers, check AWS status |
| Route 53: SERVFAIL | DNS resolution failure | Check hosted zone config, verify nameservers, check AWS status |
| API Gateway: 502 Bad Gateway | Lambda or backend integration failure | Check Lambda logs, verify integration timeout settings |
| API Gateway: 429 Too Many Requests | Rate limit exceeded | Implement throttling, request a limit increase |
| DynamoDB: ProvisionedThroughputExceededException | Exceeded read/write capacity | Enable auto-scaling or use on-demand billing |
| DynamoDB: ServiceUnavailable | DynamoDB control plane issue | Check AWS status, implement retry logic |
| ECS: Service is unable to place a task | No EC2 capacity or Fargate capacity exhausted | Check cluster capacity, try different AZ |
| EKS: cannot reach cluster | Control plane unreachable | Check AWS status for EKS, verify security groups and VPC config |
| SQS: QueueDoesNotExist | Queue deleted or wrong region | Verify queue name and region, check IAM permissions |
| SNS: EndpointDisabled | Endpoint bounced or failed too many times | Re-subscribe endpoint, verify endpoint is reachable |
| IAM: AccessDenied | Insufficient permissions | Review IAM policy, check resource-based policies |
| IAM: InvalidClientTokenId | AWS credentials invalid or rotated | Refresh credentials, check access key is active |
| VPC: Network is unreachable | Routing table, NAT gateway, or IGW issue | Check route tables, verify NAT/IGW attached and running |
Common AWS CLI Error Messages
| CLI Error | Meaning | Fix |
|---|---|---|
| Unable to locate credentials | AWS CLI not configured | Run aws configure with access key and secret key |
| Could not connect to the endpoint URL | Wrong region or endpoint | Verify --region flag matches resource location |
| An error occurred (DryRunOperation) | Dry-run mode enabled | Remove --dry-run flag to execute for real |
| A client error (RequestLimitExceeded) | API throttling | Implement exponential backoff and retry logic |
| You are not authorized to perform this operation | IAM permission missing | Check IAM policy for required actions |
| Invalid ID or does not exist | Resource not found or wrong region | Verify resource ID and region parameter |
| SSL validation failed | Certificate issue | Update AWS CLI or use --no-verify-ssl (not recommended for production) |
| timed out | Network connectivity or AWS service issue | Check internet, verify security groups, check AWS status |
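Several of the fixes above boil down to "implement exponential backoff and retry". A minimal bash sketch of that pattern is below; the function name and delay values are illustrative choices, not an AWS-provided tool:

```shell
#!/usr/bin/env bash
# retry_with_backoff <max_attempts> <command...>
# Retries a command with exponentially growing delays plus up to 1s of
# jitter, which is what throttling errors like RequestLimitExceeded and
# S3 503 SlowDown expect callers to do.
retry_with_backoff() {
  local max_attempts=$1; shift
  local attempt=1 delay=1
  while ! "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    sleep $(( delay + RANDOM % 2 ))   # jitter avoids synchronized retries
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Example: retry a flaky S3 listing up to 5 times
# retry_with_backoff 5 aws s3 ls s3://your-bucket-name
```

The AWS CLI and SDKs also retry internally, so a wrapper like this is most useful around whole scripts or non-SDK tools.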
Step-by-Step Troubleshooting
Step 1: Confirm It's Not Just You
- Check apistatuscheck.com/api/aws for real-time status across all AWS services
- Visit status.aws.amazon.com and filter by your region
- Search Twitter/X for "AWS down" or "us-east-1 down" to see if others report issues
- Ask teammates or check internal monitoring (CloudWatch, Datadog, New Relic)
- Try from a different network or device (VPN, mobile hotspot)
- Check if other AWS regions are affected (try launching resources in us-west-2 if us-east-1 fails)
- Review DownDetector for geographic patterns and affected services
Step 2: Identify Which AWS Service Is Affected
- EC2: Try launching a new instance or connecting to existing ones via SSH/RDP
- S3: Test bucket access with AWS CLI: `aws s3 ls s3://your-bucket-name`
- Lambda: Check function execution logs in CloudWatch or trigger a test event
- RDS: Try connecting to database with mysql/psql client or check RDS console
- CloudFront: Test distribution URL directly and check origin health
- Route 53: Test DNS resolution: `nslookup yourdomain.com` or `dig yourdomain.com`
- API Gateway: Call API endpoint with curl and check response codes
- DynamoDB: Test table access with AWS CLI: `aws dynamodb scan --table-name YourTable --limit 1`
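To run these spot checks in one pass, a small wrapper can report per-service status without aborting on the first failure. The probe commands shown in comments are examples; substitute whichever services and resources you actually depend on:

```shell
#!/usr/bin/env bash
# probe <name> <command...> — print OK or FAILED for one service, never
# aborting, so a single run shows exactly which AWS services error out.
probe() {
  local name=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: OK"
  else
    echo "$name: FAILED"
  fi
}

# Illustrative probes (replace with your own resources):
# probe "S3"       aws s3 ls s3://your-bucket-name
# probe "EC2"      aws ec2 describe-instances --max-items 1
# probe "DynamoDB" aws dynamodb list-tables --max-items 1
```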
Step 3: Fix EC2 Instance Issues
EC2 Instances Not Launching
# Check available capacity in different AZs
aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--filters Name=instance-type,Values=t3.medium \
--region us-east-1
# Try launching in different AZ
aws ec2 run-instances \
--image-id ami-xxxxx \
--instance-type t3.medium \
--subnet-id subnet-xxxxx \
--region us-east-1
# Use different instance type if capacity unavailable
# Try t3.small instead of t3.medium, or switch to m5/c5 family
Cannot Connect to EC2 Instance
# Verify instance is running
aws ec2 describe-instances --instance-ids i-xxxxx
# Check security group allows SSH (port 22) or RDP (port 3389)
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Test network connectivity
ping <instance-public-ip>
telnet <instance-public-ip> 22
# Connect via Session Manager (doesn't require SSH)
aws ssm start-session --target i-xxxxx
# Check system logs for boot issues
aws ec2 get-console-output --instance-id i-xxxxx
EC2 Status Checks Failing
- System Status Check Failed: AWS infrastructure issue — stop and start instance to migrate to new host
- Instance Status Check Failed: OS or application issue — check console output, reboot instance
- Auto Scaling not launching instances: Check capacity, verify launch template, check IAM service role
- Spot instances terminated: Spot price exceeded your maximum price or AWS reclaimed capacity — use On-Demand or diversify instance types
Step 4: Fix S3 Access Issues
S3 Access Denied Errors
# Test bucket access
aws s3 ls s3://your-bucket-name
# Check bucket policy and ACL
aws s3api get-bucket-policy --bucket your-bucket-name
aws s3api get-bucket-acl --bucket your-bucket-name
# Verify IAM permissions
aws iam get-user-policy --user-name your-username --policy-name your-policy
# Test with presigned URL (bypasses IAM)
aws s3 presign s3://your-bucket-name/file.txt --expires-in 3600
# Check Block Public Access settings (can override bucket policy)
aws s3api get-public-access-block --bucket your-bucket-name
S3 Slowdown or Timeout
- 503 SlowDown: Reduce request rate, implement exponential backoff
- Use multipart upload: For files >100MB, use multipart upload API
- Enable Transfer Acceleration: Faster uploads through CloudFront edge locations
- Use S3 in same region: Cross-region S3 access is slower
- Implement retry logic: S3 occasionally has transient errors — retry with backoff
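The multipart and concurrency behavior of the `aws s3` commands can be tuned in `~/.aws/config`; the values below are illustrative starting points, not AWS recommendations:

```ini
# ~/.aws/config — S3 transfer tuning for aws s3 cp/sync commands
[default]
s3 =
    multipart_threshold = 100MB
    multipart_chunksize = 16MB
    max_concurrent_requests = 20
```

Lowering `max_concurrent_requests` can also help ride out 503 SlowDown responses by reducing your request rate.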
S3 Website or Static Hosting Issues
# Verify bucket website configuration
aws s3api get-bucket-website --bucket your-bucket-name
# Check if index.html exists and is public
aws s3 ls s3://your-bucket-name/index.html
aws s3api get-object-acl --bucket your-bucket-name --key index.html
# Test website endpoint directly
curl http://your-bucket-name.s3-website-us-east-1.amazonaws.com
Step 5: Fix Lambda Function Failures
Lambda Timeouts
# Check function logs in CloudWatch
aws logs tail /aws/lambda/your-function-name --follow
# Increase timeout (max 15 minutes)
aws lambda update-function-configuration \
--function-name your-function-name \
--timeout 300
# Increase memory (more memory = more CPU)
aws lambda update-function-configuration \
--function-name your-function-name \
--memory-size 1024
# Check if function is throttled
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Throttles \
--dimensions Name=FunctionName,Value=your-function-name \
--start-time 2026-02-13T00:00:00Z \
--end-time 2026-02-13T23:59:59Z \
--period 3600 \
--statistics Sum
Lambda Cold Start Issues
- Provisioned Concurrency: Pre-warm functions to eliminate cold starts
- Keep functions warm: Use an EventBridge (formerly CloudWatch Events) schedule to invoke every 5 minutes (costs apply)
- Optimize package size: Smaller deployment packages start faster
- Use Lambda layers: Share common dependencies across functions
- Switch to container images: Better cold start performance for large dependencies
Lambda VPC Connectivity Issues
- ENI creation delays: Lambda in VPC can take 10-30 seconds for first invocation
- NAT Gateway required: Private subnet Lambda needs NAT Gateway for internet access
- Security groups: Verify Lambda security group allows outbound traffic
- VPC endpoints: Use VPC endpoints for S3, DynamoDB to avoid NAT Gateway costs
- Consider removing VPC: If you don't need private resources, run Lambda outside VPC
Step 6: Fix RDS Connection Failures
Cannot Connect to RDS Database
# Check RDS instance status
aws rds describe-db-instances --db-instance-identifier your-db-instance
# Verify security group allows traffic from your IP/instance
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Test connection from EC2 instance in same VPC
mysql -h your-db-instance.xxxxx.us-east-1.rds.amazonaws.com -u admin -p
# Check if RDS is in same VPC/subnet
aws rds describe-db-subnet-groups --db-subnet-group-name your-subnet-group
# Enable public accessibility (not recommended for production)
aws rds modify-db-instance \
--db-instance-identifier your-db-instance \
--publicly-accessible
RDS Performance Issues
- Too many connections: Check the max_connections parameter and use connection pooling
- High CPU/memory: Upgrade instance class or optimize queries
- Storage full: Increase allocated storage or enable autoscaling storage
- Read replica lag: Check replication lag in CloudWatch metrics
- IOPS exhausted: Upgrade to Provisioned IOPS or gp3 storage
- Parameter group issues: Review database parameter group settings
RDS Failover or Unavailability
- Multi-AZ failover: Automatic failover takes 60-120 seconds — use endpoint, not IP
- Backup restoration: Point-in-time recovery creates new instance, update connection strings
- Maintenance window: AWS performs updates during maintenance window — schedule carefully
- Snapshot restoration: Restoring from snapshot takes 10-30 minutes depending on size
Step 7: Fix CloudFront 5xx Errors
502 Bad Gateway
# Check origin health (EC2, ALB, S3)
curl -I https://your-origin.example.com
# Verify origin security group allows CloudFront IP ranges
# Download from: https://ip-ranges.amazonaws.com/ip-ranges.json
# Check CloudFront distribution settings
aws cloudfront get-distribution --id EXAMPLEID
# Review origin response timeout (default 30 seconds)
# Increase if origin is slow to respond
# Check custom headers required by origin
# Verify CloudFront sends expected headers
503 Service Unavailable
- Origin unavailable: EC2/ALB/S3 is down or overloaded
- Lambda@Edge timeout: Edge function took too long to execute
- Distribution disabled: Check if distribution is deployed and enabled
- Origin shield overloaded: Temporary shield issue — usually resolves quickly
504 Gateway Timeout
- Origin slow: Origin took >60 seconds to respond (CloudFront maximum)
- Optimize origin: Reduce origin response time or cache more aggressively
- Check origin keep-alive: Ensure origin supports persistent connections
- Lambda@Edge timeout: Edge function exceeded execution limit
Step 8: Fix Route 53 DNS Issues
Domain Not Resolving
# Test DNS resolution
nslookup yourdomain.com
dig yourdomain.com
# Check Route 53 hosted zone
aws route53 list-hosted-zones
aws route53 list-resource-record-sets --hosted-zone-id Z1234567890ABC
# Verify nameservers match registrar
aws route53 get-hosted-zone --id Z1234567890ABC
whois yourdomain.com
# Test from different DNS servers
dig @8.8.8.8 yourdomain.com
dig @1.1.1.1 yourdomain.com
# Check health checks if using failover routing
aws route53 get-health-check-status --health-check-id abc12345
DNS Propagation Delays
- TTL not expired: DNS changes take TTL seconds to propagate (default 300s)
- Registrar nameserver update: Nameserver changes take 24-48 hours globally
- ISP DNS caching: Some ISPs ignore TTL and cache longer
- Flush local DNS cache: Clear your computer's DNS cache to see changes immediately
Health Check Failures
- Endpoint unreachable: Verify target is accessible from internet
- String matching: If using string matching, ensure response contains expected string
- Health check interval: Shorter intervals (10s vs 30s) detect failures faster
- Health check regions: Route 53 checks from multiple regions — all must pass
Step 9: Fix API Gateway Errors
502 Bad Gateway
# Check Lambda integration
aws apigateway get-integration \
--rest-api-id abc123 \
--resource-id xyz789 \
--http-method GET
# Review Lambda execution logs
aws logs tail /aws/lambda/your-function-name --follow
# Verify Lambda has correct permissions
aws lambda get-policy --function-name your-function-name
# Test Lambda directly (bypass API Gateway)
aws lambda invoke \
--function-name your-function-name \
--payload '{"test": "data"}' \
response.json
429 Too Many Requests
- Throttle limits exceeded: Default 10,000 requests/second per account per region
- Request usage plan increase: Contact AWS support for higher limits
- Implement caching: Enable API Gateway caching to reduce backend load
- Use client-side throttling: Implement retry with exponential backoff
Integration Timeout
- 29-second limit: API Gateway has hard 29-second timeout
- Optimize backend: Lambda/backend must respond within 29 seconds
- Use async pattern: For long operations, return immediately and process asynchronously
- Don't rely on Lambda timeout: Setting the Lambda timeout above 29 seconds doesn't help, because API Gateway still cuts the request off at 29 seconds
Step 10: Network and VPC Troubleshooting
VPC Connectivity Issues
# Check route tables
aws ec2 describe-route-tables --filters "Name=vpc-id,Values=vpc-xxxxx"
# Verify Internet Gateway attached
aws ec2 describe-internet-gateways --filters "Name=attachment.vpc-id,Values=vpc-xxxxx"
# Check NAT Gateway status
aws ec2 describe-nat-gateways --filter "Name=vpc-id,Values=vpc-xxxxx"
# Test security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Check network ACLs (often overlooked)
aws ec2 describe-network-acls --filters "Name=vpc-id,Values=vpc-xxxxx"
# Use VPC Flow Logs to debug traffic
aws ec2 describe-flow-logs --filter "Name=resource-id,Values=vpc-xxxxx"
Common VPC Misconfigurations
- No route to internet: Missing Internet Gateway or NAT Gateway route
- Wrong subnet type: Private subnet can't reach internet without NAT
- Security group vs NACL: Security groups are stateful, NACLs are stateless
- Overlapping CIDR blocks: VPC peering fails if CIDR blocks overlap
- VPC endpoint missing: S3/DynamoDB access from private subnet needs VPC endpoint
Step 11: IAM Permission Errors
Access Denied Troubleshooting
# Decode authorization error message
aws sts decode-authorization-message --encoded-message <error-message>
# Check current user identity
aws sts get-caller-identity
# List attached policies
aws iam list-attached-user-policies --user-name your-username
aws iam list-user-policies --user-name your-username
# Simulate IAM policy
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:user/your-username \
--action-names s3:GetObject \
--resource-arns arn:aws:s3:::your-bucket-name/*
Common IAM Issues
- Explicit deny wins: Deny statements override allow statements
- Resource-based policies: S3 buckets, Lambda functions have their own policies
- Service Control Policies: SCPs can block actions even if IAM policy allows
- Session policies: Assumed roles can have additional restrictions
- Permission boundaries: Limit maximum permissions of IAM entities
- MFA required: Some actions require MFA authentication
Step 12: Check AWS Service Health Dashboard
If AWS is genuinely down:
- Visit status.aws.amazon.com
- Filter by your region (us-east-1, eu-west-1, etc.)
- Check "Service Health" tab for current incidents
- Review "Event Log" for historical outages
- Subscribe to RSS feed for specific services and regions
- Most AWS outages resolve within 1-4 hours
- Set up alerts on API Status Check to know when it's back
Historical AWS Outages
Notable Incidents
December 2021 — us-east-1 Multi-Service Outage
The largest AWS outage in years affected us-east-1 for 7+ hours. EC2, Lambda, RDS, and DynamoDB all degraded simultaneously. The incident took down Netflix, Disney+, Robinhood, Coinbase, and thousands of other services. Root cause was internal network device failure that cascaded across availability zones. AWS's status page initially showed "green" for 30+ minutes while services were clearly down.
November 2020 — Kinesis Data Streams Failure
Kinesis outage in us-east-1 lasted 8+ hours and cascaded to dependent services including CloudWatch, Lambda, and Cognito. The incident prevented new service deployments across AWS because internal tools relied on Kinesis. Companies couldn't even diagnose the issue because CloudWatch logging was broken. This outage exposed AWS's own internal dependencies on shared infrastructure.
September 2020 — Cognito Authentication Outage
AWS Cognito failed globally for 4+ hours, preventing users from logging into thousands of applications. Mobile apps, SaaS platforms, and consumer applications all lost authentication capabilities. The incident occurred during US business hours, maximizing impact. Companies with custom authentication systems were unaffected, highlighting the risk of fully managed services.
July 2019 — us-east-1 EC2 Overheating
Physical infrastructure overheating in a single us-east-1 data center caused EC2 instance failures for 6+ hours. Instances randomly terminated without warning. Elastic Load Balancers failed to route traffic correctly. Auto Scaling groups couldn't launch replacement instances due to capacity constraints. The incident demonstrated that even AWS's physical infrastructure can fail catastrophically.
February 2017 — S3 Total Outage in us-east-1
S3 went completely offline in us-east-1 for 4+ hours due to a typo in a maintenance command. The outage took down websites, APIs, and services across the internet. AWS's own status page couldn't update because it used S3 for storage. Companies learned that "S3 is always available" was not true, and multi-region replication became standard practice.
September 2015 — DynamoDB Outage
DynamoDB experienced a 5+ hour outage in us-east-1 affecting high-traffic applications and gaming platforms. Read and write operations failed across multiple availability zones. AWS's incident response was slow, with initial status updates taking 90+ minutes. The outage demonstrated that even NoSQL "infinitely scalable" databases can fail completely.
October 2012 — Hurricane Sandy Impact
While not a service failure, Hurricane Sandy caused widespread internet connectivity issues affecting AWS us-east-1. Some customers lost connectivity to their instances despite AWS infrastructure remaining operational. The incident highlighted the importance of multi-region architectures for disaster recovery.
April 2011 — EBS Outage (Original us-east-1 Disaster)
The infamous EBS outage that lasted multiple days and destroyed customer data. A network misconfiguration triggered a cascading failure in EBS replication. Many startups lost their entire databases permanently. AWS learned from this incident and improved EBS reliability significantly, but it remains a cautionary tale about cloud infrastructure risks.
Outage Patterns
- us-east-1 most vulnerable: AWS's largest and oldest region fails more frequently
- Cascading failures: One service failure (Kinesis) often breaks dependent services (CloudWatch, Lambda)
- Internal dependencies: AWS's own tools rely on AWS services, creating circular dependencies during outages
- Status page delays: AWS status page often shows "green" for 15-60 minutes after actual outage begins
- Communication gaps: Initial incident communication is often vague and delayed
- Multi-AZ not enough: Even multi-AZ deployments can fail during region-wide outages
- Shared infrastructure: Managed services (RDS, Lambda) share underlying infrastructure that can fail simultaneously
- Peak hours: Outages during US business hours (9 AM - 5 PM EST) have maximum impact
What to Use When AWS Is Down
| Need | Alternative |
|---|---|
| Compute (EC2) | Google Cloud Compute Engine, Azure VMs, DigitalOcean Droplets |
| Object Storage (S3) | Google Cloud Storage, Azure Blob Storage, Backblaze B2, Cloudflare R2 |
| Serverless (Lambda) | Google Cloud Functions, Azure Functions, Cloudflare Workers |
| Database (RDS) | Google Cloud SQL, Azure Database, self-hosted on other clouds |
| CDN (CloudFront) | Cloudflare, Fastly, Akamai, Google Cloud CDN |
| DNS (Route 53) | Cloudflare DNS, Google Cloud DNS, NS1, Dyn |
| Container orchestration | Google GKE, Azure AKS, self-hosted Kubernetes |
| Message queues (SQS) | Google Pub/Sub, Azure Service Bus, RabbitMQ, Redis |
| NoSQL (DynamoDB) | Google Firestore, Azure Cosmos DB, MongoDB Atlas |
| Static site hosting | Netlify, Vercel, Cloudflare Pages, GitHub Pages |
For DevOps/SRE Teams During AWS Outages
If you're running production infrastructure on AWS:
- Multi-region architecture: Deploy critical services across 2+ AWS regions
- Multi-cloud strategy: Use Google Cloud or Azure as failover for critical workloads
- Health checks: Implement deep health checks that test actual functionality, not just HTTP 200
- Circuit breakers: Automatically fail over to backup systems when AWS degrades
- Status monitoring: Integrate API Status Check into incident response workflows
- Runbooks: Document manual failover procedures for AWS outages
- Cost awareness: Understand that multi-region costs 2-3x more but prevents revenue loss
- Terraform/IaC: Infrastructure as code enables rapid deployment to alternative regions/clouds
- Data replication: Continuously replicate critical data across regions (S3 cross-region, RDS read replicas)
- Communication plan: Have status page ready to inform customers of AWS-related issues
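The "deep health check" recommendation above can be as simple as asserting on an application-level marker in the health response instead of trusting any HTTP 200. A hedged sketch, where the `/health` path and the `"db":"ok"` field are assumptions about your app:

```shell
#!/usr/bin/env bash
# deep_health_ok <body> — pass only when the health response says its real
# dependencies are healthy (here, an assumed "db":"ok" JSON field), rather
# than treating any 200 response as healthy.
deep_health_ok() {
  echo "$1" | grep -q '"db"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage against a hypothetical endpoint:
# body=$(curl -fsS --max-time 5 https://app.example.com/health) \
#   && deep_health_ok "$body" && echo "healthy"
```

Route 53 string-matching health checks implement the same idea on the server side.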
For Development Teams During Outages
- Check region health: Try deploying to us-west-2 or eu-west-1 if us-east-1 is down
- Local development: Use LocalStack, AWS SAM local, or Docker Compose to work offline
- Test environments: Maintain test environment in different region from production
- Cache AWS CLI calls: Don't repeatedly query AWS APIs during outages (rate limiting)
- Document the outage: Log timestamps, error messages, and impact for post-incident reviews
- Alternative AWS accounts: Some teams maintain backup AWS accounts in different regions
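The region-failover idea in the list above can be scripted: try regions in preference order and use the first one whose probe succeeds. The probe is pluggable; a real one might run `aws ec2 describe-availability-zones --region "$1"` (shown as a comment), and the region list is an example:

```shell
#!/usr/bin/env bash
# pick_region <probe> <region...> — echo the first region whose probe
# command succeeds, or return nonzero if none respond.
pick_region() {
  local probe=$1; shift
  local region
  for region in "$@"; do
    if "$probe" "$region"; then
      echo "$region"
      return 0
    fi
  done
  return 1
}

# Example probe (illustrative): succeed if the region's EC2 API answers.
# ec2_probe() { aws ec2 describe-availability-zones --region "$1" >/dev/null 2>&1; }
# pick_region ec2_probe us-east-1 us-west-2 eu-west-1
```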
For Businesses Relying on AWS
- Financial impact assessment: Calculate cost per hour of AWS downtime (e-commerce, SaaS revenue)
- SLA awareness: AWS doesn't guarantee 100% uptime — understand your actual SLA
- Business continuity plan: Document procedures for operating during AWS outages
- Customer communication: Prepare status page updates explaining AWS-related incidents
- Insurance: Consider cyber insurance that covers cloud provider outages
- Vendor diversification: Don't put all infrastructure eggs in one cloud provider basket
- Monitoring investment: Track AWS uptime with API Status Check
- Legal contracts: Understand AWS's limited liability for service disruptions
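The "cost per hour" assessment above is simple arithmetic; the sketch below uses hypothetical placeholder figures, so substitute your own revenue and outage duration:

```shell
#!/usr/bin/env bash
# Back-of-envelope downtime cost. All numbers are hypothetical.
monthly_revenue=500000        # USD of online revenue per month (assumed)
hours_per_month=730           # average hours in a month
outage_hours=4                # a typical major regional outage

revenue_per_hour=$(( monthly_revenue / hours_per_month ))
echo "Revenue per hour: \$${revenue_per_hour}"
echo "Estimated loss for a ${outage_hours}h outage: \$$(( revenue_per_hour * outage_hours ))"
```

Comparing that hourly figure against the extra cost of a multi-region deployment makes the trade-off concrete.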
How API Status Check Helps Monitor AWS
API Status Check provides comprehensive AWS monitoring:
- Real-time uptime tracking: Monitor EC2, S3, Lambda, RDS, CloudFront, Route 53, and 20+ AWS services simultaneously
- Instant alerts: Email, SMS, Slack, Discord, webhook notifications within seconds of AWS degradation
- Region-specific monitoring: Track us-east-1, eu-west-1, ap-southeast-1, and all AWS regions independently
- Historical uptime data: See AWS outage trends, frequency, and duration over months/years
- Service-level granularity: Monitor specific services (Lambda in us-east-1) vs. entire regions
- API response time: Measure AWS API latency and detect performance degradation early
- Multi-region dashboards: Compare uptime across regions to identify geographic issues
- Integration support: Connect with PagerDuty, OpsGenie, Datadog, and incident management tools
- Compliance reporting: Generate uptime reports for SLA compliance and audits
- Cost correlation: Understand how AWS outages impact your business revenue and operations
Why This Matters for AWS Users
For SRE/DevOps Teams:
- Get alerted before customers complain about downtime
- Correlate application failures with AWS service health
- Make informed decisions during incidents (failover vs. wait)
- Track AWS reliability for vendor accountability
- Prove to stakeholders that outages were AWS-caused, not team-caused
For E-commerce/SaaS:
- Minimize revenue loss by detecting AWS issues instantly
- Communicate proactively with customers during AWS outages
- Understand AWS uptime trends for capacity planning
- Justify multi-region investments with historical outage data
For Engineering Managers:
- Track AWS as vendor risk in compliance frameworks
- Measure AWS impact on team productivity and on-call burden
- Make data-driven decisions about multi-cloud strategy
- Include AWS status in incident post-mortems
For CTOs/Decision Makers:
- Understand true cost of AWS downtime (revenue + reputation)
- Evaluate AWS vs. Google Cloud vs. Azure based on real uptime data
- Justify infrastructure investments (multi-region, multi-cloud)
- Present AWS reliability metrics to board and investors
Try it free — monitor AWS and 100+ other critical services with instant alerts.
Frequently Asked Questions
Is AWS down right now?
Check apistatuscheck.com/api/aws for real-time AWS status across all major services and regions, or visit AWS's official status page at status.aws.amazon.com. Also search Twitter/X for "AWS down" or your specific region like "us-east-1 down". If multiple sources confirm issues, AWS is likely experiencing an outage. Remember that AWS often has region-specific outages — us-east-1 may be down while eu-west-1 works perfectly.
Why can't I connect to my EC2 instance?
First, check if EC2 is experiencing an outage in your region at status.aws.amazon.com. If it's just you, common causes include: security group not allowing SSH (port 22) or RDP (port 3389) from your IP address, instance stopped or terminated, incorrect SSH key, VPN/firewall blocking access, or instance status checks failing. Try connecting via AWS Systems Manager Session Manager which doesn't require SSH port access.
Why is my Lambda function timing out?
Lambda timeouts are usually caused by: function exceeding configured timeout limit (max 15 minutes), cold starts in VPC (can add 10-30 seconds), inefficient code or database queries, waiting for external API that's slow/down, or insufficient memory allocation (Lambda CPU scales with memory). Check CloudWatch Logs for execution duration. During AWS outages, Lambda may experience degraded performance or "Service Unavailable" errors — check AWS status page for Lambda service health.
How long do AWS outages usually last?
Minor AWS outages typically resolve within 1-2 hours. Major regional outages (like us-east-1 incidents) can last 4-8 hours. The worst historical outages have lasted 24+ hours for complete service restoration. Service-specific outages (just Lambda or just S3) tend to resolve faster than multi-service cascading failures. Set up alerts at apistatuscheck.com to know immediately when services recover.
Why is S3 giving me Access Denied errors?
S3 Access Denied errors are usually caused by: missing IAM permissions for s3:GetObject or s3:PutObject, bucket policy blocking your request, S3 Block Public Access settings preventing access, incorrect bucket name or region, requester pays bucket requiring request payer header, or VPC endpoint policy restrictions. Use aws s3api get-bucket-policy and aws iam simulate-principal-policy to debug. During S3 outages, you may see 503 errors instead of Access Denied.
Why is CloudFront returning 502 or 503 errors?
CloudFront 502/503 errors typically indicate: origin (EC2/ALB/S3) is down or unreachable, origin security group not allowing CloudFront IP ranges, origin taking >60 seconds to respond (timeout), custom SSL certificate invalid or expired, or Lambda@Edge function failing. Check your origin health first with direct curl request. During AWS outages affecting EC2 or S3, CloudFront will return 5xx errors because the origin is unavailable. Use API Status Check to confirm if it's an AWS-wide issue.
Why isn't my Route 53 domain resolving?
DNS resolution failures are usually caused by: nameservers at registrar don't match Route 53 hosted zone, DNS records incorrectly configured (wrong IP, missing A record), TTL hasn't expired after recent changes (wait TTL seconds), Route 53 health checks failing for failover routing, or registrar account locked/suspended. Use dig yourdomain.com and whois yourdomain.com to diagnose. During Route 53 outages, DNS queries may return SERVFAIL or timeout completely.
What should I do during an AWS outage?
Check status.aws.amazon.com to confirm the outage and affected services/regions. If you have multi-region setup, failover to healthy region. Communicate with customers via status page about AWS-related issues. Avoid making infrastructure changes during outages (deployments, scaling) as they may fail unpredictably. Monitor API Status Check or AWS status page for recovery updates. Document the incident for post-mortem review. For critical services, consider having manual procedures ready (static maintenance page, direct database access, etc.).
How can I prevent AWS outages from affecting my business?
Deploy critical services across multiple AWS regions (multi-region architecture). Use multiple availability zones within each region (multi-AZ). Consider multi-cloud strategy with Google Cloud or Azure as fallback. Implement health checks and automatic failover with Route 53 or external DNS. Use CloudFront CDN to cache content and reduce origin dependency. Maintain database replicas in different regions. Have runbooks for manual failover procedures. Monitor AWS health with API Status Check for early warning. Remember: multi-region costs 2-3x more but prevents 100% revenue loss during regional outages.
Why does AWS status page show green when my services are down?
AWS's status page often has 15-60 minute delays before showing incidents. The status page itself has failed during outages (S3 outage in 2017). AWS defines "degraded" differently than complete failure — your use case may be broken while AWS shows yellow, not red. Some AWS internal services affect customers but aren't shown on status page. Use third-party monitoring like API Status Check, DownDetector, and Twitter/X to get real-time community reports. Trust your own monitoring over AWS status page during suspected outages.
Never miss an AWS outage again. Get free instant alerts when AWS or any critical service goes down. Essential for SRE teams, e-commerce platforms, SaaS applications, and any business running production workloads on AWS.