I have conducted over 200 cloud architect interviews. These are the questions that actually separate candidates who get offers from those who do not.
Architecture & Design (Asked in Every Interview)
1. How would you design a multi-region disaster recovery strategy?
Strong answer: Define RPO/RTO first, because they determine everything else. Active-passive keeps cost down; active-active targets near-zero downtime at higher cost. Use Route 53 health checks to drive DNS failover. For the database, use Aurora Global Database or DynamoDB Global Tables. State the cost tradeoff explicitly.
2. Explain the difference between horizontal and vertical scaling. When would you choose each?
Vertical: a bigger instance (simple, but it has a ceiling). Horizontal: more instances (more complex, but no hard ceiling for stateless work). Most production workloads need horizontal. Stateless services scale horizontally. Databases often start vertical, then add read replicas to spread read traffic.
3. How do you design for high availability?
Multi-AZ at minimum, multi-region for critical systems. No single points of failure. Auto-scaling groups. Health checks at every layer. Database replication. Load balancer across AZs. Mention the "blast radius" concept.
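A quick way to back up the "no single points of failure" argument in an interview is the availability arithmetic. The sketch below assumes failures are independent, which real AZs only approximate, and every percentage in it is a made-up figure for illustration, not a published SLA.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def redundant_availability(single, copies):
    """Availability of N independent copies when one surviving copy is enough."""
    return 1 - (1 - single) ** copies


# Hypothetical figure: one AZ-local stack is 99% available.
one_az = 0.99
two_az = redundant_availability(one_az, 2)    # 1 - 0.01**2
three_az = redundant_availability(one_az, 3)  # 1 - 0.01**3

# Hypothetical request path in series: load balancer -> app tier -> database.
path = serial_availability([0.9999, two_az, 0.9995])

print(f"two AZs:   {two_az:.6f}")
print(f"three AZs: {three_az:.6f}")
print(f"full path: {path:.6f}")
```

The takeaway is that redundancy multiplies failure probabilities while a serial chain multiplies availabilities, so the full path is always weaker than its weakest layer.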
Security (Asked 90% of the Time)
4. How do you implement least privilege access in AWS?
IAM policies with specific actions on specific resources. No wildcards in production. Use IAM roles over access keys. SCPs at the org level. Regular access reviews. Permission boundaries for delegated administration.
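To make "specific actions on specific resources" concrete, here is a minimal sketch of a lint check that flags wildcards in Allow statements. The policy documents follow the standard IAM JSON shape, but the bucket name, Sids, and the `find_wildcards` helper are hypothetical, and the check is deliberately stricter than most teams need (it also flags prefix wildcards such as `reports/*`).

```python
def find_wildcards(policy):
    """Return (sid, field, value) for each Allow entry that contains '*'."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append((stmt.get("Sid", "<no Sid>"), field, value))
    return findings


# Hypothetical policy scoped to one action on one object.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ReadReport",
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-reports-bucket/reports/2024.csv",
    }],
}

# Hypothetical policy of the kind the answer above warns against.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "TooBroad",
        "Effect": "Allow",
        "Action": "s3:*",
        "Resource": "*",
    }],
}

print(find_wildcards(scoped_policy))
print(find_wildcards(broad_policy))
```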
5. How would you handle secrets management?
AWS Secrets Manager or HashiCorp Vault. Never in code, and never in plaintext environment files on disk. Automatic rotation. Audit trail. The application retrieves secrets at runtime, not at build time.
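Runtime retrieval can be sketched as follows. The function takes the client as a parameter so the example runs offline; in a real service you would presumably pass `boto3.client("secretsmanager")`, whose `get_secret_value(SecretId=...)` call returns a `SecretString` field for string secrets. The secret name, the JSON layout of the secret, and the `FakeSecretsClient` stand-in are all assumptions for illustration.

```python
import json


def load_db_credentials(client, secret_id):
    """Fetch a JSON-formatted secret at runtime and return it as a dict."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Offline stand-in that mimics the one Secrets Manager call used above."""

    def get_secret_value(self, SecretId):
        payload = {"username": "app", "password": "example-only"}
        return {"Name": SecretId, "SecretString": json.dumps(payload)}


# Hypothetical secret name; nothing is baked into the build artifact.
creds = load_db_credentials(FakeSecretsClient(), "prod/orders/db")
print(creds["username"])
```

Because the lookup happens on each start (or on a cache refresh), a rotated secret is picked up without rebuilding or redeploying the image.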
Cost Optimization (Increasingly Common)
6. A team's AWS bill jumped 40% last month. How do you investigate?
Cost Explorer by service → by tag → by usage type. Check for: untagged resources, oversized instances, idle resources, gaps in Reserved Instance coverage, missing S3 lifecycle policies, and NAT Gateway data transfer charges. I once found a $3,000/month Elasticsearch cluster running in dev that nobody was using.
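The first drill-down step (by service, month over month) can be sketched in a few lines. This assumes you have already exported billing rows into dicts; the field names, months, and dollar amounts below are invented for illustration and do not come from any real account or API response.

```python
from collections import defaultdict


def cost_deltas(records, previous, current):
    """Rank services by cost increase between two months, largest first."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in records:
        totals[row["service"]][row["month"]] += row["cost"]
    deltas = [
        (service, months.get(current, 0.0) - months.get(previous, 0.0))
        for service, months in totals.items()
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Hypothetical exported billing rows.
records = [
    {"service": "EC2", "month": "2024-04", "cost": 5000.0},
    {"service": "EC2", "month": "2024-05", "cost": 5200.0},
    {"service": "NAT Gateway", "month": "2024-04", "cost": 300.0},
    {"service": "NAT Gateway", "month": "2024-05", "cost": 2100.0},
    {"service": "S3", "month": "2024-04", "cost": 800.0},
    {"service": "S3", "month": "2024-05", "cost": 750.0},
]

ranked = cost_deltas(records, "2024-04", "2024-05")
for service, delta in ranked:
    print(f"{service:12s} {delta:+.2f}")
```

Sorting by absolute increase rather than by total spend is the point: the largest line item is often not the one that caused the jump.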
Behavioral / Leadership
7. Tell me about a time you had to push back on a technical decision from leadership.
Use STAR format. Show data-driven reasoning. Demonstrate that you can disagree respectfully and propose alternatives. The best answer shows you changed the outcome; the second-best shows you committed fully after being overruled.
Quick Fire (Common in Phone Screens)
8. VPC peering vs Transit Gateway? → Peering for simple 1:1, Transit Gateway for hub-and-spoke at scale.
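9. S3 Standard vs Glacier? → Access frequency. Standard for frequent access, Glacier for archival (minutes to hours retrieval for Flexible Retrieval and Deep Archive; the Instant Retrieval class returns in milliseconds).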
10. Blue-green vs canary deployment? → Blue-green for instant rollback, canary for gradual traffic shifting with monitoring.
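For the canary answer, it helps to be able to describe the control loop. The sketch below is a simplified model, not any particular deployment tool: the step percentages, the error threshold, and the `error_rate_at` callback (standing in for a monitoring query) are all hypothetical.

```python
def run_canary(steps, error_rate_at, max_error_rate):
    """Shift traffic through `steps` (percent); roll back to 0 on a bad reading.

    Returns (final_percent, history), where history lists (percent, error_rate).
    """
    history = []
    final = 0
    for percent in steps:
        rate = error_rate_at(percent)  # stand-in for querying monitoring
        history.append((percent, rate))
        if rate > max_error_rate:
            return 0, history  # roll back: all traffic returns to the old version
        final = percent
    return final, history


steps = [5, 25, 50, 100]

# Healthy release: error rate stays low at every step.
healthy = run_canary(steps, lambda percent: 0.001, max_error_rate=0.01)

# Faulty release: errors spike once the canary takes half the traffic.
faulty = run_canary(
    steps,
    lambda percent: 0.001 if percent < 50 else 0.05,
    max_error_rate=0.01,
)

print(healthy[0], faulty[0])
```

Blue-green is the degenerate case of this loop with a single step of 100, which is why its rollback is instant but its exposure to a bad release is total.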
Full set of 50 questions with detailed answers: citadelcloudmanagement.com/pages/free-courses
What is the hardest interview question you have been asked? I will add the best ones to the collection.