Top 30 Cloud Architect Interview Questions and Answers [2026]

#architecture #career #cloud #interview

I have been on both sides of the cloud architect interview table. As a hiring manager at Lockheed Martin and Cigna Healthcare, I conducted over 200 technical interviews for cloud architecture roles. As a candidate, I went through interview loops at three Fortune 500 companies and two government contractors.

Each question below includes the answer I would accept from a senior candidate, with the depth and specificity that separates a hire from a rejection.

Foundational Architecture Questions

1. What is the difference between high availability and fault tolerance?

High availability minimizes downtime through redundancy and fast failover. A 99.99% availability target allows about 52.6 minutes of downtime per year. The system may experience brief interruptions during failover but recovers quickly.

Fault tolerance means the system continues operating without any interruption when a component fails. It costs more because the redundant components must already be running and serving traffic (active-active) rather than waiting on standby to take over (active-passive).
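
The downtime figure quoted for 99.99% comes from simple arithmetic, and it helps to be able to reproduce it for other targets in an interview. A minimal sketch, assuming a 365-day year; the function name is illustrative, not from any library:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Print the budget for the common "nines" targets.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Each extra nine cuts the budget by a factor of ten, which is why the jump from 99.99% to 99.999% usually forces a change in architecture rather than just better operations.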

2. Explain the CAP theorem and how it applies to cloud database selection.

The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance.

In practice, partition tolerance is non-negotiable, because network links between nodes will eventually fail. The real choice is what to give up when a partition occurs:

CP systems (DynamoDB strongly consistent, Cloud Spanner): sacrifice availability during partitions. Use for financial transactions.
AP systems (DynamoDB eventually consistent, Cassandra): sacrifice consistency during partitions. Use for social feeds, session stores.
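
The trade-off is easier to explain with a toy model. The sketch below is a deliberately simplified two-replica store, not how DynamoDB, Spanner, or Cassandra are implemented; the class names and the `partitioned` flag are hypothetical and exist only to show the behavior a client sees during a partition.

```python
class Unavailable(Exception):
    """Raised when a CP store cannot guarantee a consistent result."""


class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy store with replicas A and B; `partitioned` simulates a network split."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value):
        """Write through replica A, replicating to B when the link is up."""
        if self.partitioned and self.mode == "CP":
            # CP: refuse the write rather than let the replicas diverge.
            raise Unavailable("cannot replicate write during partition")
        self.a.value = value
        if not self.partitioned:
            self.b.value = value

    def read_from_b(self):
        """Read from replica B, which may be cut off from A."""
        if self.partitioned and self.mode == "CP":
            # CP: refuse the read rather than risk returning stale data.
            raise Unavailable("cannot confirm latest value during partition")
        return self.b.value


ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")            # accepted on replica A only
print(ap.read_from_b())   # stale read from replica B: prints v1

cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
try:
    cp.write("v2")
except Unavailable as exc:
    print("CP store refused:", exc)
```

The AP store keeps answering but serves a stale value; the CP store returns an error instead. Which failure mode is acceptable is the question to answer for each workload.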

3. How do you design a multi-region active-active architecture?

Key challenges: data replication, conflict resolution, and routing.

Data layer: globally distributed database (DynamoDB Global Tables, CockroachDB, Cloud Spanner) or cross-region replication with conflict resolution
Application layer: identical stacks per region with feature flags for regional rollouts
Routing: Route 53 latency-based routing or Cloudflare load balancing
Conflict resolution: last-writer-wins, vector clocks, or application-level merge logic
Testing: regular "region evacuation" drills
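
Two of the conflict-resolution strategies above can be sketched in a few lines. This is a minimal illustration, assuming each region keeps its own counter in a vector clock and that writes carry a timestamp; the function names and the `(timestamp, region, value)` tuple layout are hypothetical, not the API of any particular database.

```python
from typing import Any, Dict, Tuple

VectorClock = Dict[str, int]        # region name -> number of writes seen
Write = Tuple[float, str, Any]      # (timestamp, region, value)


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for clock a relative to b."""
    regions = set(a) | set(b)
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in regions)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in regions)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither write saw the other: a real conflict


def merge_clocks(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used after a conflict has been resolved."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


def last_writer_wins(a: Write, b: Write) -> Write:
    """Pick the later write; the region name breaks timestamp ties deterministically."""
    return max(a, b, key=lambda w: (w[0], w[1]))


us = {"us-east-1": 2, "eu-west-1": 1}
eu = {"us-east-1": 1, "eu-west-1": 2}
print(compare(us, eu))  # concurrent
print(last_writer_wins((100.0, "us-east-1", "cart-A"), (100.5, "eu-west-1", "cart-B")))
```

Last-writer-wins is simple but silently discards one of the concurrent writes and depends on clock synchronization. Vector clocks detect the conflict without relying on wall-clock time, but something (the application or a merge function) still has to decide the outcome.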

4. How would you migrate a monolithic application to microservices?

I use the Strangler Fig pattern, not a big-bang rewrite:

Map domains using domain-driven design. Identify bounded contexts
Extract incrementally, starting with the domain that has the clearest API boundary
Separate the shared database into per-service databases with eventual consistency through events
Introduce an API gateway to route between monolith and new services
Implement distributed tracing before extracting services
Budget 6-18 months. Teams that try to finish in 3 months usually end up with a distributed monolith
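
The routing step is the core of the Strangler Fig pattern: the gateway sends extracted paths to new services and everything else to the monolith. A minimal sketch of that decision, assuming path-prefix routing; the hostnames and route table are hypothetical, and a real gateway would express the same rules in its own configuration rather than in application code.

```python
MONOLITH = "http://monolith.internal"  # hypothetical upstream

# Hypothetical route table: one entry is added each time a domain is extracted.
EXTRACTED_ROUTES = {
    "/billing": "http://billing.internal",
    "/notifications": "http://notifications.internal",
}


def resolve_upstream(path: str) -> str:
    """Return the upstream for a request path, defaulting to the monolith."""
    # Check longer prefixes first so a more specific route wins.
    for prefix in sorted(EXTRACTED_ROUTES, key=len, reverse=True):
        # Match whole path segments only, so "/billingx" is not "/billing".
        if path == prefix or path.startswith(prefix + "/"):
            return EXTRACTED_ROUTES[prefix]
    return MONOLITH


for p in ("/billing/invoices/42", "/orders/7", "/billingx"):
    print(p, "->", resolve_upstream(p))
```

Because the default is the monolith, each extraction is a one-line change to the route table and can be rolled back the same way.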

5. Containers vs. serverless -- when do you choose each?

Dimension	Containers	Serverless
Startup time	Seconds to minutes	Milliseconds to seconds
Max execution	Unlimited	15 minutes (Lambda)
Cost model	Provisioned capacity, billed even when idle	Per-invocation + duration
State	Stateful possible	Stateless by design
Best for	Long-running services	Event-driven processing, variable-load APIs
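
The cost-model row is where the decision often gets made, so it helps to show the break-even arithmetic. A rough sketch, assuming a flat hourly rate for containers and a per-request plus per-GB-second rate for serverless; every price below is an illustrative placeholder, not a quoted vendor price, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing estimates


def container_monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Always-on cost: paid for every hour whether or not requests arrive."""
    return hourly_rate * instances * HOURS_PER_MONTH


def serverless_cost_per_request(avg_duration_s: float, memory_gb: float,
                                price_per_million_requests: float,
                                price_per_gb_second: float) -> float:
    """Cost of one invocation: request fee plus compute for its duration."""
    request_fee = price_per_million_requests / 1_000_000
    compute_fee = avg_duration_s * memory_gb * price_per_gb_second
    return request_fee + compute_fee


def serverless_monthly_cost(requests: int, **pricing: float) -> float:
    """Pay-per-use cost: scales linearly with traffic and is zero when idle."""
    return requests * serverless_cost_per_request(**pricing)


def break_even_requests(container_cost: float, **pricing: float) -> float:
    """Monthly request volume at which both models cost the same."""
    return container_cost / serverless_cost_per_request(**pricing)


# Placeholder prices for illustration only.
pricing = dict(avg_duration_s=0.1, memory_gb=0.5,
               price_per_million_requests=0.20, price_per_gb_second=0.00002)
fixed = container_monthly_cost(hourly_rate=0.05)
print(f"container: {fixed:.2f}/month")
print(f"break-even: {break_even_requests(fixed, **pricing):,.0f} requests/month")
```

Below the break-even volume the pay-per-use model is cheaper; above it, the always-on container wins on cost alone. Real estimates also need free tiers, data transfer, and operational overhead, which this sketch leaves out.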