Peter Eskandar for AWS Community Builders

Posted on Oct 14, 2024 • Edited on Mar 28

Mastering the AWS Phone Interview: Key Technical Concepts and Preparation Guide

The AWS Phone Interview is a critical step in landing a role at Amazon Web Services (AWS). Known for its rigorous process, the interview assesses not only your technical acumen but also how well you align with AWS' Leadership Principles. Preparing for this stage can seem daunting, but with the right approach, you can navigate the process with confidence and set yourself up for success.

This article is actually based on my own study notes that I prepared while getting ready for the AWS phone interview. These notes helped me pass the interview, and I hope they will be helpful for you as well. While I focused on the technical aspects of the interview, you should also expect to be asked behavioral questions based on Amazon’s Leadership Principles. In this guide, however, I will walk you through the main technical topics you’ll likely encounter.

Most of the content and screenshots in this article come from Adrian Cantrill’s Tech Fundamentals Course and the BeSA Program, both of which were instrumental in structuring this guide.

Check out Adrian Cantrill's Tech Fundamentals Course here.
Learn more about the BeSA Program here.

Your feedback is invaluable! If you find these study notes helpful or if you have any suggestions on how to improve them, please feel free to share.

OSI Model Introduction :

The OSI (Open Systems Interconnection) model is a conceptual framework that describes the communication functions of a telecommunication or computer system. The model is divided into seven layers, each of which performs a specific function in the transmission of data between networked devices. These layers are designed to provide a standardized approach to network communication, allowing different types of devices to communicate with each other regardless of their underlying hardware or software.

1. Layer 1 – Physical :
Layer 1, also known as the physical layer, is the first and lowest layer in the OSI model. It is responsible for the physical transmission of data over a communication channel, such as a wire, cable, or wireless signal. This layer defines the physical characteristics of the communication medium, including voltage levels, cable types, and signaling methods, to ensure that data is transmitted accurately and reliably between devices.

Main characteristics :
- Physical Shared Medium
- Define Standards for transmitting onto the medium
- Define Standards for receiving from the medium
- No Access Control
- No uniquely identified devices
- No device-to-device communication: everything transmitted is broadcast to every device on the medium
- Layer 2 (Data Link) adds a lot of intelligence on top of Layer 1

2. Layer 2 – Data Link :

Layer 2, also known as the data link layer, is the second layer in the OSI model. It is responsible for the reliable transfer of data between two nodes on the same network segment. This layer defines protocols for the access and use of the physical network, such as addressing, flow control, and error detection and correction.

Main characteristics :
- Adds a controlled access to the physical medium
- Use MAC Addresses for both Source and Destination
- In case of broadcast the Destination MAC address is all FFs “ff:ff:ff:ff:ff:ff”
- The EtherType field in the Frame defines which L3 Protocol is used “For Example : Internet Protocol IP or Address Resolution Protocol ARP”
- FCS stands for Frame Check Sequence: an error-detecting code appended to each frame so the receiver can detect corruption. Frames carry payload data from a source to a destination.
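The frame fields listed above can be made concrete with a short Python sketch. It assumes a plain Ethernet II frame without a VLAN tag; the sample frame, its source MAC address, and the helper names `format_mac` and `parse_ethernet_header` are invented for illustration.

```python
import struct

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
# A few well-known EtherType values and the Layer 3 protocol they announce.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Split the first 14 bytes of a frame into destination, source and EtherType."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header needs at least 14 bytes")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    dst_mac = format_mac(dst)
    return {
        "dst": dst_mac,
        "src": format_mac(src),
        "ethertype": ETHERTYPES.get(ethertype, hex(ethertype)),
        "is_broadcast": dst_mac == BROADCAST_MAC,
    }


# Hand-built sample: broadcast destination, made-up source MAC, EtherType ARP,
# followed by 28 zero bytes standing in for the payload.
sample_frame = bytes.fromhex("ffffffffffff" "02004c4f4f50" "0806") + b"\x00" * 28
header = parse_ethernet_header(sample_frame)
print(header)
```

The broadcast check is just a comparison against the all-FF address, which is what a switch or network card effectively does when deciding whether every device should receive the frame.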

Advantages :
- Identifiable devices
- Media Access Control
- Collision Detection
- Unicast 1:1
- Broadcast 1:All
- Switches – like hubs with superpowers (Layer 2)

3. Layer 3 – Network Layer :

Layer 3, also known as the network layer, is the third layer in the OSI model. It is responsible for the end-to-end delivery of data between devices on different network segments. This layer defines protocols for logical addressing, routing, and fragmentation of data packets to ensure that they are delivered to the correct destination across different networks.

Layer 3 connects multiple local Layer2 LANs widely together using IP Addresses by handling logical addressing, routing, and forwarding of packets across different networks.

How does a router determine its next hop ?
- Routers communicate with each other at Layer 2: the source router wraps the packet in a frame and adds the source and destination MAC addresses. The packet itself is not changed by this encapsulation; only the frame is replaced at each hop.
- The MAC Address of the Destination router can be obtained using Address Resolution Protocol ARP

Layer 3 Summary :
- IP Addresses (IPv4/v6) – Cross Network Addressing
- ARP – Find the MAC Address for this IP
- Route – Where to forward this packet
- Router Tables – Multiple Routes
- Router – Moves packets from SRC to DST – Encapsulating in L2 on the way
- Device to Device communication over the internet
- No method for channels of communications, just SRC IP to DST IP
- Packets can be delivered out of order; ordering has to be handled at Layer 4
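The "Route" and "Router Tables" items can be illustrated with a small longest-prefix-match lookup, which is how a router typically picks between several matching routes. This is a minimal sketch: the routing table entries and next-hop addresses are hypothetical, and real routers also consider metrics and route sources.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),    # default route
    (ipaddress.ip_network("10.0.0.0/16"), "10.0.0.1"),
    (ipaddress.ip_network("10.0.5.0/24"), "10.0.5.1"),
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The route with the longest prefix is the most specific match.
    best_net, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop


print(next_hop("10.0.5.20"))
print(next_hop("8.8.8.8"))
```

Because the default route `0.0.0.0/0` matches every IPv4 address, the lookup always finds at least one candidate, and more specific prefixes win whenever they apply.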

4. Layer 4 & 5 – Transport & Session :

Layers 4 and 5, also known as the transport and session layers respectively, are responsible for managing the end-to-end communication between applications running on different devices.

Layer 4, the transport layer, is responsible for ensuring that data is delivered reliably and efficiently from one device to another. It does this by establishing connections between applications on different devices, and managing flow control and error recovery. This layer defines two main protocols: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

Layer 5, the session layer, is responsible for managing the communication sessions between applications running on different devices. It defines protocols for session establishment, maintenance, and termination, as well as for managing session security and synchronization.

Why use Layer 4 "Layer3 Problems" ?
- With Layer3 – There are no communication channels – packets have a source and destination IP but no method of splitting by APP or Channel
- No Flow Control – if the source transmits faster than the destination can receive it can saturate the destination causing packet loss
- Each Packet is routed independently – Per Packet routing can introduce delays to packets en route. Different packets can experience different delays
- Routing Decisions are per packet – Different routes can result in Out of Order packet at the destination. L3 provides no ordering mechanism
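To show what Layer 4 adds on top of these Layer 3 gaps, here is a simplified sketch of how sequence numbers let a receiver restore order and spot missing data. It is only a model of the idea behind TCP reassembly, not the real protocol: the `reassemble` function and its tuple format are assumptions made for this example.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) pairs.

    Segments may arrive out of order or duplicated. Returns the contiguous
    data starting at sequence 0 and the sequence number still awaited.
    """
    by_seq = {}
    for seq, payload in segments:
        if payload:                          # skip empty segments
            by_seq.setdefault(seq, payload)  # ignore duplicates

    stream = bytearray()
    expected = 0
    while expected in by_seq:
        payload = by_seq[expected]
        stream += payload
        expected += len(payload)             # next byte we are waiting for
    return bytes(stream), expected


# Segments arrive out of order and one of them is duplicated.
data, awaited = reassemble([(6, b"world"), (0, b"hello "), (6, b"world")])
print(data, awaited)
```

If a segment is missing, the function stops at the gap and reports the sequence number it is still waiting for, which is roughly the information a TCP receiver sends back in its acknowledgements.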

Transmission Control Protocol (TCP) vs User Datagram Protocol (UDP) :

Factor	TCP	UDP
Best For	Web browsing (HTTP/HTTPS), file transfer, SSH, email (SMTP)	Video chat, live streaming, online gaming
Connection type	Connection-oriented: a connection must be established before data is transferred	Connection-less: no connection is needed to start and end a data transfer
Data sequence	Can sequence data (send in a specific order)	Cannot sequence or arrange data
Data retransmission	Can retransmit data if packets fail to arrive	No data retransmitting. Lost data can’t be retrieved
Delivery	Delivery is guaranteed	Delivery is not guaranteed
Check for errors	Thorough error-checking guarantees data arrives in its intended state	Minimal error-checking covers the basics but may not prevent all errors
Broadcasting	Not supported	Supported
Speed	Slow, but complete data delivery	Fast, but at risk of incomplete data delivery

Network Address Translation :

NAT is a fundamental technology used in computer networking to allow multiple devices to share a single public IP address. This is essential for businesses and individuals who have more devices than available public IP addresses.

NAT is designed to overcome IPv4 shortages
Also provides some Security Benefits
Translates private IPv4 Addresses to Public
Types :
- Static NAT : 1 private IP to 1 fixed public IP (this is how AWS IGW works)
- Dynamic NAT : 1 private IP to the first available public IP. The router maintains a NAT table that maps private IP : public IP. Public IP allocations are temporary and come from a public IP pool
- Port Address Translation (PAT) : many private to 1 public (AWS NAT Gateway & Home Router)
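Port Address Translation is easier to picture with a toy translation table. The sketch below assumes a single shared public IP and hands out public ports sequentially; the `PatTable` class, its method names, and the addresses are hypothetical, and real NAT devices also track protocol, remote endpoint, and timeouts.

```python
import itertools


class PatTable:
    """Maps (private IP, private port) pairs to ports on one shared public IP."""

    def __init__(self, public_ip: str, first_port: int = 1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet, reusing an existing mapping."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            public_port = next(self._ports)
            self._outbound[key] = public_port
            self._inbound[public_port] = key
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port: int):
        """Find which private host a reply belongs to, or None if unknown."""
        return self._inbound.get(public_port)


nat = PatTable("203.0.113.7")
print(nat.translate_outbound("10.0.0.10", 51000))
print(nat.translate_outbound("10.0.0.11", 51000))
```

Two private hosts using the same source port end up on different public ports, which is what lets many devices share one public address. A reply to a port with no mapping has nowhere to go, which is the source of the security side effect mentioned above.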

Distributed Denial of Service (DDoS) :

A Distributed Denial of Service (DDoS) attack is a type of cyber attack where multiple compromised devices, often from different locations, are used to flood a targeted website or network with traffic, causing the website or network to become unavailable to its users. In a DDoS attack, the attackers usually exploit vulnerabilities in the devices or applications to gain control over them and use them to generate a massive amount of traffic. DDoS attacks can be financially motivated or politically motivated, and they can cause significant damage to the targeted organization or individual, such as loss of revenue, loss of reputation, or even loss of sensitive data.

Attacks designed to overload websites
Compete against legitimate connections
Distributed – hard to block individual IPs/ranges
Examples :
- Application Layer – HTTP Flood
- Protocol Attack – SYN Flood
- Volumetric – DNS Amplification

Encryption :

Encryption is the process of converting data (plaintext) into a form known as ciphertext that is unreadable to unauthorized users. It is an essential technique for protecting the confidentiality and integrity of data both in transit and at rest.
Encryption is a two-way process. Data can be encrypted (to protect it) and later decrypted (to access it).

- Encryption at Rest :

Definition: Encryption at rest refers to the protection of data stored on a disk, database, or any storage medium when it is not actively being used or accessed.
Purpose: The goal is to protect data from unauthorized access in case the storage medium is compromised (e.g., if a hard drive is stolen or a server is hacked).
How It Works: Data is encrypted before it is written to storage and decrypted when it is read back into memory. Common methods include full-disk encryption, database encryption, and file-level encryption.
Examples: Encrypting files on a hard drive, encrypting a database, or using a cloud provider's storage encryption features.
Tools/Technologies: BitLocker (Windows), FileVault (macOS), Transparent Data Encryption (TDE) for databases, Amazon S3 Server-Side Encryption.

- Encryption in Transit :

Definition: Encryption in transit refers to the protection of data as it moves across a network, from one system to another, such as from a client to a server.
Purpose: The goal is to protect data from being intercepted or tampered with while it is being transmitted across potentially insecure networks (e.g., the Internet).
How It Works: Data is encrypted before it is sent over the network and decrypted upon arrival. Secure protocols like TLS/SSL are commonly used to encrypt data in transit.
Examples: HTTPS for secure web browsing, VPNs for secure remote access, SSH for secure command-line access, encrypted emails.
Tools/Technologies: TLS/SSL (used in HTTPS), Secure Shell (SSH), IPsec for VPNs, SMTP over TLS for email.

Different Types of Encryption :

- Asymmetric Encryption :

Definition: Two keys are used—a public key for encryption and a private key for decryption.
Examples: RSA (Rivest–Shamir–Adleman), ECC (Elliptic Curve Cryptography).
Use Cases: Often used for secure key exchange, digital signatures, and in scenarios where secure key distribution is challenging.
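The public/private key relationship can be demonstrated with textbook RSA on tiny numbers. This is a toy sketch for intuition only: the primes are far too small, there is no padding, and real systems should rely on a vetted cryptography library. The variable and function names are chosen for this example.

```python
# Toy RSA with tiny primes: for illustration only, never for real security.
p, q = 61, 53
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def encrypt(message: int, key) -> int:
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key) -> int:
    """Recover the integer with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
print(decrypt(ciphertext, private_key) == message)
```

Anyone may know `public_key`, but only the holder of `private_key` can reverse the operation, which is why asymmetric encryption is practical for key exchange over an untrusted network.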

- Symmetric Encryption :

Definition: The same key is used for both encryption and decryption.
Examples: AES (Advanced Encryption Standard), DES (Data Encryption Standard).
Use Cases: Faster and efficient for large amounts of data, but key management is critical.

- Hybrid Encryption :

Definition: Combines both symmetric and asymmetric encryption to leverage the benefits of both.
Examples: TLS (Transport Layer Security) in HTTPS uses asymmetric encryption to exchange a symmetric key, which is then used for the session.

- Envelope Encryption :

Envelope encryption is a technique that secures data with more than one layer of keys. A symmetric data encryption key (DEK) is generated for each piece of data or user and used to encrypt the data. The DEK is then itself encrypted with a separate key known as the key encryption key (KEK). The KEK can be symmetric or asymmetric and is usually kept in a key management service, so it never has to be stored next to the data. The encrypted data and the encrypted DEK are then stored, commonly side by side, while the plaintext DEK is discarded; only someone with access to the KEK can recover it.
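The flow can be sketched in a few lines of Python. Since the standard library has no AES, the example uses a toy XOR keystream cipher as a stand-in; `toy_cipher`, `envelope_encrypt`, and `envelope_decrypt` are hypothetical names, the toy cipher is not secure, and in practice the KEK would live in a key management service rather than in a local variable.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Stand-in for AES; NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(kek: bytes, plaintext: bytes) -> dict:
    """Encrypt the data with a fresh DEK, then wrap the DEK with the KEK."""
    dek = secrets.token_bytes(32)
    return {
        "ciphertext": toy_cipher(dek, plaintext),
        "encrypted_dek": toy_cipher(kek, dek),
    }


def envelope_decrypt(kek: bytes, envelope: dict) -> bytes:
    """Unwrap the DEK with the KEK, then decrypt the data with the DEK."""
    dek = toy_cipher(kek, envelope["encrypted_dek"])
    return toy_cipher(dek, envelope["ciphertext"])


kek = secrets.token_bytes(32)  # stands in for a key held by a key management service
envelope = envelope_encrypt(kek, b"customer record")
print(envelope_decrypt(kek, envelope))
```

The benefit of the design is that bulk data is only ever encrypted with the DEK, so rotating or protecting the KEK means re-wrapping a 32-byte key rather than re-encrypting all of the data.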

Encryption In Transit "SSL & TLS" :

Secure Socket Layer (SSL) and Transport Layer Security (TLS) are two cryptographic protocols used to provide secure communication over the internet. SSL was developed by Netscape in the mid-1990s and TLS is its successor. These protocols are used to secure web traffic, email, instant messaging, and other types of internet traffic. SSL and TLS use a combination of symmetric and asymmetric encryption to encrypt data, ensuring that information transmitted over the internet is secure and cannot be intercepted by unauthorized parties.

Benefits :
- Privacy & Data Integrity Between Client & Server
- Privacy – Communications are Encrypted
- Asymmetric and then Symmetric encryption
- Identity (Server or Client/Server) Verification
- Reliable Connection – Protects against alteration

- How it works ?

Client Hello :
- The client initiates the connection by sending a "Client Hello" message to the server.
- This message includes the client's supported TLS versions, cipher suites (encryption algorithms), and a randomly generated number (client random).
Server Hello :
- The server responds by sending a “Server Hello” message to the client
- It includes the chosen TLS version, the chosen cipher suite and a randomly generated number (server random)
- At this point, the client and the server have agreed on how to communicate, and the client has the server’s public certificate, which contains the server’s public key
- The server can also request a client certificate for mutual authentication.
Server Certificate Verification :
- Trust Chain: Verifies the certificate chain up to a trusted CA. “The Client verifies that the Server Certificate is issued by a Public Certificate Authority trusted by the Operating System or the Browser”
- Expiration: Ensures the certificate is within its valid date range.
- Domain Match: Confirms the certificate is valid for the domain being accessed.
- Revocation: Checks if the certificate has been revoked.
- Signature: Verifies the certificate’s integrity with the CA’s public key
Client Key Exchange :
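- The client generates a "pre-master secret" and encrypts it using the server's public key (from the server's certificate). This describes the classic RSA key exchange used up to TLS 1.2; TLS 1.3 replaces it with an ephemeral Diffie-Hellman key exchange.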
- The encrypted pre-master secret is sent to the server.
Session Keys Generation :
- Using the agreed cipher suite, both the client and the server combine the pre-master secret with the client random and the server random to derive the same session keys “symmetric encryption keys” for encryption and decryption
Finished Messages :
- The client sends a "Finished" message, encrypted with the session key, to confirm that the handshake was successful.
- The server responds with its own "Finished" message, also encrypted.
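In application code the whole handshake is normally delegated to a TLS library. The sketch below uses Python's standard `ssl` module; `describe_tls_connection` is a hypothetical helper written for this guide, and it needs network access, so it is defined here but not called.

```python
import socket
import ssl


def describe_tls_connection(host: str, port: int = 443) -> dict:
    """Perform a TLS handshake with host and report what was negotiated."""
    context = ssl.create_default_context()  # trusts the system CA store
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # server_hostname enables SNI and the domain-match check.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return {
                "tls_version": tls_sock.version(),
                "cipher_suite": tls_sock.cipher()[0],
                "certificate_subject": tls_sock.getpeercert().get("subject"),
            }


# The default context already enforces the certificate checks listed above.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate chain must validate
print(context.check_hostname)                    # certificate must match the domain
```

Calling `describe_tls_connection` with a public hostname would return the negotiated protocol version and cipher suite, and it would raise `ssl.SSLCertVerificationError` if the trust chain, expiry, or domain check failed.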

Hashing :

Hash functions are mathematical algorithms that transform input data into a fixed-length string of characters, called a hash or message digest. Hashing is the process of applying a hash function to data to produce a unique and irreversible representation of the original data.
Hash functions are widely used in computer security and cryptography for data integrity and authentication, digital signatures, password storage, and more.

The main characteristics of a hash function are its one-way property, where it is easy to compute the hash value of the input data but computationally infeasible to reconstruct the original data from the hash value, and its collision resistance, where it is highly unlikely for two different inputs to produce the same hash value.

Examples: SHA-256 and MD5. MD5 is no longer recommended for security purposes because practical collisions are known, that is, two different inputs that produce the same hash/digest.
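A few lines with Python's standard `hashlib` module show the fixed-length and deterministic properties described above. The helper name `sha256_hex` and the inputs are arbitrary choices for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"hello")
long_digest = sha256_hex(b"hello" * 10_000)
changed_digest = sha256_hex(b"hellp")

print(len(short_digest), len(long_digest))   # same length regardless of input size
print(short_digest == sha256_hex(b"hello"))  # same input always gives the same digest
print(short_digest == changed_digest)        # a one-letter change gives a different digest
```

Because the digest cannot be reversed, systems compare hashes rather than original values, for example when verifying that a downloaded file matches its published checksum.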

DNS – Domain Name System :

Turns domain names into IP addresses, which allow browsers to get to websites and other internet resources.

- What happens when you enter a domain name like Netflix.com in your web browser ?

Browser Cache Check “Modern browsers have their own DNS Cache”:
- The browser first checks its internal cache to see if it has a recently resolved IP address for that domain name
- If found, it uses this cached IP address to establish the connection. If not, it proceeds to the next step
Operating System (OS) Cache Check :
- If the browser cache doesn’t contain the IP address, the browser sends a query to the operating system’s DNS resolver (usually part of the networking stack)
- The OS checks its own cache for a recently resolved IP address and, if found, returns the IP to the web browser.
Hosts File Check :
- If the OS cache doesn’t have the IP address, the DNS resolver checks the local “hosts” file.
- If “Netflix.com” is listed in the hosts file, the corresponding IP is returned to the browser.
DNS Query to DNS Server :
- If the IP address isn’t found in the hosts file, the DNS resolver queries the configured DNS Server (often provided by the ISP or a public DNS like Google’s 8.8.8.8)
- The query first goes to DNS resolver server (often called a recursive resolver)
Recursive DNS Resolution :
- If the DNS resolver server doesn’t have the IP address in its cache, it resolves the name on the client’s behalf by querying the DNS hierarchy step by step :
  - Root Name Servers : The DNS resolver contacts one of the Root Name Servers to find out which Name Servers handle the top-level domain “TLD” .com
  - TLD Name Servers : The Root Name Server responds with the addresses of the .com TLD Name Servers
  - Authoritative DNS Server : The DNS resolver then queries a .com TLD server, which responds with the address of the authoritative Name Server for Netflix.com
  - Final DNS Query : The DNS resolver queries the authoritative Name Server for Netflix.com, which returns the IP address of that domain.
Connection Establishment :
- The browser uses the IP address to establish a connection to the web server at www.netflix.com, typically over HTTPS.
Caching the results :
- The resolved IP address is cached at each level (browser cache, OS cache, DNS server cache) to speed up future requests.
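The lookup order described above can be modeled with plain dictionaries. This sketch is a simulation only: every domain, server label, and IP address is made up, a single cache stands in for the browser, OS, and resolver caches, and the `resolve` function is a hypothetical helper rather than a real DNS client.

```python
# All names, addresses and server labels below are made up for illustration.
HOSTS_FILE = {"intranet.local": "10.0.0.5"}
ROOT_SERVERS = {"com": "tld-com"}                          # TLD -> TLD server
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}   # domain -> authoritative server
AUTHORITATIVE = {"ns-example": {"example.com": "203.0.113.10"}}

cache = {}  # one cache standing in for the browser, OS and resolver caches


def resolve(name: str):
    """Return (ip, source) following cache -> hosts file -> DNS hierarchy."""
    if name in cache:
        return cache[name], "cache"
    if name in HOSTS_FILE:
        return HOSTS_FILE[name], "hosts file"

    tld = name.rsplit(".", 1)[-1]
    tld_server = ROOT_SERVERS[tld]               # root points to the TLD server
    auth_server = TLD_SERVERS[tld_server][name]  # TLD points to the authoritative server
    ip = AUTHORITATIVE[auth_server][name]        # authoritative server answers
    cache[name] = ip                             # remember the answer for next time
    return ip, "recursive resolution"
```

Calling `resolve("example.com")` twice walks the full hierarchy the first time and is answered from the cache the second time, which mirrors why repeat visits to a site resolve faster. Unknown names raise a `KeyError` in this simplified model.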

Multi-Tenant Applications :

Designing a multi-tenant web application involves creating an architecture where multiple customers (tenants) share the same application instance, while their data remains isolated and secure.

In other words, multi-tenancy in a SaaS environment means offering a service to multiple groups of users (the so-called tenants) who share a similar experience, while ensuring that each tenant’s data remains isolated and secure so that no tenant can access another tenant’s data.

There are multiple Architectural Models :

The Silo Model “Isolated Tenancy” :
- Description : Each tenant has its own instance of the application, including its own database, application logic and underlying application infrastructure.
- Characteristics :
  - Data Isolation : Complete Physical Isolation
  - Customization : High, as each tenant has its own environment
  - Scalability : Scale by adding more resources
  - Cost : Higher due to separate and non shared resources between different tenants
  - Management Complexity : High, as every tenant infrastructure should be managed individually.
Pool Model :
- Description : All Tenants share the same instance of the application and the underlying database, and data separation is done logically.
- Characteristics :
  - Data Isolation : Logically, data is stored in the same database and partitioned using the tenant_id or data can be stored in different schemas inside the same database
  - Customization : Limited, as all tenants share the same application instance
  - Scalability : High, since resources are shared
  - Cost : Optimized, as resources are shared among multiple tenants with a minimum risk that resources become idle
  - Management Complexity : Lower with a single instance to manage
Bridge Model “Hybrid Model” :
- Description : A Hybrid approach where the application layer is shared between different tenants but each tenant has its own database.
- Characteristics :
  - Data Isolation : Physical Data Isolation but shared application logic
  - Customization : Moderate with shared logic
  - Scalability : Balanced, combining shared and isolated resources
  - Cost : Moderate as application layer is still shared among different tenants
  - Management Complexity : Moderate with shared application layer
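As an illustration of logical isolation in the Pool Model, the sketch below keeps all tenants in one table and scopes every query by `tenant_id`. It uses an in-memory SQLite database purely for convenience; the table layout, tenant names, and the `orders_for_tenant` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, item TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
    [("tenant_a", "laptop"), ("tenant_a", "monitor"), ("tenant_b", "keyboard")],
)


def orders_for_tenant(tenant_id: str) -> list:
    """Every query is scoped by tenant_id so tenants never see each other's rows."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return [item for (item,) in rows]


print(orders_for_tenant("tenant_a"))
print(orders_for_tenant("tenant_b"))
```

The isolation here depends entirely on the application never forgetting the `tenant_id` filter, which is the main trade-off of the Pool Model compared with the physical separation of the Silo and Bridge models.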

Databases :

When to Use NoSQL Databases:
- Flexibility: NoSQL databases are more flexible as they are schema-less or have flexible schemas, making them ideal for unstructured or semi-structured data. This is useful for scenarios where data formats may evolve over time, like user profiles or content management systems.
- Scalability: NoSQL databases are designed for horizontal scaling “adds more servers”, making them better suited for handling large volumes of data and high traffic loads, particularly in distributed environments.
- Consistency: Many NoSQL databases relax strong consistency in favor of eventual consistency because of their distributed nature and peer-to-peer replication
- Performance: NoSQL databases can offer better performance for certain types of workloads, such as real-time data processing, by allowing for denormalized data storage and optimized queries for specific use cases.
- Examples: MongoDB, Cassandra, DynamoDB.
When to Use SQL Databases:
- Structured Data: SQL databases are ideal when working with structured data that fits well into tables with a predefined schema. They are well-suited for applications requiring complex queries, joins, and transactions.
- Easy Querying: Joins make it easy to query across multiple tables through their relationships
- Scalability : Typically scaled vertically by increasing the size of the instance “RAM & CPU”; horizontal scaling (read replicas, sharding) is possible but harder to achieve
- ACID Compliance: SQL databases guarantee ACID (Atomicity, Consistency, Isolation, Durability) properties, making them ideal for applications where data integrity and consistency are critical, such as financial systems.
- ACID Transactions: A SQL transaction is a group of statements executed atomically, meaning that either all of them are executed or none of them are
- Relational Data: When your data is highly relational, and you need to maintain complex relationships between entities (e.g., foreign keys), SQL databases are more appropriate.
- Examples: MySQL, PostgreSQL, Microsoft SQL Server.
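Atomicity is easiest to see with a money transfer. The following sketch uses an in-memory SQLite database from Python's standard library; the `accounts` table, the account names, and the `transfer` and `balance` helpers are invented for this example, and the `CHECK` constraint is what makes an overdraft fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(sender: str, receiver: str, amount: int) -> bool:
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, receiver)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, sender)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balance(name: str) -> int:
    """Read the current balance of one account."""
    return conn.execute("SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()[0]
```

If `transfer("alice", "bob", 500)` is attempted, the credit to bob succeeds first but the debit violates the constraint, so the whole transaction is rolled back and both balances stay unchanged.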

Thank you for reading through the whole article! If you found it helpful, please consider adding a like or sharing it with others who might benefit from it.