Opemipo Jokotagba
Secure Web Application Deployment on AWS: A Capstone Journey

As a junior cloud engineer, I recently completed a capstone project deploying a simple web application on AWS. The goal was to demonstrate secure access, automated configuration, high availability with load balancing, and proper documentation. This article outlines the architecture, key decisions, tools used, and the challenges I encountered along the way. By sharing this, I hope to help others navigate similar setups in AWS.

Architecture Diagram

The architecture follows best practices for security and availability in AWS. It includes a Virtual Private Cloud (VPC) with public and private subnets. The bastion host resides in the public subnet for secure SSH access. The two web servers are in private subnets, ensuring no direct internet exposure. An Application Load Balancer (ALB) in the public subnet distributes traffic to the web servers. Ansible automates deployment from my local machine via the bastion.

Here's a visual representation of the setup:

(Note: This diagram illustrates a similar robust AWS VPC design, including public subnets with ALB and NAT gateways, and private subnets with servers. In my project, the bastion is an additional EC2 in the public subnet, and Ansible connects via ProxyJump.)

Why I Used a Bastion Host

A bastion host serves as a secure gateway, or "jump server," between the public internet and private resources in the VPC. In this project, I used it for several reasons:

  • Enhanced Security: The web servers have no public IPs and are isolated in private subnets. Direct SSH from my local machine to the web servers isn't possible without exposing them. The bastion, with a public IP and restricted security group (allowing SSH only from my IP), minimizes the attack surface. All access to private instances routes through it, allowing me to audit and control connections.

  • Compliance and Best Practices: This aligns with AWS security recommendations, like the principle of least privilege. It prevents sensitive servers from being exposed directly to the internet, reducing the risk of brute-force attacks and unauthorized access.

  • Simplified Management: With SSH agent forwarding and ProxyJump configured, I could connect seamlessly from my local machine to the private servers via the bastion, without needing VPNs or more complex setups like AWS Session Manager.

Without the bastion, I'd risk compromising security or complicating access, which defeats the project's objectives.
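The ProxyJump and agent-forwarding setup described above can be captured in ~/.ssh/config. The host aliases, IPs, and key path below are placeholders, not values from the actual project:

```
# ~/.ssh/config — placeholder IPs and key path; substitute your own
Host bastion
    HostName 203.0.113.10          # bastion's public IP (placeholder)
    User ubuntu
    IdentityFile ~/.ssh/capstone-key.pem
    ForwardAgent yes               # propagate the local key to the next hop

Host web1
    HostName 10.0.2.10             # web server's private IP (placeholder)
    User ubuntu
    ProxyJump bastion              # tunnel through the bastion automatically
```

With this in place, a plain ssh web1 lands on the private server in one command, and tools that shell out to SSH (like Ansible) pick up the same behavior.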

How Ansible Simplified Deployment

Ansible played a pivotal role in automating web server configuration, ensuring consistency and efficiency. Here's how it streamlined the process:

  • Declarative Automation: I wrote a simple playbook (deploy.yml) that defined the desired state: install NGINX, start and enable the service, and deploy a custom HTML page with dynamic variables (like hostname and IP). Ansible handled the execution over SSH, eliminating manual steps on each server.

  • Idempotency and Reusability: Running the playbook multiple times doesn't cause issues—Ansible checks if changes are needed before applying them. This was crucial for testing and iterations.

  • Remote Execution via Bastion: Using ProxyJump in the inventory file (inventory.ini), Ansible connected from my local Mac to the private servers through the bastion without installing Ansible on the bastion itself. This kept the setup lightweight.

  • Efficiency Gains: What could have taken hours manually (logging into each server, running apt commands, copying files) was reduced to a single command: ansible-playbook -i inventory.ini deploy.yml. It ensured both servers had identical configurations.
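For context, an inventory wired up for ProxyJump might look like this — the IPs, user, and bastion address are placeholders for illustration, not the project's actual values:

```ini
# inventory.ini — placeholder IPs
[webservers]
10.0.2.10
10.0.3.10

[webservers:vars]
ansible_user=ubuntu
ansible_ssh_common_args='-o ProxyJump=ubuntu@203.0.113.10 -o ForwardAgent=yes'
```

The ansible_ssh_common_args line is what lets Ansible reach the private subnet from a local machine without anything installed on the bastion.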

Overall, Ansible turned a repetitive, error-prone task into a reliable, scalable process, demonstrating automation skills in line with the project objectives.
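A minimal sketch of what a deploy.yml like the one described could contain — the task names, package manager, and template variables here are illustrative, not copied from the actual project file:

```yaml
# deploy.yml — illustrative sketch (assumes Ubuntu/apt hosts)
- name: Configure web servers
  hosts: webservers
  become: true
  tasks:
    - name: Install NGINX
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Start and enable NGINX
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

    - name: Deploy custom page with host facts
      ansible.builtin.copy:
        dest: /var/www/html/index.html
        content: |
          <h1>Served by {{ ansible_hostname }}</h1>
          <p>Private IP: {{ ansible_default_ipv4.address }}</p>
```

Because every module here checks current state before acting, re-running the playbook is safe — the idempotency mentioned above falls out of the module design rather than any extra effort.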

Difference Between Direct EC2 Access vs Load Balancer Access

In this deployment, I enforced that all web traffic goes through the ALB, highlighting key differences from direct EC2 access:

  • Security: Direct access to EC2 instances (e.g., via public IPs) exposes them to the internet, increasing vulnerability to attacks. In my setup, web servers have no public IPs—traffic must pass through the ALB, which acts as a single entry point. Security groups further restrict inbound HTTP to only the ALB's group, preventing unauthorized direct hits.

  • High Availability and Scalability: Direct access ties users to a single instance's IP; if it fails, the app goes down. The ALB distributes traffic across multiple instances (round-robin by default), providing failover. Refreshing the ALB DNS showed alternating server details, proving load balancing. Scaling is easier—add more instances to the target group without changing endpoints.

  • Performance and Features: ALB handles SSL termination, path-based routing, and health checks (ensuring only healthy instances receive traffic). Direct access lacks this; you'd need to manage it on a per-instance basis.

  • User Experience: Users access a single, stable DNS name (the ALB's), not individual IPs. In testing, browsing to the ALB's DNS name over HTTP worked seamlessly, while attempting direct IPs failed as intended.
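The ALB-only ingress rule can be expressed with the AWS CLI; the group IDs below are placeholders for the web-server and ALB security groups:

```
# Allow HTTP to the web servers only from the ALB's security group.
# sg-0webXXXX and sg-0albXXXX are placeholders — substitute your own IDs.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0webXXXX \
  --protocol tcp \
  --port 80 \
  --source-group sg-0albXXXX
```

Using a source security group instead of a CIDR means the rule keeps working even as the ALB's underlying IPs change.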

This setup meets the project's requirement for high availability and secure, indirect access.
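To make the routing behavior concrete, here's a tiny Python sketch of round-robin over healthy targets — the IPs and health states are invented for illustration, and a real ALB's internals are of course far more involved:

```python
from itertools import cycle

def route(targets, n_requests):
    """Send n_requests round-robin across targets passing health checks."""
    healthy = [ip for ip, status in targets.items() if status == "healthy"]
    pool = cycle(healthy)
    return [next(pool) for _ in range(n_requests)]

# Two healthy web servers plus one failing its health check (placeholder IPs).
targets = {
    "10.0.2.10": "healthy",
    "10.0.3.10": "healthy",
    "10.0.2.20": "unhealthy",   # never receives traffic
}

print(route(targets, 4))  # alternates between the two healthy instances
```

This mirrors what the refresh test showed: requests alternate between the healthy instances, and an unhealthy target is simply skipped until its health checks recover.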

Challenges Faced and How I Solved Them

The project wasn't without hurdles, but troubleshooting built my skills. Here are the main ones:

  • SSH Connection Timeouts to Bastion: Initial SSH from my Mac timed out. This was due to the security group not allowing inbound SSH from my IP. Solution: Edited the bastion's SG to add an inbound rule for TCP 22 from "My IP" (AWS auto-detects it).

  • Permission Denied (Publickey) from Bastion to Web Servers: Network connectivity worked, but auth failed because the bastion lacked the private key. Solution: Used SSH agent forwarding (ssh -A) from my Mac, propagating the key through the bastion. For Ansible, added ForwardAgent yes in ~/.ssh/config.

  • Ansible Connectivity Errors ("Connection closed by UNKNOWN port 65535"): Manual SSH worked, but Ansible's ProxyJump failed. Solution: Ensured agent forwarding in inventory args or config file, added StrictHostKeyChecking=accept-new, and tested with verbose ping (ansible ... -m ping -vvv).

  • ALB Health Checks Failing: Targets stayed unhealthy despite NGINX setup. Reason: Web SG didn't allow inbound HTTP from ALB SG. Solution: Added a rule to web SG for TCP 80 from ALB SG. Also verified NGINX with curl localhost and adjusted success codes to 200-399 if needed.

  • NGINX Not Installed Initially: Playbook didn't run due to connection issues. Once fixed, re-running deployed everything automatically—no manual installs.

These challenges centered on debugging security groups, SSH configuration, and tool integration. Patience and verbose logging (e.g., -vvv) were key to resolving them.
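The debugging steps above boil down to a short checklist of commands; the hostnames and IPs are placeholders:

```
# 1. Verify SSH to the bastion works, with agent forwarding enabled
ssh -A ubuntu@203.0.113.10

# 2. From the bastion, confirm a private web server is reachable
ssh ubuntu@10.0.2.10

# 3. Test Ansible connectivity with verbose output
ansible webservers -i inventory.ini -m ping -vvv

# 4. On a web server, confirm NGINX answers locally
curl -I http://localhost
```

Working through this sequence from the outside in isolates each failure point — security group, key forwarding, Ansible configuration, then the service itself — instead of guessing at all four at once.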

In conclusion, this project solidified my understanding of AWS networking, automation, and security. With the app live via ALB and all objectives met, it's a solid foundation for future cloud engineering work. If you're tackling something similar, start with clear security groups and test connections early!
