When I first started learning cloud computing, I assumed AWS was simply a collection of services like Amazon S3, EC2, Lambda, and DynamoDB.
Over time, I realized that cloud computing is much less about memorizing services and much more about understanding how distributed applications are deployed, communicate, and operate in production.
To better understand these concepts without incurring cloud costs, I built a document processing pipeline locally using FastAPI, Docker, Docker Compose, Floci (LocalStack), Amazon S3, Amazon SQS, DynamoDB, and Nginx.
The objective wasn't to reproduce AWS perfectly; it was to understand the architectural principles that stay the same whether an application runs on a laptop or on an EC2 instance.
The complete project is available on GitHub:
https://github.com/micheal000010000-hub/aws-document-processing-pipeline/tree/release/v7.0
Looking Beyond Individual AWS Services
One realization stood out while working on this project.
Learning individual services is useful, but understanding how they collaborate is far more valuable.
In production, applications are rarely a single process.
Instead, they consist of multiple independent components, each responsible for a specific task:
- Web servers
- Application servers
- Object storage
- Databases
- Message queues
- Background workers
Understanding how these components communicate is what transforms cloud services from isolated tools into a complete system.
Thinking About the Internet
The Internet allows computers located anywhere in the world to communicate using the TCP/IP protocol suite.
Now imagine that instead of communicating with another personal computer, your browser is communicating with a Linux server running continuously inside a cloud provider's data center.
That machine has:
- CPU
- Memory
- Storage
- Network connectivity
and behaves just like any other Linux computer.
An Amazon EC2 instance is essentially one such virtual Linux machine.
Deploying an application simply means copying your application onto that server and running it.
From Domain Name to Server
Suppose a user visits:
https://example.com
Before any application receives the request, the browser performs a DNS lookup.
DNS translates a human-readable domain name into a public IP address.
Once the IP address is known, the browser establishes a connection with the destination server.
This entire process happens before the application itself becomes involved.
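The lookup step can be reproduced from Python. This is a minimal sketch, assuming the standard library's `socket.getaddrinfo`, which asks the operating system's resolver in much the same way a browser's network stack would; the helper name `resolve` is made up for illustration.

```python
import socket


def resolve(hostname: str, port: int = 443) -> list[str]:
    """Return the unique IP addresses the system resolver reports for hostname."""
    # getaddrinfo delegates to the operating system's resolver, which
    # consults the hosts file and DNS according to the local configuration.
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as text.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve("example.com")` on a machine with Internet access returns the public addresses for that domain. Nothing about the application behind those addresses is involved yet.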
IP Addresses, MAC Addresses, and NAT
Every network packet contains important addressing information.
At the network layer:
- Source IP Address
- Destination IP Address
At the data-link layer:
- Source MAC Address
- Destination MAC Address
One concept that became much clearer while studying networking was that MAC addresses are only meaningful within a local network segment.
A packet travelling across the Internet does not keep the same MAC addresses from end to end.
Each router forwards the packet by replacing the Layer 2 addressing information while preserving the end-to-end IP addresses.
Home routers add one more step, Network Address Translation (NAT): they replace the private source IP address with the router's public address before forwarding traffic onto the Internet.
Understanding these concepts made cloud networking feel much less mysterious.
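The hop-by-hop behaviour described above can be modelled in a few lines. This is a toy simulation, not real packet handling: the `Frame` class, the `route` and `nat_and_route` helpers, and every address are invented for illustration (the IP addresses come from ranges reserved for documentation).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def route(frame: Frame, router_mac: str, next_hop_mac: str) -> Frame:
    """Forward a frame: new Layer 2 addresses, unchanged IP addresses."""
    return replace(frame, src_mac=router_mac, dst_mac=next_hop_mac)


def nat_and_route(frame: Frame, router_mac: str, next_hop_mac: str,
                  public_ip: str) -> Frame:
    """Forward like route(), and also rewrite the private source IP (NAT)."""
    return replace(route(frame, router_mac, next_hop_mac), src_ip=public_ip)


# A laptop on a home network sends a frame to its default gateway.
lan_frame = Frame(src_mac="02:00:00:00:0a:01", dst_mac="02:00:00:00:0a:fe",
                  src_ip="192.168.1.10", dst_ip="198.51.100.7")

# The home router applies NAT and forwards to the next hop.
after_home_router = nat_and_route(lan_frame, "02:00:00:00:0b:01",
                                  "02:00:00:00:0b:fe", public_ip="203.0.113.5")

# A later router only swaps the MAC addresses.
after_next_router = route(after_home_router, "02:00:00:00:0c:01",
                          "02:00:00:00:0c:fe")
```

The destination IP is identical in all three frames, the MAC addresses differ in every one, and the source IP changes exactly once, at the NAT boundary.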
Why Ports Exist
Knowing the destination IP address isn't enough.
A Linux server may be running many different applications simultaneously.
For example:
22 → SSH
80 → HTTP
443 → HTTPS
5432 → PostgreSQL
6379 → Redis
8000 → FastAPI
Each network service listens on one or more specific ports.
When packets arrive, the Linux kernel examines the destination port and delivers the data to the process listening on that port.
This explains why URLs such as:
http://localhost:8000
explicitly communicate with a FastAPI application running on port 8000.
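To see the kernel dispatching by port, here is a small sketch using only the standard library. It starts two tiny TCP services on the loopback interface and lets the kernel pick free ports; the service names and the helpers `start_service` and `ask` are hypothetical and not part of the project.

```python
import socket
import threading


def start_service(name: str) -> tuple[int, threading.Thread]:
    """Start a one-shot TCP service on a free loopback port; return its port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the kernel for any free port
    listener.listen()
    port = listener.getsockname()[1]

    def serve_once() -> None:
        # Accept a single connection, send a greeting, then shut down.
        conn, _ = listener.accept()
        with conn:
            conn.sendall(f"hello from {name}".encode())
        listener.close()

    thread = threading.Thread(target=serve_once, daemon=True)
    thread.start()
    return port, thread


def ask(port: int) -> str:
    """Connect to a loopback port and return everything the service sends."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        chunks = []
        while data := client.recv(1024):
            chunks.append(data)
    return b"".join(chunks).decode()
```

Starting two services, for example `start_service("api")` and `start_service("cache")`, and calling `ask` with each returned port yields a different greeting per port, even though both connections target the same IP address.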
Local Development
During development, frontend and backend applications often run independently.
For example:
Frontend → localhost:3000
Backend → localhost:8000
This setup is convenient because developers can work on each application separately.
However, exposing multiple ports directly to users becomes impractical in production.
Why Production Looks Different
Imagine asking users to remember:
example.com:3000
example.com:8000
example.com:9090
for different services.
That quickly becomes difficult to manage.
Instead, production systems expose a single public entry point.
This is where reverse proxies become important.
Understanding Nginx
One of the concepts that initially seemed confusing was the idea of a reverse proxy.
Eventually, it became much simpler after realizing that Nginx is simply another Linux process listening on ports 80 and 443.
Rather than exposing every application individually, only Nginx is exposed.
It receives incoming requests and forwards them internally to the correct application.
Conceptually:
Browser
│
▼
Nginx
│
├────────► Frontend
│
└────────► FastAPI
The browser communicates only with Nginx.
The backend applications remain hidden from direct Internet access.
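Besides routing requests, Nginx provides a single place to manage TLS certificates, access rules, and routing configuration, which reduces the number of services exposed to the Internet and simplifies deployment.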
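The routing decision itself is simple enough to sketch in Python. This is only a model of the idea, not how Nginx is configured or implemented: the `/api/` prefix and the `choose_upstream` helper are assumptions, and the upstream ports are borrowed from the local development example.

```python
# Hypothetical routing table: path prefix -> internal upstream address.
ROUTES = [
    ("/api/", "http://localhost:8000"),  # FastAPI backend (assumed prefix)
    ("/", "http://localhost:3000"),      # frontend, acting as the catch-all
]


def choose_upstream(path: str) -> str:
    """Return the upstream for a request path using longest-prefix matching."""
    matches = [(prefix, upstream) for prefix, upstream in ROUTES
               if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path!r}")
    # The longest matching prefix wins, similar in spirit to how Nginx
    # chooses among prefix locations.
    _, upstream = max(matches, key=lambda match: len(match[0]))
    return upstream
```

A request for `/api/documents` is sent to the backend, while `/index.html` falls through to the frontend. The browser sees one host and one port either way.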
Understanding HTTPS
HTTPS initially sounded like a completely different protocol.
In reality, it is simply:
HTTP + TLS
A Certificate Authority (CA) issues a certificate for a domain after verifying that the requester controls it.
That certificate is configured inside Nginx.
When a browser connects:
- A TLS handshake occurs.
- The certificate is validated.
- Encryption keys are negotiated.
- HTTP communication becomes encrypted.
An interesting observation is that the FastAPI application itself remains unchanged.
Nginx performs TLS termination and forwards ordinary HTTP requests to the backend.
This separation allows backend services to focus entirely on application logic.
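The client side of that handshake can be sketched with Python's `ssl` module. This is an illustration of the checks listed above, not code from the project; the helper names are made up, and the handshake function needs network access to a real HTTPS host.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context loads the system's trusted CAs and turns on
    # certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Perform a TLS handshake with host and report the protocol version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # wrap_socket runs the handshake; it raises
        # ssl.SSLCertVerificationError if the certificate chain or the
        # hostname does not check out.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

With TLS terminated at Nginx, this handshake happens between the browser and Nginx only. The FastAPI process behind it never touches a certificate.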
Building the Architecture Locally
Although the application currently runs on a local Linux machine, its architecture closely resembles a production deployment.
Browser
│
▼
Nginx
│
▼
FastAPI
│
├────────► Amazon S3
├────────► Amazon SQS
│               │
│               ▼
│          Background Worker
│               │
▼               ▼
Amazon DynamoDB
Using Floci made it possible to experiment with AWS-compatible services without requiring an active AWS account.
The same architectural principles remain applicable when deploying to EC2.
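The data flow in the diagram can be traced with a small in-memory model. To be clear about the assumptions: this is not the project's code and it does not call AWS or Floci. A dict stands in for the S3 bucket, a `queue.Queue` for the SQS queue, and another dict for the DynamoDB table; the function names, status values, and the byte-counting "processing" step are all hypothetical.

```python
import queue
import uuid

# In-memory stand-ins, NOT real AWS clients.
bucket: dict[str, bytes] = {}             # plays the role of the S3 bucket
jobs: "queue.Queue[str]" = queue.Queue()  # plays the role of the SQS queue
table: dict[str, dict] = {}               # plays the role of the DynamoDB table


def handle_upload(filename: str, content: bytes) -> str:
    """What the API does: store the file, record metadata, enqueue a job."""
    document_id = str(uuid.uuid4())
    bucket[document_id] = content
    table[document_id] = {"filename": filename, "status": "PENDING"}
    jobs.put(document_id)
    return document_id  # the API can respond before any processing happens


def process_next_job() -> bool:
    """What the worker does: take one job, process the file, update metadata."""
    try:
        document_id = jobs.get_nowait()
    except queue.Empty:
        return False
    content = bucket[document_id]
    # Hypothetical processing step: record the size of the file.
    table[document_id].update(status="PROCESSED", size_bytes=len(content))
    return True
```

The file contents live in one store and the metadata in another, and the API and the worker share nothing except the queue and those two stores. That is what allows the worker to run as a separate process or container.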
Lessons Learned
Working through this project connected many concepts that previously felt unrelated.
Some of the most valuable lessons included:
- Understanding Docker and Docker Compose.
- Building a multi-container application.
- Using AWS-compatible services locally through Floci.
- Learning event-driven architecture with Amazon SQS.
- Separating file storage from metadata storage.
- Running background workers independently from the API.
- Understanding Linux processes and ports.
- Connecting DNS, TCP/IP, NAT, and reverse proxies into one mental model.
- Understanding HTTPS and TLS termination.
- Organizing applications using production-style architecture.
Perhaps the biggest realization was this:
Cloud computing is not just about learning cloud services. It is about understanding the systems that connect them.
Once those architectural principles become clear, moving the same application from a local Linux machine to an EC2 instance becomes largely a deployment exercise rather than a redesign.
Final Thoughts
This project fundamentally changed the way I think about cloud computing.
Instead of seeing AWS as a catalog of independent services, I now see it as an ecosystem where networking, Linux, containers, storage, messaging, databases, and application servers collaborate to build scalable systems.
Learning these architectural principles has been far more valuable than memorizing individual services, because those principles remain applicable regardless of the cloud provider or deployment environment.
GitHub Repository
The complete project is available here:
https://github.com/micheal000010000-hub/aws-document-processing-pipeline/tree/release/v7.0
Feedback and suggestions are always welcome.