When learning cloud computing, it's tempting to jump straight into individual services like Amazon S3, DynamoDB, or Lambda. Understanding each service is important, but I found that the bigger lesson wasn't about the services themselves. It was about how applications are designed to evolve over time.
To explore this, I started building a document processing backend locally using FastAPI, Floci (an open-source AWS emulator), Docker, and boto3.
What began as a simple file upload endpoint gradually evolved into a small document management backend capable of uploading, listing, downloading, and deleting documents while keeping the application architecture modular.
The goal wasn't just to interact with AWS services. It was to understand how good backend design allows applications to grow without requiring major rewrites.
Starting Simple
The application originally had a single responsibility:
Accept a document through an API.
The initial architecture was straightforward:
Client
│
▼
FastAPI
│
▼
uploads/
Uploaded files were simply written to a local directory.
Although this worked, the API endpoint became tightly coupled to the storage implementation.
Changing the storage mechanism later would require modifying the route itself.
Introducing a Service Layer
To reduce that coupling, the storage logic was extracted into a dedicated service.
The architecture became:
Client
│
▼
FastAPI
│
▼
Storage Service
│
▼
Local Storage
Although the application's behavior remained the same, this introduced an important software engineering principle:
Separation of Concerns.
The API became responsible for handling HTTP requests.
The storage service became responsible for managing files.
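As a rough illustration of that split, here is a minimal sketch of what a local storage service could look like. The class name `LocalStorageService`, its method names, and the `uploads` default are assumptions made for this example, not necessarily what the repository uses.

```python
from pathlib import Path


class LocalStorageService:
    """Stores uploaded documents as files in a local directory."""

    def __init__(self, base_dir: str = "uploads") -> None:
        self.base_dir = Path(base_dir)
        # Create the directory up front so save() never fails on a fresh machine.
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, filename: str) -> Path:
        # Basic guard: keep only the final path component so a name such as
        # "../report.txt" cannot be written outside base_dir.
        safe_name = Path(filename).name
        if safe_name in ("", ".."):
            raise ValueError(f"invalid filename: {filename!r}")
        return self.base_dir / safe_name

    def save(self, filename: str, data: bytes) -> str:
        """Write the bytes to disk and return the stored file name."""
        path = self._path_for(filename)
        path.write_bytes(data)
        return path.name

    def load(self, filename: str) -> bytes:
        """Return the stored bytes; raises FileNotFoundError if missing."""
        return self._path_for(filename).read_bytes()
```

With something like this in place, a route only needs to call `storage.save(filename, data)` and never touches the filesystem directly.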
Swapping Local Storage for Amazon S3
Once the storage logic was isolated, replacing the implementation became surprisingly simple.
Instead of saving files locally, the storage service was updated to use Amazon S3 through the AWS SDK (boto3).
The architecture changed to:
Client
│
▼
FastAPI
│
▼
S3 Storage Service
│
▼
Amazon S3 (Floci)
The API endpoints themselves didn't need to change.
Only the storage implementation changed.
That was one of the biggest takeaways from this project.
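To make the swap concrete, the sketch below shows one way an S3-backed service might expose the same methods as a local one. The class and method names are hypothetical. The endpoint URL, region, and dummy credentials in `make_s3_client` are assumptions about a local emulator setup and should be replaced with whatever your Floci instance actually uses.

```python
class S3StorageService:
    """Same public methods as a local storage service, backed by an S3 bucket."""

    def __init__(self, client, bucket: str) -> None:
        # `client` is expected to be a boto3 S3 client (or a test double).
        self.client = client
        self.bucket = bucket

    def save(self, filename: str, data: bytes) -> str:
        self.client.put_object(Bucket=self.bucket, Key=filename, Body=data)
        return filename

    def load(self, filename: str) -> bytes:
        response = self.client.get_object(Bucket=self.bucket, Key=filename)
        return response["Body"].read()

    def delete(self, filename: str) -> None:
        self.client.delete_object(Bucket=self.bucket, Key=filename)

    def list_documents(self) -> list[str]:
        # list_objects_v2 returns at most 1,000 keys per call, so a larger
        # bucket would need pagination. "Contents" is absent when it is empty.
        response = self.client.list_objects_v2(Bucket=self.bucket)
        return [obj["Key"] for obj in response.get("Contents", [])]


def make_s3_client(endpoint_url: str = "http://localhost:4566"):
    """Build a boto3 S3 client pointed at a local emulator.

    The default URL, region, and credentials are placeholders (assumptions).
    """
    import boto3  # imported lazily so the class above works without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        region_name="us-east-1",
        aws_access_key_id="test",
        aws_secret_access_key="test",
    )
```

Passing the client in through the constructor, rather than creating it inside the class, keeps the service easy to point at an emulator, at real AWS, or at a fake object in tests.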
Making Infrastructure Self-Initializing
Another improvement was avoiding manual infrastructure setup.
Rather than assuming the S3 bucket already existed, the application now checks for it during startup and creates it if necessary.
Conceptually:
Application Starts
│
▼
Check Bucket
│
▼
Create If Missing
This makes the application easier to run on a fresh machine and reduces manual setup.
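A check-then-create step like this could be sketched as follows. The function name `ensure_bucket` is hypothetical, and the set of "not found" error codes is an assumption: real S3 reports `404` for `head_bucket` on a missing bucket, but an emulator may use a different code.

```python
def ensure_bucket(client, bucket: str) -> bool:
    """Create `bucket` if it does not exist. Returns True if it was created.

    `client` is expected to be a boto3 S3 client.
    """
    try:
        client.head_bucket(Bucket=bucket)
        return False  # bucket already exists and is reachable
    except client.exceptions.ClientError as error:
        code = error.response.get("Error", {}).get("Code", "")
        # Only treat "not found" as a signal to create the bucket; anything
        # else (for example a permissions error) is re-raised.
        if code not in ("404", "NoSuchBucket", "NotFound"):
            raise

    # Outside us-east-1, real S3 also requires a CreateBucketConfiguration
    # with a LocationConstraint; that is omitted in this sketch.
    client.create_bucket(Bucket=bucket)
    return True
```

Calling a function like this once during application startup, before any request is served, is enough to make a fresh environment usable without manual bucket creation.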
Evolving Beyond File Uploads
Initially, the project focused only on uploading files.
As development progressed, it evolved into a small document management backend.
The application now supports:
- Uploading documents
- Listing stored documents
- Downloading documents
- Deleting documents
The current API exposes endpoints such as:
POST /upload
GET /documents
GET /download/{filename}
DELETE /documents/{filename}
One realization stood out during this stage:
Good architecture doesn't eliminate future changes. It makes them easier to implement.
Because storage logic was already isolated behind a service layer, adding new endpoints required very little modification to the existing code.
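The sketch below shows how thin such routes can be when each one delegates to a storage object. It is an illustration rather than the repository's actual code: `create_app`, the `Storage` protocol, its method names, and the convention that `load` raises `FileNotFoundError` for a missing document are all assumptions. File uploads in FastAPI also require the `python-multipart` package to be installed.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface the routes rely on."""

    def save(self, filename: str, data: bytes) -> str: ...
    def load(self, filename: str) -> bytes: ...
    def delete(self, filename: str) -> None: ...
    def list_documents(self) -> list[str]: ...


def create_app(storage: Storage):
    """Build a FastAPI app whose routes only translate HTTP to storage calls."""
    # Imported lazily so this module can be imported without FastAPI installed.
    from fastapi import FastAPI, HTTPException, Response, UploadFile

    app = FastAPI()

    @app.post("/upload")
    async def upload(file: UploadFile):
        name = storage.save(file.filename or "unnamed", await file.read())
        return {"filename": name}

    @app.get("/documents")
    def list_documents():
        return {"documents": storage.list_documents()}

    @app.get("/download/{filename}")
    def download(filename: str):
        try:
            data = storage.load(filename)
        except FileNotFoundError:
            # Assumed convention: storage raises this for a missing document.
            raise HTTPException(status_code=404, detail="Document not found")
        return Response(content=data, media_type="application/octet-stream")

    @app.delete("/documents/{filename}")
    def delete_document(filename: str):
        storage.delete(filename)
        return {"deleted": filename}

    return app
```

None of the handlers mention a directory, a bucket, or boto3, which is why adding the list, download, and delete endpoints did not disturb the upload route.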
Adding DynamoDB
Managing files solved only part of the problem.
Applications also need to manage information about those files.
To prepare for that, a dedicated DynamoDB service was introduced.
Its responsibilities include:
- Generating unique document identifiers
- Recording upload timestamps
- Managing document metadata
The current architecture looks like this:
                  Client
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
Upload / Download         List / Delete
        │                       │
        └───────────┬───────────┘
                    ▼
                 FastAPI
            ┌───────┴────────┐
            ▼                ▼
       S3 Storage     DynamoDB Service
            ▼                ▼
        Amazon S3     Amazon DynamoDB
At the moment, document storage is fully functional, while the DynamoDB service provides the foundation for future metadata management.
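As a sketch of where that foundation could lead, the example below generates an identifier and timestamp for each upload and writes the record through a table object. The attribute names (`document_id`, `uploaded_at`, and so on), the class name, and the table name mentioned in the comment are assumptions for illustration only.

```python
import uuid
from datetime import datetime, timezone


def build_metadata(filename: str, size_bytes: int) -> dict:
    """Assemble one metadata record for an uploaded document."""
    return {
        "document_id": str(uuid.uuid4()),  # unique identifier per upload
        "filename": filename,
        "size_bytes": size_bytes,
        # ISO 8601 timestamp in UTC, e.g. "2024-01-01T12:00:00+00:00"
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }


class DocumentMetadataService:
    """Records document metadata in a DynamoDB table."""

    def __init__(self, table) -> None:
        # `table` is expected to behave like a boto3 DynamoDB Table resource,
        # for example boto3.resource("dynamodb").Table("documents"), where
        # the table name "documents" is a placeholder.
        self.table = table

    def record_upload(self, filename: str, size_bytes: int) -> dict:
        item = build_metadata(filename, size_bytes)
        self.table.put_item(Item=item)
        return item
```

Keeping `build_metadata` separate from the table write means the record format can be checked without any database at all.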
More Than Learning AWS
Although the project uses services like Amazon S3 and DynamoDB, the most valuable lessons weren't AWS-specific.
It reinforced several software engineering concepts:
- Separation of Concerns
- Service Layer Pattern
- Dependency Isolation
- Infrastructure Initialization
- Modular Backend Design
- Storage Abstraction
These ideas apply regardless of whether the backend eventually uses Amazon S3, Azure Blob Storage, Google Cloud Storage, or even a local filesystem.
The Bigger Lesson
One idea kept resurfacing throughout this project:
Cloud engineering isn't just about learning cloud services.
It's about designing applications that can evolve as requirements change.
The project started as a simple upload endpoint.
Over time it gained support for listing, downloading, and deleting documents without requiring a redesign of the application.
That flexibility came from separating responsibilities early rather than tightly coupling implementation details together.
For example:
Local Storage
│
▼
Amazon S3
The API doesn't need to know which implementation is being used.
As long as the interface remains consistent, the underlying storage mechanism can evolve independently.
That flexibility is what makes production systems easier to extend, test, and maintain.
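One way to see the testing benefit is to write the calling code against an interface and hand it an in-memory stand-in. The names `DocumentStorage`, `InMemoryStorage`, and `store_and_verify` below are hypothetical and exist only for this sketch.

```python
from typing import Protocol


class DocumentStorage(Protocol):
    """The only thing calling code needs to know about storage."""

    def save(self, filename: str, data: bytes) -> str: ...
    def load(self, filename: str) -> bytes: ...


class InMemoryStorage:
    """A stand-in for tests: no disk, no network, no emulator."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def save(self, filename: str, data: bytes) -> str:
        self._files[filename] = data
        return filename

    def load(self, filename: str) -> bytes:
        return self._files[filename]


def store_and_verify(storage: DocumentStorage, filename: str, data: bytes) -> bool:
    """Works with any object that offers save() and load()."""
    storage.save(filename, data)
    return storage.load(filename) == data


# The caller does not change when the implementation does.
print(store_and_verify(InMemoryStorage(), "notes.txt", b"hello"))
```

A local-directory or S3-backed class with the same two methods could be passed to `store_and_verify` without changing it.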
Final Thoughts
Building applications has been one of the most effective ways for me to learn cloud engineering.
Working through real architectural decisions made Amazon S3, DynamoDB, and the AWS SDK feel much more intuitive than reading documentation alone.
More importantly, this project reinforced that good cloud applications are built not just on cloud services, but on sound software engineering principles.
The cloud services may change over time.
A well-designed architecture makes those changes far less painful.
GitHub Repository
The complete project is available here:
Repository: https://github.com/micheal000010000-hub/aws-document-processing-pipeline/tree/Document_Processing_Pipeline
Feedback, suggestions, and contributions are always welcome.