Micheal Angelo

Posted on Jun 25

Learning AWS by Building a Local Document Processing Pipeline (Without an AWS Account)

#devops #aws #opensource #cloud

Cloud computing often feels difficult to learn because many tutorials focus on individual services in isolation.

You create an S3 bucket in one tutorial, invoke a Lambda function in another, and experiment with DynamoDB somewhere else. While each service makes sense individually, it can still be hard to understand how they work together in a real application.

Instead of learning services one by one, I wanted to build something that connected them.

Even better, I wanted to do it without creating an AWS account or worrying about cloud costs.

That's where Floci, an open-source AWS emulator, came in.

The Goal

The objective wasn't to recreate AWS perfectly.

It was to understand the interaction between services by building a simple document processing pipeline.

The architecture looked like this:

User
   │
   ▼
Upload Document
   │
   ▼
Amazon S3
   │
   ▼
AWS Lambda
   │
   ▼
Extract Metadata
   │
   ▼
Amazon DynamoDB

Although the final automatic Lambda → DynamoDB write couldn't be completed due to a networking limitation inside Floci, the overall architecture mirrors how the same workflow would be built on AWS.

Why Learn AWS Locally?

Running AWS services locally offers several advantages while learning:

No cloud costs
Safe experimentation
Fast iteration
Ability to inspect every component
Easy debugging

Using the AWS CLI against a local endpoint also helped reinforce an important idea:

The AWS CLI is simply a client that sends API requests. Whether those requests go to Amazon's cloud or a local emulator depends on the configured endpoint.
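
The same idea applies to the SDK. Here is a minimal sketch of it in Python, not code from the project: the only thing that separates "talk to AWS" from "talk to the emulator" is the endpoint passed to the client. The address http://localhost:4566 and the dummy credentials are assumptions for illustration, so check your own Floci setup for the real values.

```python
# Assumption: the emulator listens here. Adjust to match your own setup.
LOCAL_ENDPOINT = "http://localhost:4566"


def client_kwargs(local: bool, region: str = "us-east-1") -> dict:
    """Build the keyword arguments for boto3.client()."""
    kwargs = {"region_name": region}
    if local:
        # Assumption: the emulator accepts placeholder credentials.
        kwargs.update(
            endpoint_url=LOCAL_ENDPOINT,
            aws_access_key_id="test",
            aws_secret_access_key="test",
        )
    return kwargs


def make_client(service: str, local: bool = True):
    """Create a boto3 client for the chosen destination.

    boto3 is imported lazily so client_kwargs() can be inspected
    even on a machine where boto3 is not installed.
    """
    import boto3

    return boto3.client(service, **client_kwargs(local))
```

With the CLI, the equivalent switch is the global --endpoint-url option, for example: aws --endpoint-url http://localhost:4566 s3 ls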

What Each Service Taught Me

Amazon S3

The first service I explored was Amazon S3.

Rather than thinking of S3 as generic "cloud storage," I found it much easier to understand as object storage.

A bucket acts as a container, while every uploaded file is stored as an object.

Practical exercises included:

Creating buckets
Uploading files
Listing bucket contents
Downloading objects
Deleting objects

These simple operations clarified how applications persist documents before any further processing occurs.
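
The exercises above can be sketched as one round trip in Python. This is an illustrative sketch rather than the project's code: the function accepts any object that behaves like boto3.client("s3"), and the bucket and key names in the trailing comment are hypothetical.

```python
def s3_round_trip(s3, bucket: str, key: str, body: bytes) -> bytes:
    """Create a bucket, upload, list, download and delete one object.

    `s3` is expected to behave like boto3.client("s3").
    """
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key=key, Body=body)

    # An empty bucket returns no "Contents" key, hence the default.
    listing = s3.list_objects_v2(Bucket=bucket)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    if key not in keys:
        raise RuntimeError(f"{key!r} missing from listing: {keys}")

    downloaded = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3.delete_object(Bucket=bucket, Key=key)
    return downloaded


# Example call against an emulator (endpoint and credentials are assumptions):
#   import boto3
#   client = boto3.client("s3", endpoint_url="http://localhost:4566",
#                         region_name="us-east-1",
#                         aws_access_key_id="test",
#                         aws_secret_access_key="test")
#   s3_round_trip(client, "documents", "hello.txt", b"hello")
```

Passing the client in as a parameter keeps the function independent of where the requests end up, which is the same point the local endpoint makes.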

Amazon DynamoDB

Once files could be stored, the next step was understanding structured data.

Unlike S3, DynamoDB doesn't store files; it stores records, which DynamoDB calls items.

Creating tables, inserting items, retrieving data, and scanning tables helped reinforce the difference between object storage and NoSQL databases.

Instead of storing the document itself, DynamoDB became the place to store information about the document.
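
To make "information about the document" concrete, here is a small sketch of how a metadata record could be converted into the typed attribute format that DynamoDB's low-level put_item call expects. The attribute names and the table name in the comment are hypothetical, and the helper only handles flat string, number and boolean values.

```python
def to_dynamodb_item(metadata: dict) -> dict:
    """Convert a flat dict into DynamoDB's attribute-value format."""
    item = {}
    for name, value in metadata.items():
        if isinstance(value, bool):
            # Checked before int, because bool is a subclass of int.
            item[name] = {"BOOL": value}
        elif isinstance(value, (int, float)):
            # DynamoDB represents numbers as strings in requests.
            item[name] = {"N": str(value)}
        else:
            item[name] = {"S": str(value)}
    return item


# Hypothetical record describing an uploaded document.
record = {
    "document_id": "doc-1",
    "filename": "report.pdf",
    "size_bytes": 2048,
    "uploaded_at": "2024-01-01T00:00:00.000Z",
}

# With a boto3 DynamoDB client, the write would look like:
#   dynamodb.put_item(TableName="documents", Item=to_dynamodb_item(record))
```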

AWS Lambda

Lambda introduced a completely different mindset.

Instead of managing servers, you package your code and upload it as a deployment artifact.

The Lambda function processed uploaded documents and generated metadata such as:

Document ID
Filename
File size
Upload timestamp
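
A handler that produces those four fields might look like the sketch below. It assumes the function is invoked with a standard S3 event notification, which may not match how the project's function is actually triggered, and the output field names are my own.

```python
import uuid
from urllib.parse import unquote_plus


def extract_metadata(record: dict) -> dict:
    """Build a metadata dict from one S3 event notification record."""
    obj = record["s3"]["object"]
    return {
        "document_id": str(uuid.uuid4()),
        # Object keys arrive URL-encoded in S3 event notifications.
        "filename": unquote_plus(obj["key"]),
        "size_bytes": obj["size"],
        "uploaded_at": record["eventTime"],
    }


def lambda_handler(event, context):
    """Lambda entry point: one metadata dict per uploaded object."""
    return [extract_metadata(r) for r in event.get("Records", [])]
```

Keeping extract_metadata free of any AWS calls means it can be tested on a laptop with a hand-written event, before any deployment package is built.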

This was also where I encountered some of the most interesting debugging challenges.

Debugging Was the Real Teacher

Building the project wasn't just about writing code.

It involved understanding how different environments interact.

Some issues I encountered included:

Missing AWS CLI inside the Lambda runtime
Updating deployment packages correctly
Lambda timeout while communicating with DynamoDB
Docker networking behaviour inside Floci

Each issue forced me to understand the difference between:

My Linux machine
Docker containers
Lambda runtime environments
AWS SDK (boto3)
AWS CLI

Those distinctions aren't always obvious from documentation alone, but debugging made them much clearer.
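
One check that helps separate these environments is asking whether an endpoint is reachable at all from wherever the code is running. Inside a container, localhost refers to the container itself, not the host machine, so an address that works from the Linux shell can fail from inside a Lambda runtime container. The sketch below is a generic probe and not a diagnosis of the specific Floci limitation; the URL in the comment is an assumption.

```python
import socket
from urllib.parse import urlparse


def host_and_port(endpoint_url: str) -> tuple:
    """Split an endpoint URL into (host, port), defaulting by scheme."""
    parsed = urlparse(endpoint_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def endpoint_reachable(endpoint_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(host_and_port(endpoint_url), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example: run this from the host and again from inside the container.
#   endpoint_reachable("http://localhost:4566")
```

A short timeout here fails fast, which is easier to read than waiting for the whole function to time out.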

Understanding IAM

IAM was another concept that became easier through practice.

Rather than viewing it as just another AWS service, I started thinking of IAM as the system that answers three questions:

Who is making the request?
What action is being performed?
Is that action allowed?
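
Those three questions can be sketched as a tiny evaluator. This is a deliberately simplified model, not IAM's full evaluation logic: it ignores conditions, principals and the different policy types, and only captures the core rule that an explicit Deny wins, an Allow is needed, and everything else is denied by default. The role statements below are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(statements: list, action: str, resource: str) -> bool:
    """Explicit Deny wins, then any matching Allow, otherwise deny."""
    allowed = False
    for stmt in statements:
        # Action names are matched case-insensitively; resources are not.
        action_match = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(stmt["Action"])
        )
        resource_match = any(
            fnmatchcase(resource, pattern)
            for pattern in _as_list(stmt["Resource"])
        )
        if not (action_match and resource_match):
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


# "Who": statements attached to a hypothetical Lambda execution role.
role_statements = [
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::documents/*"},
    {"Effect": "Allow", "Action": "dynamodb:PutItem",
     "Resource": "arn:aws:dynamodb:*:*:table/documents"},
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::documents/secret/*"},
]
```

For example, is_allowed(role_statements, "s3:DeleteObject", "arn:aws:s3:::documents/a.pdf") is False because no statement allows that action.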

Learning about users, groups, policies, and roles also clarified why Lambda functions execute with an IAM role instead of inheriting permissions automatically.

The Bigger Picture

One realization stood out throughout the project:

AWS services aren't able to talk to each other merely because they're "inside AWS."

They communicate through well-defined APIs.

Whether the services are running in Amazon's cloud or emulated locally, the interaction model remains largely the same.

Understanding those interactions felt much more valuable than memorizing individual commands.

What This Project Reinforced

Working through this pipeline reinforced several ideas:

Building teaches more than reading documentation.
Debugging is part of learning cloud computing.
IAM is fundamentally about identities and permissions.
Lambda runs inside an isolated execution environment.
Cloud services are loosely coupled and communicate through APIs.

Final Thoughts

Cloud computing can seem overwhelming because of the sheer number of services available.

Building even a small end-to-end workflow makes those services feel much less abstract.

By connecting object storage, serverless compute, databases, and identity management into a single project, I gained a much clearer understanding of how these pieces fit together.

For anyone beginning their cloud journey, building a small pipeline, even locally, can often teach far more than reading documentation alone.

GitHub Repository

If you'd like to explore the project or contribute, here's the repository:

https://github.com/micheal000010000-hub/aws-document-processing-pipeline

Feedback, suggestions, and contributions are always welcome.

DEV Community