
Sherif sani

Turning My Notes Into Audio with AWS (Serverless + Terraform)

About a month ago, I had this random thought while on the bus to school:

“What if I could just listen to my handwritten notes instead of reading them?”

I wanted something simple and cost-effective, and since I already spend a lot of time working with AWS, it made sense to build it there.

After a bit of brainstorming, I landed on the perfect stack:

  • Amazon Textract → Extract text from my images.
  • Amazon Polly → Turn that text into natural-sounding speech.
  • AWS Lambda + API Gateway → Keep it serverless and cost-friendly.
  • Amazon S3 → Store my uploaded images and generated audio files.

And because I’m all about infrastructure-as-code, I deployed the whole thing with Terraform.

The Architecture

Here’s a high-level look at how it works:

[Architecture diagram]

Simple, serverless, and cheap to run, which is exactly what I needed.

Why These Services?

  • Amazon S3 – Stores images and audio files.
  • Amazon Textract – Extracts text from scanned or photographed documents.
  • Amazon Polly – Turns text into speech.
  • AWS Lambda – Runs the logic without managing servers.
  • API Gateway – Provides a REST API endpoint for the client.
  • Terraform – Automates the entire setup so it’s reproducible.

Provisioning with Terraform

Instead of manually clicking around in the AWS Console, I used Terraform to define everything:

  • S3 buckets for images and audio
  • Lambda function with IAM permissions
  • API Gateway endpoint
  • Access for Textract and Polly

Deployment was as simple as:

terraform init
terraform apply

A few minutes later, everything was live. No console clicking. No missed resources.

How the App Works

  1. The user uploads an image via the API Gateway endpoint.
  2. The image is stored in S3.
  3. Textract extracts the text from the image.
  4. Polly converts the extracted text into an MP3.
  5. The MP3 is saved to another S3 bucket.
  6. A pre-signed URL is generated so the user can listen or download it.
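
The post doesn't include the function code itself, so here is a minimal Python sketch of what the Lambda handler could look like. It assumes an API Gateway proxy integration that receives a base64-encoded image in the request body; the bucket names, field names, and voice are placeholders, not the project's actual implementation.

import base64
import json
import os
import uuid

import boto3

textract = boto3.client("textract")
polly = boto3.client("polly")
s3 = boto3.client("s3")

# Placeholder bucket names -- the real project would wire these in from Terraform.
IMAGE_BUCKET = os.environ.get("IMAGE_BUCKET", "notes-images")
AUDIO_BUCKET = os.environ.get("AUDIO_BUCKET", "notes-audio")


def handler(event, context):
    # API Gateway (proxy integration) hands us the request body as a JSON string.
    body = json.loads(event["body"])
    image_bytes = base64.b64decode(body["image"])

    # Steps 1-2: store the uploaded image in S3.
    image_key = f"{uuid.uuid4()}.jpg"
    s3.put_object(Bucket=IMAGE_BUCKET, Key=image_key, Body=image_bytes)

    # Step 3: extract the text with Textract.
    result = textract.detect_document_text(
        Document={"S3Object": {"Bucket": IMAGE_BUCKET, "Name": image_key}}
    )
    text = "\n".join(
        b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"
    )

    # Step 4: convert the text to speech with Polly.
    # (synthesize_speech caps input at a few thousand characters, so very
    # long notes would need to be split into chunks.)
    speech = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId="Joanna")

    # Step 5: save the MP3 to the audio bucket.
    audio_key = image_key.rsplit(".", 1)[0] + ".mp3"
    s3.put_object(
        Bucket=AUDIO_BUCKET,
        Key=audio_key,
        Body=speech["AudioStream"].read(),
        ContentType="audio/mpeg",
    )

    # Step 6: return a pre-signed URL so the caller can listen or download.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": AUDIO_BUCKET, "Key": audio_key},
        ExpiresIn=3600,
    )
    return {"statusCode": 200, "body": json.dumps({"audio_url": url})}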

When I tested it with a snapshot of my lecture notes, I got an audio file back in seconds. The best part? Since it’s all serverless, I only pay for what I use — perfect for a student budget.
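Calling it from a quick script could look something like this; the endpoint URL and JSON shape are hypothetical and match the handler sketch above, not necessarily the project's real API contract.

import base64
import requests

# Hypothetical endpoint -- substitute the URL that Terraform outputs for the
# API Gateway stage.
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/notes"

# Send the image as base64 in a JSON body and print the returned audio link.
with open("lecture-notes.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

resp = requests.post(API_URL, json=payload, timeout=60)
resp.raise_for_status()
print("Listen here:", resp.json()["audio_url"])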


Final Thoughts

This was a fun way to mix AWS AI services, serverless architecture, and Terraform into a single project. Now I can “read” my notes while walking, commuting, or even cooking — and it costs just a few cents per use.

That said… Polly’s voice still sounds a bit robotic at times. But for this use case, it works well enough, and the build was worth it.

If you want to try it out or see how it’s built:
