Ever wished your PDFs could just read themselves aloud?
In this project, I built a fully automated, serverless PDF-to-Speech system on AWS: uploading a PDF automatically generates an audio narration.
No servers. No manual processing. Just upload → listen.
In One Line
Upload a PDF to S3 → extract text using Textract → convert text to speech using Polly → save audio back to S3.
Steps I followed:
1. Created a single Amazon S3 bucket with two folders: input/ for PDF uploads and output/ for generated audio files.
2. Configured an IAM role for Lambda with permissions to access S3, Amazon Textract, Amazon Polly, and CloudWatch Logs.
Policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "textract:StartDocumentTextDetection",
        "textract:GetDocumentTextDetection"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "polly:SynthesizeSpeech"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::pdf-narrator-ak/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:*"
      ],
      "Resource": "*"
    }
  ]
}
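The `logs:*` statement above grants more than a function usually needs just to write its own logs. As a hedged sketch, the Python snippet below builds a narrower statement limited to the three actions used by AWS's managed basic execution policy for Lambda. The helper name `build_logs_statement` is hypothetical, and leaving `Resource` at `*` is an assumption made for simplicity, not part of the original setup.

```python
import json


def build_logs_statement():
    """Return an IAM statement limited to the log actions a Lambda function needs."""
    return {
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
        ],
        # Assumption: kept broad here; this could be scoped to the
        # function's own log group ARN instead.
        "Resource": "*",
    }


if __name__ == "__main__":
    # Print the statement so it can be pasted into Policy.json
    print(json.dumps(build_logs_statement(), indent=2))
```

The printed statement could replace the last block of Policy.json if tighter permissions are wanted.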
3. Created an AWS Lambda function (Python 3.10) and attached the IAM role to enable secure service access.
lambda_function.py
import boto3
import time
import uuid
from urllib.parse import unquote_plus

# Clients are created once per container and reused across invocations
textract = boto3.client('textract')
polly = boto3.client('polly')
s3 = boto3.client('s3')


def lambda_handler(event, context):
    # 1️⃣ Get PDF details (object keys in S3 events are URL-encoded, so decode them)
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = unquote_plus(record['object']['key'])
    print(f"PDF received: {key}")

    # Ensure only input folder triggers processing
    if not key.startswith("input/"):
        return {"statusCode": 200, "message": "Not an input file"}

    # 2️⃣ Start Textract job
    response = textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': bucket,
                'Name': key
            }
        }
    )
    job_id = response['JobId']
    print(f"Textract Job ID: {job_id}")

    # 3️⃣ Wait for Textract to finish (the Lambda timeout must be long enough to cover this wait)
    while True:
        result = textract.get_document_text_detection(JobId=job_id)
        status = result['JobStatus']
        if status == "SUCCEEDED":
            break
        elif status == "FAILED":
            raise Exception("Textract failed")
        time.sleep(5)

    # Collect LINE blocks from every result page; longer documents are paginated via NextToken
    lines = []
    while True:
        lines += [b['Text'] for b in result['Blocks'] if b['BlockType'] == "LINE"]
        next_token = result.get('NextToken')
        if not next_token:
            break
        result = textract.get_document_text_detection(JobId=job_id, NextToken=next_token)
    text = " ".join(lines)
    print("Text extraction completed")

    # 4️⃣ Convert text to speech (only the first 3000 characters are narrated,
    # to stay within Polly's per-request text limit)
    speech = polly.synthesize_speech(
        Text=text[:3000],
        OutputFormat="mp3",
        VoiceId="Joanna"
    )

    # 5️⃣ Save MP3 to output folder
    audio_key = f"output/audio_{uuid.uuid4()}.mp3"
    s3.put_object(
        Bucket=bucket,
        Key=audio_key,
        Body=speech['AudioStream'].read(),
        ContentType="audio/mpeg"
    )
    print(f"Audio saved to {audio_key}")

    return {
        "statusCode": 200,
        "message": "PDF converted to speech",
        "audio_file": audio_key
    }
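The handler passes `text[:3000]` to Polly, so anything past the first 3000 characters is never narrated. One possible way around that, sketched below, is to split the text into chunks and synthesize each one in turn. The helpers `split_for_polly` and `synthesize_all` are hypothetical names, not part of the original function. Joining the MP3 bytes of each chunk into one file is also an assumption: it tends to play in common players, but it has not been verified here.

```python
def split_for_polly(text, limit=3000):
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit has to be hard-split
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow the chunk, so start a new one
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_all(polly_client, text, voice_id="Joanna", limit=3000):
    """Synthesize every chunk and return the concatenated MP3 bytes."""
    audio = b""
    for chunk in split_for_polly(text, limit):
        response = polly_client.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id
        )
        audio += response["AudioStream"].read()
    return audio
```

Inside the handler, `synthesize_all(polly, text)` could stand in for the single `synthesize_speech` call, with the returned bytes passed as `Body` to `put_object`. Each extra chunk adds to the run time, so the Lambda timeout would need to allow for it.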
4. Added an S3 event trigger to invoke the Lambda function whenever a PDF is uploaded to the input/ folder.
5. Implemented Lambda logic to extract text from uploaded PDFs using Amazon Textract (asynchronous processing).
6. Processed the extracted text and sent it to Amazon Polly to generate natural-sounding speech.
7. Stored the generated MP3 audio file in the output/ folder of the same S3 bucket.
8. Monitored execution flow, errors, and logs using Amazon CloudWatch.
9. Verified successful execution by downloading and playing the generated audio file from S3.
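The S3 trigger from step 4 can also be set up in code rather than in the console. The sketch below is one way it might look with boto3's `put_bucket_notification_configuration`. The helper names `build_trigger_config` and `attach_trigger` are hypothetical, as is the `.pdf` suffix filter. It also assumes the function already has a resource-based permission that lets S3 invoke it, which the console normally adds for you.

```python
def build_trigger_config(lambda_arn, prefix="input/", suffix=".pdf"):
    """Build an S3 notification config that fires only for PDFs under input/."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            # The prefix filter keeps files written to output/
                            # from re-triggering the function
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def attach_trigger(s3_client, bucket, lambda_arn):
    """Apply the notification config to the bucket (replaces any existing config)."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_trigger_config(lambda_arn),
    )
```

A call such as `attach_trigger(boto3.client("s3"), "pdf-narrator-ak", function_arn)` would apply it, where `function_arn` is the ARN of the Lambda function. Since the audio lands in the same bucket, the prefix filter is what stops the trigger from firing on the function's own output.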
Connect With Me
👤 Akash S
☁️ AWS | Cloud | AI Projects
✍️ Writing about real-world cloud learning






