In this project, I built a fully serverless, event-driven AI pipeline on AWS that automatically converts a PDF document into translated speech audio.
Whenever a PDF is uploaded to an S3 bucket:
- Text is extracted using Amazon Textract
- The extracted text is translated using Amazon Translate
- The translated text is converted into speech using Amazon Polly
- The final audio file is stored back in S3
All of this happens automatically, without any manual trigger or server management.
PDF Upload (Amazon S3)
↓
AWS Lambda (Triggered by S3 event)
↓
Amazon Textract (OCR)
↓
Amazon Translate (Language Translation)
↓
Amazon Polly (Text to Speech)
↓
Audio Output Stored in S3
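The first arrow in the diagram, S3 invoking Lambda, is set up as a bucket event notification. Here is a minimal sketch of that configuration. The function ARN is hypothetical, and the `.pdf` suffix filter is an assumption that the original setup may not use.

```python
def build_notification_config(function_arn, prefix="input/", suffix=".pdf"):
    """Build the NotificationConfiguration dict for an S3 -> Lambda trigger."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                # Fire on every kind of object creation (PUT, POST, multipart upload)
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


# Hypothetical ARN, for illustration only
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:pdf-to-speech"
)

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="pdf-translate-speech-ak",
#     NotificationConfiguration=config,
# )
```

Restricting the trigger to the `input/` prefix matters here because the audio is written back to the same bucket. Without the filter, every file saved to `output/` would invoke the function again.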
AWS Services Used
- Amazon S3 – File storage and event trigger
- AWS Lambda – Serverless compute
- Amazon Textract – Extract text from PDFs
- Amazon Translate – Translate extracted text
- Amazon Polly – Convert text into speech
- AWS IAM – Secure access control
- Amazon CloudWatch – Logging and monitoring
Step-by-Step Project Flow
Step 1: Upload PDF to S3
- A PDF file is uploaded to the input/ folder of the S3 bucket.
- This upload event automatically triggers the Lambda function.
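To make the trigger concrete, here is a small sketch of how the upload event can be unpacked. The helper name `parse_s3_event` and the sample event are made up for illustration. One detail worth knowing is that S3 URL-encodes object keys in event notifications, so a file name with spaces arrives with `+` characters.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return a (bucket, key) pair for every record in an S3 event."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Decode the URL-encoded key back to the real object name
        key = unquote_plus(s3_info["object"]["key"])
        uploads.append((bucket, key))
    return uploads


# Trimmed-down sample event; real events carry many more fields
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "pdf-translate-speech-ak"},
                "object": {"key": "input/my+report+%281%29.pdf"},
            }
        }
    ]
}

print(parse_s3_event(sample_event))
```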
Step 2: Extract Text with Textract
- Lambda starts an asynchronous Textract job to extract text from the uploaded PDF.
- Textract reads the PDF directly from S3. Lambda polls the job until it finishes and then collects the extracted lines.
The text is extracted by Lambda, and the run is logged in CloudWatch.
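Textract returns its results as a list of blocks of different types (for example PAGE, LINE and WORD), and only the LINE blocks are needed to rebuild the text. Below is a minimal sketch of that filtering step. The helper name `collect_lines` and the sample data are illustrative, and `pages` stands for the result pages returned by successive `get_document_text_detection` calls.

```python
def collect_lines(pages):
    """Join the text of every LINE block across Textract result pages."""
    lines = []
    for page in pages:
        for block in page.get("Blocks", []):
            # WORD blocks repeat the same text, so only LINE blocks are kept
            if block.get("BlockType") == "LINE":
                lines.append(block["Text"])
    return " ".join(lines)


# Hand-written sample shaped like Textract output, for illustration only
sample_pages = [
    {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Hello world"},
            {"BlockType": "WORD", "Text": "Hello"},
            {"BlockType": "WORD", "Text": "world"},
        ]
    },
    {"Blocks": [{"BlockType": "LINE", "Text": "Second page"}]},
]

print(collect_lines(sample_pages))
```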
Step 3: Translate the Extracted Text
- The extracted English text is passed to Amazon Translate.
- The text is translated into a target language (for example, Tamil).
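The handler truncates the text at 5,000 characters before translating, so anything beyond that is dropped. One possible alternative is to split the text into pieces and translate each piece separately. The sketch below shows only the splitting. The helper name `split_by_bytes` and the 5,000-byte default are assumptions. The `TranslateText` request size limit is documented as 10,000 bytes of UTF-8, which is worth confirming against the current quotas.

```python
def split_by_bytes(text, max_bytes=5000):
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current, current_size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        if word_size > max_bytes:
            raise ValueError("a single word exceeds max_bytes")
        # One extra byte for the joining space when the chunk already has words
        extra = word_size + (1 if current else 0)
        if current_size + extra > max_bytes:
            # Current chunk is full: close it and start a new one with this word
            chunks.append(" ".join(current))
            current, current_size = [word], word_size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = split_by_bytes("aaa bbb ccc ddd", max_bytes=7)
print(chunks)
```

Each chunk could then be passed to `translate.translate_text` in turn and the results joined with spaces. Splitting on whitespace can cut a sentence in half, which may affect translation quality at the chunk boundaries.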
Step 4: Convert Translated Text to Speech
- The translated text is sent to Amazon Polly.
- Polly generates a natural-sounding MP3 audio file using a language-appropriate voice.
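Choosing a language-appropriate voice can be checked programmatically, since Polly exposes its voice catalogue through `describe_voices`. The sketch below is illustrative. The helper name `find_voices` is made up, and `FakePolly` is a stand-in with hand-written data so the example runs without AWS access. The sketch also ignores pagination of the voice list.

```python
def find_voices(polly_client, language_code):
    """Return the IDs of the Polly voices listed for a language code such as 'hi-IN'."""
    response = polly_client.describe_voices(LanguageCode=language_code)
    return [voice["Id"] for voice in response.get("Voices", [])]


class FakePolly:
    """Stand-in for boto3.client('polly'); the catalogue below is illustrative."""

    def describe_voices(self, LanguageCode):
        catalog = {"hi-IN": [{"Id": "Aditi", "LanguageCode": "hi-IN"}]}
        return {"Voices": catalog.get(LanguageCode, [])}


print(find_voices(FakePolly(), "hi-IN"))
```

With a real client (`boto3.client('polly')`), an empty result or a validation error for the requested language code means no voice is available for that language, and the target language or the voice should be reconsidered.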
Step 5: Store the Audio Output
- The generated MP3 file is saved in the output/ folder of the same S3 bucket.
- The entire process completes automatically in a few seconds.
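The handler names each audio file with a random UUID only, which makes it hard to tell which PDF an MP3 came from. Here is a sketch of one possible naming scheme that keeps the source file name and the target language. The helper name `build_output_key` and the key layout are assumptions, not part of the original project.

```python
import uuid
from pathlib import PurePosixPath


def build_output_key(input_key, target_language, output_prefix="output/"):
    """Derive an output key like output/<pdf name>_<lang>_<random suffix>.mp3."""
    # S3 keys use forward slashes, so PurePosixPath works on any platform
    stem = PurePosixPath(input_key).stem
    # A short random suffix keeps repeated uploads from overwriting each other
    suffix = uuid.uuid4().hex[:8]
    return f"{output_prefix}{stem}_{target_language}_{suffix}.mp3"


key = build_output_key("input/report.pdf", "ta")
print(key)
```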
Lambda function code and IAM role permissions:
lambda_function.py
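import time
import uuid
from urllib.parse import unquote_plus

import boto3

# Clients are created once per execution environment and reused across invocations
textract = boto3.client('textract')
translate = boto3.client('translate')
polly = boto3.client('polly')
s3 = boto3.client('s3')

# Kept for reference; the handler reads the bucket name from the event
BUCKET_NAME = "pdf-translate-speech-ak"


def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # Object keys in S3 event notifications are URL-encoded
    key = unquote_plus(record['object']['key'])
    print(f"PDF uploaded: {key}")

    # Skip anything outside input/ so files written to output/ are not reprocessed
    if not key.startswith("input/"):
        return {"statusCode": 200, "message": "Not an input file"}

    # Start the asynchronous Textract job; Textract reads the PDF directly from S3
    response = textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': bucket,
                'Name': key
            }
        }
    )
    job_id = response['JobId']
    print(f"Textract Job ID: {job_id}")

    # Poll every 5 seconds until the job succeeds or fails
    extracted_text = ""
    while True:
        result = textract.get_document_text_detection(JobId=job_id)
        status = result['JobStatus']
        if status == "SUCCEEDED":
            # Results are paginated; follow NextToken to collect every page
            while True:
                for block in result['Blocks']:
                    if block['BlockType'] == "LINE":
                        extracted_text += block['Text'] + " "
                next_token = result.get('NextToken')
                if not next_token:
                    break
                result = textract.get_document_text_detection(
                    JobId=job_id, NextToken=next_token
                )
            break
        elif status == "FAILED":
            raise Exception("Textract failed")
        time.sleep(5)
    print("Text extraction completed")

    # Translate English to Tamil; the slice keeps the request under the size limit
    translated = translate.translate_text(
        Text=extracted_text[:5000],  # safeguard
        SourceLanguageCode="en",
        TargetLanguageCode="ta"
    )
    translated_text = translated['TranslatedText']

    # Aditi is a bilingual Hindi / Indian English voice; confirm that the
    # chosen voice supports the target language before relying on the output
    speech = polly.synthesize_speech(
        Text=translated_text[:3000],
        OutputFormat="mp3",
        VoiceId="Aditi"
    )

    # Save the MP3 under output/ with a unique name
    audio_key = f"output/translated_audio_{uuid.uuid4()}.mp3"
    s3.put_object(
        Bucket=bucket,
        Key=audio_key,
        Body=speech['AudioStream'].read(),
        ContentType="audio/mpeg"
    )
    print(f"Audio saved: {audio_key}")

    return {
        "statusCode": 200,
        "audio_file": audio_key
    }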
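The handler waits for Textract in a loop with no upper bound, so a stuck job is only stopped by the Lambda timeout. Since each wait is 5 seconds and the default Lambda timeout is 3 seconds, the function's timeout has to be raised for this to work at all. The following sketch shows bounded polling. The helper name `wait_for_job` and its defaults are assumptions, and `get_status` is any callable that returns the current job status.

```python
import time


def wait_for_job(get_status, timeout_seconds=120, interval_seconds=5,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns SUCCEEDED; raise on failure or timeout."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "SUCCEEDED":
            return status
        if status == "FAILED":
            raise RuntimeError("Textract job failed")
        if clock() >= deadline:
            raise TimeoutError("Textract job did not finish in time")
        sleep(interval_seconds)


# Simulated job that succeeds on the third poll; sleeping is skipped for the demo
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
final_status = wait_for_job(lambda: next(statuses), sleep=lambda seconds: None)
print(final_status)
```

Inside the handler, `get_status` could be `lambda: textract.get_document_text_detection(JobId=job_id)['JobStatus']`, with `timeout_seconds` set safely below the function's configured timeout.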
Role and its policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "textract:StartDocumentTextDetection",
        "textract:GetDocumentTextDetection"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "translate:TranslateText"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "polly:SynthesizeSpeech"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::pdf-translate-speech-ak/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:*"
      ],
      "Resource": "*"
    }
  ]
}
Output Translated Audio:
Connect With Me
👤 Akash S
☁️ AWS | Cloud | AI Projects
✍️ Writing about real-world cloud learning