Hello World 👋
I’m Vasil, a DevOps Engineer with a passion for building reliable, scalable, and well-architected cloud platforms. With hands-on experience across cloud infrastructure, CI/CD, observability, and platform engineering, I enjoy turning complex operational challenges into clean, automated solutions.
I’ve been working with AWS Cloud for over 5 years, and I believe it’s high time I start exploring AI on AWS more deeply. Through these posts, I plan to share practical learnings, real-world experiences, and honest perspectives from my journey in DevOps, Cloud, and now AI.
Without further delay — let’s dive in 🚀
Introduction
Processing large volumes of documents like PDFs, scanned images, invoices, or contracts is a common challenge in many businesses. AWS provides managed services that can significantly reduce the manual effort required to extract meaningful text and context from uncategorized documents.
In this article, we’ll walk through a practical pattern for document processing using AI services on AWS:
Amazon Textract — for optical character recognition (OCR) and structured text extraction
Amazon Bedrock — for applying foundation models for understanding and summarization
Amazon S3 — for document storage
This pattern is useful for building Intelligent Document Processing (IDP) pipelines that go beyond simple OCR, adding structure, insights, and context using AI.
Prerequisites
To follow this article, you need:
- AWS account access
- AWS CLI configured (I’ll be using Cloudshell)
- Permissions for:
  - Amazon S3
  - Amazon Textract
  - Amazon Bedrock
- A document (PDF or image) stored in an S3 bucket
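Before diving in, a quick sanity check can save time. This is a minimal sketch, assuming your CLI profile and default region are already configured; it only confirms which identity you are authenticated as and which Meta models Bedrock exposes in the region (some models also need access to be enabled in the Bedrock console before you can invoke them).
# Confirm which identity the CLI is using
aws sts get-caller-identity
# List the Meta foundation models visible in this region
aws bedrock list-foundation-models --region us-east-1 --by-provider meta --query 'modelSummaries[].modelId'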
Architecture Overview
Here’s the high-level flow we’ll follow:
- Upload document to Amazon S3
- Use Amazon Textract to extract raw text and layout from the document
- Feed the extracted content into Amazon Bedrock foundation models for summarization or structured extraction (note: you could automate this step with a Lambda function, but we'll skip that in this article to keep it simple)
- Store or present the processed results
Step-by-Step Implementation
Step 1: Create an S3 Bucket for Documents
We’ll start by creating an S3 bucket to store input documents and AI outputs.
aws s3 mb s3://aws-textract-demo-vasil
(Replace the suffix with your own name, since bucket names must be globally unique)
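If you prefer, you can pin the bucket to a specific region and block public access while you're at it. This is an optional variation, not something the walkthrough depends on:
# Create the bucket in an explicit region
aws s3 mb s3://aws-textract-demo-vasil --region us-east-1
# Block all public access on the bucket (a sensible default for document storage)
aws s3api put-public-access-block \
  --bucket aws-textract-demo-vasil \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true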
Step 2: Upload Document to S3
Upload your document (PDF, PNG, JPG, etc.) to an S3 bucket.
aws s3 cp path/to/local/document.pdf s3://your-bucket/
Note: If you're using CloudShell like me, you'll first need to upload the file into your CloudShell environment: click Actions in the top-right corner, then 'Upload file'.
For example:
aws s3 cp invoice.pdf s3://aws-textract-demo-vasil/invoice.pdf
You can upload any invoice you have or download a sample invoice pdf/image from Google. For this example, I used my own invoice that I got after a purchase.
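Before moving on, it's worth confirming the object actually landed in the bucket:
# Verify the upload
aws s3 ls s3://aws-textract-demo-vasil/
# Optionally inspect the object's size and content type
aws s3api head-object --bucket aws-textract-demo-vasil --key invoice.pdf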
Step 3: Extract Text with Amazon Textract
Textract converts documents into machine-readable text.
aws textract detect-document-text \
  --region us-east-1 \
  --document '{
    "S3Object": {
      "Bucket": "<your-bucket-name>",
      "Name": "invoice.pdf"
    }
  }' > textract-output.json
In my case:
aws textract detect-document-text \
  --region us-east-1 \
  --document '{
    "S3Object": {
      "Bucket": "aws-textract-demo-vasil",
      "Name": "invoice.pdf"
    }
  }' > textract-output.json
And… drumroll 🥁… we hit our first error!
An error occurred (AccessDeniedException) when calling the DetectDocumentText operation: User: arn:aws:iam::393078901895:user/kk_labs_user_465822 is not authorized to perform: textract:DetectDocumentText with an explicit deny in a service control policy
What’s Happening Here?
In a standard AWS account, this error can be resolved by granting the required Textract permissions via IAM.
However, in managed sandbox environments (such as training labs), Textract APIs are often explicitly restricted using service or identity-based policies to control cost and data exposure.
Because of this, the following steps assume a standard AWS account with appropriate permissions.
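For reference, on a personal account the quickest (if coarse) fix is attaching the AWS managed policies to your identity. A rough sketch, assuming an IAM user; swap in least-privilege policies for anything beyond a demo, and note that no IAM grant can override an explicit SCP deny like the one above:
# Demo-level permissions for Textract and Bedrock (not least-privilege)
aws iam attach-user-policy \
  --user-name <your-iam-user> \
  --policy-arn arn:aws:iam::aws:policy/AmazonTextractFullAccess
aws iam attach-user-policy \
  --user-name <your-iam-user> \
  --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess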
Note: The outputs shown below are representative of real Amazon Textract responses.
Sample Textract output
{
  "Blocks": [
    {
      "BlockType": "LINE",
      "Text": "Invoice Number: 12345",
      "Confidence": 99.2
    },
    {
      "BlockType": "LINE",
      "Text": "Date: 2025-12-27",
      "Confidence": 98.7
    },
    {
      "BlockType": "LINE",
      "Text": "Total Amount: $1,234.56",
      "Confidence": 97.9
    }
  ]
}
This produces structured JSON containing:
- detected text
- layout information
- confidence scores
Textract doesn’t just do OCR — it understands document structure. That makes it a much better input for LLMs than raw text extraction tools.
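One practical caveat: the synchronous DetectDocumentText call works on single-page documents. For multi-page PDFs you would typically switch to Textract's asynchronous flow, roughly like this (the job ID placeholder is whatever the first call returns):
# Start an asynchronous text-detection job for a multi-page PDF in S3
aws textract start-document-text-detection \
  --region us-east-1 \
  --document-location '{"S3Object": {"Bucket": "aws-textract-demo-vasil", "Name": "invoice.pdf"}}'
# Once the job succeeds, fetch the results using the returned JobId
aws textract get-document-text-detection --region us-east-1 --job-id <job-id> > textract-output.json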
Step 4: Prepare Text for Bedrock
At this stage, you have raw text from Textract which may include:
- words
- line breaks
- positional information
- nested JSON structures
Your goal is to extract the plain, relevant text and concatenate multiple pages into a single text blob. LLMs work best with clean, structured input, so we focus only on the meaningful lines.
You can do this with a Python script, a CLI tool like jq, or, if you prefer automation, a Lambda function that processes new documents as they arrive (Lambda is optional and not required for this guide). We'll use jq here; a confidence-filtered variation is shown after the output below.
jq -r '.Blocks[] | select(.BlockType=="LINE") | .Text' textract-output.json > cleaned_text.txt
Check the content:
cat cleaned_text.txt
Output:
Invoice Number: 12345
Date: 2025-12-27
Total Amount: $1,234.56
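As a small optional refinement of the jq command above, you can also drop lines Textract was less sure about; the 90% threshold here is arbitrary:
# Keep only LINE blocks with reasonably high confidence
jq -r '.Blocks[] | select(.BlockType=="LINE" and .Confidence > 90) | .Text' textract-output.json > cleaned_text.txt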
Step 5: Feed Textract Output into Amazon Bedrock
We'll use Meta Llama 3.2 (3B Instruct) via Amazon Bedrock to interpret the document.
What We’ll Ask the Model
“Summarize this invoice and extract key details.”
Bedrock CLI Invocation
# Safely read the cleaned text and escape it for JSON
PROMPT=$(jq -Rs . < cleaned_text.txt)
# Build the full prompt with instruction
FULL_PROMPT=$(jq -Rn --argjson txt "$PROMPT" '$txt + "\n\nPlease summarize this invoice and provide only the key details in a structured format (Invoice Number, Date, Total Amount)."')
# Invoke Bedrock
aws bedrock-runtime invoke-model \
  --region us-east-1 \
  --model-id us.meta.llama3-2-3b-instruct-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body "{
    \"prompt\": $FULL_PROMPT,
    \"max_gen_len\": 300,
    \"temperature\": 0.3
  }" \
  response.json
A quick breakdown of what's happening here:
- jq -Rs . < cleaned_text.txt escapes all newlines and quotes in your text.
- jq -Rn --argjson txt "$PROMPT" parses that escaped text back into a plain string and appends your instruction safely as a single JSON string.
- --body ends up as valid JSON, with "prompt" carrying both the document text and your instruction.
This approach avoids bash quoting headaches and ensures the model receives the full instruction intact.
Model Response
Finally, run cat response.json to see the response (a jq one-liner for pulling out just the generated text follows the notes below). Don't be surprised if some of it looks fairly generic. That generic output comes from the model itself, and it happens because:
- The content in cleaned_text.txt is very small or too generic (in our example, just a few lines like Invoice Number: 12345)
- The Llama 3.2 Instruct model interprets short prompts in a “template/illustrative” style and fills in plausible generic text.
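If you'd rather see only the generated text instead of the full JSON envelope, a one-liner does it. This assumes the standard Llama response shape on Bedrock, where the output lives in a generation field alongside token counts and a stop reason:
# Print just the model's generated text
jq -r '.generation' response.json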
What We Just Built
✅ Stored documents in S3
✅ Extracted structured data with Textract
✅ Used an LLM to reason over documents
✅ Built a logical AI pipeline — step by step
All without deploying a single Lambda function.
Where This Goes in the Real World
With automation, this same pipeline can:
- Process invoices automatically
- Extract contract clauses
- Power document search
- Feed ERP or finance systems
- Trigger workflows based on document content
Add Lambda or Step Functions later; the core AI flow stays the same.
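As a taste of that automation, here is a rough, hypothetical sketch of pointing S3 at a processing Lambda whenever a new document arrives. The function name and ARN below are made up, and you would also need to create the function and grant S3 permission to invoke it:
# Hypothetical: invoke a document-processing Lambda on every new upload
aws s3api put-bucket-notification-configuration \
  --bucket aws-textract-demo-vasil \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-document",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'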
Final Thoughts
Well done. Seriously — give yourself a pat 👏
At this point, you’ve successfully used generative AI on AWS:
- No apps
- No infrastructure
- Just clean, composable AI services
And yes — DO NOT forget to clean up resources that you created during this walkthrough to avoid unexpected cost.
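For this walkthrough that mostly means emptying and deleting the demo bucket, since Textract and Bedrock on-demand usage is billed per request:
# Empty the bucket, then delete it
aws s3 rm s3://aws-textract-demo-vasil --recursive
aws s3 rb s3://aws-textract-demo-vasil
If you attached the managed IAM policies earlier, you can detach them the same way with aws iam detach-user-policy.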