Challenge: Office Staff Drowning in Paperwork
Every morning, Mai arrives at the government office where she works. Her desk is already piled high with paper documents - citizen applications, official reports, meeting minutes, and forms that need to be digitized.
Like thousands of office workers, Mai faces an exhausting daily routine:
- 💢Type for hours: Manually transcribing each document into Word
- 💢 Hunt for errors: Reviewing for typos and formatting mistakes
- 😫 Work overtime: Staying late to finish the backlog
- 😫 Feel exhausted: Repeating the same tedious task every single day
A single 5-page official document takes 30-45 minutes to transcribe. With dozens arriving daily, Mai is overwhelmed.
What if there was a better way?
The Vision: AI-Powered Document Processing
I wanted to build a solution where office staff could:
- Take photos with their smartphone camera
- Wait 30 seconds while AI processes them
- Download perfectly formatted Word documents
- Done! Ready to review and distribute
But I had a problem: I didn't know where to start.
- How do I use AWS Bedrock for OCR?
- How do I create properly formatted Word documents?
- How do I preserve Vietnamese document headers?
- How do I build a web interface?
- How do I deploy on AWS?
That's when I turned to Kiro — my AI coding assistant.
Building with Kiro: The Journey
Step 1: The Initial Request
I started with a simple question to Kiro:
Me: "Given the AWS Bedrock Converse API and Claude Sonnet 4, build an AI solution to OCR images from a folder and convert them to Word DOCX files. Prefer Python. Search the internet for references."
Kiro immediately got to work:
Within minutes, Kiro had:
- Fetched AWS Bedrock Converse API documentation
- Researched python-docx library
- Created complete project structure
- Generated working code
Step 2: Fine-Tuning for Vietnamese Documents
The first version worked, but Vietnamese official documents have a special two-column header format that wasn't captured correctly.
My Request:
Me: "The phrase 'CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM' should be in a table with two columns. Left column has organization info, right column has the country header and date."
Kiro's Response:
Kiro immediately:
- Updated the OCR prompt to detect two-column headers
- Modified create_docx.py to parse structured markup
- Created borderless tables for proper formatting
- Tested the changes automatically
# Kiro's clever solution - Structured OCR prompt
STRUCTURED_OCR_PROMPT = """
For Vietnamese official document headers with two columns, output as:
[HEADER_TABLE]
[LEFT]
organization line 1
organization line 2
Số: xx /TB-xxx
[/LEFT]
[RIGHT]
CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM
Độc lập - Tự do - Hạnh phúc
Tp. xxx, ngày xx tháng xx năm xxxx
[/RIGHT]
[/HEADER_TABLE]
"""
Result: Perfect two-column headers!
Step 3: Intelligent Multi-Page Grouping
I wanted images with similar names (doc-1.png, doc-2.png) to be combined into one Word file (doc.docx).
My Request:
Me: "If images have names with the same prefix but different numbers after a hyphen (e.g. tb-1, tb-2), group them into one DOCX file (e.g. named 'tb') with multiple pages."
Kiro's Implementation:
Kiro created a smart filename parser:
def get_prefix_and_page(filename: str) -> tuple:
"""
Extract prefix and page number from filename.
Examples:
'invoice-1.png' -> ('invoice', 1)
'report_2.jpg' -> ('report', 2)
'standalone.png' -> ('standalone', 0)
"""
stem = Path(filename).stem
match = re.match(r'^(.+?)[-_](\d+)$', stem)
if match:
return match.group(1), int(match.group(2))
return stem, 0
Result: Automatic multi-page document creation!
Step 4: Building the Web Interface
Manual CLI was working, but I needed a user-friendly web interface.
My Request:
Me: "Deploy a website that accepts image uploads, stores them in a S3 bucket, processes with Bedrock, outputs DOCX files, and provides download links."
Kiro's Response:
Within minutes, Kiro built:
- Complete Flask web application
- Drag & drop upload interface
- S3 integration for storage
- Session management for multi-user support
- Download functionality with presigned URLs
- HTML templates
@app.route('/upload', methods=['POST'])
def upload_files():
"""Handle file upload and processing"""
files = request.files.getlist('files')
session_id = str(uuid.uuid4())[:8]
# Upload to S3
for file in files:
s3_key = f"images/{session_id}/{filename}"
s3_client.upload_file(local_path, S3_BUCKET, s3_key)
# OCR with Bedrock
text = ocr_image_with_bedrock(local_path)
ocr_results[filename] = text
# Create grouped DOCX files
create_grouped_documents(ocr_results)
Step 5: UI/UX Enhancement
The first UI worked but looked basic. I wanted something modern.
My Request:
Me: "The dashboard icons and colors look unacceptable. Search for modern color schemes and upgrade the UI."
Kiro transformed the interface with:
- Modern dark theme with purple/blue gradients
- Glassmorphism effects on cards
- Animated background patterns
- Mobile-responsive design
- Success animations for feedback
/* Kiro's modern design system */
:root {
--primary: #6366f1;
--secondary: #0ea5e9;
--accent: #8b5cf6;
--gradient: linear-gradient(135deg, #6366f1 0%, #8b5cf6 50%, #0ea5e9 100%);
}
Result: Beautiful professional interface!
You can upload 2 image files with naming convention such as - and click Convert to DOCX.
It will take a while, around 1 minute to complete.
Demo
Please watch this demo at https://haianh-sing.s3.ap-southeast-1.amazonaws.com/2026-01-06+21-56-16.mp4.
What We Built Together
After working with Kiro for just 1 hour, we had a complete, production-ready Intelligent Document Processing platform that would have taken days to build manually.
The Power of AI-Assisted Development
Features Kiro Built
- AI-Powered OCR: Claude Sonnet 4 with 95%+ accuracy
- Mobile-First: Works with any smartphone camera
- Vietnamese Support: Preserves official document formatting
- Multi-Page: Automatic grouping by filename
- Web Interface: Beautiful, responsive dark theme design
- Cloud-Native: AWS S3, Bedrock, EC2 Graviton (ARM64)
- Secure: IAM roles with least-privilege access
- Cost-Effective: $0.003 per page
- Production-Ready: Deployed to AWS with monitoring
Design Decisions & Technical Trade-offs
Why Amazon Bedrock Converse API Instead of Amazon Textract?
A natural question is: Why not use Amazon Textract, AWS native OCR service? The answer comes down to format diversity. Textract works best when document structure is predictable: fixed field positions, consistent layouts, standard form types. Vietnamese administrative documents, however, come in dozens of formats across different government agencies, each with its own header arrangement, table style, and text flow. There is no single template to target.
Amazon Bedrock with Claude Sonnet 4 understands document content rather than document structure. Instead of detecting fields by position, it reads the document the way a human would: inferring what a two-column header means, recognizing that a line starting with "Số:" is a document reference number, and preserving the semantic meaning of each section regardless of how it is laid out. This flexibility is what makes the solution work across the full variety of real-world Vietnamese official documents without requiring any per-format configuration.
In tradeoff, the usage cost for using Bedrock Claude is higher than using Textract.
Limitation: Complex Tables
The solution handles standard document text and the two-column Vietnamese header format well, but complex inline tables remain a known limitation. When a document contains a data table, such as a budget breakdown or a meeting attendance list. The current pipeline extracts the text content but loses the tabular structure in the output DOCX.
I have prototyped a workaround: detecting table regions in the source image, passing them to Claude Sonnet via Bedrock to generate the corresponding Python openpyxl code, building the table as a separate Excel file, and then embedding it back into the Word document. The approach works in isolation but has not yet been integrated into the main pipeline. The next iteration of this solution will include this as an automated step, making the output DOCX fully faithful to the original document structure, tables included.
Getting Started: Deploy Your Own System
Prerequisites
AWS Account Setup:
- AWS Account with Bedrock access in
us-west-2 - Enable Claude Sonnet 4 model in Bedrock console
- Create IAM user with admin access (for initial setup)
Local Development:
- Python 3.9 or higher installed
- AWS CLI configured (
aws configure) - Git installed
Quick Start (5 Minutes)
Deployment on EC2
Step 1: Create IAM Role
# File iam-trust-policy.json containing trust policy
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "S3BucketAccess",
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject",
"s3:HeadBucket"
],
"Resource": [
"arn:aws:s3:::ocr-to-docx",
"arn:aws:s3:::ocr-to-docx/*"
]
},
{
"Sid": "BedrockInvoke",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": "*"
}
]
}
# Create role with trust policy
aws iam create-role \
--role-name Image2Docx-EC2-Role \
--assume-role-policy-document file://iam-trust-policy.json
# Attach permissions policy
aws iam put-role-policy \
--role-name Image2Docx-EC2-Role \
--policy-name Image2Docx-Policy \
--policy-document file://iam-policy.json
# Create instance profile
aws iam create-instance-profile \
--instance-profile-name Image2Docx-EC2-Profile
# Add role to profile
aws iam add-role-to-instance-profile \
--instance-profile-name Image2Docx-EC2-Profile \
--role-name Image2Docx-EC2-Role
Step 2: Launch EC2 Instance
# Launch Graviton instance (ARM64 - cost efficient)
aws ec2 run-instances \
--image-id ami-xxxxxxxxx \ # Amazon Linux 2023 ARM64
--instance-type t4g.micro \
--key-name your-key-pair \
--security-group-ids sg-xxxxxxxxx \
--iam-instance-profile Name=Image2Docx-EC2-Profile \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Image2Docx-Server}]'
Step 3: Configure Security Group
# Allow inbound HTTP on port 8080
aws ec2 authorize-security-group-ingress \
--group-id sg-xxxxxxxxx \
--protocol tcp \
--port 8080 \
--cidr 0.0.0.0/0
# Allow SSH for administration
aws ec2 authorize-security-group-ingress \
--group-id sg-xxxxxxxxx \
--protocol tcp \
--port 22 \
--cidr your-ip-address/32
Step 4: Deploy Application
# SSH to instance
ssh -i "your-key.pem" ec2-user@<EC2-PUBLIC-IP>
# Update system
sudo yum update -y
# Install dependencies
sudo yum install -y git python3-pip
# Clone repository
git clone https://github.com/PNg-HA/Image2Docx.git
cd Image2Docx
# Install Python packages
pip3 install -r requirements.txt
# Run application in background
nohup python3 app.py > app.log 2>&1 &
Step 5: Access Your Application
Open browser and navigate to:
http://<EC2-PUBLIC-IP>:8080
Success! Your intelligent documentation platform is live.
Conclusion: From Vision to Reality in 1 Hour with Kiro
We started with a simple question: "Can we help office staff escape the paperwork trap?"
The answer is a resounding YES. But what's even more remarkable is HOW we did it.
This platform is my contribution to eliminating paperwork drudgery and freeing office workers. But more importantly, this project proves that anyone with vision and AI assistance can build transformative solutions.
What will you build with Kiro?















Top comments (3)
Wonderful
niceeeeeee
🤗