Hey there! This is Pratik, a Senior DevOps Consultant with a strong background in automating and optimizing cloud infrastructure, particularly on AWS. Over the years, I have designed and implemented scalable solutions for enterprises, focusing on infrastructure as code, CI/CD pipelines, cloud security, and resilience. My expertise lies in translating complex cloud requirements into efficient, reliable, and cost-effective architectures.
Looking Beyond OCR
Traditional OCR systems are great at extracting text, but they fall short when the task shifts from reading content to recognizing visual patterns. What if your goal isn't to read a document, but to determine whether a specific reference image appears within it, regardless of where it's placed, how it's rotated, or how it's scaled?
In this article, we'll build an AI-powered system using AWS Lambda, S3, and Amazon Bedrock that can:
- Upload a reference image
- Upload a target document (Image)
- Detect whether the reference image exists inside the target file
- Return a true/false result with high accuracy
Architecture Diagram
Architecture Explanation
The system follows a simple serverless flow using Amazon S3, AWS Lambda, and Amazon Bedrock.
Step-by-Step Flow
1. User Uploads Files to S3
- The user uploads:
- A reference image (e.g., logo/signature)
- A target file (image)
- Both are stored in Amazon S3
2. S3 Triggers Lambda
- Once the files are uploaded, an event notification triggers the Lambda function
- Lambda acts as the orchestrator
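As a rough illustration of what the trigger hands to Lambda, here is a minimal sketch of pulling the bucket and object key out of an S3 event notification. The helper name `parseS3Event` is hypothetical and not taken from the repository. S3 URL-encodes keys in event payloads and writes spaces as `+`, so the key is decoded before use.

```javascript
// Sketch only: parseS3Event is a hypothetical helper, not the author's code.
// It returns one { bucket, key } pair per S3 record in the event.
function parseS3Event(event) {
  const records = (event && event.Records) || [];
  return records
    .filter((record) => record.eventSource === "aws:s3")
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // Keys arrive URL-encoded, with spaces encoded as "+".
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }));
}
```

Decoding matters in practice: a file uploaded as `my doc.png` shows up in the event as `my+doc.png`, and a `GetObject` call with the undecoded key would fail with a missing-key error.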
3. Lambda Processes Data
- Fetches:
- Reference image
- Target file
- Converts them into base64 format
- Prepares a structured prompt
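The base64 and prompt steps can be sketched as follows. This assumes an Anthropic Claude vision model on Bedrock, which expects the Messages request format. The prompt wording, the `buildRequestBody` name, and the PNG default are illustrative assumptions, not the author's actual prompt.

```javascript
// Assumed prompt text; the real prompt lives in the author's repository.
const PROMPT = [
  "Image 1 is a reference image. Image 2 is a target document.",
  "Does the reference image appear anywhere in the target,",
  "even if it is rotated, resized, or only partially visible?",
  'Answer with exactly one word: "true" or "false".',
].join(" ");

// Hypothetical helper: turns two raw image buffers into a Messages API body.
function buildRequestBody(referenceBytes, targetBytes, mediaType = "image/png") {
  const toImageBlock = (bytes) => ({
    type: "image",
    source: {
      type: "base64",
      media_type: mediaType,
      data: Buffer.from(bytes).toString("base64"),
    },
  });

  return {
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 10, // the answer is a single word
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Image 1 (reference):" },
          toImageBlock(referenceBytes),
          { type: "text", text: "Image 2 (target):" },
          toImageBlock(targetBytes),
          { type: "text", text: PROMPT },
        ],
      },
    ],
  };
}
```

Labeling each image in the text blocks is a design choice: it gives the model an unambiguous way to tell the reference from the target when both arrive in one message.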
4. Bedrock Performs Analysis
- Lambda sends both images to Amazon Bedrock
- A multimodal model analyzes:
- Shape
- Orientation
- Position
- Partial visibility
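A minimal sketch of the Bedrock call itself, assuming AWS SDK v3 (`@aws-sdk/client-bedrock-runtime`) and an Anthropic-style response body. The function names are hypothetical, and the model ID is left as a parameter because the article does not say which model it uses.

```javascript
// Hypothetical factory: returns a function that sends a request body to Bedrock.
// The SDK is loaded lazily so askModel below can be exercised without it.
function makeBedrockInvoker(modelId, region) {
  const {
    BedrockRuntimeClient,
    InvokeModelCommand,
  } = require("@aws-sdk/client-bedrock-runtime");
  const client = new BedrockRuntimeClient({ region });

  return async (requestBody) => {
    const response = await client.send(
      new InvokeModelCommand({
        modelId, // assumption: the ID of a multimodal model enabled in your account
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify(requestBody),
      })
    );
    return response.body; // raw bytes (Uint8Array)
  };
}

// Sends the request and returns the model's text answer.
// Assumes an Anthropic Messages response: { content: [{ type: "text", text }] }.
async function askModel(invoke, requestBody) {
  const rawBytes = await invoke(requestBody);
  const parsed = JSON.parse(new TextDecoder().decode(rawBytes));
  return (parsed.content || [])
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("")
    .trim();
}
```

Passing the invoker in as an argument keeps `askModel` easy to test locally with a stub before any AWS credentials are involved.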
5. Result Returned
- Bedrock responds with:
- "true" → Image found
- "false" → Image not found
The result will be stored in the S3 bucket.
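Model output is free text, so it helps to validate the answer before storing it. The sketch below is one way to do that. Both helper names and the `results/` prefix are assumptions; the prefix is chosen so that writing the result does not land under `target-files/` and trigger the Lambda again.

```javascript
// Hypothetical helper: accepts only a clean "true" or "false" answer.
function parseVerdict(modelText) {
  const normalized = String(modelText)
    .trim()
    .toLowerCase()
    .replace(/^["'`]+|["'`.]+$/g, ""); // strip surrounding quotes and a trailing period
  if (normalized !== "true" && normalized !== "false") {
    throw new Error(`Unexpected model output: ${modelText}`);
  }
  return normalized;
}

// Hypothetical helper: maps "target-files/doc1.png" to "results/doc1.json".
function resultKeyFor(targetKey) {
  const baseName = targetKey.split("/").pop().replace(/\.[^.]+$/, "");
  return `results/${baseName}.json`;
}
```

Throwing on anything other than a clean verdict is deliberate: a failed invocation is easier to notice and retry than a wrong answer silently written to the bucket.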
Step-by-Step Implementation
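1️⃣ Create S3 Bucket Structure
Go to Amazon S3 and create a bucket named "image-matching-bucket". Bucket names are globally unique, so if this one is taken, pick another and use it consistently in the steps that follow.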
Inside it, create folders:
reference-images/
target-files/
Upload:
• Reference image → reference-images/logo.png
• Target file → target-files/doc1.png
2️⃣ Create IAM Role for Lambda
Go to AWS Identity and Access Management (IAM) and create a role with:
Permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::image-matching-bucket/*"
    },
    {
      "Action": [
        "bedrock:InvokeModel",
        "aws-marketplace:Subscribe",
        "aws-marketplace:ViewSubscriptions"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
3️⃣ Create Lambda Function
Go to AWS Lambda
- Runtime: Node.js 18+
- Attach IAM role created above
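To give a feel for the shape of such a function, here is a minimal handler sketch focused on the S3 reads and writes. It is not the repository's code. It assumes AWS SDK v3 (`@aws-sdk/client-s3`), a fixed reference key of `reference-images/logo.png`, a `results/` output prefix, and a `detect` function (hypothetical) that wraps the Bedrock call and resolves to "true" or "false".

```javascript
// Hypothetical S3 helpers. The SDK is loaded lazily inside the factory.
function makeS3Io() {
  const { S3Client, GetObjectCommand, PutObjectCommand } = require("@aws-sdk/client-s3");
  const s3 = new S3Client({});
  return {
    async readBytes(bucket, key) {
      const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      return res.Body.transformToByteArray();
    },
    async writeJson(bucket, key, value) {
      await s3.send(
        new PutObjectCommand({
          Bucket: bucket,
          Key: key,
          Body: JSON.stringify(value),
          ContentType: "application/json",
        })
      );
    },
  };
}

// Builds the Lambda handler from injected dependencies (io and detect are assumptions).
function makeHandler({ io, detect, referenceKey = "reference-images/logo.png" }) {
  return async function handler(event) {
    const processed = [];
    for (const record of event.Records || []) {
      const bucket = record.s3.bucket.name;
      const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
      if (!key.startsWith("target-files/")) continue; // ignore anything else

      const [reference, target] = await Promise.all([
        io.readBytes(bucket, referenceKey),
        io.readBytes(bucket, key),
      ]);
      const result = await detect(reference, target); // "true" or "false"

      // Write outside target-files/ so the result does not retrigger the function.
      const baseName = key.split("/").pop().replace(/\.[^.]+$/, "");
      const resultKey = `results/${baseName}.json`;
      await io.writeJson(bucket, resultKey, { result });
      processed.push({ key, resultKey, result });
    }
    return processed;
  };
}
```

In a deployed function this would be wired up with something like `exports.handler = makeHandler({ io: makeS3Io(), detect })`, where `detect` is your own Bedrock wrapper.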
The Lambda code is available in the GitHub repository.
GitHub Repository:
Intelligent Image Presence Detection using AWS Bedrock
Detect whether a reference image exists inside a target document (image) using AI-powered visual reasoning with AWS services.
This project goes beyond traditional OCR by identifying images even if they are:
- Rotated
- Resized
- Partially visible
- Positioned anywhere
Full article:
https://dev.to/pratik_26/amazon-bedrock-image-analysis-tutorial-with-aws-lambda-4mi9
Architecture
This solution uses:
- Amazon S3: Store reference and target files
- AWS Lambda: Process and orchestrate logic
- Amazon Bedrock: Perform multimodal AI analysis
Flow
User → Upload to S3 → S3 Event → Lambda → Bedrock → Result stored in S3 bucket (true/false)
Features
- Detect image presence inside documents
- Supports images and PDFs
- Works with rotated, scaled, or partial matches
- Fully serverless and scalable
- Uses multimodal AI (no traditional CV required)
Tech Stack
- Node.js (AWS Lambda)
- AWS SDK v3
- Amazon Bedrock (Claude Vision / Multimodal Model)
- Amazon S3
Project Structure
├── lambda/
│   └── index.js…
Feel free to clone the repository and try it in your own AWS environment.
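4️⃣ Configure S3 Trigger
In S3 → Properties → Event Notifications
Add event:
- Event type: POST (uploads made through the S3 console or AWS CLI are PUT or multipart requests, so select "All object create events" if the trigger does not fire)
- Prefix: target-files/
- Destination: Lambda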
This ensures Lambda runs automatically when a file is uploaded.
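If you would rather script the trigger than click through the console, the same setting can be sketched with AWS SDK v3. This is an assumption-laden example: the helper names are hypothetical, and it subscribes to all object-created events rather than only POST so that PUT and multipart uploads also fire.

```javascript
// Hypothetical helper: builds the notification configuration for the bucket.
function buildNotificationConfig(lambdaArn, prefix = "target-files/") {
  return {
    LambdaFunctionConfigurations: [
      {
        LambdaFunctionArn: lambdaArn,
        Events: ["s3:ObjectCreated:*"], // assumption: broader than POST only
        Filter: {
          Key: { FilterRules: [{ Name: "prefix", Value: prefix }] },
        },
      },
    ],
  };
}

// Applies the configuration. Note that this call replaces the bucket's
// existing notification configuration rather than adding to it.
async function applyNotificationConfig(bucket, lambdaArn) {
  const {
    S3Client,
    PutBucketNotificationConfigurationCommand,
  } = require("@aws-sdk/client-s3");
  await new S3Client({}).send(
    new PutBucketNotificationConfigurationCommand({
      Bucket: bucket,
      NotificationConfiguration: buildNotificationConfig(lambdaArn),
    })
  );
}
```

The console grants S3 permission to invoke the function automatically. When scripting it, the Lambda function also needs a resource-based policy that allows `s3.amazonaws.com` to invoke it, otherwise S3 rejects the configuration.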
5️⃣ Test the Flow
Upload:
- Reference image: already uploaded
- Target file: upload a new one
Expected Result:
Lambda runs → Bedrock analyzes → returns the result → stores it in the S3 bucket
{
"result": "true"
}
Key Takeaways
- You can go beyond OCR using multimodal AI
- Serverless architecture makes it scalable and cost-efficient
- Prompt engineering plays a critical role in accuracy
- This approach is ideal when visual context matters more than text
Conclusion
By combining AWS Lambda, Amazon S3, and Amazon Bedrock, we built an intelligent system that can understand and compare images inside documents, something traditional OCR tools struggle with.
This is a step toward building AI-native document processing systems that can reason visually, not just read text.
Let's Keep the Conversation Going
Have thoughts, questions, or any experience with Generative AI to share? I would love to hear from you! Feel free to leave a comment or connect with me on LinkedIn. Let's learn and grow together as a community of builders.
Keep exploring, keep automating and see you in the next one!





