In this blog post, we will not only digitize an image or document; we will also build a system that makes sense of its content. For example:
- A book page,
- A handwritten post-it,
- A health insurance card,
- Or product reviews, from which we will extract, interpret, and analyze the text.
To do this, we will integrate three different AWS services step by step:
The Purpose of This Blog Post:
To build a system that not only extracts text from images or documents, but also automatically understands what that text means.
By bringing together AWS's Textract, Comprehend, and Bedrock services, we will:
Extract text from images,
Find the sentiment and key phrases of the text,
Interpret the text with advanced models and capture what it means.
With this blog post, you will learn step by step how to extract text and meaning from documents such as book pages, handwritten notes, cards or comments that you may encounter in real life.
At the same time, you will see how to use these services with Python on SageMaker Notebook in a practical way and be able to run AI-supported analyses with your own data.
What Will This Blog Post Bring You?
You will not only extract text from images, but also be able to make sense of the emotion and content of these texts.
You will see, hands-on, how the AI (Artificial Intelligence) services Textract, Comprehend and Bedrock differ from one another.
You will be able to perform advanced visual-meaning extraction using the Claude 3 Haiku model.
You will learn how to solve scenarios such as handwriting, document, comment or card analysis in the real world.
You will experience how these services are combined with Python in a SageMaker Notebook environment.
AWS Services Used in This Blog Post
Prerequisites
An AWS account and an IAM user with Administrator Access,
Access to Claude 3 Haiku model (requested for free from Bedrock panel),
Basic familiarity with Python and AWS services,
Knowledge of writing and running code in Jupyter Notebook environment.
Outlines
We will implement this project step by step in the following order:
Part 1 – Setting up the SageMaker Notebook
Part 2 – Defining the IAM Role and Required Permissions
Part 3 – Extracting Text from Images with Amazon Textract
Part 4 – Interpreting Text with Amazon Comprehend
Part 5 – Deriving Insights from Images with Bedrock Claude 3
Part 1 – Setting up the SageMaker Notebook
Why are we doing this step?
In this part, we’ll create a SageMaker Notebook Instance, an environment where we can write code on AWS. Here, we’ll run our Python code and step-by-step test the services ‘Textract’, ‘Comprehend’, and ‘Bedrock’.
Note: SageMaker runs an EC2 (virtual server) in the background. Therefore, we should stop and delete the instances we create when we’re finished.
Steps:
— — —
Open the AWS Management Console.
Type ‘Bedrock’ in the search bar in the AWS Management Console and open the Bedrock service.
Click the ‘Model Access’ tab in the left hand menu.
On the screen that opens, click ‘Available to request’ next to the ‘Claude 3 Haiku’ model.
Then click ‘Request Model Access’.
Scroll to the bottom of the page and click ‘Next’.
Submit the request by clicking ‘Submit’.
** The approval process typically takes a few seconds to a few minutes.
** You will be notified via email when it is approved.
** Access is generally not denied for these models.
** Exceptions may apply only for very large and expensive models.
** Once you have completed these steps, you are now ready to access Bedrock models.
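Once the request is approved, you can confirm from code which foundation models are visible in your region. The sketch below is only illustrative: the helper name and the sample response are ours, and the real call (shown commented out) requires boto3 credentials and the Bedrock endpoint in your region. Note that `list_foundation_models` lists the models available in the region; the actual access grants are shown on the console's 'Model Access' page.

```python
CLAUDE_HAIKU_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def granted_model_ids(list_models_response):
    """Collect the model IDs from a bedrock list_foundation_models response."""
    return {m["modelId"] for m in list_models_response.get("modelSummaries", [])}

# Trimmed sample of the response shape, to illustrate the check:
sample_response = {
    "modelSummaries": [
        {"modelId": "anthropic.claude-3-haiku-20240307-v1:0"},
        {"modelId": "amazon.titan-text-express-v1"},
    ]
}
print(CLAUDE_HAIKU_ID in granted_model_ids(sample_response))  # True

# Against the real service (requires AWS credentials):
# import boto3
# bedrock = boto3.client("bedrock")
# print(CLAUDE_HAIKU_ID in granted_model_ids(bedrock.list_foundation_models()))
```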
… — INFORMATION — …
In this blog post, SageMaker provides a working environment (Jupyter Notebook) on AWS where we will write and run all our code.
All calls to Bedrock models like Claude 3 Haiku, as well as to Textract or Comprehend services, will be made through this notebook.
In summary:
-Bedrock provides the intelligence,
-Textract extracts the data,
-Comprehend performs the analysis,
-SageMaker is where all of this comes together and works.
… —END OF INFORMATION — …
In the ‘AWS Management Console’, type ‘SageMaker’ in the search bar and open the ‘SageMaker’ service.
Click the ‘Go to Amazon SageMaker AI’ button.
Click the ‘Notebooks’ tab in the left menu.
Then click the ‘Create notebook instance’ button.
On the screen that opens, make the following selections/fill in the information:
Notebook instance name : my-notebook-instance
Notebook instance type : ml.t3.medium
IAM role : Create a new role
- The IAM role you created : None | Click the 'Create role' button in the dialog that opens.
Keep the other settings as is.
Then click the ‘Create notebook instance’ button.
Waiting Time
— — — — — — —
- The notebook instance’s status will appear as ‘Pending’. Generally, within 3–5 minutes, the notebook instance’s status will change to ‘InService’.
Tip: At this point, you can go to the ‘IAM roles’ section, find the created role, and add access permissions to services such as ‘Textract’, ‘Comprehend’ and ‘Bedrock’ (we’ll grant these permissions in the next part).
Check Point: Is the Notebook Ready?
— — — — — — — — — — — — — — — — — — —
-Refresh the page to check if the notebook instance’s status has changed to ‘InService’.
- When the status changes to ‘InService’, click on ‘Open Jupyter’ or ‘Open JupyterLab’ on the right to enter the workspace.
-With this step, you now have a Python workspace.
-In the next step, we will grant this environment permissions to use the ‘Textract’, ‘Comprehend’ and ‘Bedrock’ services.
What Did We Do in This Step?
— — — — — — — — — — — — — — —
-We created a ‘SageMaker Notebook Instance’ where we will write and run all the code we will use in the project.
-This notebook, which provides us with a Python runtime on AWS, will be at the heart of the project.
-We completed the ‘Bedrock model access request’ required to use advanced models like ‘Claude 3 Haiku’.
-Once our SageMaker instance is ready, we will be able to run ‘Textract’, ‘Comprehend’ and ‘Bedrock’ services together in this environment.
_This now provides us with a robust workspace where we can perform all analysis, model calls, and data processing steps.
In the next step, we will define the necessary permissions (IAM policies) for this environment to communicate with the correct services._
Part 2: Defining the IAM Role and Required Permissions
Why are we doing this step?
- In this part, we will define the necessary IAM permissions (permission policies) for our SageMaker Notebook environment to work seamlessly with services like Textract, Comprehend, and Bedrock.
Note: While running, SageMaker uses an IAM role to access AWS services. If you do not add the necessary permissions to this role, you will receive an AccessDenied error when running the code.
Steps:
— — —
While creating or after creating the machine named ‘my-notebook-instance’, type ‘IAM’ in the search bar in the ‘AWS Management Console’ and open the ‘IAM’ service.
Click the ‘Roles’ tab in the left hand menu.
To find the role we just created, type ‘sagemaker’ in the search bar and enter the most recently created role that contains ‘sagemaker’ in its name (similar to ‘AmazonSageMaker-ExecutionRole-20250513T170284’).
Click the ‘Add permissions’ button.
Then select ‘Attach policies’.
Type ‘textract’ in the search bar and select the policy named ‘AmazonTextractFullAccess’.
Then type ‘comprehend’ in the search bar and select the policy named ‘ComprehendFullAccess’.
Finally, type ‘bedrock’ in the search bar and select the policy named ‘AmazonBedrockFullAccess’.
Then click the ‘Add permissions’ button.
Note:
— — —
-These permissions ensure that our code within SageMaker can communicate directly with the necessary AWS services.
-Otherwise, you will receive an authorization error on every service call.
Check Point: Is Everything Correct?
— — — — — — — — — — — — — — — — — — —
-When you click on the role, three different policies (Textract, Comprehend and Bedrock) should now appear in the Permissions tab.
-If these appear and your notebook instance is in ‘InService’ status, you’re ready to write code!
What Did We Do in This Step?
— — — — — — — — — — — — — — —
-We defined the necessary IAM permissions for the SageMaker Notebook environment to communicate with AWS AI services securely and with the proper authorization.
-Through this step, our code will now be able to seamlessly use the Textract, Comprehend and Bedrock services.
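Before moving on, you can sanity-check from code that the role carries all three policies. This is a hedged sketch: the helper name is ours, the sample response only illustrates the shape returned by IAM's `list_attached_role_policies`, and the real call (commented out) needs credentials and your actual role name.

```python
# The three managed policies this part attaches to the SageMaker execution role:
REQUIRED_POLICIES = {
    "AmazonTextractFullAccess",
    "ComprehendFullAccess",
    "AmazonBedrockFullAccess",
}

def missing_policies(attached_response):
    """Return the required policy names absent from a list_attached_role_policies response."""
    attached = {p["PolicyName"] for p in attached_response.get("AttachedPolicies", [])}
    return REQUIRED_POLICIES - attached

# Trimmed sample response, to illustrate:
sample = {"AttachedPolicies": [
    {"PolicyName": "AmazonTextractFullAccess"},
    {"PolicyName": "ComprehendFullAccess"},
]}
print(missing_policies(sample))  # {'AmazonBedrockFullAccess'}

# Against the real service (role name below is an example):
# import boto3
# iam = boto3.client("iam")
# resp = iam.list_attached_role_policies(RoleName="AmazonSageMaker-ExecutionRole-20250513T170284")
# print(missing_policies(resp) or "All required policies attached")
```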
In the next step, we will write our first code and perform the process of ‘Extracting text from an image with Textract’.
Part 3: Extracting Text from Images with Amazon Textract
Why are we doing this step?
In this part, we’ll use AWS’s OCR (Optical Character Recognition) service, Textract, to extract text from image files.
This means we’ll automatically recognize text from images like a book page, handwritten note, or ID card and save it as a ‘.txt file’.
Note: Textract only extracts text from an image; it doesn’t analyze its meaning. Comprehend will handle that in the next part.
Steps:
— — —
Go back to the ‘SageMaker’ service.
(If you started creating the role while the Notebook instance’s status was ‘Pending’) Wait until the status of the Notebook instance named ‘my-notebook-instance’ you just created changes to ‘InService’.
After the status of the Notebook instance named ‘my-notebook-instance’ changes to ‘InService’, click ‘Open JupyterLab’ in the ‘Actions’ section.
Right-click in the area below ‘Name’ and ‘Modified’ on the left and select ‘New Notebook’.
** In the window that opens, you will see a box titled ‘Select Kernel’. This allows you to select the software environment (kernel) to use for the notebook.
The ‘conda_python3’ option is selected in the drop-down menu. (This is a Conda environment that supports Python 3 and is compatible with the code we will write.) Keep it as is.
Then, click the ‘Select’ button in the bottom right.
Tip: If you check the ‘Always start the preferred kernel’ box, this kernel will be selected automatically every time.
Press ‘CTRL + S’ to save the notebook you opened. (This will allow you to change the file name as well as save the file.)
In the window that opens, change the file name to 'AWS AI Services.ipynb' and click the 'Rename and Save' button.
Copy and paste the following code into the 'AWS AI Services.ipynb' file:
import os
import boto3
from tqdm import tqdm


def get_image_files(directory):
    """Get all jpg and png files in the given directory."""
    return [f for f in os.listdir(directory) if f.lower().endswith(('.jpg', '.png'))]


def should_process_file(file_path):
    """Check if the file should be processed (i.e., no corresponding txt file exists)."""
    txt_path = os.path.splitext(file_path)[0] + '.txt'
    return not os.path.exists(txt_path)


def extract_text_from_image(image_path):
    """Extract text from the image using Amazon Textract."""
    client = boto3.client('textract')
    with open(image_path, 'rb') as image:
        response = client.detect_document_text(Document={'Bytes': image.read()})
    extracted_text = []
    for item in response['Blocks']:
        if item['BlockType'] == 'LINE':
            extracted_text.append(item['Text'])
    return '\n'.join(extracted_text)


def save_text_to_file(text, file_path):
    """Save the extracted text to a file."""
    txt_path = os.path.splitext(file_path)[0] + '.txt'
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write(text)


def process_images_in_directory(directory):
    """Process all images in the given directory."""
    image_files = get_image_files(directory)
    for image_file in tqdm(image_files, desc="Processing images"):
        image_path = os.path.join(directory, image_file)
        if should_process_file(image_path):
            extracted_text = extract_text_from_image(image_path)
            save_text_to_file(extracted_text, image_path)


# Usage in Jupyter notebook
directory = '.'  # Current directory
process_images_in_directory(directory)
- After pasting the code, save the changes you made to the 'AWS AI Services.ipynb' file by clicking the floppy disk icon in the upper left corner or pressing 'CTRL + S'.
… — INFORMATION — …
GENERAL PURPOSE OF THE CODE:
— — — — — — — — — — — — — — — — — — —
- This Python code scans a folder for image files in ‘.jpg’ or ‘.png’ format. It reads the text inside the image using the ‘Amazon Textract’ service and saves the text as a ‘.txt’ file.
… — END OF INFORMATION — …
- Then, drag and drop the image file named ‘Book.jpg’ from the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link to the area below the ‘Name’ and ‘Modified’ text on the left.
** It is very important that the 'Book.jpg' image file is in the same directory as the notebook named 'AWS AI Services.ipynb'.
Click the ‘Run’ button.
In a few seconds, you will see a file named 'Book.txt' created in the same directory as 'AWS AI Services.ipynb' and 'Book.jpg'. (Compare the original image and the txt file.)
** Congratulations! You have now set up a system that can automatically extract text from images.
** This way, Textract will properly extract the text from the image.
What did Textract do in this step?
— — — — — — — — — — — — — — — — — — —
-Amazon Textract detected the text in the ‘Book.jpg’ image, automatically read each line, and digitized it, allowing us to save it as a ‘.txt’ file. In other words, it converted the text in the image into plain text that we can use as text.
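Textract reports a per-line confidence score alongside each detected line, which is handy for spotting where the OCR may have struggled (for instance on handwriting). The sketch below assumes the standard `detect_document_text` response shape; the helper name and the sample values are ours, not from the original code.

```python
def low_confidence_lines(textract_response, threshold=90.0):
    """Return (text, confidence) pairs for LINE blocks below the given confidence."""
    return [
        (block["Text"], block["Confidence"])
        for block in textract_response.get("Blocks", [])
        if block["BlockType"] == "LINE" and block["Confidence"] < threshold
    ]

# Trimmed sample of a detect_document_text response (values are illustrative):
sample_response = {"Blocks": [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "For the past 33 years,", "Confidence": 99.2},
    {"BlockType": "LINE", "Text": "Steve Jobs", "Confidence": 85.4},
]}
print(low_confidence_lines(sample_response))  # [('Steve Jobs', 85.4)]
```

You could log or highlight these lines so a human reviews only the uncertain parts of a page.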
-Now let’s do the same for another image. Drag and drop the image file named ‘Handwriting.jpg’ (containing post-it notes) located in the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link to the area under the ‘Name’ and ‘Modified’ text on the left.
-Select all the code in the notebook with ‘CTRL + A’ and then click the ‘Run’ button.
-Within a few seconds, you'll see a file named 'Handwriting.txt' created in the same directory as 'AWS AI Services.ipynb' and 'Handwriting.jpg'.
-You’ll see that this image has been converted to text successfully to a large extent. (Compare the original image and the txt file. There may be minor errors. As humans, we don’t always have perfect handwriting.)
What Did We Do at This Stage?
— — — — — — — — — — — — — —
-We automatically recognized the text within the image using the ‘Textract’ service in SageMaker Notebook.
-We generated digital text files with a ‘.txt’ extension from files like ‘.jpg’ and ‘.png’.
-Now that we have the raw text, in the next step, we’ll use the ‘Comprehend’ service to analyze the meaning of this text (keywords, sentiment analysis, etc.).
Part 4: Interpreting Text with Amazon Comprehend
Why are we doing this step?
-In this part, we will analyze the plain-text files we obtained with 'Textract' in the previous step and extract:
- ‘keywords’,
- ‘sentiment analysis’.
-We will perform this process using ‘Amazon Comprehend’, AWS’s natural language processing (NLP) service.
Note: Comprehend understands the text in ‘.txt’ files, detects important phrases, and determines the mood of the sentence. So, now we’re starting to ask Comprehend, ‘What is this saying?’ about the text we have.
Steps:
— — —
Select the code in the notebook named 'AWS AI Services.ipynb' using 'CTRL + A' and then delete it by pressing 'Delete'.
Copy and paste the following code into the 'AWS AI Services.ipynb' file (this time we'll be working with the 'Comprehend' service):
import os
import boto3
from tqdm import tqdm


def get_image_files(directory):
    """Get all jpg and png files in the given directory."""
    return [f for f in os.listdir(directory) if f.lower().endswith(('.jpg', '.png', '.jpeg'))]


def should_process_file(file_path):
    """Check if the file should be processed (i.e., no corresponding txt file exists)."""
    txt_path = os.path.splitext(file_path)[0] + '.txt'
    return not os.path.exists(txt_path)


def extract_text_from_image(image_path):
    """Extract text from the image using Amazon Textract."""
    textract_client = boto3.client('textract')
    with open(image_path, 'rb') as image:
        response = textract_client.detect_document_text(Document={'Bytes': image.read()})
    extracted_text = []
    for item in response['Blocks']:
        if item['BlockType'] == 'LINE':
            extracted_text.append(item['Text'])
    return '\n'.join(extracted_text)


def summarize_text(text):
    """Summarize the extracted text using Amazon Comprehend."""
    comprehend_client = boto3.client('comprehend')
    if len(text) > 5000:
        text = text[:5000]  # Amazon Comprehend accepts at most 5,000 bytes per document
    key_phrases_response = comprehend_client.detect_key_phrases(Text=text, LanguageCode='en')
    key_phrases = [phrase['Text'] for phrase in key_phrases_response['KeyPhrases']]
    sentiment_response = comprehend_client.detect_sentiment(Text=text, LanguageCode='en')
    sentiment = sentiment_response['Sentiment']
    summary = "Summary:\n" + '\n'.join(key_phrases[:5])  # Limiting to top 5 key phrases
    summary += f"\n\nSentiment: {sentiment}"
    return summary


def save_text_to_file(text, file_path):
    """Save the extracted text to a file."""
    txt_path = os.path.splitext(file_path)[0] + '.txt'
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write(text)


def save_summary_to_file(summary, file_path):
    """Save the summary to a file with a '_summary' suffix."""
    summary_path = os.path.splitext(file_path)[0] + '_summary.txt'
    with open(summary_path, 'w', encoding='utf-8') as f:
        f.write(summary)


def process_images_in_directory(directory):
    """Process all images in the given directory."""
    image_files = get_image_files(directory)
    for image_file in tqdm(image_files, desc="Processing images"):
        image_path = os.path.join(directory, image_file)
        if should_process_file(image_path):
            extracted_text = extract_text_from_image(image_path)
            save_text_to_file(extracted_text, image_path)
            summary = summarize_text(extracted_text)
            save_summary_to_file(summary, image_path)


# Usage in Jupyter notebook or standalone script
directory = '.'  # Current directory
process_images_in_directory(directory)
- After pasting the code, save the changes you made to the 'AWS AI Services.ipynb' file by clicking the floppy disk icon in the upper left corner or pressing 'CTRL + S'.
… — INFORMATION — …
GENERAL PURPOSE OF THE CODE:
— — — — — — — — — — — — — — — — — — —
-This code examines the images (jpg, png, jpeg) in a folder one by one:
- It reads the text in the image with Amazon Textract.
- It saves the read text to a ‘.txt’ file.
- It analyzes the text (keywords + sentiment) with Amazon Comprehend.
- It writes the analysis results to a separate ‘_summary.txt’ file.
Sample Scenario
— — — — — — — — —
Let’s say you have the following file in your folder: ‘notes.jpg’
When this code runs, it does the following:
1. ‘Textract’: Reads the text in ‘notes.jpg’.
2. ‘Save’: Writes/saves what it reads to the file ‘notes.txt’.
3. ‘Comprehend’: Determines the theme and sentiment of the text.
4. ‘Save’: Writes a summary and sentiment analysis to the file ‘notes_summary.txt’.
… — END OF INFORMATION — …
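One subtlety: Comprehend measures its input limit in UTF-8 bytes, while the `text[:5000]` slice in the code above counts characters. For English text the two coincide, but accented or non-Latin text can exceed the byte limit. A byte-safe truncation could look like this (the helper name is ours; a minimal sketch, not part of the original code):

```python
def truncate_utf8(text, max_bytes=5000):
    """Trim text so its UTF-8 encoding fits in max_bytes, dropping any split character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" discards a multibyte character cut in half by the slice
    return encoded.decode("utf-8", errors="ignore")

print(len(truncate_utf8("a" * 6000).encode("utf-8")))  # 5000
trimmed = truncate_utf8("ğ" * 3000)  # each 'ğ' is 2 bytes in UTF-8
print(len(trimmed.encode("utf-8")) <= 5000)  # True
```

You could swap this in for the character slice inside `summarize_text` if you expect non-English input.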
To delete the outputs of the previous code you used to work with the ‘Textract’ service, right-click in the area where the code is located. Then, select ‘Clear Outputs of All Cells’.
Select the files with the '.jpg' and '.txt' extensions in the area below the 'Name' and 'Modified' text on the left (hold down the CTRL key and click each file), and press the 'Delete' key.
Then, delete them by clicking the ‘Move to Trash’ button in the window that opens.
Then, drag and drop the image file named ‘Book.jpg’, located in the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link, into the area below the ‘Name’ and ‘Modified’ text on the left.
** It is very important that the 'Book.jpg' image file is in the same directory as the notebook named 'AWS AI Services.ipynb'.
- Click the ‘Run’ button.
** Within a few seconds, you'll see that a file named 'Book.txt' and another file named 'Book_summary.txt' have been created in the same directory as 'AWS AI Services.ipynb' and 'Book.jpg'.
-Congratulations! In this step, we’ve extracted the text in the image and deciphered its meaning.
-First, ‘Amazon Textract’ read the text in ‘Book.jpg’, digitized this data, and created the ‘Book.txt’ file.
-Then, ‘Amazon Comprehend’ analyzed this text, extracted the ‘most important words’ from it, and determined the ‘overall sentiment’ (positive, negative, neutral).
-The results of this analysis were saved as the ‘Book_summary.txt’ file.
-Now we have both the ‘raw text’ and an analysis file that summarizes what the text says.
-If you specifically open and examine the ‘Book_summary.txt’ file, you’ll see that, in addition to the keywords, it says ‘Sentiment: NEGATIVE’. This analysis was performed by the ‘Comprehend’ service.
— — — — — — —
I can almost hear you saying, “Let’s see if that’s the case”.
Text in the ‘Book.jpg’ image:
“
For the past 33 years, I’ve looked in the mirror every morning and asked myself:
“If today were the last day of my life, would I want to do what I’m about to do today?”
And whenever the answer has been “No” for too many days in a row, I know I need to change something.
Steve Jobs”
** There’s a real negative sentiment in the text. Comprehend did a good job.
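Beyond the label, `detect_sentiment` also returns a `SentimentScore` with per-class confidences, so you can see how sure Comprehend was. The sketch below assumes the standard response shape; the helper name and the sample numbers are ours, for illustration only.

```python
def dominant_sentiment(sentiment_response):
    """Return (label, confidence) for the highest-scoring sentiment class."""
    scores = sentiment_response["SentimentScore"]  # Positive / Negative / Neutral / Mixed
    label = max(scores, key=scores.get)
    return label.upper(), scores[label]

# Trimmed sample of a detect_sentiment response (scores are illustrative):
sample = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.02, "Negative": 0.91, "Neutral": 0.05, "Mixed": 0.02},
}
print(dominant_sentiment(sample))  # ('NEGATIVE', 0.91)
```

A low top score (say, under 0.6) is a good cue to treat the label with caution or route the text for human review.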
-Now let’s try the other images.
-We’ll examine the image named ‘SnackVault Reviews.jpg’. This image contains a screenshot of user reviews of a product.
… — INFORMATION — …
The product examined in the image:
- ‘SnackVault Anti-Theft Snack Safe’
This product is a theft-proof storage safe designed to prevent theft of snacks.
-According to user reviews, the product:
- Has a fingerprint scanner,
- Has an alarm system,
- Is so secure that users sometimes even struggle to access their own snacks.
… — END OF INFORMATION — …
Drag and drop the image file named ‘SnackVault Reviews.jpg’ located at the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link to the area below the ‘Name’ and ‘Modified’ text on the left.
Select all the code in the notebook with ‘CTRL + A’ and then click ‘Run’.
** In a few seconds, you'll see that a file named 'SnackVault Reviews.txt' and another file named 'SnackVault Reviews_summary.txt' have been created in the same directory as 'AWS AI Services.ipynb' and 'SnackVault Reviews.jpg'.
** Great! Now you can analyze not only individual quotes but also much longer and more complex texts, such as customer reviews.
** In this example, ‘Amazon Textract’ converted all the user reviews in the image to text and created ‘SnackVault Reviews.txt’.
** Amazon Comprehend then analyzed these reviews to identify prominent phrases and the overall tone of the reviews (‘generally positive, because the stars are high’).
** As a result, the ‘SnackVault Reviews_summary.txt’ file included the most frequently occurring product-related words and overall customer satisfaction. (SnackVault Reviews_summary.txt -> Sentiment: POSITIVE)
** As you can see, we can now both ‘read’ and ‘interpret’ the data extracted from the images!
-Now let’s try another image: ‘Health Insurance Card.jpg’
-So, what does this card contain?
-This image (‘Health Insurance Card.jpg’) shows an example of a health insurance card.
-The card’s content can be summarized as follows:
Insurance Information:
----------------------
- Insurance Company : MEDIOCRE INSURANCE COMPANY
- Insurance Plan : Gray Plan (Ind & Fam → Individual and Family Plan)
Subscriber Information:
-----------------------
- Subscriber Name : Jane Doe
- ID No : 012345678
- Group No : GP99987
- Member Name : Jane Doe
Medicine Information:
----------------------
- RxBIN : BIN1234
- RxPCN : 0000
- Rx OOP Max : INCL W/ MED (With medical coverage)
Financial Information:
----------------------
- Copay : $20 / $50
- Med Deductible : $1,000 / $2,000
- Med OOP Max : $20,000 / $40,000
Additional Information:
-----------------------
- A PPO (Preferred Provider Organization) logo appears in the bottom right corner of the card. This indicates that the insured is receiving services through a healthcare network.
* These types of cards are used for identity and insurance verification at the hospital or pharmacy.
Drag and drop the image file named ‘Health Insurance Card.jpg’ located in the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link to the area below the ‘Name’ and ‘Modified’ text on the left.
Select all the code in the notebook with ‘CTRL + A’ and then click ‘Run’.
-In a few seconds, you'll see that a file named 'Health Insurance Card.txt' and another file named 'Health Insurance Card_summary.txt' have been created in the same directory as 'AWS AI Services.ipynb' and 'Health Insurance Card.jpg'.
-Great, now we can not only extract the text from images, but also understand their content!
-In this example, ‘Amazon Textract’ has created the ‘Health Insurance Card.txt’ file by reading the information on the health insurance card.
-Amazon Comprehend then analyzed this text and determined that the content was descriptive, neutral, and informative, resulting in a result of ‘Sentiment: NEUTRAL’.
-The resulting ‘Health Insurance Card_summary.txt’ file contains the ‘keywords in the summary of the content’ and the ‘sentiment (neutral) of the content’ on the card.
-Comprehend's ability to label even sentiment-free technical documents as neutral is quite useful for accurate classification.
What Did We Do in This Step?
— — — — — — — — — — — — — — — —
-We analyzed the raw text extracted with Textract using Amazon Comprehend.
-For each image:
- We found the key phrases.
- We extracted the sentiment of the text (positive, negative, neutral, mixed).
-We created a separate '*_summary.txt' file for each analysis.
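The summary code keeps the first five key phrases in response order, but `detect_key_phrases` also scores each phrase, so you could rank and deduplicate instead. A sketch under those assumptions (the helper name and sample scores are ours):

```python
def top_key_phrases(key_phrases_response, n=5):
    """Return up to n distinct key phrases, highest Comprehend score first."""
    ranked = sorted(key_phrases_response.get("KeyPhrases", []),
                    key=lambda p: p["Score"], reverse=True)
    seen, result = set(), []
    for phrase in ranked:
        text = phrase["Text"].lower()
        if text not in seen:  # skip case-insensitive duplicates
            seen.add(text)
            result.append(phrase["Text"])
        if len(result) == n:
            break
    return result

# Trimmed sample of a detect_key_phrases response (scores are illustrative):
sample = {"KeyPhrases": [
    {"Text": "the mirror", "Score": 0.93},
    {"Text": "the last day", "Score": 0.99},
    {"Text": "The Mirror", "Score": 0.88},
]}
print(top_key_phrases(sample, n=2))  # ['the last day', 'the mirror']
```

Swapping this in for `key_phrases[:5]` inside `summarize_text` would make the summaries favor the phrases Comprehend is most confident about.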
-Now we'll move on to the next part. → Comprehend only looks at the words, but Bedrock models also read the image and understand the context!
Part 5: Deriving Insights from Images with Bedrock Claude 3
Why are we doing this step?
In this part, we move beyond simply extracting text from an image or analyzing it; we move directly to interpreting the content and meaning of the image.
We will achieve this with the Claude 3 Haiku model running on Amazon Bedrock.
Claude 3 is an advanced artificial intelligence model that can read images and answer abstract questions like “What’s in this image?”
Note:
— — —
Textract only extracts the text,
Comprehend only finds the meaning of the text,
But Claude 3 directly understands the image and interprets it clearly.
This way, we can obtain ‘much more meaningful summaries’ of difficult data such as handwriting, messy notes, stickers, and scribbles.
Steps:
— — —
Before starting, right-click the tab of the 'AWS AI Services.ipynb' notebook in the tab bar at the top, where the notebook and the other files we opened are listed.
Then select ‘Close All Other Tabs’.
Select the files with the '.jpg' and '.txt' extensions in the area below the 'Name' and 'Modified' text on the left (hold down the CTRL key and click each file), and press 'Delete'.
Then, in the window that opens, click ‘Move to Trash’ to delete them.
Right-click on the notebook containing the codes and select ‘Clear Outputs of All Cells’.
Select the code in the 'AWS AI Services.ipynb' notebook using 'CTRL + A', and then delete it by pressing 'Delete'.
Copy and paste the following code into the 'AWS AI Services.ipynb' file (this time we will work with the 'Bedrock' service):
import os
import boto3
import json
import base64
from tqdm import tqdm


def get_image_files(directory):
    """Get all jpg and png files in the given directory."""
    return [f for f in os.listdir(directory) if f.lower().endswith(('.jpg', '.png', '.jpeg'))]


def should_process_file(file_path):
    """Check if the file should be processed (i.e., no corresponding summary file exists)."""
    summary_path = os.path.splitext(file_path)[0] + '_summary.txt'
    return not os.path.exists(summary_path)


def analyze_image_with_bedrock(image_path):
    """Analyze the image using Amazon Bedrock."""
    bedrock_client = boto3.client('bedrock-runtime')
    # Convert the image to base64
    with open(image_path, 'rb') as image_file:
        image_bytes = image_file.read()
    encoded_image = base64.b64encode(image_bytes).decode()
    # Prepare the payload according to the Bedrock Messages API requirements
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",
                            "data": encoded_image
                        }
                    },
                    {
                        "type": "text",
                        "text": "Explain the content of this image."
                    }
                ]
            }
        ],
        "max_tokens": 1000,
        "anthropic_version": "bedrock-2023-05-31"
    }
    try:
        response = bedrock_client.invoke_model(
            modelId='anthropic.claude-3-haiku-20240307-v1:0',  # Replace with your preferred model ID
            contentType='application/json',
            accept='application/json',
            body=json.dumps(payload)
        )
        response_body = response['body'].read().decode('utf-8')
        response_json = json.loads(response_body)
        # The Messages API returns a top-level 'content' list of blocks
        analysis = '\n'.join(
            block.get('text', '')
            for block in response_json.get('content', [])
            if block.get('type') == 'text'
        )
        # Fall back to the full raw response if no text block is found
        if not analysis:
            analysis = response_body
        return analysis
    except Exception as e:
        print(f"Error processing {image_path}: {e}")
        return "Error occurred during analysis."


def save_analysis_to_file(analysis, file_path):
    """Save the analysis to a file with a '_summary' suffix."""
    analysis_path = os.path.splitext(file_path)[0] + '_summary.txt'
    with open(analysis_path, 'w', encoding='utf-8') as f:
        f.write(analysis)


def process_images_in_directory(directory):
    """Process all images in the given directory."""
    image_files = get_image_files(directory)
    with tqdm(total=len(image_files), desc="Processing images") as pbar:
        for image_file in image_files:
            image_path = os.path.join(directory, image_file)
            if should_process_file(image_path):
                pbar.set_postfix({'Current file': image_file})
                analysis = analyze_image_with_bedrock(image_path)
                save_analysis_to_file(analysis, image_path)
            pbar.update(1)


# Usage in Jupyter notebook or standalone script
directory = '.'  # Current directory
process_images_in_directory(directory)
- After pasting the code, save the changes you made to the 'AWS AI Services.ipynb' file by clicking the floppy disk icon in the upper left corner or pressing 'CTRL + S'.
… — INFORMATION — …
GENERAL PURPOSE OF THE CODE:
— — — — — — — — — — — — — — — — — — —
-This Python code takes images (e.g. ‘.jpg’, ‘.png’) in a folder and automatically interprets the content of that image using the ‘Amazon Bedrock’ service and saves the result as a ‘.txt’ file.
… — END OF INFORMATION — …
Drag and drop the image file named ‘Handwriting.jpg’ (containing the post-it notes) located in the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link to the area below the ‘Name’ and ‘Modified’ text on the left.
Select all the code in the notebook with ‘CTRL + A’ and then click ‘Run’.
Within a few seconds, you will see a file named 'Handwriting_summary.txt' created in the same directory as 'AWS AI Services.ipynb' and 'Handwriting.jpg'.
Open the file named ‘Handwriting_summary.txt’ that emerged from Bedrock’s analysis.
-Its content will be similar to the following (the model's interpretation is in the 'text' field of the raw response):
{"id":"msg_bdrk_01MbSTzgwchR4U1eYxbnDUdn","type":"message","role":"assistant","model":"claude-3-haiku-20240307","content":[{"type":"text","text":"The image appears to be a collection of sticky notes on a wall, displaying various concepts and elements related to web design and development. The sticky notes cover topics such as Responsive Web Design, Creativity, E-mail, Team, Innovation, Social Media, Plan, Design Strategy, Thinking, Analyze, User Experience, People Resource, Goals, and more. The hand in the image is pointing to a note that says \"Design Strategy\", indicating that this is a key element in the overall design process."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":1033,"output_tokens":127}}
-Great! Now we can not only read the text, but also ‘understand’ the image itself.
-The ‘Claude 3 Haiku’ model produced a ‘meaningful and summative interpretation’ by considering not only the text in the image but also its position, relationship, and context.
-Thanks to this analysis, we can gain insight into the entire image without having to extract the text individually.
-Claude 3 offers a significant advantage, especially with complex, handwritten, or irregularly placed content like post-its.
** So, AI now doesn’t just ‘read’ images; it also ‘interprets’ and gives them meaning.
-Let’s take another example.
Drag and drop the image file named ‘SnackVault Reviews.jpg’ located in the ‘https://drive.google.com/drive/folders/1VPOZLPVdCd9uBMhT9O_pDSuunybspgEL?usp=sharing’ link to the area below the ‘Name’ and ‘Modified’ text on the left.
Select all the code in the notebook with ‘CTRL + A’ and then click ‘Run’.
In a few seconds, you will see a file named 'SnackVault Reviews_summary.txt' created in the same directory as 'AWS AI Services.ipynb' and 'SnackVault Reviews.jpg'.
Open the file named ‘SnackVault Reviews_summary.txt’ that emerged from Bedrock’s analysis.
-Its content will be similar to the following (the model's interpretation is in the 'text' field of the raw response):
{"id":"msg_bdrk_018L7XpZCh8TPLi7qybKg3uN","type":"message","role":"assistant","model":"claude-3-haiku-20240307","content":[{"type":"text","text":"The image shows customer reviews for the SnackVault Anti-Theft Snack Safe. The reviews cover various aspects of the product, such as its effectiveness in securing snacks, the alarm sensitivity, and its usefulness for different users like parents and roommates. The reviews range from highly positive to moderately critical, providing a mix of perspectives on the snack safe's features and performance."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":1549,"output_tokens":85}}
-This analysis allowed us to understand not only the text in the image but also the ideas and user experiences within it.
-‘Claude 3’ read the comments and summarized the product’s strengths and weaknesses, providing us with a general assessment.
-Critical details such as the product’s security level, alarm sensitivity, and target audience were clearly highlighted in the analysis text.
-Such summaries are invaluable for users who want to quickly and clearly understand the product.
** Now we can extract not only information from images, but also user experience and opinions!
What Did We Do at This Stage?
— — — — — — — — — — — — — — — —
-Using the ‘Claude 3 Haiku’ model, we directly analyzed the content of the image.
-We generated an AI response that understood the image, the objects it contained, and its context.
-We implemented the Bedrock service for visually-based use.
Conclusion
Throughout this blog post, we:
Extracted text from images with Textract,
Analyzed text with Comprehend,
Directly interpreted images with Bedrock Claude 3 Haiku,
Experienced the entire process in SageMaker Notebook.
We can now not only digitize an image but also automatically extract the text and the full context of the image.
This approach can be used in a wide range of scenarios, from handwritten notes and customer reviews to health cards and complex drawings.
In short, we now have not only OCR (Optical Character Recognition) but also a true AI-powered image understanding system.
Stay tuned for my new blog posts…
Oğuzhan Selçuk HIZIROĞLU
SysOps Administrator Team Lead @Cloud and Cloud
AWS Ambassador | AWS Golden Jacket Winner | AWS Champion Authorized Instructor | AWS AAI Community Difference Maker Award Winner | AWS Community Builder (ML)