sehmimhaque

Posted on Apr 26, 2021

Getting started with Optical Character Recognition using Python and AWS

#python #aws #serverless #machinelearning

In this blog we will use AWS Textract to scan a picture of a document, extract its text, and get the result back as a JSON response. We will also use an AWS Lambda function written in Python to build the backend.

If you've read my previous blog, Serverless AWS Textract Document Scanner, you saw that we created an endpoint to run AWS Textract using Node.js. There were, however, a few problems that I ran into that only got fixed by switching to Python. For example, one of the biggest problems I faced while running my app on Node.js was the request time being too long. Sometimes it would take up to 30 seconds for a response to finish. As you can tell, that is terrible. So I decided to switch my stack from Node.js to Python, and the average request time dropped from 25s to 2s. Pretty nice, right? I know. Let's see how we can do this.

1. Setting up Backend with Serverless using Python

Assuming you already know how Serverless works, we can continue with AWS Textract and the flow it follows. If you're not familiar with Serverless, please don't jump the gun; go check out some tutorials first.

Okay. Let's quickly set up our Serverless service:

sls create --template aws-python --path myService

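Make sure you have the following dependencies available.

boto3 (already included in the AWS Lambda Python runtime; install it locally if you want to test outside Lambda)
json (part of the Python standard library, so there is nothing to install)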

NOTE

Some things to keep in mind before continuing:

Make sure you have proper authorization for this task,
check your region and
make sure the bucket url is accurate.
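One way to keep the region and bucket name from drifting apart is to read them from environment variables instead of hard-coding them. The sketch below is only an illustration: BUCKET_NAME is a hypothetical variable you would define yourself, AWS_REGION is the variable the Lambda runtime sets for you, and the us-east-1 fallback is a placeholder for local runs.

```python
import os


def load_settings(env=None):
    """Read the bucket name and region from environment variables.

    BUCKET_NAME is a hypothetical variable name used for illustration;
    AWS_REGION is provided by the Lambda runtime.
    """
    env = os.environ if env is None else env
    bucket = env.get("BUCKET_NAME", "").strip()
    if not bucket:
        raise ValueError("BUCKET_NAME is not set")
    # Fallback region is a placeholder for local testing only.
    region = env.get("AWS_REGION", "us-east-1")
    return {"bucket": bucket, "region": region}


if __name__ == "__main__":
    print(load_settings({"BUCKET_NAME": "my-demo-bucket", "AWS_REGION": "eu-west-1"}))
```

The region could then be passed to boto3 explicitly, for example boto3.client('textract', region_name=settings['region']), so a wrong default region fails loudly rather than silently.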

2. Now once AWS SDK is configured, we can write code for Textract

import json
import boto3

def textractAnalyzer(event, context):
    bucket = "YOUR_BUCKET_NAME"
    # The request body is a JSON string such as {"fileKey": "public/demo.jpeg"}
    document = json.loads(event['body'])['fileKey']
    client = boto3.client('textract')

    # Run text detection on the S3 object
    response = client.detect_document_text(
        Document={'S3Object': {'Bucket': bucket, 'Name': document}})

    # Get the text blocks
    blocks = response['Blocks']

    # Keep only the LINE blocks, with their confidence scores
    texts_by_line = dataPurifierByLine(blocks)

    return {
        'statusCode': 200,
        'body': json.dumps({
            "fileKey": document,
            "textByLine": texts_by_line,
            "texTractblocks": blocks  # Full list of blocks returned by Textract
        }),
    }

def dataPurifierByLine(blocks):
    result = []
    for block in blocks:
        if block['BlockType'] == "LINE":
            entry = {
                "line": block['Text'],
                "confidence": block['Confidence']
            }
            result.append(entry)
    return result

The code above looks up the file with the given key in the S3 bucket (for this demo, under public/) and then runs Textract text detection on it.
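If you want to try the flow locally before deploying, one option is to pass the client in as a parameter so it can be swapped for a stub. The sketch below is not the deployed handler: analyze, extract_lines and FakeTextract are hypothetical names, and the canned blocks are made up. It does show the event shape the handler expects, with the POST body arriving as a JSON string under "body".

```python
import json


def extract_lines(blocks):
    """Keep only LINE blocks, mirroring dataPurifierByLine."""
    return [
        {"line": b["Text"], "confidence": b["Confidence"]}
        for b in blocks
        if b["BlockType"] == "LINE"
    ]


def analyze(event, client, bucket):
    """Same flow as textractAnalyzer, but the client is passed in."""
    key = json.loads(event["body"])["fileKey"]
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    body = {"fileKey": key, "textByLine": extract_lines(response["Blocks"])}
    return {"statusCode": 200, "body": json.dumps(body)}


class FakeTextract:
    """Stand-in for the boto3 client; returns a canned, made-up response."""

    def detect_document_text(self, Document):
        return {
            "Blocks": [
                {"BlockType": "PAGE"},
                {"BlockType": "LINE", "Text": "THANK YOU", "Confidence": 99.5},
                {"BlockType": "WORD", "Text": "THANK", "Confidence": 99.7},
            ]
        }


# The POST body is delivered as a JSON string under "body".
event = {"body": json.dumps({"fileKey": "public/demo.jpeg"})}
result = analyze(event, FakeTextract(), "YOUR_BUCKET_NAME")
print(result["body"])
```

In the real function you would pass boto3.client('textract') instead of the stub.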

3. Deploy the Code

sls deploy
Once the deploy finishes, find the endpoint URL in the command output.

4. For our next step, we will drop a file manually on the bucket so we can use it for testing.

Go to S3,
then navigate to /public
and then upload an image file

I'm using this old receipt

5. Finally, test it in Postman.

payload:

    {
        "fileKey" : "public/demo.jpeg"
    }

If you get a timeout error, raise the function timeout to 30s in the serverless.yml file.
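If you would rather script the call than click through Postman, the same POST request can be built with Python's standard library. This is a minimal sketch: the ENDPOINT value is a placeholder, not a real URL, and build_request and send are hypothetical helper names. The 30 second client timeout simply matches the function timeout mentioned above.

```python
import json
import urllib.request

# Placeholder URL: replace it with the endpoint printed by `sls deploy`.
ENDPOINT = "https://example.com/dev/textract"


def build_request(file_key, url=ENDPOINT):
    """Build the same POST request that Postman would send."""
    payload = json.dumps({"fileKey": file_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("public/demo.jpeg")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Calling send(req) against your real endpoint should return the parsed JSON response.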
You can see the type of data we get back. For this demo I'm taking every line and adding them together in an array (textByLine).
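Since textByLine is just an array of lines with confidence scores, a possible follow-up step is to drop the shaky lines and join the rest into one string. The sketch below assumes a cutoff of 90, which is an arbitrary number picked for illustration; join_confident_lines is a hypothetical helper and the second sample line is made up.

```python
def join_confident_lines(text_by_line, min_confidence=90.0):
    """Join the lines whose confidence is at or above the cutoff."""
    kept = [e["line"] for e in text_by_line if e["confidence"] >= min_confidence]
    return "\n".join(kept)


sample = [
    {"line": "01/027 APPROVED - THANK YOU", "confidence": 99.5232162475586},
    {"line": "T0TAL 12.5O", "confidence": 61.2},  # made-up low-confidence line
]
text = join_confident_lines(sample)
print(text)
```

Lowering min_confidence keeps more lines at the cost of more OCR noise, so the right cutoff depends on your documents.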

Your response should look something like this

{
    "fileKey": "public/demo.jpeg",
    "textByLine": [
        {
            "line": "01/027 APPROVED - THANK YOU",
            "confidence": 99.5232162475586
        },
        .
        .
        .
        .
    ],
    "texTractblocks": [
        {
            "BlockType": "PAGE",
            "Geometry": {
                "BoundingBox": {
                    "Width": 0.8844140768051147,
                    "Height": 0.8354079723358154,
                    "Left": 0.048781704157590866,
                    "Top": 0.15526676177978516
                },
                "Polygon": [
                    {
                        "X": 0.07131516188383102,
                        "Y": 0.1597394049167633
                    },
                    {
                        "X": 0.9331957697868347,
                        "Y": 0.15526676177978516
                    },
                    {
                        "X": 0.9245083928108215,
                        "Y": 0.9906747341156006
                    },
                    {
                        "X": 0.048781704157590866,
                        "Y": 0.9588059782981873
                    }
                ]
            },
            "Id": "9b384b8d-dcb8-4596-8511-af18659a9787",
            "Relationships": [
                {
                    "Type": "CHILD",
                    "Ids": [
                        "250a9339-d1ed-4c21-ad50-5a2154cd89da",
                        "aac798f2-3c05-41a2-979c-869509b53d58",
                        "eb878ad4-8b37-415d-b6ac-8cc909dab0a3",
                        "376c375f-94d1-47b7-9f4e-a9fb203043f2",
                        "628dbdd6-1225-43c9-867c-9a83ea91e1ae",
                        "aecacbf9-8727-4334-a904-6795df9c455b",
                        "c8e51b32-d010-4300-8e98-6002d6e5eee3",
                        "20e6422a-16c0-41b6-be2d-6c0c9d09ed44",
                        "82bfdb0d-20bd-407f-bc3b-33aef24fc097",
                        "aa3125fd-2e2d-48a5-9416-84ef7a987976",
                        "10ec162e-a937-4cd2-87d5-6d6b9205d719",
                        "b05a2ece-0a7f-4e65-87e5-fe4e49277f25",
                        "561f5c75-bbb4-4dc6-8660-fbc3f7386f9c",
                        "665bb6fe-8ac9-44b3-af49-189ac3ea7757",
                        "5d42a676-0621-42ad-89ff-7a16873290c4",
                        "bdb02d6e-3b80-4913-8359-ef7e70068582",
                        "28691f75-aef5-418d-8519-1d05bb991fda",
                        "8c4b9208-c2c5-4ad8-96a6-35e962043fbd"
                    ]
                }
            ]
        },
        .
        .
        .
}
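The Relationships entry on the PAGE block lists the IDs of its child blocks, so the raw blocks can be stitched back together by ID. Here is a rough sketch of that lookup, assuming the block shape shown in the response; children_of and lines_per_page are hypothetical helpers, and the short IDs and the second line of text are made up to keep the sample readable.

```python
def children_of(block, blocks_by_id):
    """Return the blocks referenced by a block's CHILD relationships."""
    children = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            children.extend(blocks_by_id[i] for i in rel["Ids"] if i in blocks_by_id)
    return children


def lines_per_page(blocks):
    """Group LINE text under each PAGE block, in relationship order."""
    by_id = {b["Id"]: b for b in blocks}
    pages = []
    for block in blocks:
        if block["BlockType"] == "PAGE":
            pages.append(
                [c["Text"] for c in children_of(block, by_id) if c["BlockType"] == "LINE"]
            )
    return pages


# Shortened, made-up IDs standing in for the UUIDs in the real response.
sample_blocks = [
    {"BlockType": "PAGE", "Id": "p1",
     "Relationships": [{"Type": "CHILD", "Ids": ["l1", "l2"]}]},
    {"BlockType": "LINE", "Id": "l1", "Text": "01/027 APPROVED - THANK YOU",
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"BlockType": "LINE", "Id": "l2", "Text": "CUSTOMER COPY"},
    {"BlockType": "WORD", "Id": "w1", "Text": "APPROVED"},
]
print(lines_per_page(sample_blocks))
```

The same children_of helper can be applied to a LINE block to reach its WORD blocks.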

That's it!

Next Step

Next week I will continue with this app and build a front end for it using Flutter and AWS Amplify.

We will set up AWS Amplify using Flutter,
set up our camera to take pictures,
then confirm and send the picture to the S3 store,
which will trigger our Lambda function and send the response back to our front end.
