Marco Gonzalez

AWS Serverless Services integration with ChatGPT for an image-to-ChatGPT question solution

ChatGPT is based on the GPT (Generative Pre-trained Transformer) series of language models developed by OpenAI.

In this demo, I leverage the power of AWS S3, AWS Lambda, Amazon Textract, and OpenAI’s ChatGPT to seamlessly process images containing text and generate intelligent, context-aware responses. By combining cutting-edge OCR (Optical Character Recognition) technology with advanced language understanding, this solution offers a unique way to interact with visual data.

Note: This demo assumes basic knowledge of core AWS services, so details for S3 bucket creation, AWS Lambda creation, AWS VPC creation, and AWS IAM role creation are omitted.

Use cases:

  • Image to text processing that requires integration with ChatGPT
  • Image to text processing integration with Chatbot solution

Chapters:

  1. General Topology
  2. Image to text conversion
  3. Text to ChatGPT integration
  4. Conclusion
  5. Next steps

Services involved in this demo:
  • AWS S3
  • AWS Lambda
  • Amazon Textract
  • Amazon CloudWatch
  • AWS VPC
  • AWS NAT Gateway
  • AWS Internet Gateway
  • OpenAI ChatGPT API

1. General Topology

Topology

2. Image to text conversion

Steps:

  1. Create two S3 buckets in the same Region where the AWS Lambda function will be created.

S3

chatgpt-demo-1 : S3 bucket to store images containing text
chatgpt-demo-1-text : S3 bucket to store extracted text
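Bucket creation itself is among the setup steps omitted above, but for reference, a minimal boto3 sketch could look like the following (the us-east-1 Region is an assumption on my part; use the Region where your Lambda function will live):

import boto3

# Assumption: us-east-1. In other Regions, create_bucket also needs a
# CreateBucketConfiguration with a LocationConstraint.
s3 = boto3.client('s3', region_name='us-east-1')

for bucket_name in ['chatgpt-demo-1', 'chatgpt-demo-1-text']:
    s3.create_bucket(Bucket=bucket_name)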

  2. Create an AWS Lambda function (runtime: Python 3.9) with the following topology:

  • Trigger: An S3 trigger notifies AWS Lambda to start function execution when the PUT event type occurs for any file with the "raw_data" prefix. Alternatively, you can select a suffix such as .jpg, .png, or .pdf if you want to restrict the source files by file type. A minimal sketch of this notification configuration appears after the image below.

Lambda
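For reference, here is a minimal boto3 sketch of that trigger configuration; the function ARN and account ID are hypothetical placeholders, and the console setup shown in the image achieves the same result (the function also needs a resource-based permission allowing S3 to invoke it):

import boto3

s3 = boto3.client('s3')

# Hypothetical function ARN; replace with the ARN of your own function.
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:image-to-text'

s3.put_bucket_notification_configuration(
    Bucket='chatgpt-demo-1',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:Put'],
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'raw_data'}
                        # Optionally add {'Name': 'suffix', 'Value': '.jpg'}
                    ]
                }
            }
        }]
    }
)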

  • Layer: A ZIP archive that contains libraries, a custom runtime, or other dependencies. For this demo, I added the AWS SDK for Python (Boto3) as a zip-file layer used by all functions. You can refer to this link for more details on the benefits of Layers for AWS Lambda functions: https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17

  • Permissions: The AWS Lambda function should have permissions for the following services: AWS S3, Amazon Textract, and Amazon CloudWatch. The following image is an example of this setup.

Lambda

  • Environment Variable: This is optional, but it is truly useful when you don't want to depend on fixed values and prefer the freedom to quickly update AWS S3/API Gateway/etc. information without changing your code. A sketch of setting this variable programmatically follows the image below.

Environment Variable for destination S3 bucket
Lambda
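As referenced above, here is a minimal sketch of setting the destination-bucket variable programmatically; the function name is a hypothetical placeholder, and the console panel shown in the image does the same thing:

import boto3

lambda_client = boto3.client('lambda')

lambda_client.update_function_configuration(
    FunctionName='image-to-text',  # hypothetical function name
    Environment={
        'Variables': {
            'processed_data_bucket_name': 'chatgpt-demo-1-text'
        }
    }
)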

  • Lambda Code: The following code collects the image from the source S3 bucket, calls the Amazon Textract API to extract the text within the image, and stores the result in the target S3 bucket as a .txt file with the "_processed_data.txt" suffix.
import json
import boto3
import os
import urllib.parse

print('Loading function')

s3 = boto3.client('s3')

# Amazon Textract client
textract = boto3.client('textract')

def getTextractData(bucketName, documentKey):
    print('Loading getTextractData')
    # Call Amazon Textract
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': bucketName,
                'Name': documentKey
            }
        })

    detectedText = ''

    # Print detected text
    for item in response['Blocks']:
        if item['BlockType'] == 'LINE':
            detectedText += item['Text'] + '\n'

    return detectedText

def writeTextractToS3File(textractData, bucketName, createdS3Document):
    print('Loading writeTextractToS3File')
    generateFilePath = os.path.splitext(createdS3Document)[0] + '_processed_data.txt'
    s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath)
    print('Generated ' + generateFilePath)


def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        detectedText = getTextractData(bucket, key)
        writeTextractToS3File(detectedText, os.environ['processed_data_bucket_name'], key)

        return 'Processing Done!'

    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

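To sanity-check the function (for example with the Lambda console's test feature), a minimal test event could look like the sketch below; the object key is a hypothetical example that matches the "raw_data" prefix, and processed_data_bucket_name must be set in the environment:

test_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'chatgpt-demo-1'},
            'object': {'key': 'raw_data/sample_invoice.png'}  # hypothetical key
        }
    }]
}

# Invokes the handler the same way the S3 trigger would.
print(lambda_handler(test_event, None))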

3. Text to ChatGPT integration

  1. Create one additional S3 bucket in the same Region where the AWS Lambda function will be created.

S3 2

chatgpt-4073-output : S3 bucket to store ChatGPT answer in .txt format

  2. Create a VPC with two subnets (one private subnet, one public subnet), an Internet Gateway, and a NAT Gateway in the Region where the AWS Lambda function will be deployed.

VPC

  3. Create an AWS Lambda function (runtime: Python 3.9) with the following topology:

Lambda

  • Trigger: The S3 trigger notifies AWS Lambda to start function execution when the PUT event type occurs for any file with the "_processed_data.txt" suffix.

S3_2

  • Permissions: AWS Lambda function should have permission to the following services: AWS S3 and AWS CloudWatch.

  • VPC: This AWS Lambda function will be created within the new VPC, in the private subnet. Access to the public ChatGPT API goes through the NAT Gateway.

VPC

Environment variables: To make an API call to the public OpenAI API web service, the following variables were defined:

  • model_chatgpt: Selected ChatGPT model for text processing
  • openai_secret_key_env: Secret key used to authenticate the API call
  • output_bucket_name: AWS S3 bucket to store results in a .txt file

  4. A big kudos to my friend Prakash Rao (https://www.linkedin.com/in/prakashrao40/), who created the following Python script:

import os
import json
import boto3
import http.client
import urllib.parse

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the uploaded file's bucket and key
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Read the text file from S3
    file_content = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')

    # Wrap the file content in a dict
    file_json = {'text': file_content}

    # Read OpenAI API credentials from environment variables
    openai_secret_key = os.environ['openai_secret_key_env']

    # Set up HTTP connection to OpenAI API endpoint
    connection = http.client.HTTPSConnection('api.openai.com')

    # Define request parameters
    prompt = file_json['text']
    model = os.environ['model_chatgpt']
    data = {
        'prompt': prompt,
        'model': model,
        'max_tokens': 50
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {openai_secret_key}'
    }

    # Send API request and parse response
    connection.request('POST', '/v1/completions', json.dumps(data), headers)
    response = connection.getresponse()
    response_data = json.loads(response.read().decode())
    completion_text = response_data['choices'][0]['text']

    # Print generated text
    #print(completion_text)

    # Define the output bucket and output key (file name)
    output_bucket = os.environ['output_bucket_name']
    output_key = f"{os.path.splitext(os.path.basename(key))[0]}_chatgpt_result.txt"

    # Upload the generated text to the output S3 bucket
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=completion_text)

    # Return a JSON response (ignored by the S3 trigger, but useful for test invocations)
    return {
        'statusCode': 200,
        'body': json.dumps({'text': completion_text})
    }
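One note on the endpoint: the script calls the legacy /v1/completions endpoint with a 'prompt' field, which works with completion-style models. If the model_chatgpt variable points to a chat model such as gpt-3.5-turbo (an assumption, not something this demo fixes), the request and response parsing change slightly; a minimal sketch of that variant:

import json
import http.client

def ask_chat_model(prompt, model, openai_secret_key):
    # Chat models use /v1/chat/completions with a 'messages' list instead of
    # the legacy 'prompt' field.
    connection = http.client.HTTPSConnection('api.openai.com')
    data = {
        'model': model,  # e.g. 'gpt-3.5-turbo' (assumption)
        'messages': [{'role': 'user', 'content': prompt}],
        'max_tokens': 50
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {openai_secret_key}'
    }
    connection.request('POST', '/v1/chat/completions', json.dumps(data), headers)
    response_data = json.loads(connection.getresponse().read().decode())
    # Chat responses return the text under choices[0].message.content
    return response_data['choices'][0]['message']['content']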

  5. Once the integration is completed, the S3 bucket will store the results. In addition, you can refer to Amazon CloudWatch Logs > Log groups to check the details of the API calls and their results.

4. Conclusion

Existing AWS serverless services such as Amazon Textract, Amazon Comprehend, Amazon Lex, and others are great candidates to integrate with OpenAI's ChatGPT. In this demo, I wanted to show how easily this integration can be achieved without exceeding your budget.

5. Next steps

  1. Integrate AWS Lambda with AWS Secrets Manager for credentials (see the sketch below)
  2. Add Amazon SNS for handling multiple uploads
  3. Improve the existing ChatGPT model
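For the first item, a minimal sketch of reading the OpenAI key from AWS Secrets Manager instead of an environment variable could look like this (the secret name 'openai/api_key' and the plain-string format are hypothetical assumptions):

import boto3

secrets = boto3.client('secretsmanager')

def get_openai_secret_key():
    # Assumes the key is stored as a plain string under a hypothetical secret name.
    response = secrets.get_secret_value(SecretId='openai/api_key')
    return response['SecretString']

# Replaces: openai_secret_key = os.environ['openai_secret_key_env']
openai_secret_key = get_openai_secret_key()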

Happy Learning!
