DEV Community

Cover image for Automated AWS Receipt Processing System using Textract, Lambda & MySQL
AKASH S
AKASH S

Posted on

Automated AWS Receipt Processing System using Textract, Lambda & MySQL

Introduction

In this blog, I’ll walk you through how I built an end-to-end automated receipt processing system on AWS using:

  • Amazon S3
  • AWS Lambda
  • Amazon Textract (AnalyzeExpense)
  • Amazon RDS (MySQL)
  • Lambda Layers (pymysql)

Use this repo if needed

Architecture Overview
User uploads receipt (PDF / JPG)
        ↓
Amazon S3 (ObjectCreated event)
        ↓
AWS Lambda
        ↓
Amazon Textract (AnalyzeExpense)
        ↓
Amazon RDS (MySQL)
Enter fullscreen mode Exit fullscreen mode

Step 1: Create the MySQL Database (RDS) and set up a DB with Wanted Entities and Attributes.

CREATE DATABASE receipt_db;

USE receipt_db;

CREATE TABLE ride_receipts (
    receipt_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_name VARCHAR(100),
    ride_id VARCHAR(50),
    driver_name VARCHAR(100),
    vehicle_number VARCHAR(20),
    mode_of_vehicle VARCHAR(50),
    selected_price DECIMAL(10,2),
    time_of_ride TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Enter fullscreen mode Exit fullscreen mode

Step 2: Create Lambda Layer for pymysql

AWS Lambda does not include pymysql by default, so we must add it using a Lambda Layer.

mkdir pymysql_layer
cd pymysql_layer
mkdir python
pip install pymysql -t python/
zip -r pymysql_layer.zip python
Enter fullscreen mode Exit fullscreen mode

Upload Layer in AWS

  • Go to Lambda → Layers
  • Create new layer
  • Runtime: Python 3.10
  • Upload pymysql_layer.zip

Step 3: Create Lambda Function

  • Runtime: Python 3.10
  • Attach pymysql layer
  • Add IAM permissions:

    • AmazonTextractFullAccess
    • AmazonS3ReadOnlyAccess
    • CloudWatchLogsFullAccess

Step 4: Lambda Function Code

  • Listens for a receipt upload event from Amazon S3.
  • Reads the uploaded file’s bucket name and object key.
  • Sends the receipt to Amazon Textract for structured data extraction.
  • Processes the extracted text to identify key receipt details.
  • Stores the structured data into a MySQL database automatically.

Step 5: Configure S3 Trigger

Go to Lambda → Configuration → Triggers

  • Add S3 trigger
  • Event type: ObjectCreated (PUT)
  • Bucket: receipt-inp-ak

Step 6: Upload Receipt to S3(Upload a clean file receipt1.jpg,receipt.pdf)

Step 7:Debug Using CloudWatch
Check logs in:

CloudWatch → /aws/lambda/receipt-textract
Enter fullscreen mode Exit fullscreen mode

Successful log example:

Processing file: receipt.pdf
Extracted: {
 'vehicle_number': 'TN01B3694',
 'mode_of_vehicle': 'Car',
 'selected_price': 150.0
}
Enter fullscreen mode Exit fullscreen mode

Step 8: Verify Data in MySQL by using Select query.

Note: Don't forgot to delete the resources which we created .stay tuned for more blogs and Deployment.

Top comments (0)