Introduction
While building a budgeting app, I identified a feature that had value beyond personal expense tracking. By enabling users to scan supermarket receipts, the application can extract structured purchase data and analyze individual spending patterns automatically.
This capability not only simplifies budgeting for users but also highlights a broader opportunity for the retail industry. Receipt-level data can provide insights into consumer behavior and enable retailers to deliver more targeted, data-driven promotional offers tailored to specific customers.
Problem Statement: Manual Expense Tracking Doesn’t Scale
Tracking expenses manually is time-consuming and error-prone. Most budgeting applications rely on users to enter purchase details by hand, which often leads to incomplete data and poor long-term adoption.
Supermarket receipts contain rich information—item names, prices, categories, totals—but this data is usually locked away in unstructured formats such as images or PDFs. Without automation, extracting and organizing this information becomes a significant challenge, limiting both accurate budget tracking and deeper spending analysis.
This problem becomes more pronounced as transaction volume grows, making a scalable, automated receipt-processing solution essential.
Feature Overview
The receipt-scanning feature allows users to capture supermarket receipts and automatically convert them into structured expense data within the budgeting app.
From a user’s perspective, the workflow is simple:
- The user uploads a photo of a supermarket receipt.
- The application processes the image and extracts key purchase details such as store name, items, prices, total amount, and purchase date.
- The extracted data is then categorized and stored, making it immediately available for budget tracking and spending analysis.
By automating this process, the feature removes the need for manual expense entry while enabling more accurate, item-level insights into consumer spending patterns.
Architecture Walkthrough: Receipt Processing Pipeline
This section walks through the architecture shown in the diagram below, focusing on how each AWS service contributes to the receipt-scanning feature, from ingestion to persistent storage.
The goal of this design is to keep the workflow event-driven, scalable, and simple, while clearly separating responsibilities between OCR, AI reasoning, and data storage.
Architecture Diagram
1. Ingestion: Uploading Receipts to Amazon S3
The workflow starts when a user uploads a receipt image using either a mobile application or a web interface.
- All receipt images are stored in an Amazon S3 bucket named `receipts`
- S3 acts as a durable, cost-effective entry point for unstructured data (images)
- The bucket is configured with an event notification that triggers processing as soon as a new object is uploaded
Using S3 for ingestion removes the need for a dedicated API layer just to accept images and ensures uploads scale automatically with usage.
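To make the wiring concrete, here is a minimal sketch of the event notification that connects the bucket to the Lambda function. The account ID and Lambda ARN are placeholders, and the `.jpg` suffix filter is an assumption, not something specified in the original setup:

```python
# Hypothetical S3 event notification configuration tying the "receipts"
# bucket to the receipt-analyzer Lambda. The ARN below is a placeholder.
notification_configuration = {
    'LambdaFunctionConfigurations': [
        {
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:receipt-analyzer',
            'Events': ['s3:ObjectCreated:*'],
            # Optional: only trigger on image uploads (assumed filter)
            'Filter': {
                'Key': {'FilterRules': [{'Name': 'suffix', 'Value': '.jpg'}]}
            },
        }
    ]
}

# This would be applied with:
# boto3.client('s3').put_bucket_notification_configuration(
#     Bucket='receipts',
#     NotificationConfiguration=notification_configuration,
# )
```

With this in place, every new object in the bucket invokes the function automatically, with no polling or API layer in between.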
2. Event Trigger: AWS Lambda (receipt-analyzer)
When a new receipt image is uploaded, S3 triggers an AWS Lambda function called receipt-analyzer.
This Lambda function acts as the orchestrator for the entire pipeline:
- It reads the S3 event metadata
- Coordinates calls to downstream services
- Normalizes and persists the final output
Because Lambda is event-driven and serverless, the system only runs compute when a receipt actually arrives.
```python
import json
from datetime import datetime
from decimal import Decimal
from urllib.parse import unquote_plus

import boto3

# Initialize clients
textract = boto3.client('textract', region_name='us-east-1')
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

# Configuration
BEDROCK_MODEL_ID = 'anthropic.claude-3-sonnet-20240229-v1:0'
DYNAMODB_TABLE_NAME = 'receipt-processing-results'


def convert_floats_to_decimal(obj):
    """Recursively convert float values to Decimal for DynamoDB compatibility."""
    if isinstance(obj, list):
        return [convert_floats_to_decimal(item) for item in obj]
    elif isinstance(obj, dict):
        return {key: convert_floats_to_decimal(value) for key, value in obj.items()}
    elif isinstance(obj, float):
        return Decimal(str(obj))
    else:
        return obj


def lambda_handler(event, context):
    # Extract S3 bucket and object key from the event
    # (S3 URL-encodes object keys in event payloads, so decode before use)
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    document_name = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Step 1: Extract raw text using Textract OCR
    textract_response = textract.detect_document_text(
        Document={'S3Object': {'Bucket': bucket_name, 'Name': document_name}}
    )

    # Step 2: Concatenate lines of text
    text_lines = [
        block['Text']
        for block in textract_response['Blocks']
        if block['BlockType'] == 'LINE'
    ]
    full_text = "\n".join(text_lines)

    # Step 3: Prepare prompt for Bedrock
    prompt = f"""You are an AI assistant that extracts structured data from receipts. Given the receipt text below, return a JSON with the following fields:
- supermarket_name
- location (address)
- items (list of item name and price)
- total_amount
- date_of_purchase

Receipt Text:
\"\"\"
{full_text}
\"\"\"

Return only the JSON object, no explanation."""

    # Step 4: Invoke Bedrock
    bedrock_response = bedrock.invoke_model(
        modelId=BEDROCK_MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.3,
            "top_p": 0.9
        })
    )

    # Step 5: Parse Bedrock response
    response_body = json.loads(bedrock_response['body'].read().decode())
    model_output = response_body['content'][0]['text'].strip()

    # Try to parse the model output as JSON
    try:
        extracted_data = json.loads(model_output)
    except json.JSONDecodeError:
        extracted_data = {"error": "Failed to parse response", "raw_output": model_output}

    # Step 6: Convert floats to Decimal for DynamoDB
    extracted_data_decimal = convert_floats_to_decimal(extracted_data)

    # Step 7: Save to DynamoDB
    table = dynamodb.Table(DYNAMODB_TABLE_NAME)
    dynamodb_item = {
        'document_id': document_name,
        'bucket_name': bucket_name,
        'processed_timestamp': datetime.utcnow().isoformat(),
        'extracted_data': extracted_data_decimal,
        'raw_text': full_text
    }

    try:
        table.put_item(Item=dynamodb_item)
        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'Receipt processed and saved successfully',
                'document_id': document_name,
                'extracted_data': extracted_data  # original floats, JSON-serializable
            })
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': 'Failed to save to DynamoDB',
                'details': str(e),
                'extracted_data': extracted_data
            })
        }
```
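The handler's event parsing can be exercised locally with a hand-written event. The payload below is a trimmed, hypothetical example of what S3 sends (only the fields the handler reads); note that S3 URL-encodes object keys in event payloads, so keys should be decoded before being passed to Textract:

```python
from urllib.parse import unquote_plus

# Trimmed, hypothetical S3 event payload for local testing
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "receipts"},
                "object": {"key": "uploads/receipt+2025-06-14.jpg"}
            }
        }
    ]
}

bucket_name = sample_event['Records'][0]['s3']['bucket']['name']
# S3 URL-encodes object keys in event notifications ("+" here stands for a space)
document_name = unquote_plus(sample_event['Records'][0]['s3']['object']['key'])
```

Feeding events like this into `lambda_handler` (with the AWS clients stubbed out) makes it possible to test the orchestration logic without deploying.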
3. Text Extraction: Amazon Textract
The first processing step inside the Lambda is optical character recognition (OCR) using Amazon Textract.
- Textract extracts raw text from the receipt image
- All detected `LINE` blocks are concatenated into a single text representation
- No assumptions are made about receipt layout or formatting
At this stage, the data is still unstructured—just plain text—but it provides a reliable foundation for semantic analysis.
4. Semantic Parsing: Amazon Bedrock (Claude 3 Sonnet)
Once raw text is extracted, the Lambda invokes Amazon Bedrock using the anthropic.claude-3-sonnet model.
Instead of trying to manually parse receipts with rules or regex, the model is prompted to reason over the text and return a clean JSON structure containing:
- Supermarket name
- Store location
- Item list (name and price)
- Total amount
- Date of purchase
The prompt explicitly instructs the model to:
- Return only JSON
- Follow a fixed schema
This approach dramatically simplifies downstream processing and makes the output predictable enough for database storage.
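Because the model is still free-form text generation, the fixed schema is worth enforcing before storage. A minimal validation sketch (the function name and the strictness of the checks are my own additions, not part of the original handler):

```python
import json

# The five fields the prompt instructs the model to return
REQUIRED_FIELDS = {
    "supermarket_name", "location", "items", "total_amount", "date_of_purchase"
}

def validate_receipt_json(model_output: str) -> dict:
    """Parse the model's text output and verify the fixed schema.

    Raises ValueError if the output is not valid JSON or if any
    required field is missing."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as e:
        raise ValueError(f"Model output is not valid JSON: {e}")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return data
```

A check like this turns silent schema drift into an explicit error that can be logged or retried.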
5. Persistence: Amazon DynamoDB
After successful extraction, the structured result is stored in Amazon DynamoDB, in a table named receipt-processing-results.
Each receipt is saved as a single item with the following attributes:
- document_id (String, primary key)
- bucket_name
- extracted_data (structured JSON)
- processed_timestamp
- raw_text (original OCR output)
Example of the `extracted_data` field, as stored (DynamoDB's attribute-value format, where `S`, `N`, `L`, and `M` mark strings, numbers, lists, and maps):

```json
{
  "location": {
    "S": "Markt 54, 3431 LB Nieuwegein"
  },
  "items": {
    "L": [
      {
        "M": {
          "name": {
            "S": "S. MARIA TORTILLA W.W"
          },
          "price": {
            "N": "2.99"
          }
        }
      }
    ]
  },
  "total_amount": {
    "N": "2.99"
  },
  "date_of_purchase": {
    "S": "14/06/2025"
  },
  "supermarket_name": {
    "S": "Dirk van den Broek"
  }
}
```
DynamoDB was chosen because it:
- Scales automatically with receipt volume
- Provides low-latency access for dashboards and queries
- Works well for item-centric access patterns (one item per receipt)
Storing both structured data and raw text allows future reprocessing if extraction logic or prompts improve.
Why This Architecture Works Well
This design has a few key advantages:
- Fully serverless – no servers to manage or scale
- Event-driven – processing happens only when new data arrives
- Separation of concerns – OCR, reasoning, and storage are cleanly isolated
- Extensible – easy to add user IDs, GSIs, or analytics pipelines later
It also keeps the system flexible: Textract can be swapped or enhanced, prompts can evolve, and DynamoDB schemas can grow without breaking the ingestion flow.
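As one example of that extensibility, adding a `user_id` attribute plus a global secondary index would allow per-user queries without touching the ingestion flow. A hypothetical table definition for that extension (the index name and key choices are illustrative):

```python
# Hypothetical extension of the receipt-processing-results table: a GSI on
# user_id so each user's receipts can be queried, sorted by processing time.
gsi_table_definition = {
    'TableName': 'receipt-processing-results',
    'AttributeDefinitions': [
        {'AttributeName': 'document_id', 'AttributeType': 'S'},
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'processed_timestamp', 'AttributeType': 'S'},
    ],
    'KeySchema': [
        {'AttributeName': 'document_id', 'KeyType': 'HASH'},
    ],
    'GlobalSecondaryIndexes': [
        {
            'IndexName': 'user_id-processed_timestamp-index',
            'KeySchema': [
                {'AttributeName': 'user_id', 'KeyType': 'HASH'},
                {'AttributeName': 'processed_timestamp', 'KeyType': 'RANGE'},
            ],
            'Projection': {'ProjectionType': 'ALL'},
        }
    ],
    'BillingMode': 'PAY_PER_REQUEST',
}

# Passed to boto3.client('dynamodb').create_table(**gsi_table_definition)
```

Because DynamoDB allows adding GSIs to an existing table, this change could also be rolled out after the fact.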
Conclusion
This feature demonstrates how a focused, single-purpose workflow can deliver meaningful value when built with the right AWS services. By combining Amazon S3, AWS Lambda, Amazon Textract, Amazon Bedrock (Claude 3 Sonnet), and Amazon DynamoDB, unstructured receipt images are transformed into structured, queryable data with minimal operational complexity.
The event-driven, serverless design scales automatically with usage and keeps costs aligned with demand. Separating OCR from AI-based reasoning also makes the solution flexible—models, prompts, or extraction logic can evolve over time without requiring architectural changes.
Most importantly, this approach removes manual effort for users while creating a strong foundation for future capabilities such as spending analytics, budget insights, and personalized recommendations. With small incremental extensions, this same architecture can support more advanced financial intelligence use cases without sacrificing simplicity or scalability.
