Praveen Tech World

Posted on Jun 30 • Originally published at praveentechworld.com

AI-Powered Expense Report Automation for Office Workers: No-Code Solutions

#tutorial

How Can AI Automate My Expense Reports Without Writing Code?

The short answer is: I used DeepSeek to generate a Python script that extracts data from receipt images, categorizes expenses, and formats them into a CSV report-all triggered by dragging files into a folder. The AI handled 80% of the code, while I focused on structuring the workflow and fixing library conflicts.

The Problem: Expense Reports Were Killing My Productivity

Every Friday, I wasted 2 hours manually:

Cropping receipt images
Typing amounts into Excel
Assigning categories (Travel, Meals, etc.)
Calculating totals

Our finance team needed reports in a specific CSV format with columns for Date, Vendor, Amount, and Category. The worst part? I’d often transpose numbers (like typing $56.20 as $65.20), requiring rework.

The AI Attempt: A One-Shot Prompt to DeepSeek

I fed DeepSeek this prompt:

Prompt:  
"Build a Python script that:  
1. Watches a 'receipts' folder for new image files (JPG/PNG)  
2. Uses OCR to extract text from each image  
3. Identifies the vendor name, date, and total amount  
4. Categorizes expenses as 'Travel', 'Meals', or 'Office Supplies' based on keywords  
5. Appends the data to a CSV with headers: Date,Vendor,Amount,Category  
6. Runs automatically when new files appear  

Use pytesseract for OCR and watchdog for file monitoring.  
Include error handling for bad OCR reads."

First Output:

DeepSeek generated a 47-line script using:

pytesseract for OCR
watchdog for folder monitoring
Regex patterns to find amounts/dates

Where It Worked:

The folder watcher logic was flawless
CSV generation used proper escaping

Where It Broke: 3 Critical Failures

Library Hell The script required pytesseract, but failed because I didn’t have Tesseract OCR installed system-wide. Error:

   TesseractNotFoundError: tesseract is not installed or it's not in your PATH

Vendor Detection Was Garbage For a Starbucks receipt, it output:

   Vendor: STARBUCKS INC 12345  
   Date: 2024-02-30 (invalid date)  
   Amount: $12.SO (misread '5' as 'S')

False Positives A screenshot of a Slack conversation triggered the script because it contained a dollar sign ($).

What I Had to Fix

1. The OCR Pipeline

I switched to EasyOCR after finding pytesseract struggled with thermal receipts. Required changes:

# Before (pytesseract)  
text = pytesseract.image_to_string(image)  

# After (EasyOCR)  
reader = easyocr.Reader(['en'])  
results = reader.readtext(image, detail=0)

2. Date Validation

Added a function to reject impossible dates (like February 30th):

from datetime import datetime  
def validate_date(date_str):  
    try:  
        datetime.strptime(date_str, '%Y-%m-%d')  
        return True  
    except ValueError:  
        return False

3. Vendor Cleanup

Used a hardcoded vendor list with fuzzy matching:

KNOWN_VENDORS = ['Starbucks', 'Uber', 'Amazon Web Services']  

def match_vendor(raw_text):  
    for vendor in KNOWN_VENDORS:  
        if vendor.lower() in raw_text.lower():  
            return vendor  
    return "Unknown"

The Working Result

Final script features:

Drag-and-drop processing: Drop receipts into ~/expense_reports/receipts
Human review: Flags uncertain entries in yellow (requires manual confirmation)
Audit trail: Logs all actions to processing.log

Sample Output CSV:

Date,Vendor,Amount,Category  
2024-03-15,Starbucks,12.50,Meals  
2024-03-16,Uber,24.30,Travel

Execution Time:

0.8 seconds per image (tested on 20 receipts)
95% accuracy on typed receipts, 70% on handwritten

GitHub repo with full code

What I Learned

AI Can’t Handle Edge Cases Alone

DeepSeek didn’t anticipate thermal receipt fade or handwritten notes. I had to build manual review steps.
Prompt Engineering = Specification Writing

The more precise my prompt (e.g., "include a 10% confidence threshold for OCR results"), the better the output.
Library Choices Matter

EasyOCR worked better than pytesseract for my use case, but added 300MB to the deployment size.

The Exact Prompt

Here’s the refined prompt that got me the final working version:

"Build a Python 3 script that:  
1. Uses EasyOCR instead of pytesseract for better receipt readability  
2. Watches a folder for new JPG/PNG files with watchdog  
3. Extracts:  
   - Date (must be valid, skip if invalid)  
   - Amount (must match regex ^\$?\d+\.\d{2}$)  
   - Vendor (fuzzy match against a hardcoded list)  
4. Categorizes expenses using these rules:  
   - 'Travel' if vendor is Uber/Lyft or contains 'taxi'  
   - 'Meals' if vendor is a restaurant  
   - 'Office' if vendor is Amazon/Staples  
5. Outputs to CSV with error logging  
6. Skips images where confidence is below 70%  

Include a function to manually review uncertain entries before CSV export."

FAQ

Q: Can this handle multi-currency receipts?

A: Not yet. The current regex only catches USD ($). You’d need to modify the amount detection pattern.

Q: What about receipts in other languages?

A: EasyOCR supports 80+ languages. Change the Reader initialization to ['en','es','fr'] as needed.

Q: Is this GDPR compliant?

A: No-the script stores raw receipt images alongside processed data. You’d need to add auto-deletion of images after processing.

Q: Can I run this on my phone?

A: Only via Termux (Linux environment for Android). For iOS, you’d need a cloud trigger like AWS Lambda.

Q: How much did this cost to build?

A: $0 in tools (all open-source), but 3 hours of prompt tuning and debugging.

What expense workflow would you automate with this approach? Drop a comment with your nightmare process!

Internal Links:

Related Guides

— The short answer is you can use a Python script with DeepSeek prompts to automatically pull grade da
— An IT Ops Lead used DeepSeek to build a Python script tracking AI API costs across providers. The AI
— A DeepSeek-generated AWS cleanup script identified 3,000 resources as deletable. The AI hallucinated

DEV Community

AI-Powered Expense Report Automation for Office Workers: No-Code Solutions

How Can AI Automate My Expense Reports Without Writing Code?

The Problem: Expense Reports Were Killing My Productivity

The AI Attempt: A One-Shot Prompt to DeepSeek

Where It Broke: 3 Critical Failures

What I Had to Fix

The Working Result

What I Learned

The Exact Prompt

FAQ

Related Guides

Related Guides

Top comments (0)