DEV Community

Cover image for AI-Powered Expense Report Automation for Office Workers: No-Code Solutions
Praveen Tech World
Praveen Tech World

Posted on • Originally published at praveentechworld.com

AI-Powered Expense Report Automation for Office Workers: No-Code Solutions

How Can AI Automate My Expense Reports Without Writing Code?

The short answer is: I used DeepSeek to generate a Python script that extracts data from receipt images, categorizes expenses, and formats them into a CSV report-all triggered by dragging files into a folder. The AI handled 80% of the code, while I focused on structuring the workflow and fixing library conflicts.

The Problem: Expense Reports Were Killing My Productivity

Every Friday, I wasted 2 hours manually:

  1. Cropping receipt images
  2. Typing amounts into Excel
  3. Assigning categories (Travel, Meals, etc.)
  4. Calculating totals

Our finance team needed reports in a specific CSV format with columns for Date, Vendor, Amount, and Category. The worst part? I’d often transpose numbers (like typing $56.20 as $65.20), requiring rework.

The AI Attempt: A One-Shot Prompt to DeepSeek

I fed DeepSeek this prompt:

Prompt:  
"Build a Python script that:  
1. Watches a 'receipts' folder for new image files (JPG/PNG)  
2. Uses OCR to extract text from each image  
3. Identifies the vendor name, date, and total amount  
4. Categorizes expenses as 'Travel', 'Meals', or 'Office Supplies' based on keywords  
5. Appends the data to a CSV with headers: Date,Vendor,Amount,Category  
6. Runs automatically when new files appear  

Use pytesseract for OCR and watchdog for file monitoring.  
Include error handling for bad OCR reads."  
Enter fullscreen mode Exit fullscreen mode

First Output:

DeepSeek generated a 47-line script using:

  • pytesseract for OCR
  • watchdog for folder monitoring
  • Regex patterns to find amounts/dates

Where It Worked:

  • The folder watcher logic was flawless
  • CSV generation used proper escaping

Where It Broke: 3 Critical Failures

  1. Library Hell The script required pytesseract, but failed because I didn’t have Tesseract OCR installed system-wide. Error:
   TesseractNotFoundError: tesseract is not installed or it's not in your PATH  
Enter fullscreen mode Exit fullscreen mode
  1. Vendor Detection Was Garbage For a Starbucks receipt, it output:
   Vendor: STARBUCKS INC 12345  
   Date: 2024-02-30 (invalid date)  
   Amount: $12.SO (misread '5' as 'S')  
Enter fullscreen mode Exit fullscreen mode
  1. False Positives A screenshot of a Slack conversation triggered the script because it contained a dollar sign ($).

What I Had to Fix

1. The OCR Pipeline

I switched to EasyOCR after finding pytesseract struggled with thermal receipts. Required changes:

# Before (pytesseract)  
text = pytesseract.image_to_string(image)  

# After (EasyOCR)  
reader = easyocr.Reader(['en'])  
results = reader.readtext(image, detail=0)  
Enter fullscreen mode Exit fullscreen mode

2. Date Validation

Added a function to reject impossible dates (like February 30th):

from datetime import datetime  
def validate_date(date_str):  
    try:  
        datetime.strptime(date_str, '%Y-%m-%d')  
        return True  
    except ValueError:  
        return False  
Enter fullscreen mode Exit fullscreen mode

3. Vendor Cleanup

Used a hardcoded vendor list with fuzzy matching:

KNOWN_VENDORS = ['Starbucks', 'Uber', 'Amazon Web Services']  

def match_vendor(raw_text):  
    for vendor in KNOWN_VENDORS:  
        if vendor.lower() in raw_text.lower():  
            return vendor  
    return "Unknown"  
Enter fullscreen mode Exit fullscreen mode

The Working Result

Final script features:

  • Drag-and-drop processing: Drop receipts into ~/expense_reports/receipts
  • Human review: Flags uncertain entries in yellow (requires manual confirmation)
  • Audit trail: Logs all actions to processing.log

Sample Output CSV:

Date,Vendor,Amount,Category  
2024-03-15,Starbucks,12.50,Meals  
2024-03-16,Uber,24.30,Travel  
Enter fullscreen mode Exit fullscreen mode

Execution Time:

  • 0.8 seconds per image (tested on 20 receipts)
  • 95% accuracy on typed receipts, 70% on handwritten

GitHub repo with full code

What I Learned

  1. AI Can’t Handle Edge Cases Alone

    DeepSeek didn’t anticipate thermal receipt fade or handwritten notes. I had to build manual review steps.

  2. Prompt Engineering = Specification Writing

    The more precise my prompt (e.g., "include a 10% confidence threshold for OCR results"), the better the output.

  3. Library Choices Matter

    EasyOCR worked better than pytesseract for my use case, but added 300MB to the deployment size.

The Exact Prompt

Here’s the refined prompt that got me the final working version:

"Build a Python 3 script that:  
1. Uses EasyOCR instead of pytesseract for better receipt readability  
2. Watches a folder for new JPG/PNG files with watchdog  
3. Extracts:  
   - Date (must be valid, skip if invalid)  
   - Amount (must match regex ^\$?\d+\.\d{2}$)  
   - Vendor (fuzzy match against a hardcoded list)  
4. Categorizes expenses using these rules:  
   - 'Travel' if vendor is Uber/Lyft or contains 'taxi'  
   - 'Meals' if vendor is a restaurant  
   - 'Office' if vendor is Amazon/Staples  
5. Outputs to CSV with error logging  
6. Skips images where confidence is below 70%  

Include a function to manually review uncertain entries before CSV export."  
Enter fullscreen mode Exit fullscreen mode

FAQ

Q: Can this handle multi-currency receipts?

A: Not yet. The current regex only catches USD ($). You’d need to modify the amount detection pattern.

Q: What about receipts in other languages?

A: EasyOCR supports 80+ languages. Change the Reader initialization to ['en','es','fr'] as needed.

Q: Is this GDPR compliant?

A: No-the script stores raw receipt images alongside processed data. You’d need to add auto-deletion of images after processing.

Q: Can I run this on my phone?

A: Only via Termux (Linux environment for Android). For iOS, you’d need a cloud trigger like AWS Lambda.

Q: How much did this cost to build?

A: $0 in tools (all open-source), but 3 hours of prompt tuning and debugging.

What expense workflow would you automate with this approach? Drop a comment with your nightmare process!


Internal Links:

Related Guides

  • — The short answer is you can use a Python script with DeepSeek prompts to automatically pull grade da
  • — An IT Ops Lead used DeepSeek to build a Python script tracking AI API costs across providers. The AI
  • — A DeepSeek-generated AWS cleanup script identified 3,000 resources as deletable. The AI hallucinated

Related Guides

Top comments (0)