How Can AI Automate My Expense Reports Without Writing Code?
The short answer is: I used DeepSeek to generate a Python script that extracts data from receipt images, categorizes expenses, and formats them into a CSV report-all triggered by dragging files into a folder. The AI handled 80% of the code, while I focused on structuring the workflow and fixing library conflicts.
The Problem: Expense Reports Were Killing My Productivity
Every Friday, I wasted 2 hours manually:
- Cropping receipt images
- Typing amounts into Excel
- Assigning categories (Travel, Meals, etc.)
- Calculating totals
Our finance team needed reports in a specific CSV format with columns for Date, Vendor, Amount, and Category. The worst part? I’d often transpose numbers (like typing $56.20 as $65.20), requiring rework.
The AI Attempt: A One-Shot Prompt to DeepSeek
I fed DeepSeek this prompt:
Prompt:
"Build a Python script that:
1. Watches a 'receipts' folder for new image files (JPG/PNG)
2. Uses OCR to extract text from each image
3. Identifies the vendor name, date, and total amount
4. Categorizes expenses as 'Travel', 'Meals', or 'Office Supplies' based on keywords
5. Appends the data to a CSV with headers: Date,Vendor,Amount,Category
6. Runs automatically when new files appear
Use pytesseract for OCR and watchdog for file monitoring.
Include error handling for bad OCR reads."
First Output:
DeepSeek generated a 47-line script using:
-
pytesseractfor OCR -
watchdogfor folder monitoring - Regex patterns to find amounts/dates
Where It Worked:
- The folder watcher logic was flawless
- CSV generation used proper escaping
Where It Broke: 3 Critical Failures
-
Library Hell
The script required
pytesseract, but failed because I didn’t have Tesseract OCR installed system-wide. Error:
TesseractNotFoundError: tesseract is not installed or it's not in your PATH
- Vendor Detection Was Garbage For a Starbucks receipt, it output:
Vendor: STARBUCKS INC 12345
Date: 2024-02-30 (invalid date)
Amount: $12.SO (misread '5' as 'S')
- False Positives A screenshot of a Slack conversation triggered the script because it contained a dollar sign ($).
What I Had to Fix
1. The OCR Pipeline
I switched to EasyOCR after finding pytesseract struggled with thermal receipts. Required changes:
# Before (pytesseract)
text = pytesseract.image_to_string(image)
# After (EasyOCR)
reader = easyocr.Reader(['en'])
results = reader.readtext(image, detail=0)
2. Date Validation
Added a function to reject impossible dates (like February 30th):
from datetime import datetime
def validate_date(date_str):
try:
datetime.strptime(date_str, '%Y-%m-%d')
return True
except ValueError:
return False
3. Vendor Cleanup
Used a hardcoded vendor list with fuzzy matching:
KNOWN_VENDORS = ['Starbucks', 'Uber', 'Amazon Web Services']
def match_vendor(raw_text):
for vendor in KNOWN_VENDORS:
if vendor.lower() in raw_text.lower():
return vendor
return "Unknown"
The Working Result
Final script features:
-
Drag-and-drop processing: Drop receipts into
~/expense_reports/receipts - Human review: Flags uncertain entries in yellow (requires manual confirmation)
-
Audit trail: Logs all actions to
processing.log
Sample Output CSV:
Date,Vendor,Amount,Category
2024-03-15,Starbucks,12.50,Meals
2024-03-16,Uber,24.30,Travel
Execution Time:
- 0.8 seconds per image (tested on 20 receipts)
- 95% accuracy on typed receipts, 70% on handwritten
What I Learned
AI Can’t Handle Edge Cases Alone
DeepSeek didn’t anticipate thermal receipt fade or handwritten notes. I had to build manual review steps.Prompt Engineering = Specification Writing
The more precise my prompt (e.g., "include a 10% confidence threshold for OCR results"), the better the output.Library Choices Matter
EasyOCR worked better than pytesseract for my use case, but added 300MB to the deployment size.
The Exact Prompt
Here’s the refined prompt that got me the final working version:
"Build a Python 3 script that:
1. Uses EasyOCR instead of pytesseract for better receipt readability
2. Watches a folder for new JPG/PNG files with watchdog
3. Extracts:
- Date (must be valid, skip if invalid)
- Amount (must match regex ^\$?\d+\.\d{2}$)
- Vendor (fuzzy match against a hardcoded list)
4. Categorizes expenses using these rules:
- 'Travel' if vendor is Uber/Lyft or contains 'taxi'
- 'Meals' if vendor is a restaurant
- 'Office' if vendor is Amazon/Staples
5. Outputs to CSV with error logging
6. Skips images where confidence is below 70%
Include a function to manually review uncertain entries before CSV export."
FAQ
Q: Can this handle multi-currency receipts?
A: Not yet. The current regex only catches USD ($). You’d need to modify the amount detection pattern.
Q: What about receipts in other languages?
A: EasyOCR supports 80+ languages. Change the Reader initialization to ['en','es','fr'] as needed.
Q: Is this GDPR compliant?
A: No-the script stores raw receipt images alongside processed data. You’d need to add auto-deletion of images after processing.
Q: Can I run this on my phone?
A: Only via Termux (Linux environment for Android). For iOS, you’d need a cloud trigger like AWS Lambda.
Q: How much did this cost to build?
A: $0 in tools (all open-source), but 3 hours of prompt tuning and debugging.
What expense workflow would you automate with this approach? Drop a comment with your nightmare process!
Internal Links:
- How I Used AI to Automate My Weekly Data Reports
- The No-Code CLI Tool That Saved My Team 20 Hours/Month
Related Guides
- — The short answer is you can use a Python script with DeepSeek prompts to automatically pull grade da
- — An IT Ops Lead used DeepSeek to build a Python script tracking AI API costs across providers. The AI
- — A DeepSeek-generated AWS cleanup script identified 3,000 resources as deletable. The AI hallucinated
Top comments (0)