Built a script to categorize expenses automatically. Saved 3 hours/month.
Spent every Sunday sorting bank transactions into categories for my freelance accounting. Business meals, software subscriptions, travel, office supplies. Copying stuff from my bank CSV into a spreadsheet. After 6 months of this I finally snapped and wrote a Python script.
Before (the painful way)
Every week I'd download my bank CSV export. Then open it and categorize each transaction myself:
- Transaction at "Starbucks" → Business meal
- "AWS Invoice" → Software/tools
- "United Airlines" → Travel
- "Office Depot" → Office supplies
For maybe 40 to 60 transactions per week this took about 45 minutes. Hated it.
The script
Basic Python that reads the bank CSV and categorizes based on keywords. Nothing fancy.
import pandas as pd
# Category rules (keyword matching)
categories = {
'Software': ['aws', 'github', 'digitalocean', 'heroku', 'stripe'],
'Business Meals': ['starbucks', 'chipotle', 'subway', 'restaurant'],
'Travel': ['united', 'delta', 'airbnb', 'uber', 'lyft'],
'Office': ['staples', 'office depot', 'amazon'],
}
def categorize_transaction(description):
desc_lower = description.lower()
for category, keywords in categories.items():
if any(keyword in desc_lower for keyword in keywords):
return category
return 'Other' # Default
# Read bank CSV
df = pd.read_csv('bank_export.csv')
df['Category'] = df['Description'].apply(categorize_transaction)
# Save categorized output
df.to_csv('categorized_expenses.csv', index=False)
print(f"Categorized {len(df)} transactions")
How I use it now
- Download CSV from bank (once a week)
- Run script:
python categorize.py - Get categorized CSV in 2 seconds
- Import to Google Sheets for final review
Script handles about 85% of transactions automatically. The other 15% are random purchases I still do myself, but way better than doing everything.
What could be better
Super basic. No machine learning, no fancy NLP. Just keyword matching. Gets confused sometimes:
- "Amazon" could be office supplies OR personal shopping
- "Uber" to client meeting is business, Uber to gym is personal
Thought about adding ML but honestly the keyword approach works fine for my use case. If I spent more time training a model than I save, what's the point.
Time saved
Before: 45 min/week × 4 weeks = 3 hours/month
After: 10 min/week × 4 weeks = 40 min/month
Saved roughly 2h 20min/month. Worth the 2 hours I spent building it.
The keyword dict is easy to customize too. Just add your recurring vendors and categories. Put it on GitHub if anyone wants it.
Top comments (0)