Ever look at your bank account and wonder where all the money went? Yeah, same. I recently sat down to audit my recurring charges and realized I was bleeding cash on subscriptions I'd completely forgotten about. Streaming services, audiobook platforms, cloud storage — the works.
The annoying part? There's no single dashboard that shows you every recurring charge across all your accounts. So I did what any reasonable developer would do: I wrote a script.
The Problem: Death by a Thousand Subscriptions
Subscription creep is real. You sign up for a free trial, forget to cancel, and suddenly you're paying $14.99/month for something you used once in February. Multiply that by a dozen services and you're looking at a couple hundred bucks a month you didn't even notice.
The root cause is simple: recurring charges are designed to be invisible. They're small enough that you don't notice them individually, and they blend into your statement noise. Your bank won't flag them because they're authorized transactions.
So let's fix that.
Step 1: Export Your Statements as CSV
Most banks let you download transaction history as CSV. Grab at least 3 months of data — that gives us enough to detect monthly patterns. Save it somewhere sensible.
Most bank CSVs look roughly like this:
Date,Description,Amount,Category
2026-01-15,SOME STREAMING SVC,-15.99,Entertainment
2026-01-15,GROCERY STORE,-47.23,Groceries
2026-02-15,SOME STREAMING SVC,-15.99,Entertainment
2026-03-15,SOME STREAMING SVC,-15.99,Entertainment
The key insight: subscriptions repeat with the same description and roughly the same amount at regular intervals.
Step 2: Parse and Normalize Transactions
Bank CSVs are messy. Descriptions have random reference numbers, amounts might be formatted differently, dates might be in weird formats. Let's clean that up.
import csv
from datetime import datetime
from collections import defaultdict
import re
def load_transactions(csv_path):
transactions = []
with open(csv_path, 'r') as f:
reader = csv.DictReader(f)
for row in reader:
amount = float(row['Amount'].replace('$', '').replace(',', ''))
# Only care about charges, not deposits
if amount >= 0:
continue
transactions.append({
'date': datetime.strptime(row['Date'], '%Y-%m-%d'),
'description': normalize_description(row['Description']),
'amount': abs(amount),
})
return transactions
def normalize_description(desc):
# Strip trailing reference numbers and extra whitespace
desc = re.sub(r'\s*#[A-Z0-9]+$', '', desc)
desc = re.sub(r'\s*\d{6,}$', '', desc) # remove long trailing numbers
return desc.strip().upper()
That normalize_description function is doing the heavy lifting here. Bank descriptions are full of garbage like transaction IDs and reference codes that change every month. Stripping those out lets us group transactions from the same merchant together.
Step 3: Detect Recurring Patterns
Now the interesting part. We need to find transactions that repeat at roughly monthly intervals with roughly the same amount. I say "roughly" because subscription charges don't always hit on the exact same day — weekends, bank processing times, and billing cycles all cause slight variations.
def find_subscriptions(transactions, tolerance_days=5, tolerance_amount=1.00):
# Group by normalized description
grouped = defaultdict(list)
for t in transactions:
grouped[t['description']].append(t)
subscriptions = []
for description, charges in grouped.items():
if len(charges) < 2:
continue # need at least 2 occurrences to detect a pattern
# Sort by date
charges.sort(key=lambda x: x['date'])
# Check if amounts are consistent
amounts = [c['amount'] for c in charges]
avg_amount = sum(amounts) / len(amounts)
if max(amounts) - min(amounts) > tolerance_amount:
continue # amount varies too much, probably not a subscription
# Check if intervals are roughly monthly (25-35 days)
intervals = []
for i in range(1, len(charges)):
delta = (charges[i]['date'] - charges[i-1]['date']).days
intervals.append(delta)
avg_interval = sum(intervals) / len(intervals)
# Monthly: ~30 days, Annual: ~365 days, Weekly: ~7 days
is_monthly = 25 <= avg_interval <= 35
is_annual = 350 <= avg_interval <= 380
is_weekly = 5 <= avg_interval <= 9
if is_monthly or is_annual or is_weekly:
frequency = 'monthly' if is_monthly else ('annual' if is_annual else 'weekly')
subscriptions.append({
'description': description,
'amount': round(avg_amount, 2),
'frequency': frequency,
'occurrences': len(charges),
'last_charged': charges[-1]['date'],
'monthly_cost': round(
avg_amount if is_monthly
else avg_amount / 12 if is_annual
else avg_amount * 4.33, # avg weeks per month
2
),
})
# Sort by monthly cost, highest first
subscriptions.sort(key=lambda x: x['monthly_cost'], reverse=True)
return subscriptions
The tolerance parameters are important. I started with exact matching and missed half my subscriptions because one service charged $9.99 one month and $10.49 another (probably a tax adjustment). A dollar of tolerance catches those edge cases.
Step 4: Generate the Report
Let's make the output actually useful:
def print_report(subscriptions):
total_monthly = sum(s['monthly_cost'] for s in subscriptions)
print(f"\n{'='*60}")
print(f" SUBSCRIPTION AUDIT — {len(subscriptions)} recurring charges found")
print(f"{'='*60}\n")
for s in subscriptions:
freq_label = f"${s['amount']}/{s['frequency']}"
print(f" {s['description']}")
print(f" {freq_label} (${s['monthly_cost']}/mo effective)")
print(f" Last charged: {s['last_charged'].strftime('%Y-%m-%d')}")
print(f" Seen {s['occurrences']} times\n")
print(f"{'='*60}")
print(f" TOTAL MONTHLY COST: ${total_monthly:.2f}")
print(f" TOTAL ANNUAL COST: ${total_monthly * 12:.2f}")
print(f"{'='*60}\n")
# Run it
transactions = load_transactions('statements.csv')
subscriptions = find_subscriptions(transactions)
print_report(subscriptions)
When I first ran this on my own data, the annual cost number genuinely startled me. Seeing $2,847.00 in big letters at the bottom hits different than watching $15 leave your account every month.
Making It Smarter
The basic version works, but here are a few improvements I've added over time:
Fuzzy description matching: Some merchants change their billing descriptor slightly between months. I added Levenshtein distance comparison (the
python-Levenshteinpackage works well) to catch these. If two descriptions are 85%+ similar and the amounts match, they're probably the same thing.Multi-account support: Just concat CSVs from multiple banks. Add a
sourcecolumn so you know which card is getting charged.Delta tracking: Save previous reports as JSON and diff them. This way you get alerts like "NEW: $12.99/mo charge appeared" or "GONE: that $9.99 charge you had — looks like it stopped."
Prevention: Stop Future Creep
Detection is great, but prevention is better. A few things that actually work:
Run this script monthly via cron. Automate the boring stuff. If your bank supports CSV export via API (some do through open banking APIs), you can make this fully hands-off.
Use a dedicated card for subscriptions. One card, nothing else on it. Makes auditing trivial — every charge on that card is a subscription by definition.
Calendar reminders for free trials. I know, it's low-tech. But it works better than any app I've tried. Sign up for a trial, immediately set a reminder for two days before it ends.
Check your email for receipts. Most services send monthly receipts. A quick search for "receipt" or "invoice" in your email can catch things the bank statement misses (like charges bundled under a parent company name).
The Bigger Lesson
The reason I built this instead of using an existing budgeting app is control. I don't want to hand my bank credentials to a third-party service just to see my own spending patterns. A 100-line Python script that processes a CSV I downloaded myself does the job without any privacy tradeoffs.
Sometimes the best tool is the one you build yourself — not because it's better, but because you actually understand what it's doing with your data.
The full script is pretty easy to extend. Add Slack notifications, pipe it into a spreadsheet, build a web dashboard with Flask — whatever fits your workflow. The hard part was identifying the pattern detection logic, and now you've got that.
Go run it on your statements. The number at the bottom might surprise you.
Top comments (0)