Alan West

Posted on Apr 6

How to Detect Subscription Creep by Parsing Your Bank Statements with Python

#python #automation #fintech #beginners

Ever look at your bank account and wonder where all the money went? Yeah, same. I recently sat down to audit my recurring charges and realized I was bleeding cash on subscriptions I'd completely forgotten about. Streaming services, audiobook platforms, cloud storage — the works.

The annoying part? There's no single dashboard that shows you every recurring charge across all your accounts. So I did what any reasonable developer would do: I wrote a script.

The Problem: Death by a Thousand Subscriptions

Subscription creep is real. You sign up for a free trial, forget to cancel, and suddenly you're paying $14.99/month for something you used once in February. Multiply that by a dozen services and you're looking at a couple hundred bucks a month you didn't even notice.

The root cause is simple: recurring charges are designed to be invisible. They're small enough that you don't notice them individually, and they blend into your statement noise. Your bank won't flag them because they're authorized transactions.

So let's fix that.

Step 1: Export Your Statements as CSV

Most banks let you download transaction history as CSV. Grab at least 3 months of data — that gives us enough to detect monthly patterns. Save it somewhere sensible.

Most bank CSVs look roughly like this:

Date,Description,Amount,Category
2026-01-15,SOME STREAMING SVC,-15.99,Entertainment
2026-01-15,GROCERY STORE,-47.23,Groceries
2026-02-15,SOME STREAMING SVC,-15.99,Entertainment
2026-03-15,SOME STREAMING SVC,-15.99,Entertainment

The key insight: subscriptions repeat with the same description and roughly the same amount at regular intervals.

Step 2: Parse and Normalize Transactions

Bank CSVs are messy. Descriptions have random reference numbers, amounts might be formatted differently, dates might be in weird formats. Let's clean that up.

import csv
from datetime import datetime
from collections import defaultdict
import re

def load_transactions(csv_path):
    transactions = []
    with open(csv_path, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            amount = float(row['Amount'].replace('$', '').replace(',', ''))
            # Only care about charges, not deposits
            if amount >= 0:
                continue
            transactions.append({
                'date': datetime.strptime(row['Date'], '%Y-%m-%d'),
                'description': normalize_description(row['Description']),
                'amount': abs(amount),
            })
    return transactions

def normalize_description(desc):
    # Strip trailing reference numbers and extra whitespace
    desc = re.sub(r'\s*#[A-Z0-9]+$', '', desc)
    desc = re.sub(r'\s*\d{6,}$', '', desc)  # remove long trailing numbers
    return desc.strip().upper()

That normalize_description function is doing the heavy lifting here. Bank descriptions are full of garbage like transaction IDs and reference codes that change every month. Stripping those out lets us group transactions from the same merchant together.

Step 3: Detect Recurring Patterns

Now the interesting part. We need to find transactions that repeat at roughly monthly intervals with roughly the same amount. I say "roughly" because subscription charges don't always hit on the exact same day — weekends, bank processing times, and billing cycles all cause slight variations.

def find_subscriptions(transactions, tolerance_days=5, tolerance_amount=1.00):
    # Group by normalized description
    grouped = defaultdict(list)
    for t in transactions:
        grouped[t['description']].append(t)

    subscriptions = []

    for description, charges in grouped.items():
        if len(charges) < 2:
            continue  # need at least 2 occurrences to detect a pattern

        # Sort by date
        charges.sort(key=lambda x: x['date'])

        # Check if amounts are consistent
        amounts = [c['amount'] for c in charges]
        avg_amount = sum(amounts) / len(amounts)
        if max(amounts) - min(amounts) > tolerance_amount:
            continue  # amount varies too much, probably not a subscription

        # Check if intervals are roughly monthly (25-35 days)
        intervals = []
        for i in range(1, len(charges)):
            delta = (charges[i]['date'] - charges[i-1]['date']).days
            intervals.append(delta)

        avg_interval = sum(intervals) / len(intervals)

        # Monthly: ~30 days, Annual: ~365 days, Weekly: ~7 days
        is_monthly = 25 <= avg_interval <= 35
        is_annual = 350 <= avg_interval <= 380
        is_weekly = 5 <= avg_interval <= 9

        if is_monthly or is_annual or is_weekly:
            frequency = 'monthly' if is_monthly else ('annual' if is_annual else 'weekly')
            subscriptions.append({
                'description': description,
                'amount': round(avg_amount, 2),
                'frequency': frequency,
                'occurrences': len(charges),
                'last_charged': charges[-1]['date'],
                'monthly_cost': round(
                    avg_amount if is_monthly
                    else avg_amount / 12 if is_annual
                    else avg_amount * 4.33,  # avg weeks per month
                    2
                ),
            })

    # Sort by monthly cost, highest first
    subscriptions.sort(key=lambda x: x['monthly_cost'], reverse=True)
    return subscriptions

The tolerance parameters are important. I started with exact matching and missed half my subscriptions because one service charged $9.99 one month and $10.49 another (probably a tax adjustment). A dollar of tolerance catches those edge cases.

Step 4: Generate the Report

Let's make the output actually useful:

def print_report(subscriptions):
    total_monthly = sum(s['monthly_cost'] for s in subscriptions)

    print(f"\n{'='*60}")
    print(f"  SUBSCRIPTION AUDIT — {len(subscriptions)} recurring charges found")
    print(f"{'='*60}\n")

    for s in subscriptions:
        freq_label = f"${s['amount']}/{s['frequency']}"
        print(f"  {s['description']}")
        print(f"    {freq_label} (${s['monthly_cost']}/mo effective)")
        print(f"    Last charged: {s['last_charged'].strftime('%Y-%m-%d')}")
        print(f"    Seen {s['occurrences']} times\n")

    print(f"{'='*60}")
    print(f"  TOTAL MONTHLY COST: ${total_monthly:.2f}")
    print(f"  TOTAL ANNUAL COST:  ${total_monthly * 12:.2f}")
    print(f"{'='*60}\n")

# Run it
transactions = load_transactions('statements.csv')
subscriptions = find_subscriptions(transactions)
print_report(subscriptions)

When I first ran this on my own data, the annual cost number genuinely startled me. Seeing $2,847.00 in big letters at the bottom hits different than watching $15 leave your account every month.

Making It Smarter

The basic version works, but here are a few improvements I've added over time:

Fuzzy description matching: Some merchants change their billing descriptor slightly between months. I added Levenshtein distance comparison (the python-Levenshtein package works well) to catch these. If two descriptions are 85%+ similar and the amounts match, they're probably the same thing.
Multi-account support: Just concat CSVs from multiple banks. Add a source column so you know which card is getting charged.
Delta tracking: Save previous reports as JSON and diff them. This way you get alerts like "NEW: $12.99/mo charge appeared" or "GONE: that $9.99 charge you had — looks like it stopped."

Prevention: Stop Future Creep

Detection is great, but prevention is better. A few things that actually work:

Run this script monthly via cron. Automate the boring stuff. If your bank supports CSV export via API (some do through open banking APIs), you can make this fully hands-off.
Use a dedicated card for subscriptions. One card, nothing else on it. Makes auditing trivial — every charge on that card is a subscription by definition.
Calendar reminders for free trials. I know, it's low-tech. But it works better than any app I've tried. Sign up for a trial, immediately set a reminder for two days before it ends.
Check your email for receipts. Most services send monthly receipts. A quick search for "receipt" or "invoice" in your email can catch things the bank statement misses (like charges bundled under a parent company name).

The Bigger Lesson

The reason I built this instead of using an existing budgeting app is control. I don't want to hand my bank credentials to a third-party service just to see my own spending patterns. A 100-line Python script that processes a CSV I downloaded myself does the job without any privacy tradeoffs.

Sometimes the best tool is the one you build yourself — not because it's better, but because you actually understand what it's doing with your data.

The full script is pretty easy to extend. Add Slack notifications, pipe it into a spreadsheet, build a web dashboard with Flask — whatever fits your workflow. The hard part was identifying the pattern detection logic, and now you've got that.

Go run it on your statements. The number at the bottom might surprise you.

DEV Community