I Built a Tool That Reads My GCB Bank Statement and Tells Me Where My Money Goes

#python #webdev #machinelearning #django

![FinTrack Ghana cover](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bu0jme7eolvyabtbxz83.png)

Let me paint a picture.

It's the end of the month. You open your GCB banking app. You see a list of transactions — numbers, dates, weird reference codes. You scroll and scroll. You know you spent too much somewhere, but where? On what? The app doesn't tell you. It just shows you the damage.

This is the reality for most people in Ghana. Our bank apps are basically digital receipts. No categories. No charts. No "hey, you spent 40% more on food this month." Nothing.

I got tired of it.

So I decided to build something

I'm a CS student in Accra, and I've been teaching myself Django. I wanted a portfolio project that wasn't just another to-do app or blog — something that solves a problem I actually have. So I asked myself: what if I could upload my bank statement and have a system break down my spending automatically?

That's how FinTrack Ghana was born.

You upload a GCB statement (PDF or CSV), and within seconds the system:

Extracts every transaction
Figures out what each one is (food? transport? utilities?)
Shows you pie charts and spending breakdowns
Asks Google Gemini to give you actual financial advice based on YOUR numbers

Not generic advice. Advice like: "Your essentials — food, transport, utilities — are well managed at GHS 90 total. Consider automating GHS 100 into savings each payday."

The hard part nobody warns you about

Parsing bank statements sounds simple until you actually try it.

GCB gives you a PDF with a table. Sounds easy, right? Just read the table. Except PDF tables aren't really tables — they're text positioned to look like tables. Every bank formats theirs differently. GCB has columns for Date, Debit, Credit, Balance, and Remarks. Ecobank uses completely different headers. MTN MoMo is its own thing entirely.

I ended up building a bank detector — a piece of code that reads the filename and the first few lines of the file to figure out which bank it's from:

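class BankDetector:
    """Guess which bank issued a statement from its filename and first lines."""

    # Lowercase substrings that identify each bank.
    # Banks are checked in this order, so the first match wins.
    BANK_SIGNATURES = {
        'gcb': {
            'keywords': ['gcb', 'ghana commercial bank'],
        },
        'ecobank': {
            'keywords': ['ecobank'],
        },
        'mtn_momo': {
            'keywords': ['mtn', 'mobile money', 'momo'],
        },
    }

    @classmethod
    def detect(cls, filename, content_preview=''):
        """Return the key of the first bank whose keyword matches, else 'unknown'."""
        # Search the filename and the preview together, case-insensitively.
        combined = (filename + ' ' + content_preview).lower()
        for bank, sig in cls.BANK_SIGNATURES.items():
            if any(kw in combined for kw in sig['keywords']):
                return bank
        return 'unknown'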

Simple? Yes. But it works. And sometimes simple is exactly what you need.
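
The "first few lines" part can be sketched like this. It's a minimal example, not the code from the repo: `read_preview` is a hypothetical helper name, the sample statement text is made up, and it assumes the upload is already available as a text stream (a PDF would need its text extracted first).

```python
import io
from itertools import islice


def read_preview(text_stream, max_lines=5):
    """Return the first `max_lines` lines of a text stream as one string."""
    return ''.join(islice(text_stream, max_lines))


# Made-up stand-in for an uploaded CSV; a real upload would be an open file.
sample = io.StringIO(
    "GCB Bank Statement\n"
    "Date,Debit,Credit,Balance,Remarks\n"
    "01/03/2025,45.00,,955.00,ECG PREPAID PURCHASE\n"
)
preview = read_preview(sample)

# The preview is what would be passed in as `content_preview`, e.g.:
#   BankDetector.detect('statement_march.csv', preview)
```

Reading only a handful of lines keeps detection cheap, and it lowers the chance that a keyword buried in a transaction description decides which bank the file belongs to.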

For the actual parsing, I used pdfplumber for PDFs and Python's built-in csv module for CSV files. The key insight was handling the Debit/Credit split — GCB uses separate columns, so if Debit has a value, money went out. If Credit has a value, money came in.
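
Here's a minimal sketch of that Debit/Credit split for the CSV case, using only the standard library. The column names come from the GCB layout described above. The function names, the output field names, the sample rows, and the choice to store debits as negative amounts are all assumptions made for illustration.

```python
import csv
import io
from decimal import Decimal


def signed_amount(row):
    """Return a negative Decimal for a debit, a positive one for a credit."""
    # Strip thousands separators and whitespace before converting.
    debit = (row.get('Debit') or '').replace(',', '').strip()
    credit = (row.get('Credit') or '').replace(',', '').strip()
    if debit:
        return -Decimal(debit)   # money went out
    if credit:
        return Decimal(credit)   # money came in
    return Decimal('0')          # neither column filled in


def parse_csv(text_stream):
    """Yield one dict per transaction with a single signed `amount`."""
    for row in csv.DictReader(text_stream):
        yield {
            'date': row['Date'],
            'description': row['Remarks'],
            'amount': signed_amount(row),
        }


# Made-up sample rows in the Date/Debit/Credit/Balance/Remarks layout.
sample = io.StringIO(
    "Date,Debit,Credit,Balance,Remarks\n"
    "01/03/2025,45.00,,955.00,ECG PREPAID PURCHASE\n"
    "02/03/2025,,\"1,200.00\",2155.00,SALARY\n"
)
transactions = list(parse_csv(sample))
```

Collapsing the two columns into one signed number means everything downstream (categorization, totals, charts) only has to look at a single field.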

Making sense of "ECG PREPAID PURCHASE Ref 078445566778"

Once you have the transactions, you need to categorize them. Nobody wants to see raw bank descriptions. They want to know: was this food? transport? a bill?

I built a two-layer system:

First layer: keywords. If the description contains "kfc" or "shoprite", it's food. If it says "uber" or "goil", it's transport. This catches about 80% of transactions instantly, with near-perfect accuracy on the ones it matches.

# Lowercase keywords for each category, matched as substrings of the description.
RULES = {
    'food': ['shoprite', 'kfc', 'papaye', 'chop bar', 'restaurant'],
    'transport': ['uber', 'bolt', 'trotro', 'goil', 'fuel'],
    'utilities': ['ecg', 'ghana water', 'mtn bill', 'vodafone'],
}
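
A lookup over those rules could be as small as the sketch below. `categorize_by_keyword` is a hypothetical name, and the repo's version may differ. It assumes plain case-insensitive substring matching where the first matching category wins.

```python
RULES = {
    'food': ['shoprite', 'kfc', 'papaye', 'chop bar', 'restaurant'],
    'transport': ['uber', 'bolt', 'trotro', 'goil', 'fuel'],
    'utilities': ['ecg', 'ghana water', 'mtn bill', 'vodafone'],
}


def categorize_by_keyword(description, rules=RULES):
    """Return the first category with a keyword in the description, else None."""
    text = description.lower()
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None  # no rule matched; leave it for the next layer


matched = categorize_by_keyword('ECG PREPAID PURCHASE Ref 078445566778')
unmatched = categorize_by_keyword('SUSU COLLECTOR PAYMENT')
```

Returning `None` instead of a default category keeps "no keyword matched" distinct from a real answer. The catch with substring matching is that short keywords can show up inside unrelated words, so the keyword list is worth keeping specific.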

Second layer: spaCy NLP. For transactions that keywords don't catch — like "SUSU COLLECTOR PAYMENT" — I trained a text classifier on 190 labeled examples. All with Ghanaian context. Makola Market purchases. Trotro fares. ECG prepaid top-ups.

The result? The system correctly categorizes about 90% of transactions automatically. The rest get labeled as "other", and users can manually correct them.
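
The way the two layers fit together can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the classifier is a stand-in callable that returns a label-to-score dict (the shape spaCy exposes as `doc.cats`), the 0.6 confidence threshold is made up, and the `savings` label and demo scores are hypothetical.

```python
def categorize(description, keyword_lookup, classifier, threshold=0.6):
    """Try keywords first, then the classifier, then fall back to 'other'."""
    category = keyword_lookup(description)
    if category is not None:
        return category

    scores = classifier(description)  # e.g. {'savings': 0.9, 'food': 0.1}
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best
    return 'other'  # low confidence: leave it for the user to correct


def demo_keywords(description):
    """Tiny stand-in for the keyword layer."""
    return 'food' if 'kfc' in description.lower() else None


def demo_classifier(description):
    """Hard-coded scores standing in for a trained text classifier."""
    if 'susu' in description.lower():
        return {'savings': 0.9, 'food': 0.1}
    return {'savings': 0.4, 'food': 0.35}


by_keyword = categorize('KFC OSU', demo_keywords, demo_classifier)
by_model = categorize('SUSU COLLECTOR PAYMENT', demo_keywords, demo_classifier)
fallback = categorize('POS 4471 XYZ', demo_keywords, demo_classifier)
```

Passing both layers in as callables keeps the fallback logic testable without loading a model.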

The part that made me smile

When I plugged in Google Gemini and fed it a real month's spending data, the response genuinely surprised me. It opened with "Akwaaba!" — the Twi word for welcome — and then gave specific, thoughtful advice based on my actual numbers.

Not "save more money." Not "reduce your spending." Actual advice like:

Your essential expenses for food (GHS 30), transport (GHS 15), and utilities (GHS 45) are impressively low. Consider automating a transfer of GHS 100-150 into a dedicated savings account each payday.

That's when I knew this project was worth building.
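
For the curious, preparing the data for the model might look something like this. It is a sketch only: the function names, the transaction dict shape, and the prompt wording are all assumptions, and the actual Gemini API call is left out. The sample amounts reuse the food, transport, and utilities figures quoted above.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_category(transactions):
    """Sum money going out (negative amounts) per category, as positive totals."""
    totals = defaultdict(Decimal)
    for tx in transactions:
        if tx['amount'] < 0:
            totals[tx['category']] += -tx['amount']
    return dict(totals)


def build_prompt(totals, currency='GHS'):
    """Turn category totals into a plain-text prompt (hypothetical wording)."""
    lines = [
        f"- {category}: {currency} {amount}"
        for category, amount in sorted(totals.items())
    ]
    return (
        "Here is one month of spending by category:\n"
        + "\n".join(lines)
        + "\nGive specific, practical advice based on these numbers."
    )


# Made-up transactions; income is positive and is left out of the totals.
sample = [
    {'category': 'food', 'amount': Decimal('-30.00')},
    {'category': 'transport', 'amount': Decimal('-15.00')},
    {'category': 'utilities', 'amount': Decimal('-45.00')},
    {'category': 'income', 'amount': Decimal('1200.00')},
]
totals = totals_by_category(sample)
prompt = build_prompt(totals)
```

Sending aggregated totals instead of raw statement lines keeps the prompt short and keeps reference numbers and counterparty names out of the request.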

What's under the hood

For the curious, here's what powers everything:

Backend: Python + Django + Django REST Framework
Auth: JWT tokens (register, login, password reset)
File parsing: pdfplumber for PDFs, csv module for CSVs
NLP: spaCy text classifier + keyword matching
AI insights: Google Gemini 2.5 Flash
Database: SQLite for dev, PostgreSQL for production
Tests: 29 unit tests covering parsers and categorization

The whole thing is API-first. I built and tested every endpoint with curl before writing a single line of frontend code. That approach saved me a lot of headaches.

What I learned (the honest version)

PDF parsing will humble you. I thought it'd take a day. It took much longer. Every bank structures their PDFs differently, and "extracting a table" from a PDF is way harder than it sounds.

Start with the dumb solution. My keyword rules are literally just string matching. No fancy algorithms. But they handle 80% of the work. I only needed spaCy for the edge cases. Don't over-engineer early.

Build for a niche nobody's serving. There are thousands of finance dashboards on GitHub. But how many of them can read a GCB statement? Or know what "trotro" means? That regional specificity is what makes this project different from everything else out there.

Test with real data early. I used my actual bank statement to test the parser. Seeing my own transactions appear — correctly parsed, correctly categorized — was the most satisfying moment in the whole project.

What's next

I'm currently building the React frontend — charts, drag-and-drop upload, budget tracking with progress bars. After that, deployment to Railway and Vercel.

The full code is open source: github.com/aimlin9/finance-dashboard

If you're building something for your region — whether it's Ghanaian banks, Nigerian fintech, Kenyan mobile money, or anything else — lean into that specificity. It's not a limitation. It's your edge.

I'm Ramsey, a CS student in Accra building portfolio projects and looking for remote internship opportunities. Find me on GitHub.