Edwards Tech Innovations

Posted on Apr 12

Bank Reconciliation in Python: Building a Plaid Integration from Scratch

#python #plaid #fintech #tutorial

Bank Reconciliation in Python: Building a Plaid Integration from Scratch

If you're building a fintech product, a business dashboard, or any application that needs to pull bank data, you're going to need Plaid. But Plaid's API surface is large, and the official SDK has a lot of ceremony around request/response objects.

In this tutorial, I'll walk through a complete Plaid integration module -- 337 lines of Python -- that handles the full lifecycle: link tokens, account access, transaction sync, reconciliation, and spending categorization.

Prerequisites

You will need:

A Plaid developer account (sandbox is free)
Python 3.9+
The plaid-python SDK

pip install plaid-python python-dotenv

Set up your .env:

PLAID_CLIENT_ID=your_client_id
PLAID_SECRET=your_sandbox_secret
PLAID_ENV=sandbox

Step 1: Initialize the Client

The first design decision is handling the case where Plaid credentials aren't available. In a multi-integration system, not every deployment will have every service configured. We use a try/except on import and an initialized flag.

import os
from typing import Dict, Any, List
from datetime import datetime, timedelta

try:
    import plaid
    from plaid.api import plaid_api
    from plaid.model.products import Products
    from plaid.model.country_code import CountryCode
    from plaid.model.link_token_create_request import LinkTokenCreateRequest
    from plaid.model.link_token_create_request_user import LinkTokenCreateRequestUser
    from plaid.model.item_public_token_exchange_request import ItemPublicTokenExchangeRequest
    from plaid.model.transactions_get_request import TransactionsGetRequest
    from plaid.model.transactions_get_request_options import TransactionsGetRequestOptions
    from plaid.model.accounts_get_request import AccountsGetRequest
    from plaid.model.accounts_balance_get_request import AccountsBalanceGetRequest
    PLAID_AVAILABLE = True
except ImportError:
    PLAID_AVAILABLE = False


class PlaidIntegration:
    def __init__(self):
        self.client_id = os.getenv("PLAID_CLIENT_ID")
        self.secret = os.getenv("PLAID_SECRET")
        self.env = os.getenv("PLAID_ENV", "sandbox")
        self.initialized = bool(self.client_id and self.secret and PLAID_AVAILABLE)

        if self.initialized:
            if self.env == "production":
                host = plaid.Environment.Production
            elif self.env == "development":
                host = plaid.Environment.Development
            else:
                host = plaid.Environment.Sandbox

            configuration = plaid.Configuration(
                host=host,
                api_key={
                    "clientId": self.client_id,
                    "secret": self.secret
                }
            )
            api_client = plaid.ApiClient(configuration)
            self.client = plaid_api.PlaidApi(api_client)

The environment toggle (sandbox / development / production) is read from an env var with a safe default. In production, you never want to accidentally hit sandbox.

Step 2: Link Tokens -- The Entry Point

Plaid Link is a drop-in frontend component that lets users securely connect their bank accounts. Your backend creates a link token, your frontend uses it to open the Link flow, and then you exchange the resulting public token for a persistent access token.

def create_link_token(self, user_id: str, products: List[str] = None) -> Dict[str, Any]:
    """Create a link token for Plaid Link"""
    try:
        product_list = [Products(p) for p in (products or ["transactions"])]

        request = LinkTokenCreateRequest(
            products=product_list,
            client_name="Your App Name",
            country_codes=[CountryCode("US")],
            language="en",
            user=LinkTokenCreateRequestUser(client_user_id=user_id)
        )

        response = self.client.link_token_create(request)
        return {
            "success": True,
            "link_token": response.link_token,
            "expiration": response.expiration.isoformat()
        }
    except Exception as e:
        return {"success": False, "error": str(e)}

The products parameter controls what data you can access. Common options:

"transactions" -- account balances and transaction history
"auth" -- account and routing numbers (for ACH)
"identity" -- account holder name, address, email

You can request multiple products in one link session.

Step 3: Token Exchange

When the user completes the Link flow, Plaid gives your frontend a public_token. This is temporary. You exchange it for a permanent access_token that you store server-side.

def exchange_public_token(self, public_token: str) -> Dict[str, Any]:
    """Exchange public token for access token"""
    try:
        request = ItemPublicTokenExchangeRequest(public_token=public_token)
        response = self.client.item_public_token_exchange(request)
        return {
            "success": True,
            "access_token": response.access_token,
            "item_id": response.item_id
        }
    except Exception as e:
        return {"success": False, "error": str(e)}

Important: The access_token is a credential. Store it encrypted. Never log it. Never send it to the frontend.

Step 4: Pulling Transactions

This is where it gets useful. Plaid's transaction endpoint supports date ranges, pagination, and account filtering.

def get_transactions(self, access_token: str, start_date: str = None,
                    end_date: str = None, account_ids: List[str] = None,
                    count: int = 100, offset: int = 0) -> Dict[str, Any]:
    """Get transactions for date range"""
    try:
        if not start_date:
            start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
        if not end_date:
            end_date = datetime.now().strftime("%Y-%m-%d")

        options = TransactionsGetRequestOptions(count=count, offset=offset)
        if account_ids:
            options.account_ids = account_ids

        request = TransactionsGetRequest(
            access_token=access_token,
            start_date=datetime.strptime(start_date, "%Y-%m-%d").date(),
            end_date=datetime.strptime(end_date, "%Y-%m-%d").date(),
            options=options
        )

        response = self.client.transactions_get(request)

        return {
            "success": True,
            "total_transactions": response.total_transactions,
            "transactions": [{
                "transaction_id": t.transaction_id,
                "account_id": t.account_id,
                "date": t.date.isoformat(),
                "name": t.name,
                "merchant_name": t.merchant_name,
                "amount": t.amount,
                "currency": t.iso_currency_code,
                "category": t.category,
                "pending": t.pending,
                "payment_channel": t.payment_channel.value if t.payment_channel else None
            } for t in response.transactions]
        }
    except Exception as e:
        return {"success": False, "error": str(e)}

Plaid returns up to 500 transactions per request. For accounts with heavy activity, you need to paginate:

def get_all_transactions(self, access_token: str, start_date: str = None,
                        end_date: str = None) -> List[Dict]:
    """Get all transactions (handles pagination)"""
    all_transactions = []
    offset = 0
    count = 500

    while True:
        result = self.get_transactions(
            access_token, start_date, end_date,
            count=count, offset=offset
        )
        if not result.get("success"):
            break

        transactions = result.get("transactions", [])
        all_transactions.extend(transactions)

        if len(transactions) < count:
            break
        offset += count

    return all_transactions

Step 5: Bank Reconciliation

Reconciliation is matching your internal records against what the bank actually shows. This is where business applications diverge from simple "show me my transactions" dashboards.

The approach here is straightforward: match by amount (within a penny tolerance), then report what matched, what's in the bank but not in your records, and what's in your records but not in the bank.

def reconcile_transactions(self, access_token: str, expected_transactions: List[Dict],
                          start_date: str = None, end_date: str = None) -> Dict[str, Any]:
    """Reconcile bank transactions against expected transactions"""
    try:
        bank_txns = self.get_all_transactions(access_token, start_date, end_date)

        matched = []
        unmatched_bank = []
        unmatched_expected = list(expected_transactions)

        for bank_txn in bank_txns:
            found_match = False
            for i, exp_txn in enumerate(unmatched_expected):
                if abs(bank_txn["amount"] - exp_txn.get("amount", 0)) < 0.01:
                    matched.append({
                        "bank_transaction": bank_txn,
                        "expected_transaction": exp_txn,
                        "status": "matched"
                    })
                    unmatched_expected.pop(i)
                    found_match = True
                    break

            if not found_match:
                unmatched_bank.append(bank_txn)

        return {
            "success": True,
            "matched_count": len(matched),
            "unmatched_bank_count": len(unmatched_bank),
            "unmatched_expected_count": len(unmatched_expected),
            "matched": matched,
            "unmatched_bank": unmatched_bank,
            "unmatched_expected": unmatched_expected,
            "reconciliation_status": "complete" if not unmatched_bank and not unmatched_expected else "needs_review"
        }
    except Exception as e:
        return {"success": False, "error": str(e)}

The reconciliation_status field gives you a quick check: "complete" means everything lines up, "needs_review" means a human should look at the discrepancies.

Step 6: Spending Categorization

Plaid provides category data with transactions, but the raw categories are nested lists. This method rolls them up into a summary with totals:

def categorize_transactions(self, transactions: List[Dict]) -> Dict[str, Any]:
    """Categorize transactions by type"""
    categories = {}
    for txn in transactions:
        cat = txn.get("category", ["Uncategorized"])[0] if txn.get("category") else "Uncategorized"
        if cat not in categories:
            categories[cat] = {"count": 0, "total": 0, "transactions": []}
        categories[cat]["count"] += 1
        categories[cat]["total"] += txn.get("amount", 0)
        categories[cat]["transactions"].append(txn)

    return {
        "success": True,
        "categories": categories,
        "summary": {cat: {"count": data["count"], "total": data["total"]}
                   for cat, data in categories.items()}
    }

And for a full spending report with top merchants:

def get_spending_summary(self, access_token: str, start_date: str = None,
                        end_date: str = None) -> Dict[str, Any]:
    """Get spending summary by category"""
    transactions = self.get_all_transactions(access_token, start_date, end_date)

    # In Plaid, positive amounts are debits (money out)
    expenses = [t for t in transactions if t["amount"] > 0]
    income = [t for t in transactions if t["amount"] < 0]

    categorized = self.categorize_transactions(expenses)

    return {
        "success": True,
        "period": {"start": start_date, "end": end_date},
        "total_expenses": sum(t["amount"] for t in expenses),
        "total_income": abs(sum(t["amount"] for t in income)),
        "transaction_count": len(transactions),
        "expense_categories": categorized.get("summary", {}),
        "top_merchants": self._get_top_merchants(expenses, limit=10)
    }

One thing that catches people: Plaid uses positive numbers for debits and negative numbers for credits. This is the opposite of what most people expect. The code above handles this correctly by filtering on sign.

Putting It Together

With the singleton pattern, using this in your application is straightforward:

from dotenv import load_dotenv
load_dotenv()

from eti_integrations import get_plaid

plaid = get_plaid()

# Check if Plaid is configured
if not plaid.initialized:
    print("Set PLAID_CLIENT_ID and PLAID_SECRET in .env")
else:
    # Create a link token for your frontend
    link = plaid.create_link_token(user_id="user_123")
    print(f"Link token: {link['link_token']}")

    # After the user completes Link and you have an access token:
    summary = plaid.get_spending_summary(access_token, "2025-01-01", "2025-03-31")
    print(f"Total expenses: ${summary['total_expenses']:.2f}")
    print(f"Total income: ${summary['total_income']:.2f}")
    for category, data in summary["expense_categories"].items():
        print(f"  {category}: ${data['total']:.2f} ({data['count']} transactions)")

What I'd Build Next

This module covers the core Plaid use cases. If I were extending it, I'd add:

Plaid Investments for brokerage account data
Plaid Liabilities for credit card and loan balances
Webhook handling for real-time transaction updates instead of polling
Caching for balance requests that don't need real-time data

This Plaid module is one of 9 production Python modules in the AI Integration Pack. The other 8 cover Stripe, PayPal, Twilio, Airtable, ClickUp, Outlook, and e-commerce platforms (Gumroad, LemonSqueezy, Paddle). 4,488 lines of code total, 152 tests, security-audited.

If you're interested: AI Integration Pack on Gumroad -- $49 for the full pack.

Top comments (1)

John Frandsen • Jul 9

Nice writeup — the sign-convention handling (positive debits / negative credits) is the exact gotcha that bites people first, and you flagged it cleanly. Two things worth adding for anyone taking this toward real reconciliation:

Transactions Sync over transactions_get. Sync is cursor-based and is now Plaid's recommended path for ongoing history — you store a cursor, poll for additions/removes, and never re-pull the full window. transactions_get still works but is being phased out and forces you to re-fetch ranges. For a reconciliation loop that runs daily, Sync is the difference between a cheap delta pull and re-downloading months of history every run.
True reconciliation vs. categorization. The categorization step here is spending analytics, not double-entry reconciliation. Actual bank-to-ledger reconciliation needs fuzzy matching: amount equality + date proximity (bank postings lag 1–3 days) + merchant normalization (raw descriptors like "SQ*JOES COFFEE" need cleaning before they'll match a ledger payee). Worth a two-pass matcher — exact amount+date first, then a fuzzy fallback on normalized merchant + amount — otherwise unmatched rows pile up fast. Idempotency on transaction_id matters too, because Plaid re-delivers updated and occasionally duplicated rows and you don't want to double-count.

On webhooks (your "next" item): TRANSACTIONS_DEFAULT_UPDATE is the one that fires when new transactions land or pending ones post. Hook that to a delta sync and you can stop polling entirely for active users.

Solid foundation — the singleton + initialized-flag pattern makes the multi-integration swap clean.