Bank Reconciliation in Python: Building a Plaid Integration from Scratch
If you're building a fintech product, a business dashboard, or any application that needs to pull bank data, you're going to need Plaid. But Plaid's API surface is large, and the official SDK has a lot of ceremony around request/response objects.
In this tutorial, I'll walk through a complete Plaid integration module -- 337 lines of Python -- that handles the full lifecycle: link tokens, account access, transaction sync, reconciliation, and spending categorization.
Prerequisites
You will need:
- A Plaid developer account (sandbox is free)
- Python 3.9+
- The
plaid-pythonSDK
pip install plaid-python python-dotenv
Set up your .env:
PLAID_CLIENT_ID=your_client_id
PLAID_SECRET=your_sandbox_secret
PLAID_ENV=sandbox
Step 1: Initialize the Client
The first design decision is handling the case where Plaid credentials aren't available. In a multi-integration system, not every deployment will have every service configured. We use a try/except on import and an initialized flag.
import os
from typing import Dict, Any, List
from datetime import datetime, timedelta
try:
import plaid
from plaid.api import plaid_api
from plaid.model.products import Products
from plaid.model.country_code import CountryCode
from plaid.model.link_token_create_request import LinkTokenCreateRequest
from plaid.model.link_token_create_request_user import LinkTokenCreateRequestUser
from plaid.model.item_public_token_exchange_request import ItemPublicTokenExchangeRequest
from plaid.model.transactions_get_request import TransactionsGetRequest
from plaid.model.transactions_get_request_options import TransactionsGetRequestOptions
from plaid.model.accounts_get_request import AccountsGetRequest
from plaid.model.accounts_balance_get_request import AccountsBalanceGetRequest
PLAID_AVAILABLE = True
except ImportError:
PLAID_AVAILABLE = False
class PlaidIntegration:
def __init__(self):
self.client_id = os.getenv("PLAID_CLIENT_ID")
self.secret = os.getenv("PLAID_SECRET")
self.env = os.getenv("PLAID_ENV", "sandbox")
self.initialized = bool(self.client_id and self.secret and PLAID_AVAILABLE)
if self.initialized:
if self.env == "production":
host = plaid.Environment.Production
elif self.env == "development":
host = plaid.Environment.Development
else:
host = plaid.Environment.Sandbox
configuration = plaid.Configuration(
host=host,
api_key={
"clientId": self.client_id,
"secret": self.secret
}
)
api_client = plaid.ApiClient(configuration)
self.client = plaid_api.PlaidApi(api_client)
The environment toggle (sandbox / development / production) is read from an env var with a safe default. In production, you never want to accidentally hit sandbox.
Step 2: Link Tokens -- The Entry Point
Plaid Link is a drop-in frontend component that lets users securely connect their bank accounts. Your backend creates a link token, your frontend uses it to open the Link flow, and then you exchange the resulting public token for a persistent access token.
def create_link_token(self, user_id: str, products: List[str] = None) -> Dict[str, Any]:
"""Create a link token for Plaid Link"""
try:
product_list = [Products(p) for p in (products or ["transactions"])]
request = LinkTokenCreateRequest(
products=product_list,
client_name="Your App Name",
country_codes=[CountryCode("US")],
language="en",
user=LinkTokenCreateRequestUser(client_user_id=user_id)
)
response = self.client.link_token_create(request)
return {
"success": True,
"link_token": response.link_token,
"expiration": response.expiration.isoformat()
}
except Exception as e:
return {"success": False, "error": str(e)}
The products parameter controls what data you can access. Common options:
-
"transactions"-- account balances and transaction history -
"auth"-- account and routing numbers (for ACH) -
"identity"-- account holder name, address, email
You can request multiple products in one link session.
Step 3: Token Exchange
When the user completes the Link flow, Plaid gives your frontend a public_token. This is temporary. You exchange it for a permanent access_token that you store server-side.
def exchange_public_token(self, public_token: str) -> Dict[str, Any]:
"""Exchange public token for access token"""
try:
request = ItemPublicTokenExchangeRequest(public_token=public_token)
response = self.client.item_public_token_exchange(request)
return {
"success": True,
"access_token": response.access_token,
"item_id": response.item_id
}
except Exception as e:
return {"success": False, "error": str(e)}
Important: The access_token is a credential. Store it encrypted. Never log it. Never send it to the frontend.
Step 4: Pulling Transactions
This is where it gets useful. Plaid's transaction endpoint supports date ranges, pagination, and account filtering.
def get_transactions(self, access_token: str, start_date: str = None,
end_date: str = None, account_ids: List[str] = None,
count: int = 100, offset: int = 0) -> Dict[str, Any]:
"""Get transactions for date range"""
try:
if not start_date:
start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
if not end_date:
end_date = datetime.now().strftime("%Y-%m-%d")
options = TransactionsGetRequestOptions(count=count, offset=offset)
if account_ids:
options.account_ids = account_ids
request = TransactionsGetRequest(
access_token=access_token,
start_date=datetime.strptime(start_date, "%Y-%m-%d").date(),
end_date=datetime.strptime(end_date, "%Y-%m-%d").date(),
options=options
)
response = self.client.transactions_get(request)
return {
"success": True,
"total_transactions": response.total_transactions,
"transactions": [{
"transaction_id": t.transaction_id,
"account_id": t.account_id,
"date": t.date.isoformat(),
"name": t.name,
"merchant_name": t.merchant_name,
"amount": t.amount,
"currency": t.iso_currency_code,
"category": t.category,
"pending": t.pending,
"payment_channel": t.payment_channel.value if t.payment_channel else None
} for t in response.transactions]
}
except Exception as e:
return {"success": False, "error": str(e)}
Plaid returns up to 500 transactions per request. For accounts with heavy activity, you need to paginate:
def get_all_transactions(self, access_token: str, start_date: str = None,
end_date: str = None) -> List[Dict]:
"""Get all transactions (handles pagination)"""
all_transactions = []
offset = 0
count = 500
while True:
result = self.get_transactions(
access_token, start_date, end_date,
count=count, offset=offset
)
if not result.get("success"):
break
transactions = result.get("transactions", [])
all_transactions.extend(transactions)
if len(transactions) < count:
break
offset += count
return all_transactions
Step 5: Bank Reconciliation
Reconciliation is matching your internal records against what the bank actually shows. This is where business applications diverge from simple "show me my transactions" dashboards.
The approach here is straightforward: match by amount (within a penny tolerance), then report what matched, what's in the bank but not in your records, and what's in your records but not in the bank.
def reconcile_transactions(self, access_token: str, expected_transactions: List[Dict],
start_date: str = None, end_date: str = None) -> Dict[str, Any]:
"""Reconcile bank transactions against expected transactions"""
try:
bank_txns = self.get_all_transactions(access_token, start_date, end_date)
matched = []
unmatched_bank = []
unmatched_expected = list(expected_transactions)
for bank_txn in bank_txns:
found_match = False
for i, exp_txn in enumerate(unmatched_expected):
if abs(bank_txn["amount"] - exp_txn.get("amount", 0)) < 0.01:
matched.append({
"bank_transaction": bank_txn,
"expected_transaction": exp_txn,
"status": "matched"
})
unmatched_expected.pop(i)
found_match = True
break
if not found_match:
unmatched_bank.append(bank_txn)
return {
"success": True,
"matched_count": len(matched),
"unmatched_bank_count": len(unmatched_bank),
"unmatched_expected_count": len(unmatched_expected),
"matched": matched,
"unmatched_bank": unmatched_bank,
"unmatched_expected": unmatched_expected,
"reconciliation_status": "complete" if not unmatched_bank and not unmatched_expected else "needs_review"
}
except Exception as e:
return {"success": False, "error": str(e)}
The reconciliation_status field gives you a quick check: "complete" means everything lines up, "needs_review" means a human should look at the discrepancies.
Step 6: Spending Categorization
Plaid provides category data with transactions, but the raw categories are nested lists. This method rolls them up into a summary with totals:
def categorize_transactions(self, transactions: List[Dict]) -> Dict[str, Any]:
"""Categorize transactions by type"""
categories = {}
for txn in transactions:
cat = txn.get("category", ["Uncategorized"])[0] if txn.get("category") else "Uncategorized"
if cat not in categories:
categories[cat] = {"count": 0, "total": 0, "transactions": []}
categories[cat]["count"] += 1
categories[cat]["total"] += txn.get("amount", 0)
categories[cat]["transactions"].append(txn)
return {
"success": True,
"categories": categories,
"summary": {cat: {"count": data["count"], "total": data["total"]}
for cat, data in categories.items()}
}
And for a full spending report with top merchants:
def get_spending_summary(self, access_token: str, start_date: str = None,
end_date: str = None) -> Dict[str, Any]:
"""Get spending summary by category"""
transactions = self.get_all_transactions(access_token, start_date, end_date)
# In Plaid, positive amounts are debits (money out)
expenses = [t for t in transactions if t["amount"] > 0]
income = [t for t in transactions if t["amount"] < 0]
categorized = self.categorize_transactions(expenses)
return {
"success": True,
"period": {"start": start_date, "end": end_date},
"total_expenses": sum(t["amount"] for t in expenses),
"total_income": abs(sum(t["amount"] for t in income)),
"transaction_count": len(transactions),
"expense_categories": categorized.get("summary", {}),
"top_merchants": self._get_top_merchants(expenses, limit=10)
}
One thing that catches people: Plaid uses positive numbers for debits and negative numbers for credits. This is the opposite of what most people expect. The code above handles this correctly by filtering on sign.
Putting It Together
With the singleton pattern, using this in your application is straightforward:
from dotenv import load_dotenv
load_dotenv()
from eti_integrations import get_plaid
plaid = get_plaid()
# Check if Plaid is configured
if not plaid.initialized:
print("Set PLAID_CLIENT_ID and PLAID_SECRET in .env")
else:
# Create a link token for your frontend
link = plaid.create_link_token(user_id="user_123")
print(f"Link token: {link['link_token']}")
# After the user completes Link and you have an access token:
summary = plaid.get_spending_summary(access_token, "2025-01-01", "2025-03-31")
print(f"Total expenses: ${summary['total_expenses']:.2f}")
print(f"Total income: ${summary['total_income']:.2f}")
for category, data in summary["expense_categories"].items():
print(f" {category}: ${data['total']:.2f} ({data['count']} transactions)")
What I'd Build Next
This module covers the core Plaid use cases. If I were extending it, I'd add:
- Plaid Investments for brokerage account data
- Plaid Liabilities for credit card and loan balances
- Webhook handling for real-time transaction updates instead of polling
- Caching for balance requests that don't need real-time data
This Plaid module is one of 9 production Python modules in the AI Integration Pack. The other 8 cover Stripe, PayPal, Twilio, Airtable, ClickUp, Outlook, and e-commerce platforms (Gumroad, LemonSqueezy, Paddle). 4,488 lines of code total, 152 tests, security-audited.
If you're interested: AI Integration Pack on Gumroad -- $49 for the full pack.
Top comments (0)