Wes Dieleman

Posted on Mar 23

Parsing bank transaction strings is way harder than you think

#fintech #ai #webdev #api

Look at this string:

POS 1234 AMZN Mktp NL 29.99 EUR

Seems easy to parse, right? You see "AMZN," you think Amazon, done.

Now try extracting all of this:

The actual merchant name
What category the purchase belongs to
Where the purchase happened
Whether a payment processor was involved
How confident you are in all of the above

Now do it for these too:

UBER BV HELP.UBER.COM NL
CRV*UBER EATS 123456
SPOTIFY P1234 STOCKHOLM SE
SQ *VERVE COFFEE ROASTERS SAN FRAN
TST* RESTAURANT DE HAVEN AMSTERDAM
APPLE PAY *SQ *BLUE BOTTLE
PAYPAL #12367121, Milhouse Hostel, Buenos Aires

Every single one of those follows different conventions. Different banks, different countries, different payment processors, different abbreviations. No standard format. No shared merchant IDs. No consistency whatsoever.

I've spent the better part of two years working on this problem, and in this post I want to share what I learned about why it's so deceptively difficult and what approaches actually work.

Why bank transactions look the way they do

Bank transaction descriptors were never designed for humans. They were designed for clearing and reconciliation between financial institutions. The priority was unique identifiers and processing codes, not readable merchant names.

That means:

Character limits from legacy banking infrastructure truncate merchant names
Acquirers, card networks, and payment facilitators each inject their own tokens
The same merchant appears differently depending on the bank, the country, and the payment method
There's no universal merchant identifier across the payment ecosystem

A single Starbucks purchase might appear as any of these depending on where and how you paid:

STARBUCKS #12345 SEATTLE WA
SBX*STARBUCKS MOBILE ORDER
STARBUCKS COFFEE 0012345
CARD PURCHASE STARBUCKS

Four different strings. Same brand. Your system needs to handle all of them, plus the thousands of other formats from thousands of other merchants.

The regex trap

Every developer's first instinct is regex. It's the obvious choice, and it works surprisingly well for a prototype.

RULES = {
    "STARBUCKS": {"name": "Starbucks", "category": "Coffee"},
    "AMAZON": {"name": "Amazon", "category": "Shopping"},
    "UBER": {"name": "Uber", "category": "Transportation"},
}

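def enrich(description):
    upper = description.upper()  # normalize once instead of once per rule
    # Rules are checked in insertion order; the first keyword found wins
    for keyword, result in RULES.items():
        if keyword in upper:
            return result
    # No rule matched: fall back to the raw string
    return {"name": description, "category": "Unknown"}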

This looks clean. It handles the obvious cases. Ship it.

Except:

"UBER EATS DELIVERY" → returns "Transportation" (wrong: it's food delivery)
"SBX*STARBUCKS MOBILE" → matches the substring check above, but stops matching as soon as you tighten the rule into an anchored regex like ^STARBUCKS
"AMZN Mktp" → doesn't match "AMAZON"
"CRV*UBER EATS" → which rule fires first?

The number of rules grows linearly with the number of merchants and transaction formats you need to support. With thousands of merchants across multiple countries, you're looking at a full-time maintenance job just to keep the rules from rotting.

I learned this the hard way. Rules feel like control until they become a ball of special cases that nobody wants to touch.
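To make those failure modes concrete, here's a small reproduction that runs the same substring matcher against the problem descriptors. It's a sketch for illustration, not a fix; the name `naive_enrich` and the sample list are made up for this example.

```python
RULES = {
    "STARBUCKS": {"name": "Starbucks", "category": "Coffee"},
    "AMAZON": {"name": "Amazon", "category": "Shopping"},
    "UBER": {"name": "Uber", "category": "Transportation"},
}


def naive_enrich(description: str) -> dict:
    """Substring matcher: the first keyword found in the descriptor wins."""
    upper = description.upper()
    for keyword, result in RULES.items():
        if keyword in upper:
            return result
    return {"name": description, "category": "Unknown"}


samples = [
    "UBER EATS DELIVERY",    # food delivery, but the UBER rule fires
    "CRV*UBER EATS 123456",  # same problem, hidden behind a prefix
    "AMZN Mktp NL",          # abbreviation never matches "AMAZON"
]

for s in samples:
    print(f"{s!r} -> {naive_enrich(s)['category']}")
```

Adding an "UBER EATS" rule only helps if it is checked before "UBER", which is exactly the kind of ordering dependency that makes a rule set fragile as it grows.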

Why keyword matching and static databases also fail

The next step up is building or buying a merchant database: a structured dataset that maps merchant strings to clean names and categories.

This is better. You get consistency, better coverage for major brands, and a more maintainable system. But static databases have structural problems that get worse over time:

The long tail kills you. The top 500 global merchants might cover roughly half of all consumer transactions. The other half is distributed across millions of local, regional, and new businesses. A Thai restaurant in Rotterdam, a barbershop in Lagos, a popup coffee stand in Melbourne: none of them are in your database.

New businesses appear constantly. In the US alone, roughly five million new businesses are registered every year. Your database is outdated the moment you ship it.

Payment intermediaries break name matching. When a transaction goes through Square, Stripe, PayPal, or Apple Pay, the descriptor often shows the intermediary instead of the merchant. APPLE PAY *SQ *BLUE BOTTLE has three entities layered into one string. A static lookup returns "Apple" or "Square" when you needed "Blue Bottle Coffee."

International coverage is uneven. Database vendors focus on the markets with the most paying customers — US, UK, Western Europe. If your product works in Southeast Asia, Latin America, or Africa, coverage drops off a cliff.

The edge cases that haunt you

Some transaction types are structurally hostile to enrichment, and they represent a larger share of real-world data than most teams expect:

Multi-category merchants. Amazon sells groceries, electronics, clothing, digital subscriptions, and cloud services. Walmart sells food, hardware, and pharmacy products. A single MCC (merchant category code), if it's even available, tells you nothing about what was actually purchased.

Digital wallet chains. When a transaction shows APPLE PAY *SQUAREUP, your system needs to understand three layers: Apple Pay is the wallet, Square is the processor, and there's an actual merchant hiding behind both. Most systems just return "Apple" or "Square."

Peer-to-peer transfers. Venmo, Zelle, and PayPal transfers look like expenses in the bank feed, but they're money movements, not purchases. Categorizing a Venmo payment as "Financial Services" is technically accurate but useless to a user.

Generic descriptors. Some transactions are just POS PAYMENT, DIRECT DEBIT, or SEPA CREDIT TRANSFER. There is no merchant signal. Period. No system can enrich these accurately, and any system that claims otherwise is hallucinating.

That last point matters more than people realize. Knowing when you don't know, and being honest about that uncertainty, is what separates a reliable system from one that confidently gives you wrong answers.

What actually works: context over exact matching

After burning through regex, keyword lists, and database lookups, I found that the approaches that actually work at scale share a common trait: they use contextual reasoning rather than exact matching.

Here's what that means in practice:

Combine AI with web-derived context. Instead of asking "Is this string in my database?", ask "What real-world entity does this string most likely represent?" Modern language models are remarkably good at interpreting messy text when you give them the right signals: the transaction string, the country, the amount, the payment type. Supplement that with real-time web data (business directories, map listings, review platforms) and you can identify merchants that no static database would ever cover.

Separate the payment chain. A good system doesn't just identify one entity. It separates the wallet from the processor from the merchant, giving you clean data for each layer. The transaction PAYPAL #12367121, Milhouse Hostel, Buenos Aires should return PayPal as the intermediary and Milhouse Hostel as the merchant, with a location in Buenos Aires.
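As a rough illustration of what "separating the chain" means mechanically, here's a minimal rule-based sketch that peels known wallet and processor prefixes off the front of a descriptor. The prefix table, `PaymentChain`, and `split_chain` are assumptions made up for this example, and a fixed table like this is exactly the kind of thing that stops scaling, so treat it as a picture of the output shape rather than a recommended implementation.

```python
import re
from dataclasses import dataclass, field

# Illustrative prefix table (an assumption, not an exhaustive or official list).
# Each entry: (pattern anchored at the start, entity name, layer type).
INTERMEDIARY_PREFIXES = [
    (re.compile(r"^APPLE PAY\b\s*\*?\s*", re.I), "Apple Pay", "wallet"),
    (re.compile(r"^SQ\s*\*\s*", re.I), "Square", "processor"),
    (re.compile(r"^TST\s*\*\s*", re.I), "Toast", "processor"),
    (re.compile(r"^PAYPAL\s*\*\s*", re.I), "PayPal", "processor"),
]


@dataclass
class PaymentChain:
    intermediaries: list = field(default_factory=list)  # outermost layer first
    merchant_text: str = ""  # whatever remains after peeling


def split_chain(descriptor: str) -> PaymentChain:
    """Peel known wallet/processor prefixes off the front, one layer at a time."""
    chain = PaymentChain()
    rest = descriptor.strip()
    peeled = True
    while peeled:
        peeled = False
        for pattern, name, kind in INTERMEDIARY_PREFIXES:
            match = pattern.match(rest)
            if match:
                chain.intermediaries.append({"name": name, "type": kind})
                rest = rest[match.end():]
                peeled = True
                break
    chain.merchant_text = rest
    return chain


chain = split_chain("APPLE PAY *SQ *BLUE BOTTLE")
print(chain.intermediaries, "->", chain.merchant_text)
```

Note that this table does not recognize PAYPAL #12367121, Milhouse Hostel, Buenos Aires, because that descriptor uses a different PayPal format. That gap is the point: every new format needs another rule, which is why contextual approaches hold up better.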

Use confidence scoring, not binary matching. Every enrichment should carry a confidence score. A Starbucks transaction with 0.98 confidence is fundamentally different from a TST* UNKNOWN VENDOR with 0.35 confidence. Your UI should treat them differently:

function displayMerchant(enrichment, rawDescription) {
  if (enrichment.confidence >= 0.85) {
    return enrichment.merchant.name;
  }
  if (enrichment.confidence >= 0.6) {
    return `${enrichment.merchant.name} (?)`;
  }
  return rawDescription; // show the raw string, don't guess
}

This is a design principle, not just a technical one. Users trust an app that says "I'm not sure" far more than one that confidently mislabels their transactions.

What structured output actually looks like

To make the difference concrete, here's a real-world example. Raw input:

PAYPAL #12367121, Milhouse Hostel, Buenos Aires

Structured output from an enrichment system that does this well:

{
  "merchant": {
    "name": "Milhouse Hostel",
    "logo": "https://logos.triqai.com/images/milhousehostelcom",
    "website": "https://milhousehostel.com/"
  },
  "category": {
    "primary": "Hotels",
    "secondary": "Accommodation",
    "tertiary": "Travel"
  },
  "location": {
    "city": "Buenos Aires",
    "country": "AR",
    "formatted": "Costa Rica 4526, Buenos Aires, Argentina",
    "coordinates": { "latitude": -34.6037, "longitude": -58.3816 }
  },
  "intermediary": {
    "name": "PayPal",
    "type": "processor"
  },
  "channel": "in_store",
  "confidence": 0.90
}

From one messy string, you get: the real merchant (not PayPal), a hierarchical spending category, a structured location with GPS coordinates, the intermediary identified and separated, and a confidence score.
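If you consume a response like this from Python, it can help to flatten it into a small typed object early, so the rest of your code doesn't have to dig through nested dictionaries. The sketch below assumes the field names from the JSON above, and it assumes (this is not confirmed anywhere) that any section may be missing or null. `Enrichment` and `parse_enrichment` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Enrichment:
    merchant_name: Optional[str]
    category_primary: Optional[str]
    city: Optional[str]
    country: Optional[str]
    intermediary_name: Optional[str]
    confidence: float


def parse_enrichment(payload: dict) -> Enrichment:
    """Flatten the nested response; absent or null sections become None."""
    merchant = payload.get("merchant") or {}
    category = payload.get("category") or {}
    location = payload.get("location") or {}
    intermediary = payload.get("intermediary") or {}
    return Enrichment(
        merchant_name=merchant.get("name"),
        category_primary=category.get("primary"),
        city=location.get("city"),
        country=location.get("country"),
        intermediary_name=intermediary.get("name"),
        # Assumption: a missing confidence is treated as 0.0 (least trusted).
        confidence=float(payload.get("confidence") or 0.0),
    )
```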

That's the difference between a bank statement that says PAYPAL #12367121 and a financial app that shows you "Milhouse Hostel — Buenos Aires, Argentina" with a logo and a travel spending category.

The hard truth about impossible transactions

Not everything can be enriched. This is important to acknowledge because overpromising is a credibility killer in fintech.

These will always be low-confidence:

PAYMENT THANK YOU — literally no signal
SEPA CREDIT TRANSFER — describes the mechanism, not the merchant
POS 00001234 — just a terminal ID
DIRECT DEBIT MANDATE REF 1234567890 — internal bank reference

A good enrichment system returns low confidence on these and lets your app decide what to show. A bad one guesses "Shopping" and hopes nobody notices.
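One cheap way to avoid guessing is to check for no-signal descriptors before doing any enrichment at all. Here's a minimal sketch: the pattern list is an assumption built only from the examples above, and `has_merchant_signal` and `precheck` are hypothetical names. Real bank feeds will need a much longer, locale-specific list.

```python
import re

# Descriptors that describe a mechanism or reference, not a merchant.
# Illustrative only: derived from the examples in this post.
NO_SIGNAL_PATTERNS = [
    re.compile(r"^PAYMENT THANK YOU$"),
    re.compile(r"^SEPA CREDIT TRANSFER$"),
    re.compile(r"^POS( PAYMENT)?( \d+)?$"),
    re.compile(r"^DIRECT DEBIT( MANDATE)?( REF \d+)?$"),
]


def has_merchant_signal(descriptor: str) -> bool:
    """False when the whole descriptor is a generic mechanism or reference."""
    normalized = " ".join(descriptor.upper().split())  # uppercase, collapse spaces
    return not any(p.match(normalized) for p in NO_SIGNAL_PATTERNS)


def precheck(descriptor: str):
    """Return an explicit low-confidence result, or None to continue enriching."""
    if not has_merchant_signal(descriptor):
        return {"merchant": None, "confidence": 0.0, "reason": "no_merchant_signal"}
    return None
```

Because the patterns are anchored at both ends, a descriptor like POS 1234 AMZN Mktp NL 29.99 EUR still passes through to enrichment: it starts like a generic one but carries extra text.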

Being transparent about limitations actually builds more trust than claiming 100% accuracy. If your system reliably tells users "I'm not sure about this one," they'll trust it more when it does give a confident answer.

Why I ended up building an API for this

I ran into this problem while building fintech tools and couldn't find a solution that handled all the edge cases I cared about: intermediary separation, international coverage, confidence scoring, and long-tail merchant identification without a massive static database.

So I built Triqai, a transaction enrichment API that uses AI reasoning and web context instead of fixed merchant lists. It processes raw transaction strings from any bank in any country and returns structured merchant, category, and location data with confidence scores.

The API is a single endpoint. Send a transaction, get back structured data:

curl -X POST https://api.triqai.com/v1/transactions/enrich \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "SQ *VERVE COFFEE ROASTERS SAN FRAN",
    "country": "US",
    "type": "expense"
  }'

There's a free tier (100 enrichments/month, no credit card) if you want to test it against your own data.
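For reference, here's roughly what the same call could look like from Python using only the standard library. The URL, headers, and body fields are copied from the curl example; everything else is an assumption for this sketch, including the function names, the timeout, and the idea that the response body is JSON shaped like the earlier example. Error handling is left out: `urlopen` raises `urllib.error.HTTPError` on non-2xx responses.

```python
import json
import urllib.request

ENRICH_URL = "https://api.triqai.com/v1/transactions/enrich"  # from the curl example


def build_enrich_request(title: str, country: str, tx_type: str,
                         api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"title": title, "country": country, "type": tx_type})
    return urllib.request.Request(
        ENRICH_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def enrich(title: str, country: str, tx_type: str, api_key: str,
           timeout: float = 10.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_enrich_request(title, country, tx_type, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to unit test without touching the network.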

Practical advice if you're solving this yourself

Whether you use a third-party API or build something in-house, here are patterns I've found work well in production:

Always store raw + enriched data side by side. Raw data lets you re-enrich as your system improves. It's also a compliance requirement in most financial contexts.
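A minimal way to picture this is one table where the raw descriptor is written once and the enrichment is a replaceable column next to it. The schema and helper names below are assumptions for illustration, using SQLite only because it ships with Python.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS transactions (
    id             INTEGER PRIMARY KEY,
    raw_descriptor TEXT NOT NULL,  -- written once, never modified
    enriched_json  TEXT,           -- latest enrichment, safe to overwrite
    enriched_at    TEXT            -- ISO 8601 timestamp of that enrichment
);
"""


def store_raw(conn: sqlite3.Connection, raw_descriptor: str) -> int:
    """Insert the untouched bank descriptor and return its row id."""
    cur = conn.execute(
        "INSERT INTO transactions (raw_descriptor) VALUES (?)",
        (raw_descriptor,),
    )
    return cur.lastrowid


def store_enrichment(conn: sqlite3.Connection, tx_id: int,
                     enrichment: dict, enriched_at: str) -> None:
    """Attach or replace the enrichment without touching the raw descriptor."""
    conn.execute(
        "UPDATE transactions SET enriched_json = ?, enriched_at = ? WHERE id = ?",
        (json.dumps(enrichment), enriched_at, tx_id),
    )
```

Re-enriching later is then just another call to `store_enrichment` on the same row.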

Cache aggressively by descriptor string. The same descriptor (like STARBUCKS #12345) resolves to the same merchant every time. A simple cache with a 48-hour TTL cuts your processing costs dramatically:

import hashlib

def cache_key(description, country):
    # MD5 is used only to get a short, fixed-length key, not for security
    raw = f"{description}:{country}"
    return f"enrich:{hashlib.md5(raw.encode()).hexdigest()}"
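The key is only half of the cache. Here's a minimal in-memory sketch of the other half, with a 48-hour default TTL. `TTLCache` and `get_or_compute` are hypothetical names; in production you would more likely use an external store with native expiry, and the injectable clock exists only to make the expiry logic testable.

```python
import time


class TTLCache:
    """Tiny in-memory cache; entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 48 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def get_or_compute(cache: TTLCache, key: str, compute):
    """Return the cached value for key, calling compute() only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = compute()
    cache.set(key, value)
    return value
```

In use, the key would come from the `cache_key` function above and `compute` would wrap whatever enrichment call you make.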

Filter transactions that don't need enrichment. Internal bank transfers, ATM withdrawals, interest payments, and fee charges generally don't have a meaningful merchant. Skip them to reduce noise and cost.

Respect user corrections permanently. If a user recategorizes a transaction, that override should persist through any re-enrichment cycle. Users who feel their input is ignored stop using your app.
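One simple way to get this behavior is to store corrections separately from enrichment output and layer them on top at read time, so re-enrichment can never overwrite them. The sketch below assumes flat field names such as `category`; `OverrideStore` is a hypothetical name and the in-memory dict stands in for real persistence.

```python
class OverrideStore:
    """Keeps user corrections per transaction, separate from enrichment output."""

    def __init__(self):
        self._by_tx = {}  # tx_id -> {field: user_value}

    def set(self, tx_id: str, field: str, value) -> None:
        """Record a user correction for one field of one transaction."""
        self._by_tx.setdefault(tx_id, {})[field] = value

    def apply(self, tx_id: str, enriched: dict) -> dict:
        """Return a copy of enriched with user-corrected fields on top."""
        merged = dict(enriched)
        merged.update(self._by_tx.get(tx_id, {}))
        return merged
```

Because the enrichment result itself is never mutated, you can re-run enrichment as often as you like and the user's choice still wins on display.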

Design your UI around confidence tiers. High confidence (85% and above): show the enriched data. Medium (60% up to 85%): show it with a softer presentation or an edit option. Low (below 60%): show the raw string and let the user categorize manually.

Wrapping up

Parsing bank transactions feels like a string-processing problem until you actually sit down and try to do it at scale. Then it becomes a data problem, a coverage problem, an internationalization problem, and a trust problem all at once.

The approaches that work combine contextual reasoning (not just exact matching), honest confidence scoring (not false certainty), and the humility to say "I don't know" when the data genuinely doesn't contain enough signal.

If you're building anything in fintech that touches transaction data (budgeting tools, accounting platforms, banking apps, expense trackers), this is foundational infrastructure that's worth getting right early.

Happy to answer questions or discuss approaches in the comments. And if you're dealing with this problem right now, I'd genuinely love to hear what edge cases are tripping you up.

If you want to dig deeper into specific aspects: