Payneteasy

Posted on May 20

Smart transaction routing: turning auth-rate data into routing rules (without a black box)

#payments #architecture #fintech #tutorial

TL;DR

A "smart router" is not a model — it's a rules engine fed by fresh, segmented approval-rate data.

Inputs you actually need: BIN/issuer, country, MCC, amount, currency, retry-count, acquirer health, rolling approval window.

Cascading retries: retry only soft declines, never hard / fraud / lost-stolen. Fresh idempotency key per attempt.

Approval-rate-aware routing without a circuit breaker is a way to hammer a dead acquirer. Don't skip the breaker.

The phrase "smart transaction routing" gets used to sell a lot of black boxes. This post is the opposite: how to build one whose decisions a human can read, and how to keep it honest with data.

Why a single acquirer stops being enough

Three things, sooner or later:

A regional approval dip. Card schemes adjust or an issuer changes its fraud rules, and you wake up to a 5 pp drop in DE/NL with zero levers to pull.
An outage. Acquirers go down. A merger or migration takes a network offline for a window. With one acquirer, your checkout is down with it.
Pricing leverage. Once you can move a percentage of traffic with a config change, every renegotiation is real.

The fix is structural: route every transaction through a layer that knows about more than one acquirer and decides where it goes.

Anatomy of a rules engine

Inputs that matter (in roughly this order of usefulness):

Input	Why it changes the route
Issuer BIN / country	Local acquirers approve local cards better — almost universally
Currency / amount	Cross-border fees jump on currency mismatches and grow with ticket size
MCC	Some acquirers are strong in specific verticals (travel, digital goods, subscription)
Brand (Visa / Mastercard / Amex / local)	Amex / domestic networks often need a specialist acquirer
Retry count	Sticky-on-retry vs failover-on-retry are different strategies
Acquirer health / approval-rate window	Excludes acquirers in an active dip
Time-of-day / day-of-week	Some issuers have approval cycles — rarely worth bothering with day one

Outputs are simple: the chosen acquirer (plus credentials) and a fallback chain.

# rules.yaml
- name: eu-cards-primary
  match: { issuer_country: [DE, FR, NL, ES], currency: EUR, amount_lt: 50000 }
  route:
    - { acquirer: acq_a, weight: 70 }
    - { acquirer: acq_b, weight: 30 }
  fallback: [acq_c]

- name: high-ticket-amex
  match: { brand: amex, amount_gte: 50000 }
  route: [{ acquirer: acq_amex_specialist, weight: 100 }]
  fallback: [acq_a]

- name: latam-default
  match: { issuer_country: [BR, MX, CO, AR] }
  route: [{ acquirer: acq_local_latam, weight: 100 }]
  fallback: [acq_a]

def evaluate(rules, ctx):
    for r in rules:
        if matches(r["match"], ctx):
            return pick_weighted(r["route"]), r.get("fallback", [])
    return default_route(ctx), default_fallbacks()
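
The `evaluate` function above leans on two helpers it doesn't show. A minimal sketch of what they could look like, assuming the match keys used in the rules file (`_lt` / `_gte` suffixes for numeric bounds, lists for membership, everything else as equality) and a context object with attribute access. The helper bodies and the sample `ctx` are illustrative, not the article's reference implementation:

```python
import random
from types import SimpleNamespace

def matches(match, ctx):
    """True if every condition in a rule's `match` block holds for ctx."""
    for key, expected in match.items():
        if key.endswith("_lt"):                      # e.g. amount_lt
            value = getattr(ctx, key[:-3], None)
            if value is None or not value < expected:
                return False
        elif key.endswith("_gte"):                   # e.g. amount_gte
            value = getattr(ctx, key[:-4], None)
            if value is None or not value >= expected:
                return False
        elif isinstance(expected, list):             # membership, e.g. issuer_country
            if getattr(ctx, key, None) not in expected:
                return False
        elif getattr(ctx, key, None) != expected:    # plain equality
            return False
    return True

def pick_weighted(route, rng=random):
    """Pick one acquirer from a route list, proportionally to its weight."""
    acquirers = [leg["acquirer"] for leg in route]
    weights = [leg["weight"] for leg in route]
    return rng.choices(acquirers, weights=weights, k=1)[0]

# Hypothetical transaction context and two match blocks from rules.yaml
ctx = SimpleNamespace(issuer_country="DE", currency="EUR", brand="visa", amount=1200)
eu_rule = {"issuer_country": ["DE", "FR", "NL", "ES"], "currency": "EUR", "amount_lt": 50000}
amex_rule = {"brand": "amex", "amount_gte": 50000}
```

Note that `evaluate` returns on the first matching rule, so rule order in the file is part of the contract. Keeping the matcher to these four operators is deliberate: anything richer is where the unreadable DSL starts.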

The rules file is the contract between engineering and ops. If your PM can't read it without you, you've built a black box. The most common mistake here is letting the matcher grow regex/DSL features until only its author understands it — resist.

Routing strategies, with honest tradeoffs

Strategy	When to use	Optimizes	Risk
Least-cost	Stable approval rates across acquirers	Fees per approved tx	A penny saved on fees can cost a dollar in declines if approval is uneven
Approval-rate-aware	Volatile approval, multi-region	Overall approval %	Requires fresh data; flap risk if the window is too small
Weighted A/B	Onboarding a new acquirer	Risk-controlled ramp	Don't keep the A/B forever — pick a winner
Sticky-on-retry	Card-on-file retries	Consistency / fewer step-ups	Sticky to a failing acquirer = obvious bug
Failover-on-retry	First attempt failed, try elsewhere	Recovers approvals	Wrong on hard declines — see below

The boring answer is that mature stacks combine all of these — the rules file becomes the explicit place where you say when each one fires.

Turning auth-rate data into a routing rule (without a model)

You don't need ML for this on day one. A rolling window per (acquirer, bin_country, brand) is enough to be useful:

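from datetime import timedelta

MIN_SAMPLE = 50  # minimum attempts before the data is allowed to steer

def approval_rate(acq, bin_country, brand, window=timedelta(minutes=15)):
    # `stats` and `now()` are defined elsewhere in the service
    rows = stats.fetch(acq, bin_country, brand, since=now() - window)
    if rows.attempts < MIN_SAMPLE:
        return None  # not enough signal: fall back to the default rule
    return rows.approved / rows.attempts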

Two production-grade details:

Minimum sample. On a low-volume corridor, one or two declines can swing the measured rate wildly (a single declined attempt reads as 0%), and you'd eject a perfectly good acquirer. Require N ≥ 50 attempts in the window before letting the data steer.
Damping. Don't switch winners on every refresh. EMA, hysteresis bands, or a 5-minute lockout after a flip — pick one.
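
One way to implement the damping above is to combine two of those options: a hysteresis band plus a lockout, so the router only flips when the challenger beats the current winner by a margin and the last flip is old enough. This is a sketch; the class name, the 2 pp margin, and the explicit `now` argument are assumptions for illustration, while the 5-minute lockout comes from the text.

```python
from datetime import datetime, timedelta

class DampedWinner:
    """Tracks the preferred acquirer for one segment and resists flapping."""

    def __init__(self, margin=0.02, lockout=timedelta(minutes=5)):
        self.margin = margin      # challenger must lead by more than this (assumed 2 pp)
        self.lockout = lockout    # minimum time between two flips
        self.current = None
        self.last_flip = None

    def update(self, rates, now):
        """rates: {acquirer: approval_rate}. Returns the acquirer to prefer."""
        if not rates:
            return self.current
        best = max(rates, key=rates.get)
        # First observation, or the current winner dropped out of the candidates
        if self.current is None or self.current not in rates:
            self.current, self.last_flip = best, now
            return self.current
        locked = now - self.last_flip < self.lockout
        lead = rates[best] - rates[self.current]
        if best != self.current and not locked and lead > self.margin:
            self.current, self.last_flip = best, now
        return self.current
```

A 1 pp lead never triggers a flip with these defaults, and a 10 pp lead only does so once the lockout has expired.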

Then the routing rule becomes:

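def best_acquirer(candidates, ctx):
    ranked = []
    for acq in candidates:
        rate = approval_rate(acq, ctx.bin_country, ctx.brand)
        if rate is None:
            rate = baseline(acq, ctx)    # too little data: fall back to a baseline rate
        if rate < KILL_SWITCH:           # e.g. 0.10
            continue                     # drop acquirers that are effectively dead
        ranked.append((rate, acq))
    # sort on rate only, so ties never fall through to comparing acquirer values
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [acq for _, acq in ranked]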

That's the entire "smart" of smart routing on day one. Add cost-weighting after the approval-rate signal is stable.
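
When you do add cost-weighting, one simple option is to rank by expected net value per attempt instead of raw approval rate. The sketch below assumes a flat percentage fee per acquirer and a known merchant margin; the function name and every number in it are hypothetical.

```python
def expected_net(approval_rate, fee_pct, margin_pct, amount):
    """Expected net value of one attempt: only approved tx earn margin and pay fees."""
    return approval_rate * amount * (margin_pct - fee_pct)

# Two hypothetical acquirers: one cheaper, one with a higher approval rate.
amount = 100.00
cheap = expected_net(approval_rate=0.80, fee_pct=0.015, margin_pct=0.20, amount=amount)
strong = expected_net(approval_rate=0.88, fee_pct=0.025, margin_pct=0.20, amount=amount)
```

With these made-up inputs the pricier acquirer still comes out ahead (about 15.4 vs 14.8 per attempt), which is the "penny saved on fees, dollar lost in declines" tradeoff expressed in numbers.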

Cascading retries done right

Retry soft declines, never hard ones:

Decline class	Retry?	Why
`do_not_honor` (often issuer transient)	Yes, on a different acquirer	The issuer sees different acquirer and merchant identifiers and may score the attempt differently
`insufficient_funds`	Maybe, after delay	Topping-up is a real event but most retries are wishful
`issuer_unavailable` / `network_timeout`	Yes	Definitionally transient
`lost_card` / `stolen_card` / `pickup_card`	Never	Card-network rule; you'll get fined
`do_not_honor` flagged as fraud	Never	Fraud scores stack: each repeat attempt adds to the fraud signal
`expired_card`	No	Need new credentials
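
The table maps naturally onto a small lookup that every adapter can share. A sketch, assuming normalized decline codes named as in the table and a hypothetical `fraud_flagged` signal supplied by the adapter. Unknown codes are treated as hard declines here so the cascade fails closed; that default is a design choice of this sketch, not something the table prescribes.

```python
# Normalized decline code -> retry class. Strings match the "hard" / "fraud"
# labels that the cascade checks against.
DECLINE_TAXONOMY = {
    "do_not_honor": "soft",          # retry on a different acquirer
    "issuer_unavailable": "soft",
    "network_timeout": "soft",
    "insufficient_funds": "delayed", # retry only after a delay, if at all
    "lost_card": "hard",
    "stolen_card": "hard",
    "pickup_card": "hard",
    "expired_card": "hard",          # needs new credentials, not a new route
}

def classify(code, fraud_flagged=False):
    """Return the retry class for a normalized decline code."""
    if fraud_flagged:
        return "fraud"
    return DECLINE_TAXONOMY.get(code, "hard")

def may_cascade(code, fraud_flagged=False):
    """Only soft declines are eligible for an immediate retry elsewhere."""
    return classify(code, fraud_flagged) == "soft"
```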

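def charge_with_cascade(ctx, primary, fallbacks):
    for attempt, acq in enumerate([primary, *fallbacks], start=1):
        if circuit_open(acq):
            continue  # skip acquirers the breaker has ejected
        # fresh idempotency key per attempt: derive it from the acquirer and attempt number
        res = adapters[acq].charge(ctx, idem_key=attempt_key(ctx, acq, attempt))
        if res.approved:
            return res
        if res.taxonomy in ("hard", "fraud"):
            return res   # do NOT cascade
    return Declined("all routes exhausted")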

The two non-obvious details:

Fresh idempotency key per attempt. Idempotency keys are scoped to a single acquirer's API and acquirers don't share state, so reusing one key across acquirers has no defined behavior.
Hard-decline short-circuit. This is also the thing that protects you from disputes if a card was reported stolen between attempts.
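
A sketch of a per-attempt key, assuming the context carries a stable transaction id (`tx_id` is a hypothetical field, and the three-argument signature is an assumption of this example). Deriving the key from transaction, acquirer, and attempt number keeps it unique per attempt yet reproducible, so replaying the same attempt after a timeout sends the same key to the same acquirer:

```python
import hashlib
from types import SimpleNamespace

def attempt_key(ctx, acquirer: str, attempt: int) -> str:
    """Deterministic idempotency key, unique per (transaction, acquirer, attempt)."""
    raw = f"{ctx.tx_id}:{acquirer}:{attempt}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]

ctx = SimpleNamespace(tx_id="tx_123")      # hypothetical transaction context
key_a = attempt_key(ctx, "acq_a", 1)       # first attempt, primary acquirer
key_b = attempt_key(ctx, "acq_b", 2)       # second attempt, fallback acquirer
```

A random UUID per attempt would also satisfy "fresh", but it loses the replay property unless you persist the key before sending the request.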

Health checks, circuit breakers, observability

The router will lie to you if it's not measured. The minimum kit:

Synthetic pings. Heartbeats per acquirer, decoupled from real traffic.
Error-rate circuit breaker. Trip on (5xx + timeouts) / total > X% over Y minutes. Auto-eject, auto-return after a cool-down.
Approval-rate alerting. Page on a 3σ drop per (acquirer × bin_country).
Per-rule shadow. Log what the previous rule version would have decided. You'll need this every time you change a rule.
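
For the approval-rate alert, a minimal sketch of the 3σ check, assuming you keep a short history of per-window approval rates for each (acquirer × bin_country) segment. The function name and the `min_points` floor are assumptions, and a production alert would probably also account for sample size per window:

```python
from statistics import mean, stdev

def is_three_sigma_drop(history, current, min_points=8):
    """history: past per-window approval rates for one segment."""
    if len(history) < min_points:
        return False                 # not enough baseline to judge a drop
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current < mu          # perfectly flat baseline: any drop counts
    return current < mu - 3 * sigma

# Hypothetical baseline hovering around 90%
baseline_rates = [0.90, 0.91, 0.89, 0.90, 0.92, 0.88, 0.90, 0.91]
```

With this baseline the threshold sits at roughly 86%, so a window at 80% pages and a window at 88% does not.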

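from dataclasses import dataclass
from datetime import timedelta

# Circuit-breaker settings and interface; the method bodies are left as stubs.
@dataclass
class CircuitBreaker:
    fail_rate_threshold: float = 0.25             # trip when (5xx + timeouts) / total exceeds this
    window: timedelta = timedelta(minutes=5)      # measurement window
    cooldown: timedelta = timedelta(minutes=10)   # wait before auto-return

    def open_for(self, acq) -> bool: ...          # True while acq is ejected
    def record(self, acq, ok: bool) -> None: ...  # feed one attempt outcome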
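
The stub above leaves `open_for` and `record` unimplemented. One possible in-memory implementation, assuming a sliding window of timestamped outcomes per acquirer, an explicit `now` argument instead of a wall clock, and a hypothetical `min_events` floor so a single early failure cannot trip the breaker:

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class SlidingCircuitBreaker:
    fail_rate_threshold: float = 0.25
    window: timedelta = timedelta(minutes=5)
    cooldown: timedelta = timedelta(minutes=10)
    min_events: int = 20                       # assumed floor, not from the article
    _events: dict = field(default_factory=lambda: defaultdict(deque))
    _opened_at: dict = field(default_factory=dict)

    def record(self, acq, ok: bool, now: datetime) -> None:
        events = self._events[acq]
        events.append((now, ok))
        while events and now - events[0][0] > self.window:
            events.popleft()                   # drop outcomes older than the window
        failures = sum(1 for _, good in events if not good)
        enough = len(events) >= self.min_events
        if enough and failures / len(events) > self.fail_rate_threshold:
            self._opened_at[acq] = now         # trip (or re-trip) the breaker

    def open_for(self, acq, now: datetime) -> bool:
        opened = self._opened_at.get(acq)
        if opened is None:
            return False
        if now - opened >= self.cooldown:
            del self._opened_at[acq]           # cool-down over: let traffic return
            self._events[acq].clear()          # start measuring from scratch
            return False
        return True
```

This version returns the acquirer to full traffic after the cool-down. A half-open state that lets only a trickle through first is a common refinement, left out here for brevity.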

Metrics that matter

Metric	Formula	Target
Overall approval %	approved / attempted (unique tx)	maximize
Cost per approved tx	total fees / approved	minimize
Fallback rate	tx using fallback / total	low + stable
Retry success uplift	extra approvals from cascade / attempted	track & celebrate
p95 routing latency	—	< 50 ms
Per-rule decision drift	decisions that differ from the shadow rule version / total decisions	review weekly

Without these you have a router; with them you have a routing system.
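
A sketch of how the first four metrics could be computed from an attempt log. The `Attempt` record and its fields are assumptions. The point it illustrates is that approval % is counted per unique transaction, not per attempt, so cascaded retries don't deflate it:

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    tx_id: str
    attempt_no: int   # 1 = primary route, 2+ = fallback
    approved: bool
    fee: float        # fee charged for this attempt (assumed 0 when declined)

def routing_metrics(attempts):
    tx_ids = {a.tx_id for a in attempts}
    if not tx_ids:
        return {}
    approved_tx = {a.tx_id for a in attempts if a.approved}
    fallback_tx = {a.tx_id for a in attempts if a.attempt_no > 1}
    rescued_tx = {a.tx_id for a in attempts if a.approved and a.attempt_no > 1}
    total_fees = sum(a.fee for a in attempts)
    return {
        "approval_pct": len(approved_tx) / len(tx_ids),
        "cost_per_approved": total_fees / len(approved_tx) if approved_tx else None,
        "fallback_rate": len(fallback_tx) / len(tx_ids),
        "retry_uplift": len(rescued_tx) / len(tx_ids),
    }

# Hypothetical log: t2 is declined on the primary and rescued by the fallback.
log = [
    Attempt("t1", 1, True, 0.30),
    Attempt("t2", 1, False, 0.0),
    Attempt("t2", 2, True, 0.40),
    Attempt("t3", 1, False, 0.0),
    Attempt("t4", 1, True, 0.30),
]
metrics = routing_metrics(log)
```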

Build vs buy

This is the question that gets answered most often with "buy", and that's usually right when routing isn't your moat. Where an orchestration platform actually saves months is in the adapters, normalized error taxonomy, and reconciliation, not the rules engine itself. The rules engine is the easy part; staying current with 12 acquirer APIs is what burns a team out.

A pragmatic split: keep the rules and the data in-house, integrate the adapters via a platform. That gives you the strategic edge and removes the busy-work.

If you want the orchestration-layer view of the whole picture, the payment orchestration overview walks through the architecture and the build-vs-buy lines we use with customers.

The next post in this series gets into cross-border reconciliation — settlement files, currency, fee parsing — which is where most homemade routing layers quietly fall over.

Author: payments engineer at PaynetEasy. We build payment orchestration and global payouts infrastructure → payneteasy.com

DEV Community