DEV Community

Payneteasy

Smart transaction routing: turning auth-rate data into routing rules (without a black box)

TL;DR

  • A "smart router" is not a model — it's a rules engine fed by fresh, segmented approval-rate data.
  • Inputs you actually need: BIN/issuer, country, MCC, amount, currency, retry-count, acquirer health, rolling approval window.
  • Cascading retries: retry only soft declines, never hard / fraud / lost-stolen. Fresh idempotency key per attempt.
  • Approval-rate-aware routing without a circuit breaker is a way to hammer a dead acquirer. Don't skip the breaker.

The phrase "smart transaction routing" gets used to sell a lot of black boxes. This post is the opposite: how to build one whose decisions a human can read, and how to keep it honest with data.

Why a single acquirer stops being enough

Three things, sooner or later:

  • A regional approval dip. Card schemes adjust, an issuer changes its fraud rules, you wake up to a 5pp drop in DE/NL and zero levers to pull.
  • An outage. Acquirers go down. A merger or migration takes a network offline for a window. With one acquirer, your checkout is down with it.
  • Pricing leverage. Once you can move a percentage of traffic with a config change, every renegotiation is real.

The fix is structural: route every transaction through a layer that knows about more than one acquirer and decides where it goes.

Anatomy of a rules engine

Inputs that matter (in roughly this order of usefulness):

| Input | Why it changes the route |
| --- | --- |
| Issuer BIN / country | Local acquirers approve local cards better, almost universally |
| Currency / amount | Cross-border fees jump at currency mismatches and ticket size |
| MCC | Some acquirers are strong in specific verticals (travel, digital goods, subscription) |
| Brand (Visa / Mastercard / Amex / local) | Amex / domestic networks often need a specialist acquirer |
| Retry count | Sticky-on-retry vs failover-on-retry are different strategies |
| Acquirer health / approval-rate window | Excludes acquirers in an active dip |
| Time-of-day / day-of-week | Some issuers have approval cycles; rarely worth bothering with on day one |

Outputs are simple: the chosen acquirer (plus credentials) and a fallback chain.

# rules.yaml
- name: eu-cards-primary
  match: { issuer_country: [DE, FR, NL, ES], currency: EUR, amount_lt: 50000 }
  route:
    - { acquirer: acq_a, weight: 70 }
    - { acquirer: acq_b, weight: 30 }
  fallback: [acq_c]

- name: high-ticket-amex
  match: { brand: amex, amount_gte: 50000 }
  route: [{ acquirer: acq_amex_specialist, weight: 100 }]
  fallback: [acq_a]

- name: latam-default
  match: { issuer_country: [BR, MX, CO, AR] }
  route: [{ acquirer: acq_local_latam, weight: 100 }]
  fallback: [acq_a]

A first-match evaluator over those rules fits in a few lines of Python (rule order matters, because the first matching rule wins):

def evaluate(rules, ctx):
    for r in rules:
        if matches(r["match"], ctx):
            return pick_weighted(r["route"]), r.get("fallback", [])
    return default_route(ctx), default_fallbacks()

The rules file is the contract between engineering and ops. If your PM can't read it without you, you've built a black box. The most common mistake here is letting the matcher grow regex/DSL features until only its author understands it — resist.
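To show how small the matcher can stay, here is a minimal sketch. It assumes `ctx` is a plain dict and that the only operators are the ones that appear in the rules file above (equality, list membership, and the `_lt` / `_gte` suffixes). The post names `matches` and `pick_weighted` but does not show them, so the bodies below are assumptions, not the author's implementation.

```python
import random

def matches(match, ctx):
    """Return True when every condition in a rule's `match` block holds for ctx."""
    for key, expected in match.items():
        if key.endswith("_lt"):
            value = ctx.get(key[:-3])          # "amount_lt" -> "amount"
            if value is None or not value < expected:
                return False
        elif key.endswith("_gte"):
            value = ctx.get(key[:-4])          # "amount_gte" -> "amount"
            if value is None or not value >= expected:
                return False
        elif isinstance(expected, list):
            if ctx.get(key) not in expected:   # list means "any of these"
                return False
        elif ctx.get(key) != expected:         # scalar means equality
            return False
    return True

def pick_weighted(route, rng=random):
    """Pick one acquirer from [{acquirer, weight}, ...] in proportion to weight."""
    acquirers = [leg["acquirer"] for leg in route]
    weights = [leg["weight"] for leg in route]
    return rng.choices(acquirers, weights=weights, k=1)[0]

ctx = {"issuer_country": "DE", "currency": "EUR", "amount": 1999, "brand": "visa"}
rule = {"issuer_country": ["DE", "FR", "NL", "ES"], "currency": "EUR", "amount_lt": 50000}
print(matches(rule, ctx))
```

A missing field counts as a non-match here, so a rule never fires on incomplete data. If a rule needs an operator that does not fit this shape, that is usually a sign to split the rule rather than grow the matcher.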

Routing strategies, with honest tradeoffs

| Strategy | When to use | Optimizes | Risk |
| --- | --- | --- | --- |
| Least-cost | Stable approval rates across acquirers | Fees per approved tx | A penny saved on fees can cost a dollar in declines if approval is uneven |
| Approval-rate-aware | Volatile approval, multi-region | Overall approval % | Requires fresh data; flap risk if the window is too small |
| Weighted A/B | Onboarding a new acquirer | Risk-controlled ramp | Don't keep the A/B forever; pick a winner |
| Sticky-on-retry | Card-on-file retries | Consistency / fewer step-ups | Sticky to a failing acquirer = obvious bug |
| Failover-on-retry | First attempt failed, try elsewhere | Recovers approvals | Wrong on hard declines (see below) |

The boring answer is that mature stacks combine all of these — the rules file becomes the explicit place where you say when each one fires.

Turning auth-rate data into a routing rule (without a model)

You don't need ML for this on day one. A rolling window per (acquirer, bin_country, brand) is enough to be useful:

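from datetime import datetime, timedelta, timezone

MIN_SAMPLE = 50  # minimum attempts before the window is trusted

def approval_rate(acq, bin_country, brand, window=timedelta(minutes=15)):
    # `stats` is your metrics store; fetch() returns attempt/approval counts since a timestamp
    since = datetime.now(timezone.utc) - window
    rows = stats.fetch(acq, bin_country, brand, since=since)
    if rows.attempts < MIN_SAMPLE:
        return None  # not enough signal: fall back to the default rule
    return rows.approved / rows.attempts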

Two production-grade details:

  • Minimum sample. On a low-volume corridor, one decline out of one attempt reads as a 0% approval rate, and you'd eject a perfectly good acquirer. Require N ≥ 50 attempts before letting the data steer.
  • Damping. Don't switch winners on every refresh. EMA, hysteresis bands, or a 5-minute lockout after a flip — pick one.
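As an illustration of damping, the sketch below combines two of the options named above: a hysteresis margin and a lockout after a flip. The class name `DampedLeader`, the 2-point margin, and the shape of the `rates` dict are assumptions made for the example. The 5-minute lockout comes from the bullet above.

```python
from datetime import datetime, timedelta, timezone

class DampedLeader:
    """Keeps the current best acquirer unless a challenger is clearly better."""

    def __init__(self, margin=0.02, lockout=timedelta(minutes=5)):
        self.margin = margin      # challenger must beat the leader by this much
        self.lockout = lockout    # minimum time between flips
        self.leader = None
        self.last_flip = None

    def update(self, rates, now=None):
        """rates: {acquirer: approval_rate}. Returns the acquirer to prefer."""
        now = now or datetime.now(timezone.utc)
        if not rates:
            return self.leader
        challenger = max(rates, key=rates.get)
        if self.leader is None or self.leader not in rates:
            self._flip(challenger, now)          # no usable leader: take the best
        elif challenger != self.leader:
            locked = now - self.last_flip < self.lockout
            clearly_better = rates[challenger] - rates[self.leader] > self.margin
            if clearly_better and not locked:
                self._flip(challenger, now)
        return self.leader

    def _flip(self, acq, now):
        self.leader, self.last_flip = acq, now
```

Call `update()` on every stats refresh and route to whatever it returns. If the leader disappears from `rates` (for example because it was ejected), the sketch flips immediately and ignores the lockout, since holding on to an unavailable acquirer would defeat the purpose.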

Then the routing rule becomes:

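KILL_SWITCH = 0.10  # example value: below this approval rate, exclude the acquirer

def best_acquirer(candidates, ctx):
    ranked = []
    for acq in candidates:
        rate = approval_rate(acq, ctx.bin_country, ctx.brand)
        if rate is None:
            rate = baseline(acq, ctx)  # too few samples: use the baseline instead
        if rate < KILL_SWITCH:
            continue
        ranked.append((rate, acq))
    # Sort on rate only, so a tie never falls through to comparing acquirers
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [acq for _, acq in ranked]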

That's the entire "smart" of smart routing on day one. Add cost-weighting after the approval-rate signal is stable.

Cascading retries done right

Retry soft declines, never hard ones:

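| Decline class | Retry? | Why |
| --- | --- | --- |
| do_not_honor (often issuer transient) | Yes, on a different acquirer | The issuer sees a different acquirer fingerprint and may score the attempt differently |
| insufficient_funds | Maybe, after a delay | Topping up is a real event, but most retries are wishful |
| issuer_unavailable / network_timeout | Yes | Transient by definition; a timeout leaves the outcome unknown, so reverse the timed-out attempt before moving on |
| lost_card / stolen_card / pickup_card | Never | Card-network rules prohibit reattempts, and violations carry fees |
| do_not_honor flagged as fraud | Never | Repeated attempts compound the fraud signal |
| expired_card | No | Needs new credentials |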
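
In code, the cascade looks like this:

def charge_with_cascade(ctx, primary, fallbacks):
    for attempt, acq in enumerate([primary, *fallbacks], start=1):
        if circuit_open(acq):
            continue  # skip acquirers whose breaker is open
        # Key is derived per (transaction, acquirer, attempt), never shared across acquirers
        res = adapters[acq].charge(ctx, idem_key=attempt_key(ctx, acq, attempt))
        if res.approved:
            return res
        if res.taxonomy in ("hard", "fraud"):
            return res  # do NOT cascade
    return Declined("all routes exhausted")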
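The `res.taxonomy` check presupposes that each adapter normalizes raw acquirer responses into a small set of classes. A minimal sketch of that mapping follows, using the decline classes from the table. The code strings, the extra `soft_delayed` class, and the `fraud_flag` parameter are assumptions for illustration; real acquirer codes differ and need their own per-adapter mapping.

```python
# Hypothetical normalized decline codes, mapped to the classes from the table.
TAXONOMY = {
    "issuer_unavailable": "soft",
    "network_timeout": "soft",
    "do_not_honor": "soft",
    "insufficient_funds": "soft_delayed",   # retry later, not on the next acquirer
    "expired_card": "hard",
    "lost_card": "hard",
    "stolen_card": "hard",
    "pickup_card": "hard",
}

def classify(code, fraud_flag=False):
    """Map a normalized decline code to soft / soft_delayed / hard / fraud."""
    if fraud_flag:
        return "fraud"                      # a fraud signal overrides the raw code
    return TAXONOMY.get(code, "hard")       # unknown codes fail closed

def may_cascade(code, fraud_flag=False):
    """Only plain soft declines are eligible for an immediate retry elsewhere."""
    return classify(code, fraud_flag) == "soft"
```

Treating unknown codes as hard is a deliberate choice in this sketch: a new, unmapped decline reason should stop the cascade until someone classifies it. An allow-list check such as `may_cascade` is a stricter variant of testing for `"hard"` or `"fraud"`.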

The two non-obvious details:

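  • Fresh idempotency key per attempt. Acquirers don't share idempotency state, so what a second acquirer does with a reused key is undefined. Derive a new key for each (transaction, acquirer, attempt), and reuse it only when re-sending that same attempt after a network error.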
  • Hard-decline short-circuit. This is also the thing that protects you from disputes if a card was reported stolen between attempts.
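One way to get a key that is fresh per attempt yet stable for a re-send of the same attempt is to derive it deterministically. The sketch below assumes the context carries a `tx_id` attribute and that `attempt_key` also receives the acquirer name and attempt number. Both the attribute and this signature are assumptions, as is the 32-character truncation.

```python
import hashlib
from types import SimpleNamespace

def attempt_key(ctx, acquirer: str, attempt_no: int) -> str:
    """Derive an idempotency key unique to one (transaction, acquirer, attempt)."""
    raw = f"{ctx.tx_id}:{acquirer}:{attempt_no}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]

ctx = SimpleNamespace(tx_id="tx_123")          # stand-in for the real context
first = attempt_key(ctx, "acq_a", 1)
resend = attempt_key(ctx, "acq_a", 1)          # same attempt re-sent: same key
cascade = attempt_key(ctx, "acq_b", 2)         # next acquirer: different key
print(first == resend, first == cascade)
```

Because the key is a pure function of its inputs, a crashed worker that restarts mid-attempt regenerates the same key instead of creating a second charge on the same acquirer.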

Health checks, circuit breakers, observability

The router will lie to you if it's not measured. The minimum kit:

  • Synthetic pings. Heartbeats per acquirer, decoupled from real traffic.
  • Error-rate circuit breaker. Trip on (5xx + timeouts) / total > X% over Y minutes. Auto-eject, auto-return after a cool-down.
  • Approval-rate alerting. Page on a 3σ drop per (acquirer × bin_country).
  • Per-rule shadow. Log what the previous rule version would have decided. You'll need this every time you change a rule.
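
The breaker's interface can stay small:

from dataclasses import dataclass
from datetime import timedelta

@dataclass
class CircuitBreaker:
    fail_rate_threshold: float = 0.25            # trip when (5xx + timeouts) / total exceeds this
    window: timedelta = timedelta(minutes=5)     # rolling measurement window
    cooldown: timedelta = timedelta(minutes=10)  # how long an ejected acquirer stays out

    def open_for(self, acq) -> bool: ...          # True while acq is ejected
    def record(self, acq, ok: bool) -> None: ...  # feed in each call's outcome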
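The two stubbed methods could be filled in along these lines. This is a sketch under stated assumptions, not a production breaker: state is in-memory and per-process, `min_calls` is an added guard against tripping on a handful of requests, and the `now` parameter exists only to make the logic testable. The class is named `RollingCircuitBreaker` to keep it distinct from the interface above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class RollingCircuitBreaker:
    fail_rate_threshold: float = 0.25
    window: timedelta = timedelta(minutes=5)
    cooldown: timedelta = timedelta(minutes=10)
    min_calls: int = 20  # assumption: ignore windows with too few calls
    _events: dict = field(default_factory=lambda: defaultdict(deque))
    _opened_at: dict = field(default_factory=dict)

    def record(self, acq, ok, now=None):
        """Store one call outcome and trip the breaker if the failure rate is too high."""
        now = now or datetime.now(timezone.utc)
        events = self._events[acq]
        events.append((now, ok))
        while events and now - events[0][0] > self.window:
            events.popleft()                      # drop results older than the window
        failures = sum(1 for _, good in events if not good)
        if len(events) >= self.min_calls and failures / len(events) > self.fail_rate_threshold:
            self._opened_at[acq] = now            # eject the acquirer
            events.clear()                        # start clean after the cool-down

    def open_for(self, acq, now=None):
        """True while the acquirer is ejected; auto-returns after the cool-down."""
        now = now or datetime.now(timezone.utc)
        opened = self._opened_at.get(acq)
        if opened is None:
            return False
        if now - opened >= self.cooldown:
            del self._opened_at[acq]              # cool-down over: let traffic return
            return False
        return True
```

After the cool-down this sketch returns the acquirer to full traffic at once. A half-open state that lets through a small share of requests first is a common refinement, and with several router instances the counters would need to live in shared storage.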

Metrics that matter

| Metric | Formula | Target |
| --- | --- | --- |
| Overall approval % | approved / attempted (unique tx) | maximize |
| Cost per approved tx | total fees / approved | minimize |
| Fallback rate | tx using fallback / total | low + stable |
| Retry success uplift | extra approvals from cascade / attempted | track & celebrate |
| p95 routing latency | 95th percentile of routing-decision time | < 50 ms |
| Per-rule decision drift | rules diff vs shadow | review weekly |

Without these you have a router; with them you have a routing system.
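The first four formulas in the table can be computed straight from an attempt log. The sketch below assumes one record per acquirer attempt, with the hypothetical fields `tx_id`, `attempt_no`, `approved`, and `fee`. It treats any attempt after the first as fallback usage, which is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    tx_id: str
    attempt_no: int      # 1 = primary route, 2+ = fallback
    approved: bool
    fee: float           # fee charged for this attempt, in a single currency

def routing_metrics(attempts):
    """Compute the per-transaction routing metrics from a list of Attempt records."""
    txs = {a.tx_id for a in attempts}
    if not txs:
        return {}
    approved = {a.tx_id for a in attempts if a.approved}
    used_fallback = {a.tx_id for a in attempts if a.attempt_no > 1}
    rescued = {a.tx_id for a in attempts if a.approved and a.attempt_no > 1}
    total_fees = sum(a.fee for a in attempts)
    return {
        "approval_rate": len(approved) / len(txs),        # unique tx, not attempts
        "cost_per_approved": total_fees / len(approved) if approved else None,
        "fallback_rate": len(used_fallback) / len(txs),
        "retry_uplift": len(rescued) / len(txs),          # approvals the cascade added
    }

log = [
    Attempt("t1", 1, True, 0.30),
    Attempt("t2", 1, False, 0.05),
    Attempt("t2", 2, True, 0.35),
    Attempt("t3", 1, False, 0.05),
    Attempt("t4", 1, True, 0.30),
]
print(routing_metrics(log))
```

Counting unique transactions rather than attempts matters: a cascade that turns one decline into one approval should raise the approval rate, not dilute it with the failed first attempt.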

Build vs buy

This is the question that gets answered most often with "buy", and that's usually right when routing isn't your moat. Where an orchestration platform actually saves months is in the adapters, normalized error taxonomy, and reconciliation, not the rules engine itself. The rules engine is the easy part; staying current with 12 acquirer APIs is what burns a team out.

A pragmatic split: keep the rules and the data in-house, integrate the adapters via a platform. That gives you the strategic edge and removes the busy-work.

If you want the orchestration-layer view of the whole picture, the payment orchestration overview walks through the architecture and the build-vs-buy lines we use with customers.

The next post in this series gets into cross-border reconciliation — settlement files, currency, fee parsing — which is where most homemade routing layers quietly fall over.


Author: payments engineer at PaynetEasy. We build payment orchestration and global payouts infrastructure: payneteasy.com
