TL;DR
- A "smart router" is not a model — it's a rules engine fed by fresh, segmented approval-rate data.
- Inputs you actually need: BIN/issuer, country, MCC, amount, currency, retry-count, acquirer health, rolling approval window.
- Cascading retries: retry only soft declines, never hard / fraud / lost-stolen. Fresh idempotency key per attempt.
- Approval-rate-aware routing without a circuit breaker is a way to hammer a dead acquirer. Don't skip the breaker.
The phrase "smart transaction routing" gets used to sell a lot of black boxes. This post is the opposite: how to build one whose decisions a human can read, and how to keep it honest with data.
Why a single acquirer stops being enough
Three things, sooner or later:
- A regional approval dip. Card schemes adjust, an issuer changes its fraud rules, you wake up to a 5pp drop in DE/NL and zero levers to pull.
- An outage. Acquirers go down. A merger or migration takes a network offline for a window. With one acquirer, your checkout is down with it.
- Pricing leverage. Once you can move a percentage of traffic with a config change, every renegotiation is real.
The fix is structural: route every transaction through a layer that knows about more than one acquirer and decides where it goes.
Anatomy of a rules engine
Inputs that matter (in roughly this order of usefulness):
| Input | Why it changes the route |
|---|---|
| Issuer BIN / country | Local acquirers approve local cards better — almost universally |
| Currency / amount | Cross-border fees jump at currency mismatches and ticket size |
| MCC | Some acquirers are strong in specific verticals (travel, digital goods, subscription) |
| Brand (Visa / Mastercard / Amex / local) | Amex / domestic networks often need a specialist acquirer |
| Retry count | Sticky-on-retry vs failover-on-retry are different strategies |
| Acquirer health / approval-rate window | Excludes acquirers in an active dip |
| Time-of-day / day-of-week | Some issuers have approval cycles — rarely worth bothering with day one |
Outputs are simple: the chosen acquirer (plus credentials) and a fallback chain.
# rules.yaml
- name: eu-cards-primary
  match: { issuer_country: [DE, FR, NL, ES], currency: EUR, amount_lt: 50000 }
  route:
    - { acquirer: acq_a, weight: 70 }
    - { acquirer: acq_b, weight: 30 }
  fallback: [acq_c]
- name: high-ticket-amex
  match: { brand: amex, amount_gte: 50000 }
  route: [{ acquirer: acq_amex_specialist, weight: 100 }]
  fallback: [acq_a]
- name: latam-default
  match: { issuer_country: [BR, MX, CO, AR] }
  route: [{ acquirer: acq_local_latam, weight: 100 }]
  fallback: [acq_a]

def evaluate(rules, ctx):
    for r in rules:
        if matches(r["match"], ctx):
            return pick_weighted(r["route"]), r.get("fallback", [])
    return default_route(ctx), default_fallbacks()
The rules file is the contract between engineering and ops. If your PM can't read it without you, you've built a black box. The most common mistake here is letting the matcher grow regex/DSL features until only its author understands it — resist.
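The `evaluate` function above leans on two helpers the post doesn't show. One way they could look is sketched below, assuming `ctx` is a plain dict and that the `_lt` / `_gte` suffixes in the rules file mean "less than" and "greater than or equal" on the named field. Both helper bodies are illustrative, not the author's implementation.

```python
import random

def matches(match, ctx):
    """Return True when every condition in a rule's `match` block holds for ctx."""
    for key, expected in match.items():
        if key.endswith("_lt"):
            value = ctx.get(key[:-3])          # "amount_lt" -> "amount"
            if value is None or not value < expected:
                return False
        elif key.endswith("_gte"):
            value = ctx.get(key[:-4])          # "amount_gte" -> "amount"
            if value is None or not value >= expected:
                return False
        elif isinstance(expected, list):
            if ctx.get(key) not in expected:   # list means "any of these"
                return False
        elif ctx.get(key) != expected:         # scalar means exact match
            return False
    return True

def pick_weighted(route, rng=random):
    """Pick one acquirer id from a list of {acquirer, weight} entries."""
    acquirers = [entry["acquirer"] for entry in route]
    weights = [entry["weight"] for entry in route]
    return rng.choices(acquirers, weights=weights, k=1)[0]
```

Keeping the matcher to three operators (exact, any-of, numeric bound) is deliberate: it is about as much as a rules file can carry before it stops being readable by ops.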
Routing strategies, with honest tradeoffs
| Strategy | When to use | Optimizes | Risk |
|---|---|---|---|
| Least-cost | Stable approval rates across acquirers | Fees per approved tx | A penny saved on fees can cost a dollar in declines if approval is uneven |
| Approval-rate-aware | Volatile approval, multi-region | Overall approval % | Requires fresh data; flap risk if the window is too small |
| Weighted A/B | Onboarding a new acquirer | Risk-controlled ramp | Don't keep the A/B forever — pick a winner |
| Sticky-on-retry | Card-on-file retries | Consistency / fewer step-ups | Sticky to a failing acquirer = obvious bug |
| Failover-on-retry | First attempt failed, try elsewhere | Recovers approvals | Wrong on hard declines — see below |
The boring answer is that mature stacks combine all of these — the rules file becomes the explicit place where you say when each one fires.
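The least-cost risk in the table is easier to see with numbers. The sketch below compares two hypothetical acquirers by expected net revenue per attempt, assuming fees are a percentage charged only on approved transactions. The margin, fee and approval figures are made up for illustration.

```python
def expected_net(amount, margin_pct, fee_pct, approval_rate):
    """Expected net revenue per attempt; declined attempts earn nothing."""
    return approval_rate * amount * (margin_pct - fee_pct)

# Hypothetical acquirers: one cheaper, one with a better approval rate.
cheap = expected_net(amount=100.0, margin_pct=0.30, fee_pct=0.015, approval_rate=0.86)
strong = expected_net(amount=100.0, margin_pct=0.30, fee_pct=0.020, approval_rate=0.90)
print(f"cheap: {cheap:.2f}  strong: {strong:.2f}")
```

With these inputs the cheaper acquirer nets about 24.51 per attempt and the pricier one about 25.20, so a 0.5pp fee saving loses to a 4pp approval gap.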
Turning auth-rate data into a routing rule (without a model)
You don't need ML for this on day one. A rolling window per (acquirer, bin_country, brand) is enough to be useful:
from datetime import timedelta

def approval_rate(acq, bin_country, brand, window=timedelta(minutes=15)):
    # Aggregate attempts for this segment over the rolling window.
    rows = stats.fetch(acq, bin_country, brand, since=now() - window)
    if rows.attempts < MIN_SAMPLE:
        return None  # not enough signal: fall back to the default rule
    return rows.approved / rows.attempts
Two production-grade details:
- Minimum sample. With low volume on a corridor, one decline out of a handful of attempts can push the rate toward 0% and you'd eject a fine acquirer. Require N ≥ 50 attempts in the window before letting the data steer.
- Damping. Don't switch winners on every refresh. EMA, hysteresis bands, or a 5-minute lockout after a flip — pick one.
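Of the damping options listed, a hysteresis band is probably the simplest to reason about. A minimal sketch follows, assuming one tracker per segment and a 2pp band. The class name `StickyWinner` and the `margin` default are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class StickyWinner:
    margin: float = 0.02             # challenger must lead by more than 2pp
    current: Optional[str] = None    # acquirer currently preferred

    def update(self, rates: Dict[str, float]) -> Optional[str]:
        """rates maps acquirer id -> approval rate for one segment."""
        if not rates:
            return self.current
        best = max(rates, key=rates.get)
        if self.current not in rates:
            self.current = best      # no incumbent yet, or it dropped out
        elif rates[best] - rates[self.current] > self.margin:
            self.current = best      # challenger cleared the band
        return self.current
```

A challenger that is only marginally ahead never displaces the incumbent, which is what stops two near-equal acquirers from flapping on every refresh.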
Then the routing rule becomes:
def best_acquirer(candidates, ctx):
    ranked = []
    for acq in candidates:
        rate = approval_rate(acq, ctx.bin_country, ctx.brand)
        if rate is None:
            rate = baseline(acq, ctx)
        if rate < KILL_SWITCH:  # e.g. 0.10
            continue
        ranked.append((rate, acq))
    ranked.sort(reverse=True)
    return [acq for _, acq in ranked]
That's the entire "smart" of smart routing on day one. Add cost-weighting after the approval-rate signal is stable.
Cascading retries done right
Retry soft declines, never hard ones:
| Decline class | Retry? | Why |
|---|---|---|
| do_not_honor (often issuer transient) | Yes, on a different acquirer | Issuers re-evaluate via a different fingerprint |
| insufficient_funds | Maybe, after delay | Topping-up is a real event but most retries are wishful |
| issuer_unavailable / network_timeout | Yes | Definitionally transient |
| lost_card / stolen_card / pickup_card | Never | Card-network rule; you'll get fined |
| do_not_honor flagged as fraud | Never | Fraud scores stack |
| expired_card | No | Need new credentials |
def charge_with_cascade(ctx, primary, fallbacks):
    for acq in [primary, *fallbacks]:
        if circuit_open(acq):
            continue  # skip acquirers the breaker has ejected
        # attempt_key must return a new key on every call (one per attempt)
        res = adapters[acq].charge(ctx, idem_key=attempt_key(ctx))
        if res.approved:
            return res
        if res.taxonomy in ("hard", "fraud"):
            return res  # do NOT cascade
    return Declined("all routes exhausted")
The two non-obvious details:
- Fresh idempotency key per attempt. Different acquirers don't share state; reusing the same key is undefined.
- Hard-decline short-circuit. This is also the thing that protects you from disputes if a card was reported stolen between attempts.
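Both details can be made concrete with a small sketch. It assumes a key derived from order id, acquirer and attempt number, and a lookup table built from the decline classes in the table above. The signature here differs from the post's `attempt_key(ctx)`; the function names, the `"delayed"` class, and the fail-closed default for unknown codes are all illustrative assumptions.

```python
import hashlib

def attempt_key(order_id: str, acquirer: str, attempt_no: int) -> str:
    """Deterministic key, unique per (order, acquirer, attempt)."""
    raw = f"{order_id}:{acquirer}:{attempt_no}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]

# Decline classes taken from the table above; "delayed" = retry only later.
DECLINE_CLASS = {
    "do_not_honor": "soft",
    "issuer_unavailable": "soft",
    "network_timeout": "soft",
    "insufficient_funds": "delayed",
    "lost_card": "hard",
    "stolen_card": "hard",
    "pickup_card": "hard",
    "expired_card": "hard",
}

def classify(code: str, fraud_flag: bool = False) -> str:
    """Map a normalized decline code to a retry class."""
    if fraud_flag:
        return "fraud"                       # never cascade on a fraud signal
    return DECLINE_CLASS.get(code, "hard")   # assumption: unknown codes fail closed
```

Deriving the key instead of generating a random one means a network-level resend of the same attempt reuses its key, while the next acquirer in the cascade gets a different one.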
Health checks, circuit breakers, observability
The router will lie to you if it's not measured. The minimum kit:
- Synthetic pings. Heartbeats per acquirer, decoupled from real traffic.
- Error-rate circuit breaker. Trip on (5xx + timeouts) / total > X% over Y minutes. Auto-eject, auto-return after a cool-down.
- Approval-rate alerting. Page on a 3σ drop per (acquirer × bin_country).
- Per-rule shadow. Log what the previous rule version would have decided. You'll need this every time you change a rule.
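The 3σ approval-rate alert in the list above can be sketched with the standard library alone. This assumes `history` holds recent per-window approval rates for one (acquirer × bin_country) segment; the function name and the `min_points` default are hypothetical.

```python
from statistics import mean, stdev

def is_three_sigma_drop(history, current, min_points=12):
    """True when `current` sits more than 3 standard deviations below the mean."""
    if len(history) < min_points:
        return False             # too little history to estimate spread
    mu = mean(history)
    sigma = stdev(history)       # sample standard deviation
    if sigma == 0:
        return current < mu      # perfectly flat history: any drop is notable
    return current < mu - 3 * sigma
```

Only drops are flagged here; a sudden rise in approval rate is usually good news and does not need a page.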
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class CircuitBreaker:
    fail_rate_threshold: float = 0.25
    window: timedelta = timedelta(minutes=5)
    cooldown: timedelta = timedelta(minutes=10)

    def open_for(self, acq) -> bool: ...
    def record(self, acq, ok: bool) -> None: ...
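The stub above leaves both methods empty. One possible in-memory implementation is sketched below, assuming a single process, a sliding window of (timestamp, ok) events per acquirer, and a plain cool-down with no half-open probing. `SimpleBreaker`, `min_events` and the injectable `clock` are additions for illustration and testing, not part of the original interface.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SimpleBreaker:
    fail_rate_threshold: float = 0.25
    window_s: float = 300.0       # 5 minutes
    cooldown_s: float = 600.0     # 10 minutes
    min_events: int = 20          # don't trip on a handful of calls
    clock: Callable[[], float] = time.monotonic
    _events: dict = field(default_factory=lambda: defaultdict(deque))
    _opened_at: dict = field(default_factory=dict)

    def record(self, acq, ok: bool) -> None:
        now = self.clock()
        events = self._events[acq]
        events.append((now, ok))
        while events and events[0][0] < now - self.window_s:
            events.popleft()      # drop events older than the window
        failures = sum(1 for _, good in events if not good)
        if len(events) >= self.min_events and failures / len(events) > self.fail_rate_threshold:
            self._opened_at[acq] = now   # trip: eject this acquirer

    def open_for(self, acq) -> bool:
        opened = self._opened_at.get(acq)
        if opened is None:
            return False
        if self.clock() - opened >= self.cooldown_s:
            del self._opened_at[acq]     # cool-down over: let traffic return
            self._events[acq].clear()    # start the window fresh
            return False
        return True
```

A production breaker would more likely return traffic gradually (a half-open state) and share state across router instances; this version only shows the trip and cool-down mechanics.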
Metrics that matter
| Metric | Formula | Target |
|---|---|---|
| Overall approval % | approved / attempted (unique tx) | maximize |
| Cost per approved tx | total fees / approved | minimize |
| Fallback rate | tx using fallback / total | low + stable |
| Retry success uplift | extra approvals from cascade / attempted | track & celebrate |
| p95 routing latency | — | < 50 ms |
| Per-rule decision drift | rules diff vs shadow | review weekly |
Without these you have a router; with them you have a routing system.
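Most of the metrics in the table fall out of a flat attempt log. The sketch below assumes one record per attempt with a fee field and a flag for whether the attempt went to a fallback acquirer. The `Attempt` fields and the `summarize` name are hypothetical, and whether declined attempts carry a fee depends on your acquirer contracts.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    tx_id: str
    acquirer: str
    approved: bool
    fee: float
    is_fallback: bool

def summarize(attempts):
    """Compute per-transaction routing metrics from a list of attempts."""
    tx_ids = {a.tx_id for a in attempts}
    approved = {a.tx_id for a in attempts if a.approved}
    used_fallback = {a.tx_id for a in attempts if a.is_fallback}
    rescued = {a.tx_id for a in attempts if a.approved and a.is_fallback}
    total = len(tx_ids)
    if total == 0:
        return None                      # nothing to report
    return {
        "approval_rate": len(approved) / total,
        "cost_per_approved": (
            sum(a.fee for a in attempts) / len(approved) if approved else None
        ),
        "fallback_rate": len(used_fallback) / total,
        "retry_uplift": len(rescued) / total,
    }
```

Counting unique transactions rather than attempts matters: a cascade that takes three attempts to approve one payment should count as one approval out of one transaction, not one out of three.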
Build vs buy
This is the question that gets answered most often with "buy", and that's usually right when routing isn't your moat. Where an orchestration platform actually saves months is in the adapters, normalized error taxonomy, and reconciliation, not the rules engine itself. The rules engine is the easy part; staying current with 12 acquirer APIs is what burns a team out.
A pragmatic split: keep the rules and the data in-house, integrate the adapters via a platform. That gives you the strategic edge and removes the busy-work.
If you want the orchestration-layer view of the whole picture, the payment orchestration overview walks through the architecture and the build-vs-buy lines we use with customers.
The next post in this series gets into cross-border reconciliation — settlement files, currency, fee parsing — which is where most homemade routing layers quietly fall over.
Author: payments engineer at PaynetEasy. We build payment orchestration and global payouts infrastructure → payneteasy.com