Gatling.io

Posted on May 12 • Originally published at gatling.io

SLO examples for financial services: what good performance looks like in fintech

#sre #gatling #performance #testing

Every financial services company knows what a failed transaction costs. The number is immediate, calculable, and visible in the next day's report. What's less visible — but equally costly — is the slow transaction. The payment that took four seconds instead of half a second. The login that timed out. The dashboard that wouldn't load.

These aren't outages. They don't show up in incident reports. But they erode customer trust, increase support volume, and — in a world where switching costs are lower than ever — they drive churn.

Service Level Objectives (SLOs) are how leading fintech companies make performance measurable before it becomes a problem. This post breaks down what those targets look like, why they're set where they are, and how to know whether your systems are actually meeting them.

Why fintech has stricter performance requirements than most industries
Two things make financial services different when it comes to reliability:

Regulatory exposure. The FDIC's Technology Service Provider Guidance (2024) explicitly cites 99.9% uptime and 1,000+ transactions per minute as baseline expectations for banking technology vendors. The EU's Digital Operational Resilience Act (DORA) mandates continuous availability of critical ICT systems across ~22,000 financial entities and holds management bodies accountable for reviewing performance targets. These aren't voluntary benchmarks — they're compliance requirements with fines up to 2% of annual turnover.

The cost of a slow transaction. In e-commerce, a slow page load costs a conversion. In fintech, a slow or failed transaction costs the transaction, plus the trust that took years to build. A Deloitte study commissioned by Google found that a 0.1-second improvement in mobile site speed increased retail conversions by 8.4%. For financial services, where users have zero tolerance for payment failures, the stakes are higher still.

The three tiers of fintech SLOs
Not every part of a financial services platform carries the same risk. A useful starting point is to think in three tiers.

Tier 1: Payment-critical paths
Checkout, payment authorisation, transaction processing

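These are the paths where failure has an immediate, measurable cost. The targets here are the strictest in the industry. The table below separates three terms that are easy to conflate: the metric you measure (SLI), the internal target (SLO), and the contractual promise (SLA).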

Category	SLI	SLO	SLA
What it is	What you measure	What you target	What you promise
Who uses it	Engineering teams	Internal stakeholders	Customers
Its nature	Actual metric value	Internal goal	Legal contract
Example	Current uptime is 99.87%	Target 99.95% uptime	Guarantee 99.9% uptime with credits for breaches
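To make the example row concrete, the sketch below converts each availability percentage into the downtime it implies. The 30-day window is an assumption chosen for illustration; the source does not state a measurement window.

```python
# Convert an availability percentage into the downtime it allows.
# Assumption: a 30-day measurement window (not specified in the article).
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def downtime_minutes(availability_pct, window_minutes=MINUTES_PER_30_DAYS):
    """Minutes of downtime that correspond to the given availability."""
    return window_minutes * (1 - availability_pct / 100)


measured_sli = downtime_minutes(99.87)  # what actually happened
slo_budget = downtime_minutes(99.95)    # internal target
sla_budget = downtime_minutes(99.9)     # contractual promise

print(f"SLI downtime: {measured_sli:.1f} min")
print(f"SLO budget:   {slo_budget:.1f} min")
print(f"SLA budget:   {sla_budget:.1f} min")
print("SLO breached:", measured_sli > slo_budget)
print("SLA breached:", measured_sli > sla_budget)
```

Under that assumption, a measured 99.87% means roughly 56 minutes of downtime against a 21.6-minute SLO budget and a 43.2-minute SLA budget, so both are breached.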

At very high transaction volumes (over 10,000 requests per minute), these targets tighten further — there's no acceptable percentage of users hitting a slow payment path when thousands of transactions are processing simultaneously.

Tier 2: Account access and authentication
Login flows, identity verification, SSO, MFA

Authentication is the gate to everything else. Users have low tolerance for slow logins — it's the first interaction in every session, and a poor experience here colours everything that follows.

Metric	Target
Availability	99.9%
Response time p95	< 150 ms
Response time p99	< 300 ms
Error ratio	< 0.1%

The 150 ms p95 threshold reflects the expectation set by modern authentication experiences: Touch ID, Face ID, and SSO flows have trained users to expect near-instant identity verification. Anything slower registers as friction.
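For readers less familiar with percentile targets, here is a minimal sketch of how p95 and p99 could be computed from raw latency samples and compared with the Tier 2 thresholds. It uses the nearest-rank definition, which is one of several common percentile definitions; the sample data is invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical login latencies in milliseconds (100 samples).
latencies_ms = [80] * 90 + [140] * 6 + [250] * 3 + [900]

p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print("p95:", p95, "ms, meets < 150 ms:", p95 < 150)
print("p99:", p99, "ms, meets < 300 ms:", p99 < 300)
```

In this made-up data set a single 900 ms outlier does not move either percentile, which is why percentile targets are usually paired with an error ratio target rather than used alone.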

Tier 3: Non-payment flows
Dashboards, reporting, account management, back-office tools

These paths carry indirect business impact — slow dashboards frustrate users but don't stop transactions. The targets reflect that difference.

Metric	Target
Availability	99.9%
Response time p95	< 500 ms
Response time p99	< 1,500 ms
Error ratio	< 0.5%
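One way to keep tiered targets reviewable is to encode them as data and check measurements against them. The sketch below does that for the Tier 2 and Tier 3 tables above; the class, dictionary keys, and function names are hypothetical, and the measured values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloTargets:
    availability_pct: float  # minimum acceptable availability
    p95_ms: float            # p95 must be strictly below this
    p99_ms: float            # p99 must be strictly below this
    error_ratio_pct: float   # error ratio must be strictly below this


# Values copied from the Tier 2 and Tier 3 tables; key names are illustrative.
TARGETS = {
    "tier2_auth": SloTargets(99.9, 150, 300, 0.1),
    "tier3_non_payment": SloTargets(99.9, 500, 1500, 0.5),
}


def violations(tier, availability_pct, p95_ms, p99_ms, error_ratio_pct):
    """Return the names of the targets that the measurements miss."""
    t = TARGETS[tier]
    failed = []
    if availability_pct < t.availability_pct:
        failed.append("availability")
    if p95_ms >= t.p95_ms:
        failed.append("p95")
    if p99_ms >= t.p99_ms:
        failed.append("p99")
    if error_ratio_pct >= t.error_ratio_pct:
        failed.append("error_ratio")
    return failed


measured = dict(availability_pct=99.95, p95_ms=420, p99_ms=1200, error_ratio_pct=0.2)
print(violations("tier3_non_payment", **measured))
print(violations("tier2_auth", **measured))
```

The same measurements pass as a dashboard and fail as a login flow, which is the point of tiering.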

The number most fintech companies get wrong

Almost every fintech company tracks availability. Fewer track latency percentiles. Almost none have a defined error ratio target.

The problem with availability alone is that it's a coarse, lagging indicator. Your system can be "up" (returning responses, passing health checks) while 5% of payment requests are timing out. Availability won't catch that. A p99 latency target will.

Error ratio is the metric that closes the gap. It measures the percentage of requests that fail, regardless of whether the system is technically available. Setting a target — even a loose one — forces the question: what counts as a failure? That conversation, had before an incident, is far more productive than the same conversation had during one.
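The gap between "up" and "working" can be shown with a few lines of arithmetic. This is a sketch only: the outcome labels and the choice of which outcomes count as failures are assumptions, and making that choice explicit is exactly the conversation described above.

```python
from collections import Counter

# Assumption: timeouts and 5xx responses count as failures.
# What counts as a failure is a policy decision each team has to make.
FAILURE_OUTCOMES = {"timeout", "5xx"}


def error_ratio_pct(outcomes):
    """Percentage of requests whose outcome is classified as a failure."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    failed = sum(counts[o] for o in FAILURE_OUTCOMES)
    return 100 * failed / total


# Hypothetical data: every health check passes, but 5% of payments time out.
health_checks = [True] * 60
payment_outcomes = ["ok"] * 950 + ["timeout"] * 50

availability_pct = 100 * sum(health_checks) / len(health_checks)
print("availability:", availability_pct, "%")
print("error ratio: ", error_ratio_pct(payment_outcomes), "%")
```

Here the health checks report full availability while one payment in twenty fails.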

How do financial services companies use SLOs?
Setting targets is one thing. Using them to run a business is another. Here's how leading financial services organisations put SLOs into practice.

They start with business services, not infrastructure. The most common mistake is measuring the wrong thing. The right question is always: can a user successfully pay, quickly, without duplicate charges, and with a correct outcome? CPU utilisation and queue depth are diagnostics — not SLOs.

Key business services to map SLOs to:

Card and wallet payment authorisation
Payment capture and settlement
Login and account access
Balance and transaction history
Refunds and reversals
Webhooks and downstream event delivery
Reconciliation and ledger accuracy

They treat correctness as more important than availability. A payment system that is available but double-charges customers is not reliable. The strongest SLO programs go beyond uptime to measure:

Correctness: no duplicate authorisation or capture
Durability: transactions persisted before success is returned to the caller
Freshness: account balances reflecting posted transactions within a defined window
Reconciliation: ledger entries matching processor and banking records within minutes

For money movement, "available but wrong" can be worse than temporarily unavailable.
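Two of the correctness checks listed above, duplicate capture and balance freshness, lend themselves to a short sketch. The record shapes, the payment IDs, and the five-second freshness window are all assumptions for illustration, not values from any particular system.

```python
from collections import Counter
from datetime import datetime, timedelta


def duplicate_captures(captures):
    """Return payment IDs that were captured more than once.

    `captures` is an iterable of (payment_id, amount_minor_units) tuples.
    """
    counts = Counter(payment_id for payment_id, _amount in captures)
    return sorted(pid for pid, n in counts.items() if n > 1)


def is_fresh(posted_at, visible_at, max_lag=timedelta(seconds=5)):
    """True if a posted transaction became visible within the allowed lag."""
    return visible_at - posted_at <= max_lag


# Hypothetical capture log: pay_1 was captured twice.
captures = [("pay_1", 1200), ("pay_2", 550), ("pay_1", 1200)]
print("duplicates:", duplicate_captures(captures))

posted = datetime(2024, 1, 1, 12, 0, 0)
print("fresh:", is_fresh(posted, posted + timedelta(seconds=3)))
print("fresh:", is_fresh(posted, posted + timedelta(seconds=30)))
```

A correctness SLO built on checks like these would typically target zero duplicates rather than a percentage.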

They use error budgets to make release decisions. An SLO creates an error budget: the amount of unreliability the system can absorb before reliability takes priority over new features. A practical policy:

Error budget actions

Error budget state	Action
Healthy	Normal releases
50% consumed	Increase monitoring, reduce risky deploys
80% consumed	Require approval for payment-path changes
Exhausted	Freeze non-critical releases, focus on reliability
Correctness breach	Incident response, reconciliation, customer remediation
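The policy above can be sketched as a small function. The thresholds and action strings come from the table; the function names, the request-based budget calculation, and the example numbers are assumptions for illustration.

```python
def budget_consumed(failed_requests, total_requests, allowed_error_ratio):
    """Fraction of the error budget used so far (1.0 means exhausted)."""
    allowed_failures = total_requests * allowed_error_ratio
    if allowed_failures <= 0:
        raise ValueError("SLO leaves no error budget")
    return failed_requests / allowed_failures


def release_action(consumed, correctness_breach=False):
    """Map error budget state to the action in the policy table."""
    if correctness_breach:
        return "Incident response, reconciliation, customer remediation"
    if consumed >= 1.0:
        return "Freeze non-critical releases, focus on reliability"
    if consumed >= 0.8:
        return "Require approval for payment-path changes"
    if consumed >= 0.5:
        return "Increase monitoring, reduce risky deploys"
    return "Normal releases"


# Hypothetical month: 1,000,000 requests, an allowed error ratio of 0.1%,
# so about 1,000 failures are budgeted.
for failed in (300, 600, 900, 1200):
    consumed = budget_consumed(failed, 1_000_000, 0.001)
    print(f"{failed} failures -> {release_action(consumed)}")
```

A correctness breach overrides the budget state entirely, which matches the table's treatment of it as a separate row.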

They separate their own failures from provider failures. Payment systems depend on card networks, processors, fraud vendors, and banking infrastructure. Financial services companies track two SLO views in parallel:

Customer-facing SLO: measures total experience including dependencies
Internal SLO: measures only what their own systems did correctly

This prevents teams from attributing systemic reliability problems to third parties, and helps pinpoint exactly where in the chain a failure originated.
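A minimal sketch of the two views, assuming each failed request is tagged with where the failure originated. The tagging scheme and the decision to keep provider-caused failures in the internal denominator are assumptions; other definitions are equally defensible.

```python
def success_pct(requests, count_origins):
    """Success percentage where only failures from `count_origins` count.

    `requests` is a list of failure origins: None for a successful request,
    otherwise a label such as "internal" or "provider".
    """
    if not requests:
        raise ValueError("no requests")
    failures = sum(1 for origin in requests if origin in count_origins)
    return 100 * (len(requests) - failures) / len(requests)


# Hypothetical traffic: 990 successes, 4 internal failures, 6 provider failures.
requests = [None] * 990 + ["internal"] * 4 + ["provider"] * 6

customer_facing = success_pct(requests, {"internal", "provider"})
internal_only = success_pct(requests, {"internal"})

print(f"customer-facing: {customer_facing:.1f}%")
print(f"internal:        {internal_only:.1f}%")
```

In this invented sample the customer sees 99.0% while the team's own systems delivered 99.6%, so most of the gap sits with a dependency.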

They connect SLOs to resilience testing. Monitoring tells you what happened. Testing tells you what will happen under pressure. Financial firms validate SLOs through:

Load testing against peak transaction volumes
Failover and disaster recovery exercises
Third-party outage simulations
Peak-event readiness testing
Incident postmortems tied to SLO burn

An SLO that has never been stress-tested is a hypothesis, not a commitment.

How to know if you're meeting your SLOs
Setting a target is straightforward. Knowing whether you're meeting it requires two things.

Continuous measurement. An SLO checked monthly is a reporting exercise. With organisations averaging 86 outages per year, an SLO evaluated in real time, on every load test run and on every deployment, is an operational tool. Gatling Enterprise Edition evaluates SLOs continuously throughout every test run, producing a compliance score for each metric rather than a pass/fail at the end. If your p99 was under 400 ms for 94% of the run, you know that. You also know which 6% you need to investigate.

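To illustrate the idea of a compliance score, the sketch below splits a run into fixed windows, computes p99 per window, and reports the share of windows under the threshold. This is an assumed, simplified model for illustration and not a description of how Gatling Enterprise Edition calculates its score; the window data is invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def compliance(windows, threshold_ms, pct=99):
    """Return (score_pct, failing_window_indexes) for a list of windows,
    where each window is a list of latencies in milliseconds."""
    failing = [
        i for i, window in enumerate(windows)
        if percentile(window, pct) >= threshold_ms
    ]
    score = 100 * (len(windows) - len(failing)) / len(windows)
    return score, failing


# Hypothetical run: 50 windows of 100 samples each, 3 of them degraded.
healthy = [100] * 99 + [200]
degraded = [100] * 98 + [500] * 2
windows = [healthy] * 47 + [degraded] * 3

score, failing = compliance(windows, threshold_ms=400)
print(f"p99 < 400 ms for {score:.0f}% of the run")
print("windows to investigate:", failing)
```

The list of failing windows is what turns the score into something actionable: it points at the part of the run to investigate.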
A load test that reflects production. The most common failure mode in performance testing is validating against conditions that don't match reality. A test that simulates 100 users on a payment path tells you something. A test that simulates your actual peak volume, with a realistic transaction mix, realistic error conditions, and realistic third-party dependencies, tells you whether your SLOs will hold when it matters.

Where to start
If your organisation doesn't have defined SLOs today, the place to start is not a spreadsheet. It's a conversation about what failure actually costs — for each path, at each tier.

The FDIC's 99.9% uptime floor is a useful anchor for Tier 1 and Tier 2 paths. The targets in the tables above are a reasonable starting point for most fintech platforms. But the right number for your system depends on your traffic volume, your user expectations, and your regulatory obligations.

Use our SLO Advisor to get thresholds tailored to your service

Answer four questions about your service and get specific p95, p99, and error ratio targets — with the reasoning behind each one — ready to configure directly in Gatling Enterprise.
