DEV Community: Abhishek Shekhar

Building Agentic AI in a Regulated Banking System: What Nobody Tells You

Abhishek Shekhar — Wed, 25 Mar 2026 02:10:28 +0000

By a Backend Lead Engineer | 10+ years building core banking and fintech systems in the UK

2,400 words · 11 min read · Intermediate to Senior Engineers

If you're building AI agents that touch financial decisions in 2026, the architecture choices you make in the next six months will determine whether you survive your first regulatory audit. This is the practical guide — audit logs, guardrails, circuit breakers, EU AI Act compliance, and why you can never let an LLM write directly to financial state.

📌 Part 2 of a series. If you missed Part 1, start here: Fintech Backend Architecture: Building Systems That Don't Break When Money Is Involved

Agentic AI in banking is not coming. It's here. Goldman Sachs is running autonomous agents against core trade systems. RBC is projecting a $1B revenue lift. CIBC has deployed AI copilots to 1,700+ engineers.

And almost every article about it is written for a boardroom, not an engine room.

This one isn't. This is what it actually takes to architect, trust, audit, and govern an AI agent making financial decisions in a regulated environment.

Introduction: Forget the Pilot — You're in Production Now

Here's how most AI-in-banking stories go:

2024: "We're running an exciting pilot."
2025: "Our pilot showed promising results."
2026: "Our agent blocked 40,000 legitimate transactions before anyone noticed."

That last one doesn't make the press releases. But it happens.

The shift from pilot to production is where the real engineering starts. And the real engineering looks nothing like the demo.

The question is no longer "Should we use AI?" — it's "When our AI makes a wrong decision at scale, can we explain it, roll it back, and survive the regulatory review?"

That's what this article is about.

What "Agentic AI in Banking" Actually Means (Not a Chatbot)

I'm not talking about a chatbot that summarises statements.

I mean a system that:

Reads live financial data — transactions, risk signals, account history
Makes a decision without a human approving it
Triggers an action that changes financial state

That's the definition. Hold it in your head.

Here's what that looks like in the wild right now:

Fraud decision agents — block or allow a payment in under 200ms
KYC/AML agents — classify customers, surface suspicious patterns, auto-escalate
Payment routing agents — choose the cheapest, fastest, lowest-risk rail
Compliance monitoring agents — watch every transaction for DORA/FCA violations, continuously
Credit decision agents — approve or decline a lending application

Every single one of these affects real money and real people.

And that changes everything about how you build them. A bug in your API returns a 500. A bug in a fraud agent blocks someone's rent payment.

⚠️ Real-world lesson: The same properties that make an AI agent powerful in banking — speed, scale, autonomy — are exactly the properties that cause catastrophic damage when it goes wrong. Design for failure before you design for success.

Three Problems Nobody Warns You About (Explainability, Non-Determinism, Rollback)

Every conference talk on AI in fintech covers use cases and ROI. Almost none cover these.

Problem 1: Explainability Under FCA and EU AI Act — "The Model Decided" Is Not an Answer

Picture this.

A regulator walks in. Sits down. Slides a sheet of paper across the table.

"Why did your system block Mr. Ahmed's payment on 14 March?"

You cannot say: "the model gave it a 0.73 risk score."

Under FCA rules, the EU AI Act, and wider UK financial regulation, high-risk AI decisions require documented, human-interpretable explanations. Not attention weights. Not probability distributions. A traceable reasoning chain that a compliance officer can read, understand, and defend.

This is not a distant requirement. FCA expectations apply today, and the EU AI Act's high-risk obligations are scheduled to apply from August 2026.

Problem 2: LLM Non-Determinism in Fraud Detection — A Compliance Violation Waiting to Happen

LLMs produce different outputs for identical inputs. That's a feature in a creative writing tool. In a fraud detection system, it's a compliance violation waiting to happen.

If your fraud agent blocked the same transaction on Tuesday that it approved on Monday — identical inputs, different outcome — you have a legal problem.

You cannot fix this by tuning the model. You fix it by architecting around it. More on that shortly.

Problem 3: AI Agent Rollback at Scale — When It Goes Wrong, It Goes Wrong Fast

An AI agent in production doesn't make one bad decision. It makes thousands. Per minute.

When the model drifts, or a training bug ships, or a fraudster figures out how to game it, you need to:

Detect the problem in seconds
Stop the agent without taking down the payment system
Reverse affected decisions systematically
Explain the full blast radius to Risk and Compliance

None of this is possible if your AI agent writes directly to financial state.

⚠️ Real-world lesson: A fraud model trained on a biased dataset went live on a Friday afternoon. By Saturday morning it had blocked 40% of legitimate transactions from a specific postcode. The rollback took 3 days. The regulatory incident report took 3 weeks. The Friday deployment window was never used again.

The Architecture Pattern That Solves All Three: Read-Reason-Emit

Here's the pattern. It's not complicated. It's just not obvious until someone tells you.

The AI agent must never write directly to financial state. It reads, it reasons, it emits a decision. A separate deterministic service executes that decision.

In practice:

[Event Stream]
       |
       ▼
[AI Agent Layer]    ← reads context, NEVER writes financial state
       |
       ▼
[Decision Queue]    ← append-only, immutable, fully auditable
       |
       ▼
[Execution Service] ← deterministic, idempotent, saga-driven
       |
       ▼
[Ledger / Financial State]

Why this works:

The AI agent is stateless. It reads, reasons, and emits. It never mutates anything.
The Decision Queue is append-only. Immutable. Just like your ledger. Every decision ever made is permanently recorded.
The Execution Service is deterministic. It applies decisions exactly once, idempotently, with full compensation logic.
Rollback is safe. Mark the decision as reversed in the queue. Run compensations. Done.
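
To make the layering concrete, here is a minimal in-memory sketch. Every name in it (DecisionQueue, ExecutionService, agentDecide, the 0.7 cut-off) is hypothetical and stands in for real infrastructure; the only thing taken from the pattern is the separation of duties. A reversal is modelled as a new record appended to the queue, so nothing already written is edited.

```typescript
// All names here are hypothetical; the article only fixes the layering.
type DecisionKind = "ALLOW" | "BLOCK" | "ESCALATE";

interface DecisionRecord {
  decisionId: string; // also serves as the idempotency key
  paymentId: string;
  decision: DecisionKind;
  reverses?: string; // decisionId of an earlier record this one reverses
}

// Append-only queue: records are frozen on entry and never edited or removed.
class DecisionQueue {
  private readonly records: DecisionRecord[] = [];

  append(record: DecisionRecord): void {
    this.records.push(Object.freeze({ ...record }));
  }

  all(): readonly DecisionRecord[] {
    return this.records;
  }
}

// Agent layer: reads context and returns a decision. It mutates nothing.
// The 0.7 cut-off stands in for whatever the real model does.
function agentDecide(paymentId: string, riskScore: number): DecisionRecord {
  return {
    decisionId: `dec-${paymentId}`,
    paymentId,
    decision: riskScore >= 0.7 ? "BLOCK" : "ALLOW",
  };
}

// Execution service: the only component allowed to change financial state.
class ExecutionService {
  private readonly applied = new Set<string>();
  readonly blockedPayments = new Set<string>();

  // Returns false when the record was already applied (safe replay).
  apply(record: DecisionRecord): boolean {
    if (this.applied.has(record.decisionId)) return false;
    this.applied.add(record.decisionId);
    if (record.reverses !== undefined) {
      this.blockedPayments.delete(record.paymentId); // compensation
    } else if (record.decision === "BLOCK") {
      this.blockedPayments.add(record.paymentId);
    }
    return true;
  }
}
```

Applying the same record twice changes state once. Rolling back means appending a record with `reverses` set and letting the execution service run the compensation.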

If this looks familiar, it should. It's the same append-only, idempotent, saga-driven pattern from good fintech backend design. The AI is just a new layer at the top.

⚠️ Real-world lesson: Every team that gave their AI agent direct database write access ended up in an incident. Every single one. Separate the layers. Non-negotiable.

Building an AI Decision Audit Log That Satisfies Regulators

Every AI decision needs a paper trail. Not just the outcome — the complete context that produced it.

This is the minimum schema that will satisfy an FCA or EU AI Act audit:

CREATE TABLE ai_decision_log (
  decision_id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  agent_id          VARCHAR(100) NOT NULL,  -- which agent made this call
  model_version     VARCHAR(50)  NOT NULL,  -- exact model version (mandatory)
  input_hash        VARCHAR(64)  NOT NULL,  -- SHA-256 of the full input context
  input_snapshot    JSONB        NOT NULL,  -- the FULL input, not a reference
  decision          VARCHAR(50)  NOT NULL,  -- ALLOW / BLOCK / ESCALATE
  confidence        NUMERIC(5,4),           -- 0.0000 to 1.0000
  reasoning         TEXT         NOT NULL,  -- plain-English explanation
  rules_triggered   JSONB,                  -- every guardrail that fired
  execution_id      UUID,                   -- FK to execution service
  created_at        TIMESTAMPTZ  DEFAULT NOW(),
  reviewed_by       VARCHAR(100),           -- human reviewer if escalated
  reviewed_at       TIMESTAMPTZ
);

Three fields that engineers always want to skip. Don't.

input_snapshot — store the full input, not a reference. Data gets mutated. Audit logs must not. If a regulator pulls a 2-year-old decision, you need to show exactly what the agent saw at that moment.
model_version — mandatory, not optional. When your model gets retrained next month, you need to know which decisions were made by which version. This is also how you scope a rollback.
reasoning — human-readable text generated by the agent as part of its output. Not post-hoc rationalisation. Not a confidence score dressed up as an explanation. Enforce this at the API contract.
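
One way to enforce those three fields at the API contract is to reject agent output before it ever reaches the log. The sketch below is an assumption about how that check might look: the AgentOutput shape mirrors the table columns, while validateAgentOutput and the 40-character floor are made up for illustration.

```typescript
type DecisionOutcome = "ALLOW" | "BLOCK" | "ESCALATE";

interface AgentOutput {
  agentId: string;
  modelVersion: string;
  decision: DecisionOutcome;
  confidence: number;
  reasoning: string;
  inputSnapshot: Record<string, unknown>;
}

// Hypothetical floor; tune it against real explanations.
const MIN_REASONING_CHARS = 40;

// Returns contract violations. An empty array means the output can be logged.
function validateAgentOutput(out: AgentOutput): string[] {
  const problems: string[] = [];
  if (out.modelVersion.trim() === "") {
    problems.push("model_version is mandatory");
  }
  if (Object.keys(out.inputSnapshot).length === 0) {
    problems.push("input_snapshot must contain the full input");
  }
  // Written as a negated range check so NaN is rejected too.
  if (!(out.confidence >= 0 && out.confidence <= 1)) {
    problems.push("confidence must be between 0 and 1");
  }
  if (out.reasoning.trim().length < MIN_REASONING_CHARS) {
    problems.push("reasoning is too short to be an explanation");
  }
  return problems;
}
```

A length floor will not prove an explanation is good. It does stop "score 0.73" from being stored as one.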

⚠️ Real-world lesson: "The model gave it a 0.73 risk score" is not a regulatory explanation. "This transaction was blocked because it exceeded the account's 30-day velocity threshold by 340%, originated from an IP linked to 3 previous fraud reports, and the beneficiary account was opened 6 hours ago" — that is. Build your agents to produce the second one.

Hard Guardrails, Soft Guardrails, and Circuit Breakers for LLM Agents in Fintech

Hard guardrails are not suggestions for the AI to consider. They are hard stops that the AI layer never sees. Soft guardrails run after the model has decided, and the only thing they can do is escalate.

Hard Guardrails — The LLM Never Gets Involved

// These fire BEFORE the AI agent. If triggered: BLOCK, log, done.
const hardGuardrails = {
  maxSingleTransactionGBP: 50_000,
  maxDailyVolumePerAccount: 200_000,
  sanctionedCountries: ['XX', 'YY'],       // OFAC / HMT list
  requiredFields: ['reference', 'beneficiary_name'],
  minAccountAgeForHighValue: 30,            // days
};
// Hard guardrail fires? Block immediately. Don't ask the AI.
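
A sketch of how that config could be evaluated before the model is called. The PaymentTxn shape, the checkHardGuardrails function, and the HIGH_VALUE_GBP cut-off are all assumptions; the config above does not say what counts as "high value", so the figure here is a placeholder.

```typescript
// Hypothetical transaction shape for illustration.
interface PaymentTxn {
  amountGBP: number;
  dailyVolumeGBP: number; // account total for today, including this payment
  beneficiaryCountry: string;
  accountAgeDays: number;
  fields: Record<string, string>;
}

const hardLimits = {
  maxSingleTransactionGBP: 50_000,
  maxDailyVolumePerAccount: 200_000,
  sanctionedCountries: ["XX", "YY"],
  requiredFields: ["reference", "beneficiary_name"],
  minAccountAgeForHighValue: 30, // days
};

// Placeholder: the article does not define "high value".
const HIGH_VALUE_GBP = 10_000;

// Returns the name of every hard guardrail that fired. Any entry means BLOCK.
function checkHardGuardrails(txn: PaymentTxn): string[] {
  const fired: string[] = [];
  if (txn.amountGBP > hardLimits.maxSingleTransactionGBP) {
    fired.push("maxSingleTransactionGBP");
  }
  if (txn.dailyVolumeGBP > hardLimits.maxDailyVolumePerAccount) {
    fired.push("maxDailyVolumePerAccount");
  }
  if (hardLimits.sanctionedCountries.includes(txn.beneficiaryCountry)) {
    fired.push("sanctionedCountries");
  }
  for (const field of hardLimits.requiredFields) {
    const value = txn.fields[field];
    if (value === undefined || value.trim() === "") {
      fired.push(`requiredFields:${field}`);
    }
  }
  if (
    txn.amountGBP >= HIGH_VALUE_GBP &&
    txn.accountAgeDays < hardLimits.minAccountAgeForHighValue
  ) {
    fired.push("minAccountAgeForHighValue");
  }
  return fired;
}
```

The function is pure and deterministic, so the same transaction always produces the same list. That list is what goes into rules_triggered in the audit log.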

Soft Guardrails — The AI Can Override, But It Better Explain Why

// These fire AFTER the AI decision.
// AI says ALLOW + soft guardrail fires = ESCALATE to human review.
const softGuardrails = {
  velocityMultiplierThreshold: 3.0,         // 3x account's normal monthly volume
  newPayeeHighValueThreshold: 5_000,        // first payment ever to this payee
  unusualHours: { start: 1, end: 5 },       // 1am–5am local time
  confidenceMinimum: 0.80,                  // AI confidence must clear 80%
};
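
Roughly, the "ALLOW plus soft guardrail equals ESCALATE" rule could look like this. SoftContext and applySoftGuardrails are hypothetical names, and the sketch assumes the velocity check compares this month's volume with a precomputed baseline, which the config comment implies but does not spell out.

```typescript
type AgentVerdict = "ALLOW" | "BLOCK" | "ESCALATE";

// Hypothetical context the soft checks need; baselines are assumed precomputed.
interface SoftContext {
  amountGBP: number;
  monthVolumeGBP: number; // this month so far, including this payment
  normalMonthlyVolumeGBP: number; // the account's usual monthly volume
  isFirstPaymentToPayee: boolean;
  localHour: number; // 0 to 23, payer's local time
  confidence: number; // as reported by the agent
}

const softLimits = {
  velocityMultiplierThreshold: 3.0,
  newPayeeHighValueThreshold: 5_000,
  unusualHours: { start: 1, end: 5 },
  confidenceMinimum: 0.8,
};

function softGuardrailsFired(ctx: SoftContext): string[] {
  const fired: string[] = [];
  if (ctx.monthVolumeGBP > ctx.normalMonthlyVolumeGBP * softLimits.velocityMultiplierThreshold) {
    fired.push("velocityMultiplierThreshold");
  }
  if (ctx.isFirstPaymentToPayee && ctx.amountGBP >= softLimits.newPayeeHighValueThreshold) {
    fired.push("newPayeeHighValueThreshold");
  }
  if (ctx.localHour >= softLimits.unusualHours.start && ctx.localHour < softLimits.unusualHours.end) {
    fired.push("unusualHours");
  }
  if (ctx.confidence < softLimits.confidenceMinimum) {
    fired.push("confidenceMinimum");
  }
  return fired;
}

// Only an ALLOW is changed. BLOCK and ESCALATE pass through untouched.
function applySoftGuardrails(
  verdict: AgentVerdict,
  ctx: SoftContext,
): { verdict: AgentVerdict; rulesTriggered: string[] } {
  const rulesTriggered = softGuardrailsFired(ctx);
  const final: AgentVerdict =
    verdict === "ALLOW" && rulesTriggered.length > 0 ? "ESCALATE" : verdict;
  return { verdict: final, rulesTriggered };
}
```

Soft guardrails can only make the outcome more cautious. They never turn a BLOCK into an ALLOW.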

Circuit Breakers — Because AI Models Fail Silently If You Let Them

Your circuit breaker watches the AI's decision pattern in real time. The moment something looks wrong, it yanks the AI out of the loop and routes everything to a deterministic fallback.

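// Trip conditions: any one of these trips the breaker.
const circuitBreakerTrips = {
  blockRateSpike: { increasePct: 15, windowMinutes: 5 },  // +15% block rate in a 5-minute window
  allowRateSpike: { increasePct: 20 },                    // +20% allow rate (could mean model is being gamed)
  latencyBreach:  { p99Ms: 800 },                         // P99 > 800ms (agent is struggling under load)
  errorRate:      { maxPct: 1, windowSeconds: 60 },       // >1% errors in 60 seconds
  modelDrift:     { maxSigma: 2, baselineDays: 7 },       // decision distribution >2 sigma from 7-day baseline
};

// On trip: fall back to the rules engine, page on-call, open an incident.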
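
A minimal sketch of the block-rate condition only, to show the shape of the mechanism. DecisionCircuitBreaker and decideWithFallback are hypothetical names. The sketch reads "+15%" as 15 percentage points above a known baseline, which is one possible interpretation, and it takes timestamps as arguments so it can be driven in a drill.

```typescript
interface BreakerConfig {
  windowMs: number; // sliding window length
  baselineBlockRate: number; // e.g. taken from the 7-day baseline
  maxBlockRateIncrease: number; // 0.15 = 15 percentage points (assumed reading)
  minSamples: number; // do not judge on a handful of decisions
}

class DecisionCircuitBreaker {
  private samples: { atMs: number; blocked: boolean }[] = [];
  private tripped = false;
  private readonly config: BreakerConfig;

  constructor(config: BreakerConfig) {
    this.config = config;
  }

  record(atMs: number, blocked: boolean): void {
    const cutoff = atMs - this.config.windowMs;
    this.samples = this.samples.filter((s) => s.atMs > cutoff);
    this.samples.push({ atMs, blocked });
    if (this.samples.length < this.config.minSamples) return;
    const blockedCount = this.samples.filter((s) => s.blocked).length;
    const blockRate = blockedCount / this.samples.length;
    if (blockRate - this.config.baselineBlockRate > this.config.maxBlockRateIncrease) {
      this.tripped = true; // stays tripped until a human resets it
    }
  }

  isTripped(): boolean {
    return this.tripped;
  }

  reset(): void {
    this.tripped = false;
    this.samples = [];
  }
}

// Routes to the deterministic rules engine once the breaker has tripped.
function decideWithFallback(
  breaker: DecisionCircuitBreaker,
  agent: () => boolean,
  rulesEngine: () => boolean,
  nowMs: number,
): { blocked: boolean; source: "agent" | "rules" } {
  if (breaker.isTripped()) {
    return { blocked: rulesEngine(), source: "rules" };
  }
  const blocked = agent();
  breaker.record(nowMs, blocked);
  return { blocked, source: "agent" };
}
```

The breaker does not reset itself. Someone has to look at the incident and call reset(), which is the point.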

⚠️ Real-world lesson: Test your circuit breaker in production. Quarterly. Deliberately trigger it. A circuit breaker that has never fired in a drill will not fire reliably when you're staring at a $2M transaction anomaly at 3am.

How to Test an AI Agent in Banking When You Can't Unit Test It

You cannot write "given input X, expect output Y" and call your AI agent tested. The model doesn't work like that.

What you can do:

Shadow Mode Testing — Run it Live, But Without Consequences

Deploy the agent alongside your existing system. It processes every real transaction and logs its decision. But the live decision is still made by the existing rules engine.

Then compare.

Run for a minimum of 4 weeks across all transaction types and volumes
Target ≥98% agreement rate with the existing system before going anywhere near live
Every disagreement gets reviewed manually — these are your highest-signal edge cases
No go-live without sign-off from Risk, Compliance, and Engineering. All three.
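
The comparison step is simple enough to sketch. ShadowRecord and summariseShadowRun are hypothetical names; the assumption is that each transaction has been logged once with both the live rules-engine decision and the shadow agent's decision.

```typescript
interface ShadowRecord {
  paymentId: string;
  liveDecision: string; // what the existing rules engine actually did
  shadowDecision: string; // what the agent would have done
}

// Returns the agreement rate plus every disagreement for manual review.
function summariseShadowRun(records: readonly ShadowRecord[]): {
  agreementRate: number;
  disagreements: ShadowRecord[];
} {
  const disagreements = records.filter((r) => r.liveDecision !== r.shadowDecision);
  const agreementRate =
    records.length === 0 ? 0 : (records.length - disagreements.length) / records.length;
  return { agreementRate, disagreements };
}
```

Compare agreementRate against the 0.98 target. The disagreements array is the review queue.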

Red-Team Testing for Fraud AI — Someone Else Will If You Don't

Fraudsters don't read your model card. They probe your system, find the edges, and exploit them.

Before deployment, hire someone to do it first:

Craft transactions that probe just under every hard guardrail threshold
Test distributional shift — transaction patterns the training data never saw
Test boundary inputs that have never occurred in your historical data
Regression test on every model update. Every single one. No exceptions.
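
Two of those probes are easy to generate mechanically. This is a sketch with hypothetical helper names (boundaryProbes, splitBelowThreshold), working in pence as an assumption to keep the arithmetic exact. It produces test inputs only; what the system under test does with them is the thing you are measuring.

```typescript
// Amounts one pence under, exactly at, and one pence over a threshold.
function boundaryProbes(thresholdPence: number): number[] {
  return [thresholdPence - 1, thresholdPence, thresholdPence + 1];
}

// Greedily splits a total into payments that each sit strictly below the threshold.
// This mimics a fraudster structuring one large transfer as several small ones.
function splitBelowThreshold(totalPence: number, thresholdPence: number): number[] {
  const maxPart = thresholdPence - 1;
  if (maxPart <= 0) {
    throw new Error("threshold must be greater than 1 pence");
  }
  const parts: number[] = [];
  let remaining = totalPence;
  while (remaining > 0) {
    const part = Math.min(maxPart, remaining);
    parts.push(part);
    remaining -= part;
  }
  return parts;
}
```

Feed the split payments through the full pipeline in sequence. If every one of them comes back ALLOW, you have found the gap before a fraudster did.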

⚠️ Real-world lesson: A retrained fraud model looked great on all benchmarks. The red team found it could be systematically bypassed using split transactions just below the guardrail threshold — a pattern not in the training set. Update rolled back. Two days of red-teaming. Would have been a catastrophic production incident otherwise.

DORA, FCA, and EU AI Act Compliance for AI Agents: What Engineers Must Know

I've seen engineers treat regulation as someone else's problem.

It isn't. Not anymore.

If you build an AI agent that makes financial decisions in the UK or EU, here's what the law already requires:

Model cards for every agent. What the model does, what it was trained on, known failure modes, performance across demographic groups. This is a legal artefact, not documentation for documentation's sake.
Immutable model versioning. Every version that ever went to production must be retained and reproducible. If a claim surfaces about a 2-year-old decision, you need to be able to re-run that exact model.
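High-risk AI classification under the EU AI Act. Credit scoring and creditworthiness assessment of individuals are "high-risk" under Annex III. Systems used purely to detect financial fraud are explicitly carved out of that entry, but the FCA will still expect you to explain their decisions, so it pays to build fraud agents to the same standard. High-risk systems need a conformity assessment before deployment. Not after.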
Mandatory human oversight for high-stakes decisions. Above certain thresholds, a human must be in the loop. Design your escalation queues for this now, not as an afterthought.
Continuous bias monitoring. If your fraud agent is blocking transactions from certain groups at a higher rate, you need automated detection. Manual sampling at scale doesn't cut it.
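
For the bias monitoring point, the core check is a per-group block rate compared with the overall rate. The sketch below assumes you already have a grouping key on each decision (a postcode area, say) and uses a made-up ratio threshold; both the names and the threshold are illustrative, not a regulatory standard.

```typescript
interface GroupedOutcome {
  group: string; // hypothetical grouping key, e.g. a postcode area
  blocked: boolean;
}

// Flags groups whose block rate exceeds maxRatio times the overall block rate.
function flagDisparateBlockRates(
  outcomes: readonly GroupedOutcome[],
  maxRatio: number,
): string[] {
  const counts = new Map<string, { total: number; blocked: number }>();
  let totalBlocked = 0;
  for (const outcome of outcomes) {
    const c = counts.get(outcome.group) ?? { total: 0, blocked: 0 };
    c.total += 1;
    if (outcome.blocked) {
      c.blocked += 1;
      totalBlocked += 1;
    }
    counts.set(outcome.group, c);
  }
  if (totalBlocked === 0) return []; // nothing blocked, nothing to compare
  const overallRate = totalBlocked / outcomes.length;
  const flagged: string[] = [];
  counts.forEach((c, group) => {
    if (c.blocked / c.total > overallRate * maxRatio) {
      flagged.push(group);
    }
  });
  return flagged.sort();
}
```

Run it on a schedule over the decision log and alert on a non-empty result. A real deployment would also need minimum sample sizes per group so that tiny groups do not trigger noise.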

Most engineering teams are scrambling to retrofit this onto systems that were built without it. Don't be that team.

⚠️ Real-world lesson: Engineers who understand AI governance are rare and extremely well-compensated right now. The vast majority of developers avoid learning it because it seems boring. That's your competitive advantage. Take it.

The Honest Truth About Agentic AI in Banking

This isn't a hard problem. It's a discipline problem.

The patterns that make financial systems reliable apply directly to AI agents. Immutability. Idempotency. Auditability. Circuit breakers. Append-only state. None of this is new.

What's new is having the AI layer sitting on top of all of it — and the discipline to keep it there, instead of letting it reach down and touch financial state directly.

The stack that works:

Append-only decision log — same principles as your ledger
Idempotent execution service — same principles as your payment processor
Hard guardrails that fire before the model is ever consulted
Circuit breakers tested in production, not just in staging
Shadow mode before any agent goes live. Always.

The difference between a bank that deploys AI confidently and one that fears it is not the quality of the model. It's the quality of the architecture around it.

"When this AI agent makes a wrong decision — and it will — can I explain exactly what happened, to a regulator, at 9am on a Monday?"
If yes — you're building it right.

What to Read Next

If this was useful, Part 1 covers the foundational backend patterns that everything in this article builds on:

👉 Fintech Backend Architecture: Building Systems That Don't Break When Money Is Involved

Follow if you want more on distributed systems, fintech backend architecture, and building AI you can actually trust in production. More coming on event-sourced architectures, DORA incident response, and real-time fraud pipelines.

Backend Lead Engineer. 10+ years in UK core banking. Distributed systems, financial data integrity, regulatory compliance, and AI-powered fintech tooling.

Fintech Backend Architecture: Building Systems That Don't Break (When Money Is Involved)

Abhishek Shekhar — Wed, 25 Mar 2026 01:29:33 +0000

By a Backend Lead Engineer | 10+ years building core banking and fintech systems in the UK
2,200 words · 10 min read · Intermediate to Senior Engineers

Fintech backend architecture is one of the most demanding disciplines in software engineering. Unlike typical web applications, financial systems cannot tolerate data loss, silent corruption, or ambiguous state — because the data represents real money.

This guide covers the seven core principles of production-grade fintech backend architecture: from immutable ledger design and idempotency patterns to distributed transaction management, secrets security, observability, and regulatory compliance engineering. Every pattern here has been validated in real UK banking production environments.

Introduction: Why Fintech Backend Architecture Is Different

⚠️ This is not theory. This is what actually works in production.

Building backend systems for a bank is a completely different game. This isn't about building APIs that "mostly work."

If you get it wrong, it's not a bug — it's someone's money.

Over the last decade working inside a UK bank, I've seen:

Systems that scaled cleanly under real production load
Systems that silently corrupted data for weeks before anyone noticed
Systems that passed every test — and failed catastrophically in production

This article is not a tutorial. It's what actually works when regulators are watching every decision, volumes are real and unforgiving, and failure is simply not an option.

1. The Three Rules Every Fintech Backend Engineer Must Follow

Every fintech system lives or dies on three things.

1. Correctness — Money must always be right. Not eventually. Not "close enough." Always.

2. Auditability — If a regulator asks: "What happened to this £100?" — you must answer with data, not assumptions.

3. Resilience: Failures will happen. Your system must never corrupt data when things go wrong, never silently lose state under pressure, and always recover predictably.

Everything else — performance, cost, elegance — comes later. I've seen teams optimise the wrong things early. It always comes back as a production incident.

2. Immutable Ledger Design: Never Update Financial Data

If you take one thing from this article, take this:

Never mutate financial state.

Don't update balances. Don't overwrite data. Don't "fix" values in place.

❌ The Wrong Approach: Mutable Balance Updates

UPDATE accounts
SET balance = balance - 100
WHERE account_id = 'A1';

Looks fine. Works fine — until a retry fires twice, a race condition hits under load, or a bug goes undetected for 3 days. Now you don't know what happened. And neither does your auditor.

✅ The Right Approach: Append-Only Ledger

INSERT INTO ledger_entries (
  account_id, amount, direction,
  type, reference_id, created_at
) VALUES (
  'A1', 100.00, 'DEBIT',
  'PAYMENT', 'txn-123', NOW()
);

Balance is always derived — never stored:

SELECT
  SUM(CASE WHEN direction = 'CREDIT'
    THEN amount ELSE -amount END)
FROM ledger_entries
WHERE account_id = 'A1';
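
The same derivation in application code, as a sketch. LedgerEntry and deriveBalance are hypothetical names. Amounts are held as integer pence, which is an assumption made here to avoid floating-point drift; the SQL above uses decimals. The optional asOfMs argument shows the point-in-time reconstruction that an append-only ledger makes possible.

```typescript
interface LedgerEntry {
  accountId: string;
  amountPence: number; // integer minor units (assumption, see lead-in)
  direction: "CREDIT" | "DEBIT";
  referenceId: string;
  createdAtMs: number; // epoch milliseconds
}

// Sums credits minus debits for one account, optionally as of a past moment.
function deriveBalance(
  entries: readonly LedgerEntry[],
  accountId: string,
  asOfMs: number = Number.POSITIVE_INFINITY,
): number {
  let balance = 0;
  for (const entry of entries) {
    if (entry.accountId !== accountId || entry.createdAtMs > asOfMs) continue;
    balance += entry.direction === "CREDIT" ? entry.amountPence : -entry.amountPence;
  }
  return balance;
}
```

Nothing is ever updated. A correction is another entry, and the history stays intact.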

Why immutable ledger design matters:

Reconstruct the exact balance at any point in history
Explain every transaction to a regulator with a single query
Bugs create traceable entries — not silent corruption
Race conditions drop dramatically — writes are inserts, not read-modify-write

⚠️ Real-world lesson: Ledger tables grow FAST. 10M+ records/month is normal at scale. Without partitioning from day one, performance collapses. Partition by created_at month — before you need it, not after.

3. Idempotency in Payment APIs: Your Distributed Systems Safety Net

In distributed payment systems, retries are not optional. They WILL happen — network timeouts mid-payment, load balancer retries, mobile clients with flaky connections, internal service retries.

The question is never "will a request be retried?" — it's "when it's retried, is it safe?"

Implementing Idempotency Keys in Payment APIs

Every payment endpoint must accept an idempotency key from the client:

POST /v1/payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Authorization: Bearer {token}

{ "amount": 100.00, "currency": "GBP", "to": "ACC456" }

Store the key with the result the first time it's processed:

CREATE TABLE idempotency_keys (
  key         VARCHAR(255) PRIMARY KEY,
  response    JSONB NOT NULL,
  status_code INT NOT NULL,
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  expires_at  TIMESTAMPTZ DEFAULT NOW() + INTERVAL '24 hours'
);

Same key arrives again? Return the stored response. No re-processing. No double charge.

Idempotency key best practices:

Use v4 UUIDs, not sequential IDs
Expire after 24–48 hours — not forever
Return the EXACT same status code and body on replay
Log every replay — it's a useful operational signal
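
A single-process sketch of the replay logic, with an in-memory Map standing in for the idempotency_keys table. IdempotencyStore and its method names are hypothetical. In a real service the primary key constraint on the table is what protects you when two identical requests arrive at the same instant; this sketch does not cover that race.

```typescript
interface StoredResponse {
  statusCode: number;
  body: string;
  expiresAtMs: number;
}

class IdempotencyStore {
  private readonly responses = new Map<string, StoredResponse>();
  private readonly ttlMs: number;
  replayCount = 0; // replays are a useful operational signal

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Runs the handler once per key. Later calls get the stored response back.
  handle(
    key: string,
    nowMs: number,
    handler: () => { statusCode: number; body: string },
  ): { statusCode: number; body: string; replayed: boolean } {
    const existing = this.responses.get(key);
    if (existing !== undefined && existing.expiresAtMs > nowMs) {
      this.replayCount += 1;
      return { statusCode: existing.statusCode, body: existing.body, replayed: true };
    }
    const fresh = handler();
    this.responses.set(key, { ...fresh, expiresAtMs: nowMs + this.ttlMs });
    return { ...fresh, replayed: false };
  }
}
```

Seven requests with the same key run the handler once and return six replays with the identical status code and body.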

⚠️ Real-world lesson: I've seen the same payment processed 7 times in 4 seconds because a mobile client retried on a slow network. Without idempotency keys, that's 7 debits. With them, it's 1 debit and 6 instant cache hits.

4. The Saga Pattern: Handling Distributed Transactions in Fintech

You move £100 from Account A to Account B. That's two writes. System crashes after the debit, before the credit. You've just lost £100.

Two-phase commit (2PC) solves it — in theory. In practice it brings lock contention, coordinator failures, and a throughput cliff. Most modern fintech backend systems use the Saga pattern instead.

How the Saga Pattern Works in Payment Processing

A Saga is a sequence of local transactions. Each step has a defined compensating action. If step 3 fails, the system runs compensations for steps 2 and 1 — automatically.

// Payment Saga — compensating transactions:
Step 1: Debit Account A     → Compensate: Credit Account A
Step 2: Credit Account B    → Compensate: Debit Account B
Step 3: Send confirmation   → Compensate: Send reversal event
Step 4: Update status       → (terminal — no compensation needed)
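
The table above can be sketched as a small synchronous runner. SagaStep and runSaga are hypothetical names, and a production saga would persist its progress and run steps asynchronously; this version only shows the control flow: run forward, and on failure compensate the completed steps in reverse order.

```typescript
interface SagaStep {
  label: string;
  run: () => void;
  compensate?: () => void; // terminal steps may have none
}

// Runs steps in order. On failure, compensates completed steps newest-first.
function runSaga(steps: readonly SagaStep[]): { ok: boolean; log: string[] } {
  const completed: SagaStep[] = [];
  const log: string[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
      log.push(`done:${step.label}`);
    } catch {
      log.push(`failed:${step.label}`);
      for (const done of completed.slice().reverse()) {
        if (done.compensate !== undefined) {
          done.compensate();
          log.push(`compensated:${done.label}`);
        }
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}
```

The returned log is the audit trail: every forward step and every compensation, in the order they ran.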

💡 Write your compensation logic BEFORE your forward logic. If you can't define the compensating transaction, you don't understand the operation well enough to build it.

⚠️ Real-world lesson: The Saga pattern gives you eventual consistency with a full audit trail of every forward step and every compensation that ran. Regulators love this. On-call engineers love this even more.

5. Fintech Security Architecture: Secrets, mTLS, and Fraud Prevention

In most web systems, security is layered on top. In fintech backend architecture, security is baked into every decision from day one.

Secrets Management in Financial Systems

No credentials in environment variables. No credentials in config files. Definitely not in source code.

AWS Secrets Manager or HashiCorp Vault — mandatory, not optional
Rotate secrets automatically — 90-day maximum lifetime for any credential
Every service has its own credentials — no shared database users
Audit log every secret access — you need to know when and by whom

⚠️ Real-world lesson: I have seen a production database password live in a .env file committed to a private GitHub repo for 14 months. It was found during a security audit, not a breach. That time, they were lucky.

mTLS for Internal Service Communication

Internal service-to-service calls should use mutual TLS (mTLS), not just TLS. Both sides present certificates. A compromised internal service can't impersonate another. Istio or Linkerd handles this at the infrastructure level — your application code stays clean.

Rate Limiting as a Fraud Detection Signal

Rate limiting in fintech isn't just DDoS protection — it's fraud intelligence. A legitimate user doesn't send 200 payment requests in 60 seconds.

Global: requests per IP per minute at infrastructure level
Per user: transactions per hour per account at application level
Velocity triggers: unusual patterns → step-up authentication, not hard blocks
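
A sketch of the per-account layer, assuming a simple sliding window held in memory. VelocityMonitor and the STEP_UP action name are hypothetical, and a real deployment would keep the counters in a shared store rather than in one process.

```typescript
type VelocityAction = "ALLOW" | "STEP_UP";

class VelocityMonitor {
  private readonly seen = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxInWindow: number;

  constructor(windowMs: number, maxInWindow: number) {
    this.windowMs = windowMs;
    this.maxInWindow = maxInWindow;
  }

  // Records the request and asks for step-up authentication once the
  // account exceeds its allowance inside the sliding window.
  check(accountId: string, nowMs: number): VelocityAction {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.seen.get(accountId) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.seen.set(accountId, recent);
    return recent.length > this.maxInWindow ? "STEP_UP" : "ALLOW";
  }
}
```

The result is a signal, not a verdict. Exceeding the limit asks the user to re-authenticate instead of rejecting the payment outright.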

6. Observability in Fintech Systems: Structured Logs, Tracing, and Business Metrics

Structured Logging for Financial Services

A string log: "Payment failed for user 123" is useless at 3am.

A structured log:

{
  "event": "payment.failed",
  "user_id": "123",
  "payment_id": "PAY-456",
  "reason": "insufficient_funds",
  "amount": 100.00,
  "currency": "GBP",
  "timestamp": "2026-03-18T09:23:11Z",
  "service": "payment-processor"
}

This is queryable. Alertable. It feeds your compliance dashboards. The string version feeds only frustration.
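
One way to keep those logs consistent is to build them from a typed shape instead of string concatenation. The field names below come from the example above; PaymentFailedLog, paymentFailedLine, and countByReason are hypothetical helpers.

```typescript
interface PaymentFailedLog {
  event: "payment.failed";
  user_id: string;
  payment_id: string;
  reason: string;
  amount: number;
  currency: string;
  timestamp: string; // ISO 8601, UTC
  service: string;
}

// Builds one JSON log line. The compiler rejects a missing or misspelt field.
function paymentFailedLine(
  fields: Omit<PaymentFailedLog, "event" | "timestamp">,
  at: Date,
): string {
  const entry: PaymentFailedLog = {
    event: "payment.failed",
    ...fields,
    timestamp: at.toISOString(),
  };
  return JSON.stringify(entry);
}

// Because every line is JSON, querying is a parse plus a field comparison.
function countByReason(lines: readonly string[], reason: string): number {
  return lines.filter(
    (line) => (JSON.parse(line) as { reason?: string }).reason === reason,
  ).length;
}
```

The same idea scales to your log pipeline: alert rules and compliance dashboards filter on fields, not on substrings.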

Distributed Tracing with OpenTelemetry

A single payment touches 6–10 services. When it fails, you need the exact path. Instrument with OpenTelemetry from day one — not after a production incident proves you needed it.

Business Metrics Alongside Technical Metrics

Your SRE watches p99 latency. Your CFO watches payment success rate. Build dashboards for both from the same data pipeline. Grafana and Datadog handle this well. Your on-call engineer and your board meeting both benefit.

7. Compliance as Code: DORA, PCI-DSS, and FCA Requirements

DORA. PCI-DSS. ISO 27001. FCA requirements. The regulatory landscape for fintech is dense and it is enforced. The teams that handle it best don't treat compliance as an audit exercise — they treat it as an engineering requirement.

Compliance engineering in practice:

Data retention policies enforced at the database level — not by a manual process someone forgets
PII fields encrypted at rest with automated key rotation — not a post-launch task
Audit logs immutable and replicated to write-once storage — S3 Object Lock works well
Access reviews automated — quarterly reports from your IAM system, not spreadsheets
Change management tracked with mandatory risk assessment fields — not informal Slack messages

DORA specifically requires documented evidence of operational resilience testing. If you're not generating structured resilience test reports now, you will be scrambling later.

⚠️ Real-world lesson: Engineers who understand compliance earn more, get promoted faster, and have a dramatically easier time selling tools into the fintech sector. Most developers actively avoid learning it. That's your competitive advantage.

Summary: What Makes a Production-Grade Fintech Backend

Fintech backend architecture rewards one type of engineer above all others: the one who prioritises correctness over cleverness, and auditability over speed-of-delivery.

The technology stack is not exotic:

PostgreSQL for the immutable ledger
Kafka or SQS for saga orchestration events
OpenTelemetry for distributed tracing
HashiCorp Vault for secrets management

The difference is discipline. Not technology.

The question to ask at every architectural decision in fintech isn't "will this scale?"

It's: "When this fails — can I explain exactly what happened, to a regulator, at 9am on a Monday?"

If the answer is yes — you're building it right.

Found this useful? Follow for more articles on distributed systems, financial data integrity, and regulatory compliance engineering in production fintech environments.

Backend Lead Engineer. 10+ years building core banking systems in the UK. Specialises in distributed systems, financial data integrity, and regulatory compliance engineering. Currently building AI-powered tooling for fintech compliance teams.