Stuart Watkins

Posted on May 27 • Originally published at zenoo.com

We replaced 73 hours of weekly alert triage with 10 AI agents. Here is what the architecture looks like.

#ai #agents #architecture #automation

TL;DR: Most compliance teams spend 73 hours a week on alert triage. Around 95% of those alerts are noise. We built 10 AI agents that investigate alerts in parallel, reversing the time ratio so analysts spend 70% of their time on genuine risk instead of confirming nothing is wrong. This post walks through the architecture, the pain points that drove it, and what actually changed.

Last month, a compliance engineer at a European neobank showed me their Grafana dashboard. 200 alerts a day. Average handling time: 22 minutes per alert. That is 73 hours of analyst time per week, burned on triage.

The worst part? By the FCA's own estimates, around 95% of those alerts turn out to be false positives. Senior analysts, people you are paying £75k or more, spending their days confirming that nothing is wrong.

If you are building compliance infrastructure and this sounds familiar, this post is for you.

The problem is not detection. It is investigation.

Threshold-based transaction monitoring systems are good at flagging. They are terrible at context. A £9,500 transfer triggers the same alert whether it is a first-time sender to a high-risk jurisdiction or a recurring payment to a known supplier.

The analyst then has to pull together data from four or five sources manually: transaction history, screening results, corporate registry data, adverse media, and prior case notes. That is the 22 minutes. Not the decision. The assembly.

McKinsey's 2024 KYC/AML Benchmark found that banks across North America, Europe, and Asia-Pacific allocate 10-15% of full-time employees to KYC and AML tasks. Most of that time goes to manual data reconciliation, document collection, and routine alert processing. The actual risk judgement, the thing you hired analysts for, gets squeezed into whatever time is left.

Why threshold-based systems keep failing

If you have worked with traditional AML monitoring, you know the pattern:

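// A legacy monitoring rule: one metric, one comparison, one action.
interface ThresholdRule {
  ruleId: string;
  metric: 'transaction_amount' | 'frequency' | 'jurisdiction_risk';
  operator: 'gt' | 'lt' | 'eq';
  threshold: number;
  action: 'flag' | 'block' | 'escalate';
}

// Each rule is evaluated on its own; nothing links CTR-001 to FREQ-003.
const legacyRules: ThresholdRule[] = [
  {
    ruleId: 'CTR-001',
    metric: 'transaction_amount',
    operator: 'gt',
    threshold: 9500, // flag any amount above 9,500
    action: 'flag',
  },
  {
    ruleId: 'FREQ-003',
    metric: 'frequency',
    operator: 'gt',
    threshold: 5, // escalate when the frequency count exceeds 5
    action: 'escalate',
  },
];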

Rigid. Context-free. Every rule generates alerts independently, with no correlation between them. The system does not know (or care) that the customer has made this same payment monthly for two years.
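To make the "context-free" point concrete, here is a minimal sketch of how rules of this shape might be evaluated. The `MetricSnapshot` input, the `RuleAlert` output, and the `evaluateRules` function are my own illustrative names, not part of any real monitoring product, and the sample amounts are made up.

```typescript
type Metric = 'transaction_amount' | 'frequency' | 'jurisdiction_risk';

interface ThresholdRule {
  ruleId: string;
  metric: Metric;
  operator: 'gt' | 'lt' | 'eq';
  threshold: number;
  action: 'flag' | 'block' | 'escalate';
}

// Assumed input: one numeric value per metric for the transaction being checked.
type MetricSnapshot = Record<Metric, number>;

// Assumed output: which rule fired and what it asked for.
interface RuleAlert {
  ruleId: string;
  action: ThresholdRule['action'];
}

function compare(value: number, operator: ThresholdRule['operator'], threshold: number): boolean {
  if (operator === 'gt') return value > threshold;
  if (operator === 'lt') return value < threshold;
  return value === threshold;
}

// Every rule is checked in isolation: no customer history, no correlation between rules.
function evaluateRules(rules: ThresholdRule[], snapshot: MetricSnapshot): RuleAlert[] {
  return rules
    .filter((rule) => compare(snapshot[rule.metric], rule.operator, rule.threshold))
    .map((rule) => ({ ruleId: rule.ruleId, action: rule.action }));
}

const amountRule: ThresholdRule[] = [
  { ruleId: 'CTR-001', metric: 'transaction_amount', operator: 'gt', threshold: 9500, action: 'flag' },
];

// Two very different situations that look identical to the amount rule.
const firstTimeHighRisk: MetricSnapshot = { transaction_amount: 9600, frequency: 1, jurisdiction_risk: 9 };
const recurringSupplier: MetricSnapshot = { transaction_amount: 9600, frequency: 1, jurisdiction_risk: 1 };

const alertA = evaluateRules(amountRule, firstTimeHighRisk);
const alertB = evaluateRules(amountRule, recurringSupplier);

console.log(JSON.stringify(alertA) === JSON.stringify(alertB)); // true
```

Both payments produce the same alert, so the analyst has to rebuild the difference by hand.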

The industry is shifting toward AI-driven predictive analytics that analyse transaction records, ambient risk metrics, and behavioural trends. But most teams I talk to are still stuck on the threshold model, drowning in alerts they cannot turn off because the regulator expects a documented reason for every rule change.
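As a rough illustration of the behavioural context a threshold rule ignores, the sketch below checks whether a payment matches what the customer has already been doing. Everything in it is an assumption for the example: the `PastPayment` shape, the 10% amount tolerance, and the minimum of six prior payments are placeholders, not values from a real system.

```typescript
// Hypothetical payment record; field names are illustrative.
interface PastPayment {
  counterpartyId: string;
  amount: number;
  timestamp: number; // epoch milliseconds
}

interface BaselineCheck {
  priorPayments: number;
  matchesBaseline: boolean;
}

// Treat a payment as routine when the customer has already paid the same
// counterparty a similar amount several times before.
function checkAgainstBaseline(
  pastPayments: PastPayment[],
  current: PastPayment,
  amountTolerance = 0.1, // assumed tuning parameter
  minPriorPayments = 6, // assumed tuning parameter
): BaselineCheck {
  const similar = pastPayments.filter(
    (p) =>
      p.counterpartyId === current.counterpartyId &&
      p.timestamp < current.timestamp &&
      Math.abs(p.amount - current.amount) <= amountTolerance * current.amount,
  );
  return { priorPayments: similar.length, matchesBaseline: similar.length >= minPriorPayments };
}

const DAY_MS = 24 * 60 * 60 * 1000;
const referenceTime = Date.UTC(2025, 0, 1); // arbitrary sample timestamp

// 24 payments, roughly monthly, of the same size to one supplier.
const supplierPayments: PastPayment[] = Array.from({ length: 24 }, (_, i) => ({
  counterpartyId: 'supplier-1',
  amount: 9600,
  timestamp: referenceTime - (i + 1) * 30 * DAY_MS,
}));

const recurringCheck = checkAgainstBaseline(supplierPayments, {
  counterpartyId: 'supplier-1',
  amount: 9600,
  timestamp: referenceTime,
});
const firstTimeCheck = checkAgainstBaseline(supplierPayments, {
  counterpartyId: 'unknown-9',
  amount: 9600,
  timestamp: referenceTime,
});

console.log(recurringCheck.matchesBaseline, firstTimeCheck.matchesBaseline); // true false
```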

What a multi-agent investigation architecture looks like

When we built this at Zenoo, we made a deliberate choice: not a chatbot. Not a single model doing everything. 10 agents, each with a specific investigation responsibility.

Here is a simplified version of how the orchestration works:

// One agent: a single investigation responsibility, its own data sources,
// and a latency budget.
interface InvestigationAgent {
  agentId: string;
  role: AgentRole;
  dataSources: string[];
  outputSchema: string;
  maxLatencyMs: number;
}

// The 10 investigation responsibilities, one per agent.
type AgentRole =
  | 'transaction_pattern'
  | 'screening'
  | 'adverse_media'
  | 'corporate_registry'
  | 'pep_check'
  | 'jurisdiction_risk'
  | 'behavioural_baseline'
  | 'document_verification'
  | 'network_analysis'
  | 'case_synthesis';

// Tracks one alert as it moves through the agents.
interface AlertInvestigation {
  alertId: string;
  customerId: string;
  agents: InvestigationAgent[];
  status: 'pending' | 'investigating' | 'synthesised' | 'escalated';
  startedAt: number;
  completedAt: number | null; // null until the investigation finishes
}

// What the case_synthesis agent hands to the analyst.
interface SynthesisResult {
  alertId: string;
  riskScore: number;
  recommendation: 'dismiss' | 'review' | 'escalate';
  evidence: AgentFinding[];
  confidence: number;
}

// Structured evidence returned by every agent.
interface AgentFinding {
  agentRole: AgentRole;
  finding: string;
  dataSource: string;
  relevance: number;
}
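The types above describe the data, not the control flow. A minimal sketch of the fan-out could look like the following. The `RunnableAgent` wrapper, the `withTimeout` helper, and the stub agents are hypothetical names I am introducing for illustration; how agents are actually invoked is not shown here.

```typescript
type AgentRole =
  | 'transaction_pattern' | 'screening' | 'adverse_media' | 'corporate_registry'
  | 'pep_check' | 'jurisdiction_risk' | 'behavioural_baseline'
  | 'document_verification' | 'network_analysis' | 'case_synthesis';

interface AgentFinding {
  agentRole: AgentRole;
  finding: string;
  dataSource: string;
  relevance: number;
}

// Hypothetical: an agent plus a function that runs it against one alert.
interface RunnableAgent {
  role: AgentRole;
  maxLatencyMs: number;
  run: (alertId: string) => Promise<AgentFinding>;
}

interface InvestigationOutcome {
  findings: AgentFinding[];
  failedRoles: AgentRole[]; // agents that errored or blew their latency budget
}

// Reject if the work does not settle within the given budget.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Start every non-synthesis agent at once and collect whatever comes back.
async function investigate(alertId: string, agents: RunnableAgent[]): Promise<InvestigationOutcome> {
  const investigators = agents.filter((agent) => agent.role !== 'case_synthesis');
  const results = await Promise.all(
    investigators.map(async (agent) => {
      try {
        const finding = await withTimeout(agent.run(alertId), agent.maxLatencyMs, agent.role);
        return { role: agent.role, finding };
      } catch {
        return { role: agent.role, finding: null };
      }
    }),
  );
  const outcome: InvestigationOutcome = { findings: [], failedRoles: [] };
  for (const result of results) {
    if (result.finding) outcome.findings.push(result.finding);
    else outcome.failedRoles.push(result.role);
  }
  return outcome;
}

// Stub agent that answers after a fixed delay (stand-in for a real data source).
const stubAgent = (role: AgentRole, delayMs: number): RunnableAgent => ({
  role,
  maxLatencyMs: 50,
  run: (alertId) =>
    new Promise<AgentFinding>((resolve) =>
      setTimeout(
        () => resolve({ agentRole: role, finding: `stub finding for ${alertId}`, dataSource: 'stub', relevance: 0.5 }),
        delayMs,
      ),
    ),
});

const demoAgents = [stubAgent('screening', 5), stubAgent('adverse_media', 10), stubAgent('pep_check', 200)];
const demoRun = investigate('ALERT-1', demoAgents);
demoRun.then((outcome) => console.log(outcome.findings.length, outcome.failedRoles));
```

In this sketch a slow or failing agent is reported in `failedRoles` rather than silently dropped, so the synthesis step can see which evidence is missing. Whether a missing finding should block a dismissal is a policy choice the sketch leaves open.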

The key design decisions:

Parallel, not sequential. All 10 agents run concurrently against the same alert. The case_synthesis agent waits for findings from the other nine before producing a recommendation. This is what collapses the investigation time.

Each agent owns its data source. The screening agent connects to sanctions lists. The corporate_registry agent pulls UBO data and cross-references it. The adverse_media agent runs real-time searches. No agent tries to do everything.

Typed output schemas. Every agent returns a structured AgentFinding with a relevance score. The synthesis agent does not have to parse free text. It works with structured evidence.

Human-in-the-loop for escalations. The agents investigate. Analysts decide. The 70/30 time ratio reversal means analysts now spend 70% of their time on genuine risk cases and 30% on reviewing dismissed alerts, instead of the other way around.
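One way to picture the human-in-the-loop step is as a routing function over the synthesis output. This is a sketch under stated assumptions: the two queue names, the 0 to 1 confidence scale, and the 0.9 dismissal threshold are illustrative choices of mine, not documented behaviour.

```typescript
// Trimmed copies of the types above so the sketch stands alone.
interface AgentFinding {
  agentRole: string;
  finding: string;
  dataSource: string;
  relevance: number;
}

interface SynthesisResult {
  alertId: string;
  riskScore: number;
  recommendation: 'dismiss' | 'review' | 'escalate';
  evidence: AgentFinding[];
  confidence: number; // assumed to be on a 0..1 scale
}

// Hypothetical queues: full analyst cases vs. review of agent dismissals.
type Queue = 'analyst_case' | 'dismissal_review';

interface RoutedAlert {
  alertId: string;
  queue: Queue;
  reason: string;
}

// Agents recommend, analysts decide: nothing here closes an alert by itself.
function routeSynthesis(result: SynthesisResult, minDismissConfidence = 0.9): RoutedAlert {
  const { alertId, recommendation, evidence, confidence } = result;
  if (recommendation !== 'dismiss') {
    return { alertId, queue: 'analyst_case', reason: `agents recommend ${recommendation}` };
  }
  // A dismissal without evidence or with low confidence is treated as a real case.
  if (evidence.length === 0 || confidence < minDismissConfidence) {
    return { alertId, queue: 'analyst_case', reason: 'dismissal lacks evidence or confidence' };
  }
  return { alertId, queue: 'dismissal_review', reason: 'dismissed with documented evidence' };
}

const supplierEvidence: AgentFinding = {
  agentRole: 'behavioural_baseline',
  finding: 'same payment made monthly for two years',
  dataSource: 'transaction history',
  relevance: 0.9,
};

const confidentDismissal = routeSynthesis({
  alertId: 'A-1', riskScore: 5, recommendation: 'dismiss', evidence: [supplierEvidence], confidence: 0.97,
});
const shakyDismissal = routeSynthesis({
  alertId: 'A-2', riskScore: 20, recommendation: 'dismiss', evidence: [supplierEvidence], confidence: 0.6,
});
const escalation = routeSynthesis({
  alertId: 'A-3', riskScore: 88, recommendation: 'escalate', evidence: [supplierEvidence], confidence: 0.8,
});

console.log(confidentDismissal.queue, shakyDismissal.queue, escalation.queue);
// dismissal_review analyst_case analyst_case
```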

How perpetual KYC fits in

One of the architectural patterns we see gaining traction is perpetual KYC (pKYC): continuous, event-driven monitoring rather than periodic reviews. When a customer's status changes (a new PEP designation, a sanctions list update, a significant change in transaction behaviour), the system automatically recalculates their risk score and triggers a review if needed.

This is the "event-driven compliance" shift the industry has been talking about, and it maps cleanly to an agent-based architecture. Each status change event fans out to the relevant agents. The synthesis agent determines whether the change is material enough to warrant analyst attention.
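The fan-out from a status change to the relevant agents can be sketched as a lookup table. The event names and the event-to-agent mapping below are assumptions I made up from the examples in the text; a real mapping would depend on your risk policy.

```typescript
type AgentRole =
  | 'transaction_pattern' | 'screening' | 'adverse_media' | 'corporate_registry'
  | 'pep_check' | 'jurisdiction_risk' | 'behavioural_baseline'
  | 'document_verification' | 'network_analysis' | 'case_synthesis';

// Hypothetical event kinds, based on the status changes mentioned above.
type StatusChangeKind = 'pep_designation' | 'sanctions_list_update' | 'behaviour_shift';

interface StatusChangeEvent {
  kind: StatusChangeKind;
  customerId: string;
}

// Illustrative mapping: which agents re-run for which kind of change.
const AGENTS_BY_EVENT: Record<StatusChangeKind, AgentRole[]> = {
  pep_designation: ['pep_check', 'adverse_media', 'network_analysis'],
  sanctions_list_update: ['screening', 'corporate_registry', 'network_analysis'],
  behaviour_shift: ['transaction_pattern', 'behavioural_baseline', 'jurisdiction_risk'],
};

interface AgentTask {
  customerId: string;
  role: AgentRole;
  trigger: StatusChangeKind;
}

// One task per relevant agent, with synthesis always last so it can judge materiality.
function fanOut(change: StatusChangeEvent): AgentTask[] {
  const roles: AgentRole[] = [...AGENTS_BY_EVENT[change.kind], 'case_synthesis'];
  return roles.map((role) => ({ customerId: change.customerId, role, trigger: change.kind }));
}

const pepTasks = fanOut({ kind: 'pep_designation', customerId: 'cust-42' });
console.log(pepTasks.map((task) => task.role).join(', '));
// pep_check, adverse_media, network_analysis, case_synthesis
```

Keeping the mapping as data rather than code makes it easier to document why a given event triggers a given agent, which matters when a regulator asks.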

For KYB specifically, this matters even more. Manual KYB processes take 24 to 30 days on average. AI-powered verification can reduce that to 2 to 3 minutes for UBO verification and cross-referencing against global sanctions lists. But the real value is not just speed at onboarding. It is continuous monitoring of corporate structures that change.

AMLD5 mandates online personal information checks for financial businesses and emphasises KYB in AML compliance. Paired with PSD2's API-based data sharing requirements, the regulatory direction is clear: automated, continuous, and auditable.

What actually changes

The Head of Compliance at a UK challenger bank we work with put it bluntly: "We were hiring analysts to feed the alert queue. Now we are hiring analysts to investigate actual risk."

The numbers tell the story. When you take 200 daily alerts and run them through 10 investigation agents in parallel, the 95% that are false positives get documented, evidenced, and dismissed automatically. The remaining 5% arrive on an analyst's desk with a full investigation package: transaction patterns, screening results, corporate registry data, adverse media findings, and a synthesised risk assessment.
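The split is simple arithmetic. The sketch below uses the figures from this post and assumes the idealised case where the agents dismiss exactly the false positives; `splitAlerts` is just an illustrative helper.

```typescript
// Idealised split: every false positive is dismissed by the agents,
// everything else reaches an analyst.
function splitAlerts(dailyAlerts: number, falsePositiveRate: number) {
  const autoDismissed = Math.round(dailyAlerts * falsePositiveRate);
  return { autoDismissed, forAnalysts: dailyAlerts - autoDismissed };
}

const dailySplit = splitAlerts(200, 0.95);
console.log(dailySplit.autoDismissed, dailySplit.forAnalysts); // 190 10
```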

That is the difference between a chatbot and an investigation architecture. A chatbot answers questions. Agents do the work.

If you are building compliance flows

Compliance failures in KYC, AML, and fraud controls are the top cause of neobank shutdowns. Weak controls are not just a regulatory risk. They are an existential one.

If you are an engineer building these systems, the architecture choices you make now (threshold vs. predictive, sequential vs. parallel, chatbot vs. multi-agent) determine whether your compliance team scales or drowns.

We have built this at Zenoo so teams do not have to architect it from scratch. 10 agents. Full investigation pipeline. Not a chatbot.

If you are building compliance integrations and want to see how the orchestration works with your own data, check out zenoo.com.

Stuart Watkins is CEO of Zenoo, where he builds compliance infrastructure for fintechs and financial institutions.

DEV Community