Building Compliance APIs: Why Single-Vendor Solutions Break at Scale
Your compliance API just hit 10,000 KYC checks per day, and suddenly your single-vendor solution is throwing 504 timeouts during peak hours. Sound familiar?
I've watched countless engineering teams face this exact scenario. What starts as a simple integration with one KYC provider becomes a distributed systems nightmare when business requirements evolve. Let me walk you through the architectural decisions that can save you months of technical debt.
The Hidden Complexity of Compliance Infrastructure
Compliance APIs look deceptively simple from the outside. Send a user's details, get back a pass/fail response. But peek under the hood and you'll find:
- Document verification services with 99.2% uptime (not 99.9%)
- Identity databases that rate-limit aggressively during fraud spikes
- Sanction screening APIs that can take 30+ seconds for complex cases
- Regional data residency requirements that fragment your architecture
When you're building on a single vendor, you're essentially running a distributed system with only one node. Every outage becomes your outage. Every performance bottleneck becomes your bottleneck.
// What your code looks like with a single vendor
async function performKYC(userData) {
  try {
    return await singleVendorAPI.verify(userData);
  } catch (error) {
    // If this fails, your entire KYC pipeline is down.
    // Keep the original error as the cause so the failure stays debuggable.
    throw new Error('KYC verification failed', { cause: error });
  }
}
When Single Vendors Make Sense (Yes, Really)
Despite the scaling issues, single-vendor solutions aren't inherently wrong. They excel in specific scenarios:
Early-stage startups processing fewer than 1,000 verifications monthly often benefit from the reduced integration overhead. Your engineering team can focus on core product features rather than building compliance infrastructure.
Highly regulated industries with strict audit requirements sometimes prefer the simplified compliance story. Having one vendor means one security assessment, one data processing agreement, one point of regulatory responsibility.
Geographically constrained businesses operating in a single region might find that one vendor covers all their regulatory requirements without the complexity of routing logic.
But here's the critical insight: these advantages disappear the moment you need to scale beyond your vendor's limitations.
The Orchestration Alternative: Building Resilient Compliance
Orchestration platforms flip the script. Instead of being locked into one provider's infrastructure decisions, you get to make your own architectural choices:
// Orchestration approach with failover
async function performKYC(userData, config) {
  const providers = config.providers; // [vendorA, vendorB, vendorC]
  for (const provider of providers) {
    const startTime = Date.now();
    try {
      const result = await provider.verify(userData);
      // Log performance metrics
      metrics.recordLatency(provider.name, Date.now() - startTime);
      if (result.confidence >= config.minimumConfidence) {
        return result;
      }
      // A low-confidence result is not an error, but we still try the next provider
      logger.warn(`Provider ${provider.name} returned low confidence: ${result.confidence}`);
    } catch (error) {
      // Log error and try next provider
      logger.warn(`Provider ${provider.name} failed:`, error);
    }
  }
  // Reached when every provider either failed or was not confident enough
  throw new Error('No KYC provider returned a sufficiently confident result');
}
Designing Your Orchestration Layer
The key to successful orchestration lies in understanding that compliance providers aren't commodity APIs. Each has different strengths, pricing models, and failure modes.
Provider Selection Logic
# Example routing configuration
routing_rules:
  - condition: "country == 'GB'"
    primary_provider: "uk_specialist"
    fallback_providers: ["global_provider_a"]
  - condition: "document_type == 'passport'"
    primary_provider: "document_specialist"
    fallback_providers: ["global_provider_b"]
  - condition: "risk_score > 7"
    providers: ["high_accuracy_provider"]
    require_manual_review: true
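Here's a minimal sketch of how rules like these might be resolved into an ordered provider list. It assumes the conditions are written as predicate functions instead of strings (evaluating string expressions would need a parser), that the first matching rule wins, and that a catch-all route exists. The `resolveProviders` helper, the rule shape, and the default route are all hypothetical.

```javascript
// Hypothetical rule table mirroring the routing configuration above.
// Conditions are plain predicate functions; the first matching rule wins.
const routingRules = [
  {
    when: (req) => req.country === 'GB',
    primary: 'uk_specialist',
    fallbacks: ['global_provider_a'],
  },
  {
    when: (req) => req.documentType === 'passport',
    primary: 'document_specialist',
    fallbacks: ['global_provider_b'],
  },
  {
    when: (req) => req.riskScore > 7,
    primary: 'high_accuracy_provider',
    fallbacks: [],
    requireManualReview: true,
  },
];

// Assumed catch-all route, used when no rule matches.
const defaultRoute = {
  primary: 'global_provider_a',
  fallbacks: ['global_provider_b'],
};

function resolveProviders(request, rules = routingRules) {
  const rule = rules.find((r) => r.when(request)) ?? defaultRoute;
  return {
    providers: [rule.primary, ...rule.fallbacks],
    requireManualReview: rule.requireManualReview === true,
  };
}
```

Because the first match wins, a high-risk user in GB is routed to `uk_specialist` here. If risk should take precedence over geography, the risk rule would need to move to the top of the table.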
Handling Response Normalisation
Different vendors return data in completely different formats. Your orchestration layer needs to normalise these into a consistent schema:
interface NormalisedKYCResult {
  status: 'approved' | 'rejected' | 'manual_review';
  confidence: number; // 0-10 scale
  checks: {
    identity: CheckResult;
    document: CheckResult;
    address: CheckResult;
    sanctions: CheckResult;
  };
  metadata: {
    provider: string;
    processingTime: number;
    costInCredits: number;
  };
}

function normaliseVendorAResponse(rawResponse: VendorAResponse): NormalisedKYCResult {
  return {
    status: mapVendorAStatus(rawResponse.verification_result),
    confidence: rawResponse.score * 10, // Vendor A uses 0-1 scale
    checks: {
      identity: {
        passed: rawResponse.identity_check === 'PASS',
        details: rawResponse.identity_details
      },
      // ... map other fields
    },
    metadata: {
      provider: 'vendor_a',
      processingTime: rawResponse.processing_time_ms,
      costInCredits: calculateCost(rawResponse.service_level)
    }
  };
}
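The `mapVendorAStatus` helper used above is where vendor vocabulary gets translated, so it's worth a quick sketch. The raw status values below (`VERIFIED`, `FAILED`, `REFER`) are invented for illustration, since every vendor uses its own vocabulary. The design choice I'd argue for is the fallback: anything unrecognised goes to manual review.

```javascript
// Hypothetical raw status values; a real vendor's vocabulary will differ.
const VENDOR_A_STATUS_MAP = new Map([
  ['VERIFIED', 'approved'],
  ['FAILED', 'rejected'],
  ['REFER', 'manual_review'],
]);

function mapVendorAStatus(rawStatus) {
  // Normalise casing and whitespace before the lookup.
  const key = String(rawStatus ?? '').trim().toUpperCase();
  // Unknown or missing values fall back to manual review instead of
  // silently approving or rejecting a user.
  return VENDOR_A_STATUS_MAP.get(key) ?? 'manual_review';
}
```

Failing towards `manual_review` means a vendor adding a new status value costs you some analyst time, not a wrongly approved customer.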
Performance Monitoring and Circuit Breakers
In a multi-vendor setup, monitoring becomes crucial. You need real-time visibility into each provider's performance:
class ProviderCircuitBreaker {
  constructor(provider, config = {}) {
    this.provider = provider;
    this.failureThreshold = config.failureThreshold || 5;
    this.resetTimeout = config.resetTimeout || 60000; // milliseconds
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null; // set on the first failure
  }

  async call(method, ...args) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        // Cool-down has elapsed: let one trial call through
        this.state = 'HALF_OPEN';
      } else {
        throw new Error(`Circuit breaker OPEN for ${this.provider.name}`);
      }
    }
    try {
      const result = await this.provider[method](...args);
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // In HALF_OPEN the count is already at the threshold, so one failure reopens
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }
}
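The circuit breaker covers the reaction side. For the visibility side, a rolling window of recent call outcomes per provider is often enough to get started. This is a sketch, not a replacement for a real metrics stack: the `ProviderStats` class and its method names are hypothetical, and it assumes you record every call's latency and outcome yourself.

```javascript
// Keeps the most recent N call outcomes per provider and derives
// an error rate and a nearest-rank P95 latency from them.
class ProviderStats {
  constructor(windowSize = 100) {
    this.windowSize = windowSize;
    this.samples = new Map(); // provider name -> [{ latencyMs, ok }]
  }

  record(providerName, latencyMs, ok) {
    const window = this.samples.get(providerName) ?? [];
    window.push({ latencyMs, ok });
    if (window.length > this.windowSize) window.shift(); // drop the oldest sample
    this.samples.set(providerName, window);
  }

  snapshot(providerName) {
    const window = this.samples.get(providerName) ?? [];
    if (window.length === 0) {
      return { count: 0, errorRate: 0, p95LatencyMs: null };
    }
    const failures = window.filter((s) => !s.ok).length;
    const sorted = window.map((s) => s.latencyMs).sort((a, b) => a - b);
    // Nearest-rank percentile, using integer maths to avoid float rounding
    const rank = Math.ceil((95 * sorted.length) / 100);
    return {
      count: window.length,
      errorRate: failures / window.length,
      p95LatencyMs: sorted[rank - 1],
    };
  }
}
```

A snapshot like this can feed a dashboard, or drive routing decisions such as demoting a provider whose error rate climbs before its breaker actually trips.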
The Infrastructure Trade-offs
Here's where it gets interesting from an ops perspective. Orchestration introduces complexity that your infrastructure needs to handle:
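Latency considerations: With sequential failover, a slow or failing primary adds its full latency to every request that falls through to a backup. Your tail latency (P95 and above) is therefore driven by your slowest provider, not your fastest. You'll need per-provider timeouts and async processing for slow checks.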
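One way to put a ceiling on that latency is to give every provider attempt its own time budget. The sketch below assumes each provider exposes `verify(userData)` and a `name`, as in the earlier examples. `withTimeout` and `verifyWithBudget` are hypothetical helper names.

```javascript
// Rejects if the wrapped promise has not settled within `ms`.
// Note: this stops waiting; it does not cancel the underlying request.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Sequential failover where every attempt has its own time budget.
async function verifyWithBudget(providers, userData, perProviderTimeoutMs) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await withTimeout(
        provider.verify(userData),
        perProviderTimeoutMs,
        provider.name
      );
    } catch (error) {
      errors.push(error); // remember why this provider was skipped
    }
  }
  throw new AggregateError(errors, 'All KYC providers failed or timed out');
}
```

With this shape the worst case is roughly the number of providers multiplied by the per-provider timeout, which at least gives you a number to design your user-facing flow around.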
Cost optimisation: Different providers have different pricing models. Some charge per API call, others use credit systems, some have monthly minimums. Your orchestration layer should route based on cost efficiency, not just accuracy.
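As a rough sketch of cost-aware routing: filter to providers that clear an accuracy floor, then try the cheapest first. This assumes you've already reduced each pricing model (per call, credits, monthly minimums) to an effective cost per check, which is the hard part. The catalogue, its figures, and the `rankByCost` helper are illustrative only.

```javascript
// Hypothetical catalogue; the figures are illustrative, not real pricing.
const providerCatalogue = [
  { name: 'vendor_a', costPerCheck: 0.9, expectedAccuracy: 0.97 },
  { name: 'vendor_b', costPerCheck: 0.4, expectedAccuracy: 0.93 },
  { name: 'vendor_c', costPerCheck: 0.25, expectedAccuracy: 0.88 },
];

// Cheapest first among providers that clear the accuracy floor.
function rankByCost(catalogue, minimumAccuracy) {
  return catalogue
    .filter((p) => p.expectedAccuracy >= minimumAccuracy) // filter() copies, so sort() is safe
    .sort((a, b) => a.costPerCheck - b.costPerCheck)
    .map((p) => p.name);
}
```

The resulting list can be handed straight to a failover loop, so the cheaper provider takes the bulk of traffic and the more expensive one only sees the fallbacks.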
Data residency: GDPR and other regulations mean you can't always use your preferred provider. Your routing logic needs to understand where data can legally be processed.
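In routing terms, residency can be treated as a filter that runs before any other provider selection. The sketch below is deliberately simplistic: the region tables are made up, the policy is a placeholder for whatever your legal team signs off, and it assumes a provider can pin processing to any region it lists. `filterByResidency` is a hypothetical helper.

```javascript
// Hypothetical mapping of providers to the regions where they can process data.
const processingRegions = {
  uk_specialist: ['UK'],
  global_provider_a: ['EU', 'US'],
  global_provider_b: ['US'],
};

// Placeholder policy: acceptable processing regions per user residency.
// Real rules come from legal review, not from code.
const allowedRegionsByResidency = {
  EU: ['EU'],
  UK: ['UK', 'EU'],
  US: ['US', 'EU', 'UK'],
};

function filterByResidency(candidates, userResidency) {
  // Unknown residency yields an empty list, so routing fails closed.
  const allowed = allowedRegionsByResidency[userResidency] ?? [];
  return candidates.filter((name) =>
    (processingRegions[name] ?? []).some((region) => allowed.includes(region))
  );
}
```

Failing closed on an unknown residency is the conservative default here: no provider is better than the wrong provider when the question is where data may legally go.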
Implementation Patterns That Work
After working with compliance teams at Zenoo, I've seen a few patterns that consistently work:
- Start with two providers minimum - Even if you're small, the redundancy pays for itself during the first major outage
- Build provider-agnostic data models first - Don't let your database schema mirror one vendor's API structure
- Implement gradual rollouts - When adding new providers, route 5% of traffic initially and monitor quality metrics
- Cache aggressively - Repeat checks on the same user data within 24 hours can often be served from cached results
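For the gradual rollout pattern, one approach is to bucket users deterministically so the same user always lands on the same provider while the new one is being evaluated. This is a sketch under that assumption. `chooseProvider` and its option names are hypothetical, and FNV-1a is just one convenient dependency-free hash.

```javascript
// FNV-1a 32-bit hash: small, deterministic, no dependencies.
// Works on UTF-16 code units, which matches byte-wise FNV-1a for ASCII IDs.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Routes a stable `rolloutPercent` slice of users to the candidate provider.
function chooseProvider(userId, { candidate, incumbent, rolloutPercent }) {
  const bucket = fnv1a(String(userId)) % 100; // 0..99
  return bucket < rolloutPercent ? candidate : incumbent;
}

// Usage: send roughly 5% of users to the new provider.
const rollout = {
  candidate: 'new_provider',
  incumbent: 'current_provider',
  rolloutPercent: 5,
};
const chosen = chooseProvider('user-12345', rollout);
```

Raising `rolloutPercent` only ever adds users to the candidate group, so anyone already on the new provider stays there as the rollout widens.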
We explored this in depth on the Zenoo blog, examining how different orchestration strategies impact both technical performance and regulatory compliance outcomes.
The Bottom Line for Engineering Teams
Single-vendor solutions optimise for initial development speed at the cost of long-term flexibility. Orchestration platforms optimise for operational resilience at the cost of initial complexity.
The inflection point typically occurs around 5,000-10,000 monthly verifications, when provider limitations start impacting user experience. But the architectural decisions you make early determine how painful that transition becomes.
If you're building compliance infrastructure today, assume you'll need multiple providers eventually. Design your data models, error handling, and monitoring with that future in mind. Your future self (and your on-call rotation) will thank you.
Stuart Watkins is CEO of Zenoo, a compliance orchestration platform that helps engineering teams build resilient KYC and AML infrastructure.