You’ve been asked to build integrations for your platform. Seems straightforward: call some APIs, normalize the data, display it in the UI. A few weeks of work, tops.
Except it’s not a few weeks. And it’s not straightforward.
I’ve spent a few years building integration infrastructure for security platforms. Here’s everything I wish someone had told me before I started.
The Gap Between POC and Production
A proof-of-concept integration is easy. Read the docs, make some calls, parse the response. Done in a day.
Production is a different beast. Here’s the actual checklist:
Authentication Hell
Every vendor does auth differently:
Vendor A: OAuth 2.0 with refresh
headers = {"Authorization": f"Bearer {access_token}"}
Vendor B: API key in header
headers = {"X-API-Key": api_key}
Vendor C: API key as query param (yes, really)
url = f"{base_url}/endpoint?api_key={api_key}"
Vendor D: Custom signature with timestamp
signature = hmac.new(secret, f"{timestamp}{method}{path}".encode(), 'sha256')
headers = {"X-Signature": signature.hexdigest(), "X-Timestamp": timestamp}
And you need to handle token refresh without interrupting syncs. Plus store credentials securely for hundreds of customer connections. Plus handle IP allowlisting for vendors that require it.
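For the OAuth case, here’s a minimal sketch of refresh-ahead token handling, assuming a client-credentials style token endpoint and the requests library. Every name here is a placeholder, not any particular vendor’s API:

import time
import threading
import requests

class TokenManager:
    """Refreshes an OAuth access token shortly before it expires, so in-flight syncs never see a dead token."""

    def __init__(self, token_url, client_id, client_secret, refresh_margin=60):
        self.token_url = token_url
        self.client_id = client_id
        self.client_secret = client_secret
        self.refresh_margin = refresh_margin  # refresh this many seconds before expiry
        self._lock = threading.Lock()
        self._access_token = None
        self._expires_at = 0.0

    def get_token(self):
        with self._lock:
            if time.time() >= self._expires_at - self.refresh_margin:
                resp = requests.post(self.token_url, data={
                    "grant_type": "client_credentials",
                    "client_id": self.client_id,
                    "client_secret": self.client_secret,
                })
                resp.raise_for_status()
                payload = resp.json()
                self._access_token = payload["access_token"]
                self._expires_at = time.time() + payload.get("expires_in", 3600)
            return self._access_token

In production you’d also persist the token per connection and handle refresh failures, but the shape is the same: never hand a caller a token that’s about to expire.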
Rate Limiting is Harder Than You Think
Every API has rate limits. The fun part is they’re all different:
The nice vendor: returns 429 with retry-after
if response.status_code == 429:
retry_after = int(response.headers.get('Retry-After', 60))
time.sleep(retry_after)
The less nice vendor: just returns 500 when you hit the limit
Good luck figuring out why
The enterprise vendor: different limits per endpoint
/users: 100 req/min
/alerts: 10 req/min
/export: 1 req/hour
When you’re pulling data for hundreds of tenants, rate limits become a constant constraint. You need:
- Per-tenant rate limit tracking
- Intelligent request queuing
- Exponential backoff with jitter (see the sketch after this list)
- Circuit breakers so one failing integration doesn’t cascade
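Here’s a minimal sketch of the backoff piece (full jitter), assuming make_request is some callable that returns a requests-style response; the rest of the list (per-tenant tracking, queuing, circuit breakers) sits around code like this:

import random
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: a random delay up to base * 2^attempt, capped."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(make_request, max_attempts=5):
    for attempt in range(max_attempts):
        response = make_request()
        if response.status_code == 429:
            # Prefer the vendor's hint when it exists, otherwise back off with jitter
            delay = float(response.headers.get("Retry-After", backoff_delay(attempt)))
            time.sleep(delay)
            continue
        return response
    raise RuntimeError("still rate limited after retries")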
Pagination Nightmares
Offset pagination (simple but inefficient)
for offset in range(0, total, page_size):
response = client.get(f"/items?offset={offset}&limit={page_size}")
Cursor pagination (better, but cursors expire)
cursor = None
while True:
    url = f"/items?limit={page_size}" + (f"&cursor={cursor}" if cursor else "")
    response = client.get(url)
    cursor = response.json().get('next_cursor')
    if not cursor:
        break
Link header pagination (RFC 5988)
while url:
response = client.get(url)
url = response.links.get('next', {}).get('url')
The vendor that changes pagination between API versions
and doesn't document it
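One way to keep this mess out of your sync code is to hide each vendor’s scheme behind a generator. A sketch for the cursor style, with client, items, and next_cursor as stand-ins for whatever the vendor actually calls them:

def iter_items(client, path, page_size=100):
    """Yield items across pages, keeping cursor bookkeeping out of the caller."""
    cursor = None
    while True:
        url = f"{path}?limit={page_size}" + (f"&cursor={cursor}" if cursor else "")
        page = client.get(url).json()
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break

When the vendor changes pagination between versions, you change this one function instead of every call site.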
The Normalization Problem
This is where it gets really fun. Here’s the same concept, a security alert, across three vendors:
// Illustrative shapes (actual field names vary by vendor and API version)
// CrowdStrike: calls it a "detection" (via /detects/ endpoints)
{
"detection_id": "ldt:abc123...",
"max_severity": 4,
"created_timestamp": "2024-01-15T10:30:00Z",
"device": { "hostname": "..." }
}
// SentinelOne: calls it a "threat" (via /threats endpoint)
{
"id": "123456789",
"threatInfo": {
"classification": "Malware",
"confidenceLevel": "high"
},
"agentRealtimeInfo": { "agentComputerName": "..." },
"createdAt": "2024-01-15T10:30:00.000Z"
}
// Microsoft Defender: calls it an "alert" (incidents are collections of alerts)
{
"alertId": "da637292082891366787_1234567890",
"severity": "high",
"createdDateTime": "2024-01-15T10:30:00.0000000Z",
"evidence": [{ "deviceDnsName": "..." }]
}
Three different structures for the same concept. Different field names, different severity formats (number vs string), different timestamp formats, different nesting structures.
You need to map all of these to a single normalized schema:
Your normalized alert schema
from dataclasses import dataclass
from datetime import datetime

@dataclass
class NormalizedAlert:
    id: str
    severity: str  # "critical", "high", "medium", "low"
    timestamp: datetime
    hostname: str
    source: str
    raw_data: dict
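For illustration, one of those mappings might look like the sketch below. The field names come from the illustrative CrowdStrike shape above; the severity lookup table is my own assumption for the example, not the vendor’s real scale:

CROWDSTRIKE_SEVERITY = {5: "critical", 4: "high", 3: "medium", 2: "low", 1: "low"}

def normalize_crowdstrike(detection: dict) -> NormalizedAlert:
    # Map the vendor-specific detection shape onto the shared schema
    return NormalizedAlert(
        id=detection["detection_id"],
        severity=CROWDSTRIKE_SEVERITY.get(detection.get("max_severity"), "low"),
        timestamp=datetime.fromisoformat(detection["created_timestamp"].replace("Z", "+00:00")),
        hostname=detection.get("device", {}).get("hostname", ""),
        source="crowdstrike",
        raw_data=detection,
    )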
Multiply this by 40 vendors across 8 security categories. That’s a lot of mapping logic.
The Multi-Tenant Complexity
All of the above gets exponentially harder with multiple tenants.
Tenant Isolation
This is the one that keeps me up at night:
WRONG: Shared cache without tenant scoping
cache.set("crowdstrike_detections", detections)
RIGHT: Tenant-scoped everything
cache.set(f"tenant:{tenant_id}:crowdstrike:detections", detections)
WRONG: Logging raw data
logger.error(f"API failed: {response.json()}")
RIGHT: Scrubbed logging
logger.error(f"API failed for tenant {tenant_id}: {response.status_code}")
Shared rate limit pools, shared caches, shared logs: any of these can leak data between tenants if you’re not careful.
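One cheap guardrail is to make tenant scoping impossible to forget, for example a thin wrapper around whatever cache client you use. This is just a sketch and the set/get interface here is assumed:

class TenantScopedCache:
    """Wraps a cache client so every key is forced under a tenant namespace."""

    def __init__(self, cache, tenant_id):
        self._cache = cache
        self._tenant_id = tenant_id

    def _key(self, key):
        return f"tenant:{self._tenant_id}:{key}"

    def set(self, key, value, ttl=None):
        self._cache.set(self._key(key), value, ttl)

    def get(self, key):
        return self._cache.get(self._key(key))

Integration code only ever receives the scoped wrapper, never the raw cache.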
Credential Storage
You’re storing API credentials for hundreds of connections. This is a high-value target:
Minimum requirements:
- Encrypted at rest (AES-256 or better)
- Encrypted in transit (TLS 1.2+)
- Access controls (which service can access which creds)
- Audit logging (who accessed what, when)
- Key rotation support
- HSM/KMS integration for key management
If you’re building a security product (like a GRC platform), your credential storage needs to pass auditor scrutiny.
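For the encryption-at-rest piece alone, here’s a bare-bones sketch using the cryptography library’s AESGCM primitive. Key management (KMS/HSM, rotation, access control, audit) is the part that actually takes the time and is not shown:

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_credential(plaintext: bytes, key: bytes, tenant_id: str) -> bytes:
    """AES-256-GCM with the tenant id bound as associated data; the key should come from a KMS, not disk."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, tenant_id.encode())

def decrypt_credential(blob: bytes, key: bytes, tenant_id: str) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, tenant_id.encode())

# key = AESGCM.generate_key(bit_length=256)  # in production, fetch/unwrap this via your KMS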
Scaling
One customer with one integration is manageable. A hundred customers with ten integrations each is a thousand concurrent connections:
Tenants: 100
Integrations per tenant: 10
Sync frequency: every 15 minutes
API calls per sync: ~50
= 1,000 integrations
= 4,000 sync jobs per hour
= 200,000 API calls per hour
Your architecture needs to handle this without falling over: queue-based processing, worker pools, connection pooling, database optimization. It all adds up.
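As a sketch of just the bounded-concurrency part, assuming sync jobs are async callables and ignoring the real message broker you’d put in front of them:

import asyncio

async def run_sync_jobs(jobs, max_concurrency=50):
    """Run sync jobs through a bounded worker pool so a burst doesn't exhaust connections."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(job):
        async with semaphore:
            await job()

    await asyncio.gather(*(run_one(job) for job in jobs))

# Usage (hypothetical): asyncio.run(run_sync_jobs(jobs_for_this_window))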
The Maintenance Burden
Here’s the part nobody warns you about: building is maybe 30% of the work. Maintenance is 70%.
API Versioning
Your code, working fine
response = client.get("/v1/detections")
Vendor announcement: "v1 deprecated, migrate to v2 by March"
v2 changes:
- Different auth flow
- Different pagination
- Different response schema
- Some fields renamed
- Some fields removed
- New required parameters
Your weekend: gone
Multiply by 40 integrations. You’re dealing with API changes constantly.
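The least painful way I’ve found to absorb this churn is to isolate it behind per-version adapters. A sketch, building on the normalize_crowdstrike example from earlier (the registry shape is my own convention, not a library):

# Hypothetical adapter registry: one normalizer per (vendor, api_version)
NORMALIZERS = {
    ("crowdstrike", "v1"): normalize_crowdstrike,
    # ("crowdstrike", "v2"): normalize_crowdstrike_v2,  # added when v2 lands, v1 kept until migration is done
}

def normalize(vendor: str, api_version: str, payload: dict) -> NormalizedAlert:
    try:
        return NORMALIZERS[(vendor, api_version)](payload)
    except KeyError:
        raise ValueError(f"No normalizer registered for {vendor} {api_version}")

The version migration still costs you a weekend, but it stays inside one adapter instead of leaking through the codebase.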
Silent Breaking Changes
The worst kind:
What your code expects
device = detection.get("device", {})
hostname = device.get("hostname") # Returns: "workstation-1"
What the API started returning (no announcement)
hostname = device.get("hostname") # Returns: ["workstation-1"]
Your normalization: quietly broken
Customer data: silently wrong
Time to discover: days or weeks
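A defensive coercion layer, plus schema checks that alert on unexpected shapes, shortens that discovery time from weeks to hours. A minimal sketch; coerce_hostname is a made-up helper name:

def coerce_hostname(value) -> str:
    """Tolerate a field that silently changed from a string to a list of strings."""
    if isinstance(value, list):
        return value[0] if value else ""
    return value or ""

hostname = coerce_hostname(device.get("hostname"))  # handles both shapes above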
The Real Cost
I’ve seen teams underestimate this consistently:
Initial build:
- 20 integrations × 2-3 weeks each = 40-60 weeks of engineering
- Plus common infrastructure (auth, rate limiting, queuing) = 8-12 weeks
- Plus testing, deployment, monitoring = 4-8 weeks
- Total: 12-18 months for a small team

Ongoing maintenance:
- 2+ FTEs just to keep integrations running
- Every API change = regression testing across affected tenants
- Every new integration request = another month of work

Opportunity cost:
- Every hour on integrations = an hour not spent on your actual product
- I’ve seen teams lose 40-50% of engineering capacity to integration work
The Alternative
At some point, you have to ask: is building integration infrastructure actually your core competency?
If you’re building a GRC platform, your value is in compliance logic, risk analysis, and control mapping, not in parsing CrowdStrike’s pagination quirks.
The “buy” option today isn’t just Zapier-style workflow tools. There are now category-level unified APIs that handle all the complexity above. Here’s what using one looks like:
// Using Unizo's SDK (from docs.unizo.ai/docs/sdks/overview)
// npm install @unizo/sdk
import { Unizo } from '@unizo/sdk';
const client = new Unizo({
apiKey: process.env.UNIZO_API_KEY
});
// One call - normalized vulnerabilities from ALL connected scanners
// (Qualys, Tenable, Snyk, etc. - doesn't matter which your customer uses)
const vulnerabilities = await client.security.vulnerabilities.list({
severity: 'high',
status: 'open'
});
// Iterate over normalized results
vulnerabilities.forEach(vuln => {
  console.log(`${vuln.id}: ${vuln.title} (${vuln.severity})`);
});
Or if you prefer raw REST:
// Direct REST call to the same endpoint
const response = await fetch(
'https://api.unizo.ai/v1/security/vulnerabilities?severity=high&status=open',
{
headers: {
'Authorization': `Bearer ${process.env.UNIZO_API_KEY}`,
'Content-Type': 'application/json'
}
}
);
const vulnerabilities = await response.json();
The SDK handles auth, retries, rate limits, and pagination. You get:
One API call to get normalized data across all EDR/VMS/Identity vendors
One webhook endpoint for real-time events from all sources
One auth flow (Connect UI) for your customers to connect any tool
Vendor API changes handled upstream, not in your codebase
The Decision Framework
Before you decide to build:
Count your integrations: How many do you need now? In a year?
Calculate the cost: Fully-loaded engineer cost × months of work
Factor in maintenance: 2+ FTEs ongoing, forever
Consider opportunity cost: What else could those engineers build?
Then compare to embedding existing infrastructure. The math usually favors buying unless integrations are literally your core product.
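To make “calculate the cost” concrete, here’s a back-of-envelope using the estimates above plus an assumed fully-loaded engineer cost. The numbers are placeholders; plug in your own:

# Back-of-envelope build-vs-maintain math; all inputs are assumptions
LOADED_ENGINEER_COST_PER_WEEK = 4_000       # assumption: ~$200k/year fully loaded
build_weeks = 20 * 2.5 + 10 + 6             # integrations + shared infra + test/deploy (midpoints from above)
build_cost = build_weeks * LOADED_ENGINEER_COST_PER_WEEK
maintenance_per_year = 2 * 52 * LOADED_ENGINEER_COST_PER_WEEK  # 2 FTEs, ongoing

print(f"Initial build: ~${build_cost:,.0f}")
print(f"Maintenance:   ~${maintenance_per_year:,.0f}/year")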
TL;DR
POC integrations are easy. Production integrations are 10x harder.
Multi-tenancy adds another 5x complexity.
Maintenance is 70% of the work, and it never ends.
The real cost isn’t just engineering time. It’s opportunity cost.
Unless integrations are your core product, you probably shouldn’t build from scratch.