At Makini, we've spent six years building integrations across enterprise systems—ERP platforms like SAP and Oracle, CMMS solutions like IBM Maximo, WMS systems like Manhattan, and over 2000 other industrial software platforms. This experience has taught us hard lessons about API design, data modeling, and the unique challenges of creating unified interfaces for fundamentally different systems.
The Standardization Problem:
Enterprise systems weren't designed with third-party integrations in mind. They evolved to solve specific industry problems, which means their data models reflect decades of domain-specific decisions. A "purchase order" in SAP has different required fields, validation rules, and lifecycle states than a purchase order in Oracle. Yet conceptually, they represent the same business entity.
The naive approach is to create a unified data model that's the union of all fields across all systems. This produces an API with hundreds of optional fields where clients never know which combinations are valid for which systems. The result is an interface that's technically complete but practically unusable.
Our Approach: Core Models with Extension Points:
We've settled on a pattern where unified models define a core set of common fields that exist across most systems, use consistent naming conventions regardless of source system terminology, include standard validation rules that apply universally, and provide clear semantic meaning for each field.
System-specific fields that don't fit the common model go into an extended_properties object that's indexed by field name and includes metadata about the field's type and validation rules. This preserves access to all data while keeping the common case simple.
Here's a simplified example of how this looks in practice:
{
  "id": "po_123",
  "type": "purchase_order",
  "vendor_id": "vendor_456",
  "total_amount": 15000.00,
  "currency": "USD",
  "status": "approved",
  "line_items": [...],
  "extended_properties": {
    "sap_plant_code": {
      "value": "US01",
      "type": "string",
      "description": "SAP-specific plant identifier"
    },
    "custom_approval_level": {
      "value": 3,
      "type": "integer",
      "description": "Internal approval hierarchy level"
    }
  }
}
Handling Authentication Flows:
Enterprise systems use wildly different authentication mechanisms. We've encountered OAuth 2.0 with various grant types, API keys with different header formats, session-based authentication requiring login flows, SAML and other SSO protocols, and legacy systems with custom authentication schemes.
Rather than exposing this complexity to API consumers, we implement a credential management layer that normalizes authentication across systems. Developers work with connection objects that abstract the underlying auth mechanism. Our authentication module handles token refresh, session management, and credential storage automatically.
The key architectural decision was separating connection establishment (which happens once per customer) from API requests (which happen continuously). Connection establishment can be complex and user-interactive; API requests must be simple and programmatic.
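The shape of that separation can be sketched roughly as follows: per-system auth strategies hidden behind a common connection object. The class names (`AuthStrategy`, `Connection`) are illustrative, not Makini's actual SDK:

```python
import time
from abc import ABC, abstractmethod

class AuthStrategy(ABC):
    """Normalizes a system-specific auth mechanism behind one interface."""
    @abstractmethod
    def headers(self) -> dict:
        """Return the headers to attach to an outgoing API request."""

class ApiKeyAuth(AuthStrategy):
    """API-key auth; header name varies per system."""
    def __init__(self, key: str, header_name: str = "X-API-Key"):
        self.key, self.header_name = key, header_name

    def headers(self) -> dict:
        return {self.header_name: self.key}

class OAuth2Auth(AuthStrategy):
    """OAuth 2.0 bearer-token auth with transparent refresh."""
    def __init__(self, fetch_token):
        self._fetch_token = fetch_token  # callable returning (token, expires_at)
        self._token, self._expires_at = None, 0.0

    def headers(self) -> dict:
        # Refresh transparently when the cached token is near expiry.
        if time.time() >= self._expires_at - 30:
            self._token, self._expires_at = self._fetch_token()
        return {"Authorization": f"Bearer {self._token}"}

class Connection:
    """What API consumers see: the auth details stay behind the strategy."""
    def __init__(self, system: str, auth: AuthStrategy):
        self.system, self._auth = system, auth

    def request_headers(self) -> dict:
        return self._auth.headers()
```

Once a connection exists, every subsequent API request just asks it for headers, regardless of whether the backing system uses API keys, OAuth, or sessions.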
Rate Limiting and Backpressure:
Different systems have vastly different rate limits, and some don't document their limits at all. We've built adaptive rate limiting that learns system behavior over time, implements per-connection backpressure to avoid overwhelming source systems, queues requests when approaching rate limits, and provides webhook-based notifications when async operations complete.
For systems without documented rate limits, we start conservatively and gradually increase request rates while monitoring for rate limit errors. This auto-tuning approach has proven more reliable than trying to discover limits through documentation or support channels.
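The auto-tuning loop can be sketched as a classic additive-increase/multiplicative-decrease (AIMD) controller. This is an illustrative simplification, not our production tuner:

```python
class AdaptiveRateLimiter:
    """AIMD-style tuner: start conservatively, probe upward, back off on 429s."""
    def __init__(self, initial_rps: float = 1.0, max_rps: float = 50.0):
        self.rps = initial_rps
        self.max_rps = max_rps

    def on_success(self) -> None:
        # Additive increase: probe gently toward the undocumented limit.
        self.rps = min(self.max_rps, self.rps + 0.1)

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: back off hard when the system pushes back.
        self.rps = max(0.1, self.rps / 2)

    def delay(self) -> float:
        """Seconds to wait before the next request at the current rate."""
        return 1.0 / self.rps
```

The asymmetry matters: slow probing upward keeps us from tripping hidden limits, while halving on a rate-limit error recovers quickly from an overshoot.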
Versioning Strategy:
With 2000+ integrations, we need to handle API versioning at multiple levels. The source system's API version can change, our unified API version evolves with new features, and customer-specific configurations may be pinned to specific versions.
Our versioning approach uses semantic versioning for the unified API with backward compatibility guarantees, per-connection version pinning so customers control when they upgrade, transformation layers that translate between unified API versions, and deprecation windows with clear migration paths.
We maintain multiple API versions simultaneously in production, which adds operational complexity but prevents breaking changes from disrupting customer operations.
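As an illustration of the transformation-layer idea, here is a hypothetical pair of adapters translating a purchase order between two unified API versions. The v1-to-v2 field change is invented for the example:

```python
# Hypothetical version adapters: each function translates a payload one
# version step, so pinned clients keep receiving the shape they expect.

def v1_to_v2(po: dict) -> dict:
    # Invented change: v2 replaces the flat vendor_id with a structured ref.
    out = dict(po)
    out["vendor"] = {"id": out.pop("vendor_id")}
    out["schema_version"] = "2"
    return out

def v2_to_v1(po: dict) -> dict:
    out = dict(po)
    out["vendor_id"] = out.pop("vendor")["id"]
    out["schema_version"] = "1"
    return out

UPGRADES = {("1", "2"): v1_to_v2}
DOWNGRADES = {("2", "1"): v2_to_v1}

def translate(po: dict, source: str, target: str) -> dict:
    """Translate a payload between unified API versions."""
    if source == target:
        return po
    table = UPGRADES if source < target else DOWNGRADES
    return table[(source, target)](po)
```

Because each adapter is a pure function, a chain of them can bridge any two supported versions, and older adapters can be deleted once their deprecation window closes.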
Error Handling and Observability:
When an API call fails, it could be due to network issues, the source system being down, invalid data format, business rule violations in the source system, or authentication/permission problems. Each category requires different handling and different information for debugging.
Our error responses include structured error codes that clients can switch on programmatically, request IDs that trace through our entire stack and into source systems where possible, contextual information about what operation was being attempted, and actionable remediation steps when we can determine them.
For observability, we emit structured logs at API boundaries, track request latency percentiles per source system, monitor error rates with automatic alerting, and maintain dashboards showing health metrics for each connected system.
The Webhook Challenge:
Many enterprise systems support webhooks for real-time notifications, but implementations vary wildly. Some send webhooks with full payloads, others send just IDs requiring follow-up API calls, signature schemes for verification differ across systems, and retry behavior is inconsistent or undocumented.
We normalize this by receiving webhooks from source systems, verifying signatures using system-specific methods, transforming payloads into our unified event format, and delivering to customer webhook endpoints with guaranteed delivery semantics.
Our webhook infrastructure includes replay capabilities for debugging, automatic retry with exponential backoff, dead letter queues for persistently failing deliveries, and monitoring to detect when source systems stop sending expected webhooks.
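Two of those pieces lend themselves to small sketches: HMAC-SHA256 signature verification (one common scheme among the many in the wild) and an exponential backoff schedule for redelivery:

```python
import hashlib
import hmac

def verify_signature(secret: bytes, payload: bytes, received_sig: str) -> bool:
    """Verify an HMAC-SHA256 hex signature in constant time."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_sig)

def backoff_schedule(base: float = 1.0, factor: float = 2.0,
                     retries: int = 5, cap: float = 60.0) -> list:
    """Delays (seconds) between redelivery attempts before dead-lettering."""
    return [min(cap, base * factor ** i) for i in range(retries)]
```

`hmac.compare_digest` avoids leaking signature prefixes through timing, and capping the backoff keeps a long outage from pushing retries out indefinitely.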
Performance Considerations:
Synchronizing data across thousands of connections requires careful attention to performance. We use connection pooling with per-system tuning for optimal throughput, parallel request processing with per-connection rate limiting, intelligent caching based on data volatility patterns, and incremental sync strategies that minimize data transfer.
For large datasets, we implement cursor-based pagination that works across systems with different pagination schemes, streaming responses to avoid memory pressure, and background jobs for operations that exceed reasonable request timeouts.
Testing Strategy:
Testing integrations with hundreds of enterprise systems presents unique challenges. Test instances are expensive to provision for most systems, many systems have complex setup requirements, and compliance requirements rule out testing with production data.
Our testing approach includes contract tests that verify API response schemas, integration tests against sandbox environments where available, snapshot testing to detect unexpected API changes, and synthetic monitoring in production to catch issues before customers do.
We also maintain a library of anonymized test fixtures captured from real customer connections, which helps us catch edge cases that wouldn't appear in clean test data.
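A contract test in this spirit can be as simple as checking that every unified purchase order keeps its required shape. The field list below mirrors the earlier example payload and is illustrative:

```python
# Minimal contract check: the unified purchase-order shape must stay stable
# regardless of which source system produced the record.
REQUIRED_FIELDS = {
    "id": str,
    "type": str,
    "vendor_id": str,
    "total_amount": float,
    "currency": str,
    "status": str,
}

def check_contract(record: dict) -> list:
    """Return a list of contract violations; an empty list means it holds."""
    violations = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            violations.append(f"wrong type for {field}")
    return violations
```

Running checks like this against anonymized fixtures from real connections is what surfaces the edge cases clean test data never would.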
Lessons Learned:
After building thousands of integrations, some patterns are clear. Documentation is often incomplete or outdated—observing actual API behavior is more reliable than trusting docs. Error messages from enterprise systems can be cryptic or entirely absent—good error handling must account for this. System behavior varies between versions even when APIs are supposedly compatible. Hidden rate limits and undocumented throttling are common.
The most important lesson is that building a unified API isn't about finding the perfect abstraction—it's about making pragmatic trade-offs that optimize for developer experience while preserving system-specific capabilities when needed.
Open Questions:
We're still exploring better approaches to several challenges. How do you design APIs that gracefully handle source system downtime without confusing clients? What's the right balance between eagerly validating input versus passing it through to source systems? How do you version data models when the underlying systems evolve in incompatible ways? What testing strategies work for systems you can't easily replicate?
If you're working on similar problems—API design for heterogeneous systems, integration platforms, or enterprise software connectivity—we'd love to hear your approaches and learn from your experiences.
More on integration architecture: https://www.makini.io/integrations