It worked yesterday. It worked in staging. It passed all your tests. Now it's 2 AM, your phone is buzzing, and the integration is returning garbage.
APIs break. Not because anyone did something wrong, but because distributed systems are inherently fragile. The question isn't whether your integration will fail — it's whether you've prepared for when it does.
The Usual Suspects
After watching hundreds of integrations fail, patterns emerge. Most failures fall into a handful of categories.
The API Changed
Not maliciously. Not announced with a banner. Just... quietly changed.
- A new required field appeared in the request
- A field you depended on now returns null sometimes
- The date format shifted from ISO 8601 to Unix timestamps
- A nested object became an array (or vice versa)
- Error codes got renumbered
Versioned APIs help, but not everyone versions properly. And even versioned APIs eventually deprecate old versions.
How to survive it: Don't assume response structures are static. Validate that expected fields exist before accessing them. Log anomalies so you notice changes before they cause outages.
Network Issues
The internet is a series of tubes, and sometimes tubes get clogged.
- DNS resolution fails
- TLS handshake times out
- The connection succeeds but the response never arrives
- The response arrives but it's truncated
- A proxy in the middle mangles the request
These failures are transient. Retry and they usually work. But if your code doesn't retry, a single packet drop becomes a user-facing error.
How to survive it: Implement retries with exponential backoff. Set reasonable timeouts — not too short (false failures) or too long (hung requests block everything).
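Here's a minimal sketch of what that can look like with the Fetch API. The retry count, delays, and timeout are illustrative, not prescriptive; tune them to your traffic and the provider's guidance.

// Retry with exponential backoff plus a per-request timeout.
// Numbers here are illustrative defaults, not recommendations.
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, {
        ...options,
        signal: AbortSignal.timeout(5000), // fail fast instead of hanging
      });
      // Retry server errors; hand everything else back to the caller
      if (response.status < 500) return response;
      throw new Error(`Server error: ${response.status}`);
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // 1s, 2s, 4s... plus jitter so concurrent clients don't retry in lockstep
      const delay = 2 ** attempt * 1000 + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}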
Authentication Problems
Tokens expire. Keys get rotated. Scopes get restricted.
- Your OAuth token expired and refresh failed
- Someone rotated the API key and forgot to update the config
- The key works but doesn't have permission for this specific endpoint
- You're sending the key in query params and a proxy stripped it
How to survive it: Handle 401 and 403 responses explicitly. Implement token refresh flows. Don't embed keys in code — use environment variables or secret managers that can be updated without deployment.
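A sketch of what explicit 401/403 handling can look like. getAccessToken() and refreshAccessToken() are hypothetical helpers backed by your OAuth client or secret manager, not part of any particular SDK.

// Handle 401 (try one refresh) and 403 (don't bother retrying) explicitly.
async function authorizedRequest(url, options = {}) {
  let token = await getAccessToken();
  let response = await fetch(url, {
    ...options,
    headers: { ...options.headers, Authorization: `Bearer ${token}` },
  });

  if (response.status === 401) {
    // Token likely expired: refresh once, then retry the request
    token = await refreshAccessToken();
    response = await fetch(url, {
      ...options,
      headers: { ...options.headers, Authorization: `Bearer ${token}` },
    });
  }

  if (response.status === 403) {
    // Refreshing won't help here; the credentials lack permission
    throw new Error(`Forbidden: check the key's scopes for ${url}`);
  }
  return response;
}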
Rate Limits and Quotas
You hit the ceiling. Requests start failing with 429 or counting against tomorrow's quota.
Sometimes this is your fault (inefficient code, missing caching). Sometimes it's growth — you launched a feature that tripled API usage overnight. Sometimes it's external — a bot hit your site and your backend hammered the API.
How to survive it: Monitor usage trends. Implement request queuing. Cache aggressively. Have a plan for when you need to upgrade tiers quickly.
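At minimum, honor the server's backoff hints instead of retrying blindly. A small sketch, assuming the provider sends Retry-After in seconds (some send a date instead, which the fallback below covers):

// Wait out a 429 using Retry-After, with a default when the header is missing
// or unparseable. One retry only; beyond that, queue or escalate.
async function requestWithRateLimitHandling(url, options = {}) {
  const response = await fetch(url, options);
  if (response.status !== 429) return response;

  const retryAfter = Number(response.headers.get('Retry-After')) || 30;
  console.warn(`Rate limited; retrying in ${retryAfter}s`);
  await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  return fetch(url, options);
}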
The Provider Is Down
It happens. Even the biggest providers have outages. AWS has bad days. Stripe has bad days. Everyone has bad days.
When the API is down, your retry logic won't help. The API simply isn't there.
How to survive it: Detect extended failures and degrade gracefully. Queue non-critical requests for later. Show users a meaningful error instead of a stack trace. Consider backup providers for mission-critical functions.
Signs Your Integration Is Fragile
Some patterns look fine until they don't:
Hardcoded values that should be dynamic. API base URLs, version numbers, field names — anything that could change but is baked into your code.
No timeout specified. Default timeouts are often too generous. A hung request with no timeout blocks your thread indefinitely.
Swallowing errors silently. catch (e) { /* ignore */ } means you'll never know something broke until users complain.
Assuming happy paths. Your code handles 200 OK perfectly. What about 201? 202? 204? 301? They all mean different things.
Tight coupling to response structure. If you're accessing response.data.user.profile.email.address without null checks, one missing property crashes everything.
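The fix is cheap. Optional chaining plus an explicit fallback turns a crash into a logged anomaly (the field path below is the same hypothetical one from the example above):

// Fragile: one missing property anywhere in the chain throws a TypeError
const email = response.data.user.profile.email.address;

// Safer: optional chaining, an explicit fallback, and a logged anomaly
const safeEmail = response?.data?.user?.profile?.email?.address ?? null;
if (safeEmail === null) {
  console.warn('User response missing email address', { response });
}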
Building Resilient Integrations
Resilience isn't one big thing. It's a collection of small practices that add up.
Validate before you trust
Don't assume the response is what you expect. Check that required fields exist and have the right types.
function validateUserResponse(data) {
  if (!data?.id || typeof data.id !== 'string') {
    throw new Error('Invalid user response: missing or invalid id');
  }
  if (!data?.email) {
    throw new Error('Invalid user response: missing email');
  }
  return data;
}
This adds lines of code. It also catches breaking changes before they propagate through your system.
Fail loudly, recover quietly
Log every failure with enough context to debug: the endpoint, the request payload, the response (or lack thereof), the timestamp.
But don't fail loudly to users. Show them something useful while you figure out what went wrong.
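A sketch of what "enough context" means in practice. The logger and field names are illustrative; plug in whatever structured logging you already use.

// Log the endpoint, payload, response (or error), and timestamp on failure.
async function callAndLog(url, payload) {
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    if (!response.ok) {
      console.error('API call failed', {
        endpoint: url,
        status: response.status,
        body: await response.text(),
        payload,
        timestamp: new Date().toISOString(),
      });
    }
    return response;
  } catch (err) {
    console.error('API call threw', {
      endpoint: url,
      error: err.message,
      payload,
      timestamp: new Date().toISOString(),
    });
    throw err;
  }
}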
Implement circuit breakers
If an API fails 10 times in a row, stop hammering it. A circuit breaker "opens" after repeated failures, rejecting new requests immediately until the API recovers.
This prevents cascading failures. Your downed dependency doesn't take down your entire application.
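A minimal breaker can be a few dozen lines. This sketch opens after a run of consecutive failures and rejects calls until a cooldown passes; the threshold and timing are illustrative, and production libraries add half-open probing and per-endpoint state.

// Minimal circuit breaker: open after `threshold` consecutive failures,
// reject immediately until `cooldownMs` has elapsed.
class CircuitBreaker {
  constructor(threshold = 10, cooldownMs = 30_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async call(fn) {
    if (this.openedAt && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error('Circuit open: skipping call');
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}

Usage is just a wrapper: const breaker = new CircuitBreaker(); await breaker.call(() => fetch(url)).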
Have fallback behavior
What should happen if the email validator is unavailable?
- Accept the email anyway and validate later?
- Use a basic regex check as a fallback?
- Block signup entirely with a "try again later" message?
Each has trade-offs. The point is to decide before the outage, not during.
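For instance, the second option might look like this. validateEmailViaApi() is a hypothetical wrapper around whatever validation API you use; the regex is deliberately loose because deliverability gets re-verified later.

// Fall back to a basic format check when the validation API is unreachable.
async function isEmailAcceptable(email) {
  try {
    const result = await validateEmailViaApi(email);
    return result.valid;
  } catch (err) {
    console.warn('Email validator unavailable, using regex fallback', err);
    return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
  }
}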
Monitor third-party health
Don't wait for user complaints to discover an API is down. Check the pinger API periodically or set up uptime monitoring. Detect problems before they become incidents.
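Even a crude periodic probe beats finding out from a support ticket. A sketch, where the URL, interval, and alert hook are all placeholders for your provider's health endpoint and your own paging setup:

// Probe the provider once a minute and page when it fails.
const HEALTH_URL = 'https://api.example.com/ping'; // placeholder endpoint

setInterval(async () => {
  try {
    const response = await fetch(HEALTH_URL, { signal: AbortSignal.timeout(3000) });
    if (!response.ok) alertOnCall(`Health check failed: ${response.status}`);
  } catch (err) {
    alertOnCall(`Health check unreachable: ${err.message}`);
  }
}, 60_000);

// Stand-in for whatever alerting or paging you actually use
function alertOnCall(message) {
  console.error(message);
}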
The Consistency Factor
Part of why integrations break is inconsistency across providers. Different auth methods, different error formats, different response schemas.
When you're integrating with five APIs, you're learning five different patterns. Edge cases multiply. Testing becomes complex.
This is why unified API platforms exist. One auth method, one error format, one response structure — across hundreds of APIs. Fewer surprises means fewer 2 AM phone calls.
Debugging When It Breaks
When something fails, gather evidence before guessing:
- Check the provider's status page. Is there an ongoing outage?
- Test the exact request. Use curl or Postman to take your code out of the equation.
- Compare to what worked. What changed? New deployment? Updated dependency? Different input data?
- Read the full response. Error messages often tell you exactly what went wrong — if you look.
- Check timestamps. Did the failure start at a specific time? Coincide with a deploy?
Most failures have obvious causes once you stop assuming and start observing.
API integrations are dependencies, and dependencies break. The difference between a robust integration and a fragile one isn't luck — it's preparation.
Build integrations that expect failure, handle it gracefully, and tell you when it happens. Your future self (and your sleep schedule) will thank you.
Ready to build? Get your API key and check out the error handling docs.
Originally published at APIVerve Blog