When I first started building APIs, I thought scalability was just about adding more servers. After architecting systems that now handle over 10 million requests daily, I've learned that true scalability starts with fundamental design decisions made long before you hit your first performance bottleneck.
The Hidden Costs of Poor API Design
In 2022, our team inherited an API that was buckling under 2 million daily requests. The symptoms were familiar: response times approaching 5 seconds, frequent timeouts, and a growing backlog of tickets from frustrated clients. Initially, I thought the problem was infrastructure. It wasn't: it was architectural debt accumulated through well-intentioned but shortsighted design decisions.
The API suffered from three critical design flaws that I see in many growing systems:
Over-fetching by design: Endpoints returned massive JSON payloads because "clients might need this data someday." A simple user profile endpoint was returning 847 fields, including nested objects that required 6 additional database queries.
Synchronous dependency chains: Every request triggered a cascade of internal API calls, where any single service degradation brought down the entire system.
Resource-agnostic pagination: The same pagination strategy used for lightweight user data was applied to heavy analytics reports, causing memory exhaustion during peak hours.
Principle 1: Design for Selective Data Retrieval
The most impactful change we made was implementing field selection at the API contract level. Instead of returning everything and letting clients ignore what they don't need, we made intentionality the default.
Before: 847 fields, 6 DB queries, 2.3s average response
GET /api/users/123
After: Client specifies needs, 1 DB query, 120ms average response
GET /api/users/123?fields=id,name,email,last_login
This wasn't just about adding query parameters. We redesigned our data access layer to construct database queries dynamically based on requested fields. The result: 95% reduction in average response time and 78% reduction in database load.
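To make that concrete, here is a minimal sketch of field-driven query construction, assuming an illustrative users table, a column whitelist, and a ?fields= query parameter; the names are placeholders, not our actual schema:

# Minimal sketch: translate ?fields=id,name,email into a narrow SELECT.
# The users table and the whitelist below are illustrative only.
ALLOWED_FIELDS = {"id", "name", "email", "last_login", "created_at"}

def build_user_query(fields_param=None):
    """Return (sql, columns) covering only the columns the client asked for."""
    requested = (
        {f.strip() for f in fields_param.split(",")} if fields_param else {"id", "name"}
    )
    unknown = requested - ALLOWED_FIELDS
    if unknown:
        # Whitelisting also keeps raw client input out of the SQL string.
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    columns = sorted(requested)
    sql = f"SELECT {', '.join(columns)} FROM users WHERE id = %s"
    return sql, columns

# build_user_query("id,name,email,last_login")
# -> ("SELECT email, id, last_login, name FROM users WHERE id = %s",
#     ["email", "id", "last_login", "name"])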
Implementation insight: We built a field dependency graph that automatically includes related fields when needed. Request user.profile.avatar_url? The system knows to fetch user.profile.avatar_id without explicit instruction.
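A rough sketch of how that dependency expansion can work, with a deliberately tiny, hypothetical dependency map standing in for the real graph:

# Sketch: expand a requested field set with its prerequisites.
# The dependency map is a small illustrative stand-in for the real graph.
FIELD_DEPENDENCIES = {
    "profile.avatar_url": {"profile.avatar_id"},
    "profile.avatar_id": {"profile"},
}

def expand_fields(requested):
    """Walk the dependency graph until no new fields appear."""
    resolved = set(requested)
    frontier = set(requested)
    while frontier:
        frontier = {
            dep
            for field in frontier
            for dep in FIELD_DEPENDENCIES.get(field, ())
            if dep not in resolved
        }
        resolved |= frontier
    return resolved

# expand_fields({"profile.avatar_url"})
# -> {"profile.avatar_url", "profile.avatar_id", "profile"}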
Principle 2: Embrace Asynchronous Operations
One of our most successful architectural decisions was distinguishing between operations that must be synchronous and those that could be asynchronous. This fundamental shift changed how we approached API design entirely.
For operations with side effects—sending emails, generating reports, updating external systems—we implemented a pattern I call "acknowledge and process":
POST /api/reports/generate
{
"type": "user_analytics",
"date_range": "2023-01-01/2023-12-31"
}
Response (201 Created):
{
"job_id": "analytics_789",
"status": "processing",
"estimated_completion": "2023-11-15T10:30:00Z",
"status_url": "/api/jobs/analytics_789"
}
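As a sketch of how such an endpoint can be wired up, here is one way to do it assuming FastAPI and a placeholder enqueue_report_job() helper sitting in front of a real task queue; neither is necessarily what we run in production:

# "Acknowledge and process" sketch, assuming FastAPI and a hypothetical
# enqueue_report_job() helper backed by a task queue.
import uuid
from datetime import datetime, timedelta, timezone

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ReportRequest(BaseModel):
    type: str
    date_range: str

def enqueue_report_job(job_id, request):
    """Hand the work to a worker queue; placeholder for a real broker call."""
    ...

@app.post("/api/reports/generate", status_code=201)
def generate_report(request: ReportRequest):
    job_id = f"analytics_{uuid.uuid4().hex[:8]}"
    enqueue_report_job(job_id, request)  # the heavy work happens out of band
    return {
        "job_id": job_id,
        "status": "processing",
        "estimated_completion": (
            datetime.now(timezone.utc) + timedelta(minutes=5)
        ).isoformat(),
        "status_url": f"/api/jobs/{job_id}",
    }

The endpoint's only synchronous responsibilities are validating the request and recording the job; everything else moves behind the status_url.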
This pattern reduced our 95th percentile response times from 12 seconds to under 500ms. More importantly, it eliminated the cascading failures that occurred when downstream services experienced latency spikes.
Key insight: We discovered that 73% of our "urgent" operations weren't actually urgent to users. They needed to know the operation was initiated successfully, not that it was completed immediately.
Principle 3: Implement Intelligent Caching Layers
Caching seems straightforward until you're debugging why users see stale data or why your cache hit rate is inexplicably low. We learned that effective caching requires understanding data access patterns at a granular level.
Our breakthrough came from implementing semantic cache keys rather than generic ones:
# Generic approach - poor hit rates
cache_key = f"user_data_{user_id}"
# Semantic approach - 94% hit rate
cache_key = f"user_profile_{user_id}_{last_modified_hash}"
We also implemented cache warming based on access patterns. Our analytics showed that 34% of API calls followed predictable sequences—users viewing their profile, then their settings, then their recent activity. We pre-populate these related cache entries proactively.
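Here is a condensed sketch of both ideas, assuming a Redis-style cache client and placeholder loader functions; the follow-up list and TTL are illustrative:

# Semantic keys plus access-pattern warming. The cache client and loader
# functions are assumed; the follow-up sequence and TTL are illustrative.
import hashlib

def semantic_cache_key(user_id, last_modified):
    """The key changes whenever the record changes, so stale entries simply
    stop being requested instead of needing explicit invalidation."""
    version = hashlib.sha1(last_modified.encode()).hexdigest()[:12]
    return f"user_profile_{user_id}_{version}"

# Observed sequence: profile -> settings -> recent activity
LIKELY_FOLLOW_UPS = ["settings", "recent_activity"]

def warm_related_entries(cache, user_id, loaders):
    """Pre-populate entries the user is statistically likely to hit next."""
    for name in LIKELY_FOLLOW_UPS:
        key = f"user_{name}_{user_id}"
        if not cache.exists(key):
            cache.setex(key, 300, loaders[name](user_id))  # 5-minute TTL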
Performance impact: Cache hit rates improved from 61% to 94%, reducing database load by 87% during peak hours.
Principle 4: Build Resilience Into the Contract
APIs fail; how they fail determines whether your system degrades gracefully or collapses. We embedded resilience directly into our API contracts through circuit breaker patterns and graceful degradation. Here is what a response looks like when dependent services are struggling:
{
"user": {
"id": 123,
"name": "Sarah Chen",
"email": "sarah@example.com"
},
"preferences": {
"status": "partial_failure",
"message": "Preferences service temporarily unavailable",
"data": null,
"retry_after": 30
},
"recommendations": {
"status": "degraded",
"message": "Using cached recommendations",
"data": [...],
"freshness": "2023-11-14T09:00:00Z"
}
}
This approach meant that even when 40% of our microservices experienced issues during a major outage, core user functionality remained available with clear communication about what was affected.
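To illustrate the mechanics, here is a simplified sketch of how an aggregating endpoint can build that kind of sectioned response; the CircuitOpen exception and the service clients are placeholders for a real circuit breaker library and real downstream calls:

# Sketch: each section of the response degrades independently.
class CircuitOpen(Exception):
    """Raised by a circuit breaker when a downstream service is unhealthy."""

def section(fetch, fallback=None, freshness=None):
    """Wrap one downstream call so its failure never fails the whole response."""
    try:
        return {"status": "ok", "data": fetch()}
    except CircuitOpen:
        if fallback is not None:
            return {
                "status": "degraded",
                "message": "Using cached data",
                "data": fallback,
                "freshness": freshness,
            }
        return {
            "status": "partial_failure",
            "message": "Service temporarily unavailable",
            "data": None,
            "retry_after": 30,
        }

def get_user_view(user_id, clients, cache):
    return {
        "user": clients.users.get(user_id),  # core data: allowed to fail loudly
        "preferences": section(lambda: clients.preferences.get(user_id)),
        "recommendations": section(
            lambda: clients.recommendations.get(user_id),
            fallback=cache.get(f"recs_{user_id}"),
            freshness=cache.get(f"recs_{user_id}_ts"),
        ),
    }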
Principle 5: Monitor Intent, Not Just Performance
Traditional API monitoring focuses on response times, error rates, and throughput. While these metrics are essential, they don't tell you whether your API is actually serving its intended purpose effectively.
We implemented intent-based monitoring that tracks whether APIs enable successful user workflows:
- Completion rates: What percentage of users who start a multi-step process through our APIs actually complete it?
- Retry patterns: Are clients repeatedly calling endpoints, suggesting inadequate responses?
- Abandonment points: Where in API-driven workflows do users give up?
This monitoring revealed that our checkout API had excellent technical metrics (99.7% uptime, 200ms average response) but terrible business metrics (67% abandonment rate). The issue wasn't performance: error messages provided insufficient detail for users to resolve payment issues.
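As an example of what intent-based monitoring can look like in practice, here is a small sketch that derives completion and abandonment metrics from raw API events; the event shape and checkout steps are assumptions for illustration, not our actual telemetry schema:

# Sketch: derive workflow completion and abandonment from API events.
from collections import defaultdict

CHECKOUT_STEPS = ["cart", "payment", "confirm"]  # illustrative workflow

def workflow_metrics(events):
    """events: iterable of {"session": str, "step": str} dicts."""
    reached = defaultdict(set)
    for e in events:
        reached[e["session"]].add(e["step"])

    started = [s for s in reached if CHECKOUT_STEPS[0] in reached[s]]
    completed = [s for s in started if CHECKOUT_STEPS[-1] in reached[s]]

    # Abandonment point: the last step each unfinished session reached.
    abandonment = defaultdict(int)
    for s in started:
        if s in completed:
            continue
        last = max(i for i, step in enumerate(CHECKOUT_STEPS) if step in reached[s])
        abandonment[CHECKOUT_STEPS[last]] += 1

    return {
        "completion_rate": len(completed) / len(started) if started else 0.0,
        "abandonment_by_step": dict(abandonment),
    }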
Scaling Beyond Infrastructure
The most important lesson from building APIs that handle 10+ million requests daily is that scalability is primarily an architectural challenge, not an infrastructure one. Our current system runs on roughly the same infrastructure as our original 2-million-request setup, but serves 5x the traffic with better performance.
Key architectural decisions that enabled this:
- Design APIs around client needs, not database structure
- Default to asynchronous for anything that isn't truly real-time
- Implement caching as a core architectural component, not an afterthought
- Build failure handling into contracts from day one
- Monitor user success, not just system health
The transition from 2 million to 10+ million daily requests taught us that sustainable scale comes from making the right architectural decisions early and having the discipline to refactor when those decisions prove insufficient.
Looking Forward
As we prepare for our next scaling challenge—reaching 50 million daily requests—we're applying these same principles while exploring emerging patterns like GraphQL federation and event-driven architectures. The specific technologies evolve, but the fundamental principles of intentional design, asynchronous thinking, and user-focused monitoring remain constant.
The APIs that scale aren't just fast—they're thoughtfully designed to handle growth, failure, and changing requirements with equal grace.