Rajkiran

Posted on Jun 11

System Design - 12. REST vs GraphQL vs gRPC: Choosing the Right API for Every Job

#api #architecture #backend #systemdesign

Covers: REST, GraphQL, gRPC, API Gateway, BFF Pattern, Rate Limiting, API Versioning

The API That Cost Stripe Millions

In 2013, Stripe had a problem. Their REST API was so well-designed that developers around the world loved it. Businesses built entire payment flows on it.

Then they needed to make a breaking change.

They couldn't just update the API — thousands of businesses would instantly break. They couldn't force everyone to migrate overnight. So Stripe did something brilliant: they maintained multiple API versions simultaneously, routing each client to the version it was built against.

Today Stripe supports API versions going back to 2011. Every version is still alive. Every old integration still works.

The cost? A dedicated versioning infrastructure, careful compatibility testing across every version for every change, and an engineering team focused purely on API stability.

The lesson: API design decisions compound over years. The choice of REST, GraphQL, or gRPC — and how you version, gateway, and rate-limit your API — determines your architecture's flexibility for the next decade.

Let's make these decisions well.

REST: The Universal Language of the Web

REST (Representational State Transfer) is an architectural style, usually implemented over HTTP with verbs acting on resource URLs. Roy Fielding defined it in his 2000 dissertation, and it has since become the dominant style for web APIs.

REST's 6 Principles

1. Stateless       — Every request contains all info needed. No server-side sessions.
2. Client-Server   — Frontend and backend are independent. Either can change.
3. Cacheable       — Responses declare whether they can be cached.
4. Uniform Interface — Standard HTTP verbs + resource URLs.
5. Layered System  — Client doesn't know if it's talking to server, cache, or LB.
6. Code on Demand  — (optional) Server can send executable code to client.

REST in Practice

Resources are nouns. HTTP verbs are actions.

GET    /users/123          → Retrieve user 123
POST   /users              → Create a new user
PUT    /users/123          → Replace user 123 entirely
PATCH  /users/123          → Partially update user 123
DELETE /users/123          → Delete user 123

GET    /users/123/orders   → Get orders belonging to user 123
POST   /users/123/orders   → Create an order for user 123

HTTP status codes carry meaning:

200 OK            → success, body contains data
201 Created       → resource created (POST success)
204 No Content    → success, no body (DELETE success)
400 Bad Request   → client sent invalid data
401 Unauthorized  → not authenticated
403 Forbidden     → authenticated but not allowed
404 Not Found     → resource doesn't exist
409 Conflict      → state conflict (e.g., duplicate email)
422 Unprocessable → valid format but business rule violation
429 Too Many Req  → rate limited
500 Server Error  → something broke on our side

REST is cacheable by design:

GET /products/456

Response headers:
Cache-Control: max-age=3600  → Cache this for 1 hour
ETag: "abc123"               → Fingerprint for conditional fetches

GET requests are idempotent and safe — CDNs and browsers can cache them aggressively.
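
Here is a minimal sketch of how a server might answer a conditional fetch using those two headers. The names make_etag and handle_get are made up for illustration, and hashing the body is only one of several ways to derive an ETag.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (one common approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a cacheable GET response."""
    etag = make_etag(body)
    headers = {"Cache-Control": "max-age=3600", "ETag": etag}
    if if_none_match == etag:
        # The client's cached copy is still current, so send no body.
        return 304, headers, b""
    return 200, headers, body
```

A client that kept the ETag from its first response sends it back in If-None-Match and gets a bodyless 304 until the resource changes.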

REST's Weakness: Over and Under-Fetching

Over-fetching: The API returns more data than the client needs.

GET /users/123

Response:
{
  "id": 123,
  "name": "Priya Sharma",
  "email": "priya@example.com",
  "phone": "...",
  "address": "...",
  "billing_info": { ... },  ← client only needed name and email
  "subscription": { ... },  ← but gets everything
  "preferences": { ... }
}

The mobile client wanted to display "Hello, Priya" and got 50 fields. Wasteful on mobile bandwidth and battery.

Under-fetching: The client needs multiple requests to get all the data it needs.

// Show a user's feed: need user + their posts + each post's comments

GET /users/123           → 1 request
GET /posts?user_id=123   → 1 request
GET /comments?post_id=1  → 1 request per post (N+1 problem!)
GET /comments?post_id=2
GET /comments?post_id=3
...                      → Could be dozens of requests for one screen

These problems become severe on mobile networks and led to the creation of GraphQL.

GraphQL: The Client Takes Control

GraphQL was created by Facebook in 2012 (open-sourced 2015) to solve the over/under-fetching problem that was killing their mobile app performance.

The core insight: let the client declare exactly what data it needs, and return only that.

How GraphQL Works

A single endpoint (/graphql) accepts queries describing exactly what the client wants:

query GetUserFeed {
  user(id: "123") {
    name                    # ← only the fields we need
    posts(first: 10) {      # ← only 10 posts
      title
      createdAt
      comments(first: 3) {  # ← only 3 comments per post
        text
        author {
          name              # ← just the name, not full profile
        }
      }
    }
  }
}

One request. Exactly the data the client needs. Nothing more.

Server response:

{
  "data": {
    "user": {
      "name": "Priya Sharma",
      "posts": [
        {
          "title": "My first post",
          "createdAt": "2024-01-15",
          "comments": [
            { "text": "Great!", "author": { "name": "Arjun" } }
          ]
        }
      ]
    }
  }
}
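
To make the "only what you asked for" idea concrete, here is a toy projection in Python. It is not how a real GraphQL server executes queries (those parse the query and call resolvers per field); the selection is hand-written as a nested dict, and select is a hypothetical helper.

```python
def select(data, selection):
    """Keep only the fields named in `selection`.

    `selection` maps a field name to None (leaf field) or to a nested
    selection dict (object or list of objects).
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is None else select(value, sub)
    return result

# Full record as a backend might store it (illustrative data).
user = {
    "id": 123,
    "name": "Priya Sharma",
    "email": "priya@example.com",
    "posts": [
        {"id": 1, "title": "My first post", "body": "...", "createdAt": "2024-01-15"},
    ],
}

# Roughly equivalent to: { name posts { title createdAt } }
selection = {"name": None, "posts": {"title": None, "createdAt": None}}
shaped = select(user, selection)
```

The email, ids, and post bodies never leave the server, which is the whole point for a bandwidth-constrained client.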

GraphQL Mutations and Subscriptions

# Mutation — modifying data
mutation CreatePost {
  createPost(input: { title: "Hello World", content: "..." }) {
    id
    title
    createdAt
  }
}

# Subscription: real-time updates, typically delivered over WebSocket
subscription OnNewComment {
  commentAdded(postId: "456") {
    text
    author { name }
  }
}

GraphQL's Real Trade-offs

Advantages:

Eliminates over/under-fetching — mobile apps love this
Single endpoint, self-documenting schema
Strong type system lets tooling validate queries against the schema at build time, before they reach production
Excellent developer tooling (GraphiQL playground, code generation)
Enables rapid frontend iteration without backend API changes

Disadvantages:

Caching is hard: POST to /graphql is not cacheable by default (CDNs cache by URL). Requires persisted queries or specialized GraphQL CDN caching.
N+1 query problem: A naive GraphQL implementation can fire hundreds of database queries for one client request. Requires DataLoader (batching) to fix.
Complex queries can be expensive: A malicious or poorly written client can request deeply nested data that destroys your database. Requires query depth limiting and cost analysis.
Overkill for simple APIs: If your API is a handful of straightforward endpoints, GraphQL's complexity isn't worth it.
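
The N+1 fix mentioned above boils down to "collect the keys first, fetch once". Here is a simplified, synchronous sketch of that batching idea. The real DataLoader library is asynchronous and has a different API, so BatchLoader, fetch_comments_for_posts, and the sample data are all hypothetical.

```python
class BatchLoader:
    """Collects keys, then resolves them all with one batched call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self.pending = []
        self.cache = {}

    def load(self, key):
        # Queue the key instead of hitting the database immediately.
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)

    def dispatch(self):
        # One round trip for every queued key.
        if self.pending:
            self.cache.update(self.batch_fn(self.pending))
            self.pending = []

    def get(self, key):
        return self.cache[key]

query_log = []  # records each simulated database query
COMMENTS = {1: ["Great!"], 2: ["Nice"], 3: []}

def fetch_comments_for_posts(post_ids):
    """Stand-in for: SELECT ... FROM comments WHERE post_id IN (...)."""
    query_log.append(list(post_ids))
    return {pid: COMMENTS.get(pid, []) for pid in post_ids}

loader = BatchLoader(fetch_comments_for_posts)
for post_id in [1, 2, 3]:
    loader.load(post_id)
loader.dispatch()
comments = [loader.get(pid) for pid in [1, 2, 3]]
```

Three posts, one query. Without the loader, each post's comments resolver would have issued its own query.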

Who uses it: Facebook (created it), GitHub API v4, Shopify, Twitter, Airbnb, Pinterest — all complex product APIs where different clients need different data shapes.

gRPC: The Internal Services Protocol

gRPC (officially "gRPC Remote Procedure Calls", originally developed at Google) is a high-performance RPC framework built on HTTP/2 and Protocol Buffers. It's designed primarily for service-to-service communication inside your backend rather than for browser-facing public APIs.

How gRPC Works

You define your API as a .proto file — a strongly-typed schema:

syntax = "proto3";

service UserService {
  rpc GetUser (UserRequest) returns (UserResponse);
  rpc CreateUser (CreateUserRequest) returns (UserResponse);
  rpc StreamUserEvents (UserRequest) returns (stream UserEvent);  // streaming!
}

message UserRequest {
  string user_id = 1;
}

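message UserResponse {
  string id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}

// UserService references the two messages below, so they must be defined
// for the file to compile. Their fields are illustrative placeholders.
message CreateUserRequest {
  string name = 1;
  string email = 2;
}

message UserEvent {
  string user_id = 1;
  string event_type = 2;
  int64 occurred_at = 3;
}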

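From this .proto file, the protobuf compiler and its gRPC plugins generate client and server code in many languages, including Go, Python, Java, C++, and Node.js, with community tooling for others such as Rust. The generated code handles serialization and the network plumbing, so you only write the handler logic.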

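On the wire: Data is serialized as Protocol Buffers (protobuf), a compact binary format that replaces field names with small numeric tags. A JSON object that takes 200 bytes might take only a few dozen bytes as protobuf, which is why gRPC payloads are often cited as 3-10x smaller than REST/JSON. The actual ratio depends on the shape of the data.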
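
To see where the savings come from, here is a hand-rolled encoder for a UserResponse-shaped message using the protobuf wire format (varints plus length-delimited strings), compared against the same data as JSON. In practice you would use generated protobuf classes; the helper names and sample values here are assumptions for illustration.

```python
import json

def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def string_field(number: int, text: str) -> bytes:
    """Wire type 2: tag, byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return varint((number << 3) | 2) + varint(len(data)) + data

def int_field(number: int, value: int) -> bytes:
    """Wire type 0: tag, then the value as a varint."""
    return varint((number << 3) | 0) + varint(value)

user = {
    "id": "123",
    "name": "Priya Sharma",
    "email": "priya@example.com",
    "created_at": 1705276800,
}

encoded = (
    string_field(1, user["id"])
    + string_field(2, user["name"])
    + string_field(3, user["email"])
    + int_field(4, user["created_at"])
)
json_bytes = json.dumps(user).encode("utf-8")
```

For this small, string-heavy message the binary form comes out at a bit under half the JSON size, since the field names are gone but the string contents are not. Messages dominated by numbers and repeated fields tend to shrink more.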

gRPC's 4 Communication Patterns

1. Unary RPC (like REST):
   Client sends one request → Server sends one response

2. Server Streaming:
   Client sends one request → Server streams many responses
   (e.g., "stream me all events since timestamp X")

3. Client Streaming:
   Client streams many requests → Server sends one response
   (e.g., upload a large file in chunks)

4. Bidirectional Streaming:
   Both client and server stream simultaneously
   (e.g., real-time collaborative editing, chat)

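A plain REST request/response maps only to the unary pattern; streaming needs add-ons such as Server-Sent Events or WebSockets. gRPC does all four natively.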
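
The shape of the four patterns can be modeled with plain Python functions and generators, where an iterator stands in for a stream. This is only a conceptual sketch with made-up handler names and no networking. Python's gRPC handlers do follow a similar iterator-in, iterator-out shape, but take extra arguments.

```python
from typing import Iterable, Iterator

def unary(request: str) -> str:
    """One request in, one response out."""
    return f"user:{request}"

def server_streaming(request: str) -> Iterator[str]:
    """One request in, a stream of responses out."""
    for i in range(3):
        yield f"{request}:event{i}"

def client_streaming(chunks: Iterable[bytes]) -> int:
    """A stream of requests in, one response out (total bytes received)."""
    return sum(len(chunk) for chunk in chunks)

def bidi_streaming(messages: Iterable[str]) -> Iterator[str]:
    """Responses are produced while requests are still arriving."""
    for message in messages:
        yield message.upper()
```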

gRPC's Trade-offs

Advantages:

Often 3-10x smaller payloads than JSON, depending on the data (protobuf binary vs verbose text)
Strong typing across languages — schema is the contract, code is generated
Bidirectional streaming built-in (HTTP/2 multiplexing)
Excellent for microservices: companies such as Uber use gRPC heavily for internal service calls

Disadvantages:

Not human-readable — protobuf is binary, hard to debug with curl
Not browser-native — browsers can't call gRPC directly without gRPC-Web
Tight coupling — client and server must share .proto definitions
Overkill for a small system with only a few services

Who uses it: Google (built it), Uber (heavily, for internal calls), Netflix, Square, Lyft. Any company with many microservices that need high-throughput, low-latency inter-service communication.

Choosing the Right API Style

┌─────────────────────────────────────────────────────────┐
│ Public-facing API (web/mobile clients)?                 │
│                                                         │
│  Simple CRUD, standard operations?          → REST      │
│  Complex data, many clients with different  │           │
│  data needs (mobile, web, partner APIs)?    → GraphQL   │
│                                                         │
│ Internal service-to-service communication?  → gRPC      │
│                                                         │
│ Mixed? Use all three:                                   │
│  External public API    → REST or GraphQL               │
│  Internal microservices → gRPC                          │
└─────────────────────────────────────────────────────────┘

Real architecture (Uber):

Mobile app → REST API Gateway → internal gRPC calls between services
Partner integrations → REST webhooks
Real-time ride tracking → WebSocket (neither REST nor gRPC)

API Gateway: The Front Door of Your Microservices

An API Gateway is a reverse proxy that sits at the entry point of your backend, handling cross-cutting concerns so individual services don't have to.

Clients
  │
  ▼
[API Gateway]
  ├── Authentication & Authorization
  ├── Rate Limiting
  ├── SSL Termination
  ├── Request Routing
  ├── Request/Response Transformation
  ├── Logging & Metrics
  ├── Circuit Breaking
  └── Caching
  │
  ├──► [User Service]
  ├──► [Order Service]
  ├──► [Payment Service]
  └──► [Notification Service]

Without an API Gateway, every microservice must implement auth, rate limiting, logging, and SSL itself. That's N implementations of the same logic, duplicated inconsistently across every service.

With an API Gateway: implement once, enforce everywhere.

What API Gateways Handle

Routing:

/api/users/*    → User Service
/api/orders/*   → Order Service
/api/payments/* → Payment Service
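
A routing table like this can be sketched as a longest-prefix match. The upstream names below are placeholders, and real gateways (Kong, Envoy, and so on) express this in configuration rather than code.

```python
# Path prefix -> upstream service (placeholder names).
ROUTES = {
    "/api/users/": "user-service",
    "/api/orders/": "order-service",
    "/api/payments/": "payment-service",
}

def route(path: str):
    """Return the upstream for the longest matching prefix, or None."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return None  # a gateway would typically answer 404 here
    return ROUTES[max(matches, key=len)]
```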

Authentication:

Request arrives with JWT token
Gateway validates JWT (signature, expiry, claims)
If valid: add user_id header, forward to service
If invalid: return 401 immediately (service never sees the request)
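
Those four steps might look like this in a standard-library-only sketch. It assumes HS256 tokens signed with a shared secret, and X-User-Id is a hypothetical header name. A production gateway would use a vetted JWT library and usually asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64url_decode(part: str) -> bytes:
    # JWTs strip base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Issue an HS256 token (used here only to produce test input)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"

def authenticate(token: str, secret: bytes, now=None):
    """Return (status, headers_to_forward) as the gateway would decide."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return 401, {}  # never trust the token's own choice of algorithm
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return 401, {}  # signature mismatch
        claims = json.loads(b64url_decode(payload_b64))
        if claims["exp"] <= now:
            return 401, {}  # token expired
        return 200, {"X-User-Id": str(claims["sub"])}
    except (ValueError, TypeError, KeyError, AttributeError):
        return 401, {}  # malformed token or missing claims
```

Only the 200 branch results in a forwarded request; every failure path stops at the gateway.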

Rate Limiting:

Tier: free → 100 req/min
Tier: pro  → 10,000 req/min
Tier: enterprise → unlimited

Gateway checks user's tier and request count in Redis
Exceeds limit → 429 Too Many Requests
Within limit → increment counter, forward request
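
Here is a minimal fixed-window version of that check, with an in-memory dict standing in for Redis. FixedWindowLimiter is a hypothetical name, and treating "unlimited" as None is my assumption. With Redis you would typically increment a per-window key and let it expire.

```python
import time

# Requests allowed per window for each tier; None means unlimited.
LIMITS = {"free": 100, "pro": 10_000, "enterprise": None}

class FixedWindowLimiter:
    def __init__(self, window_seconds: int = 60):
        self.window = window_seconds
        # (user_id, window_index) -> request count. Old windows are never
        # evicted in this sketch; a Redis key with a TTL would handle that.
        self.counters = {}

    def check(self, user_id: str, tier: str, now=None) -> int:
        """Return the HTTP status the gateway should use: 200 or 429."""
        limit = LIMITS[tier]
        if limit is None:
            return 200
        now = time.time() if now is None else now
        key = (user_id, int(now // self.window))
        count = self.counters.get(key, 0)
        if count >= limit:
            return 429
        self.counters[key] = count + 1
        return 200
```

Fixed windows are the simplest option but allow bursts at window boundaries; sliding windows or token buckets smooth that out at the cost of more bookkeeping.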

Request Transformation:

Old client sends:  { "user_name": "Priya" }
New service expects: { "username": "Priya" }
Gateway transforms the request body in-flight
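
In code, that transformation is little more than a key rename on the parsed body. This sketch assumes a flat JSON object; FIELD_RENAMES and transform_request are illustrative names.

```python
# Old field name -> new field name expected by the service.
FIELD_RENAMES = {"user_name": "username"}

def transform_request(body: dict) -> dict:
    """Rename legacy fields and pass everything else through untouched."""
    return {FIELD_RENAMES.get(key, key): value for key, value in body.items()}
```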

Real examples: AWS API Gateway, Kong (open source, Nginx-based), Nginx (with Lua scripting), Envoy, Traefik, Apigee (Google).

BFF Pattern: Backend for Frontend

A single API trying to serve a mobile app, a desktop web app, and third-party partners simultaneously has a fundamental problem: each client has radically different needs.

Mobile needs minimal data (battery, bandwidth)
Desktop needs rich data (big screen, fast connection)
Partner API needs different data shapes and authentication

BFF (Backend for Frontend) creates a dedicated API gateway per client type:

Mobile App     ──► [Mobile BFF]  ──► [Core Services]
Web App        ──► [Web BFF]     ──► [Core Services]
Partner API    ──► [Partner BFF] ──► [Core Services]

Each BFF is owned by the frontend team that uses it. The Mobile BFF team can optimize for mobile without worrying about breaking the Web BFF.
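
As a rough illustration, two BFFs can sit on top of the same core services and return very different payloads. Everything here is hypothetical: the core_* functions stand in for calls to core services, and the response shapes are invented.

```python
def core_get_user(user_id):
    """Stand-in for a call to a core user service."""
    return {
        "id": user_id,
        "name": "Priya Sharma",
        "email": "priya@example.com",
        "address": "...",
        "preferences": {"theme": "dark"},
    }

def core_get_orders(user_id):
    """Stand-in for a call to a core order service."""
    return [
        {"id": 1, "total": 40.0, "items": ["book"]},
        {"id": 2, "total": 15.5, "items": ["pen", "ink"]},
    ]

def mobile_profile(user_id):
    """Mobile BFF: the bare minimum for a small screen."""
    user = core_get_user(user_id)
    return {"name": user["name"], "order_count": len(core_get_orders(user_id))}

def web_profile(user_id):
    """Web BFF: richer payload for a desktop dashboard."""
    user = core_get_user(user_id)
    return {
        "name": user["name"],
        "email": user["email"],
        "preferences": user["preferences"],
        "orders": core_get_orders(user_id),
    }
```

The mobile team can drop or add fields in mobile_profile without touching what the web client receives.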

Where it came from: The pattern was named and popularized by engineers at SoundCloud, and Netflix was another early adopter, tailoring its API layer to each device type (TV, mobile, tablet, web).

API Versioning Strategies

APIs change. How you handle those changes determines whether you break your users.

Strategy 1: URL Versioning (Most Common)

/api/v1/users/123
/api/v2/users/123  ← new version with breaking changes

Simple. Explicit. Easy to route in API Gateway. The version is obvious in logs and browser history.
Downside: clutters URL space, clients must know which version to use.

Strategy 2: Header Versioning

GET /users/123
Accept: application/vnd.myapp.v2+json

Clean URLs. But less visible — harder to debug, can't test in browser directly.

Strategy 3: Query Parameter

GET /users/123?version=2

Easy to test. But versions in query params feel like an afterthought and can complicate caching: the query string becomes part of the cache key, and some caches are configured to ignore or strip it.
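
The three strategies differ only in where the version lives in the request. Here is a sketch of a gateway-side helper that checks all three places. version_from_request is a hypothetical function, and the precedence order (URL, then header, then query) is an arbitrary choice.

```python
import re
from urllib.parse import parse_qs, urlsplit

def version_from_request(url: str, headers=None):
    """Return the requested API version as an int, or None if unspecified."""
    headers = headers or {}
    parts = urlsplit(url)

    # Strategy 1: /api/v2/users/123
    match = re.match(r"^/api/v(\d+)/", parts.path)
    if match:
        return int(match.group(1))

    # Strategy 2: Accept: application/vnd.myapp.v2+json
    match = re.search(r"application/vnd\.myapp\.v(\d+)\+json", headers.get("Accept", ""))
    if match:
        return int(match.group(1))

    # Strategy 3: /users/123?version=2
    values = parse_qs(parts.query).get("version")
    if values and values[0].isdigit():
        return int(values[0])

    return None
```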

Stripe's approach (date-based versions):

Stripe-Version: 2023-10-16

Each account is pinned to the API version that was current when it made its first API request, and a request can opt into a different version by sending the Stripe-Version header. Changes never break existing integrations, and new integrations get the latest version. It is an elegant scheme, and an extremely expensive one to maintain.
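
One way to build date-pinned versioning is to keep the code on the latest response shape and apply a chain of "downgrade" steps for older clients. The sketch below shows that general idea only. The dates other than 2023-10-16, the field changes, and every function name are invented, and nothing here describes Stripe's actual internals.

```python
def _undo_name_split(resp: dict) -> dict:
    """Older clients expect a single `name` field."""
    resp = dict(resp)
    resp["name"] = f'{resp.pop("first_name")} {resp.pop("last_name")}'
    return resp

def _undo_email_rename(resp: dict) -> dict:
    """Even older clients expect `email_address` instead of `email`."""
    resp = dict(resp)
    resp["email_address"] = resp.pop("email")
    return resp

# Hypothetical breaking changes, newest first. Each entry converts a
# response from that version's shape to the shape used just before it.
CHANGES = [
    ("2023-10-16", _undo_name_split),
    ("2022-06-01", _undo_email_rename),
]

def resolve_version(pinned_version: str, header_version=None) -> str:
    """A per-request header overrides the pinned default."""
    return header_version or pinned_version

def render(latest_response: dict, version: str) -> dict:
    """Downgrade the latest response shape to what `version` expects."""
    resp = latest_response
    for introduced_on, downgrade in CHANGES:
        # ISO dates compare correctly as plain strings.
        if version < introduced_on:
            resp = downgrade(resp)
    return resp
```

The maintenance cost is visible even at this scale: every breaking change adds a permanent entry to the chain, and each one has to keep working against every later change.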

The rule: Never make breaking changes without a new version. A breaking change is: removing a field, changing a field's type, changing required fields, changing error formats.

Interview Scenario: REST vs GraphQL Trade-offs

Q: When would you use GraphQL over REST?

"I'd reach for GraphQL when I have multiple client types with meaningfully different data needs — like a mobile app that needs minimal data and a desktop app that needs the full dataset. The ability for clients to request exactly what they need eliminates over-fetching, which matters a lot on mobile.

I'd also choose GraphQL if the product is evolving fast and frontend teams need to iterate without waiting for backend API changes — they can just request new fields.

But I'd stick with REST when the API is simple and stable (few endpoints, consistent data shapes), when caching at the HTTP layer is important (CDN caching is trivial with REST, painful with GraphQL), or when the API is public and needs to be easily consumable without GraphQL client libraries.

For internal microservices, I'd use gRPC rather than either: JSON over HTTP/1.1 carries more serialization and per-request overhead than protobuf over HTTP/2, and that adds up on high-throughput internal calls."

Key Takeaways

REST: Simple, cacheable, universal. Best for public APIs, CRUD operations, stable contracts.
GraphQL: Client-controlled queries, eliminates over/under-fetching. Best for complex products with many client types. Watch for N+1 and query cost.
gRPC: Binary protocol, streaming, strongly typed. Best for internal microservice communication. Not browser-native.
API Gateway centralizes auth, rate limiting, routing, logging — don't implement these in every service.
BFF pattern gives each client type a dedicated gateway optimized for its needs.
Versioning: URL versioning is most practical. Never make breaking changes without a version bump.
Many mature architectures combine these styles: GraphQL or REST externally, gRPC internally, and an API Gateway at the boundary.

Day 4 Complete

You've covered the full networking layer: Load Balancing (traffic distribution), CDNs (content delivery at the edge), and API Design (how clients talk to your system). These three topics are the entry points for every request that enters your architecture.

Tomorrow (Day 5) we go async — message queues, event-driven architecture, and the Saga pattern for distributed transactions. How systems communicate when they don't need an immediate answer, and why that changes everything about how you design for scale.

Tags: system-design api-design rest graphql grpc backend interview-prep
