Rajkiran

Posted on May 31

System Design - 7.SQL vs NoSQL, How Uber Uses Both in the Same System (And Why That's the Right Answer)

#architecture #database #sql #systemdesign

The Question Nobody Should Be Asking

"Should I use SQL or NoSQL?"

It's the most Googled database question in engineering. And it's the wrong question.
Uber uses PostgreSQL and Cassandra in the same system. Netflix runs MySQL and DynamoDB side by side. Instagram has PostgreSQL, Cassandra, and Redis all playing different roles in their stack.

The right question isn't "which one?" — it's "which one for this specific job?"

Because SQL and NoSQL don't compete. They solve different problems. And the engineers who understand why are the ones who design systems that actually survive at scale.
Today we unpack both — from first principles to production decisions — so you can answer this question confidently in any interview and any codebase.

SQL: The Reliable Foundation
SQL databases (PostgreSQL, MySQL, SQLite, Oracle) have been the backbone of software for 50 years. They earned that position.
What Makes SQL Powerful
Structured schema with enforced relationships:
Your data has a defined shape. A users table has columns. A orders table has a foreign key to users. The database enforces these relationships — you cannot insert an order for a user that doesn't exist.

sql
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id), -- FK enforced
amount DECIMAL(10,2) NOT NULL,
status VARCHAR(50)
);

JOINs — the killer feature:
SQL lets you query across multiple related tables in a single operation:
sql
SELECT u.email, COUNT(o.id) as order_count, SUM(o.amount) as total_spent
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.email
ORDER BY total_spent DESC;
One query. Across two tables. With aggregation and filtering. NoSQL cannot do this natively.

ACID transactions — the guarantee that matters for money:
A — Atomicity: Either all operations succeed or none do. No half-writes.
C — Consistency: Data always satisfies all defined constraints and rules.
I — Isolation: Concurrent transactions don't interfere with each other.
D — Durability: Once committed, data survives crashes and power failures.

Why ACID matters: Imagine transferring $500 from Account A to Account B:
sqlBEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 'A'; -- debit
UPDATE accounts SET balance = balance + 500 WHERE id = 'B'; -- credit
COMMIT;
If the server crashes after the debit but before the credit, ACID atomicity rolls back the entire transaction. Account A keeps its $500. No money disappears. This is why every bank, payment processor, and e-commerce checkout runs on SQL.
When SQL Is the Right Choice

Financial transactions, orders, payments — anything requiring ACID
Complex relational data with many joins (e-commerce product catalogs, ERP systems)
Reporting and analytics queries across multiple entities
When your schema is well-defined and relatively stable
Moderate scale (millions to low billions of records) with vertical scaling

SQL's Real Ceiling
SQL scales vertically well — give it more CPU and RAM and it flies. But horizontal scaling (sharding across multiple servers) is painful with SQL because of the foreign key and JOIN requirements. This is where NoSQL was born.

NoSQL: Four Different Tools for Four Different Jobs
"NoSQL" is an umbrella term for four fundamentally different database types. Grouping them together is like saying "non-hammer tools" — technically accurate, completely unhelpful.

Type 1: Key-Value Stores
Examples: Redis, DynamoDB, Memcached
The mental model: A giant dictionary. You store a value with a key, you retrieve it by key. Nothing more, nothing less.
SET user🔢session "{'cart': ['item_a', 'item_b'], 'logged_in': true}"
GET user🔢session → "{'cart': ['item_a', 'item_b'], 'logged_in': true}"

When to use:

Session storage (user login state, shopping cart)
Caching (pre-computed results you want to retrieve in microseconds)
Rate limiting counters
Feature flags
Real-time leaderboards (Redis sorted sets)

Superpower: Single-digit millisecond reads and writes. Redis reads from memory — no disk involved. It's 100-1000x faster than a SQL query for simple key lookups.

Limitation: No querying. You can only retrieve by exact key. You can't ask "give me all users who logged in today" — you'd need a separate index or a different database for that.

Real usage: Twitter caches pre-computed user timelines in Redis. When you open Twitter, your feed is assembled from Redis in ~2ms. Without caching, each timeline would require aggregating thousands of tweets from Cassandra — taking 300ms+.

Type 2: Document Stores
Examples: MongoDB, Firestore, CouchDB
The mental model: Store JSON documents. Each document can have a different structure. Query them by any field.

json// Document 1: Basic user
{
"_id": "user_123",
"name": "Priya Sharma",
"email": "priya@example.com",
"preferences": { "theme": "dark", "notifications": true }
}

// Document 2: Power user (different structure — totally fine in document DB)
{
"_id": "user_456",
"name": "Arjun Mehta",
"email": "arjun@example.com",
"preferences": { "theme": "light" },
"verified": true,
"subscription": { "plan": "pro", "expires": "2026-12-31" }
}

When to use:

User profiles (each user has different optional fields)
Product catalogs (each product type has different attributes)
CMS content (articles, blog posts with varying metadata)
Rapid prototyping (schema evolves without migrations)

Superpower: Schema flexibility. When requirements change, you don't write a migration — you just add the new field to new documents. Old documents don't break.

Limitation: No cross-document JOINs. If you need to join users to their orders to their products in a single query, you'll either need multiple queries or denormalize your data (store redundant copies).

Real usage: Airbnb uses MongoDB for listing data — each property has wildly different fields (apartments have floor numbers, houses have garage info, boats have slip numbers). Document stores handle this naturally.

Type 3: Wide-Column Stores
Examples: Cassandra, HBase, Google Bigtable
The mental model: Think of a table, but where each row can have a completely different set of columns, and the table is distributed across thousands of servers automatically.
Row Key: "user_123"
Columns: { "2024-01-01:post_1": "content...",
"2024-01-01:post_2": "content...",
"2024-01-15:post_3": "content...",
"2024-02-01:post_4": "content..." }

Row Key: "user_456"
Columns: { "2024-03-01:post_1": "content..." }

When to use:

Time-series data (IoT sensor readings, application logs, metrics)
Write-heavy workloads at extreme scale (billions of writes per day)
Data that's always queried by a known partition key + time range
When you need linear horizontal scalability without a ceiling

Superpower: Cassandra can handle 1 million writes per second across a cluster. It's specifically designed for write-heavy workloads that no SQL database can match. Adding nodes linearly increases capacity — there's no practical upper limit.

Limitation: Query patterns must be known upfront. In Cassandra, your data model is your query pattern. You design tables around the queries you need, not around entities. Changing query patterns requires rebuilding tables.

Real usage: Discord stores 4 billion messages in Cassandra. Every Discord message ever sent is in a Cassandra cluster, partitioned by channel ID and clustered by timestamp. At peak, Discord writes hundreds of thousands of messages per second.

Type 4: Graph Databases
Examples: Neo4j, Amazon Neptune, ArangoDB
The mental model: Store entities (nodes) and their relationships (edges) as first-class citizens. Queries traverse relationships.
(Alice)-[:FOLLOWS]->(Bob)
(Bob)-[:FOLLOWS]->(Carol)
(Alice)-[:LIKES]->(Post_1)
(Post_1)-[:WRITTEN_BY]->(Carol)

Query: "Find posts liked by people Alice follows"
→ Traverse: Alice → FOLLOWS → Bob, Carol → WRITTEN_BY → their posts
→ Filter: posts that Alice LIKES

When to use:

Social networks (friend-of-friend, mutual connections)
Recommendation engines ("people who bought X also bought Y")
Fraud detection (detecting transaction rings)
Knowledge graphs
Network topology (infrastructure dependencies) Superpower: Multi-hop relationship queries that would require dozens of JOINs in SQL run in milliseconds in a graph database. Finding "friends of friends of friends" is trivially fast. Limitation: Not suitable as a general-purpose database. Storage and query overhead is high for non-relationship data. Real usage: LinkedIn uses graph databases for their "People You May Know" feature — finding 2nd and 3rd degree connections across 900 million users efficiently.

ACID vs BASE: The Trade-off Formalized
These two acronyms capture the fundamental philosophical difference between SQL and most NoSQL:

ACID (SQL):
Atomicity — All or nothing
Consistency — Rules always satisfied
Isolation — Transactions don't interfere
Durability — Committed = permanent

BASE (NoSQL):
Basically Available — System always responds (even if data is stale)
Soft State — State may change over time even without writes (replication)
Eventually — Consistent data across nodes eventually
Consistent

The honest summary:

ACID = "I guarantee correctness, even if I'm slower"
BASE = "I guarantee availability, even if data is briefly imperfect"

Neither is superior. Your data requirements determine which you need.

Polyglot Persistence: Using Multiple Databases in One System
The most important concept in modern data architecture:

Polyglot persistence means using multiple different database types within a single application, each for what it does best.

Real example — Uber's data stack:
PostgreSQL → Trip records, driver accounts, payment history
(ACID transactions, relational queries for financial data)

Cassandra → Location history, driver GPS logs, analytics events
(write-heavy, time-series, billions of rows)

Redis → Active driver locations (ephemeral, auto-expiring)
(millisecond reads for real-time matching)

Elasticsearch → Ride search, trip history search
(full-text search, complex filtering)
Each database does exactly what it's best at. PostgreSQL handles the money. Cassandra handles the firehose of location data. Redis handles the real-time state. Elasticsearch handles the search.

Netflix's data stack:
MySQL → Billing, subscriptions (ACID, financial)
DynamoDB → Viewing history, user preferences (key-value, fast reads)
Cassandra → Play events, interaction data (high-write time-series)
Elasticsearch → Content search (full-text)
Redis → Session data, AB test assignments (cache)

The mental model for system design interviews: When asked "what database would you use?", the correct senior-engineer answer is almost never a single database. It's: "For X use case I'd use Y, for Z use case I'd use W, because..."

The Decision Framework
When choosing a database, run through these questions:
1. Does the data have complex relationships that you'll need to JOIN?
Yes → Start with SQL
2. Do you need ACID transactions? (money, inventory, bookings)
Yes → SQL (PostgreSQL strongly preferred)
3. Will you write at millions of records per second?
Yes → Wide-column (Cassandra)
4. Is your schema unpredictable or rapidly evolving?
Yes → Document store (MongoDB)
5. Do you need millisecond reads for a known key?
Yes → Key-value (Redis, DynamoDB)
6. Are relationships between entities the primary query?
Yes → Graph database (Neo4j)
7. Do you need full-text search across large content?
Yes → Search engine (Elasticsearch)

Interview Scenario: "When Would You Use MongoDB Over PostgreSQL?"
The answer that impresses:
"I'd reach for MongoDB when the data is document-shaped and the schema evolves rapidly — like user-generated content with unpredictable fields, or a product catalog where different product types have completely different attributes. The flexibility of not needing migrations during early development is genuinely valuable.
But I'd use PostgreSQL when I need transactions, when data is highly relational, or when I need complex queries across multiple entities. The ACID guarantees are non-negotiable for anything involving money.
In a real system I'd probably use both — PostgreSQL for the transactional core, MongoDB for flexible content storage, and Redis in front of both for caching hot reads."

Key Takeaways

SQL gives you ACID, JOINs, and structured data. It's the right choice for financial data and complex relational queries.
NoSQL is not one thing — it's four different database types, each with a distinct specialty.
Key-value (Redis): millisecond reads, caching, sessions.
Document (MongoDB): flexible schemas, user profiles, content.
Wide-column (Cassandra): extreme write throughput, time-series, logs.
Graph (Neo4j): relationship-heavy queries, social networks, recommendations.
ACID = correctness guaranteed. BASE = availability guaranteed. Your business requirements choose.
Polyglot persistence — using multiple database types in one system — is how every large-scale production system is actually built.
Never answer "SQL or NoSQL?" without asking "for which part of the system and for what access patterns?"

DEV Community

System Design - 7.SQL vs NoSQL, How Uber Uses Both in the Same System (And Why That's the Right Answer)

Top comments (0)