DEV Community

Cover image for NoSQL vs SQL for Data Engineering: When to Pick Mongo, Cassandra, DynamoDB or Postgres
Gowtham Potureddi
Gowtham Potureddi

Posted on

NoSQL vs SQL for Data Engineering: When to Pick Mongo, Cassandra, DynamoDB or Postgres

nosql vs sql is the question that arrives on day one of every new data project and never quite goes away. The honest 2026 answer is that the dichotomy itself is misleading — most production teams run both — and the real decision is whether the workload wants a relational engine, a document store, a key-value store, a wide-column store, or a graph database. Pick the wrong shape and you spend the next year writing application-side joins, denormalising what should have been a single SQL statement, or paying for unbounded write throughput you do not need.

This guide is the workload-first answer to sql vs nosql that interviewers reward and that staff data engineers actually use when they pick storage. It walks the eight-axis decision matrix that separates the two models, the four-family NoSQL map (MongoDB, DynamoDB, Cassandra, Neo4j and their cousins), the CAP theorem with its day-to-day PACELC extension, and a workload-first decision tree across OLTP, OLAP, schema flexibility, consistency, latency, and global active-active. Each section ships a worked interview answer with code, a step-by-step trace, an output table, and a concept-by-concept why-this-works.

PipeCode blog header for NoSQL vs SQL for data engineering — bold white headline 'NoSQL vs SQL · For Data Engineering' with subtitle 'Mongo · Cassandra · DynamoDB · Postgres' and three stylised database engine chips on a dark gradient with purple, green, orange, and blue accents and a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, drill the database design practice library →, rehearse on SQL practice problems →, and stack the schema muscles with data modelling for DE interviews →.


On this page


1. The NoSQL vs SQL question, reframed for 2026

nosql vs sql is the wrong framing — the real decision is relational vs document vs key-value vs wide-column vs graph

The one-sentence invariant: "SQL" describes a query language; "NoSQL" describes everything that is not relational, which is four distinct families with four distinct trade-offs. Once you stop treating NoSQL as a single thing, the interview question stops being "SQL or NoSQL?" and becomes "what is the query shape, the scale ceiling, and the consistency contract?" — three answers that point at exactly one engine.

Why the dichotomy is misleading.

  • Most production teams run both. A typical SaaS stack pairs Postgres (OLTP source of truth) with Redis (cache), Snowflake (warehouse), and either DynamoDB or MongoDB (a single domain that does not fit relational well). Picking "SQL or NoSQL" implies a mutually exclusive choice that no senior architect actually makes.
  • Postgres + JSONB ate the easy NoSQL case. Since Postgres 9.4 (2014), the jsonb type plus GIN indexes have made "I picked Mongo because my schema was evolving" much harder to defend. A team can keep relational joins and schema-on-read in the same engine.
  • Distributed SQL closed the scale gap. Spanner, CockroachDB, YugabyteDB, and TiDB give horizontally sharded ACID without the application-side joins of pure NoSQL. "We need NoSQL for scale" is no longer the slam-dunk argument it was in 2012.
  • The real question is access pattern. Every senior DE asks "model the access pattern first, the data second" — DynamoDB's single-table design is the textbook example, but the same rule applies to every engine.

The actual decision tree in one sentence.

Pick relational (Postgres / MySQL / Spanner) when joins are first-class and rows are < 10 TB; pick document (MongoDB / Couchbase) when the unit of work is a self-contained nested object; pick key-value (DynamoDB / Redis) when reads are point lookups by a known key at single-digit-ms latency; pick wide-column (Cassandra / ScyllaDB) when writes are unbounded and partitions are time-series-shaped; pick graph (Neo4j / Neptune) when traversals run more than three hops deep.

What interviewers listen for.

  • Do you say "the question is never 'NoSQL or SQL', it is 'which engine for this workload'"? — senior signal.
  • Do you mention access-pattern modelling before naming a specific engine? — required answer.
  • Do you cite Postgres + JSONB as the modern hybrid for "we thought we needed Mongo"? — senior signal.
  • Do you bring up PACELC when the conversation reaches CAP? — staff signal.

The 2026 reality in five bullets.

  • Postgres has eaten 60% of "we picked Mongo for schema flexibility" thanks to JSONB + partial indexes + the pg_partman ecosystem.
  • DynamoDB and Cassandra still own write-heavy unbounded workloads — neither Postgres nor Spanner can match 100k writes/s at single-digit-ms P99 without significant engineering.
  • Spanner / CockroachDB / YugabyteDB are the production answer when you genuinely need global strong consistency and horizontal scale.
  • Redis is the universal cache layer, not a primary store — its 2024 license change pushed teams toward Valkey or DragonflyDB, but the role is identical.
  • Vector databases (pgvector, Pinecone, Weaviate) are a sixth family driven by 2023–26 LLM workloads — not covered in detail here but worth naming in interviews.

Worked example — the same domain modelled in relational vs document

Detailed explanation. A new analyst is asked "should we use Postgres or MongoDB for our SaaS users + orders + line items?" The instinct is to compare features. The senior answer is to compare access patterns — what queries do we run, how often, with what latency budget?

Question. Sketch the same customers + orders + order_items domain in (a) Postgres relational, (b) MongoDB document, and (c) DynamoDB single-table. Explain which one wins when the dominant access pattern is "load one order with all its line items" vs "run quarterly revenue reports."

Input. Three logical entities — customer (1 row), order (1..N per customer), order_item (1..M per order).

Code.

-- (a) Postgres relational — three tables, foreign keys
CREATE TABLE customers (
    customer_id BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT UNIQUE
);

CREATE TABLE orders (
    order_id    BIGSERIAL PRIMARY KEY,
    customer_id BIGINT REFERENCES customers(customer_id),
    order_date  DATE NOT NULL,
    status      TEXT NOT NULL
);

CREATE TABLE order_items (
    order_item_id BIGSERIAL PRIMARY KEY,
    order_id      BIGINT REFERENCES orders(order_id),
    sku           TEXT NOT NULL,
    qty           INT  NOT NULL,
    unit_price    NUMERIC(10,2)
);
Enter fullscreen mode Exit fullscreen mode
// (b) MongoDB document — one document per order, line items embedded
db.orders.insertOne({
  _id: ObjectId("..."),
  customer: { id: 100, name: "Alice", email: "a@x.com" },
  order_date: ISODate("2026-05-20"),
  status: "placed",
  items: [
    { sku: "SKU-1", qty: 2, unit_price: 9.99 },
    { sku: "SKU-7", qty: 1, unit_price: 29.99 }
  ]
});
Enter fullscreen mode Exit fullscreen mode
// (c) DynamoDB single-table — composite key (PK, SK)
// PK = "CUST#100", SK = "META"               -> customer attributes
// PK = "CUST#100", SK = "ORDER#2026-05-20#1" -> order header
// PK = "CUST#100", SK = "ORDER#2026-05-20#1#ITEM#1" -> line item
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The relational design normalises every entity into its own table. Loading "one order with all line items" requires a two-table JOIN; running "quarterly revenue" is a single aggregation across order_items — both natural.
  2. The document design embeds line items inside the order document. Loading one order is a single findOne() — zero joins, one round trip. But "quarterly revenue across all orders" requires either an aggregation pipeline ($unwind + $group) or a denormalised pre-aggregated collection.
  3. The DynamoDB design picks the customer as the partition key. Every read for one customer is a single partition scan — extremely fast. Aggregations across all customers require either a Lambda fan-out or a downstream warehouse (Redshift, Snowflake).
  4. The winner depends on the dominant pattern. If 95% of reads are "load one order," document or single-table wins. If reporting is the dominant pattern, relational wins. Most teams run both — primary store for OLTP, warehouse for OLAP.

Output.

Access pattern Postgres MongoDB DynamoDB
Load one order + items 2 joins 1 doc fetch 1 partition query
Quarterly revenue 1 aggregation aggregation pipeline Lambda fan-out + warehouse
Add new field to items ALTER TABLE no schema change no schema change
Strong consistency default default per replica optional read flag

Rule of thumb. Before you pick an engine, list every query the application will run, classify each as point-read, range-read, aggregation, or join, and count how often each runs. The most-frequent pattern determines the engine; the long-tail patterns determine whether you also need a warehouse or a secondary index.

Worked example — "we picked Mongo for schema flexibility" three years later

Detailed explanation. A 2022 team picked MongoDB because the product roadmap was unclear and the schema would evolve. In 2026 they are paying for it: every analytics report requires either a custom aggregation pipeline or an ETL job into Snowflake, and joins between collections are application-side. A senior architect asks: what would the same team build today?

Question. Given the same "evolving schema" constraint, would 2026 Postgres + JSONB be a better fit than MongoDB? Show how the same nested-order document maps onto a hybrid Postgres schema and what the team trades off.

Input — the same Mongo document.

{ order_id: 42, status: "placed", items: [{ sku: "A", qty: 2 }, { sku: "B", qty: 1 }] }
Enter fullscreen mode Exit fullscreen mode

Code.

-- Postgres hybrid: relational columns + JSONB for the evolving part
CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY,
    customer_id  BIGINT NOT NULL REFERENCES customers(customer_id),
    order_date   DATE NOT NULL,
    status       TEXT NOT NULL,
    payload      JSONB           -- nested items, custom fields, A/B flags
);

-- GIN index on the JSONB column for ad-hoc filters
CREATE INDEX idx_orders_payload_gin ON orders USING GIN (payload);

-- Query items inside the JSONB column
SELECT order_id, payload->'items' AS items
FROM orders
WHERE payload @> '{"items": [{"sku": "A"}]}'::jsonb
  AND order_date >= '2026-01-01';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The relational columns (order_id, customer_id, order_date, status) cover the stable, joinable, indexable surface — every report that pivots on these columns runs at relational speed.
  2. The JSONB column absorbs everything that evolves — nested items, A/B flags, experimental fields. Adding a new key requires no schema migration.
  3. The GIN index makes containment queries (@>) fast even on the JSONB column. Without it, Postgres falls back to a sequential scan on every JSONB filter.
  4. The trade-off versus Mongo is honest: Postgres is harder to scale horizontally past ~5 TB without Citus, but the team keeps JOINs, ACID transactions across rows, and one engine to operate. In 2026 that trade-off favours Postgres for most SaaS teams under 5 TB.

Output.

Property MongoDB Postgres + JSONB
Nested documents native JSONB column
Schema-on-read yes yes (JSONB)
Joins across collections application-side native SQL
ACID across rows per-doc full table-wide
Scale ceiling (single cluster) high ~5 TB (without Citus)

Rule of thumb. If your data fits one of "evolving nested document," and joins to other tables are rare, Postgres + JSONB is the 2026 default. Reach for MongoDB only when the per-document query is dominant AND the cluster will outgrow Postgres's vertical limits.

Worked example — the "model access pattern first" rule on DynamoDB

Detailed explanation. DynamoDB punishes ad-hoc schema design more than any other engine. Without partitioning by the right key, every query is a full table scan; pick the wrong sort key and you cannot answer the most common question without a global secondary index. The senior approach is to list every access pattern before you write a single PUT.

Question. For a leaderboard feature — "top 10 players this week per game" plus "all scores for one player" — write the DynamoDB single-table design. Show why the access pattern drives the partition key.

Input. Two access patterns: (1) top-N scores per (game, week), (2) all scores for one player across games.

Code.

// Pattern 1 — "top 10 this week per game"
// PK = "GAME#tetris#WEEK#2026-W23"
// SK = score (sortable, DESCENDING via Query ScanIndexForward=false)

// Pattern 2 — "all scores for one player"
// GSI1PK = "PLAYER#alice"
// GSI1SK = "GAME#tetris#WEEK#2026-W23"

// Sample item
{
  "PK": "GAME#tetris#WEEK#2026-W23",
  "SK": 9_999_999,
  "GSI1PK": "PLAYER#alice",
  "GSI1SK": "GAME#tetris#WEEK#2026-W23",
  "player": "alice",
  "score": 9_999_999
}
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Pattern 1 partitions by (game, week) and sorts by score. The "top 10 this week" query is a single Query call on the primary index, returning the highest 10 scores in one round trip.
  2. Pattern 2 needs the player as the partition key. We add a Global Secondary Index (GSI1) with GSI1PK = PLAYER#alice so that "all scores for alice" is also a single Query call.
  3. Both patterns are O(1) on the dominant axis — partition key — which is what DynamoDB rewards. A naive design with PK = PLAYER#alice would force a global scan for pattern 1.
  4. The cost is duplication: every item is indexed twice (primary + GSI1). DynamoDB charges for that, but the alternative — fan-out reads or downstream Lambda aggregation — is more expensive at scale.

Output.

Access pattern Index used Cost (RCU per query)
Top 10 this week per game Primary (PK + SK) 1 RCU
All scores for one player GSI1 1 RCU
Top 10 all-time per game (no week) Primary scan full-partition scan

Rule of thumb. Before you provision any DynamoDB table, write down every access pattern, label each as "point read," "range scan," or "aggregation," and design the partition key + sort key + GSIs so that every pattern is a Query, never a Scan. If you cannot do this, you do not yet know your domain well enough for DynamoDB.

SQL interview question on SQL vs NoSQL framing

A senior interviewer often opens with: "We're building a logistics platform — 50 million rides per day, drivers query 'my last 10 trips' constantly, ops queries 'all rides in a city this hour' for live dashboards, finance runs nightly settlement aggregations. SQL or NoSQL?" The probe tests whether the candidate decomposes by access pattern or jumps to a flavour-of-the-month engine.

Solution Using a workload-first decomposition

Access pattern 1 — "driver's last 10 trips"
  -> point read by driver_id, sorted desc by trip_time
  -> partition key = driver_id, sort key = trip_time
  -> DynamoDB Query call, single-digit ms

Access pattern 2 — "all rides in city X this hour"
  -> range scan by (city, hour)
  -> partition key = (city, hour_bucket), sort key = trip_time
  -> DynamoDB GSI or Cassandra wide-row

Access pattern 3 — "nightly settlement aggregation"
  -> OLAP scan across millions of rows
  -> NOT DynamoDB / Cassandra — too expensive per RCU
  -> Stream changes to S3 + Snowflake / BigQuery / Redshift
  -> Run SQL aggregations in the warehouse
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Pattern Frequency Latency budget Engine Reason
Driver's last 10 trips 10k QPS < 50 ms P99 DynamoDB point read by partition key
Rides in city this hour 1k QPS < 200 ms P99 DynamoDB GSI / Cassandra range scan inside one partition
Nightly settlement 1 / day minutes ok Snowflake / BigQuery full-table aggregation
Audit trail / GDPR 100 QPS seconds ok S3 + Athena append-only, cheap storage

After the trace, the architecture is obvious: two engines, not one. The hot path (patterns 1 + 2) runs on DynamoDB with a carefully designed key + GSI; the analytical path (pattern 3) runs on a columnar warehouse fed by Change Data Capture. Pattern 4 lives on object storage because the read pattern is "rare and forgiving."

Output:

Tier Engine Cost driver Latency
Hot OLTP DynamoDB RCU/WCU per access pattern single-digit ms
Warm OLAP Snowflake compute warehouse seconds minutes
Cold archive S3 GB-month seconds

Why this works — concept by concept:

  • Workload-first decomposition — every access pattern is named, classified (point / range / aggregation), and quantified (QPS, latency budget). The engine falls out of the requirements, not the other way around.
  • Two-tier OLTP + OLAP — the same data lands in two engines because the read patterns are fundamentally different. CDC keeps them in sync; the warehouse is eventually consistent (minutes), which is fine for nightly settlement.
  • Partition-key design first — for DynamoDB, the partition key choice locks in 80% of performance. Driver-id + trip-time partitions cleanly; pick a low-cardinality key (e.g. country) and you create hot partitions and throttling.
  • No relational primary store — at 50M rides/day, a single Postgres instance cannot keep up at single-digit-ms P99 without significant engineering. Spanner could, but at a higher dollar cost.
  • Object storage as the long-tail — S3 + Athena is the universal "I do not know how often this will be queried" answer. Cheap to store, payable per query, no schema migrations.
  • Cost — DynamoDB cost scales linearly with QPS; warehouse cost scales with the data volume of nightly scans; S3 cost is negligible. The architecture is cheap precisely because each engine matches its workload.

Database design
Topic — database
Database design problems (general)

Practice →


2. SQL vs NoSQL decision matrix

Eight axes, two columns — the matrix that turns sql vs nosql into a checklist

The mental model in one line: eight axes (schema, joins, transactions, consistency, scale model, query language, indexing, ops cost) each have a "SQL-favourable" answer and a "NoSQL-favourable" answer — count which column wins for your workload. Once the matrix is on paper, the engine choice usually decides itself.

Eight-row decision matrix comparing SQL and NoSQL on schema, joins, transactions, consistency, scale, query language, indexing, and ops cost — each row showing the SQL answer on the left and the NoSQL answer on the right with a coloured marker for which is stronger; on a light PipeCode card.

The matrix in one table.

Axis SQL (Postgres / MySQL / Spanner) NoSQL (Mongo / DynamoDB / Cassandra)
Schema rigid, migration-managed flexible, evolving documents
Joins native, cost-based optimiser application-side or denormalised
Transactions full ACID across rows + tables single-doc / single-partition (mostly)
Consistency strong by default tunable / eventual / per-request
Scale model vertical + read replicas horizontal · partition-key sharding
Query language SQL (declarative) MQL / CQL / API (imperative)
Indexing B-tree, partial, expression, GIN limited secondary indexes (per engine)
Ops cost lower for OLTP up to ~5 TB lower for unbounded write throughput

Row-by-row commentary.

  • Schema. Postgres requires ALTER TABLE for every change; Mongo accepts a new field on the next insert. The 2026 hybrid is Postgres + JSONB — schema-on-read inside a relational engine.
  • Joins. Postgres's optimiser cost-models hash, merge, and nested-loop joins; DynamoDB has no JOIN clause at all. Cassandra has JOIN syntax only in DataStax Enterprise — the open-source path is application-side.
  • Transactions. Postgres ACID is genuinely ACID: a transaction can span dozens of rows across many tables. Mongo got multi-document transactions in 4.0 (2018) but the recommended pattern is still "one document = one unit of work." DynamoDB supports TransactWriteItems up to 100 items but charges 2× the WCU.
  • Consistency. Postgres is strongly consistent by default on the primary; read replicas drift by milliseconds. Cassandra is tunable per query (ONE, QUORUM, ALL); DynamoDB is eventually consistent by default with a strongly-consistent read flag.
  • Scale model. Postgres scales vertically (bigger box) and via read replicas; horizontal write scale needs Citus or sharding-as-application-logic. Mongo and Cassandra shard natively on a chosen key.
  • Query language. SQL is declarative — "what" not "how." MQL, CQL, and DynamoDB's API are progressively more imperative — they force you to think about partitions, indexes, and access patterns up-front.
  • Indexing. Postgres has the richest index set (B-tree, hash, GIN, GiST, BRIN, partial, expression). Mongo has secondary indexes per collection but limits to 64 per collection. DynamoDB charges separately for every Global Secondary Index.
  • Ops cost. A single Postgres instance below 5 TB is dramatically cheaper to operate than any sharded NoSQL cluster. Above 100 TB or 50k writes/s, the cost balance flips.

Reality checks for 2026.

  • Postgres + JSONB blurs the schema row and partially the indexing row — GIN on JSONB is a near-substitute for Mongo's document indexes.
  • Distributed SQL (Spanner / CockroachDB) closes the scale-model row — they ship horizontal sharding with ACID, at a premium.
  • DynamoDB transactional writes (since 2018) and Mongo multi-document transactions (since 4.0) blur the transactions row — but both still recommend single-doc design as the happy path.
  • Most production teams run both — Postgres for the relational core, one NoSQL engine for the hot path that does not fit relational.

Common interview probes.

  • "Why is JOIN expensive in NoSQL?" — because the data is sharded across nodes by partition key; pulling matched pairs across shards is a cross-node operation, which most engines refuse to do at all.
  • "Why is the schema flexible in NoSQL?" — because there is no central schema registry per collection; the application enforces shape. The cost is that schema drift becomes a runtime bug, not a deploy-time one.
  • "Why is sharding harder in SQL?" — because foreign keys, joins, and ACID transactions all assume a single coherent view of the data. Sharding breaks that assumption; distributed SQL engines spend most of their engineering preserving it.

Worked example — schema evolution: ALTER TABLE vs no-op

Detailed explanation. A product team adds a loyalty_tier field to the user record. In Postgres this is an ALTER TABLE migration — locking, fast on small tables, slow with high concurrency on large tables. In Mongo it is a no-op until the next insert or update. The trade-off is "deploy-time pain" vs "runtime schema drift."

Question. Compare the schema-evolution story for adding loyalty_tier TEXT to a 200 GB users table in (a) Postgres and (b) MongoDB. What goes wrong if you skip the deploy-time pain?

Input. A users table / collection with 50 million rows / documents and steady write traffic.

Code.

-- (a) Postgres — explicit migration
ALTER TABLE users ADD COLUMN loyalty_tier TEXT;
-- Modern PG (>= 11): adding a nullable column is instant (metadata-only).
-- Backfill: UPDATE users SET loyalty_tier = 'silver' WHERE created_at < '2025-01-01';
-- The backfill is the slow part — batch it.
Enter fullscreen mode Exit fullscreen mode
// (b) MongoDB — no-op until next write
db.users.updateOne(
  { _id: ObjectId("...") },
  { $set: { loyalty_tier: "silver" } }
);
// Pre-existing documents still lack the field.
// Application code MUST handle the "field absent" case:
const tier = user.loyalty_tier ?? "bronze";
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The Postgres migration is metadata-only for adding a nullable column in PG 11+ — the column definition is recorded in the catalog, no row rewrite required. The backfill is what costs minutes-to-hours, and it is batchable.
  2. The Mongo update is per-document — there is no equivalent of ALTER COLLECTION. Pre-existing documents lack the new field until they are touched by a write.
  3. The Mongo "no-op" hides a runtime cost: every read path must defensively handle the "field absent" case. A new query find({ loyalty_tier: "gold" }) returns ZERO documents from the pre-existing population — they have no loyalty_tier to compare against.
  4. The fix on the Mongo side is a one-time backfill: db.users.updateMany({ loyalty_tier: { $exists: false } }, { $set: { loyalty_tier: "bronze" } }). Same shape as the Postgres backfill, different mental model.

Output.

Step Postgres MongoDB
Add field ALTER TABLE (metadata-only) implicit, on next write
Backfill existing rows UPDATE in batches updateMany with $exists: false
Read code column always present must handle absent field
Constraint enforcement NOT NULL, CHECK at DB application-side or $jsonSchema

Rule of thumb. "Flexible schema" does not mean "no schema" — it means "schema enforcement moves from the database to the application." If the team has the discipline to enforce shape in code reviews and a schema-validation layer (Mongo $jsonSchema, application-side Zod / Pydantic), Mongo's flexibility is real. Without that discipline, schema drift becomes a runtime bug magnet.

Worked example — joins: native SQL vs application-side fan-out

Detailed explanation. The single most common production reason teams regret picking NoSQL is "we need joins now." Mongo can $lookup, Cassandra cannot join at all, DynamoDB cannot either. Application-side joins fan out N round trips and lose the planner's ability to reorder or hash.

Question. "Show every order, with the customer's name and the product name on every line item." Write the Postgres SQL and the equivalent Mongo / DynamoDB approach. Compare round trips.

Input. Three entities — customers, orders, products. 100 orders to render.

Code.

-- Postgres — one query, one round trip
SELECT
    o.order_id,
    c.name        AS customer_name,
    oi.qty,
    p.name        AS product_name
FROM orders        o
JOIN customers     c  ON c.customer_id = o.customer_id
JOIN order_items   oi ON oi.order_id   = o.order_id
JOIN products      p  ON p.sku         = oi.sku
ORDER BY o.order_date DESC
LIMIT 100;
Enter fullscreen mode Exit fullscreen mode
// MongoDB — $lookup pipeline (one round trip but cross-collection)
db.orders.aggregate([
  { $lookup: { from: "customers", localField: "customer_id", foreignField: "_id", as: "customer" }},
  { $unwind: "$items" },
  { $lookup: { from: "products", localField: "items.sku", foreignField: "sku", as: "product" }},
  { $sort: { order_date: -1 }},
  { $limit: 100 }
]);
// $lookup is fine for small joins; performance dies past tens of thousands of matched docs.
Enter fullscreen mode Exit fullscreen mode
# DynamoDB — fan-out from the application
orders = ddb.query(TableName="orders", ...)["Items"][:100]
customer_ids = {o["customer_id"] for o in orders}
customers = ddb.batch_get_item(RequestItems={
    "customers": {"Keys": [{"id": cid} for cid in customer_ids]}
})
# Then fan out again for products. Three round trips, manual stitching.
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Postgres runs the whole shape in one query. The planner chooses hash joins, uses indexes on the foreign keys, and ships the result in a single round trip.
  2. Mongo's $lookup is the closest thing to a JOIN in the document world. It works, but the engine cannot optimise it the way Postgres does — large $lookup joins are notoriously slow.
  3. DynamoDB has no JOIN. The application code does the work: query orders, collect customer IDs, batch-get customers, collect SKUs, batch-get products, stitch in code. Three round trips minimum.
  4. The latency cost scales with the number of collections joined. SQL collapses N joins into one network hop; NoSQL multiplies them.

Output.

Engine Round trips Lines of code Planner optimisation
Postgres 1 ~10 SQL yes (hash / merge / nested-loop)
MongoDB 1 (pipeline) ~15 partial ($lookup is single-shot)
DynamoDB 3+ ~50 none (application is the planner)

Rule of thumb. If your domain has more than two heavily-joined entities, SQL wins on developer time, query planning, and round-trip count. Reach for NoSQL only when the unit of work is naturally self-contained — one order = one document — or when the access pattern is so dominated by one key that joins are rare.

Worked example — consistency: strong default vs tunable per-request

Detailed explanation. "Consistency" in SQL means ACID — every committed transaction is visible to subsequent transactions on the same connection. In NoSQL, consistency is usually tunable: Cassandra lets you pick per query (ONE, LOCAL_QUORUM, QUORUM, ALL); DynamoDB defaults to eventual reads with a ConsistentRead=true flag for strong reads.

Question. A banking app reads "current account balance." Why is the SQL default safe, and what flag would you set in DynamoDB to match it?

Input. A transfer just credited $100 to account A; the client reads "balance" 1 ms later.

Code.

-- Postgres — strong consistency on the primary
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 'A';
COMMIT;

-- Subsequent read on the same primary sees the new value.
SELECT balance FROM accounts WHERE id = 'A';
Enter fullscreen mode Exit fullscreen mode
# DynamoDB — eventually consistent by default
resp = ddb.get_item(TableName="accounts", Key={"id": {"S": "A"}})
# May return the OLD value if the read hits a replica that has not caught up.

# Force a strongly consistent read
resp = ddb.get_item(TableName="accounts", Key={"id": {"S": "A"}},
                    ConsistentRead=True)
# Now guaranteed to see the latest committed value.
# Costs 2x the RCU.
Enter fullscreen mode Exit fullscreen mode
# Cassandra — tunable per query
INSERT INTO accounts (id, balance) VALUES ('A', 100)
  USING CONSISTENCY QUORUM;

SELECT balance FROM accounts WHERE id = 'A'
  USING CONSISTENCY QUORUM;
-- W=QUORUM, R=QUORUM => strongly consistent if R+W > N
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Postgres on the primary is strongly consistent by default — a committed write is visible on the next read. Read replicas may lag (milliseconds), but the primary is the source of truth.
  2. DynamoDB's default eventually consistent read may hit a replica that lags by up to ~1 second after a write. For a balance, that is unacceptable — flip ConsistentRead=true. The cost is 2× the RCU.
  3. Cassandra's R + W > N rule produces strong consistency when the replicated copies overlap: with N=3, W=QUORUM=2, R=QUORUM=2, at least one of the 2 read replicas has the latest committed write.
  4. The interview line is "every NoSQL engine is tunable along the consistency / latency / availability spectrum — pick the trade-off per query, not per engine."

Output.

Engine Default Strong-consistency cost
Postgres primary strong baseline
Postgres replica eventual (ms lag) route to primary
DynamoDB eventual (~1 s) 2× RCU + ConsistentRead=true
Cassandra per-query R + W > N

Rule of thumb. Never rely on the engine default for consistency-critical reads. Bank balance, inventory count, "did the password change?" — all need an explicit strong-consistency flag. Eventually consistent reads are fine for "user's last 10 trips," "show product reviews," and "render a feed."

SQL interview question on choosing between SQL and NoSQL

A senior interviewer often frames this as: "Your team is debating Postgres vs MongoDB for a new SaaS app. The team is small, the schema will evolve, and traffic is projected at 1k QPS in year one. Make the call and defend it." The probe tests whether the candidate weighs operational simplicity against speculative future scale.

Solution Using a workload + ops calculus

1. Today's traffic — 1k QPS
   * Both Postgres and Mongo handle 1k QPS on a single $50/month box.
   * Neither is the bottleneck. Engine choice is about people, not throughput.

2. Schema flexibility
   * Mongo: no schema, validate in application.
   * Postgres + JSONB: schema for the stable surface, JSONB for the evolving part.
   * Tie, with a slight Postgres edge for the relational core.

3. Team size — small
   * Operating 1 Postgres is one runbook.
   * Operating 1 Mongo is one runbook.
   * Tie. The cost is in years 2-3, not year 1.

4. Future scale — projected 100k QPS year three
   * Postgres can hit 100k QPS read with replicas; 100k QPS write needs Citus.
   * Mongo shards natively, but the team will pay the operational cost early.
   * Edge: depends on workload shape.

5. The recommendation
   * Start with Postgres. Add Redis cache when read traffic warrants.
   * Postgres + JSONB covers schema flexibility for the next 18 months.
   * Re-evaluate the engine choice when actual scale (not projected) demands it.
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Decision axis Postgres weight MongoDB weight Winner
Today's traffic even even tie
Schema flexibility strong (+ JSONB) strong tie
Team ops simplicity strong strong tie
ACID transactions strong weaker Postgres
Future horizontal scale weaker strong MongoDB
Ecosystem (tools, dbt, BI) strong weaker Postgres

5 out of 6 axes either tie or favour Postgres. The only one that genuinely favours Mongo is "future horizontal scale" — and "future" is doing a lot of work in that sentence.

Output:

Recommendation Postgres + JSONB
Why wins or ties on every axis except speculative future scale
Add later Redis cache, read replicas, Citus only if scale demands
Anti-pattern optimising for projected 100k QPS instead of actual 1k QPS

Why this works — concept by concept:

  • Optimise for today's workload, not next year's hypothesis — engineering teams routinely over-engineer for a scale that never arrives. The cheapest year-one architecture is the one that is easiest to operate now and can be migrated later.
  • Postgres is the safe default — the modern engine has eaten most of the easy NoSQL use cases via JSONB, partial indexes, and the wider tooling ecosystem (dbt, BI, ORMs, replicas).
  • Re-evaluate on signal, not speculation — "we hit 50k QPS sustained and Postgres is the bottleneck" is a clear migration trigger. "We might hit 100k QPS in year three" is not.
  • Cache before sharding — Redis or Postgres's own buffer cache absorbs the read load that NoSQL is famous for handling. Most "we need NoSQL for reads" arguments dissolve in front of a properly-sized cache.
  • Two engines is a real cost — picking Mongo now means the team operates two database engines forever (Postgres for the warehouse exports, Mongo for the primary). That is not free.
  • Cost — Postgres start-up cost is one engine, one runbook. Mongo start-up cost is one engine, one runbook, plus a different mental model for transactions. They are close; the long-tail cost favours Postgres for most teams.

SQL
Topic — sql
SQL practice problems for data engineering

Practice →


3. The NoSQL family map

nosql is four shapes, not one — document, key-value, wide-column, graph

The mental model in one line: every NoSQL engine belongs to one of four families, and each family is optimised for one data shape and one query shape. Once you know the family map, "should we use Mongo or Cassandra?" is rarely a hard question — they solve different problems.

NoSQL family map diagram — four engine groups arranged as quadrants: document (Mongo, Couchbase), key-value (DynamoDB, Redis), wide-column (Cassandra, ScyllaDB, HBase), graph (Neo4j, Neptune, TigerGraph) — each with use cases and an indicative data-shape icon; on a light PipeCode card.

The four families in one table.

Family Data shape Query shape Engines Use cases
Document nested JSON per-document fetch + ad-hoc filter MongoDB, Couchbase, AWS DocumentDB CMS, product catalogues, mobile app backends, event payloads
Key-value opaque value behind a key GET / PUT / DELETE by key DynamoDB, Redis, Riak, Memcached session storage, shopping cart, leaderboard, hot cache
Wide-column sparse rows in a partition keyed by primary key range scan inside a partition Cassandra, ScyllaDB, HBase, Bigtable IoT telemetry, event logs, fan-out feeds, multi-DC writes
Graph nodes + edges with properties traversal across N hops Neo4j, Amazon Neptune, TigerGraph fraud detection, recommendation, knowledge graph, social network

Document — MongoDB and friends.

  • Data unit. A document is a nested JSON-like object: { _id, name, items: [...], metadata: {...} }. Up to 16 MB per document in Mongo.
  • Strength. "Load one entity with everything it owns" is one round trip. No joins. Schema-on-read.
  • Weakness. Cross-document queries (joins, aggregations) are slower than the relational equivalent. $lookup exists but is not free.
  • Indexing. B-tree secondary indexes per collection; text indexes; geospatial; compound indexes. Up to 64 per collection.
  • Sweet spot. Anything where the natural unit of work is a self-contained nested object — CMS articles, product catalogue entries, user profiles with embedded preferences.

Key-value — DynamoDB and Redis.

  • Data unit. A key maps to an opaque value (or, in DynamoDB, a structured item with attributes). Reads are GET by primary key.
  • Strength. Lowest possible read / write latency at scale. Predictable per-operation cost. Horizontal scale is the design.
  • Weakness. No JOIN, no GROUP BY, no ad-hoc query. Range queries need a sort key (DynamoDB) or you fall back to scans.
  • Indexing. DynamoDB has Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs); Redis has no secondary indexing — the application maintains separate keys for separate access patterns.
  • Sweet spot. Hot path of any high-QPS application: session store, shopping cart, leaderboard, real-time scoreboard.

Wide-column — Cassandra and ScyllaDB.

  • Data unit. A partition contains many rows, each keyed by a clustering key. Effectively a "wide row" of arbitrary length.
  • Strength. Unbounded write throughput. Multi-DC replication is native. Time-series-shaped data (one partition per device-day) scales linearly.
  • Weakness. Reads outside the partition key are expensive or impossible. No JOIN. Secondary indexes exist but are discouraged for high-cardinality columns.
  • Indexing. Primary key (partition + clustering). Secondary indexes per column but with sharp limits. Materialised views as a denormalised access pattern.
  • Sweet spot. Write-heavy workloads: IoT telemetry (one device-hour per partition), event logs, fan-out feeds, multi-region writes.

Graph — Neo4j and Neptune.

  • Data unit. Nodes (entities) and edges (relationships). Both have properties.
  • Strength. Traversals of arbitrary depth are constant-time per hop — relationship lookups are O(degree), not O(table size).
  • Weakness. Pure aggregations across all nodes are slow compared to a columnar warehouse. Not the right tool for analytics.
  • Indexing. Index on node label + property. Edges are first-class — no JOIN table required.
  • Sweet spot. Anything where the answer involves "find all X that are N hops from Y": fraud rings, recommendation engines, social networks, knowledge graphs.

Cross-cutting notes.

  • Mongo can do graph-ish via $graphLookup — recursive lookup up to a max depth. Works for shallow walks; Neo4j wins past 3 hops.
  • Redis can be cache, KV store, pub-sub, geospatial, and streams — multi-role. The 2024 license change pushed teams to Valkey or DragonflyDB but the data model is identical.
  • Cassandra and ScyllaDB share the same data model and query language — Scylla is a C++ rewrite for lower latency at the same throughput.
  • All four families ship managed offerings on AWS, GCP, Azure — DynamoDB (AWS-only), DocumentDB, ElastiCache, Keyspaces, Neptune; GCP Bigtable, Memorystore; Azure Cosmos DB (multi-model).

Worked example — same data, four shapes

Detailed explanation. A movie database lists movies, their actors, and viewer ratings. Each NoSQL family stores this differently. Seeing the same data in four shapes is the fastest way to internalise the family map.

Question. Sketch the storage layout for "movies + actors + ratings" in each of the four NoSQL families. Which family would you choose for the dominant query "find every movie an actor is in"?

Input. Three entities, with the relationship "actor ↔ movie" being many-to-many.

Code.

// (1) Document — Mongo
db.movies.insertOne({
  _id: "tt0133093",
  title: "The Matrix",
  year: 1999,
  actors: [{ id: "nm0000206", name: "Keanu Reeves" }, ...],
  ratings: { imdb: 8.7, votes: 1_900_000 }
});
Enter fullscreen mode Exit fullscreen mode
// (2) Key-value — DynamoDB single-table
PK = "MOVIE#tt0133093"   SK = "META"             -> movie attributes
PK = "MOVIE#tt0133093"   SK = "ACTOR#nm0000206"  -> link to actor
PK = "ACTOR#nm0000206"   SK = "MOVIE#tt0133093"  -> reverse link (or GSI)
Enter fullscreen mode Exit fullscreen mode
// (3) Wide-column — Cassandra
CREATE TABLE movies_by_actor (
  actor_id   text,
  movie_id   text,
  movie_title text,
  PRIMARY KEY (actor_id, movie_id)
);
-- One partition per actor; clustering key = movie_id.
-- Query "every movie Keanu is in" = one partition read.
Enter fullscreen mode Exit fullscreen mode
// (4) Graph — Neo4j
(Keanu:Actor {id: "nm0000206"}) -[:ACTED_IN]-> (Matrix:Movie {id: "tt0133093"})
// Query: MATCH (a:Actor {id: "nm0000206"})-[:ACTED_IN]->(m:Movie) RETURN m
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The document design embeds actors inside the movie. "Find every movie an actor is in" requires either a secondary index on actors.id (cheap in Mongo) or a scan.
  2. The key-value design models both directions explicitly. The reverse link ACTOR#... -> MOVIE#... makes "every movie Keanu is in" a single Query call.
  3. The wide-column design partitions by actor_id. "Every movie Keanu is in" is a single partition read — exactly what Cassandra is optimised for.
  4. The graph design represents the relationship as a first-class edge. The traversal (actor)-[:ACTED_IN]->(movie) is constant-time per hop, regardless of dataset size.

Output.

Family "Every movie Keanu is in" "Every actor in The Matrix" "Recommended for fans of Keanu"
Document secondary index lookup embedded list scan application aggregation
Key-value single Query (reverse link) single Query application aggregation
Wide-column single partition read needs a second table not natively
Graph one-hop traversal one-hop traversal multi-hop traversal (native)

Rule of thumb. If your dominant query is "across N hops of relationships," graph wins by a wide margin. If it is "everything keyed by one entity," wide-column or key-value wins. If it is "load one self-contained record," document wins. Pick the family whose data shape matches your query shape.

Worked example — when MongoDB is the right pick

Detailed explanation. A content management system (CMS) stores articles. Each article has a title, author, body, tags, comments, metadata, and a long tail of custom fields per template. Editorial workflows read one article at a time; analytics is a separate concern.

Question. Why is MongoDB a natural fit for this CMS, and what does the schema look like? How would you handle the "all articles by author X" query?

Input. Articles, authors, comments. Heterogeneous template-driven custom fields.

Code.

db.articles.insertOne({
  _id: ObjectId("..."),
  slug: "nosql-vs-sql",
  title: "NoSQL vs SQL for Data Engineering",
  author: { id: "au-123", name: "Jane Doe" },
  body_html: "...",
  tags: ["nosql", "sql", "postgres", "mongo"],
  comments: [
    { user: "...", text: "Great post!", at: ISODate("...") }
  ],
  metadata: {
    template: "long-form",
    seo: { primary_keyword: "nosql vs sql", search_intent: "informational" },
    custom: { /* whatever the template needs */ }
  }
});

// Secondary index for "all articles by author"
db.articles.createIndex({ "author.id": 1, "published_at": -1 });
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The article is a self-contained document — loading "the article page" is a single findOne() by slug. No joins, one round trip.
  2. Embedded comments scale until a single article hits ~1000+ comments — at which point Mongo recommends splitting into a separate comments collection with article_id as the foreign key.
  3. The "custom" subdocument absorbs template-specific fields without a schema migration. Editorial can ship a new template type without involving the database team.
  4. "All articles by author X" needs a secondary index on author.id — Mongo provides it, the query is fast.

Output.

Query Mongo plan
Load article by slug findOne({slug}) — uses unique index
All articles by author Range scan on {author.id, published_at} index
Articles tagged "nosql" Multikey index on tags
Full-text search text index OR Atlas Search

Rule of thumb. Reach for Mongo when (a) the unit of work is a nested document, (b) schemas drift, and (c) the dominant query is by document key. If you find yourself drawing an ERD with five tables and four foreign keys, you do not have a document — you have a relational schema. Use Postgres.

Worked example — when Cassandra is the right pick

Detailed explanation. An IoT platform ingests 100,000 sensor readings per second from 50,000 devices, replicated across three data centres. The dominant query is "show me the last 24 hours of readings for device X." Postgres cannot keep up at single-digit-ms write latency; MongoDB struggles with the multi-DC active-active write pattern.

Question. Sketch the Cassandra schema for this workload and explain why the partition key is the most important decision.

Input. 100k writes/s, multi-DC, per-device range reads.

Code.

-- Cassandra
CREATE TABLE sensor_readings (
    device_id   text,
    bucket_day  date,
    reading_ts  timestamp,
    temperature double,
    humidity    double,
    PRIMARY KEY ((device_id, bucket_day), reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);

-- Query "last 24 hours for device X" = read 1 partition
SELECT * FROM sensor_readings
WHERE device_id = 'dev-42'
  AND bucket_day IN ('2026-06-05', '2026-06-06')
  AND reading_ts > '2026-06-05 14:00:00';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The partition key is (device_id, bucket_day) — a composite key that ensures one partition per device per day. Daily buckets cap partition size at ~86,400 rows (one reading per second) — well within Cassandra's recommended 100 MB / 100k rows.
  2. The clustering key is reading_ts DESC — so the most-recent readings are at the start of each partition. "Last 24 hours" reads from the start of the partition.
  3. Writes scale horizontally: 100k writes/s spread across many devices means many partitions, which spread across cluster nodes by partition-key hash. No hot partition.
  4. Multi-DC replication is native: declare a keyspace with replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3} and Cassandra handles it.

Output.

Workload property Why Cassandra wins
100k writes/s LSM-tree storage absorbs writes fast
Time-series shape day-bucketed partitions scale linearly
Multi-DC active-active native replication strategy
Per-device range read one partition read
Ad-hoc analytics NOT Cassandra — use a warehouse downstream

Rule of thumb. Cassandra wins when the dominant query is "range scan inside one partition keyed by a hot ID" and the write rate exceeds what any single-node engine can sustain. If you cannot articulate the partition key in one sentence, you are not yet ready to design for Cassandra.

SQL interview question on choosing the right NoSQL family

A senior interviewer often probes: "We're building a real-time fraud detection system — every transaction needs to be checked against the user's last 50 transactions and against a graph of 'devices linked to other users.' What database(s) do you reach for, and why?" The probe tests whether the candidate maps query shape to family.

Solution Using two NoSQL families layered together

1. "User's last 50 transactions" — query shape: range scan by user_id
   * Family: key-value or wide-column.
   * Engine: DynamoDB (single AWS region) or Cassandra (multi-DC).
   * Schema: PK = user_id, SK = txn_ts DESC. Query returns top 50.
   * Latency: single-digit ms P99.

2. "Devices linked to other users (N hops)" — query shape: graph traversal
   * Family: graph.
   * Engine: Neo4j (self-managed) or Amazon Neptune (managed).
   * Schema: (User)-[:OWNS]->(Device)-[:OWNED_BY]->(User). Traverse 2-3 hops.
   * Latency: tens of ms for shallow traversals.

3. The combined pipeline
   * Stream new transactions to both engines (Kafka topic + Lambda fan-out).
   * Fraud check = one DynamoDB query + one Neptune traversal in parallel.
   * Combined P99: max(DynamoDB, Neptune) + scoring logic.

4. Anti-pattern
   * Trying to do the graph traversal inside DynamoDB.
   * Trying to do the per-user range scan inside Neo4j.
   * Each family is bad at the other's job.
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Sub-query Family Engine Cost driver
Last 50 txns per user Key-value / wide-column DynamoDB RCU per query
Devices linked to user Graph Neptune per-traversal compute
Cross-check + score Application Lambda compute ms

Each sub-query lives in the engine that is best at it. The application is the conductor — fans out the parallel reads and combines the answers.

Output:

Tier Engine Reason
Per-user range DynamoDB partition key + sort key
Graph traversal Neptune first-class edges
Streaming ingest Kafka high-throughput append
Audit + replay S3 cheap append-only

Why this works — concept by concept:

  • Layered NoSQL — not "pick one" — different sub-queries demand different families. The mature architecture composes engines, it does not pick a winner.
  • DynamoDB for hot OLTP — single-digit-ms P99 reads with a well-chosen partition key. The whole hot path of "is this transaction suspicious?" runs through it.
  • Graph for relationship shape — fraud rings are graph-shaped: shared devices, shared addresses, shared payment methods. Native graph traversals make "find every account 2 hops from X" tractable.
  • Streaming as the glue — Kafka (or AWS Kinesis) carries new transactions to both engines in parallel. The two engines are eventually consistent within seconds; the application tolerates that.
  • Object storage as the audit log — every transaction also lands in S3 partitioned by day. Cheap, compliant, replayable.
  • Cost — DynamoDB cost ∝ QPS; Neptune cost ∝ compute hours; Lambda cost ∝ executions. Each scales independently with its own demand curve, which is the whole point of decomposed architecture.

Graph
Topic — graph
Graph traversal problems (general)

Practice →


4. The CAP theorem quadrant — where each engine actually sits

cap theorem is two of three under partition — and every modern engine is actually tunable along the spectrum

The mental model in one line: CAP says you can have Consistency + Availability + Partition tolerance — but in a network with partitions (i.e. the real world), you must give up one of C or A; the third (P) is non-negotiable. The senior nuance is that no production engine is purely CP or AP — they ship knobs that let you tune the trade-off per query.

CAP theorem quadrant diagram — triangle of Consistency, Availability, Partition tolerance with database positions marked: CP (HBase, Mongo default, Spanner), AP (Cassandra, DynamoDB default, Riak), CA-flavoured (single-node Postgres / MySQL), plus a PACELC extension footnote; on a light PipeCode card.

The CAP refresher in five bullets.

  • C — Consistency. Every read receives the most recent write (or an error). Linearisability in the strict definition.
  • A — Availability. Every request receives a non-error response (without the guarantee it contains the most recent write).
  • P — Partition tolerance. The system continues to operate despite arbitrary message loss between nodes.
  • Pick 2 under partition. The original theorem says you cannot have all three when the network partitions. In practice, P is forced on you — networks fail — so the real choice is C or A.
  • No production engine is purely CP or AP. Cassandra ships strong-consistency knobs; DynamoDB ships strongly-consistent reads; Mongo ships read concerns. They are tunable along a spectrum.

Where the engines actually sit.

  • CP (consistency + partition tolerance). HBase, MongoDB (default majority read concern), Spanner, ZooKeeper, Etcd. Under partition, the minority side stops serving — preserves consistency.
  • AP (availability + partition tolerance). Cassandra, DynamoDB (default eventually consistent reads), Riak, CouchDB. Under partition, every node keeps serving — preserves availability — and reconciles after.
  • CA-flavoured. Single-node Postgres or MySQL. Strictly, "CA" is impossible under real partitions; in practice, a single node has no partitions to tolerate. The label is sloppy but you will see it in interviews.
  • Tunable. DynamoDB's ConsistentRead=true flag flips it from AP-ish to CP-ish. Cassandra's LOCAL_QUORUM / EACH_QUORUM / ALL flip it along the same axis.

The PACELC extension.

PACELC (pronounced "pass-elk") extends CAP to cover the day-to-day case where there is no partition:

  • P → A vs C (during partition, pick Availability or Consistency).
  • E → L vs C (Else — no partition — pick Latency or Consistency).
  • Most engines are PA / EL — under partition prefer availability, otherwise prefer latency. Cassandra, DynamoDB.
  • Some are PC / EC — under partition prefer consistency, otherwise still prefer consistency. Spanner, HBase.
  • The day-to-day trade-off is L vs C — network partitions are rare, but the latency / consistency choice is constant.

Common interview probes on CAP.

  • "Is Postgres CP or AP?" — single-node Postgres has no partitions; with synchronous replication it is CP; with async replication on the read replica it is AP-ish. Be specific.
  • "What does 'eventually consistent' actually mean?" — given enough time without new writes, every replica converges to the same value. The "given enough time" is doing work.
  • "Why is Spanner exceptional?" — TrueTime (atomic-clock-synchronised) lets it offer external consistency (linearisability) over geographically distributed shards. Trade-off is dollar cost.
  • "What is PACELC and why should I care?" — because partitions are rare; the daily trade-off is latency vs consistency, which PACELC names directly.

Worked example — DynamoDB's tunable consistency in code

Detailed explanation. DynamoDB defaults to eventually consistent reads (lower latency, lower cost). For consistency-critical operations, set ConsistentRead=true. Understanding the cost flip is interview gold.

Question. Given an "account balance" read, show the eventually-consistent and strongly-consistent paths in DynamoDB and explain the cost / latency trade-off.

Input. An account in DynamoDB, with concurrent reads after a write.

Code.

import boto3
ddb = boto3.client("dynamodb")

# Eventually consistent — default. Lower cost. May lag by ~1s.
resp = ddb.get_item(
    TableName="accounts",
    Key={"id": {"S": "A"}}
)
# Costs 0.5 RCU per 4 KB read.

# Strongly consistent. Reads from the primary replica. Guaranteed
# to see every successful write that completed before this call.
resp = ddb.get_item(
    TableName="accounts",
    Key={"id": {"S": "A"}},
    ConsistentRead=True
)
# Costs 1 RCU per 4 KB read — exactly 2x.

# NOTE: ConsistentRead is NOT available on Global Secondary Indexes.
# GSIs are always eventually consistent. If you need strong consistency
# AND a secondary access pattern, design a Local Secondary Index instead
# (limited to same partition key, so adjust the table model).
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The eventually consistent read may return a stale value — a write that committed 100 ms ago may not yet be visible on the replica that served the read. Typical lag: tens of ms; worst case: ~1 second.
  2. The strongly consistent read forces a routing decision: the read goes to the primary replica, which has the authoritative state. Guaranteed to see every prior committed write.
  3. The cost is 2× the RCU — DynamoDB literally charges twice for the privilege of consistency. At scale, this matters.
  4. GSIs cannot be strongly consistent — DynamoDB replicates from the base table to the GSI asynchronously. If you need strong consistency on a non-primary access pattern, use an LSI (limited to same partition key) or change the base table design.

Output.

Read flavour Latency P99 Cost per 4 KB Use case
Eventually consistent ~5–10 ms 0.5 RCU reads where ~1s lag is fine
Strongly consistent ~10–20 ms 1 RCU balance, password, "did I just buy this?"
GSI read always eventually consistent 0.5 RCU secondary access patterns

Rule of thumb. Default to eventually consistent reads. Flip to ConsistentRead=true only for the small subset of reads that genuinely require it — balance, lock acquisition, post-write read-your-own-writes. The cost difference is real at high QPS.

Worked example — Cassandra's tunable consistency by query

Detailed explanation. Cassandra ships per-query consistency levels — you set ONE, LOCAL_QUORUM, QUORUM, EACH_QUORUM, or ALL on every read and write. The classic R + W > N rule from Dynamo's 2007 paper gives strong consistency when the read and write quorums overlap.

Question. With replication factor N = 3 in a single DC, list the (W, R) combinations that produce strong consistency, and explain when you would pick EACH_QUORUM across two DCs.

Input. A keyspace with RF = 3 per DC.

Code.

-- Strong consistency requires R + W > N
-- For N = 3:
--   W = QUORUM (2) and R = QUORUM (2)  -> 2 + 2 = 4 > 3 (yes)
--   W = ONE  and R = ALL  (3)            -> 1 + 3 = 4 > 3 (yes, slow)
--   W = ALL  and R = ONE                 -> 3 + 1 = 4 > 3 (yes, fast read, slow write)

INSERT INTO accounts (id, balance) VALUES ('A', 100)
  USING CONSISTENCY QUORUM;

SELECT balance FROM accounts WHERE id = 'A'
  USING CONSISTENCY QUORUM;

-- Multi-DC: EACH_QUORUM forces a quorum in EVERY DC for the write.
INSERT INTO accounts (id, balance) VALUES ('A', 100)
  USING CONSISTENCY EACH_QUORUM;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. R + W > N is the canonical rule. With N=3, picking W=QUORUM (2) and R=QUORUM (2) guarantees overlap on at least 1 replica between the write quorum and the read quorum.
  2. LOCAL_QUORUM does the same but only within the local DC — fast, single-DC strong consistency. Best default for most workloads.
  3. EACH_QUORUM enforces a quorum in every DC. Slow (cross-DC latency) but guarantees that no DC is allowed to commit without the others knowing. Use for global financial state.
  4. W=ONE, R=ALL is fast write + slow read; W=ALL, R=ONE is the opposite. Pick the asymmetry based on which is more frequent in your workload.

Output.

W R R + W vs N Consistency
ONE ONE 2 < 3 eventual
QUORUM QUORUM 4 > 3 strong (single DC)
LOCAL_QUORUM LOCAL_QUORUM within DC strong locally, eventual across DCs
EACH_QUORUM LOCAL_QUORUM per-DC strong across DCs (slow)

Rule of thumb. Pick LOCAL_QUORUM for both reads and writes as the default in any multi-DC Cassandra deployment. Override to EACH_QUORUM only for the small subset of writes that must be visible globally before the application acknowledges them.

Worked example — Spanner / CockroachDB and the "external consistency" tier

Detailed explanation. Spanner uses Google's TrueTime API — atomic-clock-synchronised across data centres — to offer external consistency (linearisability) over a globally distributed database. CockroachDB does the same conceptually with a software clock + leader leases. Both are PC / EC — they pick consistency under partition AND in the no-partition case.

Question. Why is Spanner exceptional in the CAP landscape, and when would you actually need it?

Input. A workload that needs ACID transactions across multiple data centres.

Code.

-- Spanner SQL — looks like Postgres
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT;
-- Both rows may live in different shards, in different DCs.
-- Spanner uses TrueTime to assign a global timestamp.
-- Every read at or after that timestamp sees both updates.
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. TrueTime gives every Spanner node a tight uncertainty bound on the current time (typically < 7 ms). Transactions wait out this uncertainty before committing, ensuring that the commit timestamp is unambiguously after every prior transaction's timestamp.
  2. External consistency means: if transaction T1 commits before T2 starts (in real-world wall-clock time), T2 sees T1's effects. Stronger than serialisability — it ties the order to the physical world.
  3. Reads at a snapshot timestamp can be served from any replica that has caught up to that timestamp — so reads are cheap and parallel.
  4. The dollar cost is significant — Spanner's premium pricing reflects the engineering. CockroachDB is an open-source approximation; YugabyteDB is another.

Output.

Property Spanner DynamoDB Global Tables Cassandra multi-DC
Cross-DC ACID yes no (last-write-wins) no (per-DC consistency)
External consistency yes no no
Latency P99 ~10s ms cross-DC single-digit ms local single-digit ms local
Cost premium RCU + storage self-managed

Rule of thumb. Spanner is the right tool when (a) you need ACID transactions across multiple data centres AND (b) the dollar cost is acceptable. Most workloads don't need (a); for those that do, no other engine matches it.

SQL interview question on applying CAP to a real workload

A senior interviewer often frames this: "You're designing a global inventory system — a customer in Europe shouldn't be able to buy the last item if a US customer just bought it. What database(s), what consistency model?" The probe tests whether the candidate reaches for the right CAP corner.

Solution Using Spanner (or a distributed lock) for the inventory decrement

Requirement: inventory.qty must never go below 0 globally.

1. Option A — Spanner / CockroachDB
   * Inventory row lives in one row, replicated across DCs.
   * Decrement in a transaction with WHERE qty > 0 guard.
   * Spanner serialises the two concurrent buys; one succeeds, one fails.
   * Latency: ~10s ms cross-DC.

2. Option B — Distributed lock (Redis Redlock or Zookeeper)
   * Acquire a lock on inventory[sku] before decrementing.
   * Decrement in any database (DynamoDB conditional write works too).
   * Latency: ~5-20 ms for lock acquisition.
   * Failure mode: lock-server outage stops every buy.

3. Option C — Eventual consistency + reconciliation
   * Allow overselling. Reconcile via cancel-and-refund.
   * Acceptable for low-margin goods; unacceptable for limited drops.
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Concurrent buy A (EU) Concurrent buy B (US) Engine Outcome
BEGIN tx BEGIN tx Spanner both wait for global timestamp
UPDATE qty -= 1 WHERE qty > 0 UPDATE qty -= 1 WHERE qty > 0 Spanner one succeeds, other rolls back
Cassandra with LOCAL_QUORUM Cassandra with LOCAL_QUORUM Cassandra both succeed in local DC; conflict resolves later → oversell
DynamoDB conditional write DynamoDB conditional write DynamoDB one succeeds, other gets ConditionalCheckFailed

DynamoDB conditional writes are the cheapest of the strong-consistency options for a single region. For genuinely global strong consistency, Spanner is the textbook answer.

Output:

Engine Cross-DC strong consistency Cost Latency
Spanner yes premium tens of ms
DynamoDB Global Tables no (last-write-wins) low single-digit ms local
DynamoDB single-region yes via condition low single-digit ms
Cassandra only via EACH_QUORUM self-managed tens of ms cross-DC

Why this works — concept by concept:

  • CAP forces a choice under partition — for global inventory, the choice is consistency (do not oversell) over availability (let some buys time out). That is a business decision, not an engineering one.
  • Conditional writes as a poor man's transaction — DynamoDB's ConditionExpression="qty > :zero" enforces the invariant atomically inside the engine. Cheaper than a full transaction; works inside one region.
  • Spanner's TrueTime — the only engine in the mainstream that provides external consistency over a globally distributed database. The cost is the price of "no oversell, ever, anywhere."
  • Locks as a workaround — distributed locks (Redlock, Zookeeper) decouple the consistency requirement from the database. They work but add a single point of failure.
  • Eventual consistency + reconciliation — the cheapest answer that is usually wrong. Acceptable for fungible inventory; unacceptable for limited drops or auctions.
  • PACELC framing — Spanner is PC / EC (consistency under partition, consistency without). DynamoDB Global Tables is PA / EL (availability under partition, latency without). The engine's PACELC class telegraphs which trade-off it prioritises.
  • Cost — Spanner premium > DynamoDB conditional writes > Cassandra self-managed. Pick the cheapest engine that satisfies the business invariant.

Design
Topic — design
System design problems with consistency trade-offs

Practice →


5. Decision tree — which database for which workload

The decision starts at the query shape, not the engine — six branches, six recommendations

The mental model in one line: start at the root "what is the query shape?", branch on scale, branch on consistency, branch on latency, branch on relationship-shape — every leaf is a specific engine with a defensible reason. The same decision tree applies whether the framing is nosql vs sql or sql vs nosql; the engine choice falls out at the leaf.

Decision tree diagram for picking a database — root question 'What is your query shape?' branching through scale, consistency, latency, and relationship-shape branches, with leaf recommendations: Postgres, DynamoDB, Cassandra, Mongo, Spanner, Neo4j; on a light PipeCode card.

The tree in seven branches.

  • Branch 1 — Query shape. Ad-hoc analytical / OLAP → columnar warehouse (Snowflake, BigQuery, Redshift, DuckDB). Known access pattern / OLTP → continue.
  • Branch 2 — Scale. Below ~10 TB OLTP → Postgres or MySQL. Above ~100 TB write-heavy → Cassandra / ScyllaDB or DynamoDB.
  • Branch 3 — Consistency. Strict (banking, inventory, ledger) → Postgres / Spanner / CockroachDB. Eventual acceptable → DynamoDB / Cassandra.
  • Branch 4 — Latency. Single-digit-ms P99 at scale → DynamoDB or Redis. Tens-of-ms acceptable → Postgres / Mongo.
  • Branch 5 — Relationship shape. First-class graph traversals > 3 hops → Neo4j / Neptune / TigerGraph. Shallow joins OK → Postgres recursive CTE.
  • Branch 6 — Multi-region active-active. Required → Cassandra, DynamoDB Global Tables, Spanner. Single region OK → simpler engines.
  • Branch 7 — Hybrid evolving schema. Relational core + JSON payload → Postgres + JSONB. Pure document → Mongo. Pair Postgres + Mongo only as a last resort.

Anti-patterns the tree screens out.

  • Mongo for analytics. Aggregation pipelines on millions of documents are slow. Use a columnar warehouse downstream.
  • DynamoDB without modelling the access pattern. "We'll add a GSI later" is how DynamoDB bills get out of hand.
  • Joining 4 tables in Cassandra. Cassandra is not a join engine. Denormalise into a wider row or use a different engine.
  • Neo4j as a primary OLTP store for non-graph workloads. Graph databases are slow at point reads compared to KV stores.
  • Postgres for unbounded write-heavy time-series at the scale of millions of writes/s. Use TimescaleDB (Postgres extension) for moderate scale, Cassandra for unbounded.

Cache layer is orthogonal. Redis / Memcached sit in front of the chosen primary, not as a substitute. Cache invalidation is the hard part — pick the primary first, add cache when the read profile demands it.

Common interview probes.

  • "When would you ever use MongoDB in 2026?" — when the unit of work is a self-contained nested document AND the schema genuinely evolves AND joins to other entities are rare. The set of workloads matching all three has shrunk; it has not disappeared.
  • "Is DynamoDB always cheaper than Postgres?" — no. At low QPS or for workloads with many access patterns, Postgres + the labour-cost saving of one engine often wins on total cost of ownership.
  • "What is a 'hot partition' and why does it matter?" — a partition that receives disproportionate traffic, throttling the entire table in DynamoDB / Cassandra. A bad partition-key choice creates one immediately.
  • "What is the typical write fan-out pattern in 2026?" — write to Kafka, fan out to (a) primary OLTP store, (b) warehouse via CDC, (c) S3 audit log. Three sinks, one source.

Worked example — picking the engine for a SaaS app's primary store

Detailed explanation. A B2B SaaS — accounting software — needs a primary OLTP store. Domain: 50 tenant companies (year 1), each with users, invoices, ledger entries, customers, products. Multi-tenant. Strict ACID across ledger entries.

Question. Walk the decision tree and explain which engine you would pick. What changes in year three when there are 5,000 tenants?

Input. Year 1: 50 tenants, ~100 GB total. Year 3: 5,000 tenants, ~5 TB total.

Code.

Branch 1 — Query shape: known access pattern, OLTP. -> continue
Branch 2 — Scale: 100 GB year 1, 5 TB year 3. Both fit Postgres.
Branch 3 — Consistency: strict (it's accounting). -> Postgres / Spanner.
Branch 4 — Latency: tens-of-ms acceptable. -> Postgres OK.
Branch 5 — Relationship: shallow joins (invoice -> line items). -> Postgres OK.
Branch 6 — Multi-region: not required year 1. -> single-region OK.
Branch 7 — Schema: stable accounting model + per-tenant custom fields.
            -> Postgres + JSONB for the custom fields.

Recommendation: Postgres single-region with JSONB for custom fields.
            tenant_id on every row + RLS policy for tenant isolation.

Year three (5 TB, 5000 tenants):
- Same engine. Postgres still fits.
- Add read replicas if reporting load grows.
- Consider Citus (sharding extension) only if write rate exceeds single-node capacity.
- Add Snowflake for cross-tenant analytics (e.g. benchmarking).
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The accounting domain is the canonical relational workload — ledgered entries, double-entry transactions, joins between invoices and customers and products. Postgres is the textbook fit.
  2. The "multi-tenant" requirement is satisfied with tenant_id on every row and Postgres Row-Level Security (RLS) policies. No separate engine needed.
  3. The "custom fields per tenant" requirement is satisfied with a JSONB custom_fields column. Per-tenant schema flexibility inside a relational engine.
  4. The year-three scale (5 TB) is still within a single-node Postgres budget on modern hardware. The migration path (Citus if writes overflow) is well-trodden.

Output.

Year Scale Engine Add-ons
1 100 GB Postgres single-region none
2 1 TB Postgres + read replica reporting offload
3 5 TB Postgres + Snowflake warehouse for cross-tenant analytics
5 50 TB Postgres + Citus sharding if needed

Rule of thumb. Default to Postgres until the workload demonstrably needs something else. The "do nothing extra" path covers more workloads than any other choice. The engineering you save in year one funds the migration in year five.

Worked example — picking the engine for a real-time fan-out feed

Detailed explanation. A social-feed product — every user follows hundreds of others, and the home feed shows the latest posts from everyone they follow. The dominant query is "give me the last 100 posts in my feed." Writes are dominated by "post fanout to N followers."

Question. Walk the decision tree and explain why this is a Cassandra workload, not a Postgres one.

Input. Per-user feed of latest posts from followed accounts. 10M users, 10M posts/day.

Code.

-- Cassandra
CREATE TABLE feed (
    user_id     uuid,
    posted_at   timestamp,
    post_id     uuid,
    author_id   uuid,
    summary     text,
    PRIMARY KEY ((user_id), posted_at, post_id)
) WITH CLUSTERING ORDER BY (posted_at DESC, post_id DESC);

-- Read: "last 100 posts in my feed"
SELECT * FROM feed
WHERE user_id = ?
LIMIT 100;

-- Write path: on new post, fan-out to every follower's feed
-- Implemented as an async worker reading from Kafka
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Branch 1: query shape is "give me the last 100 in my feed" — a range scan inside one partition.
  2. Branch 2: write rate is dominated by fan-out. One post by a user with 1M followers = 1M writes. 10M posts/day with average 100 followers = 1B writes/day = ~12k writes/sec. Postgres single-node can do it; Cassandra is cheaper at this scale.
  3. Branch 3: consistency is eventual — a feed entry showing 200 ms late is invisible to users.
  4. Branch 5: relationships are denormalised — every feed row stores the post summary so the read does not need a join.

The denormalisation is the price for one-partition reads. The fan-out cost is the price for instant-reads.

Output.

Property Postgres single-node Cassandra
12k writes/s sustained possible but expensive box designed for it
Feed read latency ~10 ms with index < 10 ms always
Multi-DC writes replica lag native active-active
Operational cost one engine cluster ops

Rule of thumb. Fan-out feeds are the textbook Cassandra workload — wide rows partitioned by user, sorted by time, append-mostly writes. If the feed has more than tens of thousands of writes/s sustained AND multi-region matters, Cassandra (or ScyllaDB) is the right pick.

Worked example — picking the engine for a session store

Detailed explanation. A login-session store — every authenticated user has a session record keyed by session token. Reads on every API call. Writes on login / logout / refresh. Records expire after 24 hours.

Question. Walk the decision tree. Which engine wins?

Input. 10M active users, each making ~100 API calls / day = 1B reads / day; ~10M writes / day.

Code.

# Redis — the canonical session store
import redis
r = redis.Redis()

# Set session with TTL
r.setex(f"sess:{token}", 86400, json.dumps({"user_id": uid, "scope": "..."}))

# Read on every API call
data = r.get(f"sess:{token}")
if data is None:
    return 401
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Branch 1: query shape is point read by session token. Pure key-value.
  2. Branch 4: latency must be sub-ms (every API call hits it). Redis is built for this.
  3. Sessions are ephemeral — TTL handles expiration automatically. No reconciliation needed.
  4. Redis handles 100k+ reads/s on a single node; the workload sits well within one instance.

Output.

Engine Per-read latency Fit
Postgres ~1 ms works, overkill
DynamoDB ~5 ms works, more expensive
Redis < 1 ms textbook fit

Rule of thumb. Pure key-value workloads with sub-ms latency budgets go to Redis (or DynamoDB if managed-AWS matters more than raw latency). If the session needs to survive a Redis-cluster failure, persist to a secondary store on write; the primary read path stays on Redis.

Worked example — picking the engine for a recommendation system

Detailed explanation. A product recommendation system answers "users who bought X also bought…" in real time. The data is a graph of users and products, with edges weighted by purchase events. The dominant query is multi-hop traversal.

Question. Walk the decision tree. Which family?

Input. 50M users, 1M products, 5B purchase events. Recommendations served in real time.

Code.

// Neo4j Cypher — "users who bought X also bought…"
MATCH (p:Product {id: $sku})<-[:BOUGHT]-(u:User)-[:BOUGHT]->(other:Product)
WHERE other.id <> $sku
RETURN other.id, count(*) AS co_purchase
ORDER BY co_purchase DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Branch 5: relationships are first-class — the query is a two-hop traversal (Product)<-(User)->(Product).
  2. The same query in Postgres requires self-joining a purchase_events table twice — possible, but slow at scale.
  3. Neo4j's storage is built for traversals — each node stores adjacency lists, so the next-hop lookup is O(degree), not O(table size).
  4. For static recommendations precomputed nightly, the answer might be Spark + Snowflake. For real-time, graph wins.

Output.

Engine Query latency at 5B events Real-time fit
Postgres self-join hundreds of ms no
Spark batch seconds precompute only
Neo4j tens of ms yes

Rule of thumb. Real-time multi-hop traversals → graph database. Batch precomputation → Spark + warehouse. Pick based on whether the freshness budget tolerates a nightly job or demands single-digit-second updates.

SQL interview question on picking a database for a new product

A senior interviewer often probes: "Your team is launching a real-time chat product — direct messages between users. What database(s) and why? Walk me through the decision." The probe tests whether the candidate maps the access patterns onto the decision tree.

Solution Using a Cassandra-shaped wide-row design

Access patterns
  1. "Show me my last 50 messages with user X"  - range scan by (me, peer)
  2. "Show all my conversations"                - inbox list per user
  3. "Notify on new message"                    - websocket / pub-sub

Decision tree walk
  Branch 1: query shape = range scan + point read - OLTP.
  Branch 2: write rate = millions of msgs/day, growing - prefer horizontal.
  Branch 3: consistency = eventual is fine (display order).
  Branch 4: latency = single-digit ms for read - Cassandra / DynamoDB.
  Branch 6: multi-region replication eventually required.

Recommendation
  - Cassandra (or ScyllaDB) as primary store.
  - Schema below partitions by conversation_id (sorted user pair).
  - Redis Pub-Sub for fan-out to websockets.
  - S3 + Snowflake for analytics (DAU, retention, etc.).
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Sub-system Engine Schema or pattern
Message store Cassandra ((conv_id), msg_ts DESC, msg_id)
Inbox per user Cassandra ((user_id), last_msg_ts DESC)
Real-time fan-out Redis Pub-Sub channel per conversation
Analytics Snowflake nightly CDC dump
Object storage S3 message blob if > 1 KB
-- Primary message table
CREATE TABLE messages (
    conv_id   text,
    msg_ts    timestamp,
    msg_id    uuid,
    sender_id uuid,
    body      text,
    PRIMARY KEY ((conv_id), msg_ts, msg_id)
) WITH CLUSTERING ORDER BY (msg_ts DESC, msg_id DESC);

-- Inbox materialised view per user
CREATE TABLE inbox (
    user_id      uuid,
    conv_id      text,
    last_msg_ts  timestamp,
    last_msg_preview text,
    PRIMARY KEY ((user_id), last_msg_ts)
) WITH CLUSTERING ORDER BY (last_msg_ts DESC);
Enter fullscreen mode Exit fullscreen mode

Output:

Workload tier Engine P99 latency
Read last 50 messages Cassandra < 10 ms
Read inbox list Cassandra (denormalised) < 10 ms
Real-time push Redis Pub-Sub < 50 ms
Analytics Snowflake minutes

Why this works — concept by concept:

  • Wide-row design — one Cassandra partition per conversation; reads are O(messages-shown), independent of the dataset size. Textbook fit.
  • Denormalised inbox — every new message writes to (a) the messages table and (b) the inboxes of both participants. Two writes for one user action, but reads become single-partition.
  • Redis as the fan-out layer — Cassandra is the durable store; Redis Pub-Sub is the low-latency push channel. Each engine does what it is best at.
  • S3 for media — message bodies stay in Cassandra; attached media (images, files) lives in S3 with a presigned URL.
  • Snowflake for analytics — DAU, retention, engagement metrics run on a nightly CDC export, not on the primary store.
  • Avoiding Postgres for this workload — Postgres would work for the first 10M messages and break around 100M without significant engineering. Picking Cassandra now means no migration later.
  • Cost — Cassandra scales cost ∝ data; Redis cost ∝ connections; Snowflake cost ∝ compute hours. Each tier scales on its own demand curve.

SQL
Topic — joins
Join problems for the relational tier

Practice →


Cheat sheet — NoSQL vs SQL recipes

  • Greenfield SaaS app. Start with Postgres + Redis cache. Add JSONB columns for evolving schema. Only move when you measure pain.
  • Event log ingest > 50k writes/s. Cassandra or Kafka + S3 — not Postgres single-node. TimescaleDB extension can buy you time on Postgres up to ~100k writes/s.
  • Single-table mobile backend. DynamoDB with composite PK + Global Secondary Indexes. Model every access pattern before the first write.
  • Heavy JOIN + reporting. Postgres for ad-hoc reporting < 1 TB; columnar warehouse (Snowflake / BigQuery / Redshift / DuckDB) above.
  • Schema-on-read JSON blobs. Postgres JSONB if joins to other tables matter; MongoDB if the document is fully self-contained.
  • Graph traversal > 3 hops. Neo4j or Neptune. Postgres recursive CTE only for shallow walks (1–2 hops).
  • Globally distributed strong consistency. Spanner or CockroachDB. Premium cost; the only engines that do it natively.
  • Hot path key-value at low latency. Redis as cache layer in front of any primary store. DynamoDB if you want the cache and the durable store in one engine.
  • Multi-DC active-active writes. Cassandra (LOCAL_QUORUM), DynamoDB Global Tables (last-write-wins), or Spanner (full ACID).
  • Audit log / append-only. S3 partitioned by day + Athena / Trino for ad-hoc. Cheapest tier in the stack.
  • Time-series telemetry. TimescaleDB (Postgres extension) for moderate scale; Cassandra / ScyllaDB for unbounded; ClickHouse for analytics over time-series.
  • Vector embeddings (LLM workloads). pgvector (Postgres) for moderate scale; Pinecone, Weaviate, Qdrant, or Milvus for production-scale RAG.
  • CAP corner reminders. CP under partition = stop serving the minority side (HBase, MongoDB default). AP = keep serving and reconcile (Cassandra, DynamoDB default). PACELC: most NoSQL is PA / EL; Spanner is PC / EC.
  • DynamoDB partition-key checklist. High cardinality, uniformly distributed writes, supports the dominant read pattern. If any check fails, redesign — hot partitions throttle the entire table.
  • Cassandra anti-pattern alarms. Joining, ad-hoc filters without a partition key, large secondary indexes on high-cardinality columns. Each one signals "wrong engine."
  • MongoDB anti-pattern alarms. Cross-collection $lookup over millions of documents, analytics queries, deeply-nested updates inside arrays.
  • is_active: false and tag_ids: [] reminder. This is the draft-publishing contract for PipeCode blogs — same shape as picking your initial engine. Defaults that flip at publication.

Frequently asked questions

Is NoSQL faster than SQL?

It depends on the operation and the workload. For point reads by a known key at high QPS, DynamoDB and Redis are typically faster than Postgres — single-digit-ms P99 with horizontal scale. For complex multi-table joins, Postgres is faster than any NoSQL engine because the planner is optimised for that shape. The honest answer is that "fast" depends on the access pattern: pick the engine whose data shape matches the dominant query, and you get the fastest answer for that workload. The nosql vs sql choice is rarely about raw speed — it is about which engine is fastest for your workload.

Can NoSQL replace SQL entirely?

Almost never in production. Even teams marketed as "NoSQL shops" typically run a SQL engine somewhere — usually a columnar warehouse (Snowflake, BigQuery, Redshift) for analytics. The decision tree branches on query shape: ad-hoc analytical queries belong in a columnar SQL engine, regardless of where the operational data lives. The mature 2026 architecture is "NoSQL for the hot path that fits it, SQL for the relational core and the warehouse." Picking one and only one engine is an anti-pattern at any meaningful scale.

When should I use MongoDB instead of Postgres?

When (a) the unit of work is a self-contained nested document, (b) the schema genuinely evolves on every release, and (c) joins to other entities are rare. The set of workloads matching all three has shrunk significantly since Postgres added JSONB and partial indexes — most "we picked Mongo for schema flexibility" cases now fit Postgres + JSONB better. Mongo still wins for CMS articles, product catalogues with template-driven custom fields, mobile app payloads, and similar self-contained document workloads. If your domain diagram has more than two heavily-joined entities, you have a relational schema and Postgres is the right pick.

Does DynamoDB support joins?

No — there is no JOIN clause in DynamoDB's API. The application code handles the equivalent: query the first table, collect the IDs, batch-get the second table, stitch in code. The architectural answer is to avoid the need for joins via single-table design — embed related entities inside the same partition keyed by a primary entity. The cost is upfront access-pattern modelling. The reward is single-digit-ms latency on every read. If your domain genuinely needs joins across many entities, DynamoDB is the wrong engine — pick Postgres or Spanner instead.

Is Cassandra better than MongoDB for write-heavy workloads?

For unbounded write-heavy time-series-shaped workloads, yes — Cassandra's LSM-tree storage is built for absorbing writes faster than MongoDB's B-tree storage. Cassandra also ships multi-DC active-active replication natively, which MongoDB requires explicit sharding to achieve. For mixed read / write workloads on self-contained documents, MongoDB is more developer-friendly and indexable. The rule of thumb: above ~50k writes/s sustained AND multi-DC matters, pick Cassandra (or ScyllaDB for lower latency at the same throughput). Below that, MongoDB's richer query language and indexing usually win.

What does CAP theorem mean in practice for choosing a database?

In practice, every distributed engine is tunable along the consistency / availability spectrum — pure CP and pure AP are positions on a slider, not fixed labels. The senior framing is PACELC: during partition (P), pick A or C; otherwise (E), pick L or C. Most NoSQL engines are PA / EL (availability under partition, latency without) — Cassandra, DynamoDB. Spanner is PC / EC (consistency everywhere) at premium cost. When picking a database, ask: "for the most consistency-sensitive read in my workload, what guarantee do I need, and does this engine ship a knob to give it to me?" If yes, your engine is acceptable. The CAP corner is a starting point; the per-query knob is the real lever.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every NoSQL vs SQL recipe above ships with hands-on practice rooms where you sketch the partition key, write the JOIN that Postgres handles natively, and reason about the CAP corner of the engine you just picked. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your decision tree actually maps onto the workload you sketched on a whiteboard during the interview.

Practice database design now →
SQL practice drills →

Top comments (0)