nosql vs sql is the question that arrives on day one of every new data project and never quite goes away. The honest 2026 answer is that the dichotomy itself is misleading — most production teams run both — and the real decision is whether the workload wants a relational engine, a document store, a key-value store, a wide-column store, or a graph database. Pick the wrong shape and you spend the next year writing application-side joins, denormalising what should have been a single SQL statement, or paying for unbounded write throughput you do not need.
This guide is the workload-first answer to sql vs nosql that interviewers reward and that staff data engineers actually use when they pick storage. It walks the eight-axis decision matrix that separates the two models, the four-family NoSQL map (MongoDB, DynamoDB, Cassandra, Neo4j and their cousins), the CAP theorem with its day-to-day PACELC extension, and a workload-first decision tree across OLTP, OLAP, schema flexibility, consistency, latency, and global active-active. Each section ships a worked interview answer with code, a step-by-step trace, an output table, and a concept-by-concept why-this-works.
When you want hands-on reps immediately after reading, drill the database design practice library →, rehearse on SQL practice problems →, and stack the schema muscles with data modelling for DE interviews →.
On this page
- The NoSQL vs SQL question, reframed for 2026
- SQL vs NoSQL decision matrix
- The NoSQL family map
- The CAP theorem quadrant — where each engine actually sits
- Decision tree — which database for which workload
- Cheat sheet — NoSQL vs SQL recipes
- Frequently asked questions
- Practice on PipeCode
1. The NoSQL vs SQL question, reframed for 2026
nosql vs sql is the wrong framing — the real decision is relational vs document vs key-value vs wide-column vs graph
The one-sentence invariant: "SQL" describes a query language; "NoSQL" describes everything that is not relational, which is four distinct families with four distinct trade-offs. Once you stop treating NoSQL as a single thing, the interview question stops being "SQL or NoSQL?" and becomes "what is the query shape, the scale ceiling, and the consistency contract?" — three answers that point at exactly one engine.
Why the dichotomy is misleading.
- Most production teams run both. A typical SaaS stack pairs Postgres (OLTP source of truth) with Redis (cache), Snowflake (warehouse), and either DynamoDB or MongoDB (a single domain that does not fit relational well). Picking "SQL or NoSQL" implies a mutually exclusive choice that no senior architect actually makes.
-
Postgres + JSONB ate the easy NoSQL case. Since Postgres 9.4 (2014), the
jsonbtype plus GIN indexes have made "I picked Mongo because my schema was evolving" much harder to defend. A team can keep relational joins and schema-on-read in the same engine. - Distributed SQL closed the scale gap. Spanner, CockroachDB, YugabyteDB, and TiDB give horizontally sharded ACID without the application-side joins of pure NoSQL. "We need NoSQL for scale" is no longer the slam-dunk argument it was in 2012.
- The real question is access pattern. Every senior DE asks "model the access pattern first, the data second" — DynamoDB's single-table design is the textbook example, but the same rule applies to every engine.
The actual decision tree in one sentence.
Pick relational (Postgres / MySQL / Spanner) when joins are first-class and rows are < 10 TB; pick document (MongoDB / Couchbase) when the unit of work is a self-contained nested object; pick key-value (DynamoDB / Redis) when reads are point lookups by a known key at single-digit-ms latency; pick wide-column (Cassandra / ScyllaDB) when writes are unbounded and partitions are time-series-shaped; pick graph (Neo4j / Neptune) when traversals run more than three hops deep.
What interviewers listen for.
- Do you say "the question is never 'NoSQL or SQL', it is 'which engine for this workload'"? — senior signal.
- Do you mention access-pattern modelling before naming a specific engine? — required answer.
- Do you cite Postgres + JSONB as the modern hybrid for "we thought we needed Mongo"? — senior signal.
- Do you bring up PACELC when the conversation reaches CAP? — staff signal.
The 2026 reality in five bullets.
-
Postgres has eaten 60% of "we picked Mongo for schema flexibility" thanks to JSONB + partial indexes + the
pg_partmanecosystem. - DynamoDB and Cassandra still own write-heavy unbounded workloads — neither Postgres nor Spanner can match 100k writes/s at single-digit-ms P99 without significant engineering.
- Spanner / CockroachDB / YugabyteDB are the production answer when you genuinely need global strong consistency and horizontal scale.
- Redis is the universal cache layer, not a primary store — its 2024 license change pushed teams toward Valkey or DragonflyDB, but the role is identical.
- Vector databases (pgvector, Pinecone, Weaviate) are a sixth family driven by 2023–26 LLM workloads — not covered in detail here but worth naming in interviews.
Worked example — the same domain modelled in relational vs document
Detailed explanation. A new analyst is asked "should we use Postgres or MongoDB for our SaaS users + orders + line items?" The instinct is to compare features. The senior answer is to compare access patterns — what queries do we run, how often, with what latency budget?
Question. Sketch the same customers + orders + order_items domain in (a) Postgres relational, (b) MongoDB document, and (c) DynamoDB single-table. Explain which one wins when the dominant access pattern is "load one order with all its line items" vs "run quarterly revenue reports."
Input. Three logical entities — customer (1 row), order (1..N per customer), order_item (1..M per order).
Code.
-- (a) Postgres relational — three tables, foreign keys
CREATE TABLE customers (
customer_id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE
);
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT REFERENCES customers(customer_id),
order_date DATE NOT NULL,
status TEXT NOT NULL
);
CREATE TABLE order_items (
order_item_id BIGSERIAL PRIMARY KEY,
order_id BIGINT REFERENCES orders(order_id),
sku TEXT NOT NULL,
qty INT NOT NULL,
unit_price NUMERIC(10,2)
);
// (b) MongoDB document — one document per order, line items embedded
db.orders.insertOne({
_id: ObjectId("..."),
customer: { id: 100, name: "Alice", email: "a@x.com" },
order_date: ISODate("2026-05-20"),
status: "placed",
items: [
{ sku: "SKU-1", qty: 2, unit_price: 9.99 },
{ sku: "SKU-7", qty: 1, unit_price: 29.99 }
]
});
// (c) DynamoDB single-table — composite key (PK, SK)
// PK = "CUST#100", SK = "META" -> customer attributes
// PK = "CUST#100", SK = "ORDER#2026-05-20#1" -> order header
// PK = "CUST#100", SK = "ORDER#2026-05-20#1#ITEM#1" -> line item
Step-by-step explanation.
- The relational design normalises every entity into its own table. Loading "one order with all line items" requires a two-table JOIN; running "quarterly revenue" is a single aggregation across
order_items— both natural. - The document design embeds line items inside the order document. Loading one order is a single
findOne()— zero joins, one round trip. But "quarterly revenue across all orders" requires either an aggregation pipeline ($unwind+$group) or a denormalised pre-aggregated collection. - The DynamoDB design picks the customer as the partition key. Every read for one customer is a single partition scan — extremely fast. Aggregations across all customers require either a Lambda fan-out or a downstream warehouse (Redshift, Snowflake).
- The winner depends on the dominant pattern. If 95% of reads are "load one order," document or single-table wins. If reporting is the dominant pattern, relational wins. Most teams run both — primary store for OLTP, warehouse for OLAP.
Output.
| Access pattern | Postgres | MongoDB | DynamoDB |
|---|---|---|---|
| Load one order + items | 2 joins | 1 doc fetch | 1 partition query |
| Quarterly revenue | 1 aggregation | aggregation pipeline | Lambda fan-out + warehouse |
| Add new field to items | ALTER TABLE | no schema change | no schema change |
| Strong consistency | default | default per replica | optional read flag |
Rule of thumb. Before you pick an engine, list every query the application will run, classify each as point-read, range-read, aggregation, or join, and count how often each runs. The most-frequent pattern determines the engine; the long-tail patterns determine whether you also need a warehouse or a secondary index.
Worked example — "we picked Mongo for schema flexibility" three years later
Detailed explanation. A 2022 team picked MongoDB because the product roadmap was unclear and the schema would evolve. In 2026 they are paying for it: every analytics report requires either a custom aggregation pipeline or an ETL job into Snowflake, and joins between collections are application-side. A senior architect asks: what would the same team build today?
Question. Given the same "evolving schema" constraint, would 2026 Postgres + JSONB be a better fit than MongoDB? Show how the same nested-order document maps onto a hybrid Postgres schema and what the team trades off.
Input — the same Mongo document.
{ order_id: 42, status: "placed", items: [{ sku: "A", qty: 2 }, { sku: "B", qty: 1 }] }
Code.
-- Postgres hybrid: relational columns + JSONB for the evolving part
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
order_date DATE NOT NULL,
status TEXT NOT NULL,
payload JSONB -- nested items, custom fields, A/B flags
);
-- GIN index on the JSONB column for ad-hoc filters
CREATE INDEX idx_orders_payload_gin ON orders USING GIN (payload);
-- Query items inside the JSONB column
SELECT order_id, payload->'items' AS items
FROM orders
WHERE payload @> '{"items": [{"sku": "A"}]}'::jsonb
AND order_date >= '2026-01-01';
Step-by-step explanation.
- The relational columns (
order_id,customer_id,order_date,status) cover the stable, joinable, indexable surface — every report that pivots on these columns runs at relational speed. - The
JSONBcolumn absorbs everything that evolves — nested items, A/B flags, experimental fields. Adding a new key requires no schema migration. - The GIN index makes containment queries (
@>) fast even on the JSONB column. Without it, Postgres falls back to a sequential scan on every JSONB filter. - The trade-off versus Mongo is honest: Postgres is harder to scale horizontally past ~5 TB without Citus, but the team keeps JOINs, ACID transactions across rows, and one engine to operate. In 2026 that trade-off favours Postgres for most SaaS teams under 5 TB.
Output.
| Property | MongoDB | Postgres + JSONB |
|---|---|---|
| Nested documents | native | JSONB column |
| Schema-on-read | yes | yes (JSONB) |
| Joins across collections | application-side | native SQL |
| ACID across rows | per-doc | full table-wide |
| Scale ceiling (single cluster) | high | ~5 TB (without Citus) |
Rule of thumb. If your data fits one of "evolving nested document," and joins to other tables are rare, Postgres + JSONB is the 2026 default. Reach for MongoDB only when the per-document query is dominant AND the cluster will outgrow Postgres's vertical limits.
Worked example — the "model access pattern first" rule on DynamoDB
Detailed explanation. DynamoDB punishes ad-hoc schema design more than any other engine. Without partitioning by the right key, every query is a full table scan; pick the wrong sort key and you cannot answer the most common question without a global secondary index. The senior approach is to list every access pattern before you write a single PUT.
Question. For a leaderboard feature — "top 10 players this week per game" plus "all scores for one player" — write the DynamoDB single-table design. Show why the access pattern drives the partition key.
Input. Two access patterns: (1) top-N scores per (game, week), (2) all scores for one player across games.
Code.
// Pattern 1 — "top 10 this week per game"
// PK = "GAME#tetris#WEEK#2026-W23"
// SK = score (sortable, DESCENDING via Query ScanIndexForward=false)
// Pattern 2 — "all scores for one player"
// GSI1PK = "PLAYER#alice"
// GSI1SK = "GAME#tetris#WEEK#2026-W23"
// Sample item
{
"PK": "GAME#tetris#WEEK#2026-W23",
"SK": 9_999_999,
"GSI1PK": "PLAYER#alice",
"GSI1SK": "GAME#tetris#WEEK#2026-W23",
"player": "alice",
"score": 9_999_999
}
Step-by-step explanation.
- Pattern 1 partitions by
(game, week)and sorts by score. The "top 10 this week" query is a single Query call on the primary index, returning the highest 10 scores in one round trip. - Pattern 2 needs the player as the partition key. We add a Global Secondary Index (GSI1) with
GSI1PK = PLAYER#aliceso that "all scores for alice" is also a single Query call. - Both patterns are O(1) on the dominant axis — partition key — which is what DynamoDB rewards. A naive design with
PK = PLAYER#alicewould force a global scan for pattern 1. - The cost is duplication: every item is indexed twice (primary + GSI1). DynamoDB charges for that, but the alternative — fan-out reads or downstream Lambda aggregation — is more expensive at scale.
Output.
| Access pattern | Index used | Cost (RCU per query) |
|---|---|---|
| Top 10 this week per game | Primary (PK + SK) | 1 RCU |
| All scores for one player | GSI1 | 1 RCU |
| Top 10 all-time per game (no week) | Primary scan | full-partition scan |
Rule of thumb. Before you provision any DynamoDB table, write down every access pattern, label each as "point read," "range scan," or "aggregation," and design the partition key + sort key + GSIs so that every pattern is a Query, never a Scan. If you cannot do this, you do not yet know your domain well enough for DynamoDB.
SQL interview question on SQL vs NoSQL framing
A senior interviewer often opens with: "We're building a logistics platform — 50 million rides per day, drivers query 'my last 10 trips' constantly, ops queries 'all rides in a city this hour' for live dashboards, finance runs nightly settlement aggregations. SQL or NoSQL?" The probe tests whether the candidate decomposes by access pattern or jumps to a flavour-of-the-month engine.
Solution Using a workload-first decomposition
Access pattern 1 — "driver's last 10 trips"
-> point read by driver_id, sorted desc by trip_time
-> partition key = driver_id, sort key = trip_time
-> DynamoDB Query call, single-digit ms
Access pattern 2 — "all rides in city X this hour"
-> range scan by (city, hour)
-> partition key = (city, hour_bucket), sort key = trip_time
-> DynamoDB GSI or Cassandra wide-row
Access pattern 3 — "nightly settlement aggregation"
-> OLAP scan across millions of rows
-> NOT DynamoDB / Cassandra — too expensive per RCU
-> Stream changes to S3 + Snowflake / BigQuery / Redshift
-> Run SQL aggregations in the warehouse
Step-by-step trace.
| Pattern | Frequency | Latency budget | Engine | Reason |
|---|---|---|---|---|
| Driver's last 10 trips | 10k QPS | < 50 ms P99 | DynamoDB | point read by partition key |
| Rides in city this hour | 1k QPS | < 200 ms P99 | DynamoDB GSI / Cassandra | range scan inside one partition |
| Nightly settlement | 1 / day | minutes ok | Snowflake / BigQuery | full-table aggregation |
| Audit trail / GDPR | 100 QPS | seconds ok | S3 + Athena | append-only, cheap storage |
After the trace, the architecture is obvious: two engines, not one. The hot path (patterns 1 + 2) runs on DynamoDB with a carefully designed key + GSI; the analytical path (pattern 3) runs on a columnar warehouse fed by Change Data Capture. Pattern 4 lives on object storage because the read pattern is "rare and forgiving."
Output:
| Tier | Engine | Cost driver | Latency |
|---|---|---|---|
| Hot OLTP | DynamoDB | RCU/WCU per access pattern | single-digit ms |
| Warm OLAP | Snowflake | compute warehouse seconds | minutes |
| Cold archive | S3 | GB-month | seconds |
Why this works — concept by concept:
- Workload-first decomposition — every access pattern is named, classified (point / range / aggregation), and quantified (QPS, latency budget). The engine falls out of the requirements, not the other way around.
- Two-tier OLTP + OLAP — the same data lands in two engines because the read patterns are fundamentally different. CDC keeps them in sync; the warehouse is eventually consistent (minutes), which is fine for nightly settlement.
-
Partition-key design first — for DynamoDB, the partition key choice locks in 80% of performance. Driver-id + trip-time partitions cleanly; pick a low-cardinality key (e.g.
country) and you create hot partitions and throttling. - No relational primary store — at 50M rides/day, a single Postgres instance cannot keep up at single-digit-ms P99 without significant engineering. Spanner could, but at a higher dollar cost.
- Object storage as the long-tail — S3 + Athena is the universal "I do not know how often this will be queried" answer. Cheap to store, payable per query, no schema migrations.
- Cost — DynamoDB cost scales linearly with QPS; warehouse cost scales with the data volume of nightly scans; S3 cost is negligible. The architecture is cheap precisely because each engine matches its workload.
Database design
Topic — database
Database design problems (general)
2. SQL vs NoSQL decision matrix
Eight axes, two columns — the matrix that turns sql vs nosql into a checklist
The mental model in one line: eight axes (schema, joins, transactions, consistency, scale model, query language, indexing, ops cost) each have a "SQL-favourable" answer and a "NoSQL-favourable" answer — count which column wins for your workload. Once the matrix is on paper, the engine choice usually decides itself.
The matrix in one table.
| Axis | SQL (Postgres / MySQL / Spanner) | NoSQL (Mongo / DynamoDB / Cassandra) |
|---|---|---|
| Schema | rigid, migration-managed | flexible, evolving documents |
| Joins | native, cost-based optimiser | application-side or denormalised |
| Transactions | full ACID across rows + tables | single-doc / single-partition (mostly) |
| Consistency | strong by default | tunable / eventual / per-request |
| Scale model | vertical + read replicas | horizontal · partition-key sharding |
| Query language | SQL (declarative) | MQL / CQL / API (imperative) |
| Indexing | B-tree, partial, expression, GIN | limited secondary indexes (per engine) |
| Ops cost | lower for OLTP up to ~5 TB | lower for unbounded write throughput |
Row-by-row commentary.
-
Schema. Postgres requires
ALTER TABLEfor every change; Mongo accepts a new field on the next insert. The 2026 hybrid is Postgres + JSONB — schema-on-read inside a relational engine. -
Joins. Postgres's optimiser cost-models hash, merge, and nested-loop joins; DynamoDB has no JOIN clause at all. Cassandra has
JOINsyntax only in DataStax Enterprise — the open-source path is application-side. -
Transactions. Postgres ACID is genuinely ACID: a transaction can span dozens of rows across many tables. Mongo got multi-document transactions in 4.0 (2018) but the recommended pattern is still "one document = one unit of work." DynamoDB supports
TransactWriteItemsup to 100 items but charges 2× the WCU. -
Consistency. Postgres is strongly consistent by default on the primary; read replicas drift by milliseconds. Cassandra is tunable per query (
ONE,QUORUM,ALL); DynamoDB is eventually consistent by default with a strongly-consistent read flag. - Scale model. Postgres scales vertically (bigger box) and via read replicas; horizontal write scale needs Citus or sharding-as-application-logic. Mongo and Cassandra shard natively on a chosen key.
- Query language. SQL is declarative — "what" not "how." MQL, CQL, and DynamoDB's API are progressively more imperative — they force you to think about partitions, indexes, and access patterns up-front.
- Indexing. Postgres has the richest index set (B-tree, hash, GIN, GiST, BRIN, partial, expression). Mongo has secondary indexes per collection but limits to 64 per collection. DynamoDB charges separately for every Global Secondary Index.
- Ops cost. A single Postgres instance below 5 TB is dramatically cheaper to operate than any sharded NoSQL cluster. Above 100 TB or 50k writes/s, the cost balance flips.
Reality checks for 2026.
- Postgres + JSONB blurs the schema row and partially the indexing row — GIN on JSONB is a near-substitute for Mongo's document indexes.
- Distributed SQL (Spanner / CockroachDB) closes the scale-model row — they ship horizontal sharding with ACID, at a premium.
- DynamoDB transactional writes (since 2018) and Mongo multi-document transactions (since 4.0) blur the transactions row — but both still recommend single-doc design as the happy path.
- Most production teams run both — Postgres for the relational core, one NoSQL engine for the hot path that does not fit relational.
Common interview probes.
- "Why is JOIN expensive in NoSQL?" — because the data is sharded across nodes by partition key; pulling matched pairs across shards is a cross-node operation, which most engines refuse to do at all.
- "Why is the schema flexible in NoSQL?" — because there is no central schema registry per collection; the application enforces shape. The cost is that schema drift becomes a runtime bug, not a deploy-time one.
- "Why is sharding harder in SQL?" — because foreign keys, joins, and ACID transactions all assume a single coherent view of the data. Sharding breaks that assumption; distributed SQL engines spend most of their engineering preserving it.
Worked example — schema evolution: ALTER TABLE vs no-op
Detailed explanation. A product team adds a loyalty_tier field to the user record. In Postgres this is an ALTER TABLE migration — locking, fast on small tables, slow with high concurrency on large tables. In Mongo it is a no-op until the next insert or update. The trade-off is "deploy-time pain" vs "runtime schema drift."
Question. Compare the schema-evolution story for adding loyalty_tier TEXT to a 200 GB users table in (a) Postgres and (b) MongoDB. What goes wrong if you skip the deploy-time pain?
Input. A users table / collection with 50 million rows / documents and steady write traffic.
Code.
-- (a) Postgres — explicit migration
ALTER TABLE users ADD COLUMN loyalty_tier TEXT;
-- Modern PG (>= 11): adding a nullable column is instant (metadata-only).
-- Backfill: UPDATE users SET loyalty_tier = 'silver' WHERE created_at < '2025-01-01';
-- The backfill is the slow part — batch it.
// (b) MongoDB — no-op until next write
db.users.updateOne(
{ _id: ObjectId("...") },
{ $set: { loyalty_tier: "silver" } }
);
// Pre-existing documents still lack the field.
// Application code MUST handle the "field absent" case:
const tier = user.loyalty_tier ?? "bronze";
Step-by-step explanation.
- The Postgres migration is metadata-only for adding a nullable column in PG 11+ — the column definition is recorded in the catalog, no row rewrite required. The backfill is what costs minutes-to-hours, and it is batchable.
- The Mongo update is per-document — there is no equivalent of
ALTER COLLECTION. Pre-existing documents lack the new field until they are touched by a write. - The Mongo "no-op" hides a runtime cost: every read path must defensively handle the "field absent" case. A new query
find({ loyalty_tier: "gold" })returns ZERO documents from the pre-existing population — they have noloyalty_tierto compare against. - The fix on the Mongo side is a one-time backfill:
db.users.updateMany({ loyalty_tier: { $exists: false } }, { $set: { loyalty_tier: "bronze" } }). Same shape as the Postgres backfill, different mental model.
Output.
| Step | Postgres | MongoDB |
|---|---|---|
| Add field | ALTER TABLE (metadata-only) | implicit, on next write |
| Backfill existing rows | UPDATE in batches | updateMany with $exists: false
|
| Read code | column always present | must handle absent field |
| Constraint enforcement | NOT NULL, CHECK at DB | application-side or $jsonSchema
|
Rule of thumb. "Flexible schema" does not mean "no schema" — it means "schema enforcement moves from the database to the application." If the team has the discipline to enforce shape in code reviews and a schema-validation layer (Mongo $jsonSchema, application-side Zod / Pydantic), Mongo's flexibility is real. Without that discipline, schema drift becomes a runtime bug magnet.
Worked example — joins: native SQL vs application-side fan-out
Detailed explanation. The single most common production reason teams regret picking NoSQL is "we need joins now." Mongo can $lookup, Cassandra cannot join at all, DynamoDB cannot either. Application-side joins fan out N round trips and lose the planner's ability to reorder or hash.
Question. "Show every order, with the customer's name and the product name on every line item." Write the Postgres SQL and the equivalent Mongo / DynamoDB approach. Compare round trips.
Input. Three entities — customers, orders, products. 100 orders to render.
Code.
-- Postgres — one query, one round trip
SELECT
o.order_id,
c.name AS customer_name,
oi.qty,
p.name AS product_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.sku = oi.sku
ORDER BY o.order_date DESC
LIMIT 100;
// MongoDB — $lookup pipeline (one round trip but cross-collection)
db.orders.aggregate([
{ $lookup: { from: "customers", localField: "customer_id", foreignField: "_id", as: "customer" }},
{ $unwind: "$items" },
{ $lookup: { from: "products", localField: "items.sku", foreignField: "sku", as: "product" }},
{ $sort: { order_date: -1 }},
{ $limit: 100 }
]);
// $lookup is fine for small joins; performance dies past tens of thousands of matched docs.
# DynamoDB — fan-out from the application
orders = ddb.query(TableName="orders", ...)["Items"][:100]
customer_ids = {o["customer_id"] for o in orders}
customers = ddb.batch_get_item(RequestItems={
"customers": {"Keys": [{"id": cid} for cid in customer_ids]}
})
# Then fan out again for products. Three round trips, manual stitching.
Step-by-step explanation.
- Postgres runs the whole shape in one query. The planner chooses hash joins, uses indexes on the foreign keys, and ships the result in a single round trip.
- Mongo's
$lookupis the closest thing to a JOIN in the document world. It works, but the engine cannot optimise it the way Postgres does — large$lookupjoins are notoriously slow. - DynamoDB has no JOIN. The application code does the work: query orders, collect customer IDs, batch-get customers, collect SKUs, batch-get products, stitch in code. Three round trips minimum.
- The latency cost scales with the number of collections joined. SQL collapses N joins into one network hop; NoSQL multiplies them.
Output.
| Engine | Round trips | Lines of code | Planner optimisation |
|---|---|---|---|
| Postgres | 1 | ~10 SQL | yes (hash / merge / nested-loop) |
| MongoDB | 1 (pipeline) | ~15 | partial ($lookup is single-shot) |
| DynamoDB | 3+ | ~50 | none (application is the planner) |
Rule of thumb. If your domain has more than two heavily-joined entities, SQL wins on developer time, query planning, and round-trip count. Reach for NoSQL only when the unit of work is naturally self-contained — one order = one document — or when the access pattern is so dominated by one key that joins are rare.
Worked example — consistency: strong default vs tunable per-request
Detailed explanation. "Consistency" in SQL means ACID — every committed transaction is visible to subsequent transactions on the same connection. In NoSQL, consistency is usually tunable: Cassandra lets you pick per query (ONE, LOCAL_QUORUM, QUORUM, ALL); DynamoDB defaults to eventual reads with a ConsistentRead=true flag for strong reads.
Question. A banking app reads "current account balance." Why is the SQL default safe, and what flag would you set in DynamoDB to match it?
Input. A transfer just credited $100 to account A; the client reads "balance" 1 ms later.
Code.
-- Postgres — strong consistency on the primary
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 'A';
COMMIT;
-- Subsequent read on the same primary sees the new value.
SELECT balance FROM accounts WHERE id = 'A';
# DynamoDB — eventually consistent by default
resp = ddb.get_item(TableName="accounts", Key={"id": {"S": "A"}})
# May return the OLD value if the read hits a replica that has not caught up.
# Force a strongly consistent read
resp = ddb.get_item(TableName="accounts", Key={"id": {"S": "A"}},
ConsistentRead=True)
# Now guaranteed to see the latest committed value.
# Costs 2x the RCU.
# Cassandra — tunable per query
INSERT INTO accounts (id, balance) VALUES ('A', 100)
USING CONSISTENCY QUORUM;
SELECT balance FROM accounts WHERE id = 'A'
USING CONSISTENCY QUORUM;
-- W=QUORUM, R=QUORUM => strongly consistent if R+W > N
Step-by-step explanation.
- Postgres on the primary is strongly consistent by default — a committed write is visible on the next read. Read replicas may lag (milliseconds), but the primary is the source of truth.
- DynamoDB's default
eventually consistentread may hit a replica that lags by up to ~1 second after a write. For a balance, that is unacceptable — flipConsistentRead=true. The cost is 2× the RCU. - Cassandra's
R + W > Nrule produces strong consistency when the replicated copies overlap: withN=3, W=QUORUM=2, R=QUORUM=2, at least one of the 2 read replicas has the latest committed write. - The interview line is "every NoSQL engine is tunable along the consistency / latency / availability spectrum — pick the trade-off per query, not per engine."
Output.
| Engine | Default | Strong-consistency cost |
|---|---|---|
| Postgres primary | strong | baseline |
| Postgres replica | eventual (ms lag) | route to primary |
| DynamoDB | eventual (~1 s) | 2× RCU + ConsistentRead=true
|
| Cassandra | per-query | R + W > N |
Rule of thumb. Never rely on the engine default for consistency-critical reads. Bank balance, inventory count, "did the password change?" — all need an explicit strong-consistency flag. Eventually consistent reads are fine for "user's last 10 trips," "show product reviews," and "render a feed."
SQL interview question on choosing between SQL and NoSQL
A senior interviewer often frames this as: "Your team is debating Postgres vs MongoDB for a new SaaS app. The team is small, the schema will evolve, and traffic is projected at 1k QPS in year one. Make the call and defend it." The probe tests whether the candidate weighs operational simplicity against speculative future scale.
Solution Using a workload + ops calculus
1. Today's traffic — 1k QPS
* Both Postgres and Mongo handle 1k QPS on a single $50/month box.
* Neither is the bottleneck. Engine choice is about people, not throughput.
2. Schema flexibility
* Mongo: no schema, validate in application.
* Postgres + JSONB: schema for the stable surface, JSONB for the evolving part.
* Tie, with a slight Postgres edge for the relational core.
3. Team size — small
* Operating 1 Postgres is one runbook.
* Operating 1 Mongo is one runbook.
* Tie. The cost is in years 2-3, not year 1.
4. Future scale — projected 100k QPS year three
* Postgres can hit 100k QPS read with replicas; 100k QPS write needs Citus.
* Mongo shards natively, but the team will pay the operational cost early.
* Edge: depends on workload shape.
5. The recommendation
* Start with Postgres. Add Redis cache when read traffic warrants.
* Postgres + JSONB covers schema flexibility for the next 18 months.
* Re-evaluate the engine choice when actual scale (not projected) demands it.
Step-by-step trace.
| Decision axis | Postgres weight | MongoDB weight | Winner |
|---|---|---|---|
| Today's traffic | even | even | tie |
| Schema flexibility | strong (+ JSONB) | strong | tie |
| Team ops simplicity | strong | strong | tie |
| ACID transactions | strong | weaker | Postgres |
| Future horizontal scale | weaker | strong | MongoDB |
| Ecosystem (tools, dbt, BI) | strong | weaker | Postgres |
5 out of 6 axes either tie or favour Postgres. The only one that genuinely favours Mongo is "future horizontal scale" — and "future" is doing a lot of work in that sentence.
Output:
| Recommendation | Postgres + JSONB |
|---|---|
| Why | wins or ties on every axis except speculative future scale |
| Add later | Redis cache, read replicas, Citus only if scale demands |
| Anti-pattern | optimising for projected 100k QPS instead of actual 1k QPS |
Why this works — concept by concept:
- Optimise for today's workload, not next year's hypothesis — engineering teams routinely over-engineer for a scale that never arrives. The cheapest year-one architecture is the one that is easiest to operate now and can be migrated later.
- Postgres is the safe default — the modern engine has eaten most of the easy NoSQL use cases via JSONB, partial indexes, and the wider tooling ecosystem (dbt, BI, ORMs, replicas).
- Re-evaluate on signal, not speculation — "we hit 50k QPS sustained and Postgres is the bottleneck" is a clear migration trigger. "We might hit 100k QPS in year three" is not.
- Cache before sharding — Redis or Postgres's own buffer cache absorbs the read load that NoSQL is famous for handling. Most "we need NoSQL for reads" arguments dissolve in front of a properly-sized cache.
- Two engines is a real cost — picking Mongo now means the team operates two database engines forever (Postgres for the warehouse exports, Mongo for the primary). That is not free.
- Cost — Postgres start-up cost is one engine, one runbook. Mongo start-up cost is one engine, one runbook, plus a different mental model for transactions. They are close; the long-tail cost favours Postgres for most teams.
SQL
Topic — sql
SQL practice problems for data engineering
3. The NoSQL family map
nosql is four shapes, not one — document, key-value, wide-column, graph
The mental model in one line: every NoSQL engine belongs to one of four families, and each family is optimised for one data shape and one query shape. Once you know the family map, "should we use Mongo or Cassandra?" is rarely a hard question — they solve different problems.
The four families in one table.
| Family | Data shape | Query shape | Engines | Use cases |
|---|---|---|---|---|
| Document | nested JSON | per-document fetch + ad-hoc filter | MongoDB, Couchbase, AWS DocumentDB | CMS, product catalogues, mobile app backends, event payloads |
| Key-value | opaque value behind a key | GET / PUT / DELETE by key | DynamoDB, Redis, Riak, Memcached | session storage, shopping cart, leaderboard, hot cache |
| Wide-column | sparse rows in a partition keyed by primary key | range scan inside a partition | Cassandra, ScyllaDB, HBase, Bigtable | IoT telemetry, event logs, fan-out feeds, multi-DC writes |
| Graph | nodes + edges with properties | traversal across N hops | Neo4j, Amazon Neptune, TigerGraph | fraud detection, recommendation, knowledge graph, social network |
Document — MongoDB and friends.
-
Data unit. A document is a nested JSON-like object:
{ _id, name, items: [...], metadata: {...} }. Up to 16 MB per document in Mongo. - Strength. "Load one entity with everything it owns" is one round trip. No joins. Schema-on-read.
-
Weakness. Cross-document queries (joins, aggregations) are slower than the relational equivalent.
$lookupexists but is not free. - Indexing. B-tree secondary indexes per collection; text indexes; geospatial; compound indexes. Up to 64 per collection.
- Sweet spot. Anything where the natural unit of work is a self-contained nested object — CMS articles, product catalogue entries, user profiles with embedded preferences.
Key-value — DynamoDB and Redis.
- Data unit. A key maps to an opaque value (or, in DynamoDB, a structured item with attributes). Reads are GET by primary key.
- Strength. Lowest possible read / write latency at scale. Predictable per-operation cost. Horizontal scale is the design.
- Weakness. No JOIN, no GROUP BY, no ad-hoc query. Range queries need a sort key (DynamoDB) or you fall back to scans.
- Indexing. DynamoDB has Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs); Redis has no secondary indexing — the application maintains separate keys for separate access patterns.
- Sweet spot. Hot path of any high-QPS application: session store, shopping cart, leaderboard, real-time scoreboard.
Wide-column — Cassandra and ScyllaDB.
- Data unit. A partition contains many rows, each keyed by a clustering key. Effectively a "wide row" of arbitrary length.
- Strength. Unbounded write throughput. Multi-DC replication is native. Time-series-shaped data (one partition per device-day) scales linearly.
- Weakness. Reads outside the partition key are expensive or impossible. No JOIN. Secondary indexes exist but are discouraged for high-cardinality columns.
- Indexing. Primary key (partition + clustering). Secondary indexes per column but with sharp limits. Materialised views as a denormalised access pattern.
- Sweet spot. Write-heavy workloads: IoT telemetry (one device-hour per partition), event logs, fan-out feeds, multi-region writes.
Graph — Neo4j and Neptune.
- Data unit. Nodes (entities) and edges (relationships). Both have properties.
- Strength. Traversals of arbitrary depth are constant-time per hop — relationship lookups are O(degree), not O(table size).
- Weakness. Pure aggregations across all nodes are slow compared to a columnar warehouse. Not the right tool for analytics.
- Indexing. Index on node label + property. Edges are first-class — no JOIN table required.
- Sweet spot. Anything where the answer involves "find all X that are N hops from Y": fraud rings, recommendation engines, social networks, knowledge graphs.
Cross-cutting notes.
-
Mongo can do graph-ish via
$graphLookup— recursive lookup up to a max depth. Works for shallow walks; Neo4j wins past 3 hops. - Redis can be cache, KV store, pub-sub, geospatial, and streams — multi-role. The 2024 license change pushed teams to Valkey or DragonflyDB but the data model is identical.
- Cassandra and ScyllaDB share the same data model and query language — Scylla is a C++ rewrite for lower latency at the same throughput.
- All four families ship managed offerings on AWS, GCP, Azure — DynamoDB (AWS-only), DocumentDB, ElastiCache, Keyspaces, Neptune; GCP Bigtable, Memorystore; Azure Cosmos DB (multi-model).
Worked example — same data, four shapes
Detailed explanation. A movie database lists movies, their actors, and viewer ratings. Each NoSQL family stores this differently. Seeing the same data in four shapes is the fastest way to internalise the family map.
Question. Sketch the storage layout for "movies + actors + ratings" in each of the four NoSQL families. Which family would you choose for the dominant query "find every movie an actor is in"?
Input. Three entities, with the relationship "actor ↔ movie" being many-to-many.
Code.
// (1) Document — Mongo
db.movies.insertOne({
_id: "tt0133093",
title: "The Matrix",
year: 1999,
actors: [{ id: "nm0000206", name: "Keanu Reeves" }, ...],
ratings: { imdb: 8.7, votes: 1_900_000 }
});
// (2) Key-value — DynamoDB single-table
PK = "MOVIE#tt0133093" SK = "META" -> movie attributes
PK = "MOVIE#tt0133093" SK = "ACTOR#nm0000206" -> link to actor
PK = "ACTOR#nm0000206" SK = "MOVIE#tt0133093" -> reverse link (or GSI)
// (3) Wide-column — Cassandra
CREATE TABLE movies_by_actor (
actor_id text,
movie_id text,
movie_title text,
PRIMARY KEY (actor_id, movie_id)
);
-- One partition per actor; clustering key = movie_id.
-- Query "every movie Keanu is in" = one partition read.
// (4) Graph — Neo4j
(Keanu:Actor {id: "nm0000206"}) -[:ACTED_IN]-> (Matrix:Movie {id: "tt0133093"})
// Query: MATCH (a:Actor {id: "nm0000206"})-[:ACTED_IN]->(m:Movie) RETURN m
Step-by-step explanation.
- The document design embeds actors inside the movie. "Find every movie an actor is in" requires either a secondary index on
actors.id(cheap in Mongo) or a scan. - The key-value design models both directions explicitly. The reverse link
ACTOR#... -> MOVIE#...makes "every movie Keanu is in" a single Query call. - The wide-column design partitions by
actor_id. "Every movie Keanu is in" is a single partition read — exactly what Cassandra is optimised for. - The graph design represents the relationship as a first-class edge. The traversal
(actor)-[:ACTED_IN]->(movie)is constant-time per hop, regardless of dataset size.
Output.
| Family | "Every movie Keanu is in" | "Every actor in The Matrix" | "Recommended for fans of Keanu" |
|---|---|---|---|
| Document | secondary index lookup | embedded list scan | application aggregation |
| Key-value | single Query (reverse link) | single Query | application aggregation |
| Wide-column | single partition read | needs a second table | not natively |
| Graph | one-hop traversal | one-hop traversal | multi-hop traversal (native) |
Rule of thumb. If your dominant query is "across N hops of relationships," graph wins by a wide margin. If it is "everything keyed by one entity," wide-column or key-value wins. If it is "load one self-contained record," document wins. Pick the family whose data shape matches your query shape.
Worked example — when MongoDB is the right pick
Detailed explanation. A content management system (CMS) stores articles. Each article has a title, author, body, tags, comments, metadata, and a long tail of custom fields per template. Editorial workflows read one article at a time; analytics is a separate concern.
Question. Why is MongoDB a natural fit for this CMS, and what does the schema look like? How would you handle the "all articles by author X" query?
Input. Articles, authors, comments. Heterogeneous template-driven custom fields.
Code.
db.articles.insertOne({
_id: ObjectId("..."),
slug: "nosql-vs-sql",
title: "NoSQL vs SQL for Data Engineering",
author: { id: "au-123", name: "Jane Doe" },
body_html: "...",
tags: ["nosql", "sql", "postgres", "mongo"],
comments: [
{ user: "...", text: "Great post!", at: ISODate("...") }
],
metadata: {
template: "long-form",
seo: { primary_keyword: "nosql vs sql", search_intent: "informational" },
custom: { /* whatever the template needs */ }
}
});
// Secondary index for "all articles by author"
db.articles.createIndex({ "author.id": 1, "published_at": -1 });
Step-by-step explanation.
- The article is a self-contained document — loading "the article page" is a single
findOne()by slug. No joins, one round trip. - Embedded comments scale until a single article hits ~1000+ comments — at which point Mongo recommends splitting into a separate
commentscollection witharticle_idas the foreign key. - The "custom" subdocument absorbs template-specific fields without a schema migration. Editorial can ship a new template type without involving the database team.
- "All articles by author X" needs a secondary index on
author.id— Mongo provides it, the query is fast.
Output.
| Query | Mongo plan |
|---|---|
| Load article by slug |
findOne({slug}) — uses unique index |
| All articles by author | Range scan on {author.id, published_at} index |
| Articles tagged "nosql" | Multikey index on tags
|
| Full-text search | text index OR Atlas Search |
Rule of thumb. Reach for Mongo when (a) the unit of work is a nested document, (b) schemas drift, and (c) the dominant query is by document key. If you find yourself drawing an ERD with five tables and four foreign keys, you do not have a document — you have a relational schema. Use Postgres.
Worked example — when Cassandra is the right pick
Detailed explanation. An IoT platform ingests 100,000 sensor readings per second from 50,000 devices, replicated across three data centres. The dominant query is "show me the last 24 hours of readings for device X." Postgres cannot keep up at single-digit-ms write latency; MongoDB struggles with the multi-DC active-active write pattern.
Question. Sketch the Cassandra schema for this workload and explain why the partition key is the most important decision.
Input. 100k writes/s, multi-DC, per-device range reads.
Code.
-- Cassandra
CREATE TABLE sensor_readings (
device_id text,
bucket_day date,
reading_ts timestamp,
temperature double,
humidity double,
PRIMARY KEY ((device_id, bucket_day), reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);
-- Query "last 24 hours for device X" = read 1 partition
SELECT * FROM sensor_readings
WHERE device_id = 'dev-42'
AND bucket_day IN ('2026-06-05', '2026-06-06')
AND reading_ts > '2026-06-05 14:00:00';
Step-by-step explanation.
- The partition key is
(device_id, bucket_day)— a composite key that ensures one partition per device per day. Daily buckets cap partition size at ~86,400 rows (one reading per second) — well within Cassandra's recommended 100 MB / 100k rows. - The clustering key is
reading_ts DESC— so the most-recent readings are at the start of each partition. "Last 24 hours" reads from the start of the partition. - Writes scale horizontally: 100k writes/s spread across many devices means many partitions, which spread across cluster nodes by partition-key hash. No hot partition.
- Multi-DC replication is native: declare a keyspace with
replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}and Cassandra handles it.
Output.
| Workload property | Why Cassandra wins |
|---|---|
| 100k writes/s | LSM-tree storage absorbs writes fast |
| Time-series shape | day-bucketed partitions scale linearly |
| Multi-DC active-active | native replication strategy |
| Per-device range read | one partition read |
| Ad-hoc analytics | NOT Cassandra — use a warehouse downstream |
Rule of thumb. Cassandra wins when the dominant query is "range scan inside one partition keyed by a hot ID" and the write rate exceeds what any single-node engine can sustain. If you cannot articulate the partition key in one sentence, you are not yet ready to design for Cassandra.
SQL interview question on choosing the right NoSQL family
A senior interviewer often probes: "We're building a real-time fraud detection system — every transaction needs to be checked against the user's last 50 transactions and against a graph of 'devices linked to other users.' What database(s) do you reach for, and why?" The probe tests whether the candidate maps query shape to family.
Solution Using two NoSQL families layered together
1. "User's last 50 transactions" — query shape: range scan by user_id
* Family: key-value or wide-column.
* Engine: DynamoDB (single AWS region) or Cassandra (multi-DC).
* Schema: PK = user_id, SK = txn_ts DESC. Query returns top 50.
* Latency: single-digit ms P99.
2. "Devices linked to other users (N hops)" — query shape: graph traversal
* Family: graph.
* Engine: Neo4j (self-managed) or Amazon Neptune (managed).
* Schema: (User)-[:OWNS]->(Device)-[:OWNED_BY]->(User). Traverse 2-3 hops.
* Latency: tens of ms for shallow traversals.
3. The combined pipeline
* Stream new transactions to both engines (Kafka topic + Lambda fan-out).
* Fraud check = one DynamoDB query + one Neptune traversal in parallel.
* Combined P99: max(DynamoDB, Neptune) + scoring logic.
4. Anti-pattern
* Trying to do the graph traversal inside DynamoDB.
* Trying to do the per-user range scan inside Neo4j.
* Each family is bad at the other's job.
Step-by-step trace.
| Sub-query | Family | Engine | Cost driver |
|---|---|---|---|
| Last 50 txns per user | Key-value / wide-column | DynamoDB | RCU per query |
| Devices linked to user | Graph | Neptune | per-traversal compute |
| Cross-check + score | Application | Lambda | compute ms |
Each sub-query lives in the engine that is best at it. The application is the conductor — fans out the parallel reads and combines the answers.
Output:
| Tier | Engine | Reason |
|---|---|---|
| Per-user range | DynamoDB | partition key + sort key |
| Graph traversal | Neptune | first-class edges |
| Streaming ingest | Kafka | high-throughput append |
| Audit + replay | S3 | cheap append-only |
Why this works — concept by concept:
- Layered NoSQL — not "pick one" — different sub-queries demand different families. The mature architecture composes engines, it does not pick a winner.
- DynamoDB for hot OLTP — single-digit-ms P99 reads with a well-chosen partition key. The whole hot path of "is this transaction suspicious?" runs through it.
- Graph for relationship shape — fraud rings are graph-shaped: shared devices, shared addresses, shared payment methods. Native graph traversals make "find every account 2 hops from X" tractable.
- Streaming as the glue — Kafka (or AWS Kinesis) carries new transactions to both engines in parallel. The two engines are eventually consistent within seconds; the application tolerates that.
- Object storage as the audit log — every transaction also lands in S3 partitioned by day. Cheap, compliant, replayable.
- Cost — DynamoDB cost ∝ QPS; Neptune cost ∝ compute hours; Lambda cost ∝ executions. Each scales independently with its own demand curve, which is the whole point of decomposed architecture.
Graph
Topic — graph
Graph traversal problems (general)
4. The CAP theorem quadrant — where each engine actually sits
cap theorem is two of three under partition — and every modern engine is actually tunable along the spectrum
The mental model in one line: CAP says you can have Consistency + Availability + Partition tolerance — but in a network with partitions (i.e. the real world), you must give up one of C or A; the third (P) is non-negotiable. The senior nuance is that no production engine is purely CP or AP — they ship knobs that let you tune the trade-off per query.
The CAP refresher in five bullets.
- C — Consistency. Every read receives the most recent write (or an error). Linearisability in the strict definition.
- A — Availability. Every request receives a non-error response (without the guarantee it contains the most recent write).
- P — Partition tolerance. The system continues to operate despite arbitrary message loss between nodes.
- Pick 2 under partition. The original theorem says you cannot have all three when the network partitions. In practice, P is forced on you — networks fail — so the real choice is C or A.
- No production engine is purely CP or AP. Cassandra ships strong-consistency knobs; DynamoDB ships strongly-consistent reads; Mongo ships read concerns. They are tunable along a spectrum.
Where the engines actually sit.
-
CP (consistency + partition tolerance). HBase, MongoDB (default
majorityread concern), Spanner, ZooKeeper, Etcd. Under partition, the minority side stops serving — preserves consistency. - AP (availability + partition tolerance). Cassandra, DynamoDB (default eventually consistent reads), Riak, CouchDB. Under partition, every node keeps serving — preserves availability — and reconciles after.
- CA-flavoured. Single-node Postgres or MySQL. Strictly, "CA" is impossible under real partitions; in practice, a single node has no partitions to tolerate. The label is sloppy but you will see it in interviews.
-
Tunable. DynamoDB's
ConsistentRead=trueflag flips it from AP-ish to CP-ish. Cassandra'sLOCAL_QUORUM/EACH_QUORUM/ALLflip it along the same axis.
The PACELC extension.
PACELC (pronounced "pass-elk") extends CAP to cover the day-to-day case where there is no partition:
- P → A vs C (during partition, pick Availability or Consistency).
- E → L vs C (Else — no partition — pick Latency or Consistency).
- Most engines are PA / EL — under partition prefer availability, otherwise prefer latency. Cassandra, DynamoDB.
- Some are PC / EC — under partition prefer consistency, otherwise still prefer consistency. Spanner, HBase.
- The day-to-day trade-off is L vs C — network partitions are rare, but the latency / consistency choice is constant.
Common interview probes on CAP.
- "Is Postgres CP or AP?" — single-node Postgres has no partitions; with synchronous replication it is CP; with async replication on the read replica it is AP-ish. Be specific.
- "What does 'eventually consistent' actually mean?" — given enough time without new writes, every replica converges to the same value. The "given enough time" is doing work.
- "Why is Spanner exceptional?" — TrueTime (atomic-clock-synchronised) lets it offer external consistency (linearisability) over geographically distributed shards. Trade-off is dollar cost.
- "What is PACELC and why should I care?" — because partitions are rare; the daily trade-off is latency vs consistency, which PACELC names directly.
Worked example — DynamoDB's tunable consistency in code
Detailed explanation. DynamoDB defaults to eventually consistent reads (lower latency, lower cost). For consistency-critical operations, set ConsistentRead=true. Understanding the cost flip is interview gold.
Question. Given an "account balance" read, show the eventually-consistent and strongly-consistent paths in DynamoDB and explain the cost / latency trade-off.
Input. An account in DynamoDB, with concurrent reads after a write.
Code.
import boto3
ddb = boto3.client("dynamodb")
# Eventually consistent — default. Lower cost. May lag by ~1s.
resp = ddb.get_item(
TableName="accounts",
Key={"id": {"S": "A"}}
)
# Costs 0.5 RCU per 4 KB read.
# Strongly consistent. Reads from the primary replica. Guaranteed
# to see every successful write that completed before this call.
resp = ddb.get_item(
TableName="accounts",
Key={"id": {"S": "A"}},
ConsistentRead=True
)
# Costs 1 RCU per 4 KB read — exactly 2x.
# NOTE: ConsistentRead is NOT available on Global Secondary Indexes.
# GSIs are always eventually consistent. If you need strong consistency
# AND a secondary access pattern, design a Local Secondary Index instead
# (limited to same partition key, so adjust the table model).
Step-by-step explanation.
- The eventually consistent read may return a stale value — a write that committed 100 ms ago may not yet be visible on the replica that served the read. Typical lag: tens of ms; worst case: ~1 second.
- The strongly consistent read forces a routing decision: the read goes to the primary replica, which has the authoritative state. Guaranteed to see every prior committed write.
- The cost is 2× the RCU — DynamoDB literally charges twice for the privilege of consistency. At scale, this matters.
- GSIs cannot be strongly consistent — DynamoDB replicates from the base table to the GSI asynchronously. If you need strong consistency on a non-primary access pattern, use an LSI (limited to same partition key) or change the base table design.
Output.
| Read flavour | Latency P99 | Cost per 4 KB | Use case |
|---|---|---|---|
| Eventually consistent | ~5–10 ms | 0.5 RCU | reads where ~1s lag is fine |
| Strongly consistent | ~10–20 ms | 1 RCU | balance, password, "did I just buy this?" |
| GSI read | always eventually consistent | 0.5 RCU | secondary access patterns |
Rule of thumb. Default to eventually consistent reads. Flip to ConsistentRead=true only for the small subset of reads that genuinely require it — balance, lock acquisition, post-write read-your-own-writes. The cost difference is real at high QPS.
Worked example — Cassandra's tunable consistency by query
Detailed explanation. Cassandra ships per-query consistency levels — you set ONE, LOCAL_QUORUM, QUORUM, EACH_QUORUM, or ALL on every read and write. The classic R + W > N rule from Dynamo's 2007 paper gives strong consistency when the read and write quorums overlap.
Question. With replication factor N = 3 in a single DC, list the (W, R) combinations that produce strong consistency, and explain when you would pick EACH_QUORUM across two DCs.
Input. A keyspace with RF = 3 per DC.
Code.
-- Strong consistency requires R + W > N
-- For N = 3:
-- W = QUORUM (2) and R = QUORUM (2) -> 2 + 2 = 4 > 3 (yes)
-- W = ONE and R = ALL (3) -> 1 + 3 = 4 > 3 (yes, slow)
-- W = ALL and R = ONE -> 3 + 1 = 4 > 3 (yes, fast read, slow write)
INSERT INTO accounts (id, balance) VALUES ('A', 100)
USING CONSISTENCY QUORUM;
SELECT balance FROM accounts WHERE id = 'A'
USING CONSISTENCY QUORUM;
-- Multi-DC: EACH_QUORUM forces a quorum in EVERY DC for the write.
INSERT INTO accounts (id, balance) VALUES ('A', 100)
USING CONSISTENCY EACH_QUORUM;
Step-by-step explanation.
-
R + W > Nis the canonical rule. WithN=3, pickingW=QUORUM (2)andR=QUORUM (2)guarantees overlap on at least 1 replica between the write quorum and the read quorum. -
LOCAL_QUORUMdoes the same but only within the local DC — fast, single-DC strong consistency. Best default for most workloads. -
EACH_QUORUMenforces a quorum in every DC. Slow (cross-DC latency) but guarantees that no DC is allowed to commit without the others knowing. Use for global financial state. -
W=ONE, R=ALLis fast write + slow read;W=ALL, R=ONEis the opposite. Pick the asymmetry based on which is more frequent in your workload.
Output.
| W | R | R + W vs N | Consistency |
|---|---|---|---|
| ONE | ONE | 2 < 3 | eventual |
| QUORUM | QUORUM | 4 > 3 | strong (single DC) |
| LOCAL_QUORUM | LOCAL_QUORUM | within DC | strong locally, eventual across DCs |
| EACH_QUORUM | LOCAL_QUORUM | per-DC | strong across DCs (slow) |
Rule of thumb. Pick LOCAL_QUORUM for both reads and writes as the default in any multi-DC Cassandra deployment. Override to EACH_QUORUM only for the small subset of writes that must be visible globally before the application acknowledges them.
Worked example — Spanner / CockroachDB and the "external consistency" tier
Detailed explanation. Spanner uses Google's TrueTime API — atomic-clock-synchronised across data centres — to offer external consistency (linearisability) over a globally distributed database. CockroachDB does the same conceptually with a software clock + leader leases. Both are PC / EC — they pick consistency under partition AND in the no-partition case.
Question. Why is Spanner exceptional in the CAP landscape, and when would you actually need it?
Input. A workload that needs ACID transactions across multiple data centres.
Code.
-- Spanner SQL — looks like Postgres
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT;
-- Both rows may live in different shards, in different DCs.
-- Spanner uses TrueTime to assign a global timestamp.
-- Every read at or after that timestamp sees both updates.
Step-by-step explanation.
- TrueTime gives every Spanner node a tight uncertainty bound on the current time (typically < 7 ms). Transactions wait out this uncertainty before committing, ensuring that the commit timestamp is unambiguously after every prior transaction's timestamp.
- External consistency means: if transaction T1 commits before T2 starts (in real-world wall-clock time), T2 sees T1's effects. Stronger than serialisability — it ties the order to the physical world.
- Reads at a snapshot timestamp can be served from any replica that has caught up to that timestamp — so reads are cheap and parallel.
- The dollar cost is significant — Spanner's premium pricing reflects the engineering. CockroachDB is an open-source approximation; YugabyteDB is another.
Output.
| Property | Spanner | DynamoDB Global Tables | Cassandra multi-DC |
|---|---|---|---|
| Cross-DC ACID | yes | no (last-write-wins) | no (per-DC consistency) |
| External consistency | yes | no | no |
| Latency P99 | ~10s ms cross-DC | single-digit ms local | single-digit ms local |
| Cost | premium | RCU + storage | self-managed |
Rule of thumb. Spanner is the right tool when (a) you need ACID transactions across multiple data centres AND (b) the dollar cost is acceptable. Most workloads don't need (a); for those that do, no other engine matches it.
SQL interview question on applying CAP to a real workload
A senior interviewer often frames this: "You're designing a global inventory system — a customer in Europe shouldn't be able to buy the last item if a US customer just bought it. What database(s), what consistency model?" The probe tests whether the candidate reaches for the right CAP corner.
Solution Using Spanner (or a distributed lock) for the inventory decrement
Requirement: inventory.qty must never go below 0 globally.
1. Option A — Spanner / CockroachDB
* Inventory row lives in one row, replicated across DCs.
* Decrement in a transaction with WHERE qty > 0 guard.
* Spanner serialises the two concurrent buys; one succeeds, one fails.
* Latency: ~10s ms cross-DC.
2. Option B — Distributed lock (Redis Redlock or Zookeeper)
* Acquire a lock on inventory[sku] before decrementing.
* Decrement in any database (DynamoDB conditional write works too).
* Latency: ~5-20 ms for lock acquisition.
* Failure mode: lock-server outage stops every buy.
3. Option C — Eventual consistency + reconciliation
* Allow overselling. Reconcile via cancel-and-refund.
* Acceptable for low-margin goods; unacceptable for limited drops.
Step-by-step trace.
| Concurrent buy A (EU) | Concurrent buy B (US) | Engine | Outcome |
|---|---|---|---|
| BEGIN tx | BEGIN tx | Spanner | both wait for global timestamp |
| UPDATE qty -= 1 WHERE qty > 0 | UPDATE qty -= 1 WHERE qty > 0 | Spanner | one succeeds, other rolls back |
| Cassandra with LOCAL_QUORUM | Cassandra with LOCAL_QUORUM | Cassandra | both succeed in local DC; conflict resolves later → oversell |
| DynamoDB conditional write | DynamoDB conditional write | DynamoDB | one succeeds, other gets ConditionalCheckFailed
|
DynamoDB conditional writes are the cheapest of the strong-consistency options for a single region. For genuinely global strong consistency, Spanner is the textbook answer.
Output:
| Engine | Cross-DC strong consistency | Cost | Latency |
|---|---|---|---|
| Spanner | yes | premium | tens of ms |
| DynamoDB Global Tables | no (last-write-wins) | low | single-digit ms local |
| DynamoDB single-region | yes via condition | low | single-digit ms |
| Cassandra | only via EACH_QUORUM | self-managed | tens of ms cross-DC |
Why this works — concept by concept:
- CAP forces a choice under partition — for global inventory, the choice is consistency (do not oversell) over availability (let some buys time out). That is a business decision, not an engineering one.
-
Conditional writes as a poor man's transaction — DynamoDB's
ConditionExpression="qty > :zero"enforces the invariant atomically inside the engine. Cheaper than a full transaction; works inside one region. - Spanner's TrueTime — the only engine in the mainstream that provides external consistency over a globally distributed database. The cost is the price of "no oversell, ever, anywhere."
- Locks as a workaround — distributed locks (Redlock, Zookeeper) decouple the consistency requirement from the database. They work but add a single point of failure.
- Eventual consistency + reconciliation — the cheapest answer that is usually wrong. Acceptable for fungible inventory; unacceptable for limited drops or auctions.
- PACELC framing — Spanner is PC / EC (consistency under partition, consistency without). DynamoDB Global Tables is PA / EL (availability under partition, latency without). The engine's PACELC class telegraphs which trade-off it prioritises.
- Cost — Spanner premium > DynamoDB conditional writes > Cassandra self-managed. Pick the cheapest engine that satisfies the business invariant.
Design
Topic — design
System design problems with consistency trade-offs
5. Decision tree — which database for which workload
The decision starts at the query shape, not the engine — six branches, six recommendations
The mental model in one line: start at the root "what is the query shape?", branch on scale, branch on consistency, branch on latency, branch on relationship-shape — every leaf is a specific engine with a defensible reason. The same decision tree applies whether the framing is nosql vs sql or sql vs nosql; the engine choice falls out at the leaf.
The tree in seven branches.
- Branch 1 — Query shape. Ad-hoc analytical / OLAP → columnar warehouse (Snowflake, BigQuery, Redshift, DuckDB). Known access pattern / OLTP → continue.
- Branch 2 — Scale. Below ~10 TB OLTP → Postgres or MySQL. Above ~100 TB write-heavy → Cassandra / ScyllaDB or DynamoDB.
- Branch 3 — Consistency. Strict (banking, inventory, ledger) → Postgres / Spanner / CockroachDB. Eventual acceptable → DynamoDB / Cassandra.
- Branch 4 — Latency. Single-digit-ms P99 at scale → DynamoDB or Redis. Tens-of-ms acceptable → Postgres / Mongo.
- Branch 5 — Relationship shape. First-class graph traversals > 3 hops → Neo4j / Neptune / TigerGraph. Shallow joins OK → Postgres recursive CTE.
- Branch 6 — Multi-region active-active. Required → Cassandra, DynamoDB Global Tables, Spanner. Single region OK → simpler engines.
- Branch 7 — Hybrid evolving schema. Relational core + JSON payload → Postgres + JSONB. Pure document → Mongo. Pair Postgres + Mongo only as a last resort.
Anti-patterns the tree screens out.
- Mongo for analytics. Aggregation pipelines on millions of documents are slow. Use a columnar warehouse downstream.
- DynamoDB without modelling the access pattern. "We'll add a GSI later" is how DynamoDB bills get out of hand.
- Joining 4 tables in Cassandra. Cassandra is not a join engine. Denormalise into a wider row or use a different engine.
- Neo4j as a primary OLTP store for non-graph workloads. Graph databases are slow at point reads compared to KV stores.
- Postgres for unbounded write-heavy time-series at the scale of millions of writes/s. Use TimescaleDB (Postgres extension) for moderate scale, Cassandra for unbounded.
Cache layer is orthogonal. Redis / Memcached sit in front of the chosen primary, not as a substitute. Cache invalidation is the hard part — pick the primary first, add cache when the read profile demands it.
Common interview probes.
- "When would you ever use MongoDB in 2026?" — when the unit of work is a self-contained nested document AND the schema genuinely evolves AND joins to other entities are rare. The set of workloads matching all three has shrunk; it has not disappeared.
- "Is DynamoDB always cheaper than Postgres?" — no. At low QPS or for workloads with many access patterns, Postgres + the labour-cost saving of one engine often wins on total cost of ownership.
- "What is a 'hot partition' and why does it matter?" — a partition that receives disproportionate traffic, throttling the entire table in DynamoDB / Cassandra. A bad partition-key choice creates one immediately.
- "What is the typical write fan-out pattern in 2026?" — write to Kafka, fan out to (a) primary OLTP store, (b) warehouse via CDC, (c) S3 audit log. Three sinks, one source.
Worked example — picking the engine for a SaaS app's primary store
Detailed explanation. A B2B SaaS — accounting software — needs a primary OLTP store. Domain: 50 tenant companies (year 1), each with users, invoices, ledger entries, customers, products. Multi-tenant. Strict ACID across ledger entries.
Question. Walk the decision tree and explain which engine you would pick. What changes in year three when there are 5,000 tenants?
Input. Year 1: 50 tenants, ~100 GB total. Year 3: 5,000 tenants, ~5 TB total.
Code.
Branch 1 — Query shape: known access pattern, OLTP. -> continue
Branch 2 — Scale: 100 GB year 1, 5 TB year 3. Both fit Postgres.
Branch 3 — Consistency: strict (it's accounting). -> Postgres / Spanner.
Branch 4 — Latency: tens-of-ms acceptable. -> Postgres OK.
Branch 5 — Relationship: shallow joins (invoice -> line items). -> Postgres OK.
Branch 6 — Multi-region: not required year 1. -> single-region OK.
Branch 7 — Schema: stable accounting model + per-tenant custom fields.
-> Postgres + JSONB for the custom fields.
Recommendation: Postgres single-region with JSONB for custom fields.
tenant_id on every row + RLS policy for tenant isolation.
Year three (5 TB, 5000 tenants):
- Same engine. Postgres still fits.
- Add read replicas if reporting load grows.
- Consider Citus (sharding extension) only if write rate exceeds single-node capacity.
- Add Snowflake for cross-tenant analytics (e.g. benchmarking).
Step-by-step explanation.
- The accounting domain is the canonical relational workload — ledgered entries, double-entry transactions, joins between invoices and customers and products. Postgres is the textbook fit.
- The "multi-tenant" requirement is satisfied with
tenant_idon every row and Postgres Row-Level Security (RLS) policies. No separate engine needed. - The "custom fields per tenant" requirement is satisfied with a
JSONB custom_fieldscolumn. Per-tenant schema flexibility inside a relational engine. - The year-three scale (5 TB) is still within a single-node Postgres budget on modern hardware. The migration path (Citus if writes overflow) is well-trodden.
Output.
| Year | Scale | Engine | Add-ons |
|---|---|---|---|
| 1 | 100 GB | Postgres single-region | none |
| 2 | 1 TB | Postgres + read replica | reporting offload |
| 3 | 5 TB | Postgres + Snowflake | warehouse for cross-tenant analytics |
| 5 | 50 TB | Postgres + Citus | sharding if needed |
Rule of thumb. Default to Postgres until the workload demonstrably needs something else. The "do nothing extra" path covers more workloads than any other choice. The engineering you save in year one funds the migration in year five.
Worked example — picking the engine for a real-time fan-out feed
Detailed explanation. A social-feed product — every user follows hundreds of others, and the home feed shows the latest posts from everyone they follow. The dominant query is "give me the last 100 posts in my feed." Writes are dominated by "post fanout to N followers."
Question. Walk the decision tree and explain why this is a Cassandra workload, not a Postgres one.
Input. Per-user feed of latest posts from followed accounts. 10M users, 10M posts/day.
Code.
-- Cassandra
CREATE TABLE feed (
user_id uuid,
posted_at timestamp,
post_id uuid,
author_id uuid,
summary text,
PRIMARY KEY ((user_id), posted_at, post_id)
) WITH CLUSTERING ORDER BY (posted_at DESC, post_id DESC);
-- Read: "last 100 posts in my feed"
SELECT * FROM feed
WHERE user_id = ?
LIMIT 100;
-- Write path: on new post, fan-out to every follower's feed
-- Implemented as an async worker reading from Kafka
Step-by-step explanation.
- Branch 1: query shape is "give me the last 100 in my feed" — a range scan inside one partition.
- Branch 2: write rate is dominated by fan-out. One post by a user with 1M followers = 1M writes. 10M posts/day with average 100 followers = 1B writes/day = ~12k writes/sec. Postgres single-node can do it; Cassandra is cheaper at this scale.
- Branch 3: consistency is eventual — a feed entry showing 200 ms late is invisible to users.
- Branch 5: relationships are denormalised — every feed row stores the post summary so the read does not need a join.
The denormalisation is the price for one-partition reads. The fan-out cost is the price for instant-reads.
Output.
| Property | Postgres single-node | Cassandra |
|---|---|---|
| 12k writes/s sustained | possible but expensive box | designed for it |
| Feed read latency | ~10 ms with index | < 10 ms always |
| Multi-DC writes | replica lag | native active-active |
| Operational cost | one engine | cluster ops |
Rule of thumb. Fan-out feeds are the textbook Cassandra workload — wide rows partitioned by user, sorted by time, append-mostly writes. If the feed has more than tens of thousands of writes/s sustained AND multi-region matters, Cassandra (or ScyllaDB) is the right pick.
Worked example — picking the engine for a session store
Detailed explanation. A login-session store — every authenticated user has a session record keyed by session token. Reads on every API call. Writes on login / logout / refresh. Records expire after 24 hours.
Question. Walk the decision tree. Which engine wins?
Input. 10M active users, each making ~100 API calls / day = 1B reads / day; ~10M writes / day.
Code.
# Redis — the canonical session store
import redis
r = redis.Redis()
# Set session with TTL
r.setex(f"sess:{token}", 86400, json.dumps({"user_id": uid, "scope": "..."}))
# Read on every API call
data = r.get(f"sess:{token}")
if data is None:
return 401
Step-by-step explanation.
- Branch 1: query shape is point read by session token. Pure key-value.
- Branch 4: latency must be sub-ms (every API call hits it). Redis is built for this.
- Sessions are ephemeral — TTL handles expiration automatically. No reconciliation needed.
- Redis handles 100k+ reads/s on a single node; the workload sits well within one instance.
Output.
| Engine | Per-read latency | Fit |
|---|---|---|
| Postgres | ~1 ms | works, overkill |
| DynamoDB | ~5 ms | works, more expensive |
| Redis | < 1 ms | textbook fit |
Rule of thumb. Pure key-value workloads with sub-ms latency budgets go to Redis (or DynamoDB if managed-AWS matters more than raw latency). If the session needs to survive a Redis-cluster failure, persist to a secondary store on write; the primary read path stays on Redis.
Worked example — picking the engine for a recommendation system
Detailed explanation. A product recommendation system answers "users who bought X also bought…" in real time. The data is a graph of users and products, with edges weighted by purchase events. The dominant query is multi-hop traversal.
Question. Walk the decision tree. Which family?
Input. 50M users, 1M products, 5B purchase events. Recommendations served in real time.
Code.
// Neo4j Cypher — "users who bought X also bought…"
MATCH (p:Product {id: $sku})<-[:BOUGHT]-(u:User)-[:BOUGHT]->(other:Product)
WHERE other.id <> $sku
RETURN other.id, count(*) AS co_purchase
ORDER BY co_purchase DESC
LIMIT 10;
Step-by-step explanation.
- Branch 5: relationships are first-class — the query is a two-hop traversal
(Product)<-(User)->(Product). - The same query in Postgres requires self-joining a
purchase_eventstable twice — possible, but slow at scale. - Neo4j's storage is built for traversals — each node stores adjacency lists, so the next-hop lookup is O(degree), not O(table size).
- For static recommendations precomputed nightly, the answer might be Spark + Snowflake. For real-time, graph wins.
Output.
| Engine | Query latency at 5B events | Real-time fit |
|---|---|---|
| Postgres self-join | hundreds of ms | no |
| Spark batch | seconds | precompute only |
| Neo4j | tens of ms | yes |
Rule of thumb. Real-time multi-hop traversals → graph database. Batch precomputation → Spark + warehouse. Pick based on whether the freshness budget tolerates a nightly job or demands single-digit-second updates.
SQL interview question on picking a database for a new product
A senior interviewer often probes: "Your team is launching a real-time chat product — direct messages between users. What database(s) and why? Walk me through the decision." The probe tests whether the candidate maps the access patterns onto the decision tree.
Solution Using a Cassandra-shaped wide-row design
Access patterns
1. "Show me my last 50 messages with user X" - range scan by (me, peer)
2. "Show all my conversations" - inbox list per user
3. "Notify on new message" - websocket / pub-sub
Decision tree walk
Branch 1: query shape = range scan + point read - OLTP.
Branch 2: write rate = millions of msgs/day, growing - prefer horizontal.
Branch 3: consistency = eventual is fine (display order).
Branch 4: latency = single-digit ms for read - Cassandra / DynamoDB.
Branch 6: multi-region replication eventually required.
Recommendation
- Cassandra (or ScyllaDB) as primary store.
- Schema below partitions by conversation_id (sorted user pair).
- Redis Pub-Sub for fan-out to websockets.
- S3 + Snowflake for analytics (DAU, retention, etc.).
Step-by-step trace.
| Sub-system | Engine | Schema or pattern |
|---|---|---|
| Message store | Cassandra | ((conv_id), msg_ts DESC, msg_id) |
| Inbox per user | Cassandra | ((user_id), last_msg_ts DESC) |
| Real-time fan-out | Redis Pub-Sub | channel per conversation |
| Analytics | Snowflake | nightly CDC dump |
| Object storage | S3 | message blob if > 1 KB |
-- Primary message table
CREATE TABLE messages (
conv_id text,
msg_ts timestamp,
msg_id uuid,
sender_id uuid,
body text,
PRIMARY KEY ((conv_id), msg_ts, msg_id)
) WITH CLUSTERING ORDER BY (msg_ts DESC, msg_id DESC);
-- Inbox materialised view per user
CREATE TABLE inbox (
user_id uuid,
conv_id text,
last_msg_ts timestamp,
last_msg_preview text,
PRIMARY KEY ((user_id), last_msg_ts)
) WITH CLUSTERING ORDER BY (last_msg_ts DESC);
Output:
| Workload tier | Engine | P99 latency |
|---|---|---|
| Read last 50 messages | Cassandra | < 10 ms |
| Read inbox list | Cassandra (denormalised) | < 10 ms |
| Real-time push | Redis Pub-Sub | < 50 ms |
| Analytics | Snowflake | minutes |
Why this works — concept by concept:
- Wide-row design — one Cassandra partition per conversation; reads are O(messages-shown), independent of the dataset size. Textbook fit.
- Denormalised inbox — every new message writes to (a) the messages table and (b) the inboxes of both participants. Two writes for one user action, but reads become single-partition.
- Redis as the fan-out layer — Cassandra is the durable store; Redis Pub-Sub is the low-latency push channel. Each engine does what it is best at.
- S3 for media — message bodies stay in Cassandra; attached media (images, files) lives in S3 with a presigned URL.
- Snowflake for analytics — DAU, retention, engagement metrics run on a nightly CDC export, not on the primary store.
- Avoiding Postgres for this workload — Postgres would work for the first 10M messages and break around 100M without significant engineering. Picking Cassandra now means no migration later.
- Cost — Cassandra scales cost ∝ data; Redis cost ∝ connections; Snowflake cost ∝ compute hours. Each tier scales on its own demand curve.
SQL
Topic — joins
Join problems for the relational tier
Cheat sheet — NoSQL vs SQL recipes
- Greenfield SaaS app. Start with Postgres + Redis cache. Add JSONB columns for evolving schema. Only move when you measure pain.
- Event log ingest > 50k writes/s. Cassandra or Kafka + S3 — not Postgres single-node. TimescaleDB extension can buy you time on Postgres up to ~100k writes/s.
- Single-table mobile backend. DynamoDB with composite PK + Global Secondary Indexes. Model every access pattern before the first write.
- Heavy JOIN + reporting. Postgres for ad-hoc reporting < 1 TB; columnar warehouse (Snowflake / BigQuery / Redshift / DuckDB) above.
- Schema-on-read JSON blobs. Postgres JSONB if joins to other tables matter; MongoDB if the document is fully self-contained.
- Graph traversal > 3 hops. Neo4j or Neptune. Postgres recursive CTE only for shallow walks (1–2 hops).
- Globally distributed strong consistency. Spanner or CockroachDB. Premium cost; the only engines that do it natively.
- Hot path key-value at low latency. Redis as cache layer in front of any primary store. DynamoDB if you want the cache and the durable store in one engine.
-
Multi-DC active-active writes. Cassandra (
LOCAL_QUORUM), DynamoDB Global Tables (last-write-wins), or Spanner (full ACID). - Audit log / append-only. S3 partitioned by day + Athena / Trino for ad-hoc. Cheapest tier in the stack.
- Time-series telemetry. TimescaleDB (Postgres extension) for moderate scale; Cassandra / ScyllaDB for unbounded; ClickHouse for analytics over time-series.
- Vector embeddings (LLM workloads). pgvector (Postgres) for moderate scale; Pinecone, Weaviate, Qdrant, or Milvus for production-scale RAG.
- CAP corner reminders. CP under partition = stop serving the minority side (HBase, MongoDB default). AP = keep serving and reconcile (Cassandra, DynamoDB default). PACELC: most NoSQL is PA / EL; Spanner is PC / EC.
- DynamoDB partition-key checklist. High cardinality, uniformly distributed writes, supports the dominant read pattern. If any check fails, redesign — hot partitions throttle the entire table.
- Cassandra anti-pattern alarms. Joining, ad-hoc filters without a partition key, large secondary indexes on high-cardinality columns. Each one signals "wrong engine."
-
MongoDB anti-pattern alarms. Cross-collection
$lookupover millions of documents, analytics queries, deeply-nested updates inside arrays. -
is_active: falseandtag_ids: []reminder. This is the draft-publishing contract for PipeCode blogs — same shape as picking your initial engine. Defaults that flip at publication.
Frequently asked questions
Is NoSQL faster than SQL?
It depends on the operation and the workload. For point reads by a known key at high QPS, DynamoDB and Redis are typically faster than Postgres — single-digit-ms P99 with horizontal scale. For complex multi-table joins, Postgres is faster than any NoSQL engine because the planner is optimised for that shape. The honest answer is that "fast" depends on the access pattern: pick the engine whose data shape matches the dominant query, and you get the fastest answer for that workload. The nosql vs sql choice is rarely about raw speed — it is about which engine is fastest for your workload.
Can NoSQL replace SQL entirely?
Almost never in production. Even teams marketed as "NoSQL shops" typically run a SQL engine somewhere — usually a columnar warehouse (Snowflake, BigQuery, Redshift) for analytics. The decision tree branches on query shape: ad-hoc analytical queries belong in a columnar SQL engine, regardless of where the operational data lives. The mature 2026 architecture is "NoSQL for the hot path that fits it, SQL for the relational core and the warehouse." Picking one and only one engine is an anti-pattern at any meaningful scale.
When should I use MongoDB instead of Postgres?
When (a) the unit of work is a self-contained nested document, (b) the schema genuinely evolves on every release, and (c) joins to other entities are rare. The set of workloads matching all three has shrunk significantly since Postgres added JSONB and partial indexes — most "we picked Mongo for schema flexibility" cases now fit Postgres + JSONB better. Mongo still wins for CMS articles, product catalogues with template-driven custom fields, mobile app payloads, and similar self-contained document workloads. If your domain diagram has more than two heavily-joined entities, you have a relational schema and Postgres is the right pick.
Does DynamoDB support joins?
No — there is no JOIN clause in DynamoDB's API. The application code handles the equivalent: query the first table, collect the IDs, batch-get the second table, stitch in code. The architectural answer is to avoid the need for joins via single-table design — embed related entities inside the same partition keyed by a primary entity. The cost is upfront access-pattern modelling. The reward is single-digit-ms latency on every read. If your domain genuinely needs joins across many entities, DynamoDB is the wrong engine — pick Postgres or Spanner instead.
Is Cassandra better than MongoDB for write-heavy workloads?
For unbounded write-heavy time-series-shaped workloads, yes — Cassandra's LSM-tree storage is built for absorbing writes faster than MongoDB's B-tree storage. Cassandra also ships multi-DC active-active replication natively, which MongoDB requires explicit sharding to achieve. For mixed read / write workloads on self-contained documents, MongoDB is more developer-friendly and indexable. The rule of thumb: above ~50k writes/s sustained AND multi-DC matters, pick Cassandra (or ScyllaDB for lower latency at the same throughput). Below that, MongoDB's richer query language and indexing usually win.
What does CAP theorem mean in practice for choosing a database?
In practice, every distributed engine is tunable along the consistency / availability spectrum — pure CP and pure AP are positions on a slider, not fixed labels. The senior framing is PACELC: during partition (P), pick A or C; otherwise (E), pick L or C. Most NoSQL engines are PA / EL (availability under partition, latency without) — Cassandra, DynamoDB. Spanner is PC / EC (consistency everywhere) at premium cost. When picking a database, ask: "for the most consistency-sensitive read in my workload, what guarantee do I need, and does this engine ship a knob to give it to me?" If yes, your engine is acceptable. The CAP corner is a starting point; the per-query knob is the real lever.
Practice on PipeCode
- Drill the database design practice library → for sharding, partitioning, and schema-design probes.
- Rehearse on SQL practice problems → for the joins, aggregations, and window-function muscles a Postgres-first stack demands.
- Sharpen the joins practice library → for the relational tier of a hybrid SQL + NoSQL system.
- Stack the graph traversal library → for the Neo4j / Neptune family of workloads.
- Layer the system design library → for end-to-end engine-picking probes.
- For the broader surface, read top data engineering interview questions →.
- Stack the prerequisites with the only 5 skills you need to become a data engineer →.
- Sharpen the schema axis with the data modelling for DE interviews course →.
- For Postgres + SQL depth, work through the SQL for data engineering interviews course →.
- For the system-design axis, the ETL system design for data engineering interviews course → covers how the two tiers wire together.
Pipecode.ai is Leetcode for Data Engineering — every NoSQL vs SQL recipe above ships with hands-on practice rooms where you sketch the partition key, write the JOIN that Postgres handles natively, and reason about the CAP corner of the engine you just picked. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your decision tree actually maps onto the workload you sketched on a whiteboard during the interview.





Top comments (0)