- Book: Database Playbook
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
Picture the startup CTO with a 14-tab spreadsheet comparing Postgres, MongoDB Atlas, DynamoDB, CockroachDB, and Aurora. Cost columns. Latency columns. Replication-lag columns. Three of the columns are wrong. Two of them are measuring different things under the same heading. The team picks DynamoDB because the spreadsheet has the most green cells in its row, and ten months later they are rebuilding the analytics path on Postgres because every cross-customer report requires a full table scan.
Feature matrices do not pick databases. The reason teams keep producing them is that the actual decision feels uncomfortable: it depends on what you know about your access patterns, your write semantics, your query mix, and whether you have a DBA. Four questions, in order, resolve about 80% of cases. The remaining 20% are the cases where the questions give conflicting answers and you genuinely end up with two stores.
The three options on the table:
- Postgres — relational, mature, one of the few engines where you can change your mind later without rewriting your data layer. With pgvector 0.8 it now does ANN search well enough for most RAG workloads; a sketch of what that looks like follows this list.
- MongoDB — document store, Atlas dedicated tiers start around $57/month for M10 (AWS us-east-1, as of April 2026, per mongodb.com/pricing) and scale up smoothly. Schema flexibility that is sometimes a feature and sometimes a gun.
- DynamoDB — key-value with an index sidecar, single-digit-ms reads at any scale. On-demand pricing is roughly $1.25 per million writes and $0.25 per million reads (us-east-1, as of April 2026, per the AWS DynamoDB pricing page), which is cheap until your access pattern requires GSIs — each GSI carries its own write cost, so a write fanned out across two or three indexes can multiply the effective per-write spend.
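On the pgvector point, here is roughly what ANN search in Postgres looks like. A minimal sketch, assuming pgvector is installed and psycopg2 is the driver; the DSN, table name, and embedding dimension are made-up illustrations.

```python
import psycopg2

# Hypothetical connection; swap in your own DSN.
with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "  id bigserial PRIMARY KEY,"
        "  embedding vector(1536))"  # dimension is an assumption
    )
    # HNSW index for approximate nearest-neighbor search (pgvector >= 0.5).
    cur.execute(
        "CREATE INDEX IF NOT EXISTS docs_ann "
        "ON docs USING hnsw (embedding vector_l2_ops)"
    )
    # Ten nearest documents to a query embedding, by L2 distance.
    query_vec = "[" + ",".join("0.0" for _ in range(1536)) + "]"
    cur.execute(
        "SELECT id FROM docs ORDER BY embedding <-> %s::vector LIMIT 10",
        (query_vec,),
    )
    print(cur.fetchall())
```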
Here are the four questions.
Question 1: are your access patterns known up front, or exploratory?
This is the load-bearing question. Everything else routes through it.
DynamoDB tables are designed around access patterns. The partition key, sort key, and global secondary indexes are decisions you make on day one and live with forever. Adding a new query pattern after launch usually means a new GSI, a new write path, or a single-table redesign that effectively rewrites every row. If you cannot list the queries your service will serve at design time, DynamoDB will hurt you.
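To make that day-one commitment concrete, here is a minimal table-definition sketch with boto3. The table, attributes, and index are hypothetical; the point is that the partition key, sort key, and every GSI are fixed in this single call, and a new post-launch query pattern means coming back here.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",  # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_id", "KeyType": "RANGE"},    # sort key
    ],
    GlobalSecondaryIndexes=[
        {
            # Exists because "orders by status" was known at design time.
            # A pattern discovered after launch means another one of these,
            # and another write fan-out.
            "IndexName": "status-index",
            "KeySchema": [
                {"AttributeName": "status", "KeyType": "HASH"},
                {"AttributeName": "order_id", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```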
Postgres assumes you do not know your queries up front. You add an index when a query is slow. You add a column when a column is needed. The query planner re-optimizes on every statement. This is the freedom you trade away when you pick DynamoDB.
MongoDB sits in the middle. The schema is flexible: add a field, write code that reads it. Indexes are added without table rewrites. Aggregation pipelines handle the queries that do not fit the document model. You pay for that flexibility in operational complexity at scale, but you do not pay for it on day one.
Stable, high-throughput access patterns (session stores, feature flags, leaderboards): DynamoDB. Exploratory or product-driven patterns (every CRUD app, every dashboard, every B2B tool whose customers ask for new reports): Postgres. If you are between, MongoDB is a reasonable hedge.
Question 2: do you need strong consistency across multi-row writes?
If you charge a customer, decrement their balance, and write a transaction record, those three operations have to either all happen or all not happen. In Postgres, that is BEGIN; ...; COMMIT; and you go home. In MongoDB, you have multi-document transactions inside a replica set since 4.0, and they work, but they hold locks and they do not love long-running operations. In DynamoDB, you have TransactWriteItems with a per-call cap of 100 actions / 4 MB, and it counts as 2× the write capacity of the same writes done individually.
The honest version: Postgres is built for multi-row ACID and wears it lightly. MongoDB can do it. DynamoDB will let you do it, but the moment you depend on it you are working against the grain of the system.
If your writes are mostly single-row (a session, a counter, an event), the consistency question evaporates and the answer is whatever else the questions point to. If your writes are inherently multi-row (an order with line items, a transfer between accounts), Postgres is the path of least resistance, and you should pick MongoDB or DynamoDB only if Q1 forced you there and you are willing to write the saga.
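For contrast, here is the charge-decrement-record write sketched in both systems. This is an illustration, not production code; the DSN, table names, and attribute layout are all assumptions.

```python
import boto3
import psycopg2

# Postgres: one native transaction. The connection context manager
# commits on success and rolls back on any exception.
with psycopg2.connect("dbname=app") as conn:  # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO charges (customer_id, amount_cents) VALUES (%s, %s)",
            (42, 100),
        )
        cur.execute(
            "UPDATE balances SET cents = cents - %s WHERE customer_id = %s",
            (100, 42),
        )
        cur.execute(
            "INSERT INTO ledger (customer_id, delta_cents) VALUES (%s, %s)",
            (42, -100),
        )

# DynamoDB: the same three writes as one TransactWriteItems call.
# Atomic, but billed at 2x write capacity and capped at 100 actions / 4 MB.
boto3.client("dynamodb").transact_write_items(
    TransactItems=[
        {"Put": {
            "TableName": "charges",
            "Item": {"customer_id": {"S": "42"}, "amount_cents": {"N": "100"}},
        }},
        {"Update": {
            "TableName": "balances",
            "Key": {"customer_id": {"S": "42"}},
            "UpdateExpression": "SET cents = cents - :amt",
            "ExpressionAttributeValues": {":amt": {"N": "100"}},
        }},
        {"Put": {
            "TableName": "ledger",
            "Item": {"customer_id": {"S": "42"}, "delta_cents": {"N": "-100"}},
        }},
    ]
)
```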
Question 3: what is your single-key vs multi-attribute query mix?
DynamoDB is fast at single-key lookups and at single-key range scans. Its query model targets primary-key access; a request like "find all orders where status = pending AND amount > 100 AND placed_at > yesterday" — the multi-attribute filter any back-office tool needs — typically requires a GSI, a composite sort key, or a scan with filter expressions. You can model your way around it, but you are now denormalizing for queries you have not been asked for yet, and the bill grows.
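As a sketch of what that means in practice (table and attribute names assumed): without a purpose-built GSI, the pending-orders question becomes a scan, and the filter runs after the read, so you pay for every item in the table on every run.

```python
import boto3

resp = boto3.client("dynamodb").scan(
    TableName="orders",  # hypothetical table
    # "status" is a DynamoDB reserved word, hence the name placeholder.
    FilterExpression="#s = :pending AND amount > :amt AND placed_at > :cutoff",
    ExpressionAttributeNames={"#s": "status"},
    ExpressionAttributeValues={
        ":pending": {"S": "pending"},
        ":amt": {"N": "100"},
        ":cutoff": {"S": "2026-04-01T00:00:00Z"},
    },
)
# The filter is applied after items are read: read capacity is consumed
# for the whole table regardless of how few items match.
print(resp["Items"])
```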
MongoDB is good at multi-attribute filters within a collection. Compound indexes work, the aggregation pipeline handles group-by, and you can run the filter without rebuilding the schema. In workloads I have shipped, an M30-class Atlas cluster handled multi-attribute queries over multi-million-document collections without sharding becoming the next problem; your mileage will vary with index design and document shape.
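The MongoDB version of the same question, sketched with pymongo (database, collection, and field names assumed): a compound index added after the fact, a plain find() for the filter, and the aggregation pipeline for the group-by variant.

```python
from datetime import datetime, timedelta, timezone

from pymongo import ASCENDING, DESCENDING, MongoClient

orders = MongoClient()["app"]["orders"]  # hypothetical database/collection

# Compound index added after the fact: no table rewrite, no redesign.
orders.create_index([("status", ASCENDING), ("placed_at", DESCENDING)])

yesterday = datetime.now(timezone.utc) - timedelta(days=1)
pending = list(orders.find({
    "status": "pending",
    "amount": {"$gt": 100},
    "placed_at": {"$gt": yesterday},
}))

# The group-by variant via the aggregation pipeline.
exposure = list(orders.aggregate([
    {"$match": {"status": "pending", "placed_at": {"$gt": yesterday}}},
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]))
```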
Postgres is good at all of this. It is the only one of the three where "join three tables and aggregate" is a normal sentence. If your read traffic includes ad-hoc queries (admin tools, analytics, internal dashboards), you are going to end up with Postgres in the stack one way or another, even if the operational store is something else.
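And the Postgres version, including the join-and-aggregate form that neither of the other two treats as routine. The schema is hypothetical.

```python
import psycopg2

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT c.name,
               count(*)      AS pending_orders,
               sum(o.amount) AS exposure
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN plans p     ON p.id = c.plan_id
        WHERE o.status = 'pending'
          AND o.amount > 100
          AND o.placed_at > now() - interval '1 day'
        GROUP BY c.name
        ORDER BY exposure DESC
        """
    )
    for name, pending_orders, exposure in cur.fetchall():
        print(f"{name}: {pending_orders} pending, {exposure} at risk")
```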
The mix that points cleanly at DynamoDB: 90% single-key reads, the rest a small number of well-known secondary access patterns. The mix that points away: anything where a product manager might ask "can we filter by X" next quarter.
Question 4: do you operate the database yourself, or pay someone else?
The headcount question. The one nobody puts in the spreadsheet.
DynamoDB has no DBAs. You configure the table, set on-demand or provisioned, and AWS runs it. There is no failover to plan, no replica to size, no vacuum to schedule. The cost shows up as the AWS bill — and DynamoDB on-demand can be expensive at high steady throughput. A rough break-even against well-utilized provisioned capacity (derived from on-demand vs provisioned rates, us-east-1 as of April 2026) lands somewhere around 15% utilization: above that, provisioned is cheaper; below it, on-demand wins. Treat that as a back-of-the-envelope estimate, not a published number.
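Here is the arithmetic behind that estimate, so you can rerun it with current rates. The on-demand figure is the one quoted earlier in this post; the provisioned rate is my assumption for us-east-1 and should be checked against the pricing page.

```python
# Assumed rates: verify against the AWS pricing page before relying on them.
ON_DEMAND_PER_M_WRITES = 1.25     # $ per million on-demand writes (from above)
PROVISIONED_WCU_HOUR = 0.00065    # $ per WCU-hour, us-east-1 (assumption)
WRITES_PER_WCU_HOUR = 3600        # one 1 KB write/sec, sustained for an hour

# Cost per million writes if provisioned capacity were 100% utilized.
provisioned_per_m_writes = PROVISIONED_WCU_HOUR / WRITES_PER_WCU_HOUR * 1_000_000

# Utilization at which provisioned and on-demand cost the same.
break_even = provisioned_per_m_writes / ON_DEMAND_PER_M_WRITES
print(f"break-even utilization: {break_even:.1%}")  # ~14.4%, i.e. "around 15%"
```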
MongoDB Atlas is similar in spirit but more of an operational surface: you pick cluster tiers, you watch replica lag, you tune indexes, you pay attention to oplog size. An M30 dedicated cluster on AWS us-east-1 runs roughly $0.54/hour, or about $390/month (per mongodb.com/pricing, April 2026), before you add backups, multi-region, or analytics nodes. Manageable for a small team, but it is not zero work.
Postgres on RDS or Aurora is cheap for the throughput you get, but you make decisions other engines make for you — instance class, IOPS, vacuum, replication, failover testing. Self-hosted Postgres on EC2 is cheaper and even more work. The reward is that the database does more, so you can ship features without reaching for a second store.
If your team has zero database operations capacity and your workload fits the access pattern, DynamoDB. If you have one engineer who can own the database for an hour a week, Atlas or RDS works. If you have a real backend team, Postgres on RDS or Aurora is almost always the right call — it does the most work per store.
A decision script you can run
The four questions in code. Print the recommendation, print the reasons.
```python
def ask(q: str) -> bool:
    return input(f"{q} ").strip().lower().startswith("y")


def recommend_db() -> None:
    print("Answer y/n.")
    known = ask("Known access patterns up front?")
    multi_row = ask("Need multi-row strong consistency?")
    multi_attr = ask("Multi-attribute / ad-hoc queries?")
    self_op = ask("Will you operate the DB yourself?")

    score = {"postgres": 0, "mongodb": 0, "dynamodb": 0}
    reasons = {"postgres": [], "mongodb": [], "dynamodb": []}

    if known:
        score["dynamodb"] += 2
        reasons["dynamodb"].append("stable access patterns")
    else:
        score["postgres"] += 2
        score["mongodb"] += 1
        reasons["postgres"].append("exploratory queries")

    if multi_row:
        score["postgres"] += 2
        reasons["postgres"].append("native multi-row ACID")
    else:
        score["dynamodb"] += 1
        score["mongodb"] += 1

    if multi_attr:
        score["postgres"] += 2
        score["mongodb"] += 1
        reasons["postgres"].append("ad-hoc filters")
        reasons["mongodb"].append("aggregation pipeline")
    else:
        score["dynamodb"] += 2
        reasons["dynamodb"].append("single-key dominant")

    if not self_op:
        score["dynamodb"] += 2
        reasons["dynamodb"].append("zero-ops managed")
    else:
        score["postgres"] += 1
        score["mongodb"] += 1

    # Print the full score so a tie is visible — max() resolves ties by
    # insertion order, which means Postgres wins silently otherwise.
    print(f"\nScores: {score}")
    top = max(score.values())
    tied = [db for db, s in score.items() if s == top]
    if len(tied) > 1:
        print(f"Tie between {tied}; defaulting to postgres as the safer pick.")
        pick = "postgres" if "postgres" in tied else tied[0]
    else:
        pick = tied[0]
    print(f"Recommendation: {pick}")
    for r in reasons[pick]:
        print(f"  - {r}")


if __name__ == "__main__":
    recommend_db()
```
Run this on a real workload and the answer is rarely surprising. What it does is force the four questions to be answered separately rather than mashed together in a vague gut call. Most arguments about database choice are arguments where one person is answering Q1 and the other is answering Q4, and neither has noticed.
Two anti-patterns
"MongoDB because schema-less is faster." The schema does not go away when you remove it from the database. It moves into your application code, where it is enforced inconsistently by whichever engineer is on the keyboard that week. Three months in, the users collection has documents with seven different shapes, and the bug report is "sometimes the email field is missing." The right reason to pick MongoDB is that your data is genuinely document-shaped (nested, variable, hierarchical), and you want the database to handle that natively. "Schema-less is faster" is a story you tell on day one and pay for on day ninety.
"DynamoDB because Amazon." Picking DynamoDB because the rest of the stack is on AWS, with no other reason, is how teams end up redesigning their data model six months in. DynamoDB is a great fit when Q1 (known patterns), Q2 (single-row writes), and Q3 (single-key reads) all point at it. If only one of them points at it, you are going to fight the database. If none of them do, you have invented a problem that did not exist on Postgres.
The variant of this is "DynamoDB for serverless." Serverless apps run fine on Postgres with a connection pooler. The "Lambda-cold-start-with-Postgres-connections" argument was a real problem in 2019 and has been solved twice over since. RDS Proxy, Aurora Serverless, Neon, and Supabase all handle the connection problem. Pick the database for the access pattern, not the runtime.
When you end up with two
The honest end state for many systems is two databases: Postgres for the operational core (orders, users, the relational ground truth), plus one specialist. DynamoDB or Redis for a high-throughput hot path: sessions, feature flags, leaderboards, rate-limit counters. MongoDB for a document-shaped subdomain: content, configuration, audit blobs.
This is not a failure of architecture. It is what happens when the four questions have different answers for different parts of the system. The mistake is picking one database for all of it because the spreadsheet said so.
The other mistake is picking three databases on day one because someone read the same paragraph above. Start with one. Add the second when a real workload makes the first one sweat. The team most likely to ship is the team with the fewest stores in production.
If this was useful
The Database Playbook goes through these decisions store by store — Postgres, MongoDB, DynamoDB, plus Redis, Cassandra, ClickHouse, and the vector engines — with the access-pattern modeling worked through end to end. If you are picking a database for something real and the decision keeps slipping, it is the longer version of this post.
