How to Choose the Right Database for Your Use Case
"There is no best database — only the best database for your use case."
Choosing a database is one of the most critical architectural decisions you'll ever make. Pick the right one, and your system scales effortlessly. Pick the wrong one, and you're rewriting infrastructure at 3 AM while production burns.
This guide takes you from zero to expert — covering every major database type, when to use each, real-world examples, and a battle-tested decision framework.
Whether you're building a side project or designing systems at scale, this is your one-stop reference.
Table of Contents
- Why Database Choice Matters
- The Database Landscape — A Bird's Eye View
- Relational Databases (SQL)
- Document Databases
- Key-Value Stores
- Wide-Column Databases
- Graph Databases
- Time-Series Databases
- Search Engines
- Vector Databases
- NewSQL Databases
- Core Concepts You Must Know
- The Decision Framework
- Real-World Architecture Examples
- Common Mistakes to Avoid
- Polyglot Persistence — Using Multiple Databases
- Trends in 2025–2026
- Cheat Sheet
Why Database Choice Matters
Let's get real — your database will outlive your code. Frameworks change, languages evolve, but data and its storage persist.
Here's what happens when you choose wrong:
- Performance bottlenecks → You're sharding a relational DB that should've been a key-value store
- Data integrity issues → You picked a schemaless DB for financial transactions
- Scaling nightmares → Your DB can't scale horizontally when traffic grows 10x overnight
- Wasted engineering time → Your team fights the database instead of building features
The cost of migrating databases in production is enormous — often requiring months of dual-write architectures, data backfills, and prayer.
Choose wisely from the start.
The Database Landscape
Before diving deep, here's the big picture:
┌─────────────────────────────────────────────────────────┐
│ DATABASE TYPES │
├──────────────┬──────────────┬──────────────┬────────────┤
│ Relational │ Document │ Key-Value │Wide-Column │
│ PostgreSQL │ MongoDB │ Redis │ Cassandra │
│ MySQL │ CouchDB │ DynamoDB │ HBase │
│ SQL Server │ Firestore │ Memcached │ ScyllaDB │
├──────────────┼──────────────┼──────────────┼────────────┤
│ Graph │ Time-Series │ Search │ Vector │
│ Neo4j │ InfluxDB │Elasticsearch │ Pinecone │
│ Neptune │ TimescaleDB │ Meilisearch │ Weaviate │
│ ArangoDB │ QuestDB │ Typesense │ Milvus │
├──────────────┴──────────────┴──────────────┴────────────┤
│ NewSQL │
│ CockroachDB • TiDB • Google Spanner │
└─────────────────────────────────────────────────────────┘
Each type is optimized for a specific access pattern. Let's break them all down.
1. Relational Databases (SQL)
Examples: PostgreSQL, MySQL, MariaDB, SQL Server, Oracle, SQLite
How They Work
Relational databases store data in tables (rows and columns) with predefined schemas. Tables are linked through foreign keys, and you query them using SQL (Structured Query Language).
Under the hood, they use a B-tree or B+ tree index structure for fast lookups, and a write-ahead log (WAL) for crash recovery.
-- Example: E-commerce schema
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id),
total DECIMAL(10,2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
-- Get all orders for a user with a JOIN
SELECT u.name, o.total, o.status
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.email = 'john@example.com';
Strengths
- ACID transactions — Atomicity, Consistency, Isolation, Durability. Your data is always in a valid state
- Complex queries — JOINs, aggregations, subqueries, window functions
- Data integrity — Schemas enforce structure; foreign keys enforce relationships
- Mature ecosystem — Decades of tooling, ORMs, monitoring, and community knowledge
- Standardized — SQL is universal; skills transfer across databases
Weaknesses
- Vertical scaling bias — Traditionally scale up (bigger machine), not out (more machines)
- Schema rigidity — ALTER TABLE on a billion-row table is painful
- Not ideal for hierarchical/nested data — Normalizing deeply nested JSON into tables gets messy
- Write throughput ceiling — Single-leader replication creates write bottlenecks at extreme scale
When to Use
| Use Case | Why |
|---|---|
| E-commerce | Transactions ensure orders, payments, and inventory stay consistent |
| Banking/Fintech | ACID compliance is non-negotiable for money |
| SaaS applications | Structured data with complex relationships |
| Content Management | Well-defined schemas for articles, users, categories |
| Any CRUD application | The default choice when data is structured and relational |
PostgreSQL vs MySQL — Quick Comparison
| Feature | PostgreSQL | MySQL |
|---|---|---|
| JSON support | Excellent (jsonb) | Good (JSON type) |
| Full-text search | Built-in | Built-in |
| Extensions | Rich (PostGIS, pgvector, etc.) | Limited |
| Replication | Logical + Physical | Binary log |
| ACID compliance | Full | Full (with InnoDB) |
| Performance | Better for complex queries | Better for simple reads |
| Best for | Complex apps, analytics, GIS | Web apps, read-heavy workloads |
Pro tip: When in doubt, start with PostgreSQL. It's the Swiss Army knife of databases — it handles relational data, JSON, full-text search, geospatial queries, and even vector embeddings (via pgvector).
2. Document Databases
Examples: MongoDB, CouchDB, Firestore, Amazon DocumentDB
How They Work
Document databases store data as JSON-like documents (BSON in MongoDB). Each document is self-contained — no rigid schema, no JOINs needed. Documents are grouped into collections.
// A single MongoDB document — notice nested data
{
"_id": "order_12345",
"customer": {
"name": "Jane Doe",
"email": "jane@example.com",
"address": {
"street": "123 Main St",
"city": "San Francisco",
"zip": "94105"
}
},
"items": [
{ "product": "Laptop", "price": 999.99, "qty": 1 },
{ "product": "Mouse", "price": 29.99, "qty": 2 }
],
"total": 1059.97,
"status": "shipped",
"created_at": "2025-12-01T10:30:00Z"
}
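Because everything is embedded, a single read returns the whole order, and derived values come straight from the nested data. A plain-Python sketch of the document above (the dict mirrors the BSON structure; no database required):

```python
# The MongoDB document above, as a plain Python dict -- JSON maps 1:1.
order = {
    "_id": "order_12345",
    "customer": {"name": "Jane Doe", "email": "jane@example.com"},
    "items": [
        {"product": "Laptop", "price": 999.99, "qty": 1},
        {"product": "Mouse", "price": 29.99, "qty": 2},
    ],
    "status": "shipped",
}

# One "read" gives us everything we need: no JOIN across tables.
total = round(sum(i["price"] * i["qty"] for i in order["items"]), 2)
print(total)  # → 1059.97
```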
Strengths
- Flexible schema — Each document can have different fields. Perfect for evolving data models
- Nested/hierarchical data — Embed related data directly; no JOINs needed
- Horizontal scaling — Built-in sharding distributes data across nodes
- Developer-friendly — JSON maps directly to objects in most languages
- Fast reads — All related data in one document = single read
Weaknesses
- Limited JOINs (by design) — $lookup exists but is no substitute for relational JOINs; denormalized data means updates can touch multiple documents
- Data duplication — Embedding data creates copies that can go stale
- Weaker consistency for scaled reads — reads from secondaries can return stale data (tunable via read/write concerns)
- Not ideal for highly relational data — If you need 6 JOINs, you want SQL
When to Use
| Use Case | Why |
|---|---|
| Content management systems | Articles have varying structures (video, text, gallery) |
| Product catalogs | Every product has different attributes |
| User profiles | Flexible fields (preferences, settings, activity) |
| Mobile/web apps | JSON-native; fast iteration on schemas |
| Real-time analytics | Aggregation pipeline handles complex analytics |
| IoT data collection | Variable sensor data schemas |
The MongoDB Controversy
MongoDB got a bad reputation early on because people used it for everything — including use cases where a relational DB was clearly better (like financial transactions). The lesson:
MongoDB is not a replacement for PostgreSQL. It's a different tool for different problems.
Use MongoDB when your data is naturally document-shaped and you need schema flexibility + horizontal scale. Don't use it just because "NoSQL is cool."
3. Key-Value Stores
Examples: Redis, Amazon DynamoDB, Memcached, etcd, Riak
How They Work
The simplest database model: every piece of data is stored as a key-value pair. Think of it as a giant, distributed hash map.
Key → Value
──────────────────────────────────────
"user:1001:session" → "eyJhbGciOiJIUz..."
"product:sku:A123" → {"name": "Widget", "price": 9.99}
"rate_limit:ip:1.2.3" → 47
"cache:homepage:v3" → "<html>...</html>"
Redis — The King of Key-Value
Redis deserves special mention. It's an in-memory data store that supports rich data structures:
# Strings
SET user:session:abc123 "token_data" EX 3600 # Expires in 1 hour
# Hash (like a mini-document)
HSET user:1001 name "Alice" email "alice@example.com" plan "pro"
HGET user:1001 name # → "Alice"
# Sorted Set (leaderboards!)
ZADD leaderboard 9500 "player:alice"
ZADD leaderboard 8700 "player:bob"
ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10 players
# Lists (queues)
LPUSH notifications:user:1001 "New message from Bob"
RPOP notifications:user:1001
# Pub/Sub (real-time messaging)
PUBLISH chat:room:42 "Hello everyone!"
Strengths
- Blazing fast — O(1) lookups. Redis: sub-millisecond latency
- Simple API — GET, SET, DELETE. Hard to misuse
- Perfect for caching — Reduce load on your primary database
- Horizontal scaling — DynamoDB scales to virtually unlimited throughput
- TTL support — Auto-expire data (sessions, OTPs, rate limits)
Weaknesses
- No complex queries — Can't filter, sort, or JOIN
- Limited data modeling — Everything is accessed by key; no relationships
- Memory cost — In-memory stores (Redis) are expensive at scale
- Data loss risk — In-memory stores can lose data on crash (mitigated by persistence options)
Redis vs DynamoDB
| Feature | Redis | DynamoDB |
|---|---|---|
| Storage | In-memory | Disk (SSD) |
| Latency | Sub-millisecond | Single-digit millisecond |
| Scale | Manual sharding | Auto-scaling |
| Cost | Memory-based ($$$) | Pay-per-request |
| Persistence | Optional (RDB/AOF) | Built-in |
| Best for | Caching, real-time | Serverless apps, any scale |
When to Use
| Use Case | Why |
|---|---|
| Caching | Cache DB queries, API responses, HTML fragments |
| Session storage | Fast read/write with TTL for expiry |
| Rate limiting | Atomic counters with expiration |
| Leaderboards | Redis sorted sets are purpose-built for this |
| Feature flags | Quick lookups for on/off switches |
| Shopping carts | Temporary, user-specific data with expiry |
| Message queues | Redis lists/streams for simple queuing |
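The rate-limiting pattern in the table above boils down to an atomic counter per time window plus expiry (Redis INCR + EXPIRE). A sketch with a dict standing in for Redis, so it runs anywhere (the class and key format here are illustrative, not a real library's API):

```python
import time

class FixedWindowLimiter:
    """Sketch of Redis-style rate limiting: increment a per-window
    counter key; the key implicitly dies when the window rolls over."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> count; stand-in for Redis

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)
        key = f"rate_limit:{client_id}:{window_id}"
        self.counters[key] = self.counters.get(key, 0) + 1  # Redis: INCR
        return self.counters[key] <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("ip:1.2.3.4", now=100.0) for _ in range(5)]
print(results)  # → [True, True, True, False, False]
```

In real Redis you would pair `INCR` with `EXPIRE` so abandoned counters clean themselves up; the dict version skips that housekeeping.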
4. Wide-Column Databases
Examples: Apache Cassandra, HBase, ScyllaDB, Google Bigtable
How They Work
Wide-column stores organize data into rows and column families. Unlike relational tables, each row can have different columns, and columns are grouped by access patterns.
Think of it as a two-dimensional key-value store — you look up data by (row_key, column_family:column).
Row Key | Column Family: user_info | Column Family: activity
─────────────────┼───────────────────────────────┼──────────────────────
user:1001 | name="Alice", email="a@b.com" | last_login="2025-12-01"
user:1002 | name="Bob", phone="555-1234" | last_login="2025-11-28"
| | last_purchase="2025-12-05"
Cassandra — The Write Champion
Cassandra uses a masterless ring architecture — every node is equal. Writes go to any node and are replicated. This means:
- No single point of failure
- Linear horizontal scaling — add nodes, get proportional throughput
- Tunable consistency — from eventual to strong, per query
-- Cassandra Query Language (CQL)
CREATE TABLE sensor_readings (
sensor_id UUID,
timestamp TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY (sensor_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
-- Partition key: sensor_id (distributes data)
-- Clustering key: timestamp (sorts within partition)
INSERT INTO sensor_readings (sensor_id, timestamp, temperature, humidity)
VALUES (uuid(), '2025-12-01T10:00:00Z', 23.5, 65.2);
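The masterless ring above works because every partition key hashes to a position on a ring, and the next node clockwise owns it. A simplified sketch (real Cassandra uses Murmur3 and virtual nodes; md5 here is just for a stable toy hash):

```python
import bisect
import hashlib

def ring_position(key, ring_size=2**32):
    # Stable hash onto the ring (Cassandra uses Murmur3; md5 for brevity)
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

class HashRing:
    def __init__(self, nodes):
        # Each node owns the arc that ends at its ring position
        self.positions = sorted((ring_position(n), n) for n in nodes)

    def owner(self, partition_key):
        pos = ring_position(partition_key)
        tokens = [p for p, _ in self.positions]
        # Next node clockwise from the key's position (wrap around)
        idx = bisect.bisect_right(tokens, pos) % len(self.positions)
        return self.positions[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
# The same partition key always lands on the same owner node
print(ring.owner("sensor:42") == ring.owner("sensor:42"))  # → True
```

Because placement is pure hashing, any node can route any request, which is what removes the single point of failure.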
Strengths
- Massive write throughput — Handles millions of writes per second
- Linear scalability — Performance grows linearly with nodes
- High availability — No master node; no single point of failure
- Geo-replication — Built-in multi-datacenter support
- Time-series friendly — Clustering keys naturally order time-series data
Weaknesses
- No JOINs or complex queries — You must model data around your queries
- Query-first design — You design tables for specific queries, not normalized models
- Read performance — Reads are slower than writes (a read may consult multiple SSTables and replicas)
- Operational complexity — Running Cassandra well requires expertise
- Eventual consistency — Default behavior; strong consistency is expensive
When to Use
| Use Case | Why |
|---|---|
| IoT sensor data | Millions of devices writing constantly |
| Messaging systems | Discord uses Cassandra for billions of messages |
| Time-series at scale | Event logging, metrics, activity feeds |
| Recommendation engines | Netflix uses Cassandra for user activity |
| Geo-distributed apps | Multi-region with low-latency writes |
5. Graph Databases
Examples: Neo4j, Amazon Neptune, ArangoDB, JanusGraph, TigerGraph
How They Work
Graph databases store data as nodes (entities) and edges (relationships). Each node and edge can have properties. They're designed for traversing relationships efficiently.
┌─────────┐ FOLLOWS ┌─────────┐
│ Alice │───────────────▶│ Bob │
│ (User) │ │ (User) │
└────┬────┘ └────┬────┘
│ LIKES │ POSTED
▼ ▼
┌─────────┐ ┌──────────┐
│ Post1 │ │ Post2 │
│ (Post) │◀──COMMENTED───│ (Post) │
└─────────┘ └──────────┘
// Cypher query language (Neo4j)
// Create nodes and relationships
CREATE (alice:User {name: "Alice", age: 28})
CREATE (bob:User {name: "Bob", age: 32})
CREATE (alice)-[:FOLLOWS]->(bob)
CREATE (alice)-[:LIKES]->(post1:Post {title: "Graph DBs Rock"})
// Find friends of friends (2nd degree connections)
MATCH (me:User {name: "Alice"})-[:FOLLOWS*2]->(fof:User)
WHERE fof <> me
RETURN fof.name
// Find shortest path between two users
MATCH path = shortestPath(
(a:User {name: "Alice"})-[:FOLLOWS*]-(b:User {name: "Charlie"})
)
RETURN path
// Recommendation: "Users who like what you like also like..."
MATCH (me:User {name: "Alice"})-[:LIKES]->(post)<-[:LIKES]-(other)-[:LIKES]->(rec)
WHERE NOT (me)-[:LIKES]->(rec)
RETURN rec.title, COUNT(other) AS score
ORDER BY score DESC LIMIT 5
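The friends-of-friends query above is just a two-hop traversal. A plain-Python sketch over an adjacency list shows what the database does per hop (the sample graph is made up for illustration):

```python
# FOLLOWS edges as an adjacency list
follows = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Dave"},
    "Carol": {"Dave", "Erin"},
    "Dave": set(),
    "Erin": set(),
}

def friends_of_friends(graph, user):
    """Exactly-two-hop neighbors, excluding the user (mirrors FOLLOWS*2)."""
    one_hop = graph.get(user, set())
    two_hop = set()
    for friend in one_hop:
        two_hop |= graph.get(friend, set())
    return two_hop - {user}

print(sorted(friends_of_friends(follows, "Alice")))  # → ['Dave', 'Erin']
```

A graph database stores the adjacency directly on each node, which is why each hop stays cheap no matter how large the total graph grows.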
Strengths
- Relationship queries are O(1) — Traversing edges is constant time per hop, regardless of total data size
- Intuitive modeling — Whiteboard diagrams map directly to the data model
- Pattern matching — Find complex patterns across deeply connected data
- No JOIN performance cliff — Relational JOINs slow down with depth; graph traversals don't
Weaknesses
- Not for simple CRUD — Overkill if your data isn't relationship-heavy
- Aggregation limitations — Not ideal for "sum all revenue" type queries
- Scaling challenges — Graph partitioning is fundamentally hard (NP-hard problem)
- Smaller ecosystem — Fewer ORMs, tools, and tutorials than SQL/MongoDB
When to Use
| Use Case | Why |
|---|---|
| Social networks | Friends, followers, mutual connections |
| Recommendation engines | "People who bought X also bought Y" |
| Fraud detection | Trace transaction chains and suspicious patterns |
| Knowledge graphs | Wikipedia, Google Knowledge Panel |
| Network/IT infrastructure | Map dependencies between services |
| Access control | Model complex permission hierarchies |
Key insight: If your most important queries involve relationships between entities (not just the entities themselves), you need a graph database.
6. Time-Series Databases
Examples: InfluxDB, TimescaleDB, QuestDB, Prometheus, Amazon Timestream
How They Work
Time-series databases are optimized for timestamped data — metrics, events, sensor readings, stock prices. They handle:
- High ingestion rates (millions of data points per second)
- Time-range queries (last 24 hours, last 7 days)
- Automatic downsampling (aggregate old data to save space)
- Built-in time functions (moving averages, rate of change)
-- TimescaleDB (PostgreSQL extension) example
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
sensor_id INT,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
-- Convert to a hypertable (enables time-series optimizations)
SELECT create_hypertable('metrics', 'time');
-- Insert data
INSERT INTO metrics VALUES
(NOW(), 1, 23.5, 65.0),
(NOW(), 2, 21.3, 70.2);
-- Query: Average temperature per hour for the last 24 hours
SELECT time_bucket('1 hour', time) AS hour,
AVG(temperature) AS avg_temp
FROM metrics
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
-- Continuous aggregate (auto-maintained materialized view)
CREATE MATERIALIZED VIEW hourly_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
sensor_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp
FROM metrics
GROUP BY hour, sensor_id;
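Conceptually, time_bucket plus AVG is a group-by on truncated timestamps. A plain-Python sketch of hourly bucketing (sample readings invented for illustration):

```python
from collections import defaultdict
from datetime import datetime, timezone

readings = [  # (timestamp, temperature)
    (datetime(2025, 12, 1, 10, 5, tzinfo=timezone.utc), 23.0),
    (datetime(2025, 12, 1, 10, 45, tzinfo=timezone.utc), 25.0),
    (datetime(2025, 12, 1, 11, 10, tzinfo=timezone.utc), 21.0),
]

def hourly_avg(rows):
    """Mirror time_bucket('1 hour', time) + AVG(temperature)."""
    buckets = defaultdict(list)
    for ts, temp in rows:
        # Truncate to the hour: that's the bucket key
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(temp)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}

for hour, avg in hourly_avg(readings).items():
    print(hour.isoformat(), avg)
# → 2025-12-01T10:00:00+00:00 24.0
# → 2025-12-01T11:00:00+00:00 21.0
```

A continuous aggregate is this same computation maintained incrementally as new rows arrive, instead of recomputed on every query.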
Strengths
- Extreme write throughput — Append-only writes are very fast
- Compression — 10-20x compression ratios on time-series data
- Built-in time functions — Bucketing, downsampling, gap-filling
- Retention policies — Auto-delete data older than X days
- Optimized storage — Columnar storage for time-series patterns
Weaknesses
- Not for general-purpose queries — Poor at random lookups or JOINs
- Append-heavy — Updates and deletes are expensive
- Limited data modeling — Everything is centered around timestamps
- Niche use case — If your data isn't inherently temporal, it's the wrong tool
When to Use
| Use Case | Why |
|---|---|
| Application monitoring | Metrics, logs, traces (Grafana + InfluxDB) |
| IoT sensor data | Temperature, pressure, GPS over time |
| Financial market data | Stock prices, trading volumes, candlestick data |
| DevOps metrics | CPU, memory, request latency, error rates |
| Energy/utilities | Smart meter readings, grid monitoring |
| Fitness/health | Heart rate, step counts, sleep data |
Pro tip: TimescaleDB gives you the best of both worlds — it's a PostgreSQL extension, so you get full SQL + time-series optimizations. Great if you don't want to run a separate database.
7. Search Engines
Examples: Elasticsearch, OpenSearch, Meilisearch, Typesense, Apache Solr
How They Work
Search engines use inverted indexes — instead of mapping documents to words, they map words to documents. This makes full-text search lightning-fast.
Standard index: Document → Words
Inverted index: Word → Documents
"database" → [doc1, doc3, doc7, doc15]
"postgres" → [doc1, doc5, doc12]
"scaling" → [doc3, doc7, doc9]
// Elasticsearch: Index a document
PUT /products/_doc/1
{
"name": "Wireless Bluetooth Headphones",
"description": "Premium noise-cancelling headphones with 30-hour battery",
"category": "Electronics",
"price": 149.99,
"tags": ["wireless", "bluetooth", "noise-cancelling"]
}
// Search with relevance scoring
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "match": { "description": "noise cancelling wireless" } }
],
"filter": [
{ "range": { "price": { "lte": 200 } } },
{ "term": { "category": "Electronics" } }
]
}
},
"highlight": {
"fields": { "description": {} }
}
}
Elasticsearch vs Meilisearch vs Typesense
| Feature | Elasticsearch | Meilisearch | Typesense |
|---|---|---|---|
| Speed | Fast | Very fast | Very fast |
| Setup | Complex (JVM) | Simple (single binary) | Simple |
| Typo tolerance | Plugin-based | Built-in | Built-in |
| Scale | Massive (PB+) | Small-medium | Small-medium |
| Resource usage | Heavy (RAM hungry) | Light | Light |
| Best for | Large-scale search & analytics | Product search, SaaS apps | Fast typo-tolerant search |
When to Use
| Use Case | Why |
|---|---|
| Product search | "Show me red shoes under $50" with typo tolerance |
| Log analytics | ELK stack (Elasticsearch + Logstash + Kibana) |
| Autocomplete | Real-time suggestions as user types |
| Document search | Search across millions of PDFs, articles, emails |
| Geospatial search | "Restaurants within 5 miles" |
Important: Search engines are almost never your primary database. They're used alongside your main DB for search-specific queries.
8. Vector Databases
Examples: Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector (PostgreSQL)
How They Work
Vector databases store and search high-dimensional vectors (embeddings). These embeddings are numerical representations of data (text, images, audio) generated by AI/ML models.
Instead of exact matching, they find semantically similar items using distance metrics (cosine similarity, Euclidean distance).
# How vector search works conceptually
# 1. Convert text to embedding (using an AI model)
query = "comfortable running shoes"
query_vector = embedding_model.encode(query)
# → [0.23, -0.45, 0.87, 0.12, ...] (768+ dimensions)
# 2. Search for nearest vectors in the database
results = vector_db.search(
collection="products",
vector=query_vector,
top_k=5,
filter={"category": "footwear"}
)
# 3. Results ranked by semantic similarity — not keyword matching
# "lightweight marathon trainers" would match
# "jogging sneakers with cushion" would match
# Even though none contain "comfortable" or "running shoes"
-- pgvector (PostgreSQL extension) — if you want vector search without a new DB
CREATE EXTENSION vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536) -- OpenAI embedding dimension
);
-- Create an index for fast similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Find the 5 most similar documents (<=> is cosine distance: smaller = more similar)
SELECT content, 1 - (embedding <=> '[0.23, -0.45, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.23, -0.45, ...]'::vector
LIMIT 5;
Strengths
- Semantic search — Find similar meaning, not just matching keywords
- AI/ML native — Built for the embedding-based AI workflow
- Multi-modal — Search across text, images, audio using unified embeddings
- Scalable — Handle billions of vectors with approximate nearest neighbor (ANN) algorithms
Weaknesses
- Requires embeddings — You need an ML model to generate vectors
- Approximate results — ANN trades accuracy for speed
- New ecosystem — Less mature tooling than traditional databases
- Cost — Storing high-dimensional vectors at scale is expensive
When to Use
| Use Case | Why |
|---|---|
| RAG (Retrieval Augmented Generation) | Feed relevant context to LLMs |
| Semantic search | "Find similar products" beyond keyword matching |
| Image/video search | Reverse image search, content moderation |
| Recommendation systems | Similar items based on embedding proximity |
| Anomaly detection | Find data points far from normal clusters |
| Chatbot memory | Store and retrieve conversation history semantically |
Pro tip: If you already use PostgreSQL, try pgvector before spinning up a dedicated vector DB. It handles millions of vectors well and keeps your stack simple.
9. NewSQL Databases
Examples: CockroachDB, TiDB, Google Spanner, YugabyteDB, PlanetScale
How They Work
NewSQL databases combine the best of both worlds:
- SQL interface + ACID transactions (like traditional RDBMS)
- Horizontal scalability (like NoSQL)
They achieve this through distributed consensus algorithms (like Raft or Paxos) and innovative architectures.
-- CockroachDB: Standard SQL that scales horizontally
CREATE TABLE accounts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
owner STRING NOT NULL,
balance DECIMAL NOT NULL,
region STRING NOT NULL
);
-- Geo-partition data by region
ALTER TABLE accounts PARTITION BY LIST (region) (
PARTITION us_east VALUES IN ('us-east'),
PARTITION eu_west VALUES IN ('eu-west'),
PARTITION ap_south VALUES IN ('ap-south')
);
-- Regular SQL transactions — but they work across distributed nodes
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'sender-id';
UPDATE accounts SET balance = balance + 100 WHERE id = 'receiver-id';
COMMIT;
Strengths
- SQL compatibility — Use existing SQL skills and tools
- Horizontal scaling — Add nodes to scale reads AND writes
- Strong consistency — Distributed ACID transactions
- High availability — Survives node/datacenter failures
- Geo-distribution — Pin data to specific regions for compliance
Weaknesses
- Latency overhead — Distributed consensus adds latency (2-10ms vs <1ms for local PostgreSQL)
- Complexity — More operationally complex than a single PostgreSQL instance
- Cost — Running distributed databases is expensive
- Overkill for small scale — If your data fits on one machine, use PostgreSQL
When to Use
| Use Case | Why |
|---|---|
| Global apps needing SQL | Multi-region with strong consistency |
| Fintech at scale | ACID transactions + horizontal scaling |
| SaaS platforms | When single-node PostgreSQL hits its ceiling |
| E-commerce platforms | High traffic with transactional guarantees |
Rule of thumb: Start with PostgreSQL. Move to NewSQL only when you've genuinely outgrown single-node performance.
Core Concepts You Must Know
Before choosing a database, you need to understand these foundational concepts.
The CAP Theorem
You can only guarantee two out of three:
Consistency
╱╲
╱ ╲
╱ ╲
╱ CA ╲ CA: PostgreSQL, MySQL (single-node)
╱________╲ CP: MongoDB, HBase, CockroachDB
╱╲ ╱╲ AP: Cassandra, DynamoDB, CouchDB
╱ ╲ CP ╱ ╲
╱ ╲ ╱ AP ╲
╱______╲ ╱______╲
Partition Availability
Tolerance
- Consistency (C): Every read gets the most recent write
- Availability (A): Every request gets a response (even if not the latest data)
- Partition Tolerance (P): System works even if network splits nodes apart
In distributed systems, partition tolerance is mandatory (networks fail). So the real choice is:
- CP — Consistent but may reject requests during partitions (banking, inventory)
- AP — Always available but may return stale data (social feeds, analytics)
ACID vs BASE
| Property | ACID (SQL) | BASE (NoSQL) |
|---|---|---|
| A / BA | Atomicity — all or nothing | Basically Available — always responds |
| C / S | Consistency — valid state | Soft state — may be transiently inconsistent |
| I / E | Isolation — concurrent safety | Eventually consistent — replicas converge |
| D | Durability — survives crashes | |
| Best for | Financial, transactional | High-scale, distributed |
| Trade-off | Lower throughput | Higher throughput, weaker guarantees |
Scaling: Vertical vs Horizontal
Vertical Scaling (Scale Up) Horizontal Scaling (Scale Out)
┌──────────────────┐ ┌──────┐ ┌──────┐ ┌──────┐
│ │ │Node 1│ │Node 2│ │Node 3│
│ BIGGER SERVER │ └──────┘ └──────┘ └──────┘
│ More CPU/RAM │ ┌──────┐ ┌──────┐ ┌──────┐
│ More Storage │ │Node 4│ │Node 5│ │Node 6│
│ │ └──────┘ └──────┘ └──────┘
└──────────────────┘
✓ Simple ✓ Virtually unlimited scale
✗ Has a ceiling ✗ Complex (sharding, consensus)
✗ Single point of failure ✓ Fault tolerant
| Database Type | Typical Scaling |
|---|---|
| PostgreSQL, MySQL | Vertical (read replicas for reads) |
| MongoDB, Cassandra | Horizontal (built-in sharding) |
| Redis | Both (Cluster mode for horizontal) |
| CockroachDB, Spanner | Horizontal (distributed SQL) |
| DynamoDB | Horizontal (auto-scaling) |
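Sharding, the mechanism behind horizontal scaling, comes down to deciding which node owns which key. The simplest scheme is hash-mod placement, sketched here with a toy byte-sum "hash" (illustrative only, not any real database's algorithm):

```python
def shard_for(key, num_shards):
    # Deterministic placement: hash the key, mod the shard count.
    # (A byte-sum stands in for a real stable hash like Murmur3.)
    return sum(key.encode()) % num_shards

keys = ["user:1001", "user:1002", "user:1003", "user:1004"]
print({k: shard_for(k, 3) for k in keys})

# The catch: change the shard count and most keys change owners,
# forcing mass data movement. This is why consistent hashing and
# range-splitting schemes exist.
moved = sum(shard_for(k, 3) != shard_for(k, 4) for k in keys)
print(moved)  # → 4 (every key moved when going from 3 to 4 shards)
```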
Consistency Models
From strongest to weakest:
- Strong consistency — Reads always return the latest write. (PostgreSQL, Spanner)
- Bounded staleness — Reads may be stale, but within a time window. (Cosmos DB)
- Session consistency — A client always sees its own writes. (MongoDB with causally consistent sessions)
- Eventual consistency — All nodes converge eventually. (Cassandra, DynamoDB)
The Decision Framework
Here's a practical, step-by-step framework for choosing your database:
Step 1: What Does Your Data Look Like?
Is your data structured with clear relationships?
├── YES → Relational (PostgreSQL, MySQL)
│ └── Need global scale? → NewSQL (CockroachDB, Spanner)
│
├── SEMI-STRUCTURED (JSON, varying fields)?
│ └── Document DB (MongoDB, Firestore)
│
├── SIMPLE key → value lookups?
│ └── Key-Value (Redis, DynamoDB)
│
├── HEAVILY CONNECTED (relationships are the query)?
│ └── Graph DB (Neo4j, Neptune)
│
├── TIMESTAMPED (metrics, events, logs)?
│ └── Time-Series (TimescaleDB, InfluxDB)
│
├── FULL-TEXT SEARCH needed?
│ └── Search Engine (Elasticsearch, Meilisearch)
│
└── AI EMBEDDINGS / SIMILARITY SEARCH?
└── Vector DB (Pinecone, pgvector)
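The same tree expressed as a tiny lookup helper, handy for documenting the decision in code review discussions (the category names are illustrative, not exhaustive):

```python
# Data-shape categories mapped to a starting-point recommendation
DECISION_TREE = {
    "structured_relational": "Relational (PostgreSQL, MySQL)",
    "structured_global_scale": "NewSQL (CockroachDB, Spanner)",
    "semi_structured": "Document DB (MongoDB, Firestore)",
    "key_lookup": "Key-Value (Redis, DynamoDB)",
    "heavily_connected": "Graph DB (Neo4j, Neptune)",
    "timestamped": "Time-Series (TimescaleDB, InfluxDB)",
    "full_text_search": "Search Engine (Elasticsearch, Meilisearch)",
    "embeddings": "Vector DB (Pinecone, pgvector)",
}

def pick_database(data_shape):
    # Unknown shape? Fall back to the article's default recommendation.
    return DECISION_TREE.get(data_shape, "Default to PostgreSQL and revisit")

print(pick_database("timestamped"))    # → Time-Series (TimescaleDB, InfluxDB)
print(pick_database("unknown_shape"))  # → Default to PostgreSQL and revisit
```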
Step 2: What Are Your Access Patterns?
| Pattern | Best Choice |
|---|---|
| Complex queries with JOINs | Relational (PostgreSQL) |
| Simple lookups by key | Key-Value (Redis, DynamoDB) |
| Full-text search with ranking | Search Engine (Elasticsearch) |
| Time-range aggregations | Time-Series (TimescaleDB) |
| Graph traversals (friends of friends) | Graph (Neo4j) |
| High-volume writes, simple reads | Wide-Column (Cassandra) |
| Similarity/semantic search | Vector DB (Pinecone) |
Step 3: What Are Your Non-Functional Requirements?
Ask yourself:
| Question | If YES → Consider |
|---|---|
| Do I need ACID transactions? | PostgreSQL, CockroachDB |
| Will I have 10K+ writes/second? | Cassandra, DynamoDB, ScyllaDB |
| Do I need sub-millisecond latency? | Redis, Memcached |
| Is my data globally distributed? | CockroachDB, Spanner, DynamoDB Global Tables |
| Do I need to search inside documents? | Elasticsearch, Meilisearch |
| Am I building AI/ML features? | Pinecone, Weaviate, pgvector |
| Do I need high availability (99.99%+)? | Cassandra, DynamoDB, CockroachDB |
| Is my team small / limited budget? | PostgreSQL, SQLite, Firestore |
Step 4: Consider Your Team and Budget
This is often more important than technical factors:
- Team expertise — A team of PostgreSQL experts will build faster with PostgreSQL than MongoDB, even if MongoDB is theoretically "better" for the use case
- Operational cost — Managed services (RDS, Atlas, DynamoDB) cost money but save engineering time
- Ecosystem — Consider ORMs, migration tools, monitoring, and community support
- Hiring — PostgreSQL/MySQL developers are easier to find than Cassandra experts
Real-World Architecture Examples
E-Commerce Platform (like Shopify)
┌──────────────┐ ┌─────────────────┐ ┌──────────────┐
│ PostgreSQL │ │ Elasticsearch │ │ Redis │
│ │ │ │ │ │
│ • Orders │ │ • Product search │ │ • Sessions │
│ • Payments │ │ • Faceted filter │ │ • Cart data │
│ • Users │ │ • Autocomplete │ │ • Rate limits│
│ • Inventory │ │ │ │ • Cache │
└──────────────┘ └─────────────────┘ └──────────────┘
Why this combination?
- PostgreSQL for transactional integrity (money and inventory can't be "eventually consistent")
- Elasticsearch for fast product search with filters, facets, and typo tolerance
- Redis for caching and sessions to keep the UX snappy
Social Media Platform (like Twitter/X)
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ PostgreSQL │ │ Cassandra │ │ Redis │
│ │ │ │ │ │
│ • Users │ │ • Timelines │ │ • Feed cache │
│ • Auth │ │ • Messages │ │ • Sessions │
│ • Settings │ │ • Activity │ │ • Trending │
└──────────────┘ └──────────────┘ └──────────────┘
│ │
│ ┌──────────────┐ │
│ │ Neo4j │ │
└───────────▶│ │◀───────────┘
│ • Social graph│
│ • Followers │
│ • Suggestions │
└──────────────┘
AI-Powered SaaS Application
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ PostgreSQL │ │ Pinecone │ │ Redis │
│ + pgvector │ │ (or Weaviate)│ │ │
│ │ │ │ │ • LLM cache │
│ • Users │ │ • Document │ │ • Rate limits│
│ • Billing │ │ embeddings │ │ • Sessions │
│ • App data │ │ • Semantic │ │ │
│ │ │ search │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
IoT / Industrial Monitoring
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ TimescaleDB │ │ PostgreSQL │ │ Redis │
│ │ │ │ │ │
│ • Sensor data│ │ • Devices │ │ • Alerts │
│ • Metrics │ │ • Users │ │ • Real-time │
│ • Aggregates │ │ • Config │ │ thresholds │
└──────────────┘ └──────────────┘ └──────────────┘
Common Mistakes to Avoid
1. Using MongoDB for Everything
"We chose MongoDB because it's web-scale!" — said every startup before rewriting their billing system in PostgreSQL.
The fix: Use MongoDB for genuinely document-shaped, schema-flexible data. Use PostgreSQL for transactions and relationships.
2. Premature Optimization with NoSQL
Starting with Cassandra or DynamoDB "for scale" when your app has 100 users is like renting a stadium for a dinner party.
The fix: Start with PostgreSQL. It handles millions of rows efficiently. Migrate when you actually hit scaling limits.
3. Ignoring the Access Pattern
Designing your schema before knowing your queries leads to performance nightmares. Especially in NoSQL:
❌ "Let's store everything in one big collection"
✅ "Our main query is 'get recent orders for user X' — let's model around that"
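Here is a toy sketch of that query-first modeling, the idea behind DynamoDB and Cassandra key design: partition orders by user and keep each partition sorted by timestamp, so "recent orders for user X" is a cheap slice instead of a scan over one big collection. (The class and data are hypothetical; a real store would use the database's partition and sort keys.)

```python
from bisect import insort

class OrderStore:
    """In-memory sketch of partition-key + sort-key modeling."""

    def __init__(self):
        self._partitions = {}  # user_id -> [(timestamp, order), ...] sorted

    def put(self, user_id, timestamp, order):
        # insort keeps each user's partition ordered by timestamp
        insort(self._partitions.setdefault(user_id, []), (timestamp, order))

    def recent_orders(self, user_id, limit=10):
        partition = self._partitions.get(user_id, [])
        return [order for _, order in reversed(partition[-limit:])]

store = OrderStore()
store.put("u1", "2025-01-01", {"id": "o1"})
store.put("u1", "2025-01-03", {"id": "o3"})
store.put("u1", "2025-01-02", {"id": "o2"})
store.put("u2", "2025-01-05", {"id": "o9"})
```

The main query never touches other users' data and never sorts at read time, because the write path already put everything where the read path needs it.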
4. Not Using Caching
Hitting your database for every request is the #1 cause of slow apps. A simple Redis cache can reduce DB load by 80%+.
Request → Check Redis Cache → Cache hit?  → Return cached data
                            → Cache miss? → Query DB → Store in Redis → Return
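The flow above is the cache-aside pattern. Here's a minimal sketch of it in Python, with a plain dict and expiry timestamps standing in for Redis (in production you'd call `redis.get` / `redis.setex` instead, and the user-lookup function is invented):

```python
import time

cache = {}       # key -> (expires_at, value); stands in for Redis
db_queries = 0   # counts how often we fall through to the database

def query_db(user_id):
    global db_queries
    db_queries += 1
    return {"id": user_id, "name": f"user-{user_id}"}  # pretend DB row

def get_user(user_id, ttl=60):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry and entry[0] > time.time():      # cache hit
        return entry[1]
    value = query_db(user_id)                 # cache miss: hit the DB
    cache[key] = (time.time() + ttl, value)   # ...then populate the cache
    return value

get_user(42)  # miss -> one DB query
get_user(42)  # hit  -> served from cache, no DB query
```

The TTL matters: it bounds how stale a cached row can get, which is usually an acceptable trade for removing the vast majority of read traffic from the database.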
5. Picking Based on Hype
Every year there's a new "hot" database. Don't pick a database because:
- A FAANG company uses it (they have different problems than you)
- It's trending on Hacker News
- The benchmark looks amazing (benchmarks are often misleading)
The fix: Pick based on your use case, your team's expertise, and your scale requirements.
6. Not Planning for Growth
Your MVP uses SQLite. Great. But what happens when you get 10,000 concurrent users?
The fix: Choose a database that can grow with you. PostgreSQL is almost always a safe starting point — it can scale vertically for a very long time before you need to distribute.
7. Forgetting About Backups and Recovery
The best database in the world is useless if you lose all your data.
The fix: Before launching, ensure:
- Automated backups are running
- You've tested restoring from a backup
- You have point-in-time recovery enabled
- Your backup is in a different region than your primary
Polyglot Persistence
Modern applications often use multiple databases, each handling what it does best. This is called polyglot persistence.
When to Use Multiple Databases
- Different access patterns — Transactions in PostgreSQL, caching in Redis, search in Elasticsearch
- Performance optimization — Offload read-heavy analytics to a columnar store
- AI features — Add a vector DB alongside your primary DB
- Event streaming — Use Kafka + a time-series DB for event processing
How to Keep Them in Sync
┌──────────┐ Write ┌──────────────┐
│ App │────────────▶│ PostgreSQL │ (Source of truth)
└──────────┘ └──────┬───────┘
│
                    Change Data Capture (CDC)
                 (e.g., Debezium reading the WAL)
│
┌───────────┼───────────┐
▼ ▼ ▼
┌──────────┐ ┌────────┐ ┌──────────┐
│Elastic │ │ Redis │ │ Pinecone │
│search │ │ Cache │ │ Vectors │
└──────────┘ └────────┘ └──────────┘
Key principle: Have ONE source of truth (usually your relational DB). Everything else is derived and can be rebuilt.
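That principle can be sketched in a few lines of Python. The dict below stands in for PostgreSQL, the subscribers stand in for Elasticsearch, Redis, or a vector DB fed via CDC, and the whole setup is invented for illustration; the point is that derived stores are disposable because they can always be replayed from the source of truth.

```python
source_of_truth = {}   # primary store: doc_id -> document
search_index = {}      # derived store: lowercase title -> doc_id
subscribers = []       # CDC consumers

def on_change(handler):
    subscribers.append(handler)

def write(doc_id, doc):
    source_of_truth[doc_id] = doc
    for handler in subscribers:   # CDC fan-out (Debezium's role)
        handler(doc_id, doc)

def index_for_search(doc_id, doc):
    search_index[doc["title"].lower()] = doc_id

def rebuild_search_index():
    """Derived stores are disposable: just replay the source of truth."""
    search_index.clear()
    for doc_id, doc in source_of_truth.items():
        index_for_search(doc_id, doc)

on_change(index_for_search)
write("p1", {"title": "PostgreSQL Basics"})
write("p2", {"title": "Redis Caching"})
```

If the search index is ever corrupted or falls behind, `rebuild_search_index()` restores it from the primary store; nothing irreplaceable lives in the derived systems.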
Database Trends 2025–2026
1. AI-Native Databases
Vector search is becoming a standard feature, not a separate database. PostgreSQL (pgvector), MongoDB (Atlas Vector Search), and even Redis now support vector operations. The standalone vector DB market is consolidating.
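Under the hood, the vector search these databases add boils down to ranking stored embeddings by similarity to a query embedding. Here's a toy sketch of that computation; the documents and three-dimensional vectors are made up, and real systems use high-dimensional embeddings with approximate indexes (HNSW, IVFFlat) rather than the exhaustive scan shown here.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: similar topics point in similar directions.
documents = {
    "doc-cats":  [0.9, 0.1, 0.0],
    "doc-dogs":  [0.8, 0.2, 0.1],
    "doc-taxes": [0.0, 0.1, 0.9],
}

def nearest(query, k=2):
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(query, documents[d]),
        reverse=True,
    )
    return ranked[:k]

results = nearest([1.0, 0.0, 0.0])  # ranks the animal docs first
```

Whether this runs inside PostgreSQL via pgvector or in a dedicated service like Pinecone, the query is the same idea: "which stored vectors are closest to this one?"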
2. Serverless Databases
Pay-per-query pricing is becoming mainstream. Neon (serverless PostgreSQL), PlanetScale (serverless MySQL), DynamoDB, and Firestore all scale to zero when idle — perfect for startups watching costs.
3. Edge Databases
Databases running at the edge (close to users). Turso (SQLite at the edge), Cloudflare D1, and Durable Objects enable ultra-low-latency reads globally without managing infrastructure.
4. Postgres Is Eating the World
PostgreSQL keeps absorbing features that previously required separate databases:
- pgvector → Vector/AI search
- TimescaleDB → Time-series
- PostGIS → Geospatial
- pg_cron → Job scheduling
- Full-text search → Built-in
This trend towards Postgres as a universal database is one of the most significant shifts in the data landscape.
5. Unified Query Layers
Tools like Hasura and Prisma, often exposed through GraphQL, are abstracting away database differences, letting apps query multiple databases through a single API.
Cheat Sheet
Here's your quick-reference guide:
| I Need... | Use This | Top Pick |
|---|---|---|
| Structured data + transactions | Relational | PostgreSQL |
| Flexible JSON documents | Document DB | MongoDB |
| Ultra-fast cache/sessions | Key-Value | Redis |
| Massive write throughput | Wide-Column | Cassandra |
| Relationship traversal | Graph DB | Neo4j |
| Metrics and time-based data | Time-Series | TimescaleDB |
| Full-text search | Search Engine | Elasticsearch |
| AI embeddings / similarity | Vector DB | pgvector / Pinecone |
| SQL at global scale | NewSQL | CockroachDB |
| Start a new project | Default | PostgreSQL |
The Golden Rules
- Start with PostgreSQL unless you have a specific reason not to
- Add Redis for caching — almost every production app needs it
- Add a search engine when full-text search becomes a feature
- Add a specialized DB only when PostgreSQL can't handle the workload
- Don't optimize for scale you don't have — premature optimization is the root of all evil
Wrapping Up
Choosing the right database is not about finding the "best" database — it's about finding the best fit for your specific use case. Here's the mental model:
- Understand your data — Is it structured, semi-structured, graph-shaped, temporal?
- Understand your queries — Are they transactional, analytical, search-based, relational?
- Understand your scale — Hundreds of users or hundreds of millions?
- Start simple — PostgreSQL + Redis covers 90% of applications
- Evolve when needed — Add specialized databases as requirements become clear
The best database is the one your team can operate, debug, and scale confidently. Don't let hype drive your architecture — let your use case be the guide.
Let's Connect!
If you found this guide helpful, I'd love to connect with you! I regularly share deep dives on system design, backend engineering, and software architecture.
Connect with me on LinkedIn — let's grow together.
Drop a comment, share this with someone who's picking a database right now, and follow along for more guides like this!