How Cassandra Achieves Massive Write Scalability (and What You Trade Off)
Designing large-scale systems often comes down to one uncomfortable truth:
You cannot optimize reads, writes, consistency, and simplicity all at once.
When your system is write-heavy—logs, metrics, events, feeds, IoT data—traditional databases often become the bottleneck. This is where Apache Cassandra becomes a compelling choice.
This article explains:
- When Cassandra is the right database
- Why it outperforms traditional databases for writes
- How its internal storage model enables that performance
When Should You Choose Cassandra?
Cassandra is a strong choice when writes dominate your workload and availability matters more than strict consistency.
Choose Cassandra if:
- You have very high write throughput (10k–100k+ writes/sec)
- Data arrives continuously (events, logs, metrics, tracking)
- You need horizontal scaling by adding nodes
- Downtime is unacceptable (multi-node, multi-DC availability)
- Your queries are predictable and can be modeled without JOINs (see the data-model sketch at the end of this section)
- Eventual or tunable consistency is acceptable
Avoid Cassandra if:
- You need complex ad-hoc queries or JOINs
- You rely heavily on transactions across multiple rows/tables
- Reads must be extremely fast and strongly consistent
- Your dataset is small and doesn’t need horizontal scaling
Cassandra is not a “general-purpose” database—it’s a specialized write-scaling machine.
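To make the "predictable queries" and "tunable consistency" points concrete, here is a minimal sketch using the open-source DataStax Python driver (`cassandra-driver`). The keyspace, table, contact point, and data are illustrative assumptions, not a recommendation:

```python
from datetime import datetime, timezone

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])  # contact point; adjust for your cluster
session = cluster.connect()

# Model the table around one known query: "all events for a device,
# newest first". Partition key = device_id, clustering key = event_time.
# No JOINs needed, because the table *is* the query.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS iot
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS iot.events (
        device_id  text,
        event_time timestamp,
        payload    text,
        PRIMARY KEY ((device_id), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

# Tunable consistency: this write succeeds once ONE replica acknowledges
# it, trading strict consistency for availability and latency.
insert = SimpleStatement(
    "INSERT INTO iot.events (device_id, event_time, payload) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(insert, ("sensor-42", datetime.now(timezone.utc), "temp=21.5"))
```

Raising the consistency level to QUORUM on the same statement is the "tunable" part: you pay more latency per request instead of redesigning the system.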
Why Traditional Databases Struggle with Writes
Most relational databases (PostgreSQL, MySQL) use B-tree–based storage engines.
How writes work in traditional databases:
- Data is updated in place
- Indexes must be updated immediately
- Writes trigger random disk I/O
- Locks, WAL flushes, and index maintenance add overhead
This works well for mixed workloads, but at scale:
- Random disk seeks become expensive
- Index-heavy schemas slow down writes
- Vertical scaling hits hardware limits
- Sharding adds operational complexity
In short: traditional storage engines put read performance and transactional safety first; raw write throughput comes second.
Why Cassandra Is Faster for Writes
Cassandra flips the design priorities.
Instead of updating data in place, Cassandra uses a Log-Structured Merge Tree (LSM Tree), which turns almost every write into a sequential append.
Key design choice:
Never modify data in place. Always append.
Sequential disk writes are orders of magnitude faster than random writes, which is why Cassandra can sustain massive write throughput on modest hardware.
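To see why append-only wins, compare the two access patterns in a toy sketch. This is plain Python file I/O, nothing Cassandra-specific; the paths and record layout are made up for illustration:

```python
import struct

RECORD = struct.Struct("<16sq")  # 16-byte key, 8-byte signed value

def append_write(log_path: str, key: bytes, value: int) -> None:
    # Sequential: always write at the current end of the file, so the
    # storage device never has to seek between consecutive writes.
    with open(log_path, "ab") as f:
        f.write(RECORD.pack(key, value))

def in_place_update(data_path: str, slot: int, key: bytes, value: int) -> None:
    # Random: seek to the record's slot and overwrite it, the B-tree-style
    # pattern. With many keys, successive updates land at scattered
    # offsets, producing random I/O.
    with open(data_path, "r+b") as f:
        f.seek(slot * RECORD.size)
        f.write(RECORD.pack(key, value))
```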
Cassandra’s Internal Storage Model (LSM Tree)
Cassandra’s storage engine revolves around three core components.
1. Commit Log (Durability First)
- Every write is appended to the commit log
- Acts as a write-ahead log
- Ensures data isn’t lost if a node crashes
- Sequential disk I/O → very fast
This step guarantees durability without slowing down the write path.
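A minimal commit-log sketch, assuming a toy JSON-lines format. Real Cassandra uses a binary segment format and, by default, fsyncs the log periodically rather than on every single write:

```python
import json
import os

class CommitLog:
    def __init__(self, path: str):
        self.path = path
        self._file = open(path, "ab")

    def append(self, mutation: dict) -> None:
        # Sequential append: the only disk work on the hot write path.
        self._file.write((json.dumps(mutation) + "\n").encode())
        self._file.flush()
        os.fsync(self._file.fileno())  # durable before the write is acked

    def replay(self):
        # After a crash, logged mutations are re-applied to rebuild the
        # in-memory Memtable.
        with open(self.path, "rb") as f:
            for line in f:
                yield json.loads(line)
```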
2. Memtable (In-Memory Writes)
- Writes go into an in-memory structure kept sorted by primary key
- Multiple updates to the same key are merged in memory
- No per-write disk I/O beyond the commit-log append
This absorbs write bursts efficiently.
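A toy Memtable sketch. Real Cassandra keeps entries in a concurrent sorted structure; this version sorts only at flush time, and names like `flush_threshold` are invented for illustration:

```python
class Memtable:
    def __init__(self, flush_threshold: int = 10_000):
        self.data = {}  # key -> (timestamp, value); value None = tombstone
        self.flush_threshold = flush_threshold

    def put(self, key, value, ts) -> None:
        current = self.data.get(key)
        if current is None or ts >= current[0]:
            self.data[key] = (ts, value)  # later write replaces earlier, in memory

    def is_full(self) -> bool:
        return len(self.data) >= self.flush_threshold

    def sorted_items(self):
        # Emit entries in primary-key order so a flush writes sequentially.
        return sorted(self.data.items())
```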
3. SSTable (Immutable Disk Storage)
- When the Memtable fills up, it is flushed to disk as an SSTable
- SSTables are:
  - Immutable
  - Sorted by primary key
  - Written sequentially
Because SSTables are never updated, Cassandra avoids random disk writes entirely.
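Continuing the Memtable sketch above, a flush produces an immutable, sorted file. The JSON-lines format is again a stand-in for Cassandra's real binary SSTable format:

```python
import json

def flush_to_sstable(memtable, path: str) -> str:
    # "x" mode creates the file and fails if it already exists: an SSTable
    # is written exactly once and never modified afterwards.
    with open(path, "x") as f:
        for key, (ts, value) in memtable.sorted_items():
            f.write(json.dumps({"key": key, "ts": ts, "value": value}) + "\n")
    return path  # immutable from here on; only compaction ever replaces it
```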
How Cassandra Handles Updates and Deletes
Cassandra treats every change as a new write:
- Updates → new version with a higher timestamp
- Deletes → written as tombstones (delete markers)
The “current state” of a row is determined by timestamps, not by overwriting data.
This design enables fast writes but shifts cleanup work to the background.
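A toy resolution function showing last-write-wins by timestamp, with `None` standing in for a tombstone:

```python
def resolve(versions):
    """versions: iterable of (timestamp, value) pairs for one row;
    value None is a tombstone (delete marker)."""
    ts, value = max(versions, key=lambda v: v[0])  # highest timestamp wins
    return value  # None means the row is deleted

# An insert, an update, then a delete of the same row:
print(resolve([(100, "v1"), (200, "v2")]))               # v2
print(resolve([(100, "v1"), (200, "v2"), (300, None)]))  # None (deleted)
```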
Reading Data: The Trade-Off
Reads are more complex than writes. To answer a query, Cassandra must:
1. Check the Memtable (the freshest data)
2. Use Bloom filters to identify which SSTables might contain the key
3. Read those SSTables from newest to oldest
4. Merge the results to find the latest version
Bloom filters skip most unnecessary disk reads, but point reads can still be slower than in a well-indexed relational database.
This is the intentional trade-off.
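Here is a sketch of that read path, with a toy two-hash Bloom filter. Real Cassandra sizes its filters per SSTable and merges versions column by column; this simplified version returns the newest whole-row value it finds:

```python
import hashlib

class BloomFilter:
    def __init__(self, size: int = 1024):
        self.size = size
        self.bits = bytearray(size)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode()).digest()
        yield int.from_bytes(digest[:4], "big") % self.size
        yield int.from_bytes(digest[4:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

def read(key, memtable, sstables):
    """memtable: dict of key -> value; sstables: list of (bloom, rows)
    pairs ordered newest first, where rows is a dict of key -> value."""
    if key in memtable:
        return memtable[key]          # freshest data, may be a tombstone
    for bloom, rows in sstables:      # newest to oldest
        if not bloom.might_contain(key):
            continue                  # skip this SSTable's disk read entirely
        if key in rows:
            return rows[key]
    return None
```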
Compaction: Paying the Cost Later
To prevent unlimited growth of SSTables and tombstones, Cassandra runs compaction:
- Merges multiple SSTables into fewer ones
- Resolves multiple versions of rows
- Removes deleted and expired data
- Improves read performance over time
Cassandra makes writes cheap now and pays the cost later, during compaction.
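A toy compaction pass over the SSTable-as-dict sketches above. The `gc_grace` parameter mirrors Cassandra's `gc_grace_seconds` (default 864,000 seconds, i.e. 10 days), after which tombstones may be dropped; here both `ts` and `now` are assumed to be Unix seconds:

```python
def compact(sstables, now: int, gc_grace: int = 864_000):
    """sstables: list of dicts mapping key -> (timestamp, value)."""
    merged = {}
    for table in sstables:                      # input order is irrelevant;
        for key, (ts, value) in table.items():  # timestamps pick the winner
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        # Live rows stay; young tombstones stay (they must still suppress
        # stale replicas); tombstones past the grace period are dropped.
        if value is not None or now - ts < gc_grace
    }
```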
Why This Design Scales So Well
Putting it all together:
| Traditional DB | Cassandra |
|---|---|
| In-place updates | Append-only writes |
| Random disk I/O | Sequential disk I/O |
| Writes contend with reads | Writes isolated from reads |
| Hard to shard | Built-in partitioning |
| Vertical scaling | Horizontal scaling |
Cassandra scales because:
- Writes are fast and predictable
- Nodes are independent (shared-nothing)
- Adding nodes increases total throughput
- Failures don’t stop the system
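The "built-in partitioning" row deserves a sketch of its own: keys hash onto a token ring, each node owns a range, and adding a node splits ranges so capacity grows with the cluster. Cassandra actually uses Murmur3 partitioning with many virtual nodes per machine; this toy version uses MD5 and one token per node:

```python
import bisect
import hashlib

class Ring:
    def __init__(self, nodes):
        # One token per node for simplicity; real clusters use many vnodes.
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def owner(self, partition_key: str) -> str:
        token = self._hash(partition_key)
        idx = bisect.bisect(self.tokens, (token, "")) % len(self.tokens)
        return self.tokens[idx][1]  # first node clockwise from the token

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("sensor-42"))  # the same key always routes to the same node
```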
Final Takeaway
Cassandra is not faster because it’s “better”—it’s faster because it chooses a different set of trade-offs.
If your workload is write-heavy and needs high availability and horizontal scale, Cassandra’s LSM-based storage model can outperform traditional databases on write throughput by an order of magnitude.
But if reads, transactions, or flexibility matter more—choose something else.
Good system design is not about the best database.
It’s about the right database for the workload.
