Why Are Reads More Expensive Than Writes in Cassandra?

In Cassandra, reads can be more expensive than writes due to the distributed nature of the database.

When data is written to Cassandra, it is stored in a partition.

Each partition is replicated across multiple nodes in the cluster to ensure fault tolerance and high availability.

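When a write occurs, the node that receives it (the coordinator) forwards the data to every replica of that partition. Each replica only has to append the new data locally; nothing that is already stored needs to be looked up first.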

Reads, on the other hand, require coordination between multiple nodes in the cluster.

When a read request is made, Cassandra must first determine which nodes are responsible for the data being requested, and then retrieve the data from those nodes.
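To make the "which nodes are responsible" step concrete, here is a minimal sketch of a replica lookup on a token ring. It assumes one token per node and simple clockwise placement, and it uses MD5 only as a convenient stand-in hash; the node names, `RING`, `token_for`, and `replicas_for` are all made up for this illustration and are not Cassandra APIs.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit position (token) on the ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(partition_key, ring, replication_factor):
    """Return the nodes holding the key: walk clockwise from its token."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
RING = sorted((i * 2**62, f"node-{i}") for i in range(4))

owners = replicas_for("user:42", RING, replication_factor=3)
print(owners)  # three distinct node names, always the same for this key
```

The lookup itself is cheap and purely local to the coordinator. The expensive part of a read is what happens after it, when the coordinator has to contact the nodes it found.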

This coordination and data retrieval process can be slower and more complex than a write operation, especially if the data being requested is stored across many different nodes.

Additionally, Cassandra's data model is optimized for high write throughput.

The database is designed to handle large volumes of writes and can easily scale horizontally to accommodate additional write capacity.

However, as the number of nodes in a cluster grows, the coordination and data retrieval required for reads can become more complex and potentially slower.

Read Request Flow In Cassandra

When a read operation is performed, the database needs to coordinate with multiple nodes to retrieve the data.

To understand the coordination process, let's consider a scenario where a client sends a read request to a Cassandra node for a specific piece of data. Here are the steps that occur:

1. The node that receives the read request (the coordinator) first determines which nodes in the cluster are responsible for the requested data. It does this using the partition key, which determines the nodes that store that data.
2. Once the responsible nodes have been identified, the coordinator sends a read request to those nodes, or to as many of them as the consistency level of the request requires.
3. Each node that receives the read request searches for the requested data in its local data store. If the data is found, the node returns it to the coordinator.
4. The coordinator collects the responses from all of the nodes that were contacted and sends the result back to the client.
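The steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: the `Replica` class and `coordinator_read` function are hypothetical, the list of responsible replicas is passed in rather than computed, and reconciling differing responses is left out.

```python
class Replica:
    """A hypothetical replica node that keeps its rows in a local dict."""

    def __init__(self, name, rows, up=True):
        self.name = name
        self.rows = rows
        self.up = up

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.rows.get(key)  # None when the row is not stored here


def coordinator_read(key, replicas):
    """Ask each responsible replica for the key and collect the answers."""
    responses = []
    for replica in replicas:
        try:
            value = replica.read(key)
        except ConnectionError:
            continue  # an unavailable replica contributes nothing
        if value is not None:
            responses.append((replica.name, value))
    return responses


replicas = [
    Replica("node-a", {"user:42": "Ada"}),
    Replica("node-b", {"user:42": "Ada"}),
    Replica("node-c", {}, up=False),
]
print(coordinator_read("user:42", replicas))
# [('node-a', 'Ada'), ('node-b', 'Ada')]
```

Even in this toy version, a single client read turns into one local lookup per replica plus the work of gathering the answers.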

This coordination process can be more complex if the data being requested spans multiple partitions, or if some of the nodes responsible for the data are unavailable.

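In these cases, Cassandra relies on the consistency level of the request, such as QUORUM, to decide how many replicas must respond before the read is considered successful. This is how it balances consistency against availability.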
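A quick worked example of the quorum arithmetic may help here. A quorum is a majority of the replicas, that is, floor(RF / 2) + 1 for a replication factor RF. The helper names below are invented for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_can_succeed(replication_factor: int, replicas_up: int) -> bool:
    """A quorum read needs at least a majority of replicas to respond."""
    return replicas_up >= quorum(replication_factor)


def read_sees_latest_write(replication_factor, read_replicas, write_replicas):
    """Reads overlap the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


print(quorum(3))                        # 2
print(read_can_succeed(3, 2))           # True: one replica may be down
print(read_can_succeed(3, 1))           # False: no majority left
print(read_sees_latest_write(3, 2, 2))  # True: quorum reads + quorum writes
```

With a replication factor of 3, a quorum read still works when one replica is down, but it has to wait for two nodes to answer rather than one.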

The coordination process during reads can be more resource-intensive than writes, as it requires multiple nodes to be queried and their responses to be collected and merged.
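As a rough illustration of the merge step, the sketch below picks the response with the newest write timestamp, which is the "last write wins" idea Cassandra uses to reconcile replicas that disagree. The `Response` type and `merge_responses` function are hypothetical, and real reconciliation works on individual cells rather than whole values.

```python
from typing import NamedTuple, Optional, Sequence


class Response(NamedTuple):
    node: str
    value: str
    write_timestamp: int  # e.g. microseconds since the epoch


def merge_responses(responses: Sequence[Response]) -> Optional[Response]:
    """Return the response carrying the most recent write, if any."""
    if not responses:
        return None
    return max(responses, key=lambda r: r.write_timestamp)


responses = [
    Response("node-a", "old@example.com", write_timestamp=100),
    Response("node-b", "new@example.com", write_timestamp=250),
]
print(merge_responses(responses).value)  # new@example.com
```

A write never needs this comparison: it just carries its own timestamp and lands on each replica.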

This is why reads can be more expensive than writes in Cassandra.
