Dsysd Dev
Why Is Apache Cassandra FAANG Companies' Favorite Database?

Apache Cassandra is a highly scalable and distributed NoSQL database system known for its ability to handle massive amounts of data across multiple nodes.

Understanding the internals of Apache Cassandra involves grasping its underlying architecture, data model, and key components.


Here’s an overview of the internals of Apache Cassandra:

Distributed Architecture:

  1. Peer-to-Peer: Cassandra follows a peer-to-peer architecture, where all nodes in the cluster are equal and communicate with each other directly, without relying on a central coordinator node.

  2. Ring Topology: Cassandra employs a ring topology, where nodes are organized in a logical ring. Each node is responsible for a range of data, determined by a partitioner.

  3. Replication: Data is replicated across multiple nodes to ensure high availability and fault tolerance. Cassandra supports replication strategies that determine how data is distributed and replicated across the cluster.
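As a quick illustration, replication is configured per keyspace. A minimal sketch (the keyspace and data-center names below are made up for the example) that keeps three replicas of every row in one data center:

-- NetworkTopologyStrategy places replicas per data center; here the
-- hypothetical data center 'dc1' keeps 3 copies of each row.
CREATE KEYSPACE demo_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};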

Data Model:

  1. Wide-Column Structure: Cassandra uses a wide-column data model, where data is organized into rows, columns, and tables (historically called column families). This structure allows flexible schema design and efficient read and write operations.

  2. Wide Rows: Rows in Cassandra can contain a large number of columns, allowing for dynamic and sparse data storage. This design enables high write throughput and efficient data retrieval.

  3. NoSQL Model: Cassandra takes a flexible-schema approach that makes scaling and schema evolution easy. Columns can be left unset, so each row can effectively hold a different set of columns, and new columns can be added without rewriting existing data.
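To make the wide-row idea concrete, here is a minimal sketch (table and column names are hypothetical): all readings for a sensor land in the same partition and are ordered by the clustering column, so a single partition can hold a very large number of rows.

-- sensor_id is the partition key; reading_time is a clustering column,
-- so all readings for one sensor form a single, ordered (wide) partition.
CREATE TABLE sensor_readings (
    sensor_id    text,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
);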

Consistency and Availability:

  1. Tunable Consistency: Cassandra provides tunable consistency, allowing users to balance consistency and availability based on their application requirements. Consistency levels can be defined per read and write operation.

  2. Eventual Consistency: Cassandra follows an eventually consistent model, where updates are propagated to replicas asynchronously. This approach ensures high availability and low latency, even in the presence of network partitions.
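In cqlsh, the consistency level can be set per session with the CONSISTENCY command (application code usually sets it on the driver instead). A small sketch reusing the hypothetical sensor_readings table from above:

CONSISTENCY QUORUM;
-- This write must be acknowledged by a quorum of replicas.
INSERT INTO sensor_readings (sensor_id, reading_time, value)
VALUES ('s1', toTimestamp(now()), 21.5);

CONSISTENCY ONE;
-- This read is answered by a single replica, trading consistency for latency.
SELECT * FROM sensor_readings WHERE sensor_id = 's1';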

Replication and Data Distribution:

  1. Partitioner: Cassandra uses a partitioner to determine how data is distributed across the nodes. Common partitioners include RandomPartitioner, Murmur3Partitioner, and ByteOrderedPartitioner.

  2. Replication Factor: Cassandra allows specifying the number of replicas for each piece of data. The replication factor determines the number of copies stored across the cluster.

  3. Consistent Hashing: Cassandra employs consistent hashing to evenly distribute data across the cluster. This technique ensures a balanced load distribution and simplifies node addition or removal.
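The built-in token() function shows which token the partitioner assigns to a partition key; that token determines where the row sits on the ring and therefore which nodes hold its replicas. A sketch against the hypothetical sensor_readings table:

-- With the default Murmur3Partitioner, the partition key is hashed to a
-- 64-bit token; each node owns contiguous token ranges on the ring.
SELECT sensor_id, token(sensor_id)
FROM sensor_readings
WHERE sensor_id = 's1';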

Read and Write Operations:

  1. Write Path: When data is written, it is first appended to the commit log for durability and stored in memory in a memtable. Memtables are periodically flushed to disk as SSTables (Sorted String Tables). Writes are therefore fast and largely sequential.

  2. Read Path: During a read operation, Cassandra first checks the memtable. If the data is not there, it consults the on-disk SSTables, using per-SSTable Bloom filters to skip files that cannot contain the requested partition, and merges the results it finds. Compaction periodically merges SSTables to keep reads efficient.
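One way to observe the read path is cqlsh's TRACING command, which prints the work each replica did to answer a request, including memtable and SSTable reads and Bloom-filter checks:

TRACING ON;
-- The trace printed after this query lists coordinator and replica activity,
-- such as memtable reads, Bloom-filter checks, and SSTable lookups.
SELECT * FROM sensor_readings WHERE sensor_id = 's1';
TRACING OFF;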

Gossip Protocol:

  1. Gossip: Gossip is a peer-to-peer communication protocol used by Cassandra nodes to share information about the cluster’s topology, node status, and schema changes. Gossip helps nodes discover and maintain an updated view of the cluster.

Compaction:

  1. Compaction Process: Compaction is the process of merging and compacting SSTables to optimize storage and ensure data consistency. It involves removing tombstones, resolving conflicts, and consolidating data into a smaller number of SSTables.

  2. Compaction Strategies: Cassandra provides several compaction strategies, including Leveled Compaction (LCS), Size-Tiered Compaction (STCS), and Time-Window Compaction (TWCS). The strategy determines how and when SSTables are selected and merged, based on its algorithm and configuration settings.
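The strategy is configured per table and can be changed later with ALTER TABLE. A minimal sketch, again using the hypothetical sensor_readings table:

-- Switch to size-tiered compaction; min_threshold is the number of similarly
-- sized SSTables that must accumulate before they are merged.
ALTER TABLE sensor_readings
  WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '4'};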

Create Query Example

Let’s say you are given the task of creating a simple users table in Apache Cassandra, and you come up with the following query:

CREATE TABLE users (
    name text,
    age int,
    PRIMARY KEY (name)
) WITH bloom_filter_fp_chance = 0.01
  AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
  AND comment = ''
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'tombstone_compaction_interval': '1800', 'tombstone_threshold': '0.01', 'unchecked_tombstone_compaction': 'true'}
  AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND crc_check_chance = 1.0
  AND default_time_to_live = 0
  AND gc_grace_seconds = 3600
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND speculative_retry = '99PERCENTILE';

bloom_filter_fp_chance: Sets the desired false positive chance for Bloom filters.

caching: Specifies the caching settings for the table. In this case, it caches all keys and doesn't cache entire rows per partition.

comment: Allows adding an optional comment or description for the table.

compaction: Configures the compaction strategy and related parameters. Here, the Leveled Compaction Strategy is used with specific settings for tombstone compaction interval, tombstone threshold, and unchecked tombstone compaction.

compression: Specifies the compression settings for the table. LZ4 compression algorithm is used with a chunk length of 64KB.

crc_check_chance: Sets the probability of performing a CRC check for read operations.

default_time_to_live: Sets the default time-to-live for data in the table. A value of 0 means data doesn't expire.

gc_grace_seconds: Defines the grace period in seconds for garbage collection (deletion) of tombstones.

max_index_interval: Sets the maximum interval between indexed entries.

memtable_flush_period_in_ms: Specifies the frequency of flushing the in-memory memtable to disk.

min_index_interval: Sets the minimum interval between indexed entries.

speculative_retry: Configures the speculative retry behavior for the table. In this case, it uses a 99th percentile latency threshold for speculative retries.
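With the table in place, writing and reading follow the usual CQL pattern (the row values below are made up):

-- Insert a row and read it back by its partition key.
INSERT INTO users (name, age) VALUES ('alice', 30);

SELECT name, age FROM users WHERE name = 'alice';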

Claps Please!

If you found this article helpful I would appreciate some claps 👏👏👏👏, it motivates me to write more such useful articles in the future.

Follow for regular awesome content and insights.

Subscribe to my Newsletter

If you like my content, then consider subscribing to my free newsletter to get exclusive, educational, technical, interesting, and career-related content delivered directly to your inbox.

https://dsysd.beehiiv.com/subscribe

Important Links

Thanks for reading the post, be sure to follow the links below for even more awesome content in the future.

Twitter: https://twitter.com/dsysd_dev
Youtube: https://www.youtube.com/@dsysd-dev
Github: https://github.com/dsysd-dev
Medium: https://medium.com/@dsysd-dev
Email: dsysd.mail@gmail.com
Linkedin: https://www.linkedin.com/in/dsysd-dev/
Newsletter: https://dsysd.beehiiv.com/subscribe
Gumroad: https://dsysd.gumroad.com/
Dev.to: https://dev.to/dsysd_dev/
