DEV Community

Farhad Rahimi Klie


Apache Cassandra Database – The Complete Guide (Architecture, Internals, and Full CQL Syntax)

Apache Cassandra is a distributed, highly available, horizontally scalable NoSQL database designed to handle massive amounts of data across many commodity servers, with no single point of failure.

It is widely used by organizations that require:

  • High write throughput
  • Always-on availability
  • Linear scalability
  • Multi-datacenter replication

1. What is Apache Cassandra?

Apache Cassandra is a wide-column store inspired by:

  • Amazon Dynamo (distributed system design)
  • Google Bigtable (data model)

Core Characteristics

  • Peer-to-peer architecture (no master)
  • Linear horizontal scalability
  • Tunable consistency
  • Fault tolerance
  • High write performance
  • Schema-based: tables and columns are declared up front in CQL (unlike schemaless document stores such as MongoDB)

2. Cassandra Architecture (High Level)

Cassandra uses a ring-based peer-to-peer architecture.

Key Components

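  • Node – A single Cassandra instance
  • Cluster – A group of nodes
  • Datacenter – A logical grouping of nodes, usually matching a physical site or cloud region
  • Rack – A grouping of nodes within a datacenter that share a failure domain; replicas are spread across racks for fault tolerance

Client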
  |
Coordinator Node
  |
Replica Nodes (Ring)

There is no master node. Any node can act as the coordinator for a client request and forward it to the replicas that own the data.


3. Gossip Protocol (Node Communication)

Cassandra nodes communicate using the Gossip Protocol.

What Gossip Does

  • Node discovery
  • Cluster membership
  • Failure detection
  • Metadata sharing

Each node periodically (roughly once per second) exchanges state information with a few other nodes, so cluster state spreads without any central coordinator.
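The state exchange can be pictured with a small simulation. This is only a sketch under simplifying assumptions: each node keeps a map of `node -> (heartbeat version, status)`, contacts one random peer per round, and both sides keep the entry with the higher version. The names `merge_state` and `gossip_round` are made up for this example and are not Cassandra APIs.

```python
import random

def merge_state(local, remote):
    """Keep, for every node, the entry with the higher heartbeat version."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged

def gossip_round(views, rng):
    """Each node picks one random peer and both adopt the merged view."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = merge_state(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)

# Every node starts out knowing only about itself.
views = {name: {name: (1, "UP")} for name in ("n1", "n2", "n3", "n4")}
rng = random.Random(42)

converged = False
for round_number in range(1, 51):
    gossip_round(views, rng)
    converged = all(len(view) == len(views) for view in views.values())
    if converged:
        break

print(f"rounds run: {round_number}, full membership known everywhere: {converged}")
```

Even with one peer per round, membership information reaches every node after a handful of rounds, which is why gossip works without a master.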


4. Partitioning & Token Ring

Token Ring

  • Data is distributed using consistent hashing
  • Each node owns one or more token ranges (many small ranges when virtual nodes are enabled)

Hash(partition key) → Token → Node

Partitioner Types

  • Murmur3Partitioner (default)
  • RandomPartitioner (legacy; the default before Cassandra 1.2)
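The `Hash(key) → Token → Node` lookup can be sketched in a few lines. This is an illustration, not Cassandra's code: MD5 stands in for Murmur3 (which is not in the Python standard library), each node has a single token, and the `TokenRing` class and its method names are invented for the example.

```python
import bisect
import hashlib

def token_for(key: str) -> int:
    """Hash a partition key to a signed 64-bit token (MD5 stands in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; keep the ring sorted by token.
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner_of_token(self, token: int) -> str:
        """A node owns the range ending at its token, so pick the first token >= the
        lookup token and wrap to the start of the ring when none is larger."""
        index = bisect.bisect_left(self._tokens, token)
        return self._entries[index % len(self._entries)][1]

    def owner(self, key: str) -> str:
        return self.owner_of_token(token_for(key))

ring = TokenRing({"n1": -(2**62), "n2": 0, "n3": 2**62})
print(ring.owner("alice"))
```

Because only the sorted token list is consulted, adding a node changes ownership for the ranges next to its token and leaves the rest of the ring untouched.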

5. Replication Strategy

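The replication strategy is set per keyspace. It defines how many copies of each partition exist and on which nodes they are placed.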

SimpleStrategy

Intended for single-datacenter setups such as development and test clusters. It ignores rack and datacenter topology.

CREATE KEYSPACE test
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

NetworkTopologyStrategy

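Used for multi-datacenter setups. The replication factor is set per datacenter, and the datacenter names in the statement must match the names the cluster reports.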

CREATE KEYSPACE prod
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 3,
  'DC2': 2
};
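Replica placement under SimpleStrategy can be pictured as a clockwise walk around the ring: start at the node that owns the token and keep going until enough distinct nodes are collected. The sketch below assumes a ring given as a sorted list of `(token, node)` pairs with small made-up token values; `simple_replicas` is a hypothetical helper, and the rack- and datacenter-aware logic of NetworkTopologyStrategy is not modelled.

```python
import bisect

def simple_replicas(ring, token, replication_factor):
    """Walk clockwise from the owning token and collect distinct nodes.

    ring is a list of (token, node) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:  # a node may appear several times with vnodes
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas

ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]
print(simple_replicas(ring, 60, 3))  # ['n4', 'n1', 'n2']
```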

6. Cassandra Consistency Model

Cassandra offers tunable consistency.

Consistency Levels

  • ONE
  • TWO
  • THREE
  • QUORUM
  • ALL
  • LOCAL_QUORUM
  • EACH_QUORUM

Example (in cqlsh, where CONSISTENCY is a shell command rather than a CQL statement):

CONSISTENCY QUORUM;

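Choosing a consistency level is a trade-off: stricter levels such as ALL give stronger consistency at the cost of availability and latency, while ONE favors availability and speed.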
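The arithmetic behind the trade-off is short. A quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The helper names below are made up for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every possible read set overlaps every possible write set."""
    return read_replicas + write_replicas > replication_factor

rf = 3
q = quorum(rf)
print(q)                                  # 2
print(read_sees_latest_write(q, q, rf))   # True: QUORUM reads + QUORUM writes
print(read_sees_latest_write(1, 1, rf))   # False: ONE reads + ONE writes
```

With a replication factor of 3, QUORUM on both sides tolerates one unavailable replica while still overlapping; ONE on both sides stays available longer but may return stale data.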


7. Write Path (Internal Flow)

Writes are fast in Cassandra because each one only appends to a log and updates an in-memory structure.

Write Steps

  1. Append the write to the Commit Log (durability)
  2. Apply the write to the MemTable (in-memory)
  3. When the MemTable fills up, flush it to disk as a new SSTable

Both the commit log append and the SSTable flush are sequential, so the write path involves no random disk writes.
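The three write steps can be mimicked with plain Python containers. This is a toy model under stated assumptions: a list stands in for the on-disk commit log, the flush is triggered by a row count rather than memory size, and the `Table` class is invented for the example.

```python
class Table:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory writes, not yet on disk as an SSTable
        self.sstables = []     # each SSTable is an immutable, sorted tuple of rows
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush when the MemTable is full

    def flush(self):
        # Write the MemTable out once, in sorted order, and never modify it again.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs to be replayed

table = Table(flush_threshold=3)
for key in ["c", "a", "b", "d"]:
    table.write(key, key.upper())

print(table.sstables)   # [(('a', 'A'), ('b', 'B'), ('c', 'C'))]
print(table.memtable)   # {'d': 'D'}
```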


8. Read Path (Internal Flow)

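A read may have to consult several structures, because data for one partition can be spread across the MemTable and multiple SSTables.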

Read Steps

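  1. Check the MemTable
  2. Check each SSTable's Bloom Filter to skip SSTables that cannot contain the partition
  3. Read from the remaining SSTables
  4. Merge the results, keeping the newest value by write timestamp
  5. Apply tombstones to hide deleted data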
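The merge in steps 4 and 5 can be sketched as "newest timestamp wins, and a tombstone wins like any other write". The example assumes the MemTable and each SSTable are dicts mapping a key to a `(timestamp, value)` pair; the `TOMBSTONE` sentinel and the `read` function are hypothetical names for this illustration.

```python
TOMBSTONE = object()  # sentinel marking a deleted value

def read(key, memtable, sstables):
    """Return the newest live value for key, or None if absent or deleted."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for sstable in sstables:
        # A real node would consult the SSTable's Bloom filter before this lookup.
        if key in sstable:
            candidates.append(sstable[key])
    if not candidates:
        return None
    _, value = max(candidates, key=lambda cell: cell[0])  # newest timestamp wins
    return None if value is TOMBSTONE else value

memtable = {"k1": (30, TOMBSTONE)}
sstables = [
    {"k1": (10, "old"), "k2": (5, "x")},
    {"k2": (20, "y")},
]
print(read("k1", memtable, sstables))  # None: the newer tombstone hides "old"
print(read("k2", memtable, sstables))  # y
```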

9. Storage Engine Internals

Commit Log

  • Append-only log
  • Crash recovery

MemTable

  • In-memory sorted structure
  • Flushed to disk

SSTable

  • Immutable disk file
  • Sorted by partition token, then by clustering columns within each partition

Bloom Filter

  • Probabilistic structure
  • Avoids unnecessary disk reads
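A Bloom filter answers "definitely not present" or "possibly present", which is what lets a read skip an SSTable without touching disk. The sketch below is a minimal version under assumptions: SHA-256 with a per-hash prefix stands in for the hash functions, a Python integer serves as the bit array, and the sizes are arbitrary.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by prefixing the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: str) -> bool:
        """False means the key was never added; True may be a false positive."""
        return all((self.bits >> position) & 1 for position in self._positions(key))

bloom = BloomFilter()
bloom.add("user:1")
print(bloom.might_contain("user:1"))  # True
```

A key that was added always returns True. A key that was never added usually returns False, and the false-positive rate depends on the filter size and the number of hashes.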

10. Compaction

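Compaction merges SSTables in the background. It discards overwritten values and purges expired tombstones, so reads touch fewer files.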

Compaction Strategies

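  • SizeTieredCompactionStrategy (default; suits write-heavy workloads)
  • LeveledCompactionStrategy (suits read-heavy workloads)
  • TimeWindowCompactionStrategy (suits time-series data written with TTLs)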
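Whatever the strategy, the core merge step looks roughly the same: combine several SSTables, keep the newest version of each key, and drop tombstones that are old enough. The sketch below assumes each SSTable is a dict of `key -> (timestamp, value)` with `None` as the tombstone marker; the `compact` function is hypothetical and does not model how a strategy chooses which SSTables to merge.

```python
def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, keeping the newest cell per key.

    A value of None marks a tombstone; tombstones older than gc_grace_seconds
    are purged from the output.
    """
    merged = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return {
        key: cell
        for key, cell in sorted(merged.items())  # the output stays sorted by key
        if not (cell[1] is None and now - cell[0] >= gc_grace_seconds)
    }

older = {"a": (100, "v1"), "b": (100, "v1")}
newer = {"a": (200, "v2"), "b": (150, None)}

print(compact([older, newer], now=1000, gc_grace_seconds=500))
# {'a': (200, 'v2')}
print(compact([older, newer], now=1000, gc_grace_seconds=900))
# {'a': (200, 'v2'), 'b': (150, None)}
```

In the second call the tombstone for `b` is younger than the grace period, so it is kept to make sure replicas that missed the delete can still learn about it.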

11. Cassandra Data Model

Core Concepts

  • Keyspace
  • Table
  • Partition Key
  • Clustering Columns
  • Columns
  • Rows

Example Table Structure

Partition Key → determines which nodes store the row
Clustering Columns → determine the sort order of rows within a partition

12. Keyspaces (Full Syntax)

CREATE KEYSPACE app
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

DESCRIBE KEYSPACES;

DROP KEYSPACE app;

13. Tables (Full Syntax)

CREATE TABLE users (
  user_id UUID,
  email TEXT,
  name TEXT,
  created_at TIMESTAMP,
  PRIMARY KEY (user_id)
);

Compound Primary Key (partition key plus clustering column)

PRIMARY KEY ((user_id), created_at)

14. Clustering Order

CREATE TABLE posts (
  user_id UUID,
  created_at TIMESTAMP,
  post TEXT,
  PRIMARY KEY ((user_id), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
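One way to picture the posts table is as a map from partition key to rows ordered by the clustering column. The sketch below is an in-memory analogy only: it sorts at read time, whereas Cassandra stores rows already sorted on disk, and the `PostsTable` class is invented for the example. Integers stand in for timestamps.

```python
class PostsTable:
    """Rows grouped by partition key (user_id) and ordered by created_at."""

    def __init__(self):
        self._partitions = {}  # user_id -> {created_at: post}

    def insert(self, user_id, created_at, post):
        # The same primary key (user_id, created_at) overwrites the earlier row.
        self._partitions.setdefault(user_id, {})[created_at] = post

    def select(self, user_id, descending=True):
        """Return one partition's rows in clustering order (newest first by default)."""
        rows = self._partitions.get(user_id, {})
        return sorted(rows.items(), reverse=descending)

posts = PostsTable()
posts.insert("u1", 2, "second")
posts.insert("u1", 1, "first")
posts.insert("u1", 3, "third")
posts.insert("u1", 2, "second (edited)")

print(posts.select("u1"))
# [(3, 'third'), (2, 'second (edited)'), (1, 'first')]
```

A query for one user touches a single partition and returns rows already in the desired order, which is why the table is designed around the query rather than the other way round.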

15. Data Types (Common Core Types)

  • TEXT
  • INT
  • BIGINT
  • BOOLEAN
  • UUID
  • TIMEUUID
  • TIMESTAMP
  • FLOAT
  • DOUBLE
  • BLOB

Collection Types

  • LIST
  • SET
  • MAP

tags SET<TEXT>

16. Insert Data

INSERT INTO users (user_id, email, name)
VALUES (uuid(), 'a@test.com', 'Alice');

TTL (Time To Live)

INSERT INTO sessions (id, data)
VALUES (1, 'temp')
USING TTL 3600;
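The effect of `USING TTL` can be sketched as an expiry time stored next to the value, after which reads treat the value as deleted. This is an analogy under assumptions: the `TtlStore` class and its injectable `clock` parameter are made up for the example, and the background cleanup Cassandra performs during compaction is not modelled.

```python
import time

class TtlStore:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._cells = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, ttl=None):
        """Store value; with ttl (seconds) it expires that long after the write."""
        expires_at = self._clock() + ttl if ttl is not None else None
        self._cells[key] = (value, expires_at)

    def get(self, key):
        if key not in self._cells:
            return None
        value, expires_at = self._cells[key]
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired data reads as if it had been deleted
        return value

# A fake clock makes the example deterministic.
now = [0]
sessions = TtlStore(clock=lambda: now[0])
sessions.insert(1, "temp", ttl=3600)

print(sessions.get(1))  # temp
now[0] = 3600
print(sessions.get(1))  # None
```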

17. Select Queries

SELECT * FROM users;

SELECT * FROM posts
WHERE user_id = ?;

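⚠️ Efficient queries restrict the partition key in the WHERE clause. A SELECT without it, like the first statement above, scans every partition in the table and should be avoided on large tables.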


18. Filtering (Limited)

SELECT * FROM users
WHERE email = 'a@test.com'
ALLOW FILTERING;

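Without a partition key restriction, ALLOW FILTERING makes Cassandra read every partition and discard the rows that do not match, so it is not recommended for large datasets.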


19. Update Data

UPDATE users
SET name = 'Bob'
WHERE user_id = ?;

Updates are upserts: if no row with that primary key exists, the UPDATE creates it.


20. Delete Data

DELETE FROM users
WHERE user_id = ?;

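Deletes create tombstones: markers that hide the deleted data until compaction removes both, once the table's gc_grace_seconds has elapsed.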


21. Indexes

Secondary Index

CREATE INDEX ON users (email);

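Scalability is limited: each node indexes only its own data, so a query on the indexed column alone has to contact many nodes.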


22. Materialized Views

CREATE MATERIALIZED VIEW users_by_email AS
SELECT * FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);

23. Batches

BEGIN BATCH
INSERT INTO users (...) VALUES (...);
INSERT INTO logs (...) VALUES (...);
APPLY BATCH;

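Batches group related writes so they are applied atomically. They are not a bulk-loading optimization, and large batches that span many partitions put heavy load on the coordinator.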


24. Counters

CREATE TABLE page_views (
  page TEXT PRIMARY KEY,
  views COUNTER
);

UPDATE page_views
SET views = views + 1
WHERE page = 'home';

25. User Defined Types (UDT)

CREATE TYPE address (
  street TEXT,
  city TEXT,
  zip INT
);

26. User Defined Functions (UDF)

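-- User-defined functions must be enabled in cassandra.yaml before this statement is accepted.
-- The name add_ints is used because ADD is a reserved CQL keyword.
CREATE FUNCTION add_ints(a int, b int)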
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS 'return a + b;';

27. Security & Authentication

  • PasswordAuthenticator
  • Role-based access
  • TLS encryption

CREATE ROLE admin
WITH PASSWORD = 'secret'
AND LOGIN = true;

28. Cassandra vs Traditional RDBMS

Feature        Cassandra     MySQL
Joins          No            Yes
Schema         Flexible      Rigid
Scalability    Horizontal    Vertical
Consistency    Tunable       Strong

29. Common Use Cases

  • Time-series data
  • IoT platforms
  • Messaging systems
  • Analytics ingestion
  • Logging systems

30. When NOT to Use Cassandra

  • Complex joins
  • Strong ACID transactions
  • Small datasets
  • Ad-hoc queries

31. Final Thoughts

Apache Cassandra is a write-optimized, distributed database built for scale and availability. Its design prioritizes fault tolerance and performance over relational flexibility.

If your system demands always-on availability and massive horizontal scale, Cassandra is one of the best choices available.
