DEV Community

sizan mahmud0

Connecting Cassandra with Django: The Complete Guide (With & Without Docker)

Part 1: Setup Without Docker (Traditional Method)

Why Cassandra with Django?

Before we dive in, let's understand why you'd want to use Cassandra with Django:

Cassandra strengths:

  • Handles massive write-heavy workloads
  • Linear scalability (add nodes, get more capacity)
  • No single point of failure
  • Perfect for time-series data, logs, IoT data

Django + Cassandra use cases:

  • Analytics platforms
  • Real-time messaging systems
  • IoT data collection
  • Event logging systems
  • Social media feeds

Important note: Cassandra is NOT a replacement for PostgreSQL/MySQL for typical Django apps. Use it for specific high-scale scenarios.


Installing Cassandra Locally

On Ubuntu/Debian

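# Add repository keys (apt-key is deprecated, so store the key in a keyring file)
sudo mkdir -p /etc/apt/keyrings
sudo curl -o /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS

# Add Cassandra repository (41x is the 4.1 release series)
echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list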

# Update and install
sudo apt-get update
sudo apt-get install cassandra

# Start Cassandra
sudo systemctl start cassandra
sudo systemctl enable cassandra

# Verify installation
nodetool status

On macOS

# Install via Homebrew
brew install cassandra

# Start Cassandra
brew services start cassandra

# Verify
cqlsh

On Windows

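Cassandra 4.0 and later no longer ship the Windows startup scripts, so on Windows the practical options are WSL2 (follow the Ubuntu steps above) or the Docker setup in Part 2. The steps below apply only to a 3.x release:

  1. Download an Apache Cassandra 3.x release from the official website
  2. Extract it to C:\cassandra
  3. Set the JAVA_HOME environment variable
  4. Run C:\cassandra\bin\cassandra.bat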

Setting Up Django Project

Step 1: Install Required Packages

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Django and Cassandra driver
pip install django
pip install cassandra-driver
pip install django-cassandra-engine

# Create requirements.txt
pip freeze > requirements.txt

Your requirements.txt should include:

Django==4.2.7
cassandra-driver==3.28.0
django-cassandra-engine==1.8.0

Step 2: Create Django Project

django-admin startproject myproject
cd myproject
python manage.py startapp myapp

Step 3: Configure Django Settings

Open myproject/settings.py:

from cassandra import ConsistencyLevel

# Add cassandra engine to installed apps
# (django_cassandra_engine should be the first entry in the list)
INSTALLED_APPS = [
    'django_cassandra_engine',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'myapp',
]

# Configure database connections
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    },
    'cassandra': {
        'ENGINE': 'django_cassandra_engine',
        'NAME': 'myapp_keyspace',
        'HOST': '127.0.0.1',
        'PORT': 9042,
        'OPTIONS': {
            # SimpleStrategy with a factor of 1 suits a single local node only
            'replication': {
                'strategy_class': 'SimpleStrategy',
                'replication_factor': 1
            },
            'connection': {
                # The driver expects a ConsistencyLevel constant, not the string 'ONE'
                'consistency': ConsistencyLevel.ONE,
                'retry_connect': True,
                'idle_heartbeat_interval': 30,
                'idle_heartbeat_timeout': 10,
            }
        }
    }
}

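# Route model operations between the SQLite and Cassandra connections.
# Verify that this dotted path exists in your installed
# django-cassandra-engine version before relying on it.
DATABASE_ROUTERS = ['django_cassandra_engine.routers.CassandraRouter']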

Step 4: Create Cassandra Models

Create myapp/models.py:

import uuid

# Column types come from the driver's cqlengine package
from cassandra.cqlengine import columns
from django_cassandra_engine.models import DjangoCassandraModel

class UserActivity(DjangoCassandraModel):
    """
    Store user activity logs in Cassandra
    Perfect for time-series data
    """
    user_id = columns.UUID(primary_key=True, default=uuid.uuid4)
    timestamp = columns.DateTime(primary_key=True, clustering_order="DESC")
    activity_type = columns.Text()
    description = columns.Text()
    metadata = columns.Map(columns.Text, columns.Text)

    class Meta:
        get_pk_field = 'user_id'

class SensorData(DjangoCassandraModel):
    """
    IoT sensor data example
    """
    sensor_id = columns.Text(primary_key=True)
    recorded_at = columns.DateTime(primary_key=True, clustering_order="DESC")
    temperature = columns.Float()
    humidity = columns.Float()
    location = columns.Text()

    class Meta:
        get_pk_field = 'sensor_id'

class ChatMessage(DjangoCassandraModel):
    """
    Chat/messaging system
    """
    room_id = columns.UUID(primary_key=True)
    message_id = columns.TimeUUID(primary_key=True, default=uuid.uuid1, clustering_order="DESC")
    user_id = columns.UUID()
    username = columns.Text()
    message = columns.Text()
    timestamp = columns.DateTime()

    class Meta:
        get_pk_field = 'room_id'
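To make the two `primary_key=True` columns more concrete, here is a rough pure-Python model of how rows for a model like `UserActivity` are organised: the first primary key column selects a partition, and the second orders rows inside that partition, newest first. The `PartitionedTable` class is purely illustrative and is not part of Cassandra, the driver, or django-cassandra-engine.

```python
from collections import defaultdict
from datetime import datetime


class PartitionedTable:
    """Toy model: partition key -> {clustering key -> row}."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, row):
        # Writing the same partition key + clustering key again overwrites
        # the row, which mirrors Cassandra's upsert behaviour.
        self._partitions[partition_key][clustering_key] = row

    def read(self, partition_key, limit=100):
        # One partition lookup, then rows in descending clustering order,
        # similar in spirit to clustering_order="DESC" plus .limit(n).
        rows = self._partitions.get(partition_key, {})
        newest_first = sorted(rows, reverse=True)
        return [rows[key] for key in newest_first[:limit]]


table = PartitionedTable()
table.insert("user-1", datetime(2024, 1, 1, 9, 0), "login")
table.insert("user-1", datetime(2024, 1, 1, 9, 5), "view_page")
table.insert("user-2", datetime(2024, 1, 1, 9, 1), "login")
latest_for_user_1 = table.read("user-1", limit=1)
```

Reading by `user_id` touches a single partition and returns the most recent rows first, which is why the models above put the "who" in the first key column and the "when" in the second.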

Step 5: Sync Models with Cassandra

# Create keyspace and tables
python manage.py sync_cassandra

# Verify in CQL shell
cqlsh

-- The statements below are typed at the cqlsh prompt, not in your shell
-- Check keyspace
USE myapp_keyspace;

-- Show tables
DESCRIBE TABLES;

-- Check table structure
DESCRIBE TABLE user_activity;

Step 6: Create Views to Use Cassandra Models

Create myapp/views.py:

import json
import uuid
from datetime import datetime, timezone

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

from .models import UserActivity, SensorData, ChatMessage


def utc_now():
    """Timezone-aware UTC timestamp (a naive local time would be stored as if it were UTC)"""
    return datetime.now(timezone.utc)


@csrf_exempt  # Fine for a local demo; protect these endpoints in a real deployment
@require_POST  # Other methods get a 405 instead of the view returning None
def log_activity(request):
    """Log user activity"""
    data = json.loads(request.body)

    activity = UserActivity.create(
        user_id=uuid.UUID(data['user_id']),
        timestamp=utc_now(),
        activity_type=data['activity_type'],
        description=data['description'],
        metadata=data.get('metadata', {})
    )

    return JsonResponse({
        'status': 'success',
        'activity_id': str(activity.user_id)
    })

def get_user_activities(request, user_id):
    """Get the 100 most recent activities for a user"""
    activities = UserActivity.objects.filter(
        user_id=uuid.UUID(user_id)
    ).limit(100)

    results = [{
        'user_id': str(a.user_id),
        'timestamp': a.timestamp.isoformat(),
        'activity_type': a.activity_type,
        'description': a.description,
        'metadata': a.metadata
    } for a in activities]

    return JsonResponse({'activities': results})

@csrf_exempt
@require_POST
def save_sensor_data(request):
    """Save IoT sensor data"""
    data = json.loads(request.body)

    SensorData.create(
        sensor_id=data['sensor_id'],
        recorded_at=utc_now(),
        temperature=data['temperature'],
        humidity=data['humidity'],
        location=data['location']
    )

    return JsonResponse({'status': 'success'})

def get_sensor_readings(request, sensor_id):
    """Get recent sensor readings"""
    readings = SensorData.objects.filter(
        sensor_id=sensor_id
    ).limit(50)

    results = [{
        'sensor_id': r.sensor_id,
        'recorded_at': r.recorded_at.isoformat(),
        'temperature': r.temperature,
        'humidity': r.humidity,
        'location': r.location
    } for r in readings]

    return JsonResponse({'readings': results})

@csrf_exempt
@require_POST
def send_message(request):
    """Send chat message"""
    data = json.loads(request.body)

    ChatMessage.create(
        room_id=uuid.UUID(data['room_id']),
        message_id=uuid.uuid1(),
        user_id=uuid.UUID(data['user_id']),
        username=data['username'],
        message=data['message'],
        timestamp=utc_now()
    )

    return JsonResponse({'status': 'success'})

def get_room_messages(request, room_id):
    """Get messages from a chat room"""
    messages = ChatMessage.objects.filter(
        room_id=uuid.UUID(room_id)
    ).limit(100)

    results = [{
        'message_id': str(m.message_id),
        'user_id': str(m.user_id),
        'username': m.username,
        'message': m.message,
        'timestamp': m.timestamp.isoformat()
    } for m in messages]

    return JsonResponse({'messages': results})
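The views above trust the request body completely, so a malformed UUID or a missing field surfaces as a 500 error. One possible way to tighten this is a small validation helper that runs before anything is written to Cassandra. This is a minimal sketch using only the standard library; the function name `parse_activity_payload` and its error messages are assumptions made for illustration, not part of Django or django-cassandra-engine.

```python
import json
import uuid

REQUIRED_FIELDS = ("user_id", "activity_type", "description")


def parse_activity_payload(body):
    """Validate a raw request body and return cleaned fields.

    Raises ValueError with a readable message on any problem.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")

    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    try:
        user_id = uuid.UUID(str(data["user_id"]))
    except ValueError as exc:
        raise ValueError("user_id is not a valid UUID") from exc

    metadata = data.get("metadata") or {}
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a JSON object")

    return {
        "user_id": user_id,
        "activity_type": str(data["activity_type"]),
        "description": str(data["description"]),
        # The model declares metadata as a text-to-text map, so coerce both sides.
        "metadata": {str(k): str(v) for k, v in metadata.items()},
    }
```

In `log_activity`, a call such as `fields = parse_activity_payload(request.body)` could be wrapped in `try/except ValueError` and answered with `JsonResponse({'error': str(exc)}, status=400)`.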

Step 7: Configure URLs

Create myapp/urls.py:

from django.urls import path
from . import views

urlpatterns = [
    # User activity endpoints
    path('activity/log/', views.log_activity),
    path('activity/<str:user_id>/', views.get_user_activities),

    # Sensor data endpoints
    path('sensor/save/', views.save_sensor_data),
    path('sensor/<str:sensor_id>/', views.get_sensor_readings),

    # Chat endpoints
    path('chat/send/', views.send_message),
    path('chat/room/<str:room_id>/', views.get_room_messages),
]

Update myproject/urls.py:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('api/', include('myapp.urls')),
]

Step 8: Test Your Setup

# Run Django development server
python manage.py runserver

# Test endpoints using curl
# Log activity
curl -X POST http://localhost:8000/api/activity/log/ \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "550e8400-e29b-41d4-a716-446655440000",
    "activity_type": "login",
    "description": "User logged in",
    "metadata": {"ip": "192.168.1.1"}
  }'

# Get user activities
curl http://localhost:8000/api/activity/550e8400-e29b-41d4-a716-446655440000/

# Save sensor data
curl -X POST http://localhost:8000/api/sensor/save/ \
  -H "Content-Type: application/json" \
  -d '{
    "sensor_id": "SENSOR001",
    "temperature": 25.5,
    "humidity": 60.0,
    "location": "Room 101"
  }'
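If you would rather script these checks in Python than retype curl commands, the same requests can be prepared with the standard library. The sketch below assumes the development server from this step is listening on `localhost:8000`; `BASE_URL`, `build_post`, and `send` are names invented for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api"  # assumption: runserver defaults


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for one endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sensor_request = build_post("/sensor/save/", {
    "sensor_id": "SENSOR001",
    "temperature": 25.5,
    "humidity": 60.0,
    "location": "Room 101",
})
```

With the server running, `send(sensor_request)` performs the call. Keeping request construction separate from sending makes it easy to inspect the URL and body first.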

Part 2: Setup With Docker

Docker makes Cassandra setup cleaner and more reproducible. Let's containerize everything.

Step 1: Create Docker Configuration Files

Create docker-compose.yml in your project root:

version: '3.8'

services:
  cassandra:
    image: cassandra:4.1
    container_name: cassandra_db
    ports:
      # 9042 is the CQL native port; 9160 (Thrift) is omitted because
      # Thrift was removed in Cassandra 4.0
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_RACK=rack1
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_NUM_TOKENS=128
    volumes:
      - cassandra_data:/var/lib/cassandra
    healthcheck:
      test: ["CMD-SHELL", "cqlsh -e 'describe cluster'"]
      interval: 30s
      timeout: 10s
      retries: 5
    networks:
      - django_network

  django:
    build: .
    container_name: django_app
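    # `command` overrides the Dockerfile CMD, so the schema sync must be repeated here
    command: sh -c "python manage.py sync_cassandra && python manage.py runserver 0.0.0.0:8000"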
    volumes:
      - .:/app
    ports:
      - "8000:8000"
    depends_on:
      cassandra:
        condition: service_healthy
    environment:
      - CASSANDRA_HOST=cassandra
      - CASSANDRA_PORT=9042
    networks:
      - django_network

volumes:
  cassandra_data:

networks:
  django_network:
    driver: bridge

Step 2: Create Dockerfile

Create Dockerfile in project root:

FROM python:3.11-slim

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Set work directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    python3-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt /app/
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# Copy project
COPY . /app/

# Sync the Cassandra schema, then start the server
# (a `command:` entry in docker-compose.yml replaces this CMD)
CMD ["sh", "-c", "python manage.py sync_cassandra && python manage.py runserver 0.0.0.0:8000"]

Step 3: Update Django Settings for Docker

Update myproject/settings.py:

import os

from cassandra import ConsistencyLevel

# ... existing settings ...

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    },
    'cassandra': {
        'ENGINE': 'django_cassandra_engine',
        'NAME': 'myapp_keyspace',
        # Inside the compose network the host is the service name "cassandra"
        'HOST': os.environ.get('CASSANDRA_HOST', '127.0.0.1'),
        'PORT': int(os.environ.get('CASSANDRA_PORT', 9042)),
        'OPTIONS': {
            'replication': {
                'strategy_class': 'SimpleStrategy',
                'replication_factor': 1
            },
            'connection': {
                # The driver expects a ConsistencyLevel constant, not the string 'ONE'
                'consistency': ConsistencyLevel.ONE,
                'retry_connect': True,
                'idle_heartbeat_interval': 30,
                'idle_heartbeat_timeout': 10,
            }
        }
    }
}

Step 4: Create .dockerignore

Create .dockerignore:

__pycache__
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
.env
.git
.gitignore
*.sqlite3
*.log

Step 5: Build and Run with Docker

# Build and start all services
docker-compose up --build

# In another terminal, check Cassandra is ready
docker exec -it cassandra_db cqlsh

-- At the cqlsh prompt: verify keyspace and tables
USE myapp_keyspace;
DESCRIBE TABLES;

# Back in your shell: check Django logs
docker-compose logs -f django

# Access your Django app
# The routes live under /api/, for example http://localhost:8000/api/sensor/SENSOR001/

Step 6: Useful Docker Commands

# Start services
docker-compose up -d

# Stop services
docker-compose down

# Stop and remove volumes (clean slate)
docker-compose down -v

# View logs
docker-compose logs -f
docker-compose logs cassandra
docker-compose logs django

# Execute commands in containers
docker exec -it cassandra_db cqlsh
docker exec -it django_app python manage.py shell

# Restart specific service
docker-compose restart django

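# Scaling Cassandra (multi-node cluster)
# `docker-compose up -d --scale cassandra=3` does not work with the file above:
# container_name and the fixed host port 9042 must be unique per container.
# A multi-node cluster needs one service per node, with CASSANDRA_SEEDS
# pointing the extra nodes at the first one.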

Step 7: Production Docker Setup

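For production, create docker-compose.prod.yml. This file assumes that gunicorn is listed in requirements.txt and that an nginx.conf exists in the project root; neither is created by the earlier steps: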

version: '3.8'

services:
  cassandra:
    image: cassandra:4.1
    container_name: cassandra_prod
    ports:
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=ProdCluster
      - MAX_HEAP_SIZE=4G
      - HEAP_NEWSIZE=800M
    volumes:
      - cassandra_prod_data:/var/lib/cassandra
    deploy:
      resources:
        limits:
          memory: 8G
          cpus: '4'
    networks:
      - prod_network

  django:
    build: .
    container_name: django_prod
    command: gunicorn myproject.wsgi:application --bind 0.0.0.0:8000 --workers 4
    volumes:
      - static_volume:/app/staticfiles
    expose:
      - 8000
    depends_on:
      - cassandra
    environment:
      - DEBUG=False
      - CASSANDRA_HOST=cassandra
    networks:
      - prod_network

  nginx:
    image: nginx:alpine
    container_name: nginx_prod
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - static_volume:/app/staticfiles
    ports:
      - "80:80"
    depends_on:
      - django
    networks:
      - prod_network

volumes:
  cassandra_prod_data:
  static_volume:

networks:
  prod_network:
    driver: bridge
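The production file sets `DEBUG=False` in the environment, but the settings shown earlier never read that variable, and a naive `bool(os.environ['DEBUG'])` would be `True` for the string `"False"`. A minimal sketch of a safer reader follows; the helper name `env_bool` and the accepted spellings are assumptions chosen for this example.

```python
import os

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_bool(name, default=False, environ=None):
    """Read a boolean flag from environment variables.

    `environ` defaults to os.environ; passing a dict makes the helper testable.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be a boolean flag, got {raw!r}")
```

In settings.py this could be used as `DEBUG = env_bool('DEBUG', default=False)`, so that forgetting the variable in production fails safe.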

Comparison: Docker vs Without Docker

Without Docker

Pros:

  • ✅ Direct control over installation
  • ✅ Easier debugging initially
  • ✅ No containerization overhead
  • ✅ Good for learning Cassandra

Cons:

  • ❌ Manual installation and configuration
  • ❌ Environment inconsistencies
  • ❌ Harder to replicate on other machines
  • ❌ Cleanup is messy

With Docker

Pros:

  • ✅ Consistent environment everywhere
  • ✅ Easy setup and teardown
  • ✅ Perfect for team collaboration
  • ✅ Production-ready configuration
  • ✅ Easy to scale

Cons:

  • ❌ Learning curve for Docker
  • ❌ Resource overhead
  • ❌ Network complexity

Best Practices

1. Connection Pooling

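# In settings.py (requires: from cassandra import ConsistencyLevel)
'OPTIONS': {
    'connection': {
        'consistency': ConsistencyLevel.ONE,
        # With protocol v3+ the driver multiplexes many concurrent requests
        # over one connection per host, so no pool size is set here.
        # cassandra.cluster.Cluster() has no 'connections_per_host' argument;
        # extra keys in this dict are expected to be forwarded to Cluster(),
        # which rejects unknown arguments.
        'protocol_version': 4,
    }
}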

2. Proper Data Modeling

# Good: Partition by natural key
class UserActivity(DjangoCassandraModel):
    user_id = columns.UUID(primary_key=True)
    timestamp = columns.DateTime(primary_key=True, clustering_order="DESC")

# Bad: a random UUID as the only key
class BadModel(DjangoCassandraModel):
    id = columns.UUID(primary_key=True)  # One row per partition: related rows can only be found by a full scan

3. Query Optimization

# Good: Query by partition key
activities = UserActivity.objects.filter(user_id=user_id).limit(100)

# Bad: Full table scan
all_activities = UserActivity.objects.all()  # NEVER do this!
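When a user has more rows than fit in one response, a common approach is to page within the partition by restricting the clustering column rather than widening the query. The helper below only builds the keyword arguments for such a filter; `page_filter` is a hypothetical name, and it relies on cqlengine's `__lt` filter suffix and on the `UserActivity` model defined earlier.

```python
import uuid
from datetime import datetime, timezone


def page_filter(user_id, before=None):
    """Build filter kwargs for one page of a user's activity, newest first."""
    kwargs = {"user_id": user_id}
    if before is not None:
        # A range on the clustering column stays inside the user's partition.
        kwargs["timestamp__lt"] = before
    return kwargs


example_user = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
first_page = page_filter(example_user)
second_page = page_filter(
    example_user,
    before=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

A view could then run `UserActivity.objects.filter(**page_filter(user_id, before=cursor)).limit(100)` and hand the timestamp of the last returned row back to the client as the next cursor.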

4. Error Handling

from cassandra.cluster import NoHostAvailable

try:
    # get() must match exactly one row, so pass the full primary key
    # (`timestamp` stands for a value taken from the request)
    activity = UserActivity.objects.get(user_id=user_id, timestamp=timestamp)
except UserActivity.DoesNotExist:
    return JsonResponse({'error': 'Not found'}, status=404)
except NoHostAvailable:
    return JsonResponse({'error': 'Database unavailable'}, status=503)
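For transient failures such as a node restarting, it can help to retry a few times before answering with a 503. The following is a generic sketch with exponential backoff, not an API of the Cassandra driver; `call_with_retry` and its parameters are invented for this example, and in a view you would pass `NoHostAvailable` as `retry_on`.

```python
import time


def call_with_retry(operation, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a retry_on exception, wait and try again.

    Delays double each time: base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised so the caller can still return a 503.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. Retries suit idempotent reads best; repeating a write is generally safe only when the full primary key is supplied by the caller.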

Troubleshooting

Issue: Cassandra won't start

# Check logs
docker-compose logs cassandra

# Increase memory in docker-compose.yml
environment:
  - MAX_HEAP_SIZE=2G

Issue: Django can't connect

# Add connection retry logic
'OPTIONS': {
    'connection': {
        'retry_connect': True,
        'connect_timeout': 30,
    }
}
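Connection failures at startup often mean that Django came up before Cassandra finished booting, which can take a minute or more. One way to check this from the Django side is to wait for the CQL port before running `sync_cassandra`. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and an open TCP port is only a rough signal that Cassandra is accepting CQL connections.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if the port is still unreachable after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A startup script could call `wait_for_port(os.environ.get('CASSANDRA_HOST', '127.0.0.1'), 9042)` and exit with an error when it returns `False`.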

Issue: Slow queries

-- Check with TRACING in cqlsh
TRACING ON;
SELECT * FROM user_activity WHERE user_id = ...;
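Tracing shows what happens inside Cassandra; to find out which Django code paths are slow in the first place, a small timing wrapper can log any query that exceeds a threshold. This is a minimal sketch: the logger name `myapp.slow_queries`, the helper `log_if_slow`, and the 100 ms default are all assumptions.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("myapp.slow_queries")  # hypothetical logger name


@contextmanager
def log_if_slow(label, threshold_ms=100.0):
    """Log a warning when the wrapped block takes at least threshold_ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms >= threshold_ms:
            logger.warning("%s took %.1f ms", label, elapsed_ms)
```

Querysets are evaluated lazily, so the rows have to be consumed inside the block for the timing to mean anything, for example `with log_if_slow('user activities'): rows = list(UserActivity.objects.filter(user_id=user_id).limit(100))`.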

Conclusion

You now have two ways to integrate Cassandra with Django:

  1. Without Docker: Better for learning and local development
  2. With Docker: Better for teams and production

Choose based on your needs. Start simple, scale when necessary.

Remember: Cassandra is powerful but complex. Use it for the right use cases—high-write scenarios, time-series data, and distributed systems.


Questions about Cassandra + Django? Drop them in the comments!
