DEV Community

Saket

Extend Bitnami Cassandra Image to customize the configuration in cassandra.yaml

Cassandra on Kubernetes

Cassandra is a NoSQL, column-family database. It is generally recommended for use cases that need fast writes. Because it is stateful, Cassandra is deployed on Kubernetes as a StatefulSet, unlike stateless applications, which are deployed as Deployments.

Bitnami Cassandra Image

Bitnami images offer several benefits; their GitHub repo has the details.
The Bitnami image for Cassandra lets us override some of the settings in cassandra.yaml by passing values as environment variables.
For example, when we pass the environment variable CASSANDRA_CLUSTER_NAME to the container, its value is written to the cluster_name field in cassandra.yaml.

#cassandra.yaml
..
...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'Dev Cassandra Cluster'
...
..

The image starts the container with a command that runs a shell script. When it runs, this script generates a cassandra.yaml file from the default configuration plus any parameters provided to the container. It then places the file in the appropriate location and, as its last step, starts the Cassandra process.
As with the cluster_name setting above, several other settings in cassandra.yaml can be updated through environment variables. For the full list of variables, see the GitHub page of the Bitnami Cassandra image.
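
The flow described above can be sketched roughly as follows. This is a simplified illustration and not the Bitnami script itself: the function name render_config, the two default values, and the sed-based substitution are assumptions made for demonstration.

```shell
#!/bin/bash
# Simplified illustration of the startup flow (NOT the actual Bitnami code).
# render_config is a hypothetical helper: it writes a config file with
# defaults, then overrides cluster_name from CASSANDRA_CLUSTER_NAME if set.
render_config() {
    local conf_file="$1"

    # Step 1: write the file with default values
    printf '%s\n' "cluster_name: 'Test Cluster'" "num_tokens: 256" > "$conf_file"

    # Step 2: apply an override passed in through the environment
    if [ -n "${CASSANDRA_CLUSTER_NAME:-}" ]; then
        sed "s|^cluster_name:.*|cluster_name: '${CASSANDRA_CLUSTER_NAME}'|" \
            "$conf_file" > "${conf_file}.tmp" && mv "${conf_file}.tmp" "$conf_file"
    fi

    # Step 3 (omitted): the real script would now start the Cassandra process
}
```

Calling render_config with CASSANDRA_CLUSTER_NAME set rewrites only the cluster_name line and leaves the other defaults untouched.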

Need for further customization

The Bitnami image exposes only some of the fields in cassandra.yaml for customization. While working on a POC, we ran into queries failing because they hit the tombstone failure threshold. (A quick web search will explain what tombstones are in Cassandra.)
The default tombstone failure threshold is 100000, but in our POC use case breaching it was not a concern. So we wanted a way to change this setting on the Bitnami Cassandra image we were using. The closest solution we found was to supply an entire cassandra.yaml file to the image. That let us override any setting we wanted, and it worked on a simple single-instance cluster. But when we scaled the cluster to 3 instances, the new instances could not join the cluster because of the seed node address setting in the custom configuration file.
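
To see which tombstone thresholds a node is actually configured with, one option is to read them from the generated file. The helper below is a minimal sketch: the name show_tombstone_thresholds is hypothetical, and the default path is an assumption about where the image keeps cassandra.yaml.

```shell
# Print the tombstone threshold lines from a cassandra.yaml file.
# Hypothetical helper; the default path is an assumption about the image layout.
show_tombstone_thresholds() {
    local conf_file="${1:-/opt/bitnami/cassandra/conf/cassandra.yaml}"
    grep -E '^tombstone_(warn|failure)_threshold:' "$conf_file"
}
```

Inside a running container this could be invoked without arguments; elsewhere, pass the path to a copy of the file.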

Bitnami Image code on Github

Bitnami Cassandra Github

The rootfs/opt/bitnami/scripts folder contains a few shell scripts.

Bitnami Cassandra Scripts Github

libcassandra.sh -> responsible for setting up cassandra.yaml by reading the configuration from environment variables.
cassandra-env.sh -> this script is included by libcassandra.sh and is responsible for exporting the environment variables, with their defaults, into the environment.

How to customize the Bitnami image to override configuration through environment variables

In our case, we wanted to override the tombstone failure threshold in cassandra.yaml. To achieve this, we added two new variables near the bottom of the cassandra-env.sh file.

#!/bin/bash
#
# Environment configuration for cassandra

# The values for all environment variables will be set in the below order of precedence
# 1. Custom environment variables defined below after Bitnami defaults
# 2. Constants defined in this file (environment variables with no default), i.e. BITNAMI_ROOT_DIR
# 3. Environment variables overridden via external files using *_FILE variables (see below)
# 4. Environment variables set externally (i.e. current Bash context/Dockerfile/userdata)

# Load logging library
# shellcheck disable=SC1090,SC1091
. /opt/bitnami/scripts/liblog.sh

export BITNAMI_ROOT_DIR="/opt/bitnami"
export BITNAMI_VOLUME_DIR="/bitnami"
…
…
..

# Custom environment variables may be defined below
export CASSANDRA_TOMBSTONE_WARN_THRESHOLD="${CASSANDRA_TOMBSTONE_WARN_THRESHOLD:-1000}"
export CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD="${CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD:-100000}"



The code above exports each value as passed in through the environment, or falls back to a default when none is given: 100000 for the failure threshold and 1000 for the warn threshold.
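
The fallback relies on the shell's ${VAR:-default} parameter expansion, which can be tried out on its own:

```shell
# ${VAR:-default} expands to VAR if it is set and non-empty,
# otherwise to the default given after ":-".
unset CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD
echo "${CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD:-100000}"   # prints 100000

CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD=200000
echo "${CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD:-100000}"   # prints 200000
```
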
The libcassandra.sh file has a function called cassandra_setup_cluster(). This function sets up the cassandra.yaml file, which holds all the configuration for the instance.
It calls the function below for each setting in cassandra.yaml that needs to be customized or overridden.

cassandra_yaml_set "listen_address" "$host"

So we can call the same function and set the tombstone related threshold values accordingly.

        cassandra_yaml_set "tombstone_warn_threshold" "$CASSANDRA_TOMBSTONE_WARN_THRESHOLD"
        debug "setting tombstone_warn_threshold to $CASSANDRA_TOMBSTONE_WARN_THRESHOLD"
        cassandra_yaml_set "tombstone_failure_threshold" "$CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"
        debug "setting tombstone_failure_threshold to $CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"       

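To get a feel for what such a setter does, here is a simplified stand-in. It is not the Bitnami implementation of cassandra_yaml_set: the name yaml_set_demo and the sed-based approach are assumptions, and it only handles top-level "key: value" lines.

```shell
# yaml_set_demo: hypothetical, simplified stand-in for cassandra_yaml_set.
# Replaces the value of a top-level "key: value" line in a YAML file.
# Usage: yaml_set_demo <file> <key> <value>
yaml_set_demo() {
    local conf_file="$1"
    local key="$2"
    local value="$3"
    sed "s|^${key}:.*|${key}: ${value}|" "$conf_file" > "${conf_file}.tmp" \
        && mv "${conf_file}.tmp" "$conf_file"
}
```

For example, yaml_set_demo ./cassandra.yaml "tombstone_failure_threshold" "200000" would rewrite only that one line in a local copy of the file.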

With a Dockerfile, we can now build the Cassandra image and push it to an image repository.
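
One possible way to do this, instead of rebuilding from the repository's own Dockerfile, is to layer the two edited scripts on top of an existing image. The sketch below only generates such a Dockerfile. The helper name write_dockerfile, the default base image name, and the target paths (assumed to mirror the repo's rootfs/opt/bitnami/scripts folder) are all assumptions to verify against the image you use.

```shell
# write_dockerfile: hypothetical helper that generates a Dockerfile which
# copies the two modified scripts over the ones shipped in a base image.
# Assumptions: the base image name, and that both scripts live directly
# under /opt/bitnami/scripts inside the image.
# Usage: write_dockerfile <build_dir> [base_image]
write_dockerfile() {
    local build_dir="$1"
    local base_image="${2:-bitnami/cassandra:latest}"
    cat > "${build_dir}/Dockerfile" <<EOF
FROM ${base_image}
COPY libcassandra.sh /opt/bitnami/scripts/libcassandra.sh
COPY cassandra-env.sh /opt/bitnami/scripts/cassandra-env.sh
EOF
}
```

Running write_dockerfile . in a directory that holds the two edited scripts, followed by docker build -t my-custom-cassandra:latest . and a docker push to your registry, would then produce the custom image.
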
When this image is run from the Docker CLI or in a Kubernetes pod, we can override the default tombstone thresholds by passing the environment variables below with our custom values.

docker run --name cassandra -e CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD=200000 my-custom-cassandra:latest

Or in a Kubernetes StatefulSet definition, as follows:
(See the last two env entries passed to the StatefulSet.)

kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: cassandra
  namespace: cassandra
spec:
  serviceName: cassandra-headless
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: cassandra
      app.kubernetes.io/name: cassandra
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: cassandra
        app.kubernetes.io/name: cassandra
    spec:
      containers:
        - name: cassandra
          image: my-custom-cassandra:latest
          command:
            - bash
            - '-ec'
            - >
              # Node 0 is the password seeder

              if [[ $POD_NAME =~ (.*)-0$ ]]; then
                  echo "Setting node as password seeder"
                  export CASSANDRA_PASSWORD_SEEDER=yes
              else
                  # Only node 0 will execute the startup initdb scripts
                  export CASSANDRA_IGNORE_INITDB_SCRIPTS=1
              fi

              /opt/bitnami/scripts/cassandra/entrypoint.sh
              /opt/bitnami/scripts/cassandra/run.sh
          ports:
            - name: intra
              containerPort: 7000
              protocol: TCP
            - name: tls
              containerPort: 7001
              protocol: TCP
            - name: jmx
              containerPort: 7199
              protocol: TCP
            - name: cql
              containerPort: 9042
              protocol: TCP
            - name: thrift
              containerPort: 9160
              protocol: TCP
          env:
            - name: BITNAMI_DEBUG
              value: 'false'
            - name: CASSANDRA_CLUSTER_NAME
              value: cassandra
            - name: CASSANDRA_SEEDS
              value: cassandra-0.cassandra-headless.cassandra.svc.cluster.local
            - name: CASSANDRA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cassandra
                  key: cassandra-password
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CASSANDRA_USER
              value: cassandra
            - name: CASSANDRA_NUM_TOKENS
              value: '256'
            - name: CASSANDRA_DATACENTER
              value: dc1
            - name: CASSANDRA_ENDPOINT_SNITCH
              value: SimpleSnitch
            - name: CASSANDRA_KEYSTORE_LOCATION
              value: /opt/bitnami/cassandra/certs/keystore
            - name: CASSANDRA_TRUSTSTORE_LOCATION
              value: /opt/bitnami/cassandra/certs/truststore
            - name: CASSANDRA_CLIENT_ENCRYPTION
              value: 'true'
            - name: CASSANDRA_TRUSTSTORE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cassandra-tls-pass
                  key: truststore-password
            - name: CASSANDRA_KEYSTORE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cassandra-tls-pass
                  key: keystore-password
            - name: CASSANDRA_RACK
              value: rack1
            - name: CASSANDRA_ENABLE_RPC
              value: 'true'
            - name: CASSANDRA_TRANSPORT_PORT_NUMBER
              value: '7000'
            - name: CASSANDRA_JMX_PORT_NUMBER
              value: '7199'
            - name: CASSANDRA_CQL_PORT_NUMBER
              value: '9042'
            - name: CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD
              value: '200000'
            - name: CASSANDRA_TOMBSTONE_WARN_THRESHOLD
              value: '2000'
          resources:
            limits:
              cpu: '3'
              memory: 16Gi
            requests:
              cpu: 1500m
              memory: 8Gi
          volumeMounts:
            - name: data
              mountPath: /bitnami/cassandra
            - name: certs-shared
              mountPath: /opt/bitnami/cassandra/certs
          livenessProbe:
            exec:
              command:
                - /bin/bash
                - '-ec'
                - |
                  nodetool status
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 5
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - '-ec'
                - |
                  nodetool status | grep -E "^UN\\s+${POD_IP}"
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 5
          securityContext:
            runAsUser: 1001
            runAsNonRoot: true
            allowPrivilegeEscalation: false
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: data
        creationTimestamp: null
        labels:
          app.kubernetes.io/instance: cassandra
          app.kubernetes.io/name: cassandra
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Ti
        storageClassName: default
        volumeMode: Filesystem
