Cassandra is a NoSQL, column-family-based database. It is generally recommended for use cases that need fast writes. Because each Cassandra node needs a stable network identity and its own persistent storage, Cassandra is deployed on Kubernetes as a StatefulSet, unlike stateless applications, which are deployed as Deployments.
Bitnami Cassandra Image
Using Bitnami's images has several benefits; see their GitHub repository for details.
The Bitnami Cassandra image lets us override a few of the settings in cassandra.yaml by passing values as environment variables.
For example, when we pass the environment variable CASSANDRA_CLUSTER_NAME to the container, its value is written to the cluster_name field in cassandra.yaml:
```yaml
# cassandra.yaml
# ...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'Dev Cassandra Cluster'
# ...
```
The container's command runs a shell script. When it runs, this script generates a cassandra.yaml file from the default configuration plus any parameters provided to the container, places the file in the appropriate location, and, as its last step, starts the Cassandra process.
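The generate-then-start flow can be sketched as follows. This is a hypothetical illustration, not Bitnami's script: the function names generate_config and start_node, the single cluster_name key, and its fallback value 'Example Cluster' are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of the startup pattern; this is NOT Bitnami's script.
# Function names, the single key and the fallback value are assumptions.

# Steps 1-2: write the config file from a default plus any value passed in
# through an environment variable.
generate_config() {
  local conf_file="$1"
  cat > "$conf_file" <<EOF
cluster_name: '${CASSANDRA_CLUSTER_NAME:-Example Cluster}'
EOF
}

# Step 3: once the file is in place, replace this shell with the server
# process given as the remaining arguments.
start_node() {
  local conf_file="$1"
  shift
  generate_config "$conf_file"
  exec "$@"
}

# Usage (hypothetical paths):
# start_node /path/to/cassandra.yaml /path/to/cassandra -f
```

Using exec for the final step is a common choice in container entrypoints: the server replaces the shell, so it typically becomes the container's main process and receives stop signals directly.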
Just as with cluster_name, many other cassandra.yaml settings can be set through environment variables. For the full list, see the GitHub page of the Bitnami Cassandra image.
Need for further customization
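The Bitnami image supports custom values for only some of the fields in cassandra.yaml. While working on a proof of concept (POC), we ran into queries failing because they hit the tombstone failure threshold. A tombstone is a marker Cassandra writes when data is deleted (or expires through a TTL); because reads have to scan past tombstones, Cassandra logs a warning when a query reads more than tombstone_warn_threshold of them and aborts the query when it reads more than tombstone_failure_threshold.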
The default tombstone failure threshold is 100000, but for our POC use case, exceeding it was not a concern. So we wanted a way to raise this setting on the Bitnami Cassandra image we were using. The closest solution we found was to supply an entire cassandra.yaml file to the image. This let us override any setting we wanted, and it worked on a simple single-instance cluster. However, when we scaled the cluster to 3 instances, the custom configuration file prevented the new instances from joining the cluster because of the seed node address setting it contained.
Bitnami image code on GitHub
The rootfs/opt/bitnami/scripts folder contains a few shell scripts, including:
libcassandra.sh -> sets up cassandra.yaml by reading the configuration from environment variables.
cassandra-env.sh -> included in libcassandra.sh; it makes the environment variables available by exporting them with export.
How to customize the Bitnami image to override configuration through environment variables
In our case, we wanted to override the tombstone thresholds in cassandra.yaml. To do this, we added two new variables, one for the warn threshold and one for the failure threshold, near the bottom of cassandra-env.sh:
```shell
#!/bin/bash
#
# Environment configuration for cassandra

# The values for all environment variables will be set in the below order of precedence
# 1. Custom environment variables defined below after Bitnami defaults
# 2. Constants defined in this file (environment variables with no default), i.e. BITNAMI_ROOT_DIR
# 3. Environment variables overridden via external files using *_FILE variables (see below)
# 4. Environment variables set externally (i.e. current Bash context/Dockerfile/userdata)

# Load logging library
# shellcheck disable=SC1090,SC1091
. /opt/bitnami/scripts/liblog.sh

export BITNAMI_ROOT_DIR="/opt/bitnami"
export BITNAMI_VOLUME_DIR="/bitnami"
# ...

# Custom environment variables may be defined below
export CASSANDRA_TOMBSTONE_WARN_THRESHOLD="${CASSANDRA_TOMBSTONE_WARN_THRESHOLD:-1000}"
export CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD="${CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD:-100000}"
```
This code exports each variable with the value passed in through the environment, or falls back to a default if none was passed: 100000 for the failure threshold and 1000 for the warn threshold.
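The behaviour of the `${VAR:-default}` expansion used in those two export lines can be checked with a small standalone script. load_thresholds is a hypothetical wrapper around the same two lines; note that `:-` falls back to the default both when the variable is unset and when it is set to an empty string.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two export lines from cassandra-env.sh.
load_thresholds() {
  export CASSANDRA_TOMBSTONE_WARN_THRESHOLD="${CASSANDRA_TOMBSTONE_WARN_THRESHOLD:-1000}"
  export CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD="${CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD:-100000}"
}

# Case 1: nothing passed in, so both defaults apply.
unset CASSANDRA_TOMBSTONE_WARN_THRESHOLD CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD
load_thresholds
echo "default: warn=$CASSANDRA_TOMBSTONE_WARN_THRESHOLD failure=$CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"
# prints: default: warn=1000 failure=100000

# Case 2: a value is passed in (as docker -e or a Kubernetes env entry would do), so it wins.
CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD=200000
load_thresholds
echo "override: failure=$CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"
# prints: override: failure=200000

# Case 3: set but empty, so ':-' still falls back to the default.
CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD=""
load_thresholds
echo "empty: failure=$CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"
# prints: empty: failure=100000
```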
In libcassandra.sh, the function cassandra_setup_cluster() sets up cassandra.yaml, the file that holds all the configuration for the instance.
For each setting in cassandra.yaml that needs to be customized or overridden, this function calls cassandra_yaml_set, for example:
```shell
cassandra_yaml_set "listen_address" "$host"
```
So we can call the same function from cassandra_setup_cluster() to set the tombstone thresholds:
```shell
# Added inside cassandra_setup_cluster() in libcassandra.sh
cassandra_yaml_set "tombstone_warn_threshold" "$CASSANDRA_TOMBSTONE_WARN_THRESHOLD"
debug "setting tombstone_warn_threshold to $CASSANDRA_TOMBSTONE_WARN_THRESHOLD"
cassandra_yaml_set "tombstone_failure_threshold" "$CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"
debug "setting tombstone_failure_threshold to $CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD"
```
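To see what these calls do to the file, here is a self-contained sketch that runs the same two updates against a throwaway file. yaml_set is a hypothetical, simplified stand-in for cassandra_yaml_set (the real function in libcassandra.sh may handle quoting, indentation and the file location differently), and apply_tombstone_thresholds is a hypothetical name for the lines added to cassandra_setup_cluster(). GNU sed is assumed.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL simplified stand-in for Bitnami's cassandra_yaml_set (not its real code):
# rewrites a top-level "key: value" line, commented out or not, in place.
# Assumes GNU sed (BSD/macOS sed needs an explicit suffix argument after -i).
yaml_set() {
  local key="$1" value="$2" file="$3"
  sed -i -E "s|^(#[[:space:]]*)?${key}:.*|${key}: ${value}|" "$file"
}

# Mirrors the two calls added to cassandra_setup_cluster(), with the
# cassandra-env.sh defaults applied when the variables are unset or empty.
apply_tombstone_thresholds() {
  local file="$1"
  yaml_set "tombstone_warn_threshold" "${CASSANDRA_TOMBSTONE_WARN_THRESHOLD:-1000}" "$file"
  yaml_set "tombstone_failure_threshold" "${CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD:-100000}" "$file"
}

# Demo on a throwaway file instead of the real cassandra.yaml.
conf="$(mktemp)"
printf '%s\n' 'cluster_name: cassandra' \
  '# tombstone_warn_threshold: 1000' \
  'tombstone_failure_threshold: 100000' > "$conf"

# Only the failure threshold is "passed in"; the warn threshold uses its default.
unset CASSANDRA_TOMBSTONE_WARN_THRESHOLD
CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD=200000 apply_tombstone_thresholds "$conf"
cat "$conf"
```

A sed-based edit like this only rewrites a key that already appears in the file (commented out or not); it does not append keys that are missing.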
Using the image's Dockerfile, we can now build the customized Cassandra image and push it to an image repository.
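The build-and-push step can be wrapped in a small script. This is a sketch under assumptions: build_and_push is a hypothetical helper, and the registry name registry.example.com and the build context directory are placeholders for your own values. Setting DOCKER=echo gives a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the customized image and push it to a registry.
# DOCKER defaults to the docker CLI; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

build_and_push() {
  local image="$1"
  # Assumed: the directory holding the Dockerfile and the modified rootfs/ folder.
  local context="${2:-.}"
  # Push only if the build succeeded.
  "$DOCKER" build -t "$image" "$context" && "$DOCKER" push "$image"
}

# Example (placeholder registry and path):
# build_and_push registry.example.com/my-custom-cassandra:latest ./path/to/bitnami/cassandra
```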
When this image is run with the Docker CLI or in a Kubernetes pod, we can override the default tombstone thresholds by passing the corresponding environment variables with our custom values:
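```shell
# -e options must come before the image name; anything after it is passed to the container's command
docker run --name cassandra -e CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD=200000 mycustomcassandra:latest
```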
Or, in a Kubernetes StatefulSet definition (note the last two entries under env):
```yaml
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: cassandra
  namespace: cassandra
spec:
  # Headless service that gives each pod a stable DNS name, such as the seed address below
  serviceName: cassandra-headless
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: cassandra
      app.kubernetes.io/name: cassandra
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: cassandra
        app.kubernetes.io/name: cassandra
    spec:
      containers:
        - name: cassandra
          image: my-custom-cassandra:latest
          command:
            - bash
            - '-ec'
            - |
              # Node 0 is the password seeder
              if [[ $POD_NAME =~ (.*)-0$ ]]; then
                echo "Setting node as password seeder"
                export CASSANDRA_PASSWORD_SEEDER=yes
              else
                # Only node 0 will execute the startup initdb scripts
                export CASSANDRA_IGNORE_INITDB_SCRIPTS=1
              fi
              /opt/bitnami/scripts/cassandra/entrypoint.sh /opt/bitnami/scripts/cassandra/run.sh
          ports:
            - name: intra
              containerPort: 7000
              protocol: TCP
            - name: tls
              containerPort: 7001
              protocol: TCP
            - name: jmx
              containerPort: 7199
              protocol: TCP
            - name: cql
              containerPort: 9042
              protocol: TCP
            - name: thrift
              containerPort: 9160
              protocol: TCP
          env:
            - name: BITNAMI_DEBUG
              value: 'false'
            - name: CASSANDRA_CLUSTER_NAME
              value: cassandra
            - name: CASSANDRA_SEEDS
              value: cassandra-0.cassandra-headless.cassandra.svc.cluster.local
            - name: CASSANDRA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cassandra
                  key: cassandra-password
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CASSANDRA_USER
              value: cassandra
            - name: CASSANDRA_NUM_TOKENS
              value: '256'
            - name: CASSANDRA_DATACENTER
              value: dc1
            - name: CASSANDRA_ENDPOINT_SNITCH
              value: SimpleSnitch
            - name: CASSANDRA_KEYSTORE_LOCATION
              value: /opt/bitnami/cassandra/certs/keystore
            - name: CASSANDRA_TRUSTSTORE_LOCATION
              value: /opt/bitnami/cassandra/certs/truststore
            - name: CASSANDRA_CLIENT_ENCRYPTION
              value: 'true'
            - name: CASSANDRA_TRUSTSTORE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cassandra-tls-pass
                  key: truststore-password
            - name: CASSANDRA_KEYSTORE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cassandra-tls-pass
                  key: keystore-password
            - name: CASSANDRA_RACK
              value: rack1
            - name: CASSANDRA_ENABLE_RPC
              value: 'true'
            - name: CASSANDRA_TRANSPORT_PORT_NUMBER
              value: '7000'
            - name: CASSANDRA_JMX_PORT_NUMBER
              value: '7199'
            - name: CASSANDRA_CQL_PORT_NUMBER
              value: '9042'
            # Custom tombstone thresholds (env values must be strings, hence the quotes)
            - name: CASSANDRA_TOMBSTONE_FAILURE_THRESHOLD
              value: '200000'
            - name: CASSANDRA_TOMBSTONE_WARN_THRESHOLD
              value: '2000'
          resources:
            limits:
              cpu: '3'
              memory: 16Gi
            requests:
              cpu: 1500m
              memory: 8Gi
          volumeMounts:
            - name: data
              mountPath: /bitnami/cassandra
            - name: certs-shared
              mountPath: /opt/bitnami/cassandra/certs
          livenessProbe:
            exec:
              command:
                - /bin/bash
                - '-ec'
                - |
                  nodetool status
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 5
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - '-ec'
                - |
                  nodetool status | grep -E "^UN\\s+${POD_IP}"
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 5
          securityContext:
            runAsUser: 1001
            runAsNonRoot: true
            allowPrivilegeEscalation: false
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: data
        creationTimestamp: null
        labels:
          app.kubernetes.io/instance: cassandra
          app.kubernetes.io/name: cassandra
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Ti
        storageClassName: default
        volumeMode: Filesystem
```
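To confirm that an override actually reached cassandra.yaml inside a running pod, one option is to read the generated file and extract the key. The sketch below is a hypothetical helper (yaml_scalar is not part of the Bitnami image) that handles only simple top-level `key: value` lines; the path /opt/bitnami/cassandra/conf/cassandra.yaml in the usage comment is an assumption about where the image writes its configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level "key: value" line read
# from stdin, without surrounding quotes or trailing spaces. Prints nothing if
# the key is absent. Commented-out lines ("# key: ...") are ignored because
# the key must start the line.
yaml_scalar() {
  local key="$1"
  sed -n -E "s/^${key}:[[:space:]]*['\"]?([^'\"]*[^'\"[:space:]])?['\"]?[[:space:]]*\$/\1/p" | head -n 1
}

# Example against the StatefulSet above (config path is an assumption):
# kubectl exec -n cassandra cassandra-0 -c cassandra -- \
#   cat /opt/bitnami/cassandra/conf/cassandra.yaml | yaml_scalar tombstone_failure_threshold
```

The same pipe works with `docker exec cassandra cat …` for a container started with docker run. With the manifest above, tombstone_failure_threshold should read 200000 if the override was applied.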