Execute startup scripts in a Cassandra Docker container

#docker #cassandra #nosql #programming

Recently I started learning about Cassandra. As with all my learning journeys, it started with a small-scale application on my local machine that I could tinker with as my inner feedback loop. I chose a docker-compose file to start the Cassandra instance for me whenever I wanted.

One of the things I wanted was to automatically create and provision the keyspace in the Cassandra container as soon as it started. I found two ways to do that.

Using `docker-entrypoint-initdb.d`:

If you use the bitnami/cassandra image, any script files (.sh, .cql or .cql.gz) in the /docker-entrypoint-initdb.d directory are executed at startup. That makes it fairly easy. Assuming the startup scripts are in ./init-scripts/cassandra, the docker-compose file looks like this:

```yaml
services:
  cassandra:
    image: bitnami/cassandra:4.0.7
    ports:
      - "7000:7000"
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=test
    volumes:
      - "./init-scripts/cassandra:/docker-entrypoint-initdb.d"
```
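The post doesn't show what goes into ./init-scripts/cassandra. Here is a minimal sketch that assumes a single .cql file. The file name `create-keyspace.cql` is hypothetical, and the file creates the same keyspace used later in this post. You could prepare the directory like this:

```bash
#!/usr/bin/env bash
# Sketch: create the CQL startup script that the compose file mounts into
# /docker-entrypoint-initdb.d. The file name create-keyspace.cql is hypothetical.
set -eu

INIT_DIR="./init-scripts/cassandra"
mkdir -p "$INIT_DIR"

# The quoted 'EOF' stops the shell from expanding anything inside the CQL.
cat > "$INIT_DIR/create-keyspace.cql" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS spring_cassandra
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
EOF

echo "Wrote $INIT_DIR/create-keyspace.cql"
```

Using `IF NOT EXISTS` keeps the statement harmless if it ever runs against a keyspace that already exists.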

However, I couldn't get the bitnami image to work without issues. The alternative was the official cassandra image, but it doesn't support a docker-entrypoint-initdb.d directory the way the bitnami image does. So the other solution was to use an init container.

Using init-containers:

The idea of an init container is simple. Alongside the actual container, a second, short-lived container starts up, runs the startup scripts against the actual container over the network, and then exits quietly.

```yaml
services:
  cassandra:
    image: cassandra:4.1.0
    ports:
      - "7000:7000"
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=test
  init-cassandra:
    image: cassandra:4.1.0 # same image, so cqlsh is available inside the init container
    depends_on:
      - cassandra # IMPORTANT: this init container only starts after the cassandra container has started (not necessarily after Cassandra is ready)
    restart: "no" # IMPORTANT: restart must be "no" so the init container runs only once and is not restarted after it exits
    entrypoint: ["/init.sh"] # run the init script instead of starting Cassandra
    volumes:
      - ./cassandra-init-data.sh:/init.sh # the init script is mounted into the container as a volume
```

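The init script (./cassandra-init-data.sh in this case) looks like this. Because the script is mounted as the init container's entrypoint, it must be executable on the host (`chmod +x ./cassandra-init-data.sh`).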

```bash
#!/usr/bin/env bash

# Wait until something accepts TCP connections on cassandra:9042
until printf "" 2>>/dev/null >>/dev/tcp/cassandra/9042; do
    echo "Waiting for cassandra..."
    sleep 5
done

echo "Creating keyspace"
cqlsh cassandra -u cassandra -p cassandra -e "CREATE KEYSPACE IF NOT EXISTS spring_cassandra WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
```

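This script waits until port 9042 (Cassandra's CQL native transport port) on the cassandra container accepts connections, and then creates the keyspace. The wait is necessary because `depends_on` only guarantees that the cassandra container has started. It does not guarantee that Cassandra is ready to accept connections.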
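A possible variation, not what the script above does, is to skip the port probe and retry the cqlsh command itself until it succeeds, with an upper bound on attempts. The `retry` helper below is a hypothetical name used for illustration, and the whole thing is a minimal sketch:

```bash
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# "retry" is a hypothetical helper, not part of cqlsh or the cassandra image.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Run the command exactly as given; stop as soon as it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

In the init script, the port loop could then become something like `retry 30 5 cqlsh cassandra -u cassandra -p cassandra -e "DESCRIBE KEYSPACES"`, placed before the CREATE KEYSPACE call. The values of 30 attempts and 5 seconds are arbitrary. Retrying the real command checks what the script actually needs, a working CQL session. The attempt limit also keeps the init container from waiting forever if Cassandra never comes up.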

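Note: `printf "" 2>>/dev/null >>/dev/tcp/cassandra/9042` relies on bash's `/dev/tcp/HOST/PORT` redirection, which opens a TCP connection to that host and port. The command succeeds only once something is listening on cassandra:9042. Until then it fails, and `2>>/dev/null` discards the error message. Check this link for more details.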
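The loop in the init script waits forever if Cassandra never starts. Here is a minimal sketch that assumes bash is available, which it already is as the script's interpreter. It wraps the same `/dev/tcp` check in a function with an attempt limit. `wait_for_port` and its parameters are hypothetical names, not part of any library:

```bash
#!/usr/bin/env bash
# Sketch: a bounded version of the /dev/tcp wait loop from the init script.
# Requires bash, since /dev/tcp/HOST/PORT is a bash redirection feature.
# Usage: wait_for_port HOST PORT [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
  local host="$1" port="$2" max_attempts="${3:-30}" delay="${4:-5}"
  local attempt=1
  # The subshell opens (and on exit closes) a TCP connection to HOST:PORT;
  # it succeeds only if something is listening there. Errors are discarded.
  until (printf "" >"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up waiting for ${host}:${port} after ${attempt} attempts" >&2
      return 1
    fi
    echo "Waiting for ${host}:${port} (attempt ${attempt}/${max_attempts})..."
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "${host}:${port} is accepting connections"
}
```

Inside the init script, `wait_for_port cassandra 9042 60 5 || exit 1` would give up after 60 attempts spaced 5 seconds apart, roughly five minutes. The init container would then exit with an error instead of looping indefinitely.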

And that's it. It's something new I learned today, and I thought it was interesting enough to share here.