DEV Community

Franck Pachot for YugabyteDB Distributed PostgreSQL Database


Docker Image for YugabyteDB Developers

You have a production YugabyteDB and you need to create multiple developer databases. To achieve this, you may have to anonymize certain data and prepare a database that the developers can use. Furthermore, the developers want to run it in their own Docker container. The question is, how can you load this database into new containers?

One solution is to use ysql_dump to export the database and import it into an empty developer container. In YugabyteDB, the import may take several minutes for large databases with thousands of tables, indexes, and referential integrity constraints.
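As a rough sketch of the export step, a small wrapper around ysql_dump could look like the following. The function name dump_database and the YB_HOST, YB_PORT, and YB_USER variables are my own illustration, and the defaults assume a local node listening on port 5433 with the yugabyte user, as in the examples later in this post.

```shell
# Sketch: export one database to a plain SQL file with ysql_dump.
# dump_database and the YB_* variables are hypothetical names for this example.
# Usage: dump_database <dbname> <output_file>
dump_database() {
  local dbname="$1"
  local outfile="$2"
  local host="${YB_HOST:-localhost}"   # assumed: local single-node instance
  local port="${YB_PORT:-5433}"        # YSQL port used elsewhere in this post
  local user="${YB_USER:-yugabyte}"    # user used elsewhere in this post
  ysql_dump -h "$host" -p "$port" -U "$user" "$dbname" > "$outfile"
}
```

For example, dump_database demo demo.dump would write the dump that the rest of this post loads into the image.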

For a faster solution, you can copy the physical files. Doing this in a production cluster can be complex because it requires running snapshots on all yb-tserver nodes and exporting the metadata from yb-master. In development, if you have a single node that is started with yugabyted, the process is much simpler. All data and metadata are contained in the --base_dir, which by default is set to /root/var.

There are two solutions: build a Docker image containing the data directory or use an external volume.

Docker image containing the database files

To create a Docker image, the following Dockerfile starts a YugabyteDB instance, initializes the database from the scripts in an initial directory, and copies the data directory to a new image.

Here is my Dockerfile:

FROM yugabytedb/yugabyte as init
RUN mkdir /initial_scripts_dir
ADD     . /initial_scripts_dir
RUN bin/yugabyted start --advertise_address=0.0.0.0 \
    --background=true --base_dir=/root/var \
    --initial_scripts_dir=/initial_scripts_dir
RUN yugabyted stop
# copy to a new image without the /initial_scripts_dir
FROM yugabytedb/yugabyte as base
COPY --from=init /root/var /root/var
WORKDIR /root/var/logs
CMD yugabyted start --background=false --advertise_address=$(hostname)

I have a file called demo.dump that was generated by ysql_dump, and I created demo.sql to create the database and load that dump. When yugabyted starts with --initial_scripts_dir, it runs all .sql and .cql files in that directory. I use a different extension for the dump so that yugabyted does not run it directly: it is left untouched and called from my .sql file instead.

Here is my demo.sql:

create database demo;
\set ON_ERROR_STOP on
\c demo
\ir demo.dump

This script runs when yugabyted starts during the image build.

Using this Dockerfile and the .sql scripts in the current directory, I build the yb-dev image:

docker build -t yb-dev .

To utilize this image, the developer can simply create a container:

docker run -d --name yb-dev1 -p5433:5433 -p 15433:15433 yb-dev

psql -h localhost -p 5433 -U yugabyte -d demo

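If you script these two steps, psql may run before the container accepts connections. A minimal sketch of a retry helper is shown below; wait_for is a hypothetical name, and the attempt count and delay in the usage comment are arbitrary values, not recommendations.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper name used for this example.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): try for about a minute to query the demo database.
# wait_for 30 2 psql -h localhost -p 5433 -U yugabyte -d demo -c 'select 1'
```

The helper returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can gate later steps in a setup script.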

By default, Docker uses the overlay2 storage driver, which copies an entire file to the container layer the first time it is written to. The good news is that, for YugabyteDB, this is not a major issue, as the largest files are immutable SST files. New data goes to new WAL or SST files, so existing files are not modified.
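One way to check this on a running container is docker diff, which lists paths that were added (A), changed (C), or deleted (D) relative to the image. The sketch below filters that output to added and changed paths under a given prefix; changed_under is a hypothetical helper name, and it assumes paths without spaces.

```shell
# Sketch: list paths under a prefix that the container added (A) or changed (C).
# changed_under is a hypothetical helper name; paths with spaces are not handled.
# Usage: changed_under <container> <path_prefix>
changed_under() {
  local container="$1"
  local prefix="$2"
  docker diff "$container" |
    awk -v p="$prefix" '($1 == "A" || $1 == "C") && index($2, p) == 1 { print $1, $2 }'
}

# Example (not run here): inspect the yb-dev1 container started above.
# changed_under yb-dev1 /root/var/data
```

If the reasoning above holds, most entries under the data directory should be additions rather than changes to existing SST files. Note that docker diff also marks a directory as changed when something inside it changes.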

External volume for the database directory

If your developers prefer to use a standard image containing only the binaries and store the database in an external volume, you can extract a tarball with the initial database base directory:

docker run -i yb-dev tar -zcvf - /root/var > demo.tgz

To use it, developers can extract it and start the YugabyteDB container by specifying the volume:

tar --sparse -zxvf demo.tgz

docker run -d --name yb-dev2 -p5433:5433 -p15433:15433 \
 -v $(pwd)/root/var:/root/var \
 yugabytedb/yugabyte \
 yugabyted start --advertise_address=0.0.0.0 --background=false

psql -h localhost -p 5433 -U yugabyte -d demo


yugabyted

This works only with single-container clusters for development, started with yugabyted. To copy the physical files from a multi-node cluster, you need to take distributed snapshots to get all data and metadata consistent.

You might be wondering why it's necessary to copy the entire base directory /root/var instead of only /root/var/data. The reason is that the UUIDs of the universe, the yb-master, and the yb-tserver are located in /root/var/conf. When opening an existing database, these values must match the data directory, so it's essential to copy the entire base directory.
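Before mounting an extracted directory, a quick sanity check can confirm that both parts are present. This is only a sketch: check_base_dir is a hypothetical helper, and it checks just the conf and data subdirectories mentioned above, not that their contents match.

```shell
# Sketch: verify that a base directory contains both conf and data.
# check_base_dir is a hypothetical helper name used for this example.
# Usage: check_base_dir <base_dir>
check_base_dir() {
  local base="$1"
  local sub
  for sub in conf data; do
    if [ ! -d "$base/$sub" ]; then
      echo "missing: $base/$sub" >&2
      return 1
    fi
  done
  return 0
}

# Example (not run here): check the directory extracted from demo.tgz.
# check_base_dir "$(pwd)/root/var"
```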

You can ship the database to a developer either as a Docker image or as an external volume to mount. In both cases, the container starts up immediately since it opens an already existing database.
