DEV Community

Franck Pachot for YugabyteDB Distributed PostgreSQL Database


A smaller YugabyteDB image for CI/CD (example with Sakila)

In a CI/CD pipeline, setting up a new database, executing the DDL (Data Definition Language) scripts to create the schema, and running the DML (Data Manipulation Language) scripts to populate data can be time-consuming. To speed this up, it is advisable to build an image that already contains the schema and data the pipeline needs.
Here is an example where I install the well-known Sakila database:

# Start YugabyteDB
yugabyted start

# Create "sakila" database once ready
until ysqlsh -h $(hostname) -c "create database sakila" ; do sleep 1 ; done | uniq

# get the DDL and DML scripts from the jOOQ repository, and run them in the "sakila" database
curl -Ls https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-schema.sql |
 ysqlsh -eh $(hostname) -d sakila
curl -Ls https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-insert-data.sql |
 ysqlsh -eh $(hostname) -d sakila

# stop YugabyteDB
yugabyted stop
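
The `until ... done` loop in this script retries forever, which can leave a CI job hanging if the database never comes up. Here is a minimal sketch of a bounded wait. The helper name `wait_for` and the 120-second limit are my own choices for illustration, not something provided by YugabyteDB:

```shell
# Hypothetical helper: retry a command once per second,
# and give up after the given number of seconds.
wait_for() {
  timeout_s=$1; shift
  elapsed=0
  # Output of the retried command is discarded to avoid repeated error lines.
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Usage with the same command as above, bounded to 120 seconds:
# wait_for 120 ysqlsh -h "$(hostname)" -c "create database sakila"
```

The trade-off is that error messages from the retried command are hidden; drop the redirection if you prefer to see them.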


You can use a Dockerfile to create an image that starts quickly with the database schema and data pre-installed.

Because the MemTables of the LSM Tree have not been flushed to SST files yet, the inserted data is stored on disk only in the write-ahead logs (WALs), and the total size remains small:

# du -hs /root/var/data/yb-data/*/{data,wals} | sort -h
3.1M    /root/var/data/yb-data/master/wals
4.0M    /root/var/data/yb-data/tserver/data
13M     /root/var/data/yb-data/master/data
70M     /root/var/data/yb-data/tserver/wals

If you build a Docker image from this, however, the resulting image is much larger than expected:

# docker image ls yb-sakila
REPOSITORY   TAG       IMAGE ID       CREATED              SIZE
yb-sakila    latest    819b80485518   About a minute ago   3.87GB

The reason is that the data directory contains sparse files. On the filesystem, they do not use space for their unallocated parts, but Docker stores each file at its full size. The --apparent-size option of du shows that full size:

du -hs --apparent-size /root/var/data/yb-data/*/{data,wals} | sort -h
1.8M    /root/var/data/yb-data/tserver/data
13M     /root/var/data/yb-data/master/data
26M     /root/var/data/yb-data/master/wals
1.7G    /root/var/data/yb-data/tserver/wals
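
The gap between allocated size and apparent size can be reproduced without YugabyteDB. The following is a minimal sketch, assuming GNU coreutils (`truncate` and `du --apparent-size`) and a filesystem that supports sparse files. The directory and file names are made up for the demonstration:

```shell
# Work in a throwaway directory so nothing else is touched.
demo_dir=$(mktemp -d)
sparse_file="$demo_dir/index.demo"

# Create a 23 MiB file without writing any data blocks.
truncate -s 23M "$sparse_file"

# Apparent size: the file length, in bytes (-b implies --apparent-size).
apparent_bytes=$(du -b "$sparse_file" | cut -f1)

# Allocated size: the blocks actually used on disk, in KiB.
allocated_kib=$(du -k "$sparse_file" | cut -f1)

echo "apparent:  $apparent_bytes bytes"
echo "allocated: $allocated_kib KiB"

# Clean up the demo directory.
rm -rf "$demo_dir"
```

The apparent size is the full 23 MiB, while the allocated size is typically zero or a few KiB because no data was ever written.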

Listing the tablet WAL directories shows that every tablet has an index.000000000 file with an apparent size of approximately 23 megabytes.

The Sakila schema consists of sixty tables and indexes, so these index files alone account for more than one gigabyte of apparent size. If we extrapolate to a schema with a thousand tables, the image would be enormous.
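
As a rough back-of-the-envelope check of that extrapolation, here is the arithmetic in shell. It assumes one tablet, and therefore one index file, per table or index, and it ignores system tablets, so treat the results as orders of magnitude only:

```shell
index_mb=23         # approximate apparent size of one index file
sakila_tablets=60   # Sakila tables and indexes, assuming one tablet each
large_tablets=1000  # hypothetical larger schema, assuming one tablet each

sakila_total_mb=$((sakila_tablets * index_mb))
large_total_mb=$((large_tablets * index_mb))

echo "Sakila:       ${sakila_total_mb} MB"  # 1380 MB
echo "1000 tablets: ${large_total_mb} MB"   # 23000 MB, roughly 23 GB
```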

However, there is some good news! The index file in question is not actually necessary. YugabyteDB's Raft replication code was taken from Apache Kudu, and this file is simply an index for the WAL, cached in memory. It is used when a follower that was disconnected from its leader comes back and needs to retrieve a range of write operations to fill the gap. The index does not need to be persisted, as it is re-created at startup. This is described in Apache Kudu's LogIndex, which is implemented as a memory-mapped file that is never synced to disk.

Therefore, we can safely drop it when stopping YugabyteDB.

yugabyted stop
rm -f /root/var/data/yb-data/*/wals/table-*/tablet-*/index.*
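
Before removing anything, it can be useful to see how much apparent size these files represent. This is a minimal sketch, assuming GNU `du` and the same glob as the `rm` command above. The function name `index_apparent_bytes` is hypothetical:

```shell
# Hypothetical helper: print the total apparent size (in bytes) of the
# WAL index files under a yb-data directory.
index_apparent_bytes() {
  data_dir=$1
  # Expand the same glob as the rm command into the positional parameters.
  set -- "$data_dir"/*/wals/table-*/tablet-*/index.*
  # If nothing matched, the pattern stays literal and no such file exists.
  [ -e "$1" ] || { echo 0; return 0; }
  # -c adds a grand total line, -b reports apparent size in bytes.
  du -cb "$@" | tail -n 1 | cut -f1
}

# Usage with the path from this article:
# index_apparent_bytes /root/var/data/yb-data
```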

In a Dockerfile, the files must be removed in the same layer (the same RUN instruction) that created them. A file removed in a later layer still takes up space in the earlier one. Here is an example:

FROM yugabytedb/yugabyte:latest

# get Sakila DDL and DML scripts
ADD https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-schema.sql .
ADD https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-insert-data.sql .

# Start YugabyteDB to run the scripts
RUN yugabyted start \
&& until ysqlsh -h $(hostname) -c "create database sakila" ; do sleep 1 ; done | uniq \
&& ysqlsh -h $(hostname) -d sakila -f yugabytedb-sakila-schema.sql \
&& ysqlsh -h $(hostname) -d sakila -f yugabytedb-sakila-insert-data.sql \
&& yugabyted stop \
&& rm -f /root/var/data/yb-data/*/wals/table-*/tablet-*/index.*

# starting a container restarts YugabyteDB in the foreground
ENTRYPOINT yugabyted start --background=false

I can create an image and verify its size:

docker build -t yb-sakila .
docker image ls yb-sakila

The image is now back to its expected size, that of the base image plus about 150MB:

# docker image ls yb-sakila
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
yb-sakila    latest    b462eb5aaacb   17 seconds ago   2.19GB

This image can be used easily, for example:

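# WARNING: this removes ALL containers on the host, running or stopped;
# use "docker rm -f yb-sakila" to remove only a previous run of this example
docker rm -f $(docker ps -qa)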
docker run -d -p5433:5433 --name yb-sakila yb-sakila
docker exec yb-sakila bash -c 'until postgres/bin/pg_isready -h $(hostname) ; do sleep 1 ; done | uniq'
psql -h localhost -p 5433 -U yugabyte sakila <<<'\d'
