DEV Community

Fast startup application with database stored in container images

Ederson Brilhante ・4 min read

TL;DR: This article shows the strategy I implemented so that an application is ready to use in a few minutes rather than many hours.

In this article, I will talk about the strategy I used in the project Vilicus to have big databases synced in new setups. For those who don't know Vilicus yet, I recommend reading my article about it.


Why does the application take so long to start?

At the moment, Vilicus uses Anchore, Clair, and Trivy as vendors to run security scans on container images. Each vendor has its own programming language, database, and internal dependencies, and each can use different vulnerability databases.

Vilicus itself starts in milliseconds, but before it is ready to use, the vendors have to sync their vulnerability databases with the latest changes. These syncs can take a lot of time.

Take Anchore, for example, whose sync takes the longest to complete:

There is no exact time frame for the initial sync to complete as it depends heavily on environmental factors, such as the host's memory/cpu allocation, disk space, and network bandwidth. Generally, the initial sync should complete within 8 hours but may take longer. Subsequent feed updates are much faster as only deltas are updated.
https://docs.anchore.com/current/docs/faq/

Clair takes about 20 minutes, and Trivy is ready in a few seconds.

If you run everything from scratch, it will take almost a day to sync all the vulnerability databases. After this initial sync, later syncs are faster.

This is a problem if you want to run an ephemeral instance in your CI/CD pipeline: waiting hours for the sync to complete before you can run the first scan is not viable. Thinking about how to fix this, I came up with a solution: save updated database snapshots in container images every day.

Now you must be thinking that this is not good practice, and normally I would agree. But I believe there are exceptions, such as cases where fixing the problem matters more than following conventions.


Saving the database in a container image

I'll show you in detail how I made Anchore work. Clair and Trivy are not much different.

Anchore

First, I have a compressed SQL dump of a database that was already synced at some point in the last 6 months, stored in the container image vilicus/anchoredb:dumpsql. So we don't need to wait many hours; we only sync the delta.

I used this image as a base to create a local image (vilicus/anchoredb:files) with a script that restores the database when the image runs as a container.

Dockerfile content

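# Stage that only provides the compressed SQL dump.
FROM vilicus/anchoredb:dumpsql AS dumpsql

FROM postgres:9.6.21-alpine
LABEL vilicus.app.version=9.6.21-alpine

# Copy the dump from the first stage, owned by the postgres user.
COPY --chown=postgres:postgres --from=dumpsql /opt/vilicus/data/anchore_db.tar.gz /opt/vilicus/data/anchore_db.tar.gz
# The official postgres image runs scripts in /docker-entrypoint-initdb.d when it initializes an empty data directory.
COPY deployments/dockerfiles/anchore/db/files/restore-dbs.sh /docker-entrypoint-initdb.d/01.restore-dbs.sh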

Building the container image

docker build -f deployments/dockerfiles/anchore/db/files/Dockerfile -t vilicus/anchoredb:files .

The image vilicus/anchoredb:files is referenced in deployments/docker-compose.updater.yml.

Here we start the anchore and anchoredb containers.

docker-compose -f deployments/docker-compose.updater.yml up \
    --build -d --force-recreate \
    --remove-orphans \
    --renew-anon-volumes anchore

After that, we run this command to restore the database.

docker exec anchoredb sh -c 'docker-entrypoint.sh postgres' &

Then we wait for the restore to finish and for the database to be ready to accept connections.

docker run --network container:anchore vilicus/vilicus:latest \
    sh -c "dockerize -wait http://anchore:8228/health -wait-retry-interval 10s -timeout 1000s echo done"
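
The waiting step above relies on dockerize. As a rough sketch of the same idea in plain shell, the hypothetical helper below (wait_for is not part of Vilicus) retries a command until it succeeds or the attempts run out. The curl call in the usage comment is an assumption: it only works if curl is available wherever the script runs.

```shell
# wait_for ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or ATTEMPTS tries have failed.
# Hypothetical helper for illustration only.
wait_for() {
    local attempts=$1
    local interval=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        attempts=$((attempts - 1))
        # Sleep only when another attempt is still coming.
        [ "$attempts" -gt 0 ] && sleep "$interval"
    done
    return 1
}

# Usage sketch (assumes curl is installed and the health URL shown above):
# wait_for 100 10 curl -fsS http://anchore:8228/health
```

With 100 attempts and a 10 second interval, this gives roughly the same 1000 second budget as the dockerize call.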

With the Anchore Engine and the DB ready, we wait for the feed sync to complete.

docker exec anchore sh -c 'anchore-cli system wait'

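When the sync finishes, we stop anchore and then gracefully shut down the Postgres process in anchoredb. Stopping anchore first matters, because the smart shutdown mode waits for all clients to disconnect.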

docker stop anchore
docker exec -u postgres anchoredb sh -c 'pg_ctl stop -m smart'

We commit the container, with the changes made by the sync, into a new container image, vilicus/anchoredb:local-update.

CID=$(docker inspect --format="{{.Id}}" anchoredb)
docker commit "$CID" vilicus/anchoredb:local-update
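
One thing to keep in mind for this step: docker commit does not include data stored in volumes, and the official postgres image declares a volume at /var/lib/postgresql/data. The sketch below assumes the Postgres data directory is /data, as the COPY in the final Dockerfile suggests, and checks that this path is not under any mount of the container. The function path_is_under_mount is a hypothetical helper, not part of Vilicus.

```shell
# path_is_under_mount TARGET
# Reads mount destinations from stdin (one per line) and succeeds if TARGET
# equals one of them or lies below one of them.
path_is_under_mount() {
    local target=$1
    local mount
    while IFS= read -r mount; do
        # Skip empty lines, such as the trailing one produced by println.
        [ -n "$mount" ] || continue
        case "$target/" in
            "$mount"/*) return 0 ;;
        esac
    done
    return 1
}

# Usage sketch (requires Docker and the anchoredb container from above):
# docker inspect --format '{{range .Mounts}}{{println .Destination}}{{end}}' anchoredb \
#     | path_is_under_mount /data \
#     && echo "warning: /data is on a mount, docker commit will not capture it"
```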

Finally, we build the container image that goes to Docker Hub by copying the Postgres data from the image vilicus/anchoredb:local-update.

Dockerfile content

FROM vilicus/anchoredb:local-update AS db
FROM postgres:9.6.21-alpine
COPY --chown=postgres:postgres --from=db /data/ /data

Building the container image

docker build -f deployments/dockerfiles/anchore/db/Dockerfile -t vilicus/anchoredb:latest .

Check the complete script here

Clair and Trivy

For Clair check here.

For Trivy check here.


Updating the images every day

To keep the databases up to date with the latest changes, I have a GitHub workflow that runs a job every day, building the images and pushing them to Docker Hub.

Check the workflow

Complete workflow
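
The publishing part of the daily job boils down to pushing the freshly built images. Here is a minimal sketch, assuming the job is already logged in to Docker Hub. The push_images function and the DOCKER override are hypothetical helpers for illustration and are not taken from the actual workflow.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to dry-run the commands.
DOCKER=${DOCKER:-docker}

# push_images IMAGE [IMAGE...]
# Pushes each image in order and stops at the first failure.
push_images() {
    local image
    for image in "$@"; do
        "$DOCKER" push "$image" || return 1
    done
}

# Usage sketch (requires Docker and a prior docker login):
# push_images vilicus/anchoredb:latest
```

Stopping at the first failed push makes the job fail visibly instead of leaving the images half updated without notice.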


That's it!

In case you have any questions, please leave a comment here or ping me on 🔗 LinkedIn.
