Ederson Brilhante

Posted on Apr 1, 2021

Fast startup application with database stored in container images

#docker #devops #database #programming

TL;DR: This article describes the strategy I implemented so that an application is ready to use in a few minutes rather than many hours.

In this article, I will talk about the strategy I used in the Vilicus project to have large databases already synced in new setups. For those who don't know Vilicus yet, I recommend reading my article about it.

Why does the application take so long to start?

At the moment, Vilicus uses Anchore, Clair, and Trivy as vendors to run security scans on container images. Each vendor has its own programming language, database, and internal dependencies, and can use different vulnerability databases.

Vilicus itself starts in milliseconds, but before it is ready to use, the vendors must sync their vulnerability databases with the latest changes, and these syncs can take a long time.

Take Anchore, for example, which takes the longest to complete its sync. Its documentation says:

There is no exact time frame for the initial sync to complete as it depends heavily on environmental factors, such as the host's memory/cpu allocation, disk space, and network bandwidth. Generally, the initial sync should complete within 8 hours but may take longer. Subsequent feed updates are much faster as only deltas are updated.
https://docs.anchore.com/current/docs/faq/

Clair takes roughly 20 minutes, and Trivy is ready in a few seconds.

If you run everything from scratch, it will take almost a day to sync all the vulnerability databases. After this initial sync, subsequent syncs are faster.

This is a problem if you want to run an ephemeral instance in your CI/CD pipeline: waiting hours for the sync to complete before you can run the first scan is not viable. Thinking about how to fix this, I came up with a solution: save updated database snapshots in container images every day.

Now you must be thinking that this is not a good practice, and normally I would agree. But I believe there are exceptions, specific cases where fixing the problem is more important than following conventions.

Saving the database in a container image

I'll show you in detail how I made Anchore work. Clair and Trivy are not much different.

Anchore

First, I have a compressed SQL dump, taken from a database that was already synced at some point in the last 6 months, stored in a container image: vilicus/anchoredb:dumpsql. This way we don't need to wait many hours; we only sync the delta.

I used this image as a base to create a local image (vilicus/anchoredb:files) that includes a script to restore the database when the image runs as a container.

Dockerfile content

FROM vilicus/anchoredb:dumpsql as dumpsql

FROM postgres:9.6.21-alpine
LABEL vilicus.app.version=9.6.21-alpine

COPY --chown=postgres:postgres --from=dumpsql /opt/vilicus/data/anchore_db.tar.gz /opt/vilicus/data/anchore_db.tar.gz
COPY deployments/dockerfiles/anchore/db/files/restore-dbs.sh /docker-entrypoint-initdb.d/01.restore-dbs.sh
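
The contents of restore-dbs.sh are not shown in this article, so the following is only a minimal sketch of what a restore script in /docker-entrypoint-initdb.d could look like. It assumes the archive unpacks to one or more plain SQL dump files; restore_dump and the PSQL override (used for dry runs) are hypothetical names, and the real script in Vilicus may differ.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual restore-dbs.sh from Vilicus.
# Assumption: the archive contains plain SQL dump files (*.sql).

# restore_dump <archive>: unpack the archive and load every .sql file with psql.
# PSQL can be overridden (e.g. PSQL=echo) to print the calls instead of running them.
restore_dump() {
    archive=$1
    workdir=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$workdir" || return 1
    for dump in "$workdir"/*.sql; do
        [ -f "$dump" ] || continue
        # ON_ERROR_STOP makes psql exit with an error on the first failed statement.
        ${PSQL:-psql} -v ON_ERROR_STOP=1 -U "${POSTGRES_USER:-postgres}" -f "$dump" || return 1
    done
    rm -rf "$workdir"
}

# Inside the init script this would be called as, for example:
# restore_dump /opt/vilicus/data/anchore_db.tar.gz
```

Unpacking into a temporary directory keeps the data directory clean, and removing it afterwards avoids shipping the uncompressed dump in the committed image.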

Building the container image

docker build -f deployments/dockerfiles/anchore/db/files/Dockerfile -t vilicus/anchoredb:files .

The image vilicus/anchoredb:files is referenced in deployments/docker-compose.updater.yml

Here we start anchore and anchoredb.

docker-compose -f deployments/docker-compose.updater.yml up \
    --build -d --force-recreate \
    --remove-orphans \
    --renew-anon-volumes anchore

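After that, we run this command to restore the database. The entrypoint of the official postgres image runs the scripts in /docker-entrypoint-initdb.d, including our restore script, when it initializes an empty data directory.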

docker exec anchoredb sh -c 'docker-entrypoint.sh postgres' &

Then we wait until the restore has finished and the database is ready to accept connections.

docker run --network container:anchore vilicus/vilicus:latest \
    sh -c "dockerize -wait http://anchore:8228/health -wait-retry-interval 10s -timeout 1000s echo done"
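
If dockerize is not at hand, a plain shell loop can do a similar wait. This is a minimal sketch, not what Vilicus uses: wait_for is a hypothetical helper, and the commented example assumes curl is installed and that the script runs somewhere the anchore hostname resolves.

```shell
#!/bin/sh
# Hypothetical helper (not part of Vilicus): retry a command until it
# succeeds or the maximum number of attempts is reached.
# Usage: wait_for <max_attempts> <sleep_seconds> <command> [args...]
wait_for() {
    max_attempts=$1
    interval=$2
    shift 2
    attempt=1
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
}

# Example, mirroring the 10s interval and 1000s timeout used above:
# wait_for 100 10 curl -fsS http://anchore:8228/health
```

The function returns a non-zero status on timeout, so a CI job can fail fast instead of starting the sync against a database that never came up.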

With the Anchore Engine and the DB ready, we start the sync.

docker exec anchore sh -c 'anchore-cli system wait'

When the sync finishes, we stop anchore and gracefully shut down Postgres in anchoredb.

docker stop anchore
docker exec -u postgres anchoredb sh -c 'pg_ctl stop -m smart'

We commit the container, with the changes made by the sync, to a new container image, vilicus/anchoredb:local-update.

CID=$(docker inspect --format="{{.Id}}" anchoredb)
docker commit $CID vilicus/anchoredb:local-update

Finally, we build the container image that goes to Docker Hub by copying the Postgres data from the image vilicus/anchoredb:local-update.

Dockerfile content

FROM vilicus/anchoredb:local-update as db
FROM postgres:9.6.21-alpine
COPY --chown=postgres:postgres --from=db /data/ /data

Building the container image

docker build -f deployments/dockerfiles/anchore/db/Dockerfile -t vilicus/anchoredb:latest .

Check the complete script here

Clair and Trivy

For Clair check here.

For Trivy check here.

Updating the images every day

To keep the databases up to date, I have a GitHub workflow that runs a job every day to build the images and push them to Docker Hub.

Check the workflow
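
The build-and-push step that such a daily job runs can be sketched in shell. This is only an illustration under assumptions: run and publish_snapshot are hypothetical helpers, the DRY_RUN switch is my addition for trying it out without Docker, and the real workflow in the repository may be organized differently.

```shell
#!/bin/sh
# Hypothetical sketch of a daily publish step; not the actual Vilicus workflow.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# publish_snapshot <image> <dockerfile>: build the image and push it.
# The push only happens if the build succeeded.
publish_snapshot() {
    image=$1        # e.g. vilicus/anchoredb:latest
    dockerfile=$2   # e.g. deployments/dockerfiles/anchore/db/Dockerfile
    run docker build -f "$dockerfile" -t "$image" . &&
        run docker push "$image"
}

# Example:
# publish_snapshot vilicus/anchoredb:latest deployments/dockerfiles/anchore/db/Dockerfile
```

Running it with DRY_RUN=1 prints the two docker commands, which is a cheap way to check the arguments before letting the scheduled job push anything.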

That's it!

In case you have any questions, please leave a comment here or ping me on 🔗 LinkedIn.

