
Getting started with Aerospike on Docker

Pradeep Banavara

I joined Aerospike around mid-March, just before the Covid lockdown went into effect. Call me fortunate. I was super excited to learn everything about Aerospike and wanted to get my hands dirty as quickly as possible. Somehow, the 'quickly' part did not turn out to be that quick.

Fast forward to May and our annual summit was going live, virtually of course. I got the opportunity to present a talk on Docker. I thought of putting some of that talk content here in a blog, so that other developers who are keen on getting started with Aerospike can do so in less than 5 minutes using Docker, and of course Markdown.

Why Docker

Containers revolutionized shipping when they came about. The container standard established a single package size for transporting goods globally. This led to a huge improvement in logistics speed, perhaps as much as 100x, and probably saved billions of dollars in transport costs.

Software containers are unleashing a similar revolution in deploying and distributing software applications. Regardless of the environment a container is shipped to, a developer can be reasonably confident that the software packaged in it will run the same way. Compare this to the all too familiar 'the application runs fine on my laptop, so it is probably an environment problem'.

Docker is now the most widely used and deployed container platform, even though container technology predates it. So let's see how to get started with Aerospike on Docker.

Once you have Docker installed, the absolute simplest way to run Aerospike in a container is this command:

docker run -d --name aerospike -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 aerospike

We need to check whether the container started. We use docker ps for that and, if everything went well, the output looks like this:

CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS                              NAMES
ff7482742691        aerospike           "/entrypoint.sh asd"   3 seconds ago       Up 3 seconds        0.0.0.0:3000-3003->3000-3003/tcp   aerospike

That's it, we have Aerospike running. The 0.0.0.0 in the PORTS column means the ports are published on all host interfaces, so clients on the host machine can reach the database at localhost:3000.

To stop the container, run docker stop ff7482742691; to start it again, run docker start ff7482742691. There is no need to use docker run again. docker run pulled the image from Docker Hub and created the container, so the image is now available locally and the container already exists. From here on, the start and stop commands suffice.
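
The run-once, then start/stop workflow can be wrapped in a small helper. This is only a minimal sketch: the function name ensure_aerospike is made up for illustration, and it assumes the container is named aerospike, as in the docker run command earlier. It starts the container if it already exists and creates it otherwise.

```shell
# Start the Aerospike container, creating it only if it does not exist yet.
# ensure_aerospike is a hypothetical helper name, not a Docker command.
ensure_aerospike() {
    # List all containers (running or stopped) and look for an exact name match.
    if docker ps -a --format '{{.Names}}' | grep -qx "aerospike"; then
        # The container already exists: reuse it instead of creating a new one.
        docker start aerospike
    else
        # First run: create the container from the image.
        docker run -d --name aerospike \
            -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 aerospike
    fi
}
```

After defining the function in your shell, calling ensure_aerospike is safe to repeat, since it never tries to create a second container with the same name.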

To quickly use the database, we can run the aerospike-tools image and use the command line client aql to connect to this container.

docker run -ti aerospike/aerospike-tools:latest aql -h $(docker inspect -f '{{.NetworkSettings.IPAddress }}' aerospike)

Since aql runs in a separate container, it has to connect to the IP address of the aerospike container on Docker's bridge network, not to 0.0.0.0. The docker inspect call looks up that address for us. This is the beauty of containers: there is no scope for confusion about which interface the server is running on, even when the host has multiple interfaces. Once you are connected, you will get the AQL prompt.

Seed:         172.17.0.2
User:         None
Config File:  /etc/aerospike/astools.conf /root/.aerospike/astools.conf
Aerospike Query Client
Version 3.26.2
C Client Version 4.6.15
Copyright 2012-2020 Aerospike. All rights reserved.
aql> select * from test.demo
0 rows in set (0.248 secs)

OK

aql>
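
The empty result above is expected, because nothing has been written yet. For a quick round trip without the interactive prompt, aql can also take a single statement through its -c option. The following is a sketch under a few assumptions: aql_exec is a made-up helper name, the server container is named aerospike, and the bin names and values in the example statements are invented.

```shell
# Run one AQL statement against the server container and print the result.
# aql_exec is a hypothetical helper; it assumes a container named "aerospike".
aql_exec() {
    # Resolve the server container's address on the default bridge network.
    host="$(docker inspect -f '{{.NetworkSettings.IPAddress }}' aerospike)"
    # --rm removes the short-lived tools container once aql exits.
    docker run --rm aerospike/aerospike-tools:latest aql -h "$host" -c "$1"
}

# Example statements (bin names and values are made up):
#   aql_exec "INSERT INTO test.demo (PK, name, age) VALUES ('key1', 'Alice', 30)"
#   aql_exec "SELECT * FROM test.demo WHERE PK = 'key1'"
```

Running the INSERT followed by the SELECT should return the record that was just written, which confirms that both containers can talk to each other.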

In the default container, Aerospike keeps data in RAM and persists it to a file. If we look at the Dockerfile for Aerospike, we see that the image uses a default configuration file, and its namespace section specifies this mode:

namespace ${NAMESPACE} {
    replication-factor ${REPL_FACTOR}
    memory-size ${MEM_GB}G
    default-ttl ${DEFAULT_TTL} # 5 days, use 0 to never expire/evict.
    nsup-period ${NSUP_PERIOD}
    #   storage-engine memory

    # To use file storage backing, comment out the line above and use the
    # following lines instead.

    storage-engine device {
        file /opt/aerospike/data/${NAMESPACE}.dat
        filesize ${STORAGE_GB}G
        data-in-memory true # Store data in memory in addition to file.
    }
}
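
The ${...} placeholders suggest that the file is a template whose values are filled in when the container starts. Assuming the image populates them from environment variables of the same name (worth verifying against the image's README for your version), a namespace could be customized without editing any file. The helper name run_with_namespace is hypothetical.

```shell
# Start Aerospike with a custom namespace name and memory size.
# Assumption: the image fills ${NAMESPACE} and ${MEM_GB} in its config
# template from environment variables with the same names.
run_with_namespace() {
    namespace="$1"   # value for the NAMESPACE placeholder
    mem_gb="$2"      # value for the MEM_GB placeholder, in gigabytes
    docker run -d --name aerospike \
        -e "NAMESPACE=$namespace" -e "MEM_GB=$mem_gb" \
        -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 aerospike
}
```

For example, run_with_namespace demo 2 would ask for a namespace called demo with 2G of memory. An existing container named aerospike has to be removed first, because Docker does not allow two containers with the same name.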

Aerospike runs fine in this mode and handles moderate workloads well, but for the best combination of low latency and persistence it should run on raw SSD block devices.

Aerospike hybrid mode on Docker

This is best tried on public cloud infrastructure such as AWS, because it is super simple to get a new SSD volume there and use it as a raw block device.
Once a new device is attached to an EC2 Linux VM (with Docker installed, of course), we can map that volume into the container like this:

docker run -tid --name aerospike -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 -v /opt:/opt/aerospike/etc --device '/dev/xvdf:/dev/xvdf' aerospike/aerospike-server:latest asd --config-file /opt/aerospike/etc/aerospike.conf

The configuration section specific to the device is:

namespace test {
        replication-factor 2
        memory-size 1G
        storage-engine device {
                device /dev/xvdf
                write-block-size 128k
        }
}

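Here /dev/xvdf is the new device volume and /opt is the host directory that holds the config file, aerospike.conf. That directory is mounted at /opt/aerospike/etc inside the container, which is where the --config-file option points.

With that, Aerospike is now running inside a Docker container in hybrid mode, with the index in RAM and the data on the SSD.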
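
Before starting the container it is worth confirming that the path passed to --device really is a block device on the host, since a typo in the device name is easy to make. A minimal sketch, using a made-up helper name check_block_device:

```shell
# Succeed only if the given path exists and is a block device.
# check_block_device is a hypothetical helper name.
check_block_device() {
    dev="$1"
    if [ -b "$dev" ]; then
        echo "$dev is a block device"
    else
        # Report the problem on stderr and signal failure to the caller.
        echo "$dev is missing or not a block device" >&2
        return 1
    fi
}
```

It can be chained in front of the docker run command shown earlier, for example check_block_device /dev/xvdf && docker run ..., so the container is only created when the device is present.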

Yet Aerospike is still running on a single node. What good is that, when the real power of Aerospike lies in scaling horizontally to hundreds of nodes? The word scaling immediately rings a few bells: Kubernetes, Docker Swarm and so on. While these are the best options for scaling software, they have a steep initial learning curve. Does that mean we have to give up on exploring Aerospike's clustering abilities?

Absolutely not. It is very straightforward to set up a cluster on a single host machine.

In fact, one can run as many Aerospike containers as the host machine hardware can accommodate, so it is very feasible to run a 3 or 5 node cluster.
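
One possible way to try this is sketched below. It is not the setup from the talk, and it rests on several assumptions: the containers share Docker's default bridge network, the image's default configuration uses mesh heartbeats on port 3002, and nodes are introduced to each other with the tip info command sent through asinfo from the aerospike-tools image. The helper names and the aerospikeN container names are made up.

```shell
# Print the bridge-network IP address of a container.
container_ip() {
    docker inspect -f '{{.NetworkSettings.IPAddress }}' "$1"
}

# Start N Aerospike containers on one host and join them to the first node.
# Usage: start_cluster [node_count] [seconds_to_wait_for_boot]
start_cluster() {
    count="${1:-3}"
    wait_secs="${2:-5}"
    i=1
    while [ "$i" -le "$count" ]; do
        # No -p mappings: the nodes talk to each other over the bridge
        # network, which avoids host port clashes between containers.
        docker run -d --name "aerospike$i" aerospike
        i=$((i + 1))
    done
    # Give the servers a moment to boot before sending info commands.
    sleep "$wait_secs"
    seed="$(container_ip aerospike1)"
    i=2
    while [ "$i" -le "$count" ]; do
        # Tell node i where to find the first node's heartbeat port.
        docker run --rm aerospike/aerospike-tools:latest \
            asinfo -h "$(container_ip "aerospike$i")" -v "tip:host=$seed;port=3002"
        i=$((i + 1))
    done
}
```

Calling start_cluster 3 would attempt a three node cluster. Whether the nodes actually joined can be checked afterwards from the server logs with docker logs aerospike1.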

At this point, one can test Aerospike in either AP (available and partition-tolerant) or SC (strong consistency) mode.
