Ken Tune for Aerospike

Record Aggregation in Aerospike For Performance and Economy

A strong differentiator for Aerospike versus other key-value databases is its DRAM economy. Every object has a 64-byte DRAM footprint, no matter what the size of the actual object is. You can manage a billion objects using only 128 GB of DRAM, allowing for 2x replication.

Great news! A billion is a pretty big number, and 3 * 512 GB nodes get me to 12bn. Within reason I can have as many objects as I like. I should start making them right away with no further thought required.

Hold your horses, cowboy. It might not be as simple as that.

For instance, what if your objects are very small? Worst case, they’re so small that they’re of the order of 64 bytes, so now your memory footprint is similar to your disk footprint. It might even be the case that your memory-to-disk ratio is such that your DRAM is full when your disk is half empty. In a bare-metal situation, buy less disk / more DRAM for sure, but you might be in the cloud, where you’re stuck with certain DRAM / storage ratios. Or maybe these machines were handed to you for re-purposing.

A technique informally known as blocking can help you. You store your objects within larger objects, and your API turns this into an implementation detail. Blocking can reduce your memory footprint, helping with the small-object use case.

Aerospike lets you do this by offering a comprehensive List and Map API which allows you to place objects within objects and retrieve them efficiently. List structures can be used for storing structured data such as regular-frequency time series, while maps can be used to reduce your key space. This API is a distinguishing feature of Aerospike when contrasted with other key-value databases.
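To make this concrete, here is a minimal sketch of both APIs at work using the Java client. The client setup and the namespace, set and bin names ('test', 'demo', 'readings', 'metaData') are illustrative assumptions, not taken from any particular deployment.

// Minimal sketch of the List and Map APIs - all names are illustrative
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "demo", "sensor-1");
// Append a reading to a list bin - a natural fit for regular-frequency time series
client.operate(new WritePolicy(), key,
        ListOperation.append("readings", Value.get(23.7)));
// Put an entry into a map bin - many logical keys stored inside one record
client.operate(new WritePolicy(), key,
        MapOperation.put(new MapPolicy(), "metaData", Value.get("device-42"), Value.get("active")));

Both are single-record operations applied in place on the server, so the full record never crosses the network.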

Let’s look again at our example. Suppose your key space is composed of device ids, and these are fundamentally UUIDs: 128-bit numbers, or equivalently 32-digit hexadecimal numbers. Let’s say you anticipate you may need to store as many as 15bn of these, but each record is only around 200 bytes. Your DRAM requirement with Aerospike would be of the order of

64 (bytes) * 15bn * 2 (replication) = ~2 TB

Not the end of the world, but you could do better by working smarter.

Assume also that we want to keep our physical object size below 128 KB, a good starting point for optimal block size on a flash device, which is recommended for Aerospike. We can fit 655 of our 200-byte objects into 128 KB.
If each physical object contains 655 actual objects then we require 15bn / 655 = ~22.9m container object keys. The question then is: how do we map from a device id to the container (physical) object key, and how do we reliably look up a logical object inside the container object? The answer is that we do this using bit masking.
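As a sanity check, here is the sizing arithmetic in code; every figure is just this example's assumption.

// Sizing arithmetic for the example - all figures are assumptions
long keySpaceSize = 15_000_000_000L;                      // 15bn logical objects
int objectSizeBytes = 200;                                // approximate logical object size
int blockSizeBytes = 128 * 1024;                          // 128 KB target physical object size
int objectsPerBlock = blockSizeBytes / objectSizeBytes;   // = 655
long containerKeys = keySpaceSize / objectsPerBlock;      // = ~22.9m container object keys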

A 128-bit key space can be reduced to a key space of size 2, 4 … 65536 … 2²⁰ keys by AND-ing the key with a binary number whose 1, 2 … 16 … 20 etc low-order bits are set to one. For our example, the number of mask bits required corresponds to the first power of two above our required key count, which can be calculated as

ceiling(log(22.9 × 10⁶) / log(2)) = 25 bits

This gives us a key space of size 2²⁵ = ~33.5m, so we’ve got our maths right.
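Concretely, the 25-bit mask is 0x1FFFFFF, and AND-ing any device id with it yields a value below 2²⁵. A small sketch, with an arbitrary sample UUID:

// 25 low-order ones: 2^25 - 1 = 0x1FFFFFF
BigInteger mask = BigInteger.ONE.shiftLeft(25).subtract(BigInteger.ONE);
UUID sample = UUID.fromString("123e4567-e89b-12d3-a456-426614174000"); // arbitrary example
// A UUID without its dashes is a 32-digit hexadecimal number, hence radix 16
BigInteger deviceID = new BigInteger(sample.toString().replace("-", ""), 16);
BigInteger containerKey = deviceID.and(mask); // always in [0, 2^25), one of ~33.5m keys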

Let’s look at how we make use of this in Aerospike.

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Value;
import com.aerospike.client.cdt.MapOperation;
import com.aerospike.client.cdt.MapPolicy;
import com.aerospike.client.policy.WritePolicy;

import java.math.BigInteger;
import java.util.Map;
import java.util.UUID;

public static final int BIT_MASK_SIZE = 25;
public static final BigInteger BIT_MASK = BigInteger.ONE.shiftLeft(BIT_MASK_SIZE).subtract(BigInteger.ONE);
public static final String METADATA_BIN_NAME = "deviceMetaData";
public static final AerospikeClient aeroClient = new AerospikeClient(hostName, aerospikeServicePort);

/**
 * Store device metadata in Aerospike using the 'blocking' technique
 * @param deviceIDasUUID - device id
 * @param deviceMetaData - metadata
 */
public void storeDeviceMetaData(UUID deviceIDasUUID, Map<String, Object> deviceMetaData) {
    // Turn the UUID into a BigInteger to help with bit masking.
    // A UUID with its dashes removed is a 32-digit hexadecimal number, so the radix is 16
    BigInteger deviceID = new BigInteger(deviceIDasUUID.toString().replace("-", ""), 16);
    // Apply the bit mask to derive the container (physical) key
    BigInteger physicalKey = deviceID.and(BIT_MASK);
    // Construct the Aerospike key from the bit-masked value
    Key aerospikeKey = new Key(deviceDataNamespace, deviceDataSetName, physicalKey.toString());
    // Store using the Map API - the record is addressed by the bit-masked key,
    // but within that record the metadata is inserted into a map keyed by the full device id
    aeroClient.operate(new WritePolicy(), aerospikeKey,
            MapOperation.put(new MapPolicy(), METADATA_BIN_NAME,
                    Value.get(deviceIDasUUID.toString()), Value.get(deviceMetaData)));
}
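
A call might look like this; the field names and values are purely illustrative:

// Illustrative usage - metadata fields are made up for the example
Map<String, Object> metaData = new HashMap<>();
metaData.put("model", "sensor-x100");
metaData.put("firmware", "2.4.1");
storeDeviceMetaData(UUID.randomUUID(), metaData);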

This function stores our device metadata inside a physical object. As described, the physical object key is derived using bit masking. Note this is efficient from a network capacity point of view — only the metadata gets sent across the network, not the full physical object.

We also need to see how to retrieve our object.

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;
import com.aerospike.client.cdt.MapOperation;
import com.aerospike.client.cdt.MapReturnType;

import java.math.BigInteger;
import java.util.Map;
import java.util.UUID;

public static final int BIT_MASK_SIZE = 25;
public static final BigInteger BIT_MASK = BigInteger.ONE.shiftLeft(BIT_MASK_SIZE).subtract(BigInteger.ONE);
public static final String METADATA_BIN_NAME = "deviceMetaData";
public static final AerospikeClient aeroClient = new AerospikeClient(hostName, aerospikeServicePort);

/**
 * Get device metadata from Aerospike having employed the 'blocking' technique
 * @param deviceIDasUUID - device id
 * @return device metadata
 */
@SuppressWarnings("unchecked")
public Map<String, Object> getDeviceMetaData(UUID deviceIDasUUID) {
    // Turn the UUID into a BigInteger to help with bit masking - radix 16 as the string is hexadecimal
    BigInteger deviceID = new BigInteger(deviceIDasUUID.toString().replace("-", ""), 16);
    // Apply the bit mask to derive the container (physical) key
    BigInteger physicalKey = deviceID.and(BIT_MASK);
    // Construct the Aerospike key from the bit-masked value
    Key aerospikeKey = new Key(deviceDataNamespace, deviceDataSetName, physicalKey.toString());
    // Retrieve from the map - the 'physical' key identifies the record,
    // and the device id retrieves the metadata entry from the map within it
    Record r = aeroClient.operate(null, aerospikeKey,
            MapOperation.getByKey(METADATA_BIN_NAME, Value.get(deviceIDasUUID.toString()),
                    MapReturnType.VALUE));
    return (Map<String, Object>) r.getMap(METADATA_BIN_NAME);
}

The construction of the physical key is as before. This time we use the getByKey operation to retrieve the device metadata.
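Putting the two functions together, a round trip looks like this (the device id and metadata are hypothetical):

// Store then retrieve one device's metadata via its bit-masked container record
UUID deviceID = UUID.randomUUID();
storeDeviceMetaData(deviceID, Map.of("model", "sensor-x100"));
Map<String, Object> metaData = getDeviceMetaData(deviceID); // {model=sensor-x100}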

An important point to note is that only the metadata requested is transmitted across the network, not the entire physical object. This consideration applies in general to the calls offered by the List/Map API. This is what we mean by ‘economy’ in the article title.

Finally, a code snippet showing how to calculate the bit mask using the ‘natural’ inputs.

/**
 * Obtain the bit mask required to give, on average, a specific number of logical objects per physical object
 * @param keySpaceSize - size of key space
 * @param objectsPerBlock - average number of objects stored in each physical object
 * @return bit mask as a BigInteger
 */
public static BigInteger getBitMask(long keySpaceSize, int objectsPerBlock) {
    // Required number of container keys is keySpaceSize / objectsPerBlock,
    // so the number of bits required is log2(keySpaceSize / objectsPerBlock),
    // rounded up as the bit count must be integral
    int bitMaskSize = (int) Math.ceil(Math.log((double) keySpaceSize / objectsPerBlock) / Math.log(2));
    // 2^bitMaskSize - 1 sets all bitMaskSize low-order bits
    return BigInteger.ONE.shiftLeft(bitMaskSize).subtract(BigInteger.ONE);
}
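For the example figures this reproduces the mask we derived by hand:

// 15bn logical objects at 655 per block requires 25 bits: mask = 2^25 - 1
BigInteger mask = getBitMask(15_000_000_000L, 655);
// mask.bitLength() evaluates to 25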

The net benefit of all the above is that the memory footprint will be reduced in this case to 

2²⁵ (keys) * 64 (DRAM cost per record) * 2 (rep factor) = ~4 GB

from 

15bn * 64 (DRAM cost per record) * 2 (rep factor) = ~1.9 TB

The reduction factor is 447, slightly less than the 655 quoted above because, with the key space rounded up to 2²⁵, our 128 KB blocks will not, on average, be completely filled.

Before we close, it is worth noting our Enterprise-only ‘All Flash’ capability, which allows both index and data to be placed on disk, reducing DRAM usage to very low levels. This was developed specifically with the use cases of small objects and/or very large numbers of objects (~10¹² = 1 trillion) in mind. It does bring higher latency (~5ms vs ~1ms at the 95th percentile), but it’s still competitive versus any other database out there.

The above solution is a good example of a differentiating feature, our List and Map API, providing a distinctive optimisation under constraints. The technique of ‘blocking’ can also be applied to time series data, which I hope to explore in a future article.

Cover image with thanks to Nana Smirnova
