Neel Phadnis for Aerospike

Posted on Aug 16, 2022 • Originally published at developer.aerospike.com

Aerospike Through SQL

#aerospike #sql #api #crud

Source: Photo by Alex wong on Unsplash Unsplash

SQL is broadly used as a data access language for analytics. Even if you are an application developer, chances are you have used it or at least are familiar with it.

Aerospike has broad support for SQL, enabling you to use SQL to access Aerospike data in multiple ways.

Trino

For analytics, you can access Aerospike data on Trino with the Aerospike Trino Connector.

Through Trino, analytics use cases such as ad-hoc SQL queries, reports, and dashboard have access to data in one or more Aerospike clusters, and they can also merge Aerospike data with data from other sources.

For more details of the Trino Connector, see the blog posts Deploy Aerospike and Trino based analytics platform using Docker and Aerospike Trino Connector - Chapter Two.

Starburst is a SQL-based MPP query engine based on Trino that enables you to run Trino on a single machine, a cluster of machines, on-prem or in the cloud. The blog post Analyze Data with Aerospike and Starburst Anywhere describes how to use Starburst Enterprise.

The data browser described in the blog post Aerospike Data Browser uses Trino with the Trino Connector underneath.

Spark

You can use Spark SQL to manipulate Aerospike data on the Spark platform. Aerospike Spark Connector provides parallel access to the Aerospike cluster from Spark.

Spark SQL merges two abstractions: Replicated Distributed Datasets (RDDs) and relational tables. Find examples of importing and storing Aerospike data to and from RDDs in these Aerospike Spark tutorials. You can use Spark SQL to manipulate and process data in RDDs.

More details on the Spark Connector are available in the blog posts Using Aerospike Connect for Spark and Accelerate Spark queries with Predicate Pushdown using Aerospike.

JDBC

Application developers can use simple SQL with JDBC with the community-contributed JDBC Connector.
Please read more details in the blog post Introducing Aerospike JDBC Driver.

. . . .

Aerospike API

While the various connectors allow broad SQL access for multiple purposes, the connectors may not be suitable for general applications as they do not provide the full Aerospike API functionality that a general application needs. For example, update capabilities are limited through SQL.

We recommend that you use the Aerospike API to access its full functionality and performance. Aerospike, a NoSQL database, does not directly support all SQL features. Inversely, Aerospike has many capabilities that cannot be expressed in SQL. This is to be expected because SQL is designed to provide physical data independence, which means the user need not worry about the physical details of the data such as the data distribution, size, selectivity, indexes, and so forth. The query optimizer deals with these details and selects the best execution plan. The goal of the Aerospike API is to provide full control to developers for optimal performance of their applications.

. . . .

This article describes how a developer who is familiar with SQL can quickly implement specific SQL CRUD operations using the Aerospike API. The goal is not to discuss the many mechanisms to control optimal performance (although it points to some of them), but to provide a ramp for a developer who has some knowledge of SQL to map the basic CRUD queries into the Aerospike API. We encourage you to learn about the performance features using the pointers provided.

While Aerospike supports many languages, we have used the Java client API in our examples as it is most widely used. The functionality is similar across all client libraries, and you can find equivalent functions in each.

. . . .

Mapping SQL to Aerospike

While there is no direct mapping of full SQL to Aerospike API, simple CRUD functionality can be easily mapped to Aerospike API as the underlying data models are similar: Aerospike’s set-record-bin organization matches the SQL’s table-record-column organization (see below).

We point out differences and unsupported constructs below. They need to be handled through alternative means such as specific features, libraries, and application code.

Similarities

Aerospike has a record-based data model. An Aerospike Database holds multiple namespaces, which are equivalent to databases in the relational model. A namespace holds records (rows), organized in sets (tables) that are accessed using a unique key that serves as the record ID. A record can contain one or more bins (columns), and a bin can hold a value of different data types. Sets and records do not conform to any schema. The primary index provides fast access to a record by key, which is a unique record identifier, while secondary indexes defined on a bin are supported for content based access.

SQL concept	Aerospike equivalent
Database or schema	Namespace
Table	Set
Record	Record
Column	Bin
Index	Primary and Secondary indexes
Stored Procedures	User Defined Functions (UDFs)

Differences

Aerospike is a NoSQL database, and its API has many differences from SQL databases. Following are some key differences:

Set: A set is a tag on the record that gets created when the first record is created in the set. A set is schemaless and can hold records holding different bins.
Record: A record is schemaless, and can hold any combination of bins.
Bin: A bin is typeless, and can hold a value of any type.
Index: Integrity constraints, such as uniqueness, cannot be specified on an index.
Transactions: All single record requests are transactional. The transaction boundary does not span multiple records. For a detailed discussion, see the blog post Developers: Understanding Aerospike Transactions.

Constructs Not Directly Supported

Due to the differences in its data and execution models, Aerospike API does not directly support the following SQL constructs, however, they can be implemented using data modeling, alternative features, and application code. We will discuss them later.

Join
Aggregations (max, min, top, average, sum, etc.)
Order By, Distinct, Union
Limit
Constraints: NULL, Foreign Key, Default
Built-in functions
View

. . . .

Overview

For the purpose of our discussion, SQL queries can be organized in these categories:

SELECT or read operations,
CREATE, UPDATE, DELETE or write operations,
Metadata operations, and
Other functionality.

Use these interactive tutorials to work along with this text.

A Word on Key, Metadata, Policy, and API Variants

Before we dive in, it is useful to know record key, record metadata, operation policy, and API variants.

Record Key

Each record is uniquely identified by a key or id, consisting of a triple: (namespace, set, user-key) where user-key is a user-specified id that is unique within the set. The key (also called the digest) is returned in all read APIs.

Record Metadata

Each record has metadata associated with it: generation (or version) and expiration time (or time-to-live in seconds). This metadata is returned in all read operations. It is possible to retrieve only the metadata without the record's bins through the "getHeader" operation explained below.

Policy

Aerospike API calls take a policy parameter which includes many details of the how and what of the request. For example, timeout, retries, filter expression, and additional write semantics are specified in the policy object. We will specify significant policy info that is relevant to the operation semantics in each operation below.

API Variants

Aerospike API is designed for control and simplicity. As such, a read and write operation that has one form in SQL has multiple variations in Aerospike API:

By number of records involved: Single record, batch, and query
By the processing mode: Sync, async, and background

In the following examples, only the synchronous APIs are shown when available, but you can easily discover the asynchronous variants in the documentation.

SQL SELECT and Equivalent Read Operations

Single-Record Read Operations

Get

SQL Query	Equivalent Aerospike API (Java)
`SELECT * FROM namespace.set WHERE id = key`	`Record Client::get(Policy policy, Key key)`
`SELECT bins FROM namespace.set WHERE id = key`	`Record Client::get(Policy policy, Key key, String... binNames)`

Existence

There is a variant of single record retrieval to check a record's existence.

SQL	Aerospike
`SELECT EXISTS(SELECT * FROM namespace.set WHERE id = key)`	`boolean Client::exists(Policy policy, Key key)`

Metadata

It is possible to only obtain a record's header info or metadata, consisting of generation (or version) and expiration (time-to-live in seconds).

SQL	Aerospike
`SELECT generation, expiration FROM namespace.set WHERE id = key`	`Record Client::getHeader(Policy policy, Key key)`

Batch Read Operations

A batch request operates on a list of records identified by the keys provided. It works similar to a single record retrieval, except multiple records are returned.

Batch requests are critical for high performance applications as they eliminate multiple client-server round trips, one for each record.

Read

SQL	Aerospike
`SELECT * FROM namespace.set WHERE id IN key-list`	`Record[] Client::get(BatchPolicy policy, Key[] keys)`
`SELECT bins FROM namespace.set WHERE id in key-list`	`Record[] Client::get(BatchPolicy policy, Key[] keys, String... binNames)`

Existence

There is a variant of batch retrieval to check record existence.

SQL	Aerospike
`SELECT id, EXISTS(SELECT * FROM namespace.set WHERE id = key) WHERE key IN key-list`	`boolean[] Client::exists(Policy policy, Key[] keys)`

Metadata

It is possible to obtain header info or metadata consisting of generation (or version) and expiration time (time-to-live in seconds) for a specified set of records.

SQL	Aerospike
`SELECT generation, expiration FROM namespace.set WHERE id IN key-list`	`Record[] Client::getHeader(Policy policy, Key[] keys)`

Composite Batch Read

A more general form of batch reads is also available that provides a union of simple batch results with different namespace, set, and bin specification. The records argument takes the input record keys and populates record details on return.

SQL	Aerospike
`(SELECT bins1 FROM namespace1.set1 WHERE id IN key-list1)` `UNION` `(SELECT bins2 FROM namespace2.set2 WHERE id IN key-list2)` `UNION ...`	`void Client::get(BatchPolicy policy, List<BatchRead> records)`

Predicate-Based Read Operations

In predicate-based read operations (aka queries), records matching a general predicate or condition are retrieved. In SQL, the predicate is specified in the WHERE clause.
Aerospike provides two ways of performing an SQL query:

Using a secondary index based predicate, which can optionally be ANDed with an expression filter
Using a scan (which uses the primary “key” index), which can optionally be ANDed with an expression filter

Secondary Index Query

While a query in SQL doesn’t require an index to exist, the query API in Aerospike requires that the corresponding secondary index exists.

The namespace, set, and secondary index based predicate is specified in the statement argument. The expression filter is optionally specified in the policy argument for additional conditions to be ANDed.

SQL	Aerospike
`SELECT bins FROM namespace.set WHERE condition`	`Record[] Client::query(QueryPolicy policy, Statement statement)`

Scan

The scan operation takes a callback object which is called for every record in the result (within the scope of the call which remains blocked until the operation completes).
The expression filter is optionally specified in the policy argument.

SQL	Aerospike
`SELECT bins FROM namespace.set WHERE condition`	`void Client::scanAll(ScanPolicy policy, String namespace, String setName, ScanCallback callback, String... binNames)`

SQL CREATE, UPDATE, DELETE and Equivalent Write Operations

Aerospike combines Create and Update in a single write operation. The following record-exists-action options specified in the write-policy define the operation semantics if the record already exists:

create-only: Create if record doesn't exist, fail otherwise.
update: Create if record doesn't exist, update otherwise.
update-only: Update if record exists, fail otherwise.
replace: Create if record doesn't exist, replace otherwise.
replace-only: Replace if record exists, fail otherwise.

SQL INSERT maps to create-only and SQL UPDATE maps to update-only options. SQL does not have a way to specify other options, such as replace, which removes an existing record.

Single-Record Write Operations

INSERT and UPDATE

The put operation handles Create (Insert) and Update.

SQL	Aerospike
`INSERT INTO namespace.set VALUES (id=key, bin=value, ...)` + `UPDATE namespace.set SET (bin=value, ...) WHERE id=key`	`void Client::put(WritePolicy policy, Key key, Bin... bins)`

Type-Specific Write Operations

Aerospike allows type-specific update operations. For integer and string types, they include the following. The bins argument holds multiple bin objects, each with the bin name and the operand value.

SQL	Aerospike
`UPDATE namespace.set SET (bin = bin + intval) WHERE id=key`	`void Client::add(WritePolicy policy, Key key, Bin... bins)`
`UPDATE namespace.set SET (bin = bin + strval) WHERE id=key`	`void Client::append(WritePolicy policy, Key key, Bin... bins)`
`UPDATE namespace.set SET (bin = strval + bin) WHERE id=key`	`void Client::prepend( WritePolicy policy, Key key, Bin... bins)`

Other type specific operations including on Collection Data Types (CDTs), are described in the documentation and tutorials.

DELETE

SQL	Aerospike
`DELETE FROM namespace.set WHERE id=key`	`void Client::delete(WritePolicy policy, Key key)`

Batch Write Operations

A batch write operates on multiple records specifically identified with a list of keys. There is a batch API for insertion, update, and deletion of multiple records.
Two forms of batch writes are shown below. Other forms including one with a UDF (described below) and key-specific operations are described in the blog post Batch Operations.

The argument ops is a list of operations to be performed in the specified sequence on each record, and can include read as well as write operations. The argument batchPolicy contains the specifics of how the batch is processed, whereas the arguments writePolicy and deletePolicy have the specifics of how the respective individual record operation is performed.

SQL	Aerospike
`UPDATE namespace.set SET (bin1=fn_1(bin_1), ...) WHERE id in key-list` + `SELECT fn_n(bin_n), ,,, FROM namespace.set WHERE id in key-list`	`BatchResults operate(BatchPolicy batchPolicy, BatchWritePolicy writePolicy Key[] keys, Operation... ops)`
`DELETE FROM namespace.set WHERE id in key-list`	`BatchResults delete BatchPolicy batchPolicy, BatchDeletePolicy deletePolicy, Key[] keys)`

Predicate-Based Write Operations

Predicate-based updates and deletes are possible by specifying the WHERE condition using the secondary index predicate (specified in a statement object) and expression filter (specified in the write policy) as explained earlier.

Predicate-based updates and deletes can involve a large number of records, and therefore are processed in background execution mode with the execute API. Sync and callback async modes are not available. Two forms of execute are possible:

Using a list of bin updates and deletes: A multi-op request provides a list of bin operations. Multi-op requests are further described below.

Since execute performs in a background mode with no returned results, the operation list in the statement object cannot have a read operation, only updates.

SQL	Aerospike
`UPDATE namespace.set SET (bin=value, ...) WHERE condition` + `DELETE FROM namespace.set WHERE condition`	`Client::execute(WritePolicy policy, Statement statement)`

User Defined Functions (UDFs): UDFs are equivalent to stored procedures, and are described further below. Record-oriented UDFs implement arbitrary logic in a Lua function that is registered with the server and invoked through an API call.

SQL	Aerospike
`UPDATE namespace.set SET (bin1=fn1(args), ...) WHERE condition` + `DELETE FROM namespace.set WHERE condition`	`ExecuteTask Client::execute(WritePolicy policy, Statement statement, String packageName, String functionName, Value... functionArgs)`

SQL Stored Procedures and Aerospike User Defined Functions (UDFs)

User Defined Functions (UDFs) are equivalent to stored procedures in SQL systems. A custom User Defined Function (UDF) is written in Lua, registered on the server, and invoked for a specified record(s). You can find further details in the documentation on User Defined Functions (UDFs).

In the following example, the UDF is specified using the arguments packageName and functionName, and supplied a list of arguments it expects in functionArgs. The API returns a generic Object which can be anything like a single value or a map of key-value pairs.

SQL	Aerospike
`EXEC StoredProcedure @arg1 = val1, @arg2 = val2, …`	`Object Client::execute(WritePolicy policy, Key key, String packageName, String functionName, Value... functionArgs)`

A UDF can have arbitrary logic combining CRUD operations.

In Aerospike, aggregation functions such as MIN, MAX, AVERAGE, SUM, etc, over multiple records are implemented with Stream UDFs. This article does not cover the specifics of Stream UDFs; please refer to the tutorials on SQL: Aggregates.

Multi-Op Requests

Multiple single bin read and write operations are possible through the operate API. It differs from the dedicated “single-op” requests, which allow just one operation. The operations in the argument operations are executed atomically and in the order specified,

Unlike in SQL, read and write operations can be combined in the same request (for single-record and batch requests) as illustrated below.

SQL	Aerospike
`SELECT fn1(bin1), …FROM namespace.set WHERE id=key` + `UPDATE namespace.set SET (bin1=fn_n(bin_n), ...) WHERE id=key`	`Record Client::operate( WritePolicy policy, Key key, Operation... operations)`

Multi-op operate APIs are available for a single record, batch, and query operations. See this tutorial that illustrates multi-ops.

Metadata Operations

Namespace Operations

CREATE Namespace

There is no API to create a namespace. A namespace is added through the config and requires a server restart.

TRUNCATE Namespace

The truncate API removes all records in a set or the entire namespace.

SQL	Aerospike
`TRUNCATE namespace`	`void Client::truncate(policy, namespace, set=null, beforeLastUpdate=null)`

DELETE Namespace

There is no API to delete a namespace. A namespace has one or more dedicated storage devices, and they must be wiped clean to delete the namespace.

Set Operations

CREATE Set

There is no explicit operation to create a set. A set is created when the first record is inserted in the set.

ALTER Set

A set is schemaless, and can hold records that have different schemas or bins. A bin has no type associated with it, and can hold values of any type. Therefore ALTER operation on a set to modify its schema is not needed.

TRUNCATE Set

All records in a set can be truncated using the truncate API:

SQL	Aerospike
`TRUNCATE namespace.set`	`void Client::truncate(policy, namespace, set, beforeLastUpdate=null)`

DROP Set

There is no notion of deleting a set as a set is just a name that a record is tagged with. The namespace must be deleted to remove the set name.

Index Operations

CREATE Index

An index is created on a bin for a specific value type. Integer, string, and GeoJSON types are currently supported for indexing.

SQL	Aerospike
`CREATE Index`	`createIndex(Policy policy, String namespace, String setName, String indexName, String binName,IndexType indexType)`

DROP Index

SQL	Aerospike
`DROP Index`	`dropIndex(Policy policy, String namespace, String setName, String indexName)`

UDF Operations

CREATE UDF

The arguments clientPath and serverPath below define the path to UDFfile on the client and server respectively.

SQL	Aerospike
`CREATE StoredProcedure`	`Client::register(Policy policy, String clientPath, String serverPath, Language.LUA)`

Other SQL Capabilities

Join

Most NoSQL databases do not have the Join operation as it is slow and complex. You can avoid Joins by storing the joined objects in aggregate form. Alternatively, the join can be performed in the application by retrieving the referenced object.

Limit

The policy parameter max-records can be specified as a hint. Fewer objects may be returned as the limit gets divided among participating nodes.

Order By, Top, Union, Distinct

List and Expressions can be used to implement these operations. Alternatively, they can be performed in the application.

Aggregations

Aggregations involving Group-By, Having, and Aggregate Functions (such as Max, Min, Top, Average, Sum) can be implemented using Stream UDFs as shown in the tutorials SQL Aggregates - Part 1 and Part 2.

Constraints

Integrity constraints such as NULL, Foreign Key, Default should be handled in the application logic. The uniqueness constraint can be enforced in a List or Map.

Built-In Functions

Many built-in functions like UPPER, TRIM, can be implemented with Expressions or UDFs.

Going Beyond SQL with Aerospike

In order to get the most out of Aerospike for speed-at-scale, thinking beyond SQL is necessary.
The process starts with modeling your data for performance, scale, and other needs of the application. Please review the series Data Modeling for Speed At Scale.

Learn about and use the various performance features that the Aerospike API provides through the documentation and tutorials. Examples of such features include:

Collection Data Types (CDTs)
Multi-op requests
Batch requests
Expressions
Secondary indexes
Set indexes
Complex Data Types - Binary, HLL, GeoJSON
User Defined Functions (UDFs)

Summary

You can use SQL to access Aerospike data through the Trino, Spark, and JDBC Connectors. While the connectors work quite well for the environment and intent they are built for, they do not provide the full Aerospike API functionality that the application may need. Therefore, use of the Aerospike API is recommended for full functionality and performance. The Aerospike API is designed with the goal of enabling developers of high performance applications who need to control performance specific details to make better decisions.

The article describes how a developer who is familiar with SQL can quickly implement specific SQL CRUD operations using the Aerospike API. Coming from a SQL background, it is important to remember that through NoSQL data modeling one should be able to avoid certain SQL features entirely such as the Join to maximize the benefit of using Aerospike for performance and scale. With the introduction provided in this article, you should be able to take the next step to learn the mechanisms in Aerospike API to optimize your application’s performance and scale.

Trino

Spark

JDBC

Aerospike API

Mapping SQL to Aerospike

Similarities

Differences

Constructs Not Directly Supported

Overview

A Word on Key, Metadata, Policy, and API Variants

Record Key

Record Metadata

Policy

API Variants

SQL SELECT and Equivalent Read Operations

Single-Record Read Operations

Get

Existence​

Metadata​

Batch Read Operations

Read

Existence​

Metadata​

Composite Batch Read

Predicate-Based Read Operations

Secondary Index Query​

Scan

SQL CREATE, UPDATE, DELETE and Equivalent Write Operations

Single-Record Write Operations

INSERT and UPDATE

Type-Specific Write Operations

DELETE

Batch Write Operations

Predicate-Based Write Operations

SQL Stored Procedures and Aerospike User Defined Functions (UDFs)

Multi-Op Requests

Metadata Operations

Namespace Operations

CREATE Namespace​

TRUNCATE Namespace​

DELETE Namespace​

Set Operations

CREATE Set​

ALTER Set​

TRUNCATE Set​

DROP Set​

Index Operations

CREATE Index​

DROP Index​

UDF Operations

CREATE UDF

Other SQL Capabilities

Join

Limit

Order By, Top, Union, Distinct

Aggregations

Constraints

Built-In Functions

Going Beyond SQL with Aerospike

Summary

Related Links:

Existence

Metadata

Existence

Metadata

Secondary Index Query

CREATE Namespace

TRUNCATE Namespace

DELETE Namespace

CREATE Set

ALTER Set

TRUNCATE Set

DROP Set

CREATE Index

DROP Index