kouwei qing

Posted on Jun 30

Introduction to HarmonyOS Next Data Base Vector Database

#harmonyosnext

Background

At this year's HDC, I met the architect of the HarmonyOS data base in the exhibition hall, who introduced an intelligent assistant that builds on the data base to deliver edge-side (on-device) capabilities. It was appealing to hear that HarmonyOS has integrated edge-side AI into the system layer, which opens up many possibilities for developers. After returning, I checked the updated documentation and found that HarmonyOS provides two language interfaces: ArkTS and C++. This article covers the technical characteristics, core concepts, operation interfaces, and advanced functions of the vector database, using the ArkTS interface for the examples.

Overview of Vector Database

A vector database is a database system that supports the storage, management, and retrieval of vector data, while being compatible with traditional relational data processing capabilities. Its core data type floatvector is used to store vectorized results, enabling the system to efficiently implement similarity search and fast retrieval functions.

Starting from API version 18, the vector database officially supports data persistence through standardized interfaces, providing developers with a reliable data storage solution.

Basic Concepts and Architecture

ResultSet Mechanism

The result set returned by a query operation is called a ResultSet, which provides a flexible way to access data, allowing developers to easily obtain the required information. The result set adopts a lazy loading strategy, loading data from the storage layer only when it is actually accessed, effectively reducing memory consumption.

Vector Data Representation

floatvector is the core data type of the vector database, used to represent high-dimensional vector data. For example, a numerical array like [1.0, 3.0, 2.4, 5.1, 6.2, 11.7] is a typical vector representation, widely used in fields such as image recognition and natural language processing.

System Constraints and Limitations

The vector database is designed to balance performance against resource usage, which leads to the following key constraints:

Logging Mode: The default is WAL (Write Ahead Log) mode to ensure the atomicity and durability of data writing.
Disk Persistence Strategy: The FULL mode ensures that data is completely written to the storage medium.
Connection Management: The system maintains 4 read connections and 1 write connection by default, using connection pool technology to optimize resource usage.
Write Concurrency: Only one write operation is supported at a time, and concurrent write requests are automatically serialized.
Data Size Limit: It is recommended that a single piece of data does not exceed 2MB, as exceeding this may cause reading failures.
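
As a rough illustration of the 2MB recommendation, the sketch below estimates the payload of a row before it is written. It is only an approximation under stated assumptions: each vector element is taken to occupy 4 bytes (float32), "2MB" is taken to mean 2 * 1024 * 1024 bytes, and the helper names are hypothetical rather than part of any HarmonyOS API.

```typescript
// Assumption: one floatvector element is stored as a 4-byte float32.
const BYTES_PER_FLOAT32 = 4;
// Assumption: the recommended limit of "2MB" means 2 * 1024 * 1024 bytes.
const RECOMMENDED_MAX_ROW_BYTES = 2 * 1024 * 1024;

// Rough payload size of one row: vector bytes plus any other column bytes.
function estimateRowBytes(vectorDim: number, otherBytes: number = 0): number {
  return vectorDim * BYTES_PER_FLOAT32 + otherBytes;
}

// True when the estimated row size is above the recommended limit.
function exceedsRecommendedSize(vectorDim: number, otherBytes: number = 0): boolean {
  return estimateRowBytes(vectorDim, otherBytes) > RECOMMENDED_MAX_ROW_BYTES;
}
```

Under these assumptions a 1024-dimensional vector takes about 4 KB, so the limit mostly matters for rows that also carry large TEXT or BLOB values.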

Data Cleaning Mechanism

When an application is uninstalled, the relevant database files and temporary files on the device are automatically cleared. This design simplifies application lifecycle management and avoids residual file issues.

Data Types and Constraints

Supported Data Types

The vector database supports a rich set of field types to meet diverse data storage needs:

Type	Description	Supported
NULL	Null value	Yes
INTEGER	Integer type	Yes
DOUBLE	Floating-point type	Yes
TEXT	String type	Yes
BLOB	Binary type	Yes
FLOATVECTOR	Vector data type	Yes

Field Constraint Mechanism

To ensure data integrity, the system provides various field constraints:

NOT NULL: Ensures that the field is not empty.
DEFAULT: Sets the default value for the field.
UNIQUE: Ensures the uniqueness of the field value.
PRIMARY KEY: Defines the primary key index.

Note: The system currently does not support foreign key constraints (FOREIGN) and CHECK constraints.
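
The constraints above map directly onto a CREATE TABLE statement. The following sketch assembles such a statement from a column description. The ColumnDef shape and the buildCreateTable helper are hypothetical names introduced for illustration. The helper deliberately has no option for FOREIGN KEY or CHECK, since those are not supported.

```typescript
// Hypothetical column description; field names are illustrative only.
interface ColumnDef {
  name: string;
  type: string;            // e.g. 'INTEGER', 'TEXT', 'floatvector(2)'
  primaryKey?: boolean;
  notNull?: boolean;
  unique?: boolean;
  defaultValue?: string;   // already written as an SQL literal, e.g. "0" or "'n/a'"
}

// Builds a CREATE TABLE statement using only the supported constraints.
function buildCreateTable(table: string, columns: ColumnDef[]): string {
  const defs = columns.map((c) => {
    const parts = [c.name, c.type];
    if (c.primaryKey) parts.push('PRIMARY KEY');
    if (c.notNull) parts.push('NOT NULL');
    if (c.unique) parts.push('UNIQUE');
    if (c.defaultValue !== undefined) parts.push(`DEFAULT ${c.defaultValue}`);
    return parts.join(' ');
  });
  return `CREATE TABLE IF NOT EXISTS ${table} (${defs.join(', ')});`;
}

const createTestTableSql = buildCreateTable('test', [
  { name: 'id', type: 'INTEGER', primaryKey: true },
  { name: 'repr', type: 'floatvector(2)' },
]);
```

The resulting string can be passed to the execute interface described later in this article.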

Query Language Features

Supported Query Clauses

The vector database supports rich SQL query clauses:

WHERE: Conditional filtering.
LIMIT: Result quantity limitation.
ORDER BY: Multi-column sorting, especially supporting vector distance sorting.
GROUP BY: Data grouping.
HAVING: Aggregate result filtering.
INDEXED BY: Forces the use of a specific index.
DISTINCT: Deduplication (not yet supported).

Vector distance sorting supports two distance metrics:

<->: L2 (Euclidean) distance.
<=>: Cosine distance.
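
To make the two metrics concrete, here is a minimal sketch in plain TypeScript of what they compute. It is not the database's implementation. The cosine function returns a distance (1 minus the cosine similarity) on the assumption that <=> is meant to be sorted in ascending order. Whether <-> yields the plain or the squared Euclidean distance is not stated in this article, so the square root here is also an assumption.

```typescript
// Euclidean (L2) distance between two vectors of equal dimension.
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance, defined here as 1 - cosine similarity (an assumption).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine distance is undefined for a zero vector');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

L2 distance is sensitive to vector length, while cosine distance only compares direction, which is why embeddings that are not normalized often use the cosine metric.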

Set Operations

Supports standard set operations:

UNION: Merges results and removes duplicates.
UNION ALL: Merges results and retains duplicate items.
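
The difference between the two operations can be pictured with plain arrays standing in for single-column result sets. This is only an in-memory analogy with hypothetical helper names, not how the database evaluates a query.

```typescript
// UNION ALL: concatenate both result lists and keep duplicates.
function unionAll<T>(a: T[], b: T[]): T[] {
  return a.concat(b);
}

// UNION: concatenate, then keep only the first occurrence of each value.
function union<T>(a: T[], b: T[]): T[] {
  const all = unionAll(a, b);
  return all.filter((value, index) => all.indexOf(value) === index);
}

const idsFromFirstQuery = [1, 2];
const idsFromSecondQuery = [2, 3];
const mergedWithDuplicates = unionAll(idsFromFirstQuery, idsFromSecondQuery);
const mergedDistinct = union(idsFromFirstQuery, idsFromSecondQuery);
```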

Operator System

The system provides comprehensive operator support:

Arithmetic Operations: +, -, *, /, %.
Comparison Operations: ==, =, !=, >, >=, <, <=.
Logical Operations: AND, BETWEEN, EXISTS, etc. (12 types in total).
String Concatenation: ||.
Bitwise Operations: &, |, ~, <<, >>.
Vector Distance Operations: <->, <=> (can be used inside aggregate functions).

Time and Date Functions

Several time-handling functions are built in:

Function	Description	Format
DATE	Date	"YYYY-MM-DD"
TIME	Time	"HH:MM:SS"
DATETIME	Date and time	"YYYY-MM-DD HH:MM:SS"
JULIANDAY	Julian day	Number of days
STRFTIME	Format date	Custom format
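
When time values are prepared on the application side, they need to match the text formats in the table above. The sketch below formats a JavaScript Date in the DATETIME layout. The helper names are hypothetical, and the use of UTC is an assumption, since this article does not state which time zone the database functions use.

```typescript
// Left-pads a number to two digits, e.g. 5 -> "05".
function pad2(n: number): string {
  return n < 10 ? `0${n}` : `${n}`;
}

// Formats a Date as "YYYY-MM-DD HH:MM:SS" using UTC fields (assumption).
function toSqlDateTime(d: Date): string {
  const date = `${d.getUTCFullYear()}-${pad2(d.getUTCMonth() + 1)}-${pad2(d.getUTCDate())}`;
  const time = `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}:${pad2(d.getUTCSeconds())}`;
  return `${date} ${time}`;
}

const exampleDateTime = toSqlDateTime(new Date(Date.UTC(2025, 5, 30, 8, 5, 9)));
```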

Aggregation and Analysis Functions

The system provides rich analysis functions:

COUNT: Row count statistics.
MAX/MIN: Extreme value calculation.
AVG: Average value.
SUM: Total sum.
RANDOM: Random number generation.
ABS: Absolute value.
UPPER/LOWER: String case conversion.
LENGTH: String length.

These functions can be combined with vector operations to implement complex data analysis requirements.

Development Interfaces and Practices

Environment Detection and Initialization

The first step in development is to detect whether the system supports the vector database:

import { relationalStore } from '@kit.ArkData';
import { UIAbility } from '@kit.AbilityKit';
import { window } from '@kit.ArkUI';

class EntryAbility extends UIAbility {
  async onWindowStageCreate(windowStage: window.WindowStage) {
    // Check whether the current system supports the vector database
    let ret = relationalStore.isVectorSupported();
    if (!ret) {
      console.error(`vectorDB is not supported.`);
      return;
    }
    // Initialize the database
  }
}

Database Creation and Configuration

Create a database instance through the getRdbStore interface:

import { BusinessError } from '@kit.BasicServicesKit';

const STORE_CONFIG: relationalStore.StoreConfig = {
  name: 'VectorTest.db',
  securityLevel: relationalStore.SecurityLevel.S1,
  vector: true // Enable vector support
};

// this.context is the UIAbility context, so this call belongs inside the ability
relationalStore.getRdbStore(this.context, STORE_CONFIG)
  .then(async (rdbStore: relationalStore.RdbStore) => {
    // Create a table with a 2-dimensional vector column
    const SQL_CREATE_TABLE = 'CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, repr floatvector(2));';
    await rdbStore.execute(SQL_CREATE_TABLE, 0, undefined);
  })
  .catch((err: BusinessError) => {
    console.error(`Get RdbStore failed: ${err.code}, ${err.message}`);
  });

Data Operation Practices

Insert Data

Data can be inserted with or without parameter binding. In the snippets below, store refers to the RdbStore instance obtained from getRdbStore:

// Parameter binding method
const vectorValue: Float32Array = Float32Array.from([1.2, 2.3]);
await store.execute("insert into test VALUES(?, ?);", 0, [0, vectorValue]);

// Non-binding method
await store.execute("insert into test VALUES(1, '[1.3, 2.4]');", 0, undefined);
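
For the non-binding method, the vector has to be written as a quoted text literal such as '[1.3, 2.4]'. The sketch below builds that literal and checks the dimension against the column definition (floatvector(2) in this article). The helper name is hypothetical, and whether the database itself rejects a dimension mismatch is not covered here, so the check is a defensive assumption.

```typescript
// Formats a vector as the quoted literal used in the non-binding insert above.
function toVectorLiteral(values: number[], expectedDim: number): string {
  if (values.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${values.length}`);
  }
  for (const v of values) {
    // Reject NaN and Infinity, which have no sensible text form in SQL.
    if (!isFinite(v)) throw new Error('vector elements must be finite numbers');
  }
  return `'[${values.join(', ')}]'`;
}

const insertSql = `insert into test VALUES(1, ${toVectorLiteral([1.3, 2.4], 2)});`;
```

Parameter binding remains the safer choice whenever the values come from user input, because it avoids assembling SQL text by hand.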

Update and Delete

// Vector update
const vectorValue1: Float32Array = Float32Array.from([2.1, 3.2]);
await store.execute("update test set repr = ? where id = ?", 0, [vectorValue1, 0]);

// Data deletion
await store.execute("delete from test where id = ?", 0, [0]);

Query Operations

Basic Query

// Parameterized query
const vectorValue2: Float32Array = Float32Array.from([6.2, 7.3]);
let resultSet = await store.querySql(
  "select id, repr <-> ? as distance from test where id > ? order by repr <-> ? limit 5;",
  [vectorValue2, 0, vectorValue2]
);

while (resultSet.goToNextRow()) {
  let id = resultSet.getValue(0);
  let dis = resultSet.getValue(1);
}
resultSet.close();
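
Conceptually, the query above filters rows, computes the distance from each remaining row to the query vector, sorts by that distance in ascending order, and keeps the first few rows. The brute-force sketch below reproduces that logic in memory. It is only an illustration with hypothetical names and sample data, it assumes <-> is the plain Euclidean distance, and it says nothing about how the database actually executes the query.

```typescript
interface VectorRow {
  id: number;
  repr: number[];
}

// Plain Euclidean distance (assumed meaning of the <-> operator).
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// In-memory equivalent of:
// select id, repr <-> ? as distance from test where id > ? order by repr <-> ? limit ?
function nearestRows(
  rows: VectorRow[],
  query: number[],
  minIdExclusive: number,
  limit: number
): { id: number; distance: number }[] {
  return rows
    .filter((r) => r.id > minIdExclusive)
    .map((r) => ({ id: r.id, distance: euclidean(r.repr, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit);
}

// Sample data, made up for this example.
const sampleRows: VectorRow[] = [
  { id: 0, repr: [6, 7] },
  { id: 1, repr: [6, 8] },
  { id: 2, repr: [0, 0] },
  { id: 3, repr: [9, 11] },
];
const nearestTwo = nearestRows(sampleRows, [6, 7], 0, 2);
```

A full scan like this costs time proportional to the number of rows, which is the motivation for the vector index described in the Advanced Features section.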

Subquery and Aggregation

// Subquery example (assumes a second table test1 with an id column already exists)
let resultSet = await store.querySql(
  "select * from test where id in (select id from test1);"
);
resultSet.close();

// Aggregation query
resultSet = await store.querySql(
  "select * from test where repr <-> '[1.0, 1.0]' > 0 group by id having max(repr <=> '[1.0, 1.0]');"
);
resultSet.close();

Advanced Features

Vector Index Optimization

Vector indexing is a key technology to improve query performance. The system supports the following index types:

Index Type	Description	Applicable Scenarios
gsdiskann	High-dimensional dense vector index	Text embeddings, image features, etc.

Index Creation Syntax

Basic syntax:

CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function);

Extended syntax (with parameters):

CREATE INDEX [basic syntax] WITH(parameter = value [, ...]);

Parameter configuration:

QUEUE_SIZE: [10, 1000], default 20.
OUT_DEGREE: [1, 1200], default 60.
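
A small helper can assemble the statement and reject parameter values outside the documented ranges before the SQL reaches the database. This is a sketch with hypothetical names. Only the L2 distance function appears in this article, so the distance function is passed through as a plain string rather than validated.

```typescript
// Hypothetical options object for the gsdiskann index parameters.
interface DiskAnnOptions {
  queueSize?: number;  // QUEUE_SIZE, documented range [10, 1000]
  outDegree?: number;  // OUT_DEGREE, documented range [1, 1200]
}

function checkRange(name: string, value: number, min: number, max: number): void {
  // The negated form also rejects NaN.
  if (!(value >= min && value <= max)) {
    throw new Error(`${name} must be in [${min}, ${max}], got ${value}`);
  }
}

// Builds a CREATE INDEX statement in the basic or extended form.
function buildDiskAnnIndexSql(
  indexName: string,
  table: string,
  column: string,
  distFunction: string,
  opts: DiskAnnOptions = {}
): string {
  const params: string[] = [];
  if (opts.queueSize !== undefined) {
    checkRange('QUEUE_SIZE', opts.queueSize, 10, 1000);
    params.push(`queue_size=${opts.queueSize}`);
  }
  if (opts.outDegree !== undefined) {
    checkRange('OUT_DEGREE', opts.outDegree, 1, 1200);
    params.push(`out_degree=${opts.outDegree}`);
  }
  const base = `CREATE INDEX ${indexName} ON ${table} USING GSDISKANN(${column} ${distFunction})`;
  return params.length > 0 ? `${base} WITH (${params.join(', ')});` : `${base};`;
}
```

Omitting both options produces the basic syntax, so the defaults (20 and 60) are left to the database rather than repeated in application code.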

Index Management Example

// Create an L2 distance index
await store.execute(
  "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2);"
);

// Delete the index so that the same name can be reused below
await store.execute("DROP INDEX test.diskann_l2_idx;");

// Re-create the index with explicit parameters
await store.execute(
  "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);"
);

Index Hit Conditions

To ensure that queries can utilize vector indexes, the following conditions must be met:

The query must be of the ORDER BY + LIMIT type.
ORDER BY can only have one vector distance sorting condition.
DESC descending order cannot be used.
The query distance metric must be consistent with that when the index was created.
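
These four conditions can be written down as a simple check over a description of the query. The sketch below uses hypothetical types and reads the second condition strictly, as "ORDER BY contains exactly one term, and that term is a vector distance". That reading is an assumption, and the function is a planning aid for application code rather than a reflection of the database's optimizer.

```typescript
// Hypothetical labels for the two metrics described in this article.
type DistanceMetric = 'L2' | 'COSINE';

interface OrderTerm {
  kind: 'vectorDistance' | 'column';
  metric?: DistanceMetric;   // only meaningful for vectorDistance terms
  descending?: boolean;
}

interface QueryShape {
  orderBy: OrderTerm[];
  hasLimit: boolean;
}

// Returns true when the query shape satisfies all four hit conditions.
function canHitVectorIndex(q: QueryShape, indexMetric: DistanceMetric): boolean {
  // 1. The query must be ORDER BY + LIMIT.
  if (!q.hasLimit || q.orderBy.length === 0) return false;
  // 2. Exactly one sort term, and it must be a vector distance (strict reading).
  if (q.orderBy.length !== 1 || q.orderBy[0].kind !== 'vectorDistance') return false;
  const term = q.orderBy[0];
  // 3. Descending order is not allowed.
  if (term.descending) return false;
  // 4. The metric must match the one used when the index was created.
  return term.metric === indexMetric;
}
```

For example, the basic query shown earlier (one <-> term, ascending, with LIMIT 5) passes this check against an index created with L2.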

Disk Fragment Management

Starting from API version 20, manual fragment recovery is supported:

// Manually trigger fragment recovery
await store.execute("PRAGMA DISKANN_ASYNC_COLLECTING;");

This function solves problems in the following scenarios:

Closing the database immediately after deleting vectors.
No subsequent operations after batch deletions.

Data Management Strategies

Data Aging Configuration

Implement automated data cleaning through table creation parameters:

Parameter	Required	Description
time_col	Yes	Time column name (integer type)
interval	No	Aging check interval (default 1 day)
ttl	No	Data retention time (default 3 months)
max_num	No	Maximum data volume limit (default 1024)

Example configuration:

await store.execute(
  "CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');"
);

Data Compression Function

TEXT columns can be stored in compressed form by naming the column in the compress_col table parameter:

await store.execute(
  "CREATE TABLE IF NOT EXISTS test3 (time integer not null, content text) with (time_col = 'time', interval = '5 minute', compress_col = 'content');"
);
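
The WITH clause used in the two examples above can also be assembled from an options object. The sketch below is hypothetical helper code. The time_col, interval, and compress_col formats follow the examples in this article, while the way ttl and max_num values are written (a quoted duration and a bare number) is an assumption that should be checked against the official documentation.

```typescript
// Hypothetical options mirroring the table parameters described above.
interface AgingOptions {
  timeCol: string;       // required: name of the integer time column
  interval?: string;     // e.g. '5 minute'
  ttl?: string;          // value format is an assumption
  maxNum?: number;       // value format is an assumption
  compressCol?: string;  // TEXT column to compress
}

// Builds the WITH (...) clause appended to CREATE TABLE.
function buildAgingClause(o: AgingOptions): string {
  const parts = [`time_col = '${o.timeCol}'`];
  if (o.interval !== undefined) parts.push(`interval = '${o.interval}'`);
  if (o.ttl !== undefined) parts.push(`ttl = '${o.ttl}'`);           // assumed syntax
  if (o.maxNum !== undefined) parts.push(`max_num = ${o.maxNum}`);   // assumed syntax
  if (o.compressCol !== undefined) parts.push(`compress_col = '${o.compressCol}'`);
  return `WITH (${parts.join(', ')})`;
}

const agingSql =
  `CREATE TABLE test2(rec_time integer not null) ${buildAgingClause({ timeCol: 'rec_time', interval: '5 minute' })};`;
```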

Summary and Outlook

As an emerging data management technology, vector databases are reshaping the data infrastructure of the AI era. This article has provided a detailed introduction to the core features, development interfaces, and advanced functions of vector databases, offering a comprehensive technical guide for developers.

With the popularity of large models and generative AI, the importance of vector databases will further increase. In the future, we can expect to see smarter indexing algorithms, more efficient query optimization, and tighter integration with AI frameworks, making vector databases the core component of AI-native applications.

Developers should pay close attention to the development of vector database technology, master its core principles and usage skills, and thus gain an advantage in AI-driven application development.
