kouwei qing

Introduction to HarmonyOS Next Data Base Vector Database

Background

At this year's HDC, I met the architect of the HarmonyOS data base in the exhibition hall, who introduced an intelligent assistant that builds on the data base to deliver edge-side capabilities. It was exciting to hear that HarmonyOS has integrated edge-side AI into the system layer, which opens up new possibilities for developers. After returning, I went through the updated documentation and found that the vector database is available through two language interfaces: ArkTS and C++. This article covers the technical characteristics, core concepts, operation interfaces, and advanced functions of the vector database, with examples in ArkTS.

Overview of Vector Database

A vector database is a database system that stores, manages, and retrieves vector data while remaining compatible with traditional relational data processing. Its core data type, floatvector, stores vectorized results, which lets the system run similarity searches and fast retrieval efficiently.

Starting from API version 18, the vector database officially supports data persistence through standardized interfaces, providing developers with a reliable data storage solution.

Basic Concepts and Architecture

ResultSet Mechanism

The result set returned by a query operation is called a ResultSet, which provides a flexible way to access data, allowing developers to easily obtain the required information. The result set adopts a lazy loading strategy, loading data from the storage layer only when it is actually accessed, effectively reducing memory consumption.

Vector Data Representation

floatvector is the core data type of the vector database, used to represent high-dimensional vector data. For example, a numerical array like [1.0, 3.0, 2.4, 5.1, 6.2, 11.7] is a typical vector representation, widely used in fields such as image recognition and natural language processing.

System Constraints and Limitations

The vector database considers the balance between performance and resources in its design, with the following key constraints:

  • Logging Mode: The default is WAL (Write Ahead Log) mode to ensure the atomicity and durability of data writing.
  • Disk Persistence Strategy: The FULL mode ensures that data is completely written to the storage medium.
  • Connection Management: The system maintains 4 read connections and 1 write connection by default, using connection pool technology to optimize resource usage.
  • Write Concurrency: Only one write operation is supported at a time, and concurrent write requests are automatically serialized.
  • Data Size Limit: A single record should not exceed 2 MB; larger records may fail to read.
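
The size recommendation can be checked before inserting a large vector. The following is a minimal sketch, not part of the HarmonyOS API: the helper names are hypothetical, each vector element is assumed to take 4 bytes (the ArkTS examples bind Float32Array values), 2 MB is assumed to mean 2 × 1024 × 1024 bytes, and any per-row storage overhead is ignored.

```typescript
// Assumption: the 2 MB recommendation is read as 2 * 1024 * 1024 bytes.
const RECOMMENDED_MAX_BYTES = 2 * 1024 * 1024;

// Assumption: every vector element is stored as a 32-bit float.
const BYTES_PER_FLOAT32 = 4;

// Rough payload size of a single vector with the given number of dimensions.
function estimateVectorBytes(dimension: number): number {
  return dimension * BYTES_PER_FLOAT32;
}

// True when the vector payload alone stays within the recommended size.
function fitsRecommendedSize(dimension: number): boolean {
  return estimateVectorBytes(dimension) <= RECOMMENDED_MAX_BYTES;
}
```

A 1024-dimensional vector needs about 4 KB under these assumptions, so typical embedding sizes are far below the limit; the check matters mainly when large TEXT or BLOB columns sit in the same row.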

Data Cleaning Mechanism

When an application is uninstalled, the relevant database files and temporary files on the device are automatically cleared. This design simplifies application lifecycle management and avoids residual file issues.

Data Types and Constraints

Supported Data Types

The vector database supports a rich set of field types to meet diverse data storage needs:

Type | Description | Supported
NULL | Null value | Yes
INTEGER | Integer type | Yes
DOUBLE | Floating-point type | Yes
TEXT | String type | Yes
BLOB | Binary type | Yes
FLOATVECTOR | Vector data type | Yes

Field Constraint Mechanism

To ensure data integrity, the system provides various field constraints:

  • NOT NULL: Ensures that the field is not empty.
  • DEFAULT: Sets the default value for the field.
  • UNIQUE: Ensures the uniqueness of the field value.
  • PRIMARY KEY: Defines the primary key index.

Note: The system currently does not support foreign key constraints (FOREIGN KEY) or CHECK constraints.
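
Because unsupported constraints only surface as errors at table creation time, it can help to scan a DDL string up front. The helper below is a hypothetical sketch, not a HarmonyOS API; it is a naive text match and may also flag the keywords when they appear inside string literals or identifiers.

```typescript
// Returns the names of constraints found in the DDL text that the vector
// database does not support. Naive keyword scan, not a SQL parser.
function findUnsupportedConstraints(ddl: string): string[] {
  const found: string[] = [];
  if (/\bFOREIGN\s+KEY\b/i.test(ddl)) {
    found.push('FOREIGN KEY');
  }
  if (/\bCHECK\s*\(/i.test(ddl)) {
    found.push('CHECK');
  }
  return found;
}

// The table used later in this article passes the scan.
const unsupported = findUnsupportedConstraints(
  'CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, repr floatvector(2));'
);
```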

Query Language Features

Supported Query Clauses

The vector database supports rich SQL query clauses:

  • WHERE: Conditional filtering.
  • LIMIT: Result quantity limitation.
  • ORDER BY: Multi-column sorting, especially supporting vector distance sorting.
  • GROUP BY: Data grouping.
  • HAVING: Aggregate result filtering.
  • INDEXED BY: Forces the use of a specific index.
  • DISTINCT: Deduplication (not yet supported).

It is particularly noteworthy that the vector distance sorting function supports two distance metrics:

  • <->: L2 Euclidean distance.
  • <=>: Cosine distance.
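
To make the two metrics concrete, here is a small reference implementation in plain TypeScript. It is only a sketch for reasoning about results: it assumes that <=> yields a cosine distance defined as 1 minus the cosine similarity, which this article does not spell out, and the function names are my own.

```typescript
// Euclidean (L2) distance: square root of the summed squared differences.
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Cosine distance, assumed here to be 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine distance is undefined for a zero vector');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

With both metrics, a smaller value means a closer match, which is why ascending ORDER BY returns the nearest vectors first. L2 is sensitive to vector length, while cosine distance only compares direction.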

Set Operations

The following standard set operations are supported:

  • UNION: Merges results and removes duplicates.
  • UNION ALL: Merges results and retains duplicate items.

Operator System

The system provides comprehensive operator support:

  • Arithmetic Operations: +, -, *, /, %.
  • Comparison Operations: ==, =, !=, >, >=, <, <=.
  • Logical Operations: AND, BETWEEN, EXISTS, etc. (12 types in total).
  • String Concatenation: ||.
  • Bitwise Operations: &, |, ~, <<, >>.
  • Vector Distance Operations: <->, <=> (can be used in aggregate functions).

Time and Date Functions

Several time processing functions are built in:

Function | Description | Format
DATE | Date | "YYYY-MM-DD"
TIME | Time | "HH:MM:SS"
DATETIME | Date and time | "YYYY-MM-DD HH:MM:SS"
JULIANDAY | Julian day | Number of days
STRFTIME | Format date | Custom format

Aggregation and Analysis Functions

The system provides rich analysis functions:

  • COUNT: Row count statistics.
  • MAX/MIN: Extreme value calculation.
  • AVG: Average value.
  • SUM: Total sum.
  • RANDOM: Random number generation.
  • ABS: Absolute value.
  • UPPER/LOWER: String case conversion.
  • LENGTH: String length.

These functions can be combined with vector operations to implement complex data analysis requirements.

Development Interfaces and Practices

Environment Detection and Initialization

The first step in development is to detect whether the system supports the vector database:

import { relationalStore } from '@kit.ArkData';
import { UIAbility } from '@kit.AbilityKit';
import { window } from '@kit.ArkUI';

class EntryAbility extends UIAbility {
  async onWindowStageCreate(windowStage: window.WindowStage) {
    // Check whether the current system supports the vector database
    let ret = relationalStore.isVectorSupported();
    if (!ret) {
      console.error(`vectorDB is not supported.`);
      return;
    }
    // Initialize the database
  }
}

Database Creation and Configuration

Create a database instance through the getRdbStore interface:

import { BusinessError } from '@kit.BasicServicesKit';

const STORE_CONFIG: relationalStore.StoreConfig = {
  name: 'VectorTest.db',
  securityLevel: relationalStore.SecurityLevel.S1,
  vector: true // Enable vector support
};

relationalStore.getRdbStore(this.context, STORE_CONFIG)
  .then(async (rdbStore) => {
    // Create a table with an integer primary key and a 2-dimensional vector column
    const SQL_CREATE_TABLE = 'CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, repr floatvector(2));';
    await rdbStore.execute(SQL_CREATE_TABLE, 0, undefined);
  })
  .catch((err: BusinessError) => {
    console.error(`Get RdbStore failed: ${err.code}, ${err.message}`);
  });

Data Operation Practices

Insert Data

Data can be inserted in two ways, with or without parameter binding:

// Parameter binding method
const vectorValue: Float32Array = Float32Array.from([1.2, 2.3]);
await store.execute("insert into test VALUES(?, ?);", 0, [0, vectorValue]);

// Non-binding method
await store.execute("insert into test VALUES(1, '[1.3, 2.4]');", 0, undefined);
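
For the non-binding form, the vector has to be written as a bracketed text literal. A small helper can build that literal from numbers and catch dimension mistakes early. This is a sketch with a hypothetical helper name; for values that come from outside the app, parameter binding remains the safer choice.

```typescript
// Formats numbers as the bracketed literal used in the non-binding insert,
// for example [1.3, 2.4]. Rejects NaN, Infinity, and dimension mismatches.
function toVectorLiteral(values: number[], dimension: number): string {
  if (values.length !== dimension) {
    throw new Error(`Expected ${dimension} values, got ${values.length}`);
  }
  for (const value of values) {
    if (!isFinite(value)) {
      throw new Error('Vector values must be finite numbers');
    }
  }
  return `[${values.join(', ')}]`;
}

// Builds the same statement as the non-binding example for floatvector(2).
const insertSql = `insert into test VALUES(1, '${toVectorLiteral([1.3, 2.4], 2)}');`;
```

The dimension argument should match the size declared in the column type, which is 2 for the test table in this article.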

Update and Delete

// Vector update
const vectorValue1: Float32Array = Float32Array.from([2.1, 3.2]);
await store.execute("update test set repr = ? where id = ?", 0, [vectorValue1, 0]);

// Data deletion
await store.execute("delete from test where id = ?", 0, [0]);

Query Operations

Basic Query

// Parameterized query
const vectorValue2: Float32Array = Float32Array.from([6.2, 7.3]);
let resultSet = await store.querySql(
  "select id, repr <-> ? as distance from test where id > ? order by repr <-> ? limit 5;",
  [vectorValue2, 0, vectorValue2]
);

while (resultSet.goToNextRow()) {
  let id = resultSet.getValue(0);
  let dis = resultSet.getValue(1);
}
resultSet.close();
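
To see what the parameterized query computes, the same filter, sort, and limit can be written as a brute-force scan over an in-memory array. This is only a reference sketch for checking expected results on small data; the Row and Match types and the function names are hypothetical, and it says nothing about how the database executes the query.

```typescript
interface Row {
  id: number;
  repr: number[];
}

interface Match {
  id: number;
  distance: number;
}

// L2 distance between two vectors of equal length.
function l2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return Math.sqrt(sum);
}

// Each step mirrors one clause of the SQL statement.
function nearestRows(rows: Row[], query: number[], minId: number, limit: number): Match[] {
  return rows
    .filter((row) => row.id > minId) // where id > ?
    .map((row) => ({ id: row.id, distance: l2(row.repr, query) })) // repr <-> ? as distance
    .sort((x, y) => x.distance - y.distance) // order by repr <-> ?
    .slice(0, limit); // limit 5
}
```

Note that the SQL version binds the query vector twice, once for the selected distance column and once for the ORDER BY expression, which is why vectorValue2 appears twice in the argument list.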

Subquery and Aggregation

// Subquery example (assumes a second table, test1, that also has an id column)
let resultSet = await store.querySql(
  "select * from test where id in (select id from test1);"
);
resultSet.close();

// Aggregation query
resultSet = await store.querySql(
  "select * from test where repr <-> '[1.0, 1.0]' > 0 group by id having max(repr <=> '[1.0, 1.0]');"
);
resultSet.close();

Advanced Features

Vector Index Optimization

Vector indexing is a key technology to improve query performance. The system supports the following index types:

Index Type | Description | Applicable Scenarios
gsdiskann | High-dimensional dense vector index | Text embeddings, image features, etc.

Index Creation Syntax

Basic syntax:

CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function);

Extended syntax (with parameters):

CREATE INDEX [basic syntax] WITH(parameter = value [, ...]);

Parameter configuration:

  • QUEUE_SIZE: [10, 1000], default 20.
  • OUT_DEGREE: [1, 1200], default 60.
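
The syntax and parameter ranges above can be wrapped in a small builder that validates values before the statement reaches the database. This is a sketch with hypothetical names, not a HarmonyOS API. The identifiers are inserted into the SQL text as given, so they should come from trusted code rather than user input.

```typescript
interface DiskAnnOptions {
  queueSize?: number; // QUEUE_SIZE, documented range [10, 1000]
  outDegree?: number; // OUT_DEGREE, documented range [1, 1200]
}

function assertInRange(name: string, value: number, min: number, max: number): void {
  if (value < min || value > max) {
    throw new RangeError(`${name} must be in [${min}, ${max}], got ${value}`);
  }
}

// Builds a CREATE INDEX statement for a gsdiskann index.
function buildDiskAnnIndexSql(
  indexName: string,
  table: string,
  column: string,
  distFunction: string,
  options: DiskAnnOptions = {}
): string {
  const parts: string[] = [];
  if (options.queueSize !== undefined) {
    assertInRange('QUEUE_SIZE', options.queueSize, 10, 1000);
    parts.push(`queue_size=${options.queueSize}`);
  }
  if (options.outDegree !== undefined) {
    assertInRange('OUT_DEGREE', options.outDegree, 1, 1200);
    parts.push(`out_degree=${options.outDegree}`);
  }
  const base = `CREATE INDEX IF NOT EXISTS ${indexName} ON ${table} USING GSDISKANN(${column} ${distFunction})`;
  return parts.length > 0 ? `${base} WITH (${parts.join(', ')});` : `${base};`;
}
```

Calling buildDiskAnnIndexSql('diskann_l2_idx', 'test', 'repr', 'L2', { queueSize: 20, outDegree: 50 }) produces a statement in the extended syntax, and omitting the options produces the basic form.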

Index Management Example

// Create an L2 distance index
await store.execute(
  "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2);"
);

// Delete the index so that the same name can be reused below
await store.execute("DROP INDEX test.diskann_l2_idx;");

// Recreate the index with explicit parameters
await store.execute(
  "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);"
);

Index Hit Conditions

To ensure that queries can utilize vector indexes, the following conditions must be met:

  1. The query must be of the ORDER BY + LIMIT type.
  2. ORDER BY can only have one vector distance sorting condition.
  3. DESC descending order cannot be used.
  4. The query distance metric must be consistent with that when the index was created.
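
These four conditions can be expressed as a simple checklist function, which is handy in code review or in unit tests for query builders. The sketch below is hypothetical: it works on a hand-written description of the query rather than parsing SQL, it identifies the index metric by the matching operator (<-> for an L2 index), and it reads condition 2 as "ORDER BY consists of exactly one term, and that term is a vector distance", which is my interpretation.

```typescript
type DistanceOperator = '<->' | '<=>';

interface OrderTerm {
  // Set for a vector distance term such as `repr <-> ?`; undefined for a plain column.
  distanceOperator?: DistanceOperator;
  descending?: boolean;
}

interface QueryShape {
  orderBy: OrderTerm[];
  hasLimit: boolean;
}

// Returns the list of violated conditions; an empty list means the query
// shape is eligible to use the vector index.
function indexHitProblems(query: QueryShape, indexOperator: DistanceOperator): string[] {
  const problems: string[] = [];
  if (query.orderBy.length === 0 || !query.hasLimit) {
    problems.push('query must combine ORDER BY with LIMIT');
  }
  const distanceTerms = query.orderBy.filter((term) => term.distanceOperator !== undefined);
  if (query.orderBy.length > 0 && (query.orderBy.length !== 1 || distanceTerms.length !== 1)) {
    problems.push('ORDER BY must consist of a single vector distance term');
  }
  if (distanceTerms.some((term) => term.descending === true)) {
    problems.push('DESC is not allowed on the distance term');
  }
  if (distanceTerms.some((term) => term.distanceOperator !== indexOperator)) {
    problems.push('distance operator differs from the index metric');
  }
  return problems;
}
```

The basic query shown earlier (one `repr <-> ?` term, ascending, with LIMIT 5) passes all four checks against an L2 index.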

Disk Fragment Management

Starting from API version 20, manual fragment recovery is supported:

// Manually trigger fragment recovery
await store.execute("PRAGMA DISKANN_ASYNC_COLLECTING;");

Manual recovery is intended for cases where space freed by deleted vectors might otherwise not be reclaimed, such as:

  • Closing the database immediately after deleting vectors.
  • No subsequent operations after batch deletions.

Data Management Strategies

Data Aging Configuration

Implement automated data cleaning through table creation parameters:

Parameter | Required | Description
time_col | Yes | Time column name (integer type)
interval | No | Aging check interval (default 1 day)
ttl | No | Data retention time (default 3 months)
max_num | No | Maximum data volume limit (default 1024)

Example configuration:

await store.execute(
  "CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');"
);

Data Compression Function

Columns of type TEXT can be stored in compressed form:

await store.execute(
  "CREATE TABLE IF NOT EXISTS test3 (time integer not null, content text) with (time_col = 'time', interval = '5 minute', compress_col = 'content');"
);
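
Since the aging and compression settings share one WITH clause, a small builder keeps the option names in one place. This is a sketch with hypothetical names. It only covers time_col, interval, and compress_col, because those are the options whose value syntax appears in the examples above; ttl and max_num are left out rather than guessing their format.

```typescript
interface TableOptions {
  timeCol: string; // required: name of the integer time column
  interval?: string; // for example '5 minute'
  compressCol?: string; // name of the TEXT column to compress
}

// Builds the WITH clause appended to a CREATE TABLE statement.
function buildWithClause(options: TableOptions): string {
  const parts: string[] = [`time_col = '${options.timeCol}'`];
  if (options.interval !== undefined) {
    parts.push(`interval = '${options.interval}'`);
  }
  if (options.compressCol !== undefined) {
    parts.push(`compress_col = '${options.compressCol}'`);
  }
  return `WITH (${parts.join(', ')})`;
}

// Reproduces the options of the test3 table above.
const test3Options = buildWithClause({ timeCol: 'time', interval: '5 minute', compressCol: 'content' });
```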

Summary and Outlook

As an emerging data management technology, vector databases are reshaping the data infrastructure of the AI era. This article has provided a detailed introduction to the core features, development interfaces, and advanced functions of vector databases, offering a comprehensive technical guide for developers.

With the popularity of large models and generative AI, the importance of vector databases will further increase. In the future, we can expect to see smarter indexing algorithms, more efficient query optimization, and tighter integration with AI frameworks, making vector databases the core component of AI-native applications.

Developers should pay close attention to the development of vector database technology, master its core principles and usage skills, and thus gain an advantage in AI-driven application development.
