Introduction to the HarmonyOS NEXT Database: The Vector Database
Background
At this year's HDC, I met the architect of the HarmonyOS database in the exhibition hall, who demonstrated an intelligent assistant that uses the database to provide edge-side (on-device) capabilities. I was intrigued to hear that HarmonyOS has integrated edge-side AI into the system layer, which opens up new options for developers. Afterwards I checked the updated documentation and found that HarmonyOS provides two language interfaces: ArkTS and C++. This article covers the technical characteristics, core concepts, operation interfaces, and advanced functions of the vector database, using the ArkTS interface for the examples.
Overview of Vector Database
A vector database is a database system that supports the storage, management, and retrieval of vector data, while being compatible with traditional relational data processing capabilities. Its core data type, floatvector, is used to store vectorized results, enabling the system to efficiently implement similarity search and fast retrieval functions.
Starting from API version 18, the vector database officially supports data persistence through standardized interfaces, providing developers with a reliable data storage solution.
Basic Concepts and Architecture
ResultSet Mechanism
The result set returned by a query operation is called a ResultSet, which provides a flexible way to access data, allowing developers to easily obtain the required information. The result set adopts a lazy loading strategy, loading data from the storage layer only when it is actually accessed, effectively reducing memory consumption.
Vector Data Representation
floatvector is the core data type of the vector database, used to represent high-dimensional vector data. For example, a numerical array like [1.0, 3.0, 2.4, 5.1, 6.2, 11.7] is a typical vector representation, widely used in fields such as image recognition and natural language processing.
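As a minimal sketch of preparing such an array for storage, the helper below checks the dimension and rejects non-finite components before producing a Float32Array, the type that the binding examples in this article pass to floatvector columns. The name toFloatVector is hypothetical and not part of any HarmonyOS API. Note that Float32Array holds 32-bit floats, so a value such as 2.4 is rounded on conversion.

```typescript
// Hypothetical helper, not part of any HarmonyOS API: validates a plain number
// array and converts it to the Float32Array form used for parameter binding.
function toFloatVector(values: number[], dimension: number): Float32Array {
  if (values.length !== dimension) {
    throw new Error(`expected ${dimension} components, got ${values.length}`);
  }
  for (const v of values) {
    if (!Number.isFinite(v)) {
      throw new Error(`vector component is not a finite number: ${v}`);
    }
  }
  return Float32Array.from(values);
}

const vec = toFloatVector([1.0, 3.0, 2.4, 5.1, 6.2, 11.7], 6);
// 1.0 and 3.0 are exactly representable in 32 bits; 2.4 is stored rounded.
const roundingError = Math.abs(vec[2] - 2.4);
```

Validating the dimension up front gives a clearer error than letting a mismatched vector reach a column declared with a fixed size such as floatvector(2).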
System Constraints and Limitations
The vector database considers the balance between performance and resources in its design, with the following key constraints:
- Logging Mode: The default is WAL (Write Ahead Log) mode to ensure the atomicity and durability of data writing.
- Disk Persistence Strategy: The FULL mode ensures that data is completely written to the storage medium.
- Connection Management: The system maintains 4 read connections and 1 write connection by default, using connection pool technology to optimize resource usage.
- Write Concurrency: Only one write operation is supported at a time, and concurrent write requests are automatically serialized.
- Data Size Limit: It is recommended that a single record not exceed 2 MB; larger records may fail to be read.
Data Cleaning Mechanism
When an application is uninstalled, the relevant database files and temporary files on the device are automatically cleared. This design simplifies application lifecycle management and avoids residual file issues.
Data Types and Constraints
Supported Data Types
The vector database supports a rich set of field types to meet diverse data storage needs:
Type | Description | Supported |
---|---|---|
NULL | Null value | Yes |
INTEGER | Integer type | Yes |
DOUBLE | Floating-point type | Yes |
TEXT | String type | Yes |
BLOB | Binary type | Yes |
FLOATVECTOR | Vector data type | Yes |
Field Constraint Mechanism
To ensure data integrity, the system provides various field constraints:
- NOT NULL: Ensures that the field is not empty.
- DEFAULT: Sets the default value for the field.
- UNIQUE: Ensures the uniqueness of the field value.
- PRIMARY KEY: Defines the primary key index.
Note: The system currently does not support FOREIGN KEY constraints or CHECK constraints.
Query Language Features
Supported Query Clauses
The vector database supports rich SQL query clauses:
- WHERE: Conditional filtering.
- LIMIT: Result quantity limitation.
- ORDER BY: Multi-column sorting, especially supporting vector distance sorting.
- GROUP BY: Data grouping.
- HAVING: Aggregate result filtering.
- INDEXED BY: Forces the use of a specific index.
- DISTINCT: Deduplication (not yet supported).
It is particularly noteworthy that the vector distance sorting function supports two distance metrics:
- <->: L2 Euclidean distance.
- <=>: Cosine similarity.
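For checking query results by hand, the two metrics can be sketched in plain TypeScript. This is a reference sketch under stated assumptions, not the database's implementation: it assumes <-> is the square root of the summed squared differences, and that <=> is evaluated as cosine distance (1 minus cosine similarity), the convention under which an ascending ORDER BY returns the most similar vectors first. The function names are illustrative only.

```typescript
// Reference implementations for sanity-checking results. Whether the database
// computes exactly these formulas is an assumption, as noted above.
function l2Distance(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function cosineDistance(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine distance is undefined for a zero vector");
  }
  // 0 for vectors pointing the same way, 1 for orthogonal, 2 for opposite.
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

L2 distance is sensitive to vector magnitude, while the cosine measure depends only on direction, which is why the metric chosen for a query has to suit how the embeddings were produced.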
Set Operations
Supports standard set operations:
- UNION: Merges results and removes duplicates.
- UNION ALL: Merges results and retains duplicate items.
Operator System
The system provides comprehensive operator support:
- Arithmetic Operations: +, -, *, /, %.
- Comparison Operations: ==, =, !=, >, >=, <, <=.
- Logical Operations: AND, BETWEEN, EXISTS, etc. (12 types in total).
- String Concatenation: ||.
- Bitwise Operations: &, |, ~, <<, >>.
- Vector Distance Operations: <->, <=> (can be used inside aggregate functions).
Time and Date Functions
Several time-processing functions are built in:
Function | Description | Format |
---|---|---|
DATE | Date | "YYYY-MM-DD" |
TIME | Time | "HH:MM:SS" |
DATETIME | Date and time | "YYYY-MM-DD HH:MM:SS" |
JULIANDAY | Julian day | Number of days |
STRFTIME | Format date | Custom format |
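The "number of days" returned by JULIANDAY can be made concrete with a short worked conversion. The sketch below assumes the function follows the usual astronomical Julian day convention, as SQLite's function of the same name does; the article does not confirm this. Under that convention the Unix epoch (1970-01-01 00:00 UTC) is Julian day 2440587.5, so a JavaScript Date converts with one division and one addition. The helper name toJulianDay is illustrative.

```typescript
// Julian day of the Unix epoch, 1970-01-01T00:00:00Z.
const UNIX_EPOCH_JULIAN_DAY = 2440587.5;
const MS_PER_DAY = 86400000;

// Converts a Date to a fractional Julian day (UTC based).
function toJulianDay(date: Date): number {
  return date.getTime() / MS_PER_DAY + UNIX_EPOCH_JULIAN_DAY;
}

// Noon UTC on 1 January 2000 is the J2000 epoch, Julian day 2451545.
const j2000 = toJulianDay(new Date(Date.UTC(2000, 0, 1, 12, 0, 0)));
```

Because Julian days start at noon, a date at midnight UTC always ends in .5.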
Aggregation and Analysis Functions
The system provides the following aggregate and scalar functions:
- COUNT: Row count statistics.
- MAX/MIN: Extreme value calculation.
- AVG: Average value.
- SUM: Total sum.
- RANDOM: Random number generation.
- ABS: Absolute value.
- UPPER/LOWER: String case conversion.
- LENGTH: String length.
These functions can be combined with vector operations to implement complex data analysis requirements.
Development Interfaces and Practices
Environment Detection and Initialization
The first step in development is to detect whether the system supports the vector database:
import { relationalStore } from '@kit.ArkData';
import { UIAbility } from '@kit.AbilityKit';
import { window } from '@kit.ArkUI';

class EntryAbility extends UIAbility {
  async onWindowStageCreate(windowStage: window.WindowStage) {
    // Stop early if the device or API level does not support the vector database
    let ret = relationalStore.isVectorSupported();
    if (!ret) {
      console.error(`vectorDB is not supported.`);
      return;
    }
    // Initialize the database
  }
}
Database Creation and Configuration
Create a database instance through the getRdbStore interface:
import { BusinessError } from '@kit.BasicServicesKit';

const STORE_CONFIG: relationalStore.StoreConfig = {
  name: 'VectorTest.db',
  securityLevel: relationalStore.SecurityLevel.S1,
  vector: true // Enable vector support
};
relationalStore.getRdbStore(this.context, STORE_CONFIG)
  .then(async (rdbStore: relationalStore.RdbStore) => {
    // Create a table with a 2-dimensional floatvector column
    const SQL_CREATE_TABLE = 'CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, repr floatvector(2));';
    await rdbStore.execute(SQL_CREATE_TABLE, 0, undefined);
  })
  .catch((err: BusinessError) => {
    console.error(`Get RdbStore failed: ${err.code}, ${err.message}`);
  });
Data Operation Practices
Insert Data
Data can be inserted with or without parameter binding. In the following snippets, store is the RdbStore instance obtained from getRdbStore:
// Parameter binding method
const vectorValue: Float32Array = Float32Array.from([1.2, 2.3]);
await store.execute("insert into test VALUES(?, ?);", 0, [0, vectorValue]);
// Non-binding method
await store.execute("insert into test VALUES(1, '[1.3, 2.4]');", 0, undefined);
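When the non-binding form is used with values computed at runtime, the bracketed literal has to be built as text. The helper below is a minimal sketch of that step; toVectorLiteral is a hypothetical name, and the output format is simply copied from the '[1.3, 2.4]' literal shown above. It takes a plain number array rather than a Float32Array, because 32-bit rounding would otherwise leak into the text (1.3 would print as 1.2999999523162842).

```typescript
// Hypothetical helper: formats a number array as the bracketed text literal
// used in the non-binding INSERT, e.g. [1.3, 2.4].
function toVectorLiteral(values: number[]): string {
  if (values.length === 0) {
    throw new Error("vector must have at least one component");
  }
  for (const v of values) {
    if (!Number.isFinite(v)) {
      throw new Error(`vector component is not a finite number: ${v}`);
    }
  }
  return `[${values.join(", ")}]`;
}

// Reproduces the statement from the non-binding example.
const sql = `insert into test VALUES(1, '${toVectorLiteral([1.3, 2.4])}');`;
```

Very large or very small components are printed in exponent notation (for example 1e-7); whether the SQL parser accepts that form is not stated in the article, so parameter binding remains the safer choice for arbitrary data.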
Update and Delete
// Vector update
const vectorValue1: Float32Array = Float32Array.from([2.1, 3.2]);
await store.execute("update test set repr = ? where id = ?", 0, [vectorValue1, 0]);
// Data deletion
await store.execute("delete from test where id = ?", 0, [0]);
Query Operations
Basic Query
// Parameterized query: up to 5 rows with id > 0, nearest to vectorValue2 by L2 distance
const vectorValue2: Float32Array = Float32Array.from([6.2, 7.3]);
let resultSet = await store.querySql(
  "select id, repr <-> ? as distance from test where id > ? order by repr <-> ? limit 5;",
  [vectorValue2, 0, vectorValue2]
);
while (resultSet.goToNextRow()) {
  let id = resultSet.getValue(0);  // id column
  let dis = resultSet.getValue(1); // computed distance
}
// Release the result set once iteration is finished
resultSet.close();
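The iteration pattern can be wrapped so that rows are collected into typed objects and close() runs even when reading a value throws. The sketch below is plain TypeScript: RowCursor is a hypothetical structural interface that lists only the three methods used in this article (goToNextRow, getValue, close), and collectNeighbors and Neighbor are illustrative names. ArkTS restricts structural typing, so in an actual ArkTS module the parameter would more likely be typed as relationalStore.ResultSet directly.

```typescript
// Hypothetical minimal view of a result set: only the methods used in this article.
interface RowCursor {
  goToNextRow(): boolean;
  getValue(columnIndex: number): unknown;
  close(): void;
}

interface Neighbor {
  id: number;
  distance: number;
}

// Reads (id, distance) rows, matching the column order of the query above.
function collectNeighbors(cursor: RowCursor): Neighbor[] {
  const rows: Neighbor[] = [];
  try {
    while (cursor.goToNextRow()) {
      rows.push({
        id: Number(cursor.getValue(0)),
        distance: Number(cursor.getValue(1)),
      });
    }
  } finally {
    // Always release the cursor, including when getValue throws.
    cursor.close();
  }
  return rows;
}
```

Because the result set loads data lazily, copying the needed columns into plain objects before closing keeps the values usable after the cursor is released.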
Subquery and Aggregation
// Subquery example (assumes a second table, test1, with an id column already exists)
let resultSet = await store.querySql(
  "select * from test where id in (select id from test1);"
);
resultSet.close();
// Aggregation query: vector distance operators used in WHERE and inside max()
resultSet = await store.querySql(
  "select * from test where repr <-> '[1.0, 1.0]' > 0 group by id having max(repr <=> '[1.0, 1.0]');"
);
resultSet.close();
Advanced Features
Vector Index Optimization
Vector indexing is a key technology to improve query performance. The system supports the following index types:
Index Type | Description | Applicable Scenarios |
---|---|---|
gsdiskann | High-dimensional dense vector index | Text embeddings, image features, etc. |
Index Creation Syntax
Basic syntax:
CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function);
Extended syntax (with parameters):
CREATE INDEX [basic syntax] WITH(parameter = value [, ...]);
Parameter configuration:
- QUEUE_SIZE: value range [10, 1000], default 20.
- OUT_DEGREE: value range [1, 1200], default 60.
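The syntax and parameter ranges above can be combined into a small statement builder that rejects out-of-range values before they reach the database. This is a sketch under assumptions: buildCreateIndexSql and its option names are hypothetical, the parameters are assumed to be integers, and the identifier check is a conservative guess rather than the database's actual naming rule. Only the L2 distance function appears in this article, so distFunction is passed through as given.

```typescript
// Hypothetical options object; field names are illustrative only.
interface DiskAnnIndexOptions {
  indexName: string;
  table: string;
  column: string;
  distFunction: string; // e.g. "L2", the only value shown in this article
  queueSize?: number;   // QUEUE_SIZE, [10, 1000]
  outDegree?: number;   // OUT_DEGREE, [1, 1200]
}

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function checkRange(name: string, value: number, min: number, max: number): void {
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${name} must be an integer in [${min}, ${max}], got ${value}`);
  }
}

function buildCreateIndexSql(o: DiskAnnIndexOptions): string {
  for (const id of [o.indexName, o.table, o.column, o.distFunction]) {
    if (!IDENTIFIER.test(id)) throw new Error(`invalid identifier: ${id}`);
  }
  const params: string[] = [];
  if (o.queueSize !== undefined) {
    checkRange("QUEUE_SIZE", o.queueSize, 10, 1000);
    params.push(`queue_size=${o.queueSize}`);
  }
  if (o.outDegree !== undefined) {
    checkRange("OUT_DEGREE", o.outDegree, 1, 1200);
    params.push(`out_degree=${o.outDegree}`);
  }
  const base =
    `CREATE INDEX IF NOT EXISTS ${o.indexName} ON ${o.table} ` +
    `USING GSDISKANN(${o.column} ${o.distFunction})`;
  // Omitted parameters fall back to the database defaults (20 and 60).
  return params.length > 0 ? `${base} WITH (${params.join(", ")});` : `${base};`;
}
```

The returned string can be passed to store.execute in the same way as the hand-written statements in this article.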
Index Management Example
// Create an L2 distance index
await store.execute(
  "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2);"
);
// Delete the index so that the same name can be reused below
await store.execute("DROP INDEX test.diskann_l2_idx;");
// Recreate the index with explicit parameters
await store.execute(
  "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);"
);
Index Hit Conditions
To ensure that queries can utilize vector indexes, the following conditions must be met:
- The query must be of the ORDER BY + LIMIT type.
- ORDER BY can only have one vector distance sorting condition.
- DESC descending order cannot be used.
- The query distance metric must be consistent with that when the index was created.
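The four conditions above can be written down as a checklist function, which is handy in tests that guard against accidentally bypassing the index. This is an illustrative sketch: QueryShape, OrderTerm, and indexHitProblems are hypothetical names describing a query by hand, not something the database exposes, and the second condition is read here as "the ORDER BY clause holds a single sort key, and that key is a vector distance", which is one interpretation of the rule.

```typescript
interface OrderTerm {
  // Distance function of a vector sort key (e.g. "L2"), or null for an ordinary column.
  metric: string | null;
  descending: boolean;
}

interface QueryShape {
  orderBy: OrderTerm[];
  hasLimit: boolean;
}

// Returns the list of violated conditions; an empty list means the query
// satisfies every condition listed above.
function indexHitProblems(query: QueryShape, indexMetric: string): string[] {
  const problems: string[] = [];
  if (query.orderBy.length === 0 || !query.hasLimit) {
    problems.push("query must combine ORDER BY with LIMIT");
  }
  const vectorTerms = query.orderBy.filter((t) => t.metric !== null);
  if (vectorTerms.length !== 1 || query.orderBy.length !== 1) {
    problems.push("ORDER BY must consist of exactly one vector distance term");
  }
  if (vectorTerms.some((t) => t.descending)) {
    problems.push("DESC is not allowed on the vector distance term");
  }
  if (vectorTerms.some((t) => t.metric !== indexMetric)) {
    problems.push("distance metric differs from the one the index was created with");
  }
  return problems;
}

const ok = indexHitProblems(
  { orderBy: [{ metric: "L2", descending: false }], hasLimit: true }, "L2");
const bad = indexHitProblems(
  { orderBy: [{ metric: "L2", descending: true }], hasLimit: false }, "L2");
```

Here ok is empty, while bad reports the missing LIMIT and the DESC ordering.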
Disk Fragment Management
Starting from API version 20, manual fragment recovery is supported:
// Manually trigger fragment recovery
await store.execute("PRAGMA DISKANN_ASYNC_COLLECTING;");
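Triggering recovery manually is intended for scenarios in which fragments left behind by deletions would otherwise have little opportunity to be reclaimed:
- The database is closed immediately after vectors are deleted.
- No further operations follow a batch deletion.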
Data Management Strategies
Data Aging Configuration
Automatic data aging is configured through parameters in the WITH clause of the CREATE TABLE statement:
Parameter | Required | Description |
---|---|---|
time_col | Yes | Time column name (integer type) |
interval | No | Aging check interval (default 1 day) |
ttl | No | Data retention time (default 3 months) |
max_num | No | Maximum data volume limit (default 1024) |
Example configuration:
await store.execute(
"CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');"
);
Data Compression Function
Columns of type TEXT can be stored in compressed form by naming them in the compress_col table parameter:
await store.execute(
"CREATE TABLE IF NOT EXISTS test3 (time integer not null, content text) with (time_col = 'time', interval = '5 minute', compress_col = 'content');"
);
Summary and Outlook
Vector databases are becoming part of the data infrastructure for AI applications. This article has introduced the core features, development interfaces, and advanced functions of the HarmonyOS vector database: the floatvector type, the supported SQL, the ArkTS operations, gsdiskann indexing, and data aging and compression.
As large models and generative AI become more widespread, vector databases are likely to grow in importance. Future versions may bring smarter indexing algorithms, more efficient query optimization, and tighter integration with AI frameworks.
Developers building AI-driven applications should follow the development of this technology and become familiar with its core principles and usage.