DEV Community

AXUM中文博客

Chinese Full-Text Search in PostgreSQL with the Zhparser Extension

Docker container

docker run \
        --name postgres \
        -e POSTGRES_PASSWORD=postgres \
        -e TZ=PRC \
        --restart=always \
        -e PGDATA=/var/lib/postgresql/data/pgdata \
        -v /var/docker/postgres:/var/lib/postgresql/data \
        -p 5432:5432 \
        -d postgres

docker exec -it postgres bash # open a shell in the PG container

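Compiling and installing Zhparser

All of the following steps are run inside the PG container.

Install the dependencies:

Change postgresql-server-dev-17 to match your PostgreSQL version. You can also pin the image version explicitly in docker run so that the server and the dev package stay consistent.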

apt update -y && apt install wget gcc make git bzip2 postgresql-server-dev-17 -y
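
The version match described above can also be scripted. The following is a minimal sketch, assuming the official postgres image, which sets a PG_MAJOR environment variable; the helper name dev_package is hypothetical and 17 is only a fallback value.

```shell
#!/bin/sh
# Hypothetical helper: build the dev package name for a PostgreSQL major version.
# Assumption: PG_MAJOR is set by the official postgres image; 17 is only a fallback.
dev_package() {
    major="${1:-${PG_MAJOR:-17}}"
    printf 'postgresql-server-dev-%s\n' "$major"
}

# Usage inside the container (commented out so the sketch has no side effects):
# apt update -y && apt install -y wget gcc make git bzip2 "$(dev_package)"
```

Passing the version as an argument, for example dev_package 16, overrides the environment variable.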

Build SCWS (the word-segmentation library that Zhparser depends on), then Zhparser:

cd /tmp
wget http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2
tar -jxvf scws-1.2.3.tar.bz2 
cd scws-1.2.3
./configure && make && make install

cd ..
git clone https://github.com/amutu/zhparser.git
cd zhparser/
make && make install
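
Before moving on to psql, it can help to confirm that make install actually placed the extension files. This is a rough sketch, assuming the standard PGXS layout (zhparser.control in the extension folder of the share directory, zhparser.so in the package library directory) and assuming pg_config is on the PATH; check_zhparser_files is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both extension files exist.
# $1 = share directory, $2 = package library directory.
check_zhparser_files() {
    [ -f "$1/extension/zhparser.control" ] && [ -f "$2/zhparser.so" ]
}

# Usage inside the container (assumes pg_config is on the PATH):
# check_zhparser_files "$(pg_config --sharedir)" "$(pg_config --pkglibdir)" \
#     && echo "zhparser files found"
```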

Verify the installation. First connect to the PG server:

psql -U postgres

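Then:

CREATE EXTENSION zhparser; -- enable the Zhparser extension
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser); -- text search configuration for Chinese
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple; -- map the selected token types (parts of speech) to the simple dictionary
SELECT ts_token_type('zhparser'); -- list the token types (parts of speech)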

Testing:

to_tsvector test:

SELECT to_tsvector('chinese','人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。Hello world');

Result:

                                                         to_tsvector
------------------------------------------------------------------------------------------------------------------------------
 'hello':12 'world':13 '人生':1 '使':4 '千金':8 '复来':11 '天生我材必有用':7 '对月':6 '尽':10 '尽欢':3 '得意':2 '散':9 '空':5
(1 row)

to_tsquery test:

SELECT to_tsquery('chinese', '金风玉露一相逢,便胜却人间无数。It & works');

Result:

                          to_tsquery
--------------------------------------------------------------
 '金风玉露' <-> '相逢' <-> '胜' <-> '人间' <-> 'it' & 'works'
(1 row)
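
The tests above only exercise the parser. To illustrate how the chinese configuration might be used for an actual search, here is a sketch. The articles table, its columns, the index name, and the search_sql helper are all hypothetical and not part of the original setup. The search term is passed as a psql variable (:'term') so that psql handles the quoting.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that creates a sample table, a GIN index
# over the 'chinese' tsvector of its body column, and a search query.
# The table, column, and index names are illustrative assumptions.
search_sql() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS articles (
    id   bigserial PRIMARY KEY,
    body text NOT NULL
);
CREATE INDEX IF NOT EXISTS articles_body_fts
    ON articles USING gin (to_tsvector('chinese', body));
SELECT id, body
  FROM articles
 WHERE to_tsvector('chinese', body) @@ plainto_tsquery('chinese', :'term');
SQL
}

# Usage inside the container (commented out so the sketch has no side effects):
# search_sql | psql -U postgres -v term='人生'
```

The WHERE clause repeats the indexed expression to_tsvector('chinese', body) exactly, which is what lets the planner consider the GIN index.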

Reference: https://www.fdevops.com/2023/02/05/postgres-zhparser-31246
