Building a RAG application is currently a popular topic, and one key requirement is a vector store or database to store and query embeddings. While there are specialized vector databases available, we can also use the OG PostgreSQL
as a vector database thanks to the pgvector extension.
In this article, we'll explore how to integrate with an existing PostgreSQL database. The official repository also explains how to install it, so you can check there for more details.
If you're starting from scratch, there's a Docker image available with built-in vector support, officially maintained by the pgvector
community. If you already have a database, it's better to integrate with it since it already contains data.
Prerequisite:
Before starting, ensure you have a container running with PostgreSQL. If it's installed locally, make sure it's running on your system.
Installation:
Find PostgreSQL Container
Since we're using PostgreSQL in a container, let's locate the container with the following command:
docker ps
This will list all the containers running on our system. You need to check the Names
column to find the name of the Postgres container. Alternatively, you can list only the containers running from the Postgres image with the following command:
docker ps --filter "ancestor=postgres"
Here are my results for the above command:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
36622542432f postgres:latest "docker-entrypoint.s…" 3 weeks ago Up 4 hours 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp postgres_container
Now, as you can see, my container name is postgres_container
.
Next, let's open the container using a terminal with the following command:
docker exec -it <container-name> bash
# docker exec -it postgres_container bash
Installing pgvector
Now, the Docker image is based on an Ubuntu (Debian-based Linux distribution) image. Since we're inside the Docker image, we can run apt
commands. We need to install some packages to ensure essential tools are available to build the extension, as well as all necessary packages for Postgres to use the pgvector
extension.
You can install the necessary packages with the following command:
apt-get update
apt-get install -y build-essential git postgresql-server-dev-all
Now that these packages are installed, let's proceed with installing the pgvector
extension.
First, let's copy the source from the official repository (preferably in the root directory).
git clone https://github.com/pgvector/pgvector.git
Next, install the extension from this source with the following command:
cd pgvector
make
make install
After successfully installing, exit and restart the container using the following command:
docker restart <container-name>
# docker restart postgres_container
Now that the pgvector
extension is successfully added to the container, the next step is to enable it for the database which we will integrate with pgvector
.
Enable pgvector
First, let's connect to the PostgreSQL database instance from our terminal using the following command:
docker exec -it <container-name> psql -h localhost -U <db-user> -d <db-name>
# docker exec -it postgres_container psql -h localhost -U dushmanta -d klansity
After successfully connecting, run the following command to create the extension in the database:
CREATE EXTENSION vector;
This command will create the vector extension in the database we are logged into.
Next, let's double-check if it appears in the extensions using the following command:
\dx
If the vector extension is successfully created, you'll see it in the results. In my case, this is how it appears:
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vector | 0.8.0 | public | vector data type and ivfflat and hnsw access methods
(2 rows)
After successfully installing, let's check if it works.
Testing
Here, we'll create a table with vector fields. Then, we'll store some vectors, fetch them, and see if it works.
Create Table
We'll create a table named pgvector
with a field called embedding
, which is a vector field with a dimension of 3, using the following command.
CREATE TABLE pgvector (
id SERIAL PRIMARY KEY,
content TEXT,
embedding VECTOR(3)
);
If you want to learn more about vector dimensions, I suggest checking out these Cloudflare docs.
Note: Set the dimensions based on the model you’re going to use to generate the vector embedding.
If you have pgAdmin
, Adminer
, or phpMyAdmin
running, check if the database is created. You can also verify this in the terminal with the following command.
\d pgvector
My results looked like this:
Table "public.pgvector"
Column | Type | Collation | Nullable | Default
-----------+-----------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('pgvector_id_seq'::regclass)
content | text | | |
embedding | vector(3) | | |
Indexes:
"pgvector_pkey" PRIMARY KEY, btree (id)
And in Adminer
, it looks something like this:
Insert Vector Embedding
Now let’s insert a vector embedding into the embedding field using the following command:
INSERT INTO pgvector (content, embedding) VALUES
('first doc', '[1,2,3]'),
('second doc', '[4,5,6]');
This will create two records. Let's check the records.
Note: In real use cases these embedding will be generated by models and the vector numerical values will be very different.
SELECT * FROM pgvector;
Here are the results:
id | content | embedding
----+------------+-----------
1 | first doc | [1,2,3]
2 | second doc | [4,5,6]
(2 rows)
Vector Search
Now that we've added two records with vector data, let's perform a vector search using the vector [1,2,2]
to find which data is closest to it with the following command:
SELECT id, content, embedding
FROM pgvector
ORDER BY embedding <-> '[1,2,2]' -- THe <-> symbol is used for vector search
LIMIT 1;
The result will be the following:
id | content | embedding
----+-----------+-----------
1 | first doc | [1,2,3]
(1 row)
As you can see, it fetched the [1,2,3]
value as its nearest.
Delete Table
Since we created this table for testing purposes, let’s delete it with the following command.
DROP TABLE pgvector;
Now we have successfully installed the pgvector
extension and can use our PostgreSQL
database as a vector database for GenAI applications.
This article is intended as a personal note, but if you're looking to integrate pgvector, I hope you find it helpful. If you notice any mistakes or have any issues, please feel free to contact me. Thank you.
Top comments (0)