DEV Community

Cover image for Transform PostgreSQL into a Vector Database with pgvector
Dushmanta
Dushmanta

Posted on

Transform PostgreSQL into a Vector Database with pgvector

Building a RAG application is currently a popular topic, and one key requirement is a vector store or database to store and query embeddings. While there are specialized vector databases available, we can also use the OG PostgreSQL as a vector database thanks to the pgvector extension.

In this article, we'll explore how to integrate with an existing PostgreSQL database. The official repository also explains how to install it, so you can check there for more details.

If you're starting from scratch, there's a Docker image available with built-in vector support, officially maintained by the pgvector community. If you already have a database, it's better to integrate with it since it already contains data.

Prerequisite:

Before starting, ensure you have a container running with PostgreSQL. If it's installed locally, make sure it's running on your system.

Installation:

Find PostgreSQL Container

Since we're using PostgreSQL in a container, let's locate the container with the following command:

docker ps
Enter fullscreen mode Exit fullscreen mode

This will list all the containers running on our system. You need to check the Names column to find the name of the Postgres container. Alternatively, you can list only the containers running from the Postgres image with the following command:

docker ps --filter "ancestor=postgres"
Enter fullscreen mode Exit fullscreen mode

Here are my results for the above command:

CONTAINER ID   IMAGE             COMMAND                  CREATED       STATUS       PORTS                                       NAMES
36622542432f   postgres:latest   "docker-entrypoint.s…"   3 weeks ago   Up 4 hours   0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   postgres_container
Enter fullscreen mode Exit fullscreen mode

Now, as you can see, my container name is postgres_container.

Next, let's open the container using a terminal with the following command:

docker exec -it <container-name> bash
# docker exec -it postgres_container bash
Enter fullscreen mode Exit fullscreen mode

Installing pgvector

Now, the Docker image is based on an Ubuntu (Debian-based Linux distribution) image. Since we're inside the Docker image, we can run apt commands. We need to install some packages to ensure essential tools are available to build the extension, as well as all necessary packages for Postgres to use the pgvector extension.

You can install the necessary packages with the following command:

apt-get update
apt-get install -y build-essential git postgresql-server-dev-all
Enter fullscreen mode Exit fullscreen mode

Now that these packages are installed, let's proceed with installing the pgvector extension.

First, let's copy the source from the official repository (preferably in the root directory).

git clone https://github.com/pgvector/pgvector.git
Enter fullscreen mode Exit fullscreen mode

Next, install the extension from this source with the following command:

cd pgvector
make
make install
Enter fullscreen mode Exit fullscreen mode

After successfully installing, exit and restart the container using the following command:

docker restart <container-name>
# docker restart postgres_container
Enter fullscreen mode Exit fullscreen mode

Now that the pgvector extension is successfully added to the container, the next step is to enable it for the database which we will integrate with pgvector.

Enable pgvector

First, let's connect to the PostgreSQL database instance from our terminal using the following command:

docker exec -it <container-name> psql -h localhost -U <db-user> -d <db-name>

# docker exec -it postgres_container psql -h localhost -U dushmanta -d klansity
Enter fullscreen mode Exit fullscreen mode

After successfully connecting, run the following command to create the extension in the database:

CREATE EXTENSION vector;
Enter fullscreen mode Exit fullscreen mode

This command will create the vector extension in the database we are logged into.

Next, let's double-check if it appears in the extensions using the following command:

\dx
Enter fullscreen mode Exit fullscreen mode

If the vector extension is successfully created, you'll see it in the results. In my case, this is how it appears:

List of installed extensions
  Name   | Version |   Schema   |                     Description                      
---------+---------+------------+------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.8.0   | public     | vector data type and ivfflat and hnsw access methods
(2 rows)
Enter fullscreen mode Exit fullscreen mode

After successfully installing, let's check if it works.

Testing

Here, we'll create a table with vector fields. Then, we'll store some vectors, fetch them, and see if it works.

Create Table

We'll create a table named pgvector with a field called embedding, which is a vector field with a dimension of 3, using the following command.

CREATE TABLE pgvector (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(3)
);
Enter fullscreen mode Exit fullscreen mode

If you want to learn more about vector dimensions, I suggest checking out these Cloudflare docs.

Note: Set the dimensions based on the model you’re going to use to generate the vector embedding.

If you have pgAdmin, Adminer, or phpMyAdmin running, check if the database is created. You can also verify this in the terminal with the following command.

\d pgvector
Enter fullscreen mode Exit fullscreen mode

My results looked like this:

                               Table "public.pgvector"
  Column   |   Type    | Collation | Nullable |               Default                
-----------+-----------+-----------+----------+--------------------------------------
 id        | integer   |           | not null | nextval('pgvector_id_seq'::regclass)
 content   | text      |           |          | 
 embedding | vector(3) |           |          | 
Indexes:
    "pgvector_pkey" PRIMARY KEY, btree (id)
Enter fullscreen mode Exit fullscreen mode

And in Adminer, it looks something like this:

Adminer result

Insert Vector Embedding

Now let’s insert a vector embedding into the embedding field using the following command:

INSERT INTO pgvector (content, embedding) VALUES
('first doc', '[1,2,3]'),
('second doc', '[4,5,6]');
Enter fullscreen mode Exit fullscreen mode

This will create two records. Let's check the records.

Note: In real use cases these embedding will be generated by models and the vector numerical values will be very different.

SELECT * FROM pgvector;
Enter fullscreen mode Exit fullscreen mode

Here are the results:

 id |  content   | embedding 
----+------------+-----------
  1 | first doc  | [1,2,3]
  2 | second doc | [4,5,6]
(2 rows)
Enter fullscreen mode Exit fullscreen mode

Vector Search

Now that we've added two records with vector data, let's perform a vector search using the vector [1,2,2] to find which data is closest to it with the following command:

SELECT id, content, embedding
FROM pgvector
ORDER BY embedding <-> '[1,2,2]' -- THe <-> symbol is used for vector search
LIMIT 1;
Enter fullscreen mode Exit fullscreen mode

The result will be the following:

 id |  content  | embedding 
----+-----------+-----------
  1 | first doc | [1,2,3]
(1 row)
Enter fullscreen mode Exit fullscreen mode

As you can see, it fetched the [1,2,3] value as its nearest.

Delete Table

Since we created this table for testing purposes, let’s delete it with the following command.

DROP TABLE pgvector;
Enter fullscreen mode Exit fullscreen mode

Now we have successfully installed the pgvector extension and can use our PostgreSQL database as a vector database for GenAI applications.


This article is intended as a personal note, but if you're looking to integrate pgvector, I hope you find it helpful. If you notice any mistakes or have any issues, please feel free to contact me. Thank you.

Top comments (0)