David Y Soards

Originally published at blog.soards.me

Setup PostgreSQL w/ pgvector in a docker container

This post is a follow-up to my previous post on how to set up a local MySQL instance in Docker.

RAG (Retrieval Augmented Generation) is quickly becoming the "Hello World" of AI apps. If you are working or playing with Large Language Models, you will no doubt need to create a RAG pipeline at some point. An important component of RAG is a vector database, and a popular option is pgvector, an open-source vector similarity search extension for Postgres. Here's how to quickly set up a local instance in a Docker container.

Pull and run the image

Pull the image from Docker Hub. Replace 17 in the pg17 tag with your Postgres major version of choice.

docker pull pgvector/pgvector:pg17

Run the image in the background, set the password for the default postgres superuser, and publish the default Postgres port (5432) on the host.

docker run -d --name <container_name> -e POSTGRES_PASSWORD=postgres -p 5432:5432 pgvector/pgvector:pg17

Create a db inside the container

With the Postgres server running, create a database inside the container.

docker exec -it <container_name> createdb -U postgres <database_name>

Connect to the database

Now we can connect to the database from our application and initialize the pgvector extension. I'll be using JavaScript. Setting up the entire application is outside the scope of this post, but you will need to install a couple of dependencies:

pnpm add pg pgvector

Set a DATABASE_URL in your environment. I use a .env file. It should follow this format:

DATABASE_URL=postgresql://<pg_user>:<pg_password>@localhost:5432/<database_name>

For local development, use @localhost. If your application runs in a container alongside the database, for example as services defined in a docker-compose.yml, use the name of the database service as the host instead, e.g. @db.
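If the password contains characters such as @ or :, the connection string has to be percent-encoded or it will not parse as intended. A minimal sketch of assembling the URL from its parts follows; buildDatabaseUrl and its option names are hypothetical and not part of pg or pgvector.

```javascript
// Hypothetical helper: assemble a Postgres connection string from its parts.
// The function and option names are illustrative, not a library API.
function buildDatabaseUrl({ user, password, host = 'localhost', port = 5432, database }) {
  // Percent-encode the credentials so characters like "@" or ":" stay unambiguous
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `postgresql://${auth}@${host}:${port}/${encodeURIComponent(database)}`;
}

// Local development: the host defaults to localhost
console.log(buildDatabaseUrl({ user: 'postgres', password: 'postgres', database: 'mydb' }));
// prints postgresql://postgres:postgres@localhost:5432/mydb

// Inside a Compose network: use the service name as the host
console.log(buildDatabaseUrl({ user: 'postgres', password: 'postgres', host: 'db', database: 'mydb' }));
// prints postgresql://postgres:postgres@db:5432/mydb
```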

In your application code, create the connection:

import pg from 'pg';

// Create a connection pool from the DATABASE_URL environment variable
const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL,
});
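A container started with docker run -d can take a few seconds before Postgres accepts connections, so the first query from the application may fail. One way to handle that is to retry a trivial query until it succeeds. This is only a sketch: waitForDatabase and its options are hypothetical names, and the only thing it assumes about the pool is the query method that pg.Pool provides.

```javascript
// Hypothetical helper: retry a trivial query until the server responds.
// `pool` is any object with a pg.Pool-style `query(text)` method.
async function waitForDatabase(pool, { retries = 10, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt += 1) {
    try {
      await pool.query('SELECT 1');
      return attempt; // number of attempts it took to get a response
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw new Error(`Database not reachable after ${retries} attempts: ${lastError.message}`);
}
```

Calling await waitForDatabase(pool) once at startup, before any other query, is usually enough for local development.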

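Then, initialize pgvector and create a new table. The example uses PGVectorStore from LangChain, which is installed separately (the @langchain/community package), and assumes embeddings is an embeddings model instance created elsewhere in your application:

import { PGVectorStore } from '@langchain/community/vectorstores/pgvector';

async function createStore() {
  // Enable the pgvector extension if it is not already enabled
  await pool.query('CREATE EXTENSION IF NOT EXISTS vector');

  return {
    // `embeddings` is assumed to be a LangChain embeddings instance defined elsewhere.
    // initialize() creates the table if it does not exist.
    vectorStore: await PGVectorStore.initialize(embeddings, {
      postgresConnectionOptions: {
        connectionString: process.env.DATABASE_URL,
      },
      tableName: 'documents', // Table that stores the documents and their embeddings
    }),
  };
}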
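To confirm the extension really is enabled in the database you connected to, you can look it up in the pg_extension catalog. A small sketch, assuming the same pool object as above; getPgvectorVersion is a hypothetical helper name.

```javascript
// Hypothetical helper: return the installed pgvector version, or null if the
// extension is not enabled in the current database.
// `pool` is any object with a pg.Pool-style `query(text)` method.
async function getPgvectorVersion(pool) {
  const result = await pool.query(
    "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
  );
  return result.rows.length > 0 ? result.rows[0].extversion : null;
}
```

A null result after CREATE EXTENSION has run usually means the application is pointed at a different database than expected.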

With the vectorStore set up, you can add content to it with vectorStore.addDocuments and query for context with vectorStore.similaritySearch.
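To make the ranking behind a similarity search concrete, here is a plain JavaScript sketch of cosine distance, the quantity pgvector's <=> operator computes. It is purely illustrative: in practice the database does this calculation, the function names are hypothetical, and which distance metric your store uses depends on how it is configured.

```javascript
// Illustrative only: cosine distance between two non-zero vectors of equal length.
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: return the k items closest to the query embedding.
// Each item is expected to look like { id, embedding: number[] }.
function nearest(query, items, k) {
  return items
    .map((item) => ({ ...item, distance: cosineDistance(query, item.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

A similarity search does the same thing at scale: it embeds the query text, then returns the stored rows whose embeddings have the smallest distance to it.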

That's it for this post. Maybe next time I will explore more specific uses of pgvector, and/or using it with Drizzle ORM! 👋
