RAG with OpenAI Embeddings, pgvector and LangChain

#rag #embeddings #ai #langchain

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows an end-to-end RAG flow with LangChain, OpenAI embeddings, PostgreSQL + pgvector, and an LCEL answer chain. For LangChain basics, see the LangChain overview post. For loaders and splitter choice, see the loaders and chunking post.

Prerequisites

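OpenAI account with billing enabled
OpenAI API key, exported as OPENAI_API_KEY (the variable @langchain/openai reads by default)
Node.js version 26
PostgreSQL with the pgvector extension available, reachable through the DATABASE_URL connection string used in the code below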
npm packages: @langchain/pgvector, @langchain/openai, @langchain/core, @langchain/textsplitters, langchain, pg

npm i @langchain/pgvector @langchain/openai @langchain/core @langchain/textsplitters langchain pg

What are embeddings?

Embeddings are numeric vectors that represent the semantic meaning of text. Texts with similar meaning produce vectors that lie close together in vector space, and that closeness is usually measured with cosine similarity.
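To make "close in vector space" concrete, here is a minimal sketch of cosine similarity in plain JavaScript. The three-dimensional vectors are made up for illustration; real embeddings from text-embedding-3-small have far more dimensions, and the function name is an assumption rather than anything exported by LangChain.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means more similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, invented for this example only.
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

// "cat" is closer to "kitten" than to "invoice".
console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```

The 'cosine' distance strategy configured for the vector store later in this guide ranks chunks by the same idea, expressed as a distance (1 minus the similarity).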

In this pipeline:

Split source documents into chunks
Embed chunks with OpenAIEmbeddings and store them in pgvector via PGVectorStore
Embed the user question at query time and retrieve nearest chunks with a LangChain retriever
Pass retrieved context into an LCEL chain that calls ChatOpenAI

Chunk documents

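Chunking makes retrieval more precise. Instead of embedding one large document as a single vector, split it into smaller parts that overlap slightly, so a sentence cut at a chunk boundary still appears with some of its surrounding context.
Start with chunkSize: 800 and chunkOverlap: 120 (both measured in characters by default), then adjust based on your document style and answer quality.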

import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120
});

const docs = await splitter.createDocuments(
  ['RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.'],
  [{ source: 'notes.md' }]
);
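To see what chunkSize and chunkOverlap control, here is a deliberately simplified sliding-window chunker. It is not how RecursiveCharacterTextSplitter works internally (the real splitter prefers to break on separators such as paragraphs, lines, and spaces); the function name and the tiny sizes are assumptions chosen so the overlap is visible.

```javascript
// Naive fixed-window chunker: each chunk starts (chunkSize - chunkOverlap)
// characters after the previous one, so neighbours share chunkOverlap characters.
function chunkText(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sample = 'abcdefghijklmnopqrstuvwxyz';
console.log(chunkText(sample, 10, 3));
// [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each chunk repeats the last three characters of the previous one. With 800 and 120, the same mechanism carries roughly a sentence or two across each boundary.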

Store chunks in pgvector

Use PGVectorStore from @langchain/pgvector. PGVectorStore.initialize creates the table if it does not exist yet; addDocuments then embeds each chunk and stores the vector alongside its content and metadata.

import pg from 'pg';
import { OpenAIEmbeddings } from '@langchain/openai';
import { PGVectorStore } from '@langchain/pgvector';

const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

const vectorStore = await PGVectorStore.initialize(embeddings, {
  pool,
  tableName: 'rag_documents',
  columns: {
    idColumnName: 'id',
    vectorColumnName: 'vector',
    contentColumnName: 'content',
    metadataColumnName: 'metadata'
  },
  distanceStrategy: 'cosine'
});

await vectorStore.addDocuments(docs);

Retrieve context

Turn the vector store into a retriever that embeds the question and returns the k most similar chunks (k: 4 here):

const retriever = vectorStore.asRetriever({ k: 4 });

const chunks = await retriever.invoke('How does pgvector semantic search work?');
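Under the hood, a top-k search is a nearest-neighbour query in PostgreSQL. The sketch below builds such a query by hand, using the table and column names from the store configuration above and pgvector's <=> cosine-distance operator. It approximates the idea and is not necessarily the exact SQL that PGVectorStore generates; toVectorLiteral and buildTopKQuery are hypothetical helper names.

```javascript
// pgvector accepts a vector in the text form '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding) {
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error('Embedding must contain only finite numbers');
  }
  return `[${embedding.join(',')}]`;
}

// Build a parameterized query object in the { text, values } shape that pg accepts.
// Smaller cosine distance means a more similar chunk, so we sort ascending.
function buildTopKQuery(queryEmbedding, k) {
  return {
    text: [
      'SELECT content, metadata, "vector" <=> $1::vector AS distance',
      'FROM rag_documents',
      'ORDER BY distance',
      'LIMIT $2'
    ].join('\n'),
    values: [toVectorLiteral(queryEmbedding), k]
  };
}

const query = buildTopKQuery([0.1, 0.2, 0.3], 4);
console.log(query.text);
console.log(query.values); // [ '[0.1,0.2,0.3]', 4 ]
```

In a real run, the query embedding would come from embeddings.embedQuery(question) and the object would be passed to pool.query. The retriever does the equivalent work for you, which is why the guide uses it instead of raw SQL.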

RAG chain with LCEL

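Wire retrieval and generation with LCEL. The retriever supplies the context, and the system prompt instructs the model to answer from that context only. The first step runs the retriever and passes the question through unchanged, the second step flattens the retrieved documents into a single string, and the rest is prompt, model, and output parsing.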

import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnablePassthrough, RunnableSequence } from '@langchain/core/runnables';
import { ChatOpenAI } from '@langchain/openai';

const prompt = ChatPromptTemplate.fromMessages([
  [
    'system',
    'Answer only from the provided context. If context is insufficient, say you need more data.'
  ],
  ['human', 'Context:\n{context}\n\nQuestion: {question}']
]);

const model = new ChatOpenAI({ model: 'gpt-5.5' });

const formatDocs = (documents) =>
  documents.map((doc) => doc.pageContent).join('\n\n---\n\n');

const chain = RunnableSequence.from([
  {
    context: retriever,
    question: new RunnablePassthrough()
  },
  (input) => ({
    context: formatDocs(input.context),
    question: input.question
  }),
  prompt,
  model,
  new StringOutputParser()
]);

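const answer = await chain.invoke('How does pgvector semantic search work?');
console.log(answer);

// Release the database client and close the pool so the Node.js process can exit.
await vectorStore.end();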
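If you want answers that can point back to their origin, one option is to include each chunk's source in the context string. The sketch below is a hypothetical variant of formatDocs that relies on the source metadata key set during chunking; the numbering format and the 'unknown' fallback are assumptions, and the sample documents are plain objects shaped like LangChain documents.

```javascript
// Format retrieved chunks as numbered entries that carry their source label.
function formatDocsWithSources(documents) {
  return documents
    .map((doc, i) => {
      const source = doc.metadata?.source ?? 'unknown';
      return `[${i + 1}] (${source})\n${doc.pageContent}`;
    })
    .join('\n\n---\n\n');
}

// Sample objects with the same pageContent/metadata shape as retrieved documents.
const sampleDocs = [
  { pageContent: 'RAG combines retrieval and generation.', metadata: { source: 'notes.md' } },
  { pageContent: 'Chunks are stored as vectors.', metadata: {} }
];

console.log(formatDocsWithSources(sampleDocs));
```

Swapping this function in for formatDocs in the chain leaves everything else unchanged; you could then extend the system prompt to ask the model to cite entry numbers.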