OLUWOLE

Teach Your LLM About Your Own Data Using This Simple RAG Setup

If you're building a chatbot, search engine, or any AI application that needs to "know stuff," you've probably bumped into a hard truth:

Large Language Models (LLMs) can't access your private or domain-specific data unless you feed it to them.

Whether it’s product documentation, internal policies, or real-time records, that knowledge stays out of reach until you explicitly provide it.

Enter RAG (Retrieval-Augmented Generation).

RAG combines the creative power of an LLM with the factual accuracy of your own data. At its core, it relies on semantic search: finding the most relevant pieces of text based on meaning, not just keywords.

Instead of asking an LLM to hallucinate answers, RAG pipelines first retrieve relevant content from your data sources, then pass it into the model. The result? More accurate, grounded, and useful responses.
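To make that concrete, here’s a minimal sketch of the whole flow. The semanticSearch and askLLM helpers are placeholders: semanticSearch stands in for the retrieval API we build below, and askLLM is whichever LLM client you already use (OpenAI, Mistral, a local model, etc.).

// Hypothetical glue code: retrieve first, then generate an answer grounded in what we found
async function answerWithRag(question) {
  // 1. Retrieve the most relevant documents for the question
  const docs = await semanticSearch(question); // -> [{ title, content, score }, ...]

  // 2. Build a prompt grounded in that content
  const context = docs.map((d) => `${d.title}: ${d.content}`).join('\n');
  const prompt = `Answer using only this context:\n\n${context}\n\nQuestion: ${question}`;

  // 3. Let the LLM generate the final answer
  return askLLM(prompt);
}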

But how do you build the retrieval part?

In this tutorial, we’ll build a lightweight, fast, and cost-free semantic search API using:

  • PostgreSQL + pgvector to store and query embeddings
  • Transformers.js to run a MiniLM model in JavaScript, no cloud required
  • Fastify for a blazing-fast web server

Note: This guide assumes you’re comfortable with basic JavaScript/Node.js and have PostgreSQL installed. Some familiarity with REST APIs and vector embeddings will help, but we’ll keep things practical and code-focused throughout.

Let's get started.


1. Setting Up PostgreSQL with pgvector

Install the vector extension in your PostgreSQL database:

CREATE EXTENSION IF NOT EXISTS vector;
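If that command fails because the extension isn’t available on your server, install the pgvector binaries first. The exact package depends on your platform; for example:

# macOS (Homebrew)
brew install pgvector

# Debian/Ubuntu (PGDG packages; match your PostgreSQL major version)
sudo apt install postgresql-16-pgvector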

Then, create a simple table to store documents and their embeddings:

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding vector(384)  -- Dimensions of MiniLM-L6-v2
);
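The handful of demo rows we insert below will be fine with a sequential scan, but once the table grows you’ll want an approximate nearest-neighbour index so similarity queries stay fast. A minimal example, assuming pgvector 0.5+ (the vector_cosine_ops operator class matches the cosine-distance queries we write in step 3):

-- Approximate HNSW index for cosine-distance searches
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);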

2. Generating Embeddings with Transformers.js

Install the dependencies:

npm install pg @xenova/transformers

Use this script to embed your content and store it in Postgres:

import { Client } from 'pg';
import { pipeline } from '@xenova/transformers';

const db = new Client({ connectionString: 'postgres://localhost/yourdb' });
let embedder = null;

// Lazily load the MiniLM model the first time we embed anything
async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  // Mean-pool the token embeddings and normalize to get a single 384-dimension vector
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}

const docs = [
  { title: 'Cats', content: 'Cats are independent and curious animals.' },
  { title: 'Space', content: 'The universe is vast and mostly unexplored.' },
  { title: 'Bananas', content: 'Bananas are a yellow tropical fruit.' },
];

await db.connect();

for (const doc of docs) {
  const vec = await generateEmbedding(doc.content);
  // pgvector accepts a '[x,y,z,...]' text literal, cast with ::vector in the query
  const pgVector = `[${vec.join(',')}]`;
  await db.query(
    'INSERT INTO documents (title, content, embedding) VALUES ($1, $2, $3::vector)',
    [doc.title, doc.content, pgVector]
  );
  console.log(`Inserted: ${doc.title}`);
}

await db.end();
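Save this as something like embed.js and run it with a recent Node.js version (the import syntax and top-level await require ESM, so either use an .mjs extension or set "type": "module" in package.json):

node embed.js

The first run downloads the MiniLM model from the Hugging Face Hub and caches it locally, so expect a short delay; after that, embedding runs fully offline.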

3. Querying with pgvector

To find the most relevant content to a user's query, embed the query and compare it to your document vectors using cosine distance:

SELECT title, content, embedding <=> $1::vector AS score
FROM documents
ORDER BY score ASC
LIMIT 3;

The <=> operator returns the cosine distance, so lower scores mean more similar content.
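For reference, pgvector ships three distance operators; pick the one that matches your index and your data:

-- embedding <-> query  : Euclidean (L2) distance
-- embedding <#> query  : negative inner product (same ranking as cosine when vectors are normalized)
-- embedding <=> query  : cosine distance (what we use here)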


4. Building the Fastify Search API

Install Fastify and the CORS plugin:

npm install fastify @fastify/cors

Then create the search server:
import Fastify from 'fastify';
import cors from '@fastify/cors';
import { Pool } from 'pg';
import { pipeline } from '@xenova/transformers';

const fastify = Fastify();
await fastify.register(cors, { origin: '*' });
const pool = new Pool({ connectionString: 'postgres://localhost/yourdb' });

let embedder = null;
async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return `[${Array.from(output.data).join(',')}]`;
}

fastify.post('/search', async (req, res) => {
  const { query } = req.body;
  if (!query) return res.status(400).send({ error: 'Query is required' });

  // Embed the query, then rank documents by cosine distance (smaller = more similar)
  const vector = await generateEmbedding(query);
  const { rows } = await pool.query(
    `SELECT title, content, embedding <=> $1::vector AS score
     FROM documents
     ORDER BY score ASC
     LIMIT 3`,
    [vector]
  );

  res.send(rows);
});

fastify.listen({ port: 3000 }, (err) => {
  if (err) throw err;
  console.log('API ready at http://localhost:3000');
});

Test it with:

curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about fruit"}'
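Or call it from any JavaScript environment with fetch (built into Node 18+):

const res = await fetch('http://localhost:3000/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Tell me about fruit' }),
});
console.log(await res.json()); // the Bananas document should rank first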

5. Use Cases and Next Steps

This stack is perfect for:

  • RAG pipelines (feed results into an LLM)
  • Internal knowledge search
  • Chatbot memory lookup
  • Smart filtering with natural language

What to add next:

  • Metadata filtering in queries
  • Chunking for longer docs (see the sketch after this list)
  • Hybrid search (text + vector)
  • Integration with OpenAI or Mistral
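Chunking is usually the first of these you’ll need: all-MiniLM-L6-v2 only looks at roughly the first 256 tokens of its input, so long documents should be split into overlapping pieces, with each piece embedded and inserted as its own row. A naive character-based sketch (the sizes are just starting points; token-aware splitting works even better):

// Split text into overlapping chunks so each stays within the model's input window
function chunkText(text, chunkSize = 800, overlap = 100) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Each chunk becomes its own documents row, so search returns the relevant passage
for (const doc of docs) {
  for (const chunk of chunkText(doc.content)) {
    const vec = await generateEmbedding(chunk);
    await db.query(
      'INSERT INTO documents (title, content, embedding) VALUES ($1, $2, $3::vector)',
      [doc.title, chunk, `[${vec.join(',')}]`]
    );
  }
}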

Conclusion

Vector search is no longer just for ML engineers. With pgvector, Transformers.js, and Fastify, you can build your own semantic search engine in under an hour, without vendor lock-in. It isn’t production-ready as-is, but it makes a solid baseline for your production apps.

Happy hacking!

Top comments (3)

Odumosu Matthew

Really solid write-up; clean, simple, and avoids the usual cloud bloat. Love the use of pgvector and Transformers.js for a fully local, vendor-free RAG setup. That makes this super practical for early-stage apps or internal tools where cost and data privacy matter.

Would be interesting to see how this holds up with larger corpuses; maybe chunking, metadata filtering, or even hybrid search as you mentioned. Either way, this is a great starting point. Bookmarked. 👏

okpalaugo chukwuka promise

Great piece

King Etiosasere

Well written, Oluwole. We can take this approach and test it with our code, just like the AI IDEs do. We can convert our JS files into text, create embeddings around them, and perform a quick lookup on our code via RAG. 👏