Gaurav Kumar Singh

Posted on Jun 14

Build a Local RAG AI App with Ollama, Mistral, and Node.js

#node #rag #javascript #ai

Most people start using Large Language Models by asking questions directly:

Question -> LLM -> Answer

This works well for general questions.

But what happens when you ask about your own documents, company policy, product FAQ, or internal notes?

The model may not know the answer. Even worse, it may guess confidently. This is called a hallucination.

That is the problem RAG solves.

In this article, we will build a simple local RAG app using Ollama, Mistral, and Node.js.

Complete code is available here:

https://github.com/gaurav101/ai-experiment/tree/main/rag

What Is RAG?

RAG stands for Retrieval-Augmented Generation.

That sounds complex, but the idea is simple:

Before asking the LLM to answer, first search your own documents and give the most useful parts to the model.

So instead of this:

User question -> LLM -> Answer

we do this:

User question
   -> Search local documents
   -> Find relevant text
   -> Send that text to the LLM
   -> Generate an answer

Think of it like an open-book exam.

The LLM is still doing the writing, but now it has the right page open before it answers.

Why RAG Matters

RAG matters because many real AI apps need private or up-to-date information.

For example:

A chatbot that answers questions from company documents
A support assistant that reads product FAQs
A legal assistant that searches contracts
A coding assistant that understands project docs
A personal assistant that uses your own notes

Without RAG, the model only uses what it already learned during training.

With RAG, we can give the model fresh information at runtime.

This means:

No need to retrain the model
Documents can stay private
Answers are based on your data
The system is easier to update

If your refund policy changes, you update the document and rebuild the index. You do not retrain the LLM.

What We Will Build

We will build a local RAG app that can answer questions from files stored on your machine.

The app uses:

Node.js for the code
Ollama to run models locally
Mistral to generate answers
nomic-embed-text to create embeddings
A local data/docs folder for documents
A local data/index.json file as a simple vector index

The project flow is:

Documents -> Chunks -> Embeddings -> Search -> Context -> Mistral -> Answer

Do not worry if words like "embeddings" or "vector index" feel new. We will walk through them step by step.

Step 1: Install Ollama

First, install Ollama:

https://ollama.com/download

On Linux or macOS, you can also install it with:

curl -fsSL https://ollama.com/install.sh | sh

Start the Ollama server (skip this if the desktop app or a system service is already running it):

ollama serve

Now pull the two models we need.

Mistral will generate the final answer:

ollama pull mistral

nomic-embed-text will convert text into embeddings:

ollama pull nomic-embed-text

You can test Mistral with:

ollama run mistral

Step 2: Create the Node.js Project

Create a new project:

mkdir local-rag
cd local-rag
npm init -y

Use ES modules by adding this to package.json:

{
  "type": "module"
}

Install dotenv:

npm install dotenv

In this project, we use these scripts:

{
  "scripts": {
    "index": "node src/index-docs.js",
    "ask:ollama": "node src/ask-ollama.js"
  }
}

npm run index builds the searchable document index.

npm run ask:ollama asks a question using Ollama and Mistral.

Step 3: Add Local Documents

Create a folder for your documents:

mkdir -p data/docs

Add a file:

data/docs/company-faq.txt

Example content:

Refunds are allowed within 14 days of purchase.
Enterprise customers get priority email support.
The product supports SSO on the Business and Enterprise plans.

These documents are the knowledge base for our RAG app.

Later, when the user asks a question, the app will search these files first.

Step 4: Add Configuration

Create src/config.js.

This file keeps all important settings in one place:

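// Load variables from a local .env file (if present) into process.env.
import "dotenv/config";

// Where documents are read from and where the index is written.
export const DOCS_DIR = process.env.DOCS_DIR || "data/docs";
export const INDEX_FILE = process.env.INDEX_FILE || "data/index.json";

// Base URL of the local Ollama server.
export const OLLAMA_BASE =
  process.env.OLLAMA_BASE_URL || "http://localhost:11434";

// Model and endpoint used to create embeddings.
export const OLLAMA_EMBED_ENDPOINT = "/api/embeddings";
export const OLLAMA_EMBED_MODEL = "nomic-embed-text";

// Model and endpoint used to generate answers.
export const OLLAMA_GEN_ENDPOINT = "/api/generate";
export const OLLAMA_GEN_MODEL = process.env.OLLAMA_MODEL || "mistral";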

This tells the app:

Where to read documents from
Where to save the index
Which model to use for embeddings
Which model to use for answers

Step 5: Read and Split Documents

LLMs work better when we give them small, focused pieces of text.

So we split long documents into smaller parts called chunks.

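export function chunkText(text, size = 900, overlap = 150) {
  // A step of zero or less would make the loop below run forever.
  if (overlap >= size) {
    throw new Error("overlap must be smaller than size");
  }

  const chunks = [];
  const step = size - overlap;

  // Slide a window of `size` characters forward by `step` each time.
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }

  // Trim surrounding whitespace and drop chunks that end up empty.
  return chunks.map((chunk) => chunk.trim()).filter(Boolean);
}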

Here:

size = 900 means each chunk is at most 900 characters (the last chunk is usually shorter)
overlap = 150 means the next chunk repeats 150 characters from the previous one

The overlap is useful because important meaning can sit between two chunks.

Example:

Chunk 1: characters 0 to 900
Chunk 2: characters 750 to 1650
Chunk 3: characters 1500 to 2400
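To make the overlap concrete, here is a small sketch that repeats the chunking logic and runs it on a 2,000-character placeholder string. The sample string is made up for illustration and is not part of the project.

```javascript
// Same chunking logic as above, repeated so this snippet runs on its own.
function chunkText(text, size = 900, overlap = 150) {
  const chunks = [];
  let start = 0;

  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    start += size - overlap; // advance 750 characters per step by default
  }

  return chunks.map((chunk) => chunk.trim()).filter(Boolean);
}

// Placeholder input: 2,000 digits, so trim() has nothing to remove.
const sample = Array.from({ length: 2000 }, (_, i) => i % 10).join("");
const chunks = chunkText(sample);

// Chunks start at 0, 750, and 1500; the last one is shorter because the text ends.
console.log(chunks.map((chunk) => chunk.length)); // [ 900, 900, 500 ]

// The last 150 characters of chunk 1 are the first 150 characters of chunk 2.
console.log(chunks[0].slice(-150) === chunks[1].slice(0, 150)); // true
```

Note that the final chunk is only 500 characters long, which is why "900" is an upper limit and not an exact size.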

Step 6: Create Embeddings

An embedding is a list of numbers that represents the meaning of text.

For example, this sentence:

The product supports SSO on the Business and Enterprise plans.

is converted into a vector:

[0.12, -0.04, 0.89, ...]

The exact numbers are not important for us.

What matters is this:

Similar text gets similar embeddings.

Why Do We Need `nomic-embed-text`?

Mistral is good at generating answers, but we also need a way to search our documents by meaning.

That is what nomic-embed-text does.

It converts text into embeddings so our app can compare:

The user's question
The chunks from our documents

Without embeddings, our app would only do simple keyword matching.

For example, if the document says:

The product supports SSO on the Business and Enterprise plans.

and the user asks:

Which subscription includes single sign-on?

keyword search may miss the connection because the words are different.

But embeddings can capture that SSO and single sign-on mean the same thing.
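A quick way to see the keyword problem is to count shared words. The sketch below is only an illustration: tokenize and keywordOverlap are hypothetical helpers, not part of the project, and real keyword search engines are more sophisticated than this.

```javascript
// Split text into lowercase word tokens, keeping hyphenated words such as "sign-on" whole.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
}

// Count how many distinct query words also appear in the document text.
function keywordOverlap(query, documentText) {
  const documentWords = new Set(tokenize(documentText));
  const queryWords = new Set(tokenize(query));
  return [...queryWords].filter((word) => documentWords.has(word)).length;
}

const sentence = "The product supports SSO on the Business and Enterprise plans.";

// No shared words, even though the question is about the same thing.
console.log(keywordOverlap("Which subscription includes single sign-on?", sentence)); // 0

// "plans" and "sso" match, so plain keyword search only works for this phrasing.
console.log(keywordOverlap("Which plans have SSO?", sentence)); // 2
```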

So in this project:

nomic-embed-text is used for search
mistral is used for answering

Because search runs on embeddings, a question like:

Which plans have SSO?

should be close to the document sentence:

The product supports SSO on the Business and Enterprise plans.

Here is the embedding function:

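export async function embed(text) {
  // Ask Ollama's embeddings endpoint to turn the text into a vector.
  const response = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text",
      prompt: text
    })
  });

  // Fail loudly if Ollama is not running or the model has not been pulled.
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.embedding; // an array of numbers
}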

We use this same function for both:

Document chunks
User questions

That is how we compare a question with our documents.

Step 7: Build the Index

Now we create the index.

The index is a JSON file that stores:

The document name
The chunk text
The embedding for that chunk

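import fs from "node:fs/promises";

// chunkText() and embed() are the helpers from the earlier steps.
// readDocuments() returns a list of { source, text } objects, one per file.
export async function buildIndex() {
  const docs = await readDocuments();
  const records = [];

  for (const doc of docs) {
    const chunks = chunkText(doc.text);

    // Embed the chunks one at a time and store each with its source file.
    for (const [i, chunk] of chunks.entries()) {
      records.push({
        id: `${doc.source}#${i}`,
        source: doc.source,
        text: chunk,
        embedding: await embed(chunk)
      });
    }
  }

  // Save everything as pretty-printed JSON so the index is easy to inspect.
  await fs.writeFile("data/index.json", JSON.stringify(records, null, 2));
  return records.length;
}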
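buildIndex relies on a readDocuments helper that returns objects with a source and a text field. Its code is not shown in this article, so the version below is only a minimal sketch under assumptions: it reads .txt and .md files directly inside the folder (no subfolders), and the docsDir parameter and the extension list are my own choices. The implementation in the repository may differ.

```javascript
import fs from "node:fs/promises";
import path from "node:path";

// Hypothetical stand-in for readDocuments(): loads every .txt and .md file
// directly inside docsDir and returns [{ source, text }, ...].
async function readDocuments(docsDir = "data/docs") {
  const entries = await fs.readdir(docsDir, { withFileTypes: true });
  const docs = [];

  for (const entry of entries) {
    const extension = path.extname(entry.name).toLowerCase();
    // Skip folders and file types this sketch does not handle.
    if (!entry.isFile() || ![".txt", ".md"].includes(extension)) continue;

    const text = await fs.readFile(path.join(docsDir, entry.name), "utf8");
    docs.push({ source: entry.name, text });
  }

  // Sort by file name so chunk ids stay stable between runs.
  return docs.sort((a, b) => a.source.localeCompare(b.source));
}
```

With the example file from Step 3, this would return a single entry whose source is company-faq.txt.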

Run:

npm run index

This creates:

data/index.json

Now your documents are searchable by meaning, not just by exact words.

Step 8: Search the Best Chunks

When the user asks a question, we need to find the document chunks that are closest to that question.

To do that, we:

Convert the question into an embedding
Compare it with every saved document embedding
Sort the results
Keep the best matches

The comparison uses cosine similarity:

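function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;

  // Accumulate the dot product and the squared length of each vector.
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  // Avoid dividing by zero if either vector is all zeros.
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}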
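To get a feel for the scores, here is the same formula applied to a few toy vectors. The two-dimensional vectors are made up for illustration; real embeddings have hundreds of dimensions.

```javascript
// Same cosine similarity as above, repeated so this snippet runs on its own.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1  (same direction)
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0  (unrelated directions)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite directions)

// Only the direction matters, not the length: [6, 8] is [3, 4] scaled by 2.
console.log(cosineSimilarity([3, 4], [6, 8]));  // 1
```

A score near 1 means the two texts point in almost the same direction in embedding space, which is what we treat as "similar meaning".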

Then we search the index:

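import fs from "node:fs/promises";

export async function search(query, limit = 4) {
  // Load the index that `npm run index` created.
  const raw = await fs.readFile("data/index.json", "utf8");
  const index = JSON.parse(raw);

  // Embed the question with the same model used for the chunks.
  const queryEmbedding = await embed(query);

  // Score every chunk, sort from most to least similar, keep the top few.
  return index
    .map((item) => ({
      ...item,
      score: cosineSimilarity(queryEmbedding, item.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}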

The result is a small set of document chunks that are most likely to contain the answer.
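The ranking step can be tried without Ollama by feeding it ready-made vectors. In the sketch below, rankChunks is a hypothetical pure version of the search logic, and the three-dimensional embeddings and chunk ids are invented for illustration (in the real index, the short FAQ file would likely be a single chunk).

```javascript
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: same map/sort/slice as search(), but it takes a
// ready-made query embedding instead of calling Ollama.
function rankChunks(queryEmbedding, index, limit = 4) {
  return index
    .map((item) => ({ ...item, score: cosineSimilarity(queryEmbedding, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Made-up embeddings, one per sentence of the example FAQ.
const toyIndex = [
  { id: "company-faq.txt#0", text: "Refunds are allowed within 14 days of purchase.", embedding: [1, 0, 0] },
  { id: "company-faq.txt#1", text: "Enterprise customers get priority email support.", embedding: [0, 1, 0] },
  { id: "company-faq.txt#2", text: "The product supports SSO on the Business and Enterprise plans.", embedding: [0, 0, 1] }
];

// Pretend this vector is the embedding of "Which plans have SSO?".
const top = rankChunks([0.1, 0.2, 0.9], toyIndex, 2);
console.log(top.map((item) => item.id)); // [ 'company-faq.txt#2', 'company-faq.txt#1' ]
```

Comparing against every record like this is fine for a small JSON index; it is the part a vector database would replace for larger collections.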

Step 9: Give Context to Mistral

Now we have the useful document chunks.

Next, we send them to Mistral with the user's question.

The prompt looks like this:

const prompt = `
Answer using only the context below.
If the answer is missing, say you do not know.

Context:
${context}

Question:
${question}
`;

This line is very important:

Answer using only the context below.

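It instructs the model to rely on the retrieved text instead of guessing. This reduces hallucinations, although it cannot rule them out completely.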
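The context variable in the prompt has to be built from the search results. One possible way to do that is sketched below. buildContext is a hypothetical helper, and the numbered "[1] (file name)" layout is my own choice, not something the project requires.

```javascript
// Hypothetical helper: turns search results into the ${context} string used in the prompt.
// Each chunk is labeled with a number and its source file so answers are easier to trace.
function buildContext(results) {
  return results
    .map((item, i) => `[${i + 1}] (${item.source})\n${item.text}`)
    .join("\n\n");
}

const context = buildContext([
  { source: "company-faq.txt", text: "Refunds are allowed within 14 days of purchase." }
]);

console.log(context);
// [1] (company-faq.txt)
// Refunds are allowed within 14 days of purchase.
```

Keeping the source next to each chunk also makes it possible to show the user where an answer came from.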

Then we call Ollama's generation API:

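const resp = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "mistral",
    prompt,
    // Return one JSON object instead of a stream of partial responses.
    stream: false,
    // Ollama reads sampling settings from "options"; num_predict caps the answer length.
    options: {
      num_predict: 512,
      temperature: 0.2
    }
  })
});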

temperature: 0.2 makes the answer more focused and less random.
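To get the answer text out of the response, the call can be wrapped in a small function. The following is a sketch, not the project's exact code: generateAnswer and its fetchImpl parameter are hypothetical names (fetchImpl only exists to make the function easy to test). It sets stream to false because Ollama streams partial responses by default, and it passes sampling settings inside options, which is where Ollama expects them.

```javascript
// Sketch of a complete call to Ollama's /api/generate endpoint.
// fetchImpl defaults to the global fetch available in Node.js 18 and later.
async function generateAnswer(prompt, fetchImpl = fetch) {
  const resp = await fetchImpl("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mistral",
      prompt,
      stream: false, // one JSON object instead of a stream of lines
      options: { temperature: 0.2, num_predict: 512 }
    })
  });

  // Surface server errors, for example when the model has not been pulled yet.
  if (!resp.ok) {
    throw new Error(`Ollama request failed with status ${resp.status}`);
  }

  // With stream: false, the generated text is in the "response" field.
  const data = await resp.json();
  return data.response.trim();
}
```

Calling generateAnswer(prompt) with the prompt from above would then return the answer as a plain string.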

Step 10: Ask a Question

Now run:

npm run ask:ollama -- "What plans support SSO?"

Example answer:

The product supports SSO on the Business and Enterprise plans.

This answer came from the local document.

Mistral did not need to already know your product FAQ. The RAG pipeline found the right context and gave it to the model.

The Full RAG Flow

Here is the complete flow again:

1. Put documents in data/docs
2. Split documents into chunks
3. Convert chunks into embeddings
4. Save embeddings in data/index.json
5. User asks a question
6. Convert the question into an embedding
7. Find the most similar chunks
8. Add those chunks to the prompt
9. Ask Mistral to answer using that context

That is RAG.

Search first. Generate second.
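Steps 5 to 9 of the flow above can be sketched as one small function. This is illustrative glue code under assumptions: answerQuestion is a hypothetical name, and the search and generate functions are passed in as arguments so the example runs on its own with stand-ins instead of a live Ollama server.

```javascript
// Hypothetical glue code for the question-answering half of the flow.
async function answerQuestion(question, { search, generate, limit = 4 }) {
  // Steps 6 and 7: embed the question and find the most similar chunks.
  const results = await search(question, limit);

  // Step 8: add those chunks to the prompt.
  const context = results.map((item) => item.text).join("\n\n");
  const prompt = [
    "Answer using only the context below.",
    "If the answer is missing, say you do not know.",
    "",
    "Context:",
    context,
    "",
    "Question:",
    question
  ].join("\n");

  // Step 9: ask the model to answer using that context.
  const answer = await generate(prompt);

  // Return the distinct source files too, so the caller can show where the answer came from.
  const sources = [...new Set(results.map((item) => item.source))];
  return { answer, sources };
}

// Usage with stand-in functions; in the app these would call Ollama.
answerQuestion("What plans support SSO?", {
  search: async () => [
    { source: "company-faq.txt", text: "The product supports SSO on the Business and Enterprise plans." }
  ],
  generate: async () => "The Business and Enterprise plans support SSO."
}).then((result) => console.log(result.answer, result.sources));
```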

Why This Local Version Is Useful

This project is intentionally simple.

It uses a JSON file instead of a vector database. That makes it easier to understand what is happening.

For learning, this is perfect.

For production, you may later replace data/index.json with a vector database such as:

Chroma
Qdrant
Weaviate
pgvector

But the core idea stays the same:

Store embeddings -> Search similar chunks -> Send context to the LLM

Conclusion

RAG is one of the most useful patterns for building practical AI apps.

It helps LLMs answer using your data without retraining the model.

In this article, we built a local RAG app with:

Ollama
Mistral
Node.js
nomic-embed-text
Local documents
A JSON-based vector index

The main idea is simple:

Search your documents first, then let the LLM answer with that context.

Complete implementation:

https://github.com/gaurav101/ai-experiment/tree/main/rag
