Build a Complete RAG System with TypeScript: From Embeddings to Retrieval
Retrieval-Augmented Generation (RAG) has become the standard pattern for building AI applications that can reason over private data. Instead of fine-tuning models on your documents, RAG retrieves relevant context at query time, making it more flexible and cost-effective. With NeuroLink, building a production-ready RAG system in TypeScript takes minutes, not weeks.
Understanding the RAG Pipeline
A complete RAG system consists of four key components:
- Document Processing: Loading and chunking documents
- Embedding Generation: Converting text to vector representations
- Vector Storage: Persisting embeddings for efficient retrieval
- Retrieval & Generation: Finding relevant context and generating responses
NeuroLink provides unified APIs for all these stages, working seamlessly across 13 AI providers.
Step 1: Document Processing and Chunking
Start by processing your documents with NeuroLink's built-in RAG capabilities:
```typescript
import { NeuroLink } from "@juspay/neurolink";
import { readFile } from "fs/promises";

const neurolink = new NeuroLink();

// Load your documents
const documents = await Promise.all([
  readFile("./docs/api-reference.md", "utf-8"),
  readFile("./docs/guides/authentication.md", "utf-8"),
  readFile("./docs/guides/webhooks.md", "utf-8"),
  readFile("./docs/faq.md", "utf-8")
]);

// Configure chunking strategy
const ragConfig = {
  chunking: {
    strategy: "semantic", // semantic, fixed, recursive, or markdown
    chunkSize: 512,       // tokens per chunk
    chunkOverlap: 50,     // overlap between chunks
    separators: ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]
  },
  embedding: {
    model: "text-embedding-3-large",
    provider: "openai"
  }
};
```
NeuroLink supports multiple chunking strategies, including:

| Strategy | Best For |
|---|---|
| semantic | Preserves meaning boundaries |
| fixed | Simple, predictable chunks |
| recursive | Hierarchical document structure |
| markdown | Markdown-formatted docs |
| code | Source code files |
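As a point of reference, the simplest of these, fixed-size chunking with overlap, can be sketched in a few lines. This is a character-based toy to illustrate the sliding-window idea, not NeuroLink's token-aware implementation:

```typescript
// Toy fixed-size chunker with overlap (character-based for simplicity).
// NeuroLink's real strategies work on tokens and semantic boundaries;
// this only shows the sliding window behind the "fixed" strategy.
function fixedChunks(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk shares its last `overlap` characters with the next one, so a sentence split at a boundary still appears whole in at least one chunk.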
Step 2: Generating Embeddings
NeuroLink's embed() and embedMany() APIs handle embedding generation with automatic batching and provider fallback:
```typescript
// Generate embeddings for a single text
const singleEmbedding = await neurolink.embed({
  text: "How do I authenticate API requests?",
  model: "text-embedding-3-large",
  provider: "openai"
});

console.log(singleEmbedding.embedding); // number[] vector

// Batch embed multiple documents efficiently
const chunks = [
  "Authentication requires an API key in the Authorization header...",
  "Webhooks are sent as POST requests to your configured endpoint...",
  "Rate limits are enforced per API key at 1000 requests per minute...",
  // ... hundreds more chunks
];

const embeddings = await neurolink.embedMany({
  texts: chunks,
  model: "text-embedding-3-large",
  provider: "openai",
  batchSize: 100 // Automatic batching for rate limits
});

console.log(`Generated ${embeddings.length} embeddings`);
```
The embedMany() API automatically:
- Batches requests to respect provider rate limits
- Retries failed batches with exponential backoff
- Falls back to alternative providers if configured
- Caches results to reduce API costs
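The batching-with-retry pattern itself is worth seeing in miniature. The sketch below is a generic illustration of how such a helper can work, with `embedBatch` standing in for any provider call; it is not NeuroLink's internal code:

```typescript
// Generic batched-embedding helper with exponential backoff.
// `embedBatch` is a stand-in for an arbitrary provider call; this
// illustrates the pattern, not NeuroLink's internals.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<number[][]> {
  const results: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    for (let attempt = 0; ; attempt++) {
      try {
        results.push(...(await embedBatch(batch)));
        break; // batch succeeded, move to the next one
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  return results;
}
```

A transient rate-limit error on one batch triggers a delayed retry of just that batch, so earlier results are never recomputed.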
Step 3: Vector Storage and Retrieval
For production systems, you'll want to store embeddings in a vector database. Here's how to integrate with popular options:
```typescript
import { Pinecone } from "@pinecone-database/pinecone";

// Initialize vector store
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("neurolink-docs");

// Store embeddings with metadata
async function storeEmbeddings(chunks: string[], embeddings: number[][]) {
  const records = chunks.map((chunk, i) => ({
    id: `chunk-${i}`,
    values: embeddings[i],
    metadata: {
      text: chunk,
      source: "api-reference.md",
      timestamp: Date.now()
    }
  }));
  await index.upsert(records);
}

// Retrieve relevant chunks
async function retrieveContext(query: string, topK: number = 5) {
  // Generate query embedding
  const queryEmbedding = await neurolink.embed({
    text: query,
    model: "text-embedding-3-large",
    provider: "openai"
  });

  // Search vector store
  const results = await index.query({
    vector: queryEmbedding.embedding,
    topK,
    includeMetadata: true
  });

  return results.matches?.map(m => m.metadata?.text) ?? [];
}
```
Step 4: The Complete RAG Query
Now put it all together for intelligent document Q&A:
```typescript
async function askQuestion(question: string) {
  // 1. Retrieve relevant context
  const contexts = await retrieveContext(question, 5);

  // 2. Build augmented prompt
  const contextText = contexts.join("\n\n---\n\n");

  // 3. Generate answer with context
  const result = await neurolink.generate({
    input: {
      text: `Answer the question based on the following context.

Context:
${contextText}

Question: ${question}

Provide a clear, accurate answer. If the context doesn't contain
the answer, say "I don't have enough information to answer that."`
    },
    model: "claude-4-sonnet",
    provider: "anthropic"
  });

  return result.output.text;
}

// Usage
const answer = await askQuestion(
  "What's the rate limit for webhook endpoints?"
);
console.log(answer);
```
Simplified RAG with NeuroLink's Built-in Support
For common use cases, NeuroLink offers a streamlined approach—just pass your files directly:
```typescript
// One-line RAG: NeuroLink handles chunking, embedding, and retrieval
const result = await neurolink.generate({
  input: {
    text: "What are the authentication requirements?",
    files: ["./docs/api-reference.md", "./docs/guides/authentication.md"]
  },
  model: "claude-4-sonnet",
  provider: "anthropic",
  rag: {
    chunking: "semantic",
    chunkSize: 512,
    search: "hybrid" // semantic + keyword search
  }
});

// NeuroLink automatically:
// 1. Reads the files
// 2. Chunks them using the semantic strategy
// 3. Generates embeddings
// 4. Retrieves relevant chunks
// 5. Includes them in the prompt
// 6. Generates the response
```
Advanced: Hybrid Search and Reranking
For better retrieval quality, combine semantic search with keyword matching and reranking:
```typescript
const result = await neurolink.generate({
  input: { text: "How do I handle webhook retries?" },
  rag: {
    files: ["./docs/webhooks.md", "./docs/error-handling.md"],
    search: "hybrid",   // semantic + keyword
    reranker: "cohere", // cohere, flashrank, or cross-encoder
    topK: 10,           // retrieve top 10
    rerankTopK: 5       // return top 5 after reranking
  }
});
```
Rerankers reorder results based on relevance to the query, significantly improving accuracy for complex questions.
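The reranking step itself is conceptually simple: score each retrieved candidate against the query with a stronger (but slower) model, then keep only the best. A generic sketch, where `score` stands in for a cross-encoder or a hosted rerank API call:

```typescript
// Generic rerank: re-score candidates with a stronger relevance model,
// then keep the top rerankTopK. `score` is a stand-in for a cross-encoder
// or rerank API; this illustrates the pattern, not NeuroLink's code.
async function rerank(
  query: string,
  candidates: string[],
  score: (query: string, doc: string) => Promise<number>,
  rerankTopK: number
): Promise<string[]> {
  const scored = await Promise.all(
    candidates.map(async doc => ({ doc, s: await score(query, doc) }))
  );
  return scored
    .sort((a, b) => b.s - a.s) // highest relevance first
    .slice(0, rerankTopK)
    .map(x => x.doc);
}
```

Because the expensive model only sees the `topK` candidates from the fast vector search, the added latency stays bounded.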
Building a Production-Ready Document Q&A System
Here's a complete example with error handling and streaming:
```typescript
import { NeuroLink } from "@juspay/neurolink";
import { Pinecone } from "@pinecone-database/pinecone";

class DocumentQASystem {
  private neurolink: NeuroLink;
  private vectorStore: any;

  constructor() {
    this.neurolink = new NeuroLink();
    this.vectorStore = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY!
    }).index("docs");
  }

  // Ingest documents into the system
  async ingestDocuments(files: string[]) {
    for (const file of files) {
      await this.neurolink.generate({
        input: { files: [file] },
        rag: {
          chunking: "semantic",
          chunkSize: 512,
          storeEmbeddings: async (embeddings, chunks) => {
            await this.vectorStore.upsert(
              chunks.map((chunk, i) => ({
                id: `${file}-${i}`,
                values: embeddings[i],
                metadata: { text: chunk, source: file }
              }))
            );
          }
        }
      });
    }
  }

  // Ask a question with a streaming response
  async *ask(question: string) {
    // Get query embedding
    const { embedding } = await this.neurolink.embed({
      text: question,
      model: "text-embedding-3-large"
    });

    // Retrieve context
    const searchResults = await this.vectorStore.query({
      vector: embedding,
      topK: 5,
      includeMetadata: true
    });
    const context = searchResults.matches
      ?.map((m: any) => m.metadata?.text)
      .join("\n\n");

    // Stream the response
    const stream = await this.neurolink.stream({
      input: {
        text: `Context:\n${context}\n\nQuestion: ${question}\n\nAnswer:`
      },
      model: "claude-4-sonnet",
      provider: "anthropic"
    });

    for await (const chunk of stream.stream) {
      if ("content" in chunk) {
        yield chunk.content;
      }
    }
  }
}

// Usage
const qa = new DocumentQASystem();
await qa.ingestDocuments(["./docs/api.md", "./docs/faq.md"]);

// Stream the answer
for await (const token of qa.ask("What are the webhook retry policies?")) {
  process.stdout.write(token);
}
```
Best Practices for Production RAG
- Choose chunk sizes wisely: 512 tokens works for most docs; use smaller chunks (256) for dense technical content
- Use semantic chunking: Preserves meaning better than fixed-size chunks
- Implement hybrid search: Combines semantic similarity with keyword matching
- Add reranking: Improves precision, especially for large knowledge bases
- Monitor retrieval metrics: Track hit rate, latency, and relevance scores
- Handle edge cases: Return "I don't know" when confidence is low
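For the last point, a common implementation is to gate on the best retrieval score: if even the top match's cosine similarity to the query is below a threshold, skip generation and refuse. A minimal sketch, where the 0.75 threshold is an arbitrary starting point you should tune against your own data:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Gate generation on retrieval confidence. The 0.75 default is a
// tunable assumption, not a NeuroLink or provider default.
function shouldAnswer(
  queryVec: number[],
  topMatchVec: number[],
  threshold = 0.75
): boolean {
  return cosine(queryVec, topMatchVec) >= threshold;
}
```

When `shouldAnswer` returns false, return the fixed "I don't have enough information" response instead of calling the model, which also saves a generation round trip.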
Conclusion
Building a RAG system doesn't require complex infrastructure or multiple libraries. NeuroLink provides a unified, provider-agnostic approach to document processing, embedding generation, and retrieval—letting you focus on building great AI experiences rather than plumbing.
Whether you need quick one-line RAG or a fully customized pipeline with hybrid search and reranking, NeuroLink scales from prototype to production seamlessly.
NeuroLink — The Universal AI SDK for TypeScript
- GitHub: github.com/juspay/neurolink
- Install: `npm install @juspay/neurolink`
- Docs: docs.neurolink.ink
- Blog: blog.neurolink.ink — 150+ technical articles