NeuroLink AI
Build a Complete RAG System with TypeScript: From Embeddings to Retrieval


Retrieval-Augmented Generation (RAG) has become the standard pattern for building AI applications that can reason over private data. Instead of fine-tuning models on your documents, RAG retrieves relevant context at query time, making it more flexible and cost-effective. With NeuroLink, building a production-ready RAG system in TypeScript takes minutes, not weeks.

Understanding the RAG Pipeline

A complete RAG system consists of four key components:

  1. Document Processing: Loading and chunking documents
  2. Embedding Generation: Converting text to vector representations
  3. Vector Storage: Persisting embeddings for efficient retrieval
  4. Retrieval & Generation: Finding relevant context and generating responses

NeuroLink provides unified APIs for all these stages, working seamlessly across 13 AI providers.

Step 1: Document Processing and Chunking

Start by processing your documents with NeuroLink's built-in RAG capabilities:

import { NeuroLink } from "@juspay/neurolink";
import { readFile } from "fs/promises";

const neurolink = new NeuroLink();

// Load your documents
const documents = await Promise.all([
  readFile("./docs/api-reference.md", "utf-8"),
  readFile("./docs/guides/authentication.md", "utf-8"),
  readFile("./docs/guides/webhooks.md", "utf-8"),
  readFile("./docs/faq.md", "utf-8")
]);

// Configure chunking strategy
const ragConfig = {
  chunking: {
    strategy: "semantic",     // semantic, fixed, recursive, or markdown
    chunkSize: 512,           // tokens per chunk
    chunkOverlap: 50,         // overlap between chunks
    separators: ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]
  },
  embedding: {
    model: "text-embedding-3-large",
    provider: "openai"
  }
};

NeuroLink supports a range of chunking strategies, including:

Strategy    Best For
semantic    Preserves meaning boundaries
fixed       Simple, predictable chunks
recursive   Hierarchical document structure
markdown    Markdown-formatted docs
code        Source code files
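
To build intuition for what a chunker does, here is a minimal sketch of the "fixed" strategy with overlap. This is illustrative only (character-based rather than token-based, unlike NeuroLink's internals), but it shows why overlap matters: neighboring chunks share a margin so a sentence split at a boundary still appears whole in at least one chunk.

```typescript
// Illustrative fixed-size chunker with overlap (character-based sketch;
// real implementations count tokens, not characters).
function chunkFixed(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```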

Step 2: Generating Embeddings

NeuroLink's embed() and embedMany() APIs handle embedding generation with automatic batching and provider fallback:

// Generate embeddings for a single text
const singleEmbedding = await neurolink.embed({
  text: "How do I authenticate API requests?",
  model: "text-embedding-3-large",
  provider: "openai"
});

console.log(singleEmbedding.embedding); // float[] vector

// Batch embed multiple documents efficiently
const chunks = [
  "Authentication requires an API key in the Authorization header...",
  "Webhooks are sent as POST requests to your configured endpoint...",
  "Rate limits are enforced per API key at 1000 requests per minute...",
  // ... hundreds more chunks
];

const embeddings = await neurolink.embedMany({
  texts: chunks,
  model: "text-embedding-3-large",
  provider: "openai",
  batchSize: 100  // Automatic batching for rate limits
});

console.log(`Generated ${embeddings.length} embeddings`);

The embedMany() API automatically:

  • Batches requests to respect provider rate limits
  • Retries failed batches with exponential backoff
  • Falls back to alternative providers if configured
  • Caches results to reduce API costs
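
Conceptually, batching plus retry looks something like the sketch below. This is not NeuroLink's implementation; `embedBatch` is a hypothetical stand-in for a single provider call, and the delay values are arbitrary illustrations.

```typescript
// Sketch of batched embedding with exponential-backoff retry.
// `embedBatch` is a hypothetical provider call, injected for testability.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<number[][]> {
  const results: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    for (let attempt = 0; ; attempt++) {
      try {
        results.push(...(await embedBatch(batch)));
        break; // batch succeeded, move to the next one
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  return results;
}
```

Injecting the provider call as a parameter keeps the batching logic testable without hitting a real API.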

Step 3: Vector Storage and Retrieval

For production systems, you'll want to store embeddings in a vector database. Here's how to integrate with Pinecone, one popular option:

import { Pinecone } from "@pinecone-database/pinecone";

// Initialize vector store
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("neurolink-docs");

// Store embeddings with metadata
async function storeEmbeddings(
  chunks: string[],
  embeddings: number[][]
) {
  const records = chunks.map((chunk, i) => ({
    id: `chunk-${i}`,
    values: embeddings[i],
    metadata: {
      text: chunk,
      source: "api-reference.md",
      timestamp: Date.now()
    }
  }));

  await index.upsert(records);
}

// Retrieve relevant chunks
async function retrieveContext(query: string, topK: number = 5) {
  // Generate query embedding
  const queryEmbedding = await neurolink.embed({
    text: query,
    model: "text-embedding-3-large",
    provider: "openai"
  });

  // Search vector store
  const results = await index.query({
    vector: queryEmbedding.embedding,
    topK,
    includeMetadata: true
  });

  return results.matches?.map(m => m.metadata?.text) ?? [];
}
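Under the hood, "search the vector store" reduces to nearest-neighbor search by cosine similarity. For prototyping without a hosted database, an in-memory version is a few lines (illustrative only; real vector databases use approximate-nearest-neighbor indexes for scale):

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-K retrieval over an in-memory store.
function topK(
  query: number[],
  store: { id: string; vector: number[] }[],
  k: number
): { id: string; score: number }[] {
  return store
    .map(r => ({ id: r.id, score: cosineSimilarity(query, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```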

Step 4: The Complete RAG Query

Now put it all together for intelligent document Q&A:

async function askQuestion(question: string) {
  // 1. Retrieve relevant context
  const contexts = await retrieveContext(question, 5);

  // 2. Build augmented prompt
  const contextText = contexts.join("\n\n---\n\n");

  // 3. Generate answer with context
  const result = await neurolink.generate({
    input: {
      // Template literal kept flush left so indentation isn't sent in the prompt
      text: `Answer the question based on the following context.

Context:
${contextText}

Question: ${question}

Provide a clear, accurate answer. If the context doesn't contain the answer,
say "I don't have enough information to answer that."`
    },
    model: "claude-4-sonnet",
    provider: "anthropic"
  });

  return result.output.text;
}

// Usage
const answer = await askQuestion(
  "What's the rate limit for webhook endpoints?"
);
console.log(answer);

Simplified RAG with NeuroLink's Built-in Feature

For common use cases, NeuroLink offers a streamlined approach—just pass your files directly:

// One-line RAG: NeuroLink handles chunking, embedding, and retrieval
const result = await neurolink.generate({
  input: {
    text: "What are the authentication requirements?",
    files: ["./docs/api-reference.md", "./docs/guides/authentication.md"]
  },
  model: "claude-4-sonnet",
  provider: "anthropic",
  rag: {
    chunking: "semantic",
    chunkSize: 512,
    search: "hybrid"  // semantic + keyword search
  }
});

// NeuroLink automatically:
// 1. Reads the files
// 2. Chunks them using semantic strategy
// 3. Generates embeddings
// 4. Retrieves relevant chunks
// 5. Includes them in the prompt
// 6. Generates the response

Advanced: Hybrid Search and Reranking

For better retrieval quality, combine semantic search with keyword matching and reranking:

const result = await neurolink.generate({
  input: { text: "How do I handle webhook retries?" },
  rag: {
    files: ["./docs/webhooks.md", "./docs/error-handling.md"],
    search: "hybrid",           // semantic + keyword
    reranker: "cohere",         // cohere, flashrank, or cross-encoder
    topK: 10,                   // retrieve top 10
    rerankTopK: 5               // return top 5 after reranking
  }
});

Rerankers reorder results based on relevance to the query, significantly improving accuracy for complex questions.
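
The reordering idea can be sketched with a toy lexical scorer. Real rerankers such as Cohere's use cross-encoder models that read the query and candidate together; this keyword-overlap version is illustrative only, showing the retrieve-many-then-keep-few shape:

```typescript
// Toy reranker: score each candidate by query-term overlap, keep the best.
// Real rerankers score with a cross-encoder model instead of word counts.
function rerank(query: string, candidates: string[], keep: number): string[] {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return candidates
    .map(text => {
      const words = text.toLowerCase().split(/\W+/);
      const score = words.filter(w => terms.has(w)).length;
      return { text, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .map(c => c.text);
}
```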

Building a Production-Ready Document Q&A System

Here's a complete example with error handling and streaming:

import { NeuroLink } from "@juspay/neurolink";
import { Pinecone } from "@pinecone-database/pinecone";

class DocumentQASystem {
  private neurolink: NeuroLink;
  private vectorStore: any;

  constructor() {
    this.neurolink = new NeuroLink();
    this.vectorStore = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY!
    }).index("docs");
  }

  // Ingest documents into the system
  async ingestDocuments(files: string[]) {
    for (const file of files) {
      await this.neurolink.generate({
        input: { files: [file] },
        rag: {
          chunking: "semantic",
          chunkSize: 512,
          storeEmbeddings: async (embeddings, chunks) => {
            await this.vectorStore.upsert(
              chunks.map((chunk, i) => ({
                id: `${file}-${i}`,
                values: embeddings[i],
                metadata: { text: chunk, source: file }
              }))
            );
          }
        }
      });
    }
  }

  // Ask a question with streaming response
  async *ask(question: string) {
    // Get query embedding
    const { embedding } = await this.neurolink.embed({
      text: question,
      model: "text-embedding-3-large"
    });

    // Retrieve context
    const searchResults = await this.vectorStore.query({
      vector: embedding,
      topK: 5,
      includeMetadata: true
    });

    const context = searchResults.matches
      ?.map((m: any) => m.metadata?.text)
      .join("\n\n");

    // Stream the response
    const stream = await this.neurolink.stream({
      input: {
        text: `Context:\n${context}\n\nQuestion: ${question}\n\nAnswer:`
      },
      model: "claude-4-sonnet",
      provider: "anthropic"
    });

    for await (const chunk of stream.stream) {
      if ("content" in chunk) {
        yield chunk.content;
      }
    }
  }
}

// Usage
const qa = new DocumentQASystem();
await qa.ingestDocuments(["./docs/api.md", "./docs/faq.md"]);

// Stream the answer
for await (const token of qa.ask("What are the webhook retry policies?")) {
  process.stdout.write(token);
}

Best Practices for Production RAG

  1. Choose chunk sizes wisely: 512 tokens works for most docs; use smaller chunks (256) for dense technical content
  2. Use semantic chunking: Preserves meaning better than fixed-size chunks
  3. Implement hybrid search: Combines semantic similarity with keyword matching
  4. Add reranking: Improves precision, especially for large knowledge bases
  5. Monitor retrieval metrics: Track hit rate, latency, and relevance scores
  6. Handle edge cases: Return "I don't know" when confidence is low
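
The last point can be as simple as gating generation on the top retrieval score. A minimal sketch (the 0.75 threshold is an arbitrary illustration; tune it against your own embeddings and relevance data):

```typescript
// Gate generation on retrieval confidence: if even the best match is weak,
// answer "I don't know" rather than generating from irrelevant context.
function shouldAnswer(
  matches: { score: number }[],
  threshold = 0.75 // illustrative value; calibrate per embedding model
): boolean {
  return matches.length > 0 && matches[0].score >= threshold;
}
```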

Conclusion

Building a RAG system doesn't require complex infrastructure or multiple libraries. NeuroLink provides a unified, provider-agnostic approach to document processing, embedding generation, and retrieval—letting you focus on building great AI experiences rather than plumbing.

Whether you need quick one-line RAG or a fully customized pipeline with hybrid search and reranking, NeuroLink scales from prototype to production seamlessly.


NeuroLink — The Universal AI SDK for TypeScript
