Wilbur Suero

How I Built a RAG System in Rails Using Nomic Embeddings and OpenAI

Retrieval-Augmented Generation (RAG) lets you bring your own data to LLMs—and get real answers. I’ll show how I used the open-source nomic-embed-text-v2-moe model for semantic search in a Rails app, while still using OpenAI for generation.

🧠 What is RAG?

RAG (Retrieval-Augmented Generation) enhances LLMs by feeding them relevant chunks of your data before generating a response. Instead of fine-tuning, we give the model useful context.

Here's the basic pipeline:

[ User Question ]
        ↓
[ Embed the Question (Nomic) ]
        ↓
[ Vector Search in PgVector ]
        ↓
[ Retrieve Relevant Chunks ]
        ↓
[ Assemble Prompt ]
        ↓
[ Generate Answer with OpenAI ]


🧰 The Stack

  • Rails – Backend framework, routes, controllers, and persistence
  • Nomic Embedding Model – Turns text into embedding vectors for semantic search
  • FastAPI – Lightweight Python server to serve embeddings
  • PgVector – PostgreSQL extension to store and query vector data
  • OpenAI GPT-4 / GPT-3.5 – For the final response generation

🛠 Step 1: Run the Nomic Model Locally (Optional but Fast)

You can run the nomic-embed-text-v2-moe model using sentence-transformers in a Python FastAPI app:

from fastapi import FastAPI, Request
from sentence_transformers import SentenceTransformer

app = FastAPI()
# trust_remote_code is required to load Nomic's custom model code
model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

@app.post("/embed")
async def embed(req: Request):
    data = await req.json()
    input_text = data["input"]
    # Nomic recommends task-specific prompts for queries vs. documents;
    # see the model card for the exact prompt names.
    embedding = model.encode(input_text).tolist()
    return {"embedding": embedding}

This becomes your internal embedding API, replacing OpenAI’s /embeddings.
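Before wiring it into Rails, you can sanity-check the endpoint from a Rails console. A quick sketch (the input text is illustrative; the port matches the uvicorn default):

require "faraday"
require "json"

# Smoke test against the local embedding server
resp = Faraday.post("http://localhost:8000/embed",
                    { input: "hello world" }.to_json,
                    "Content-Type" => "application/json")
puts JSON.parse(resp.body)["embedding"].length # 768 for nomic-embed-text-v2-moe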


📄 Step 2: Chunk and Store Your Data

Split your content into short passages (~100–300 words), embed each one via your FastAPI endpoint, and store the results in PostgreSQL with pgvector. A backfill sketch follows the migration below.

Enable the extension and add a vector column:

psql -d your_db -c "CREATE EXTENSION IF NOT EXISTS vector;"
class AddEmbeddingToDocuments < ActiveRecord::Migration[7.1]
  def change
    # The :vector column type needs pgvector support in Rails, e.g. via the neighbor gem
    add_column :documents, :embedding, :vector, limit: 768 # Nomic v2-moe size
  end
end
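With the schema in place, here's a minimal backfill sketch. The Document model's content column and the source_text variable are assumptions for illustration; get_embedding is the helper defined in Step 3, and assigning a Ruby array to the vector column relies on the neighbor gem's type casting:

# Split text into ~200-word chunks, embed each, and persist with its vector
def chunk_text(text, words_per_chunk = 200)
  text.split.each_slice(words_per_chunk).map { |words| words.join(" ") }
end

chunk_text(source_text).each do |chunk|
  Document.create!(
    content: chunk,
    embedding: get_embedding(chunk) # helper from Step 3
  )
end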

🤖 Step 3: Embed User Queries via Nomic

In your Rails controller:

# POST the text to the local FastAPI service and return the embedding as a Ruby array
def get_embedding(text)
  response = Faraday.post("http://localhost:8000/embed", { input: text }.to_json,
                          "Content-Type" => "application/json")
  JSON.parse(response.body)["embedding"]
end

Use the same model for both document and query embeddings. Note that Nomic's models are trained with task-specific prefixes/prompts for queries versus documents (see the model card), so apply them consistently on both sides.
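With that helper in place, embedding the incoming question is a single call (the param name is illustrative):

query_vector = get_embedding(params[:question])

This query_vector feeds the search in Step 4.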


🔍 Step 4: Perform Vector Search with PgVector

Search your documents for the closest matches using pgvector's cosine distance operator (<=>):

# query_vector is the array returned by get_embedding in Step 3.
# Build a pgvector literal like '[0.1,0.2,...]'; the values come from our own
# embedding service, so string interpolation is safe here.
vector_literal = "[#{query_vector.join(',')}]"
Document.order(Arel.sql("embedding <=> '#{vector_literal}'")).limit(5)

These top chunks become the context for the LLM.
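At scale, an approximate-nearest-neighbor index keeps this query fast. A minimal sketch, assuming pgvector 0.5+ (which added HNSW support); the migration name is illustrative:

class AddEmbeddingIndexToDocuments < ActiveRecord::Migration[7.1]
  def change
    # HNSW index for fast approximate nearest-neighbor search with cosine distance
    add_index :documents, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end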


🧾 Step 5: Build a Smart Prompt for OpenAI

Concatenate the top passages and feed them into OpenAI’s chat API:

# client is an OpenAI::Client from the ruby-openai gem
client.chat(
  parameters: {
    model: "gpt-4",
    messages: [
      { role: "system", content: "You are an assistant answering based on the provided context." },
      { role: "user", content: build_contextual_prompt(user_input, top_chunks) }
    ]
  }
)
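The build_contextual_prompt helper isn't shown above; a minimal sketch, assuming Document exposes the content column from Step 2:

# Join the retrieved chunks into a context block, then append the user's question
def build_contextual_prompt(user_input, top_chunks)
  context = top_chunks.map(&:content).join("\n---\n")
  <<~PROMPT
    Context:
    #{context}

    Question: #{user_input}

    Answer using only the context above. If the context is insufficient, say so.
  PROMPT
end

With the ruby-openai gem, the generated answer text is then available at response.dig("choices", 0, "message", "content").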

✅ Why Use Nomic for Embeddings?

  • High-quality, open-source, multilingual
  • No API costs or rate limits when run locally or self-hosted
  • Zero vendor lock-in at the embedding layer
  • Great performance on MTEB and real-world retrieval

💡 Why I Still Use OpenAI for the LLM

The generation step is where OpenAI shines. Instead of replacing it prematurely, I decoupled the embedding stage. Now I can experiment, optimize, and even switch LLMs later if needed.


🧠 Takeaways

  • RAG doesn’t need to be a heavyweight system.
  • Open-source embeddings + OpenAI generation = powerful, flexible hybrid.
  • PgVector + Rails makes vector search feel native and hackable.
