Ganesh Navale

Why My Second RAG System Was Built in Rails, Not Python’s FastAPI

Built the same RAG system in FastAPI and Ruby on Rails. FastAPI took weeks, Rails took 24 hours. Here's what that taught me about choosing frameworks for AI products.

A hands-on comparison from a Rails developer's first AI project

[Image: Diagram of Rails integrating with AI components: OpenAI API, embeddings, documents, and knowledge representations]

Picture this: You’re starting a new RAG project. You open your laptop and wonder: “Do I really need Python for this?”

Every tutorial assumes Python. Almost every example uses FastAPI or Flask. And there I was thinking, “But… I already know Rails.”

I’m a Ruby on Rails developer, and I love Rails. I decided to dive in anyway and build a RAG system with FastAPI, learning embeddings, vector databases, and how to make an LLM actually answer questions from documents.

Then our company announced an AI hackathon. I had 48 hours to build another RAG system. This time, I built it in Rails. It wasn’t about rewriting; it was a practical choice under tight deadlines.

Same features. Same vector database. Same LLM. Same project, different framework.

What surprised me most was how much smoother the experience felt.

This article shares my experience building the same RAG system twice, what changed, and what I learned about Rails, FastAPI, and building AI-powered features in real projects.

TL;DR

  • Built the same RAG system in FastAPI (took weeks) and Rails (took 24 hours)
  • The actual AI logic was identical—the difference was infrastructure
  • Rails' mature tooling (Sidekiq, ActiveRecord, console) made development faster
  • Python still wins for ML-heavy experimentation
  • My recommendation: Rails for the app, Python microservices only when needed
  • You don't need to rewrite your Rails app in Python to add AI features

What I built (in both versions)

Both RAG systems had the same core functionality: upload documents, split them into chunks, generate embeddings, store the vectors, and answer questions using retrieved context.

The tech stack:

FastAPI version:

  • FastAPI for the API layer
  • Celery for background jobs
  • SQLAlchemy for database access
  • OpenAI API for embeddings and completions
  • pgvector for similarity search

Rails version:

  • Rails API backend
  • React frontend
  • Sidekiq for background jobs
  • ActiveRecord for database access
  • Same OpenAI API and pgvector setup

The AI logic was identical. The framework wrapping it was different.

The Architecture (Same for Both)

Both implementations follow this exact flow:

  1. Document Upload → User uploads PDF via web interface
  2. Text Extraction → Extract and clean text from PDF
  3. Chunking → Split document into ~500 token chunks with overlap (a rough sketch follows this list)
  4. Embedding Generation → Send chunks to OpenAI's embedding API
  5. Vector Storage → Store embeddings in Postgres (pgvector) with metadata
  6. Question Processing → When user asks a question:
    • Generate embedding for the question (text-embedding-3-small model)
    • Query pgvector for the top 5 similar chunks
    • Send question + context to GPT-4
    • Stream response back to user
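
For the chunking step, here's a rough sketch of a splitter with overlap. It approximates tokens with words, and the sizes are illustrative; the real versions used slightly more careful logic:

# Naive chunking sketch - ~500-token chunks with ~50 tokens of overlap
# (approximates tokens with whitespace-separated words)
def split_into_chunks(text, chunk_size: 500, overlap: 50)
  words = text.split
  chunks = []
  step = chunk_size - overlap
  index = 0
  while index < words.length
    chunks << words[index, chunk_size].join(" ")
    index += step
  end
  chunks
end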

Next Up: Where the Differences Actually Showed Up

The FastAPI Version: Where I Spent My Time

Building the FastAPI version worked. But I spent more time on infrastructure than on the actual AI features.

In theory, Celery handles async tasks. In practice, I became the one handling Celery.

When an embedding job failed (and they did, API timeouts, rate limits, malformed PDFs), here's what debugging looked like:

Background jobs became a maintenance burden:

# FastAPI/Celery - Replay a failed embedding job
# 1. Find the task ID in logs
# 2. Check Celery Flower or Redis
# 3. Manually construct retry logic
# 4. Hope the async session doesn't break again

@celery_app.task(bind=True, max_retries=3)
def embed_document(self, doc_id):
    # Celery won't await an async task for you, so the async DB work
    # has to be bridged into the sync worker by hand
    try:
        asyncio.run(_embed_document(doc_id))
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)

async def _embed_document(doc_id):
    async with get_db_session() as session:
        # embedding logic goes here
        pass

Compare this to Rails:

# Rails - Replay failed job from console or UI
DocumentEmbeddingJob.perform_later(document.id)

# Built-in retry with exponential backoff
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer, attempts: 5

  def perform(document_id)
    document = Document.find(document_id)
    embedding = OpenAI.embed(document.content)
    document.update!(embedding: embedding)
  end
end

In Rails, I can see failed jobs in Sidekiq's web UI, click "Retry," and watch it work. In FastAPI, I was writing custom monitoring and retry logic.
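
For context, getting that web UI is basically a one-liner in the router (the path and auth are up to you; this assumes Sidekiq is already wired up as the Active Job backend):

# config/routes.rb - mount Sidekiq's web UI for monitoring and one-click retries
require "sidekiq/web"

Rails.application.routes.draw do
  # protect this route behind authentication in a real app
  mount Sidekiq::Web => "/sidekiq"
end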

Database session management became a daily puzzle:

Every async endpoint needed careful session handling. I'd write a feature, run tests, and watch them randomly fail because some session somewhere wasn't properly closed. I spent more time reading asyncio documentation than building features.

# FastAPI - Manual session lifecycle everywhere
@app.post("/documents")
async def create_document(doc: DocumentCreate):
    async with get_db_session() as session:
        async with session.begin():
            # Don't forget to close this!
            # Or rollback on error!
            # Or handle connection pool limits!
            pass

Meanwhile in Rails? ActiveRecord just handles it. I never thought about sessions once during the hackathon.
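
For comparison, here's a hypothetical Rails version of that endpoint (not the exact hackathon code): connection checkout, transactions, and rollback-on-exception are all handled by ActiveRecord.

# Rails - equivalent endpoint, no session lifecycle to manage
class DocumentsController < ApplicationController
  def create
    # create! wraps the insert in a transaction and rolls back on error
    document = Document.create!(document_params)
    render json: document, status: :created
  end

  private

  def document_params
    # fields are illustrative
    params.require(:document).permit(:title, :content)
  end
end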

Deployment felt fragile:

I had to manually configure:

  • Async workers
  • Job queues
  • Retry policies
  • Monitoring dashboards
  • Database connection pools for async

Rails gives me all of this out of the box.

The Rails version: exactly what I needed

During the hackathon, I had 48 hours to ship a working demo. Not a prototype. Not a proof-of-concept. A working system that non-technical people could use.

I chose Rails not because it's "better for AI" (it's probably not), but because I knew exactly where my time would go: building features, not configuring infrastructure.

Here's what the system did:

  • Ingested documentation from multiple sources
  • Split content into chunks
  • Generated embeddings via OpenAI
  • Stored vectors in Postgres with pgvector
  • Answered questions using retrieved context
  • And actually shipped before the demo

The entire backend:

# Model
class Document < ApplicationRecord
  has_neighbors :embedding

  after_create_commit :enqueue_embedding_job

  def enqueue_embedding_job
    DocumentEmbeddingJob.perform_later(id)
  end
end

# Background job
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer

  def perform(document_id)
    document = Document.find(document_id)

    chunks = split_into_chunks(document.content)
    chunks.each do |chunk|
      embedding = generate_embedding(chunk)
      DocumentChunk.create!(
        document: document,
        content: chunk,
        embedding: embedding
      )
    end
  end

  private

  def generate_embedding(text)
    client = OpenAI::Client.new
    response = client.embeddings(
      parameters: {
        model: "text-embedding-3-small",
        input: text
      }
    )
    response.dig("data", 0, "embedding")
  end
end

# Query service
class RagQueryService
  def initialize(query)
    @query = query
    @embedding = generate_embedding(query) # Convert question to vector
  end

  def answer
    # Find the 5 most similar document chunks
    relevant_chunks = DocumentChunk
      .nearest_neighbors(:embedding, @embedding, distance: "cosine")
      .limit(5)

    # Combine them into context
    context = relevant_chunks.map(&:content).join("\n\n")

    # Ask GPT-4 with the context
    client = OpenAI::Client.new
    response = client.chat(
      parameters: {
        model: "gpt-4",
        messages: [
          { role: "system", content: "Answer based on this context: #{context}" },
          { role: "user", content: @query }
        ]
      }
    )

    response.dig("choices", 0, "message", "content")
  end
end

Reality check: This code is simplified for the article. The real version had error handling, logging, and rate limiting. But the core logic? Pretty much this.

That's it. The entire RAG pipeline in ~60 lines of readable Ruby.
No async session management. No custom retry logic. No Celery flower dashboard. Just Rails doing what Rails does best: letting you build features instead of infrastructure.

Could I have made the FastAPI version this clean? Maybe. But I didn't have time to figure it out. And that's the point.

What made Rails faster for me

I shipped the Rails version in 24 hours. The FastAPI version took weeks to get stable.

Here's why:

1. Background jobs are a solved problem

Sidekiq gives me:

  • Web UI to monitor jobs
  • Automatic retries with backoff
  • Dead job queues
  • Performance metrics
  • One-click job replay

I didn't write any of this. It was already there.

2. Database access is predictable

No async session managers. No connection pool tuning. No event loop surprises. It just works.

3. Debugging is straightforward

When an embedding job failed:

  • Open Sidekiq UI
  • See the error and full backtrace
  • Click "Retry"
  • Check logs if needed

In FastAPI, I was tailing Celery logs and rebuilding context manually.

4. The ecosystem has what I needed

  • ruby-openai gem for API calls
  • neighbor gem for vector similarity
  • pgvector extension for Postgres
  • Standard Rails patterns for everything else

No async complications. No compatibility issues.
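
For reference, the wiring for that stack is small. Roughly what it looks like, assuming pgvector is installed and embeddings come from text-embedding-3-small (1536 dimensions):

# Migration - enable pgvector and add a vector column (version number illustrative)
class AddEmbeddingToDocumentChunks < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"
    add_column :document_chunks, :embedding, :vector, limit: 1536
  end
end

# Model - the neighbor gem exposes nearest_neighbors on the column
class DocumentChunk < ApplicationRecord
  belongs_to :document
  has_neighbors :embedding
end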

5. The developer workflow is integrated, not assembled

This is easy to underestimate until you feel it.

Rails gives you a tight feedback loop by default:

  • A powerful interactive Rails console for debugging live data and jobs
  • Database migrations that are simple, versioned, and reversible
  • A test setup that’s predictable and deeply integrated with the framework

When I wanted to:

  • Inspect a document’s embeddings
  • Replay a failed job
  • Tweak a schema and re-run ingestion
  • Write or debug a failing test

I did all of it from the Rails console in seconds.
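
A rough sketch of what those console sessions looked like, using the models and job from above:

# rails console - inspect live data and replay work
doc = Document.last
chunks = DocumentChunk.where(document: doc)
chunks.count                     # did ingestion produce chunks?
chunks.first.embedding&.size     # => 1536 if the embedding landed
DocumentEmbeddingJob.perform_later(doc.id)   # re-run ingestion for one document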

In the FastAPI setup, these same tasks required more manual work:

  • Managing Alembic migrations explicitly
  • Configuring async test fixtures
  • Debugging through logs instead of an interactive console

None of this is impossible in Python — but it is more fragmented.

Rails optimizes for flow.
FastAPI optimizes for flexibility.

When you're iterating on AI features under a deadline, that difference compounds daily.

The key insight: AI primitives are framework-agnostic

After building both versions, here's what became clear:

The actual AI logic was identical.

Both systems used the same process:

  1. Split documents into chunks
  2. Generate embeddings via OpenAI API
  3. Store vectors in Postgres
  4. Retrieve similar chunks
  5. Pass context to LLM

The intelligence came from:

  • Quality of input data
  • Chunking strategy
  • Prompt engineering
  • Retrieval precision

Not from the framework.

Rails didn’t make the model smarter. It made the system easier to reason about, operate, and change.

And for a product engineer shipping features, that matters more than access to the latest ML libraries.

Where Python still clearly wins

Let me be clear: there are cases where Python is the right choice.

Use Python when you need:

  • Advanced document loaders (LangChain, LlamaIndex)
  • Custom re-ranking models
  • Sophisticated evaluation frameworks
  • Rapid ML experimentation
  • Fine-tuning workflows

For research and ML-heavy work, Python is unmatched.

Examples where I'd choose Python:

  • You're experimenting with multiple embedding models weekly
  • You need custom re-ranking with a BERT model
  • You're running A/B tests on different chunking strategies
  • Your team is already Python-first

But for building a production RAG feature in an existing Rails app? You probably don't need to rewrite everything in Python.

My Approach Now: Hybrid Architecture

After building both versions, I use this mental model:

Rails handles the application:

  • API endpoints
  • Background jobs
  • Database models
  • User authentication
  • Business logic

Python microservices for ML-specific work:

  • Custom re-ranking models
  • Advanced document parsing
  • Specialized ML pipelines
  • Evaluation frameworks

Why this works:

  • Product code stays stable and maintainable
  • AI experiments stay isolated
  • Infrastructure stays simple
  • Team velocity stays high

Instead of migrating my entire Rails app to FastAPI, I integrate Python only where it adds specific value.
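
In practice, that boundary is just HTTP. A minimal sketch of what calling a Python re-ranking microservice from Rails could look like (the service name, URL, and response shape are hypothetical):

# Rails - thin client for a hypothetical Python re-ranking service
require "net/http"
require "json"

class RerankerClient
  ENDPOINT = URI("http://reranker.internal:8000/rerank") # hypothetical URL

  # Returns chunk texts re-ordered by the external model
  def self.rerank(query, chunks)
    response = Net::HTTP.post(
      ENDPOINT,
      { query: query, documents: chunks }.to_json,
      "Content-Type" => "application/json"
    )
    JSON.parse(response.body).fetch("ranked_documents")
  end
end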

The real takeaway

Strip away the AI layer, and a RAG system is still just a distributed application:

  • Background jobs
  • Database transactions
  • Retries and failures
  • User-facing latency

The framework you choose determines how painful these problems are to live with.

That's why this comparison isn't really about Rails vs FastAPI.

It's about choosing tools that let you focus on product behavior instead of infrastructure glue.

Final thoughts

Building the same RAG system in two different frameworks taught me something simple but important:

The hard parts of production software aren't the AI API calls. They're:

  • Reliable background processing
  • Debugging production failures
  • Managing deployments
  • Maintaining clean architecture
  • Shipping quickly

If you're already working in Rails (or Django, or any mature web framework), you already have solutions for these problems. Adding AI features doesn't change that.

Python has incredible AI tooling. Rails has incredible application tooling.

You don't need to choose one or the other. You can use both strategically.

If you're a Rails developer wondering whether you need to learn FastAPI to build AI features: you probably don't. Start with Rails. Add Python services only when you hit a real limitation.


Building AI features in non-Python frameworks? I'd love to hear about your experience. Drop a comment or connect with me on LinkedIn.
