Built the same RAG system in FastAPI and Ruby on Rails. FastAPI took weeks, Rails took 24 hours. Here's what that taught me about choosing frameworks for AI products.
A hands-on comparison from a Rails developer's first AI project
Picture this: You’re starting a new RAG project. You open your laptop and wonder: “Do I really need Python for this?”
Every tutorial assumes Python. Almost every example uses FastAPI or Flask. And there I was thinking, “But… I already know Rails.”
I’m a Ruby on Rails developer, and I love Rails. I decided to dive in anyway and build a RAG system with FastAPI, learning embeddings, vector databases, and how to make an LLM actually answer questions from documents.
Then our company announced an AI hackathon. I had 48 hours to build another RAG system. This time, I built it in Rails. It wasn't about rewriting; it was a practical choice under a tight deadline.
Same features. Same vector database. Same LLM. Same project, different framework.
What surprised me most was how much smoother the experience felt.
This article shares my experience building the same RAG system twice, what changed, and what I learned about Rails, FastAPI, and building AI-powered features in real projects.
TL;DR
- Built the same RAG system in FastAPI (took weeks) and Rails (took 24 hours)
- The actual AI logic was identical—the difference was infrastructure
- Rails' mature tooling (Sidekiq, ActiveRecord, console) made development faster
- Python still wins for ML-heavy experimentation
- My recommendation: Rails for the app, Python microservices only when needed
- You don't need to rewrite your Rails app in Python to add AI features
What I built (in both versions)
Both RAG systems had the same core functionality: upload a document, extract and chunk its text, embed the chunks, store the vectors, and answer questions from the retrieved context.
The tech stack:
FastAPI version:
- FastAPI for the API layer
- Celery for background jobs
- SQLAlchemy for database access
- OpenAI API for embeddings and completions
- pgvector for similarity search
Rails version:
- Rails API backend
- React frontend
- Sidekiq for background jobs
- ActiveRecord for database access
- Same OpenAI API and pgvector setup
The AI logic was identical. The framework wrapping it was different.
The Architecture (Same for Both)
Both implementations follow this exact flow:
- Document Upload → User uploads a PDF via the web interface
- Text Extraction → Extract and clean text from the PDF
- Chunking → Split the document into ~500-token chunks with overlap
- Embedding Generation → Send chunks to OpenAI's embedding API
- Vector Storage → Store embeddings in Postgres (pgvector) with metadata (see the migration sketch below)
- Question Processing → When a user asks a question:
  - Generate an embedding for the question (text-embedding-3-small model)
  - Query pgvector for the top 5 similar chunks
  - Send the question + context to GPT-4
  - Stream the response back to the user
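For context, the Rails side of that vector-storage step is just a migration. Here's a minimal sketch, assuming the pgvector extension and the neighbor gem (column names mirror the models shown later; the filename and Rails version are illustrative):

```ruby
# db/migrate/20240101000000_create_document_chunks.rb (filename illustrative)
class CreateDocumentChunks < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector" # pgvector

    create_table :document_chunks do |t|
      t.references :document, null: false, foreign_key: true
      t.text :content
      t.vector :embedding, limit: 1536 # text-embedding-3-small dimensions
      t.timestamps
    end
  end
end
```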
Next Up: Where the Differences Actually Showed Up
The FastAPI Version: Where I Spent My Time
The FastAPI version worked. But I spent more time on infrastructure than on the actual AI features.
In theory, Celery handles async tasks. In practice, I became the one handling Celery.
Background jobs became a maintenance burden. When an embedding job failed (and they did: API timeouts, rate limits, malformed PDFs), here's what debugging looked like:
```python
import asyncio

# FastAPI/Celery - Replay a failed embedding job
# 1. Find the task ID in the logs
# 2. Check Celery Flower or Redis
# 3. Manually construct retry logic
# 4. Hope the async session doesn't break again
@celery_app.task(bind=True, max_retries=3)
def embed_document(self, doc_id):
    # Celery can't run `async def` tasks directly, so the async DB
    # session has to be driven from a synchronous task
    async def _embed():
        async with get_db_session() as session:
            ...  # embedding logic

    try:
        asyncio.run(_embed())
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)
```
Compare this to Rails:
```ruby
# Rails - Replay a failed job from the console or UI
DocumentEmbeddingJob.perform_later(document.id)

# Built-in retry with exponential backoff
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer, attempts: 5

  def perform(document_id)
    document = Document.find(document_id)
    embedding = OpenAI::Client.new.embeddings(
      parameters: { model: "text-embedding-3-small", input: document.content }
    ).dig("data", 0, "embedding")
    document.update!(embedding: embedding)
  end
end
```
In Rails, I can see failed jobs in Sidekiq's web UI, click "Retry," and watch it work. In FastAPI, I was writing custom monitoring and retry logic.
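If you haven't used it, that web UI is a one-line mount in the router. A minimal sketch (in production you'd wrap it in authentication):

```ruby
# config/routes.rb
require "sidekiq/web"

Rails.application.routes.draw do
  mount Sidekiq::Web => "/sidekiq" # failed jobs, retries, queues, metrics
  # ...the rest of the application's routes
end
```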
Database session management became a daily puzzle:
Every async endpoint needed careful session handling. I'd write a feature, run tests, and watch them randomly fail because some session somewhere wasn't properly closed. I spent more time reading asyncio documentation than building features.
```python
# FastAPI - Manual session lifecycle everywhere
@app.post("/documents")
async def create_document(doc: DocumentCreate):
    async with get_db_session() as session:
        async with session.begin():
            # Don't forget to close this!
            # Or rollback on error!
            # Or handle connection pool limits!
            pass
```
Meanwhile in Rails? ActiveRecord just handles it. I never thought about sessions once during the hackathon.
Deployment felt fragile:
I had to manually configure:
- Async workers
- Job queues
- Retry policies
- Monitoring dashboards
- Database connection pools for async
Rails gives me all of this out of the box.
The Rails version: exactly what I needed
During the hackathon, I had 48 hours to ship a working demo. Not a prototype. Not a proof-of-concept. A working system that non-technical people could use.
I chose Rails not because it's "better for AI" (it's probably not), but because I knew exactly where my time would go: building features, not configuring infrastructure.
Here's what the system did:
- Ingested documentation from multiple sources
- Split content into chunks
- Generated embeddings via OpenAI
- Stored vectors in Postgres with pgvector
- Answered questions using retrieved context
- Actually shipped before the demo
The entire backend:
```ruby
# Models
class Document < ApplicationRecord
  has_many :document_chunks, dependent: :destroy
  after_create_commit :enqueue_embedding_job

  def enqueue_embedding_job
    DocumentEmbeddingJob.perform_later(id)
  end
end

class DocumentChunk < ApplicationRecord
  belongs_to :document
  has_neighbors :embedding # pgvector similarity search via the neighbor gem
end

# Background job
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer

  def perform(document_id)
    document = Document.find(document_id)
    chunks = split_into_chunks(document.content)

    chunks.each do |chunk|
      embedding = generate_embedding(chunk)
      DocumentChunk.create!(
        document: document,
        content: chunk,
        embedding: embedding
      )
    end
  end

  private

  def split_into_chunks(text, size: 2000)
    # Naive fixed-size chunking; the real version used ~500-token chunks with overlap
    text.scan(/.{1,#{size}}/m)
  end

  def generate_embedding(text)
    client = OpenAI::Client.new
    response = client.embeddings(
      parameters: {
        model: "text-embedding-3-small",
        input: text
      }
    )
    response.dig("data", 0, "embedding")
  end
end

# Query service
class RagQueryService
  def initialize(query)
    @query = query
    @embedding = generate_embedding(query) # Convert the question to a vector
  end

  def answer
    # Find the 5 most similar document chunks
    relevant_chunks = DocumentChunk
      .nearest_neighbors(:embedding, @embedding, distance: "cosine")
      .limit(5)

    # Combine them into context
    context = relevant_chunks.map(&:content).join("\n\n")

    # Ask GPT-4 with the context
    client = OpenAI::Client.new
    response = client.chat(
      parameters: {
        model: "gpt-4",
        messages: [
          { role: "system", content: "Answer based on this context: #{context}" },
          { role: "user", content: @query }
        ]
      }
    )
    response.dig("choices", 0, "message", "content")
  end

  private

  def generate_embedding(text)
    OpenAI::Client.new.embeddings(
      parameters: { model: "text-embedding-3-small", input: text }
    ).dig("data", 0, "embedding")
  end
end
```
Reality check: This code is simplified for the article. The real version had error handling, logging, and rate limiting. But the core logic? Pretty much this.
That's it. The entire RAG pipeline in under a hundred lines of readable Ruby.
No async session management. No custom retry logic. No Celery flower dashboard. Just Rails doing what Rails does best: letting you build features instead of infrastructure.
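One piece not shown above is the controller that sits in front of RagQueryService; it's only a few lines. A hypothetical sketch (controller and parameter names are mine, not from the hackathon code):

```ruby
# app/controllers/questions_controller.rb
class QuestionsController < ApplicationController
  def create
    answer = RagQueryService.new(params.require(:question)).answer
    render json: { answer: answer }
  end
end
```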
Could I have made the FastAPI version this clean? Maybe. But I didn't have time to figure it out. And that's the point.
What made Rails faster for me
I shipped the Rails version in 24 hours. The FastAPI version took weeks to get stable.
Here's why:
1. Background jobs are a solved problem
Sidekiq gives me:
- Web UI to monitor jobs
- Automatic retries with backoff
- Dead job queues
- Performance metrics
- One-click job replay
I didn't write any of this. It was already there.
2. Database access is predictable
No async session managers. No connection pool tuning. No event loop surprises. It just works.
3. Debugging is straightforward
When an embedding job failed:
- Open Sidekiq UI
- See the error and full backtrace
- Click "Retry"
- Check logs if needed
In FastAPI, I was tailing Celery logs and rebuilding context manually.
4. The ecosystem has what I needed
- `ruby-openai` gem for API calls
- `neighbor` gem for vector similarity
- `pgvector` extension for Postgres
- Standard Rails patterns for everything else
No async complications. No compatibility issues.
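For reference, the dependency list was short. Roughly what the Gemfile additions look like (version constraints omitted):

```ruby
# Gemfile - AI-related additions
gem "ruby-openai" # OpenAI embeddings and chat completions
gem "neighbor"    # nearest-neighbor queries on pgvector columns
gem "sidekiq"     # background job processing
gem "pg"          # Postgres; pgvector itself is a database extension
```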
5. The developer workflow is integrated, not assembled
This is easy to underestimate until you feel it.
Rails gives you a tight feedback loop by default:
- A powerful interactive Rails console for debugging live data and jobs
- Database migrations that are simple, versioned, and reversible
- A test setup that’s predictable and deeply integrated with the framework
When I wanted to:
- Inspect a document's embeddings
- Replay a failed job
- Tweak the schema and re-run ingestion
- Write or debug a failing test

…I could do it from the console in seconds.
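To make that concrete, here's roughly what those console checks looked like (the sample question is made up; the model and job names match the code above):

```ruby
# rails console
doc = Document.last
DocumentChunk.where(document: doc).count            # did ingestion produce chunks?
DocumentChunk.last.embedding.size                   # 1536 for text-embedding-3-small
DocumentEmbeddingJob.perform_later(doc.id)          # replay ingestion for one document
RagQueryService.new("How do refunds work?").answer  # end-to-end sanity check
```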
In the FastAPI setup, these same tasks required more manual work:
- Managing Alembic migrations explicitly
- Configuring async test fixtures
- Debugging through logs instead of an interactive console
None of this is impossible in Python — but it is more fragmented.
Rails optimizes for flow.
FastAPI optimizes for flexibility.
When you're iterating on AI features under a deadline, that difference compounds daily.
The key insight: AI primitives are framework-agnostic
After building both versions, here's what became clear:
The actual AI logic was identical.
Both systems used the same process:
- Split documents into chunks
- Generate embeddings via OpenAI API
- Store vectors in Postgres
- Retrieve similar chunks
- Pass context to LLM
The intelligence came from:
- Quality of input data
- Chunking strategy
- Prompt engineering
- Retrieval precision
Not from the framework.
Rails didn’t make the model smarter. It made the system easier to reason about, operate, and change.
And for a product engineer shipping features, that matters more than access to the latest ML libraries.
Where Python still clearly wins
Let me be clear: there are cases where Python is the right choice.
Use Python when you need:
- Advanced document loaders (LangChain, LlamaIndex)
- Custom re-ranking models
- Sophisticated evaluation frameworks
- Rapid ML experimentation
- Fine-tuning workflows
For research and ML-heavy work, Python is unmatched.
Examples where I'd choose Python:
- You're experimenting with multiple embedding models weekly
- You need custom re-ranking with a BERT model
- You're running A/B tests on different chunking strategies
- Your team is already Python-first
But for building a production RAG feature in an existing Rails app? You probably don't need to rewrite everything in Python.
My Approach Now: Hybrid Architecture
After building both versions, I use this mental model:
Rails handles the application:
- API endpoints
- Background jobs
- Database models
- User authentication
- Business logic
Python microservices for ML-specific work:
- Custom re-ranking models
- Advanced document parsing
- Specialized ML pipelines
- Evaluation frameworks
Why this works:
- Product code stays stable and maintainable
- AI experiments stay isolated
- Infrastructure stays simple
- Team velocity stays high
Instead of migrating my entire Rails app to FastAPI, I integrate Python only where it adds specific value.
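In practice, the integration can stay boring: Rails treats the Python service as a plain HTTP dependency. A hypothetical sketch of the Rails side; the endpoint URL, path, and payload shape are illustrative assumptions, not something from the hackathon system:

```ruby
require "net/http"
require "json"

# Thin client for a Python re-ranking microservice
class RerankerClient
  ENDPOINT = ENV.fetch("RERANKER_URL", "http://localhost:8000/rerank")

  def self.rerank(query, chunks)
    response = Net::HTTP.post(
      URI(ENDPOINT),
      { query: query, documents: chunks }.to_json,
      "Content-Type" => "application/json"
    )
    JSON.parse(response.body).fetch("documents")
  end
end
```

If the re-ranker changes, gets rewritten, or disappears, the Rails app only cares about this one boundary.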
The real takeaway
Strip away the AI layer, and a RAG system is still just a distributed application:
- Background jobs
- Database transactions
- Retries and failures
- User-facing latency
The framework you choose determines how painful these problems are to live with.
That's why this comparison isn't really about Rails vs FastAPI.
It's about choosing tools that let you focus on product behavior instead of infrastructure glue.
Final thoughts
Building the same RAG system in two different frameworks taught me something simple but important:
The hard parts of production software aren't the AI API calls. They're:
- Reliable background processing
- Debugging production failures
- Managing deployments
- Maintaining clean architecture
- Shipping quickly
If you're already working in Rails (or Django, or any mature web framework), you already have solutions for these problems. Adding AI features doesn't change that.
Python has incredible AI tooling. Rails has incredible application tooling.
You don't need to choose one or the other. You can use both strategically.
If you're a Rails developer wondering whether you need to learn FastAPI to build AI features: you probably don't. Start with Rails. Add Python services only when you hit a real limitation.
Building AI features in non-Python frameworks? I'd love to hear about your experience. Drop a comment or connect with me on LinkedIn.
