If you are building an AI product in Rails, the hard part is usually not calling the model API. The hard part is fitting model calls into a real application that has users, permissions, jobs, caching, search, and failure handling.
In this post, we’ll put the pieces together into a complete Rails architecture you can run on your own VPS stack.
The goal is simple:
- Rails handles HTTP, auth, admin screens, and business logic
- PostgreSQL stores app data and vectors
- Redis handles cache and job coordination
- background jobs run slow AI tasks
- model providers stay behind service objects
The moving parts
A practical Rails AI stack looks like this:
Browser
↓
Nginx
↓
Rails app (controllers, models, views, APIs)
├─ PostgreSQL (app data + pgvector)
├─ Redis (cache + job coordination)
├─ Active Job / Sidekiq workers
└─ AI providers (OpenAI, Anthropic, local models, etc.)
Do not let controllers talk directly to model APIs. Keep AI work behind service classes and jobs so the app stays testable.
Start with a normal Rails app
Create the app the usual way:
rails new ai_stack --postgresql
cd ai_stack
bundle add sidekiq neighbor ruby-openai
bin/rails generate model Document title:string content:text embedding:vector{1536}
bin/rails db:create
neighbor gives Rails-friendly vector search support on top of pgvector.
Before the vector column can be created, enable the extension in a migration that runs first:
class EnablePgvector < ActiveRecord::Migration[8.0]
  def change
    enable_extension "vector"
  end
end
Then run bin/rails db:migrate.
And define the model:
class Document < ApplicationRecord
  has_neighbors :embedding

  validates :title, :content, presence: true
end
Now your app can store both normal business data and embeddings in PostgreSQL.
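Once the table grows past a few thousand rows, a sequential scan over embeddings gets slow. pgvector supports an approximate HNSW index; a sketch of the migration, assuming pgvector 0.5 or newer (the class name here is illustrative):

```ruby
class AddHnswIndexToDocuments < ActiveRecord::Migration[8.0]
  def change
    # HNSW trades a little recall for much faster similarity search.
    # vector_cosine_ops matches the cosine distance used at query time.
    add_index :documents, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
```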
Keep model access behind a service object
Wrap provider calls in one place.
# app/services/ai/chat_client.rb
module Ai
  class ChatClient
    def initialize(client: OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY")))
      @client = client
    end

    def ask(prompt)
      response = @client.chat(
        parameters: {
          model: "gpt-4.1-mini",
          messages: [{ role: "user", content: prompt }]
        }
      )

      response.dig("choices", 0, "message", "content")
    end
  end
end
Your controller should not care which provider is underneath.
class ChatsController < ApplicationController
  def create
    answer = Ai::ChatClient.new.ask(params[:prompt])
    render json: { answer: answer }
  end
end
This looks small, but it gives you one place to handle timeouts, retries, logging, and provider swaps.
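For example, retries and timeouts can live in a thin wrapper around the client. This is a sketch, not part of the ruby-openai API: `ResilientChatClient` and `MAX_RETRIES` are names invented here, and the wrapped object only needs to respond to `ask`, which also makes the retry path trivial to unit-test with a fake.

```ruby
require "timeout"

module Ai
  # Hypothetical wrapper: retries transient failures and bounds how
  # long a single provider call may take. Works with any object that
  # responds to #ask, including test fakes.
  class ResilientChatClient
    MAX_RETRIES = 2

    def initialize(client:, timeout_seconds: 15)
      @client = client
      @timeout_seconds = timeout_seconds
    end

    def ask(prompt)
      attempts = 0
      begin
        Timeout.timeout(@timeout_seconds) { @client.ask(prompt) }
      rescue StandardError
        # In production, narrow this to the provider's transient
        # error classes instead of retrying everything.
        attempts += 1
        retry if attempts <= MAX_RETRIES
        raise
      end
    end
  end
end
```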
Move slow AI work into jobs
Embeddings, summarization, document parsing, and image generation should usually happen outside the request cycle.
# app/jobs/embed_document_job.rb
class EmbedDocumentJob < ApplicationJob
  queue_as :default

  def perform(document_id)
    document = Document.find(document_id)

    response = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY")).embeddings(
      parameters: {
        model: "text-embedding-3-small",
        input: document.content
      }
    )

    vector = response.dig("data", 0, "embedding")
    document.update!(embedding: vector)
  end
end
Kick the job off after create:
class Document < ApplicationRecord
  has_neighbors :embedding

  after_commit :enqueue_embedding, on: :create

  private

  def enqueue_embedding
    EmbedDocumentJob.perform_later(id)
  end
end
This keeps the user-facing request fast.
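One detail glossed over above: document.content may exceed the embedding model's input limit. A common fix is to split long text into overlapping chunks and embed each one. A minimal word-based sketch; the `Chunker` name and the default sizes are illustrative, not from any library:

```ruby
module Ai
  class Chunker
    # Splits text into overlapping word-based windows. The overlap keeps
    # context that straddles a boundary retrievable from both chunks.
    def self.chunks(text, size: 200, overlap: 20)
      words = text.split
      step = size - overlap
      raise ArgumentError, "overlap must be smaller than size" if step <= 0

      (0...words.length).step(step).map { |i| words[i, size].join(" ") }
    end
  end
end
```

Each chunk then becomes its own row with its own embedding, pointing back at the parent document.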
Build retrieval in one query path
Once documents have embeddings, retrieval stays inside your app.
# app/services/ai/retriever.rb
module Ai
  class Retriever
    def initialize(client: OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY")))
      @client = client
    end

    def search(query)
      embedding = @client.embeddings(
        parameters: {
          model: "text-embedding-3-small",
          input: query
        }
      ).dig("data", 0, "embedding")

      Document.nearest_neighbors(:embedding, embedding, distance: "cosine").first(5)
    end
  end
end
Now a RAG-style answer can combine retrieved records with a chat prompt.
module Ai
  class AnswerQuestion
    def call(question)
      docs = Retriever.new.search(question)
      context = docs.map(&:content).join("\n\n")

      ChatClient.new.ask(<<~PROMPT)
        Answer the question using the context below.
        If the context is missing the answer, say you do not know.

        Context:
        #{context}

        Question:
        #{question}
      PROMPT
    end
  end
end
That is the core AI loop: ingest, embed, retrieve, answer.
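The retrieval step is less magic than it looks. What nearest_neighbors asks PostgreSQL to do is, conceptually, the computation below; a toy in-memory sketch with 2-dimensional vectors (real embeddings have 1536 dimensions, and pgvector's index avoids scanning every row):

```ruby
# Cosine distance: 0 for identical directions, up to 2 for opposite ones.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  1 - dot / (norm.call(a) * norm.call(b))
end

# Brute-force nearest neighbors: sort every row by distance to the query.
def nearest(query, docs, k: 2)
  docs.min_by(k) { |doc| cosine_distance(query, doc[:embedding]) }
end

docs = [
  { title: "ruby",    embedding: [1.0, 0.0] },
  { title: "rails",   embedding: [0.9, 0.1] },
  { title: "cooking", embedding: [0.0, 1.0] }
]

nearest([1.0, 0.0], docs).map { |d| d[:title] }
# => ["ruby", "rails"]
```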
Separate web and worker processes
On a VPS, run at least two process types:
bundle exec puma -C config/puma.rb
bundle exec sidekiq
The web process serves users.
The worker process handles embeddings, summarization, retries, and fan-out jobs.
A basic docker-compose.yml on your own server could look like this:
services:
  web:
    build: .
    command: bundle exec puma -C config/puma.rb
    env_file: .env
    depends_on: [db, redis]
  worker:
    build: .
    command: bundle exec sidekiq
    env_file: .env
    depends_on: [db, redis]
  db:
    image: pgvector/pgvector:pg16
  redis:
    image: redis:7
No platform magic. Just predictable services.
Add caching and observability early
AI features get expensive fast. Cache what you can.
Rails.cache.fetch(["summary", article.cache_key_with_version], expires_in: 6.hours) do
  Ai::ChatClient.new.ask("Summarize: #{article.body}")
end
Also log provider timing and failures.
Rails.logger.info({ event: "ai.request", feature: "chat", model: "gpt-4.1-mini" }.to_json)
You want answers to these questions in production:
- which feature is using tokens
- which jobs fail most often
- which prompts are slow
- which users trigger the highest cost
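One-off log lines like the one above tend to drift out of sync with the code. A small wrapper keeps timing and failure logging next to the call itself. `Ai::Instrumented` is a name invented for this sketch; anything with an info method works as the logger and anything with an ask method works as the client:

```ruby
require "json"

module Ai
  # Hypothetical wrapper: times each provider call and emits one
  # structured log line per request, success or failure.
  class Instrumented
    def initialize(client:, feature:, logger:)
      @client = client
      @feature = feature
      @logger = logger
    end

    def ask(prompt)
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      answer = @client.ask(prompt)
      emit("ok", started)
      answer
    rescue StandardError
      emit("error", started)
      raise
    end

    private

    def emit(status, started)
      ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
      @logger.info(
        { event: "ai.request", feature: @feature, status: status, duration_ms: ms }.to_json
      )
    end
  end
end
```

In a Rails app you would pass Rails.logger and stack this with the chat client, so every AI feature reports under its own name.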
Keep the boundaries clean
A maintainable AI Rails app usually has these boundaries:
- controllers: accept input, authorize, render output
- models: persistence and domain rules
- services: talk to providers and compose AI flows
- jobs: run slow or retryable work
- PostgreSQL: source of truth, including vectors
- Redis: cache and queue support
If you blur those layers, the app quickly becomes hard to debug.
What to remember
The complete AI Rails stack is not one gem or one API call. It is a set of clean responsibilities:
- Rails owns the product
- jobs own long-running AI work
- PostgreSQL owns structured data and vectors
- Redis removes pressure from the app
- service objects isolate provider complexity
That architecture scales from a small VPS to a serious production system without changing the core ideas.
Next time, we’ll close the series with what’s next in the Ruby AI ecosystem and which tools are worth watching.