DEV Community

AgentQ
The Complete AI Rails Stack - Full Architecture on Your Own Infrastructure

If you are building an AI product in Rails, the hard part is usually not calling the model API. The hard part is fitting model calls into a real application that has users, permissions, jobs, caching, search, and failure handling.

In this post, we’ll put the pieces together into a complete Rails architecture you can run on your own VPS stack.

The goal is simple:

  • Rails handles HTTP, auth, admin screens, and business logic
  • PostgreSQL stores app data and vectors
  • Redis handles caching and job coordination
  • Background jobs run slow AI tasks
  • Model providers stay behind service objects

The moving parts

A practical Rails AI stack looks like this:

Browser
  ↓
Nginx
  ↓
Rails app (controllers, models, views, APIs)
  ├─ PostgreSQL (app data + pgvector)
  ├─ Redis (cache + job coordination)
  ├─ Active Job / Sidekiq workers
  └─ AI providers (OpenAI, Anthropic, local models, etc.)

Do not let controllers talk directly to model APIs. Keep AI work behind service classes and jobs so the app stays testable.
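To make that boundary concrete, here is a minimal sketch (the names are illustrative, not from any library): because the service only depends on an object that responds to `ask`, a test can inject a canned double instead of hitting a real API.

```ruby
# Illustrative service: depends only on an object that responds to #ask.
class AnswerService
  def initialize(client:)
    @client = client
  end

  def answer(question)
    @client.ask("Q: #{question}")
  end
end

# In a test, a tiny double stands in for the provider client.
fake = Object.new
def fake.ask(prompt) = "echo: #{prompt}"

AnswerService.new(client: fake).answer("hi")
# => "echo: Q: hi"
```

No network, no API key, no mocking library required.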

Start with a normal Rails app

Create the app the usual way:

rails new ai_stack --database=postgresql
cd ai_stack
bundle add sidekiq neighbor ruby-openai
bin/rails generate model Document title:string content:text embedding:vector{1536}
bin/rails db:create db:migrate

neighbor gives Rails-friendly vector search support on top of pgvector.

Enable the extension in a migration. This must run before the migration that adds the vector column, so generate it first (or timestamp it earlier):

class EnablePgvector < ActiveRecord::Migration[8.0]
  def change
    enable_extension "vector"
  end
end

And define the model:

class Document < ApplicationRecord
  has_neighbors :embedding
  validates :title, :content, presence: true
end

Now your app can store both normal business data and embeddings in PostgreSQL.

Keep model access behind a service object

Wrap provider calls in one place.

# app/services/ai/chat_client.rb
module Ai
  class ChatClient
    def initialize(client: OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY")))
      @client = client
    end

    def ask(prompt)
      response = @client.chat(
        parameters: {
          model: "gpt-4.1-mini",
          messages: [{ role: "user", content: prompt }]
        }
      )

      response.dig("choices", 0, "message", "content")
    end
  end
end

Your controller should not care which provider is underneath.

class ChatsController < ApplicationController
  def create
    answer = Ai::ChatClient.new.ask(params[:prompt])
    render json: { answer: answer }
  end
end

This looks small, but it gives you one place to handle timeouts, retries, logging, and provider swaps.
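As one hedged sketch of what that can look like, a small wrapper can retry transient failures with backoff. The retry policy, attempt count, and bare `StandardError` rescue here are illustrative assumptions, not part of ruby-openai; in practice you would rescue your client library's specific timeout and rate-limit errors.

```ruby
module Ai
  # Retries a failed provider call with exponential backoff.
  # The sleeper lambda is injected so tests do not actually sleep.
  class ResilientChat
    MAX_ATTEMPTS = 3

    def initialize(client:, sleeper: ->(seconds) { sleep(seconds) })
      @client = client
      @sleeper = sleeper
    end

    def ask(prompt)
      attempts = 0
      begin
        attempts += 1
        @client.ask(prompt)
      rescue StandardError
        raise if attempts >= MAX_ATTEMPTS
        @sleeper.call(2**attempts) # 2s, then 4s
        retry
      end
    end
  end
end
```

Because the client is injected, a flaky fake that fails twice and then succeeds can verify the retry behavior in a unit test.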

Move slow AI work into jobs

Embeddings, summarization, document parsing, and image generation should usually happen outside the request cycle.

# app/jobs/embed_document_job.rb
class EmbedDocumentJob < ApplicationJob
  queue_as :default

  def perform(document_id)
    document = Document.find(document_id)

    response = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY")).embeddings(
      parameters: {
        model: "text-embedding-3-small",
        input: document.content
      }
    )

    vector = response.dig("data", 0, "embedding")
    document.update!(embedding: vector)
  end
end

Kick the job off after create:

class Document < ApplicationRecord
  has_neighbors :embedding
  after_commit :enqueue_embedding, on: :create

  private

  def enqueue_embedding
    EmbedDocumentJob.perform_later(id)
  end
end

This keeps the user-facing request fast.

Build retrieval in one query path

Once documents have embeddings, retrieval stays inside your app.

# app/services/ai/retriever.rb
module Ai
  class Retriever
    def initialize(client: OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY")))
      @client = client
    end

    def search(query)
      embedding = @client.embeddings(
        parameters: {
          model: "text-embedding-3-small",
          input: query
        }
      ).dig("data", 0, "embedding")

      Document.nearest_neighbors(:embedding, embedding, distance: "cosine").first(5)
    end
  end
end

Now a RAG-style answer can combine retrieved records with a chat prompt.

module Ai
  class AnswerQuestion
    def call(question)
      docs = Retriever.new.search(question)
      context = docs.map(&:content).join("\n\n")

      ChatClient.new.ask(<<~PROMPT)
        Answer the question using the context below.
        If the context is missing the answer, say you do not know.

        Context:
        #{context}

        Question:
        #{question}
      PROMPT
    end
  end
end

That is the core AI loop: ingest, embed, retrieve, answer.
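For intuition about the "cosine" distance the retriever asks pgvector to use, here is a toy pure-Ruby version. This is only illustrative; in the real query path pgvector computes the distance in SQL.

```ruby
# Cosine distance: 1 - cos(angle between vectors).
# Vectors pointing the same way give 0.0; orthogonal vectors give 1.0.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  1.0 - dot / (mag.call(a) * mag.call(b))
end

cosine_distance([1.0, 0.0], [1.0, 0.0]) # => 0.0
cosine_distance([1.0, 0.0], [0.0, 1.0]) # => 1.0
```

Nearest neighbors by this metric are the documents whose embeddings point in the most similar direction to the query embedding.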

Separate web and worker processes

On a VPS, run at least two process types:

bundle exec puma -C config/puma.rb
bundle exec sidekiq

The web process serves users.
The worker process handles embeddings, summarization, retries, and fan-out jobs.

A basic docker-compose.yml on your own server could look like this:

services:
  web:
    build: .
    command: bundle exec puma -C config/puma.rb
    env_file: .env
    depends_on: [db, redis]
  worker:
    build: .
    command: bundle exec sidekiq
    env_file: .env
    depends_on: [db, redis]
  db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - pg_data:/var/lib/postgresql/data
  redis:
    image: redis:7

volumes:
  pg_data:

No platform magic. Just predictable services.

Add caching and observability early

AI features get expensive fast. Cache what you can.

Rails.cache.fetch(["summary", article.cache_key_with_version], expires_in: 6.hours) do
  Ai::ChatClient.new.ask("Summarize: #{article.body}")
end

Also log provider timing and failures.

Rails.logger.info({ event: "ai.request", feature: "chat", model: "gpt-4.1-mini" }.to_json)

You want answers to these questions in production:

  • which feature is using tokens
  • which jobs fail most often
  • which prompts are slow
  • which users trigger the highest cost
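One way to start answering them is a decorator that wraps the chat client and emits structured log lines with timing and failure information. This is a sketch: the field names and the injected `logger` callable are assumptions to shape to your own log pipeline.

```ruby
require "json"

module Ai
  # Wraps any client that responds to #ask; logs duration and errors.
  class Instrumented
    def initialize(client:, logger:, feature:)
      @client = client
      @logger = logger
      @feature = feature
    end

    def ask(prompt)
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      answer = @client.ask(prompt)
      elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
      @logger.call({ event: "ai.request", feature: @feature, duration_ms: elapsed_ms }.to_json)
      answer
    rescue StandardError => e
      @logger.call({ event: "ai.error", feature: @feature, error: e.class.name }.to_json)
      raise
    end
  end
end
```

In the app, the logger could simply be `->(line) { Rails.logger.info(line) }`, and the wrapper composes with the same service objects you already have.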

Keep the boundaries clean

A maintainable AI Rails app usually has these boundaries:

  • controllers: accept input, authorize, render output
  • models: persistence and domain rules
  • services: talk to providers and compose AI flows
  • jobs: run slow or retryable work
  • PostgreSQL: source of truth, including vectors
  • Redis: cache and queue support

If you blur those layers, the app gets hard to debug fast.

What to remember

The complete AI Rails stack is not one gem or one API call. It is a set of clean responsibilities:

  • Rails owns the product
  • Jobs own long-running AI work
  • PostgreSQL owns structured data and vectors
  • Redis takes pressure off the app
  • Service objects isolate provider complexity

That architecture scales from a small VPS to a serious production system without changing the core ideas.

Next time, we’ll close the series with what’s next in the Ruby AI ecosystem and which tools are worth watching.
