Your AI features work. Your tests pass. But every time a user asks a question, your app hits an external API, waits 2-5 seconds, and burns tokens. Multiply that by a hundred concurrent users, and your server is crying.
The fix isn't complicated. It's caching — and Rails has the best caching story in web development. Let's make your AI app fast.
## The Problem with AI Apps
Every AI call is expensive in three ways:
- **Time** — API calls take 1-10 seconds
- **Money** — tokens cost real dollars
- **Rate limits** — providers throttle you
Caching solves all three. Same question? Same answer. No API call needed.
## Low-Level Caching with `Rails.cache`
The simplest approach — cache raw AI responses:
```ruby
# app/services/chat_service.rb
class ChatService
  def ask(question)
    cache_key = "ai_response/#{Digest::SHA256.hexdigest(question)}"

    Rails.cache.fetch(cache_key, expires_in: 1.hour) do
      client.chat(
        parameters: {
          model: "gpt-4",
          messages: [{ role: "user", content: question }]
        }
      ).dig("choices", 0, "message", "content")
    end
  end
end
```
First call: hits API, caches result. Subsequent calls: instant response from cache. Zero tokens spent.
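One caveat with hashing the raw question: `"What is Rails?"` and `"  what is rails? "` produce different digests and therefore separate cache entries. A light normalization pass before hashing recovers those near-duplicate hits. This is a sketch; the `normalized_cache_key` helper is hypothetical, not part of the service above:

```ruby
require "digest"

# Hypothetical helper: trim, downcase, and collapse whitespace before
# hashing, so trivially different phrasings share one cache entry.
def normalized_cache_key(question)
  normalized = question.strip.downcase.gsub(/\s+/, " ")
  "ai_response/#{Digest::SHA256.hexdigest(normalized)}"
end
```

The trade-off: normalization widens hits but treats `"Rails?"` and `"rails?"` as the same question, which is usually what you want for caching.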
## Configure Redis as Your Cache Store
In `config/environments/production.rb`:
```ruby
config.cache_store = :redis_cache_store, {
  url: ENV["REDIS_URL"],
  expires_in: 1.hour,
  namespace: "ai_cache",
  pool_size: ENV.fetch("RAILS_MAX_THREADS", 5).to_i,
  error_handler: ->(method:, returning:, exception:) {
    Rails.logger.error("Redis cache error: #{exception.message}")
  }
}
```
For development, use the memory store:
```ruby
# config/environments/development.rb
config.cache_store = :memory_store, { size: 64.megabytes }
```
Enable caching in development:
```bash
rails dev:cache
```
## Fragment Caching for AI-Generated Views
When your views display AI content, cache the rendered HTML:
```erb
<%# app/views/documents/show.html.erb %>
<%= cache [@document, "ai_summary"] do %>
  <div class="ai-summary">
    <h3>AI Summary</h3>
    <p><%= @document.ai_summary %></p>
    <ul>
      <% @document.ai_key_points.each do |point| %>
        <li><%= point %></li>
      <% end %>
    </ul>
  </div>
<% end %>
```
The fragment is cached until the document changes. No re-rendering, no re-computation.
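What makes the fragment expire automatically is the model's `cache_key_with_version`, which folds `updated_at` into the cache key. A stripped-down sketch of that behavior, with a `Struct` standing in for the ActiveRecord model (this mirrors the semantics, not the real implementation):

```ruby
# Stand-in for an ActiveRecord model: the key combines model name, id,
# and updated_at, so saving the record yields a brand-new key.
FakeDocument = Struct.new(:id, :updated_at) do
  def cache_key_with_version
    "documents/#{id}-#{updated_at.to_i}"
  end
end
```

When the document is updated, readers simply compute a new key and miss; the old fragment is never read again and eventually ages out of the store.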
## Russian Doll Caching
Nested caching — inner fragments cached inside outer fragments. When the inner changes, only it re-renders:
```erb
<%# app/views/conversations/show.html.erb %>
<%= cache @conversation do %>
  <h1><%= @conversation.title %></h1>

  <% @conversation.messages.each do |message| %>
    <%= cache message do %>
      <div class="message message--<%= message.role %>">
        <p><%= message.content %></p>
        <span class="timestamp"><%= message.created_at.strftime("%H:%M") %></span>
      </div>
    <% end %>
  <% end %>
<% end %>
```
New message added? Only the outer cache busts. All existing message fragments stay cached.
Make sure your model touches the parent:
```ruby
# app/models/message.rb
class Message < ApplicationRecord
  belongs_to :conversation, touch: true
end
```
## Caching Embeddings
Embeddings are expensive to compute but never change for the same input. Cache them aggressively:
```ruby
# app/services/embedding_service.rb
class EmbeddingService
  def embed(text)
    cache_key = "embedding/#{Digest::SHA256.hexdigest(text)}"

    Rails.cache.fetch(cache_key, expires_in: 30.days) do
      response = client.embeddings(
        parameters: { model: "text-embedding-3-small", input: text }
      )
      response.dig("data", 0, "embedding")
    end
  end
end
```
The 30-day expiry works because embeddings for the same text are deterministic: same input, same vector. You could even cache them forever.
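The economics are easy to see with a stripped-down model of the cache. In this sketch (plain Ruby, with a block standing in for the real embeddings API and a `Hash` standing in for `Rails.cache`), we count how many API calls actually happen:

```ruby
require "digest"

class CachedEmbedder
  attr_reader :api_calls

  def initialize(&embed_fn)
    @embed_fn = embed_fn # stands in for the real embeddings API
    @store = {}          # stands in for Rails.cache
    @api_calls = 0
  end

  def embed(text)
    key = Digest::SHA256.hexdigest(text) # same text => same key, forever
    @store[key] ||= begin
      @api_calls += 1
      @embed_fn.call(text)
    end
  end
end
```

Embed the same text a thousand times, pay for it once: repeated `embed` calls for identical input return the stored vector without touching the API.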
## Memoization for Request-Scoped Caching
For data that is read multiple times within a single request:
```ruby
# app/models/document.rb
class Document < ApplicationRecord
  def ai_summary
    @ai_summary ||= Rails.cache.fetch("#{cache_key_with_version}/summary", expires_in: 6.hours) do
      AiService.new.summarize(content)
    end
  end
end
```
Double layer: `@ai_summary` memoizes within the request, `Rails.cache.fetch` persists across requests.
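The interplay of the two layers is easier to see in isolation. A plain-Ruby sketch, with a global `Hash` standing in for `Rails.cache` and each object standing in for one request (`DocSketch` and `SHARED_CACHE` are illustrative names, not Rails APIs):

```ruby
SHARED_CACHE = {} # stands in for Rails.cache, shared across "requests"

class DocSketch
  def initialize(key)
    @key = key
    @compute_count = 0
  end

  attr_reader :compute_count

  def summary
    # Layer 1: @summary memoizes within this object (one request).
    # Layer 2: SHARED_CACHE persists across objects (all requests).
    @summary ||= SHARED_CACHE[@key] ||= compute_summary
  end

  private

  def compute_summary
    @compute_count += 1
    "summary of #{@key}"
  end
end
```

Two objects with the same key trigger exactly one computation: the first fills both layers, the second reads straight from the shared store.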
## Cache Warming with Background Jobs
Don't make users wait for a cold cache. Pre-compute in the background:
```ruby
# app/jobs/warm_ai_cache_job.rb
class WarmAiCacheJob < ApplicationJob
  queue_as :low_priority

  def perform(document_id)
    document = Document.find(document_id)

    # Pre-compute and cache the summary
    document.ai_summary

    # Pre-compute and cache embeddings
    EmbeddingService.new.embed(document.content)
  end
end
```
Trigger after creation:
```ruby
# app/models/document.rb
class Document < ApplicationRecord
  after_create_commit :warm_ai_cache

  private

  def warm_ai_cache
    WarmAiCacheJob.perform_later(id)
  end
end
```
## Cache Invalidation Strategies
The two hardest problems in computer science: cache invalidation, naming things, and off-by-one errors.
```ruby
# Time-based expiry (simplest)
Rails.cache.fetch(key, expires_in: 1.hour) { expensive_call }

# Version-based (bust when data changes)
Rails.cache.fetch("#{document.cache_key_with_version}/summary") { compute }

# Manual invalidation
def update_content(new_content)
  # Delete before updating: after update!, cache_key_with_version changes,
  # so this key would no longer point at the stale entry.
  Rails.cache.delete("#{cache_key_with_version}/summary")
  update!(content: new_content)
  WarmAiCacheJob.perform_later(id)
end
```
## Monitoring Cache Performance
Add instrumentation so you know whether your cache is actually helping:
```ruby
# config/initializers/cache_instrumentation.rb
ActiveSupport::Notifications.subscribe("cache_read.active_support") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)

  if event.payload[:hit]
    Rails.logger.debug "CACHE HIT: #{event.payload[:key]}"
  else
    Rails.logger.debug "CACHE MISS: #{event.payload[:key]}"
  end
end
```
Check your hit rate:

```bash
# keyspace_hits / (keyspace_hits + keyspace_misses) = hit rate
redis-cli INFO stats | grep keyspace
```
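The Redis counters are global, so it can help to tally hits and misses yourself, for example from the subscriber above. A minimal sketch (`CacheStats` is a hypothetical helper, not a Rails or Redis API):

```ruby
# Minimal hit-rate tally; in the app, record(true/false) would be
# called from the cache_read subscriber's hit and miss branches.
class CacheStats
  def initialize
    @hits = 0
    @misses = 0
  end

  def record(hit)
    hit ? @hits += 1 : @misses += 1
  end

  def hit_rate
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.to_f / total
  end
end
```

As a rough rule, if the hit rate on AI responses stays low, your cache keys are probably too specific (see the normalization caveat earlier).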
## Quick Reference
| Strategy | Use When | TTL |
|---|---|---|
| Low-level cache | Same inputs = same outputs | 1-6 hours |
| Fragment cache | Rendering AI content in views | Until model changes |
| Embedding cache | Computing vectors for text | 30 days |
| Cache warming | New content arrives | Background |
## What's Next
Your app is tested and fast. Next: making it secure. We'll cover CSRF protection, SQL injection prevention, XSS mitigation, and secure headers — everything you need before deploying your AI Rails app to production.