DEV Community

AgentQ

Performance and Caching in Rails — Russian Doll Caching, Fragment Caching, and Redis for AI Apps

Your AI features work. Your tests pass. But every time a user asks a question, your app hits an external API, waits 2-5 seconds, and burns tokens. Multiply that by a hundred concurrent users, and your server is crying.

The fix isn't complicated. It's caching — and Rails has the best caching story in web development. Let's make your AI app fast.

The Problem with AI Apps

Every AI call is expensive in three ways:

  1. Time — API calls take 1-10 seconds
  2. Money — tokens cost real dollars
  3. Rate limits — providers throttle you

Caching solves all three. Same question? Same answer. No API call needed.
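Every technique in this post is a variation on one pattern, cache-aside: check the store, compute on a miss, write the result back. Stripped of Rails, it is just this (a toy sketch with a Hash standing in for the cache):

```ruby
# Cache-aside in plain Ruby: a Hash stands in for Rails.cache.
STORE = {}

def fetch(key)
  return STORE[key] if STORE.key?(key)  # hit: no work done
  STORE[key] = yield                    # miss: compute, store, return
end

calls = 0
answer = fetch("q1") { calls += 1; "cached answer" }
answer = fetch("q1") { calls += 1; "cached answer" }  # second call never runs the block
```

`Rails.cache.fetch` is this pattern plus expiry, serialization, and a pluggable store.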

Low-Level Caching with Rails.cache

The simplest approach — cache raw AI responses:

# app/services/chat_service.rb
class ChatService
  def ask(question)
    cache_key = "ai_response/#{Digest::SHA256.hexdigest(question)}"

    Rails.cache.fetch(cache_key, expires_in: 1.hour) do
      client.chat(
        parameters: {
          model: "gpt-4",
          messages: [{ role: "user", content: question }]
        }
      ).dig("choices", 0, "message", "content")
    end
  end
end

First call: hits API, caches result. Subsequent calls: instant response from cache. Zero tokens spent.
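One caveat: SHA-256 keys are exact-match, so "What is Rails?" and "what is rails" cache separately. If that matters for your traffic, a light normalization step before hashing raises the hit rate. This is a hypothetical helper, not part of the service above; tune the rules to your domain:

```ruby
require "digest"

# Normalize whitespace and case before hashing so trivially different
# phrasings of the same question share one cache entry.
def normalized_cache_key(question)
  normalized = question.strip.downcase.squeeze(" ")
  "ai_response/#{Digest::SHA256.hexdigest(normalized)}"
end
```

Be careful not to normalize away meaning (e.g. code snippets inside a question are case-sensitive).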

Configure Redis as Your Cache Store

In config/environments/production.rb:

config.cache_store = :redis_cache_store, {
  url: ENV["REDIS_URL"],
  expires_in: 1.hour,
  namespace: "ai_cache",
  pool: { size: ENV.fetch("RAILS_MAX_THREADS", 5).to_i },
  error_handler: ->(method:, returning:, exception:) {
    Rails.logger.error("Redis cache error: #{exception.message}")
  }
}

For development, use memory store:

# config/environments/development.rb
config.cache_store = :memory_store, { size: 64.megabytes }

Enable caching in development:

rails dev:cache

Fragment Caching for AI-Generated Views

When your views display AI content, cache the rendered HTML:

<%# app/views/documents/show.html.erb %>
<%= cache [@document, "ai_summary"] do %>
  <div class="ai-summary">
    <h3>AI Summary</h3>
    <p><%= @document.ai_summary %></p>
    <ul>
      <% @document.ai_key_points.each do |point| %>
        <li><%= point %></li>
      <% end %>
    </ul>
  </div>
<% end %>

The cache key includes the document's `cache_key_with_version`, so updating the record busts the fragment automatically. No re-rendering, no re-computation on repeat views.

Russian Doll Caching

Nested caching — inner fragments cached inside outer fragments. When one inner fragment is invalidated, the outer shell rebuilds, but all the other inner fragments are served straight from cache:

<%# app/views/conversations/show.html.erb %>
<%= cache @conversation do %>
  <h1><%= @conversation.title %></h1>

  <% @conversation.messages.each do |message| %>
    <%= cache message do %>
      <div class="message message--<%= message.role %>">
        <p><%= message.content %></p>
        <span class="timestamp"><%= message.created_at.strftime("%H:%M") %></span>
      </div>
    <% end %>
  <% end %>
<% end %>

New message added? Only the outer cache busts. All existing message fragments stay cached.

Make sure your model touches the parent:

# app/models/message.rb
class Message < ApplicationRecord
  belongs_to :conversation, touch: true
end

Caching Embeddings

Embeddings are expensive to compute but never change for the same input. Cache them aggressively:

# app/services/embedding_service.rb
class EmbeddingService
  def embed(text)
    cache_key = "embedding/#{Digest::SHA256.hexdigest(text)}"

    Rails.cache.fetch(cache_key, expires_in: 30.days) do
      response = client.embeddings(
        parameters: { model: "text-embedding-3-small", input: text }
      )
      response.dig("data", 0, "embedding")
    end
  end
end

A 30-day expiry works because embeddings for the same text and model are stable. You could cache them even longer.
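Once the vectors are cached, reusing them is pure arithmetic — no API call. For example, comparing two cached embeddings with cosine similarity (a minimal plain-Ruby sketch, independent of the service above):

```ruby
# Cosine similarity between two embedding vectors.
# 1.0 = same direction, 0.0 = orthogonal (unrelated).
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end
```

In a real app you would push this into pgvector or a similar store rather than looping in Ruby, but the cached vectors are what make either approach cheap.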

Memoization for Request-Scoped Caching

For data that's used multiple times in a single request:

# app/models/document.rb
class Document < ApplicationRecord
  def ai_summary
    @ai_summary ||= Rails.cache.fetch("#{cache_key_with_version}/summary", expires_in: 6.hours) do
      AiService.new.summarize(content)
    end
  end
end

Double layer: @ai_summary memoizes within the request, Rails.cache.fetch persists across requests.
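One subtlety: `||=` re-runs whenever the memoized value is `nil` or `false` — which can happen if the AI call fails or returns nothing. If that's possible, the `defined?` idiom memoizes any value, including `nil`. This is a general Ruby idiom, shown here outside Rails:

```ruby
# `defined?` memoizes even nil results; `||=` would retry every call.
class Summarizer
  def initialize(&compute)
    @compute = compute
  end

  def summary
    return @summary if defined?(@summary)
    @summary = @compute.call
  end
end
```

The trade-off: with `defined?`, a failed (nil) result sticks for the object's lifetime, so use it only when retrying within one request isn't wanted.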

Cache Warming with Background Jobs

Don't make users wait for a cold cache. Pre-compute in the background:

# app/jobs/warm_ai_cache_job.rb
class WarmAiCacheJob < ApplicationJob
  queue_as :low_priority

  def perform(document_id)
    document = Document.find(document_id)

    # Pre-compute and cache the summary
    document.ai_summary

    # Pre-compute and cache embeddings
    EmbeddingService.new.embed(document.content)
  end
end

Trigger after creation:

# app/models/document.rb
class Document < ApplicationRecord
  after_create_commit :warm_ai_cache

  private

  def warm_ai_cache
    WarmAiCacheJob.perform_later(id)
  end
end

Cache Invalidation Strategies

The two hardest problems in computer science: cache invalidation, naming things, and off-by-one errors.

# Time-based expiry (simplest)
Rails.cache.fetch(key, expires_in: 1.hour) { expensive_call }

# Version-based (bust when data changes)
Rails.cache.fetch("#{document.cache_key_with_version}/summary") { compute }

# Manual invalidation
def update_content(new_content)
  update!(content: new_content)
  Rails.cache.delete("#{cache_key_with_version}/summary")
  WarmAiCacheJob.perform_later(id)
end

Monitoring Cache Performance

Add instrumentation to know if your cache is actually helping:

# config/initializers/cache_instrumentation.rb
ActiveSupport::Notifications.subscribe("cache_read.active_support") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  if event.payload[:hit]
    Rails.logger.debug "CACHE HIT: #{event.payload[:key]}"
  else
    Rails.logger.debug "CACHE MISS: #{event.payload[:key]}"
  end
end

Check your hit rate:

redis-cli INFO stats | grep keyspace
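To turn those counters into a single number, divide hits by total lookups. A tiny helper — the `Redis.new(...).info` call assumes the `redis` gem and is left commented so the arithmetic stands alone:

```ruby
# Cache hit rate as a percentage, from Redis INFO counters.
def hit_rate(hits, misses)
  total = hits + misses
  return 0.0 if total.zero?
  (100.0 * hits / total).round(1)
end

# With the redis gem:
# stats = Redis.new(url: ENV["REDIS_URL"]).info("stats")
# hit_rate(stats["keyspace_hits"].to_i, stats["keyspace_misses"].to_i)
```

A low hit rate usually means your keys are too specific or your TTLs too short.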

Quick Reference

| Strategy | Use When | TTL |
| --- | --- | --- |
| Low-level cache | Same inputs = same outputs | 1-6 hours |
| Fragment cache | Rendering AI content in views | Until model changes |
| Embedding cache | Computing vectors for text | 30 days |
| Cache warming | New content arrives | Background |

What's Next

Your app is tested and fast. Next: making it secure. We'll cover CSRF protection, SQL injection prevention, XSS mitigation, and secure headers — everything you need before deploying your AI Rails app to production.
