Your AI features work. Your tests pass. But every time a user asks a question, your app hits an external API, waits 2-5 seconds, and burns tokens. Multiply that by a hundred concurrent users, and your server is crying.
The fix isn't complicated. It's caching — and Rails has the best caching story in web development. Let's make your AI app fast.
## The Problem with AI Apps
Every AI call is expensive in three ways:
- **Time** — API calls take 1-10 seconds
- **Money** — tokens cost real dollars
- **Rate limits** — providers throttle you
Caching solves all three. Same question? Same answer. No API call needed.
## Low-Level Caching with `Rails.cache`
The simplest approach — cache raw AI responses:
```ruby
# app/services/chat_service.rb
class ChatService
  def ask(question)
    cache_key = "ai_response/#{Digest::SHA256.hexdigest(question)}"

    Rails.cache.fetch(cache_key, expires_in: 1.hour) do
      client.chat(
        parameters: {
          model: "gpt-4",
          messages: [{ role: "user", content: question }]
        }
      ).dig("choices", 0, "message", "content")
    end
  end
end
```
First call: hits API, caches result. Subsequent calls: instant response from cache. Zero tokens spent.
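One caveat with hashing the raw question: `"What is Rails?"` and `"  what is rails? "` produce different digests and therefore separate cache entries. A light normalization pass before hashing recovers those near-duplicate hits. This is a sketch; the `normalized_cache_key` helper is hypothetical, not part of the service above:

```ruby
require "digest"

# Hypothetical helper: trim, downcase, and collapse whitespace before
# hashing, so trivially different phrasings share one cache entry.
def normalized_cache_key(question)
  normalized = question.strip.downcase.gsub(/\s+/, " ")
  "ai_response/#{Digest::SHA256.hexdigest(normalized)}"
end
```

The trade-off: normalization widens hits but treats `"Rails?"` and `"rails?"` as the same question, which is usually what you want for caching.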
## Configure Redis as Your Cache Store
In `config/environments/production.rb`:
```ruby
config.cache_store = :redis_cache_store, {
  url: ENV["REDIS_URL"],
  expires_in: 1.hour,
  namespace: "ai_cache",
  pool_size: ENV.fetch("RAILS_MAX_THREADS", 5).to_i,
  error_handler: ->(method:, returning:, exception:) {
    Rails.logger.error("Redis cache error: #{exception.message}")
  }
}
```
For development, use the memory store:
```ruby
# config/environments/development.rb
config.cache_store = :memory_store, { size: 64.megabytes }
```
Enable caching in development:
```bash
rails dev:cache
```
## Fragment Caching for AI-Generated Views
When your views display AI content, cache the rendered HTML:
```erb
<%# app/views/documents/show.html.erb %>
<%= cache [@document, "ai_summary"] do %>
  <div class="ai-summary">
    <h3>AI Summary</h3>
    <p><%= @document.ai_summary %></p>
    <ul>
      <% @document.ai_key_points.each do |point| %>
        <li><%= point %></li>
      <% end %>
    </ul>
  </div>
<% end %>
```
The fragment is cached until the document changes. No re-rendering, no re-computation.
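What makes the fragment expire automatically is the model's `cache_key_with_version`, which folds `updated_at` into the cache key. A stripped-down sketch of that behavior, with a `Struct` standing in for the ActiveRecord model (this mirrors the semantics, not the real implementation):

```ruby
# Stand-in for an ActiveRecord model: the key combines model name, id,
# and updated_at, so saving the record yields a brand-new key.
FakeDocument = Struct.new(:id, :updated_at) do
  def cache_key_with_version
    "documents/#{id}-#{updated_at.to_i}"
  end
end
```

When the document is updated, readers simply compute a new key and miss; the old fragment is never read again and eventually ages out of the store.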
## Russian Doll Caching
Nested caching — inner fragments cached inside outer fragments. When the inner changes, only it re-renders:
```erb
<%# app/views/conversations/show.html.erb %>
<%= cache @conversation do %>
  <h1><%= @conversation.title %></h1>

  <% @conversation.messages.each do |message| %>
    <%= cache message do %>
      <div class="message message--<%= message.role %>">
        <p><%= message.content %></p>
        <span class="timestamp"><%= message.created_at.strftime("%H:%M") %></span>
      </div>
    <% end %>
  <% end %>
<% end %>
```
New message added? Only the outer cache busts. All existing message fragments stay cached.
Make sure your model touches the parent:
```ruby
# app/models/message.rb
class Message < ApplicationRecord
  belongs_to :conversation, touch: true
end
```
## Caching Embeddings
Embeddings are expensive to compute but never change for the same input. Cache them aggressively:
```ruby
# app/services/embedding_service.rb
class EmbeddingService
  def embed(text)
    cache_key = "embedding/#{Digest::SHA256.hexdigest(text)}"

    Rails.cache.fetch(cache_key, expires_in: 30.days) do
      response = client.embeddings(
        parameters: { model: "text-embedding-3-small", input: text }
      )
      response.dig("data", 0, "embedding")
    end
  end
end
```
The 30-day expiry works because embeddings for the same text are deterministic: same input, same vector. You could even cache them forever.
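The economics are easy to see with a stripped-down model of the cache. In this sketch (plain Ruby, with a block standing in for the real embeddings API and a `Hash` standing in for `Rails.cache`), we count how many API calls actually happen:

```ruby
require "digest"

class CachedEmbedder
  attr_reader :api_calls

  def initialize(&embed_fn)
    @embed_fn = embed_fn # stands in for the real embeddings API
    @store = {}          # stands in for Rails.cache
    @api_calls = 0
  end

  def embed(text)
    key = Digest::SHA256.hexdigest(text) # same text => same key, forever
    @store[key] ||= begin
      @api_calls += 1
      @embed_fn.call(text)
    end
  end
end
```

Embed the same text a thousand times, pay for it once: repeated `embed` calls for identical input return the stored vector without touching the API.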
## Memoization for Request-Scoped Caching
For data that is read multiple times within a single request:
```ruby
# app/models/document.rb
class Document < ApplicationRecord
  def ai_summary
    @ai_summary ||= Rails.cache.fetch("#{cache_key_with_version}/summary", expires_in: 6.hours) do
      AiService.new.summarize(content)
    end
  end
end
```
Double layer: `@ai_summary` memoizes within the request, `Rails.cache.fetch` persists across requests.
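The interplay of the two layers is easier to see in isolation. A plain-Ruby sketch, with a global `Hash` standing in for `Rails.cache` and each object standing in for one request (`DocSketch` and `SHARED_CACHE` are illustrative names, not Rails APIs):

```ruby
SHARED_CACHE = {} # stands in for Rails.cache, shared across "requests"

class DocSketch
  def initialize(key)
    @key = key
    @compute_count = 0
  end

  attr_reader :compute_count

  def summary
    # Layer 1: @summary memoizes within this object (one request).
    # Layer 2: SHARED_CACHE persists across objects (all requests).
    @summary ||= SHARED_CACHE[@key] ||= compute_summary
  end

  private

  def compute_summary
    @compute_count += 1
    "summary of #{@key}"
  end
end
```

Two objects with the same key trigger exactly one computation: the first fills both layers, the second reads straight from the shared store.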
## Cache Warming with Background Jobs
Don't make users wait for a cold cache. Pre-compute in the background:
```ruby
# app/jobs/warm_ai_cache_job.rb
class WarmAiCacheJob < ApplicationJob
  queue_as :low_priority

  def perform(document_id)
    document = Document.find(document_id)

    # Pre-compute and cache the summary
    document.ai_summary

    # Pre-compute and cache embeddings
    EmbeddingService.new.embed(document.content)
  end
end
```
Trigger after creation:
```ruby
# app/models/document.rb
class Document < ApplicationRecord
  after_create_commit :warm_ai_cache

  private

  def warm_ai_cache
    WarmAiCacheJob.perform_later(id)
  end
end
```
## Cache Invalidation Strategies
The two hardest problems in computer science: cache invalidation, naming things, and off-by-one errors.
```ruby
# Time-based expiry (simplest)
Rails.cache.fetch(key, expires_in: 1.hour) { expensive_call }

# Version-based (bust when data changes)
Rails.cache.fetch("#{document.cache_key_with_version}/summary") { compute }

# Manual invalidation
def update_content(new_content)
  # Delete before updating: after update!, cache_key_with_version changes,
  # so this key would no longer point at the stale entry.
  Rails.cache.delete("#{cache_key_with_version}/summary")
  update!(content: new_content)
  WarmAiCacheJob.perform_later(id)
end
```
## Monitoring Cache Performance
Add instrumentation so you know whether your cache is actually helping:
```ruby
# config/initializers/cache_instrumentation.rb
ActiveSupport::Notifications.subscribe("cache_read.active_support") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)

  if event.payload[:hit]
    Rails.logger.debug "CACHE HIT: #{event.payload[:key]}"
  else
    Rails.logger.debug "CACHE MISS: #{event.payload[:key]}"
  end
end
```
Check your hit rate:

```bash
# keyspace_hits / (keyspace_hits + keyspace_misses) = hit rate
redis-cli INFO stats | grep keyspace
```
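The Redis counters are global, so it can help to tally hits and misses yourself, for example from the subscriber above. A minimal sketch (`CacheStats` is a hypothetical helper, not a Rails or Redis API):

```ruby
# Minimal hit-rate tally; in the app, record(true/false) would be
# called from the cache_read subscriber's hit and miss branches.
class CacheStats
  def initialize
    @hits = 0
    @misses = 0
  end

  def record(hit)
    hit ? @hits += 1 : @misses += 1
  end

  def hit_rate
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.to_f / total
  end
end
```

As a rough rule, if the hit rate on AI responses stays low, your cache keys are probably too specific (see the normalization caveat earlier).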
## Quick Reference
| Strategy | Use When | TTL |
|---|---|---|
| Low-level cache | Same inputs = same outputs | 1-6 hours |
| Fragment cache | Rendering AI content in views | Until model changes |
| Embedding cache | Computing vectors for text | 30 days |
| Cache warming | New content arrives | Background |
## What's Next
Your app is tested and fast. Next: making it secure. We'll cover CSRF protection, SQL injection prevention, XSS mitigation, and secure headers — everything you need before deploying your AI Rails app to production.