If you're using RubyLLM to build AI-powered applications in Ruby, you're already enjoying a clean, unified API across multiple providers. But there's one problem RubyLLM doesn't solve: every API call costs money.
Enter SemanticCache — a semantic caching layer that integrates seamlessly with RubyLLM to dramatically reduce your AI costs.
The Problem: Redundant API Calls
Consider a typical chatbot or AI assistant. Users often ask variations of the same question:
- "What's the capital of France?"
- "What is France's capital city?"
- "Tell me the capital of France"
- "Capital of France?"
Without caching, each of these triggers a separate API call. With traditional caching, you'd need an exact string match — which almost never happens with natural language.
Semantic caching solves this. It understands that these questions are semantically identical and returns the cached response instantly.
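To see why, compare the embeddings of two of those phrasings directly. A rough sketch using RubyLLM's embedding API; the cosine_similarity helper is written here just for illustration and isn't part of either gem, and the exact score depends on the embedding model you use:

require "ruby_llm"

RubyLLM.configure { |config| config.openai_api_key = ENV["OPENAI_API_KEY"] }

# Plain cosine similarity between two vectors (helper defined for this example)
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

v1 = RubyLLM.embed("What's the capital of France?").vectors
v2 = RubyLLM.embed("Capital of France?").vectors

puts cosine_similarity(v1, v2)
# => a value close to 1.0: both phrasings land in nearly the same spot in embedding space

An exact-match cache would treat those two strings as completely unrelated keys; a semantic cache treats them as the same question.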
Why SemanticCache + RubyLLM?
1. Provider Flexibility
RubyLLM already gives you the freedom to switch between OpenAI, Gemini, Mistral, Ollama, Bedrock, and more. SemanticCache extends this flexibility to embeddings:
# Use Gemini for embeddings (cheaper) while using GPT-4o for completions
RubyLLM.configure do |config|
  config.gemini_api_key = ENV["GEMINI_API_KEY"]
  config.openai_api_key = ENV["OPENAI_API_KEY"]
end

SemanticCache.configure do |c|
  c.embedding_adapter = :ruby_llm
  c.embedding_model = "text-embedding-004" # Gemini's embedding model
end

# Now cache expensive GPT-4o calls using cheap Gemini embeddings
cache = SemanticCache.new
response = cache.fetch("Explain quantum computing") do
  RubyLLM.chat(model: "gpt-4o").ask("Explain quantum computing").content
end
2. Local Embeddings with Ollama
Running Ollama locally? You can generate embeddings without any API costs:
RubyLLM.configure do |config|
  config.ollama_api_base = "http://localhost:11434/v1"
end

SemanticCache.configure do |c|
  c.embedding_adapter = :ruby_llm
  c.embedding_model = "nomic-embed-text" # Free, local embeddings
end
This is perfect for:
- Development environments (no API costs while testing; see the setup sketch after this list)
- Privacy-sensitive applications (embeddings never leave your server)
- High-volume applications where embedding costs add up
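A pattern that falls out of this naturally is to use Ollama for embeddings in development and a hosted provider in production. Here's a minimal sketch, assuming a Rails-style RAILS_ENV check and the same model names used above; adapt it to however you detect environments:

# config/initializers/semantic_cache.rb (sketch)
local = ENV["RAILS_ENV"] != "production"

RubyLLM.configure do |config|
  if local
    config.ollama_api_base = "http://localhost:11434/v1" # free, local embeddings
  else
    config.openai_api_key = ENV["OPENAI_API_KEY"]
  end
end

SemanticCache.configure do |c|
  c.embedding_adapter = :ruby_llm
  c.embedding_model = local ? "nomic-embed-text" : "text-embedding-3-small"
end

One caveat: embeddings from different models aren't comparable, so don't share a cache store between environments configured this way.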
3. Enterprise-Ready with AWS Bedrock
For enterprise deployments, you might need to keep everything within AWS:
RubyLLM.configure do |config|
  # Bedrock credentials, in addition to the region
  config.bedrock_api_key = ENV["AWS_ACCESS_KEY_ID"]
  config.bedrock_secret_key = ENV["AWS_SECRET_ACCESS_KEY"]
  config.bedrock_region = "us-east-1"
end

SemanticCache.configure do |c|
  c.embedding_adapter = :ruby_llm
  c.embedding_model = "amazon.titan-embed-text-v1"
end
No data leaves your AWS environment, and you get the compliance benefits of Bedrock.
Real-World Impact
Let's do some quick math. Assume:
- 10,000 queries per day
- Average query: 50 tokens
- GPT-4o pricing: ~$5/1M input tokens, ~$15/1M output tokens
- Average response: 200 tokens
Without caching:
- Daily cost: ~$32.50/day ($2.50 for input tokens + $30 for output tokens)
- Monthly cost: ~$975/month
With SemanticCache (assuming a 70% hit rate):
- Daily cost: ~$9.75/day, plus minimal embedding costs
- Monthly cost: ~$293/month
- Savings: roughly $680/month
And that's a conservative estimate. Applications with repetitive queries (FAQ bots, customer support, documentation assistants) often see 80-90% hit rates.
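If you want to sanity-check those figures, the back-of-the-envelope math fits in a few lines of Ruby. Every input below is one of the assumptions listed above, not a measurement:

# Cost model using the assumptions above (not measurements)
queries_per_day = 10_000
input_tokens    = 50     # per query
output_tokens   = 200    # per response
input_price     = 5.0    # USD per 1M input tokens
output_price    = 15.0   # USD per 1M output tokens
hit_rate        = 0.70   # assumed cache hit rate

daily  = queries_per_day * (input_tokens * input_price + output_tokens * output_price) / 1_000_000.0
cached = daily * (1 - hit_rate)

puts format("Without caching: $%.2f/day, $%.2f/month", daily, daily * 30)
puts format("With caching:    $%.2f/day, $%.2f/month", cached, cached * 30)
# => Without caching: $32.50/day, $975.00/month
# => With caching:    $9.75/day, $292.50/month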
How It Works
- Query comes in → SemanticCache generates an embedding vector using your configured RubyLLM provider
- Similarity search → The cache searches for stored entries with high cosine similarity
- Cache hit? → If similarity exceeds your threshold (default 0.85), return the cached response
- Cache miss? → Execute the LLM call, store the result with its embedding for future queries
cache = SemanticCache.new(similarity_threshold: 0.85)

# First call - cache miss, calls the API
response = cache.fetch("What is Ruby?") do
  RubyLLM.chat.ask("What is Ruby?").content
end

# Second call - cache hit! Returns instantly
response = cache.fetch("Tell me about the Ruby programming language") do
  RubyLLM.chat.ask("Tell me about Ruby").content
end
# => Returns cached response, no API call made
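Under the hood, that flow is conceptually just a handful of lines. This is a simplified sketch of the idea, not SemanticCache's actual implementation: store stands in for whatever backend holds the saved entries and their vectors, and cosine_similarity is the same kind of helper as in the earlier example.

# Conceptual sketch of the fetch flow (not the gem's real internals)
def fetch(query, threshold: 0.85, &block)
  query_vector = RubyLLM.embed(query).vectors

  # Steps 1-2: embed the query and find the most similar stored entry
  best = store.entries.max_by { |entry| cosine_similarity(entry.vector, query_vector) }

  # Step 3: cache hit if the best match clears the threshold
  return best.response if best && cosine_similarity(best.vector, query_vector) >= threshold

  # Step 4: cache miss, so run the block and remember the result with its embedding
  response = block.call
  store.add(vector: query_vector, response: response)
  response
end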
Getting Started
Add both gems to your Gemfile:
gem "ruby_llm"
gem "semantic-cache"
Configure your providers:
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
end

# config/initializers/semantic_cache.rb
SemanticCache.configure do |c|
  c.embedding_adapter = :ruby_llm
  c.embedding_model = "text-embedding-3-small"
  c.similarity_threshold = 0.85
  c.store = :redis # For production
  c.store_options = { url: ENV["REDIS_URL"] }
end
Start caching:
cache = SemanticCache.new

response = cache.fetch(user_message, model: "gpt-4o") do
  RubyLLM.chat(model: "gpt-4o").ask(user_message).content
end
# Check your savings
puts cache.savings_report
# => Total saved: $23.45 (156 cached calls)
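Putting the pieces together in a Rails app, the whole flow can live in a single controller action. A hedged sketch: ChatsController, the CACHE constant, and the JSON shape are illustrative choices, not part of either gem:

# app/controllers/chats_controller.rb (illustrative)
class ChatsController < ApplicationController
  CACHE = SemanticCache.new

  def create
    user_message = params.require(:message)

    answer = CACHE.fetch(user_message, model: "gpt-4o") do
      RubyLLM.chat(model: "gpt-4o").ask(user_message).content
    end

    render json: { answer: answer }
  end
end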
Advanced Patterns
Tag-Based Invalidation
Group related cache entries for bulk invalidation:
# Cache with tags
cache.fetch("Ruby version?", tags: [:ruby, :versions]) do
  RubyLLM.chat.ask("What's the latest Ruby version?").content
end

# When Ruby 3.4 releases, invalidate all version-related caches
cache.invalidate(tags: [:versions])
TTL for Time-Sensitive Data
# News summaries should expire quickly
cache.fetch("Latest tech news?", ttl: 3600) do # 1 hour
  RubyLLM.chat.ask("Summarize today's tech news").content
end
Per-User Namespacing
# Each user gets their own cache namespace
SemanticCache.with_cache(namespace: "user_#{current_user.id}") do
  # Cached responses are isolated per user
end
Conclusion
SemanticCache and RubyLLM are a natural fit. RubyLLM gives you provider flexibility for LLM calls; SemanticCache extends that flexibility to embeddings while dramatically cutting your costs.
The combination is particularly powerful because:
- No vendor lock-in — Switch embedding providers without changing your caching logic
- Cost optimization — Use cheaper providers for embeddings, premium providers for completions
- Local development — Use Ollama for free local embeddings during development
- Enterprise compliance — Keep everything within AWS using Bedrock
Stop paying for the same answers twice. Add SemanticCache to your RubyLLM project today.