TL;DR: RubyLLM::Agents is a Rails engine that makes LLM-powered systems production-ready by providing retries, fallbacks, circuit breakers, cost and budget controls, multi-tenant limits, workflow orchestration, and a real-time dashboard — all through a clean Ruby DSL.
Why most LLM prototypes fail in production
Prototypes are deceptively easy. You call an LLM API, get a response, and ship. Everything looks fine in development.
Then production happens:
- APIs time out under real traffic
- A single runaway prompt causes a billing spike
- A provider deprecates a model and a background job starts failing at 3 AM
- You have no visibility into which agents are burning budget
- Rate limits surface as user-facing errors
These are not edge cases. They are the natural outcome of running LLMs at scale without reliability and observability built in.
What RubyLLM::Agents gives you
RubyLLM::Agents is a Rails engine built on top of RubyLLM, the provider-agnostic LLM client. It treats production concerns as first-class citizens: execution tracking, reliability patterns, budget enforcement, tenant isolation, multi-agent workflows, streaming, PII redaction, and a live dashboard.
Instead of stitching this together yourself, you get a coherent framework designed for long-running, revenue-impacting AI features.
Quick start
# Gemfile
gem "ruby_llm-agents"
bundle install
rails generate ruby_llm_agents:install
rails db:migrate
After installation, you have database tables for execution tracking, a configuration initializer, and a dashboard you can mount immediately.
Defining your first agent
# app/agents/search_intent_agent.rb
class SearchIntentAgent < ApplicationAgent
  model "gpt-4o"
  temperature 0.0
  description "Extracts search intent and filters from user queries"

  param :query, required: true
  param :limit, type: :integer, default: 10

  def system_prompt
    <<~PROMPT
      You are a search query analyzer. Extract the user's intent
      and any filters (price, color, category, etc.) from their query.
      Return structured JSON.
    PROMPT
  end

  def user_prompt
    "Analyze this search query: #{query}"
  end

  def schema
    @schema ||= RubyLLM::Schema.create do
      string :refined_query, description: "Cleaned search query"
      array :filters, of: :string, description: "Extracted filters"
      string :intent, description: "User intent category"
    end
  end
end
Calling the agent:
result = SearchIntentAgent.call(query: "red summer dress under $50")
result.content
# => { refined_query: "summer dress", filters: ["color:red", "price:<50"], intent: "product_search" }
result.total_cost # => 0.00025
result.total_tokens # => 150
result.duration_ms # => 850
result.success? # => true
Every execution is automatically persisted with full metadata: parameters, model used, retries, fallbacks, cost, latency, and errors.
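Because executions land in your own database, you can query them with plain ActiveRecord. A minimal sketch, assuming the engine exposes its log as a model named RubyLLM::Agents::Execution with agent_type, total_cost, and duration_ms columns (check the generated migrations for the actual names):

# Model and column names are assumptions; verify against the migrations.
recent = RubyLLM::Agents::Execution
  .where(agent_type: "SearchIntentAgent")
  .where("created_at > ?", 24.hours.ago)

recent.sum(:total_cost)       # spend for this agent over the last day
recent.average(:duration_ms)  # typical latency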
Production reliability with the Reliability DSL
In production, LLM calls must handle transient failures, degraded providers, and latency spikes. RubyLLM::Agents exposes this via a declarative reliability DSL.
class CriticalAgent < ApplicationAgent
  model "gpt-4o"

  reliability do
    retries max: 3, backoff: :exponential
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
    circuit_breaker errors: 10, within: 60, cooldown: 300
    total_timeout 30
  end
end
At runtime:
- The primary model is attempted first.
- Failures trigger retries with exponential backoff.
- After retries, fallback models are attempted in order.
- Repeated failures open a circuit breaker to fail fast and protect upstream services.
- A total timeout bounds the entire call sequence.
The result object makes this transparent:
result.chosen_model_id # Model that produced the final response
result.attempts_count # Total attempts including retries
result.fallback_reason # Why a fallback was used (timeout, rate_limit, etc.)
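These fields feed naturally into your own instrumentation. A small sketch, assuming CriticalAgent declares a hypothetical input param; the logging is plain Rails, not part of the engine:

# The `input:` param is a placeholder for whatever params your agent declares.
result = CriticalAgent.call(input: payload)

if result.fallback_reason
  # Record degraded runs so you notice silent failovers before users do.
  Rails.logger.warn(
    "CriticalAgent degraded: #{result.chosen_model_id} answered after " \
    "#{result.attempts_count} attempts (reason: #{result.fallback_reason})"
  )
end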
Cost control and budget enforcement
Unbounded LLM usage is a liability. RubyLLM::Agents provides hard and soft budget enforcement at multiple levels.
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 100.0,
    global_monthly: 2000.0,
    per_agent_daily: {
      "ExpensiveAgent" => 50.0,
      "CheapAgent" => 5.0
    },
    enforcement: :hard
  }
end
With :hard enforcement, calls raise BudgetExceededError once a limit is crossed. With :soft, limits trigger alerts but allow execution to continue.
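Hard caps are worth rescuing at the call site so a blown budget degrades gracefully instead of surfacing a 500. A sketch, assuming the error class lives under the RubyLLM::Agents namespace (the article names only BudgetExceededError) and using a hypothetical FallbackSearch as the non-LLM code path:

begin
  SearchIntentAgent.call(query: user_query)
rescue RubyLLM::Agents::BudgetExceededError
  # Budget exhausted: fall back to a conventional, non-LLM search path.
  # FallbackSearch is an application-side placeholder, not part of the gem.
  FallbackSearch.call(user_query)
end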
Alerts integrate directly with Slack or webhooks:
config.alerts = {
  on_events: [:budget_soft_cap, :budget_hard_cap, :breaker_open],
  slack_webhook_url: ENV['SLACK_WEBHOOK_URL']
}
Multi-tenancy and per-customer limits
For SaaS products, each tenant can have isolated budgets and circuit breakers.
SearchAgent.call(
  query: "find products",
  tenant: {
    id: "acme-corp",
    daily_limit: 100.0,
    monthly_limit: 1000.0
  }
)
This ensures a single noisy customer cannot affect system reliability or cost for others.
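In practice the tenant hash is usually derived from your own account model rather than hard-coded. A sketch with a hypothetical current_account and limit columns on the application side; none of this is part of the engine:

# current_account, slug, and the limit columns are application-side
# assumptions used only for illustration.
def tenant_for(account)
  {
    id: account.slug,
    daily_limit: account.llm_daily_limit,
    monthly_limit: account.llm_monthly_limit
  }
end

SearchAgent.call(query: params[:q], tenant: tenant_for(current_account))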
Workflow orchestration
Real-world AI systems rarely rely on a single LLM call. RubyLLM::Agents supports structured orchestration through three workflow types.
Pipeline (sequential)
class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  timeout 60.seconds
  max_cost 1.00

  step :classify, agent: ClassifierAgent
  step :enrich, agent: EnricherAgent
  step :format, agent: FormatterAgent, optional: true
end
result = ContentPipeline.call(text: raw_content)
Pipelines support conditional steps, input transformation between steps, and per-step error handling.
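As a rough illustration of those features, a conditional, transformed step might look like the sketch below. The option names (if:, input:, on_error:) are my guesses at the shape of the DSL, not confirmed keywords; consult the gem's documentation for the real ones:

class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  step :classify, agent: ClassifierAgent
  # Only enrich long-form content, reshaping the classifier output first.
  # These option names are illustrative, not confirmed API.
  step :enrich, agent: EnricherAgent,
       if: ->(ctx) { ctx[:classify][:kind] == "article" },
       input: ->(ctx) { { text: ctx[:classify][:cleaned_text] } },
       on_error: :skip
  step :format, agent: FormatterAgent, optional: true
end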
Parallel (concurrent)
class AnalysisPipeline < RubyLLM::Agents::Workflow::Parallel
  concurrency 3
  fail_fast false

  branch :sentiment, agent: SentimentAgent
  branch :entities, agent: EntityAgent
  branch :summary, agent: SummaryAgent
end
Branches execute concurrently with thread-safe aggregation and optional failure isolation.
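Reading a parallel result might then look like this; the assumption that branch outputs are keyed by branch name on the result object is mine, so verify it against the actual result API:

result = AnalysisPipeline.call(text: document)

# Assumed access pattern: branch outputs keyed by branch name.
result[:sentiment]  # output of the sentiment branch
result[:entities]
result.status       # partial if a branch failed while fail_fast is false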
Router (conditional)
class SupportRouter < RubyLLM::Agents::Workflow::Router
  classifier_model "gpt-4o-mini"

  route :technical, to: TechSupportAgent
  route :billing, to: BillingAgent
  route :default, to: GeneralAgent
end
Routers combine low-cost classification with rule-based overrides for efficient dispatch.
All workflows return aggregated metrics:
result.total_cost
result.duration_ms
result.status # success, partial, or error
result.errors
Observability and the dashboard
Mount the engine to enable a real-time dashboard:
# config/routes.rb
mount RubyLLM::Agents::Engine => "/agents"
The dashboard provides:
- Execution history with filtering by agent, model, and status
- Cost and token analytics over time
- Performance and throughput trends
- Error tracking with stack traces and context
The UI updates live using Turbo, without page refreshes.
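One hardening step worth calling out: the dashboard exposes prompts and costs, so mount it behind authentication. Here is one way, using Devise's authenticate routing helper; any routing constraint from your own auth stack works just as well:

# config/routes.rb
# Devise-specific example; substitute your own auth constraint if needed.
authenticate :user, ->(user) { user.admin? } do
  mount RubyLLM::Agents::Engine => "/agents"
end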
Streaming, multi-turn conversations, and attachments
RubyLLM::Agents supports advanced interaction patterns:
- Streaming: Yield partial responses for chat and live UIs.
- Multi-turn conversations: Pass prior messages for context-aware agents.
- Attachments: Images, PDFs, and documents for vision and analysis tasks.
ChatAgent.call(
  query: "What's my name?",
  messages: [
    { role: :user, content: "My name is Alice" },
    { role: :assistant, content: "Nice to meet you, Alice!" }
  ]
)
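For streaming, a block-based call is the natural Ruby shape; the block API sketched here is an assumption, so check the gem's streaming docs. The broadcast uses turbo-rails' Turbo::StreamsChannel to push chunks to the page:

# Assumed block-based streaming interface; verify against the gem's docs.
ChatAgent.call(query: "Summarize today's tickets") do |chunk|
  # Push each partial chunk to the browser as it arrives.
  Turbo::StreamsChannel.broadcast_append_to(
    "chat", target: "messages", html: chunk
  )
end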
PII protection and redaction
Sensitive data is automatically redacted from logs and dashboards. You can also define custom redaction patterns:
RubyLLM::Agents.configure do |config|
  config.redaction_patterns = {
    api_key: /sk-[a-zA-Z0-9]+/,
    internal_id: /INTERNAL-\d+/
  }
end
Testing agents
Agents can be tested in dry-run mode without calling external APIs:
result = SearchIntentAgent.call(query: "test", dry_run: true)
This allows validation of parameters, prompts, and schemas as part of your test suite.
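That makes prompt regressions cheap to catch in CI. A minimal RSpec sketch; the expectation that a missing required param raises ArgumentError is an assumption about the engine's validation, not documented behavior:

RSpec.describe SearchIntentAgent do
  it "builds valid prompts without hitting the API" do
    result = described_class.call(query: "red dress", dry_run: true)
    expect(result.success?).to be(true)
  end

  it "rejects calls without the required query param" do
    # ArgumentError is an assumed error class; adjust to what the gem raises.
    expect { described_class.call(dry_run: true) }.to raise_error(ArgumentError)
  end
end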
Generators
Rails generators speed up adoption:
rails generate ruby_llm_agents:agent ProductSearch query:required category
rails generate ruby_llm_agents:agent Chat::Support message:required user_id
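The first command scaffolds roughly the following file; this is an illustrative reconstruction, and the exact template may differ between versions:

# app/agents/product_search_agent.rb (illustrative generator output)
class ProductSearchAgent < ApplicationAgent
  param :query, required: true
  param :category

  def user_prompt
    # TODO: build the prompt from query and category
  end
end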
Why not call LLM APIs directly?
Direct API calls are fine for prototypes. Production systems require:
- Structured execution logs
- Predictable cost controls
- Reliability patterns and graceful degradation
- Tenant isolation for SaaS
- Centralized observability
RubyLLM::Agents provides these capabilities out of the box for Rails applications.
Getting started
gem "ruby_llm-agents"
bundle install
rails generate ruby_llm_agents:install
rails db:migrate
Documentation and source:
Roadmap
- v0.5.0 introduced tenant token limits and database-backed API configuration
- v0.4.0 added the reliability DSL and enhanced execution tracking
- Upcoming releases will expand workflow primitives and analytics
Building AI features that matter means treating LLMs like any other critical dependency. RubyLLM::Agents gives Rails developers the tooling needed to do exactly that.