Adham EL-Deeb

Building Production-Ready AI Agents in Rails with RubyLLM::Agents

TL;DR: RubyLLM::Agents is a Rails engine that makes LLM-powered systems production-ready by providing retries, fallbacks, circuit breakers, cost and budget controls, multi-tenant limits, workflow orchestration, and a real-time dashboard — all through a clean Ruby DSL.


Why most LLM prototypes fail in production

Prototypes are deceptively easy. You call an LLM API, get a response, and ship. Everything looks fine in development.

Then production happens:

  • APIs time out under real traffic
  • A single runaway prompt causes a billing spike
  • A provider deprecates a model and a background job starts failing at 3 AM
  • You have no visibility into which agents are burning budget
  • Rate limits surface as user-facing errors

These are not edge cases. They are the natural outcome of running LLMs at scale without reliability and observability built in.

What RubyLLM::Agents gives you

RubyLLM::Agents is a Rails engine built on top of RubyLLM, the provider-agnostic LLM client. It treats production concerns as first-class citizens: execution tracking, reliability patterns, budget enforcement, tenant isolation, multi-agent workflows, streaming, PII redaction, and a live dashboard.

Instead of stitching this together yourself, you get a coherent framework designed for long-running, revenue-impacting AI features.


Quick start

# Gemfile
gem "ruby_llm-agents"

bundle install
rails generate ruby_llm_agents:install
rails db:migrate

After installation, you have database tables for execution tracking, a configuration initializer, and a dashboard you can mount immediately.


Defining your first agent

# app/agents/search_intent_agent.rb
class SearchIntentAgent < ApplicationAgent
  model "gpt-4o"
  temperature 0.0
  description "Extracts search intent and filters from user queries"

  param :query, required: true
  param :limit, type: :integer, default: 10

  def system_prompt
    <<~PROMPT
      You are a search query analyzer. Extract the user's intent
      and any filters (price, color, category, etc.) from their query.
      Return structured JSON.
    PROMPT
  end

  def user_prompt
    "Analyze this search query: #{query}"
  end

  def schema
    @schema ||= RubyLLM::Schema.create do
      string :refined_query, description: "Cleaned search query"
      array :filters, of: :string, description: "Extracted filters"
      string :intent, description: "User intent category"
    end
  end
end

Calling the agent:

result = SearchIntentAgent.call(query: "red summer dress under $50")

result.content
# => { refined_query: "summer dress", filters: ["color:red", "price:<50"], intent: "product_search" }

result.total_cost     # => 0.00025
result.total_tokens   # => 150
result.duration_ms    # => 850
result.success?       # => true

Every execution is automatically persisted with full metadata: parameters, model used, retries, fallbacks, cost, latency, and errors.
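
You can query those records like any other model. A minimal sketch, assuming the engine exposes an ActiveRecord-backed execution model — the class and column names below are assumptions for illustration:

# Class and column names are assumptions for illustration.
executions = RubyLLM::Agents::Execution
  .where(agent_type: "SearchIntentAgent")
  .where("created_at > ?", 1.day.ago)

executions.sum(:total_cost)       # spend for this agent over the last day
executions.average(:duration_ms)  # latency trend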


Production reliability with the Reliability DSL

In production, LLM calls must handle transient failures, degraded providers, and latency spikes. RubyLLM::Agents exposes this via a declarative reliability DSL.

class CriticalAgent < ApplicationAgent
  model "gpt-4o"

  reliability do
    retries max: 3, backoff: :exponential
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
    circuit_breaker errors: 10, within: 60, cooldown: 300
    total_timeout 30
  end
end

At runtime:

  1. The primary model is attempted first.
  2. Failures trigger retries with exponential backoff.
  3. After retries, fallback models are attempted in order.
  4. Repeated failures open a circuit breaker to fail fast and protect upstream services.
  5. A total timeout bounds the entire call sequence.

The result object makes this transparent:

result.chosen_model_id  # Model that produced the final response
result.attempts_count   # Total attempts including retries
result.fallback_reason  # Why a fallback was used (timeout, rate_limit, etc.)
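Because this metadata lives on the result object, graceful degradation is plain Ruby. A sketch, assuming failures surface on the result rather than raising; the input: parameter and the cached_or_default_response helper are placeholders for your own application:

result = CriticalAgent.call(input: document)

if result.success?
  result.content
else
  # fallback_reason tells you why the reliability chain gave up
  Rails.logger.warn("CriticalAgent failed: #{result.fallback_reason}")
  cached_or_default_response # placeholder: your app's degraded path
end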

Cost control and budget enforcement

Unbounded LLM usage is a liability. RubyLLM::Agents provides hard and soft budget enforcement at multiple levels.

# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 100.0,
    global_monthly: 2000.0,

    per_agent_daily: {
      "ExpensiveAgent" => 50.0,
      "CheapAgent" => 5.0
    },

    enforcement: :hard
  }
end

With :hard enforcement, calls raise BudgetExceededError once a limit is crossed. With :soft, limits trigger alerts but allow execution to continue.
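
Where hard-capped agents are invoked, rescue the error and degrade. A sketch — BudgetExceededError comes from the gem, but its exact namespace here is an assumption:

begin
  result = ExpensiveAgent.call(document: doc)
rescue RubyLLM::Agents::BudgetExceededError => e # namespace assumed
  Rails.logger.warn("LLM budget exceeded: #{e.message}")
  enqueue_for_off_peak_retry(doc) # placeholder: your app's degraded path
end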

Alerts integrate directly with Slack or webhooks:

config.alerts = {
  on_events: [:budget_soft_cap, :budget_hard_cap, :breaker_open],
  slack_webhook_url: ENV['SLACK_WEBHOOK_URL']
}

Multi-tenancy and per-customer limits

For SaaS products, each tenant can have isolated budgets and circuit breakers.

SearchAgent.call(
  query: "find products",
  tenant: {
    id: "acme-corp",
    daily_limit: 100.0,
    monthly_limit: 1000.0
  }
)

This ensures a single noisy customer cannot affect system reliability or cost for others.
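
In a SaaS controller, the tenant hash typically comes from the authenticated account. A sketch where current_account and its limit columns are placeholders for your own domain model:

# current_account and the limit columns are placeholders for your own models
SearchAgent.call(
  query: params[:q],
  tenant: {
    id: current_account.slug,
    daily_limit: current_account.llm_daily_limit,
    monthly_limit: current_account.llm_monthly_limit
  }
)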


Workflow orchestration

Real-world AI systems rarely rely on a single LLM call. RubyLLM::Agents supports structured orchestration through three workflow types.

Pipeline (sequential)

class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  timeout 60.seconds
  max_cost 1.00

  step :classify, agent: ClassifierAgent
  step :enrich,   agent: EnricherAgent
  step :format,   agent: FormatterAgent, optional: true
end

result = ContentPipeline.call(text: raw_content)

Pipelines support conditional steps, input transformation between steps, and per-step error handling.
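
A sketch of what that can look like. The option names below (if:, transform:) are illustrative assumptions, not confirmed DSL; check the gem's workflow docs for the exact keywords:

class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  step :classify, agent: ClassifierAgent

  # if: / transform: are illustrative assumptions, not confirmed DSL
  step :enrich, agent: EnricherAgent,
       if: ->(ctx) { ctx[:classify][:needs_enrichment] }
  step :format, agent: FormatterAgent, optional: true,
       transform: ->(ctx) { { text: ctx[:enrich][:content] } }
end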

Parallel (concurrent)

class AnalysisPipeline < RubyLLM::Agents::Workflow::Parallel
  concurrency 3
  fail_fast false

  branch :sentiment, agent: SentimentAgent
  branch :entities,  agent: EntityAgent
  branch :summary,   agent: SummaryAgent
end

Branches execute concurrently with thread-safe aggregation and optional failure isolation.

Router (conditional)

class SupportRouter < RubyLLM::Agents::Workflow::Router
  classifier_model "gpt-4o-mini"

  route :technical, to: TechSupportAgent
  route :billing,   to: BillingAgent
  route :default,   to: GeneralAgent
end

Routers combine low-cost classification with rule-based overrides for efficient dispatch.
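
Invocation mirrors the other workflow types; a sketch, with the message: parameter name as an assumption:

# message: is an assumed parameter name for this sketch
result = SupportRouter.call(message: "I was double-charged last month")
# The low-cost classifier picks a route (likely :billing here),
# and the matching agent produces the final response.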

All workflows return aggregated metrics:

result.total_cost
result.duration_ms
result.status        # success, partial, or error
result.errors

Observability and the dashboard

Mount the engine to enable a real-time dashboard:

# config/routes.rb
mount RubyLLM::Agents::Engine => "/agents"

The dashboard provides:

  • Execution history with filtering by agent, model, and status
  • Cost and token analytics over time
  • Performance and throughput trends
  • Error tracking with stack traces and context

The UI updates live using Turbo, without page refreshes.
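
Because the dashboard exposes prompts, costs, and errors, you will want it behind authentication in production. This is standard Rails routing rather than a gem feature; adapt the check to your own auth setup:

# config/routes.rb — plain Rails constraint; adapt to your auth system
admin_only = lambda do |request|
  id = request.session[:user_id]
  id && User.find_by(id: id)&.admin?
end

constraints admin_only do
  mount RubyLLM::Agents::Engine => "/agents"
end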


Streaming, multi-turn conversations, and attachments

RubyLLM::Agents supports advanced interaction patterns:

  • Streaming: Yield partial responses for chat and live UIs.
  • Multi-turn conversations: Pass prior messages for context-aware agents.
  • Attachments: Images, PDFs, and documents for vision and analysis tasks.

For example, a multi-turn call passes prior messages alongside the new query:

ChatAgent.call(
  query: "What's my name?",
  messages: [
    { role: :user, content: "My name is Alice" },
    { role: :assistant, content: "Nice to meet you, Alice!" }
  ]
)
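For streaming, a sketch assuming a block-based interface in the spirit of RubyLLM's chat API; the exact signature may differ:

# assumes a block-based streaming interface; the exact API may differ
buffer = +""
ChatAgent.call(query: "Summarize this thread") do |chunk|
  buffer << chunk # accumulate partial output
  print chunk     # or broadcast each chunk to a live UI via Turbo Streams
end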

PII protection and redaction

Sensitive data is automatically redacted from logs and dashboards. You can also define custom redaction patterns:

RubyLLM::Agents.configure do |config|
  config.redaction_patterns = {
    api_key: /sk-[a-zA-Z0-9]+/,
    internal_id: /INTERNAL-\d+/
  }
end

Testing agents

Agents can be tested in dry-run mode without calling external APIs:

result = SearchIntentAgent.call(query: "test", dry_run: true)

This allows validation of parameters, prompts, and schemas as part of your test suite.
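
That slots directly into a spec. A minimal RSpec sketch using the dry_run flag shown above; the assertion stays deliberately loose since the dry-run result shape isn't documented here:

# spec/agents/search_intent_agent_spec.rb
require "rails_helper"

RSpec.describe SearchIntentAgent do
  it "builds prompts and validates params without hitting the API" do
    expect {
      described_class.call(query: "red summer dress", dry_run: true)
    }.not_to raise_error
  end
end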


Generators

Rails generators speed up adoption:

rails generate ruby_llm_agents:agent ProductSearch query:required category
rails generate ruby_llm_agents:agent Chat::Support message:required user_id
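The first command scaffolds an agent class similar to the hand-written one above; an illustrative sketch of the kind of file it produces, not the generator's literal output:

# app/agents/product_search_agent.rb — illustrative, not literal output
class ProductSearchAgent < ApplicationAgent
  param :query, required: true
  param :category

  def user_prompt
    "Search query: #{query} (category: #{category})"
  end
end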

Why not call LLM APIs directly?

Direct API calls are fine for prototypes. Production systems require:

  • Structured execution logs
  • Predictable cost controls
  • Reliability patterns and graceful degradation
  • Tenant isolation for SaaS
  • Centralized observability

RubyLLM::Agents provides these capabilities out of the box for Rails applications.


Getting started

gem "ruby_llm-agents"

bundle install
rails generate ruby_llm_agents:install
rails db:migrate

Documentation and source are available from the ruby_llm-agents gem page on RubyGems and the project repository linked there.


Roadmap

  • v0.5.0 introduced tenant token limits and database-backed API configuration
  • v0.4.0 added the reliability DSL and enhanced execution tracking
  • Upcoming releases will expand workflow primitives and analytics

Building AI features that matter means treating LLMs like any other critical dependency. RubyLLM::Agents gives Rails developers the tooling needed to do exactly that.
