Adham EL-Deeb

Building Production-Ready AI Agents in Rails with RubyLLM::Agents

TL;DR: RubyLLM::Agents is a Rails engine that makes LLM-powered systems production-ready by providing retries, fallbacks, circuit breakers, cost and budget controls, multi-tenant limits, workflow orchestration, and a real-time dashboard — all through a clean Ruby DSL.


Why most LLM prototypes fail in production

Prototypes are deceptively easy. You call an LLM API, get a response, and ship. Everything looks fine in development.

Then production happens:

  • APIs time out under real traffic
  • A single runaway prompt causes a billing spike
  • A provider deprecates a model and a background job starts failing at 3 AM
  • You have no visibility into which agents are burning budget
  • Rate limits surface as user-facing errors

These are not edge cases. They are the natural outcome of running LLMs at scale without reliability and observability built in.

What RubyLLM::Agents gives you

RubyLLM::Agents is a Rails engine built on top of RubyLLM, the provider-agnostic LLM client. It treats production concerns as first-class citizens: execution tracking, reliability patterns, budget enforcement, tenant isolation, multi-agent workflows, streaming, PII redaction, and a live dashboard.

Instead of stitching this together yourself, you get a coherent framework designed for long-running, revenue-impacting AI features.


Quick start

# Gemfile
gem "ruby_llm-agents"

bundle install
rails generate ruby_llm_agents:install
rails db:migrate

After installation, you have database tables for execution tracking, a configuration initializer, and a dashboard you can mount immediately.


Defining your first agent

# app/agents/search_intent_agent.rb
class SearchIntentAgent < ApplicationAgent
  model "gpt-4o"
  temperature 0.0
  description "Extracts search intent and filters from user queries"

  param :query, required: true
  param :limit, type: :integer, default: 10

  def system_prompt
    <<~PROMPT
      You are a search query analyzer. Extract the user's intent
      and any filters (price, color, category, etc.) from their query.
      Return structured JSON.
    PROMPT
  end

  def user_prompt
    "Analyze this search query: #{query}"
  end

  def schema
    @schema ||= RubyLLM::Schema.create do
      string :refined_query, description: "Cleaned search query"
      array :filters, of: :string, description: "Extracted filters"
      string :intent, description: "User intent category"
    end
  end
end

Calling the agent:

result = SearchIntentAgent.call(query: "red summer dress under $50")

result.content
# => { refined_query: "summer dress", filters: ["color:red", "price:<50"], intent: "product_search" }

result.total_cost     # => 0.00025
result.total_tokens   # => 150
result.duration_ms    # => 850
result.success?       # => true

Every execution is automatically persisted with full metadata: parameters, model used, retries, fallbacks, cost, latency, and errors.
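
You can query those records like any other model. A minimal sketch, assuming the engine exposes an ActiveRecord-backed execution model — the class and column names below are assumptions for illustration:

# Class and column names are assumptions for illustration.
executions = RubyLLM::Agents::Execution
  .where(agent_type: "SearchIntentAgent")
  .where("created_at > ?", 1.day.ago)

executions.sum(:total_cost)       # spend for this agent over the last day
executions.average(:duration_ms)  # latency trend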


Production reliability with the Reliability DSL

In production, LLM calls must handle transient failures, degraded providers, and latency spikes. RubyLLM::Agents exposes this via a declarative reliability DSL.

class CriticalAgent < ApplicationAgent
  model "gpt-4o"

  reliability do
    retries max: 3, backoff: :exponential
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
    circuit_breaker errors: 10, within: 60, cooldown: 300
    total_timeout 30
  end
end

At runtime:

  1. The primary model is attempted first.
  2. Failures trigger retries with exponential backoff.
  3. After retries, fallback models are attempted in order.
  4. Repeated failures open a circuit breaker to fail fast and protect upstream services.
  5. A total timeout bounds the entire call sequence.

The result object makes this transparent:

result.chosen_model_id  # Model that produced the final response
result.attempts_count   # Total attempts including retries
result.fallback_reason  # Why a fallback was used (timeout, rate_limit, etc.)
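Because this metadata lives on the result object, graceful degradation is plain Ruby. A sketch, assuming failures surface on the result rather than raising; the input: parameter and the cached_or_default_response helper are placeholders for your own application:

result = CriticalAgent.call(input: document)

if result.success?
  result.content
else
  # fallback_reason tells you why the reliability chain gave up
  Rails.logger.warn("CriticalAgent failed: #{result.fallback_reason}")
  cached_or_default_response # placeholder: your app's degraded path
end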

Cost control and budget enforcement

Unbounded LLM usage is a liability. RubyLLM::Agents provides hard and soft budget enforcement at multiple levels.

# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 100.0,
    global_monthly: 2000.0,

    per_agent_daily: {
      "ExpensiveAgent" => 50.0,
      "CheapAgent" => 5.0
    },

    enforcement: :hard
  }
end

With :hard enforcement, calls raise BudgetExceededError once a limit is crossed. With :soft, limits trigger alerts but allow execution to continue.
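
Where hard-capped agents are invoked, rescue the error and degrade. A sketch — BudgetExceededError comes from the gem, but its exact namespace here is an assumption:

begin
  result = ExpensiveAgent.call(document: doc)
rescue RubyLLM::Agents::BudgetExceededError => e # namespace assumed
  Rails.logger.warn("LLM budget exceeded: #{e.message}")
  enqueue_for_off_peak_retry(doc) # placeholder: your app's degraded path
end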

Alerts integrate directly with Slack or webhooks:

config.alerts = {
  on_events: [:budget_soft_cap, :budget_hard_cap, :breaker_open],
  slack_webhook_url: ENV['SLACK_WEBHOOK_URL']
}

Multi-tenancy and per-customer limits

For SaaS products, each tenant can have isolated budgets and circuit breakers.

SearchAgent.call(
  query: "find products",
  tenant: {
    id: "acme-corp",
    daily_limit: 100.0,
    monthly_limit: 1000.0
  }
)

This ensures a single noisy customer cannot affect system reliability or cost for others.
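
In a SaaS controller, the tenant hash typically comes from the authenticated account. A sketch where current_account and its limit columns are placeholders for your own domain model:

# current_account and the limit columns are placeholders for your own models
SearchAgent.call(
  query: params[:q],
  tenant: {
    id: current_account.slug,
    daily_limit: current_account.llm_daily_limit,
    monthly_limit: current_account.llm_monthly_limit
  }
)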


Workflow orchestration

Real-world AI systems rarely rely on a single LLM call. RubyLLM::Agents supports structured orchestration through three workflow types.

Pipeline (sequential)

class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  timeout 60.seconds
  max_cost 1.00

  step :classify, agent: ClassifierAgent
  step :enrich,   agent: EnricherAgent
  step :format,   agent: FormatterAgent, optional: true
end

result = ContentPipeline.call(text: raw_content)

Pipelines support conditional steps, input transformation between steps, and per-step error handling.
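
A sketch of what that can look like. The option names below (if:, transform:) are illustrative assumptions, not confirmed DSL; check the gem's workflow docs for the exact keywords:

class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  step :classify, agent: ClassifierAgent

  # if: / transform: are illustrative assumptions, not confirmed DSL
  step :enrich, agent: EnricherAgent,
       if: ->(ctx) { ctx[:classify][:needs_enrichment] }
  step :format, agent: FormatterAgent, optional: true,
       transform: ->(ctx) { { text: ctx[:enrich][:content] } }
end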

Parallel (concurrent)

class AnalysisPipeline < RubyLLM::Agents::Workflow::Parallel
  concurrency 3
  fail_fast false

  branch :sentiment, agent: SentimentAgent
  branch :entities,  agent: EntityAgent
  branch :summary,   agent: SummaryAgent
end

Branches execute concurrently with thread-safe aggregation and optional failure isolation.

Router (conditional)

class SupportRouter < RubyLLM::Agents::Workflow::Router
  classifier_model "gpt-4o-mini"

  route :technical, to: TechSupportAgent
  route :billing,   to: BillingAgent
  route :default,   to: GeneralAgent
end

Routers combine low-cost classification with rule-based overrides for efficient dispatch.
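
Invocation mirrors the other workflow types; a sketch, with the message: parameter name as an assumption:

# message: is an assumed parameter name for this sketch
result = SupportRouter.call(message: "I was double-charged last month")
# The low-cost classifier picks a route (likely :billing here),
# and the matching agent produces the final response.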

All workflows return aggregated metrics:

result.total_cost
result.duration_ms
result.status        # success, partial, or error
result.errors

Observability and the dashboard

Mount the engine to enable a real-time dashboard:

# config/routes.rb
mount RubyLLM::Agents::Engine => "/agents"

The dashboard provides:

  • Execution history with filtering by agent, model, and status
  • Cost and token analytics over time
  • Performance and throughput trends
  • Error tracking with stack traces and context

The UI updates live using Turbo, without page refreshes.
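
Because the dashboard exposes prompts, costs, and errors, you will want it behind authentication in production. This is standard Rails routing rather than a gem feature; adapt the check to your own auth setup:

# config/routes.rb — plain Rails constraint; adapt to your auth system
admin_only = lambda do |request|
  id = request.session[:user_id]
  id && User.find_by(id: id)&.admin?
end

constraints admin_only do
  mount RubyLLM::Agents::Engine => "/agents"
end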


Streaming, multi-turn conversations, and attachments

RubyLLM::Agents supports advanced interaction patterns:

  • Streaming: Yield partial responses for chat and live UIs.
  • Multi-turn conversations: Pass prior messages for context-aware agents.
  • Attachments: Images, PDFs, and documents for vision and analysis tasks.

For example, a multi-turn call passes prior messages alongside the new query:

ChatAgent.call(
  query: "What's my name?",
  messages: [
    { role: :user, content: "My name is Alice" },
    { role: :assistant, content: "Nice to meet you, Alice!" }
  ]
)
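For streaming, a sketch assuming a block-based interface in the spirit of RubyLLM's chat API; the exact signature may differ:

# assumes a block-based streaming interface; the exact API may differ
buffer = +""
ChatAgent.call(query: "Summarize this thread") do |chunk|
  buffer << chunk # accumulate partial output
  print chunk     # or broadcast each chunk to a live UI via Turbo Streams
end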

PII protection and redaction

Sensitive data is automatically redacted from logs and dashboards. You can also define custom redaction patterns:

RubyLLM::Agents.configure do |config|
  config.redaction_patterns = {
    api_key: /sk-[a-zA-Z0-9]+/,
    internal_id: /INTERNAL-\d+/
  }
end

Testing agents

Agents can be tested in dry-run mode without calling external APIs:

result = SearchIntentAgent.call(query: "test", dry_run: true)

This allows validation of parameters, prompts, and schemas as part of your test suite.
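
That slots directly into a spec. A minimal RSpec sketch using the dry_run flag shown above; the assertion stays deliberately loose since the dry-run result shape isn't documented here:

# spec/agents/search_intent_agent_spec.rb
require "rails_helper"

RSpec.describe SearchIntentAgent do
  it "builds prompts and validates params without hitting the API" do
    expect {
      described_class.call(query: "red summer dress", dry_run: true)
    }.not_to raise_error
  end
end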


Generators

Rails generators speed up adoption:

rails generate ruby_llm_agents:agent ProductSearch query:required category
rails generate ruby_llm_agents:agent Chat::Support message:required user_id
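The first command scaffolds an agent class similar to the hand-written one above; an illustrative sketch of the kind of file it produces, not the generator's literal output:

# app/agents/product_search_agent.rb — illustrative, not literal output
class ProductSearchAgent < ApplicationAgent
  param :query, required: true
  param :category

  def user_prompt
    "Search query: #{query} (category: #{category})"
  end
end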

Why not call LLM APIs directly?

Direct API calls are fine for prototypes. Production systems require:

  • Structured execution logs
  • Predictable cost controls
  • Reliability patterns and graceful degradation
  • Tenant isolation for SaaS
  • Centralized observability

RubyLLM::Agents provides these capabilities out of the box for Rails applications.


Getting started

gem "ruby_llm-agents"

bundle install
rails generate ruby_llm_agents:install
rails db:migrate

Documentation and source are available from the ruby_llm-agents gem page on RubyGems and the project repository linked there.


Roadmap

  • v0.5.0 introduced tenant token limits and database-backed API configuration
  • v0.4.0 added the reliability DSL and enhanced execution tracking
  • Upcoming releases will expand workflow primitives and analytics

Building AI features that matter means treating LLMs like any other critical dependency. RubyLLM::Agents gives Rails developers the tooling needed to do exactly that.
