AgentQ

Monitoring and Observability for AI-Powered Rails Apps

You built the AI features. You tested them. You cached the hell out of them. You hardened your background jobs.

Now you need to know when things break — before your users tell you.

This is post #28 in the Ruby for AI series. We're covering logging, error tracking, and application performance monitoring (APM) for Rails apps with AI features. Because AI workloads are different — they're slow, expensive, and fail in weird ways.

The Problem with AI Observability

Traditional Rails monitoring tracks request times, database queries, and error rates. AI features add new dimensions:

  • API latency — OpenAI calls take 2-30 seconds. That's not a bug, that's Tuesday.
  • Token costs — A runaway loop can burn $50 in minutes.
  • Non-deterministic failures — The same input gives different outputs. Sometimes bad ones.
  • Rate limits — You hit them. You need to know when.

You need monitoring that understands these patterns.

Structured Logging with Lograge

First, replace Rails' verbose default logging with something parseable:

# Gemfile
gem "lograge"
gem "logstash-event"

# config/environments/production.rb
config.lograge.enabled = true
config.lograge.formatter = Lograge::Formatters::Json.new

config.lograge.custom_payload do |controller|
  {
    user_id: controller.current_user&.id,
    request_id: controller.request.request_id
  }
end
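With the JSON formatter and the custom payload above, each request collapses into one machine-parseable line. A sketch of the shape (every field value here is made up for illustration):

```ruby
require "json"

# Roughly what a lograge line looks like with the custom payload above.
# Values are illustrative, not captured from a real app.
line = {
  method: "POST",
  path: "/chats",
  status: 200,
  duration: 2340.5,
  user_id: 42,
  request_id: "c0ffee-1234"
}.to_json
puts line
```

One line per request means your log aggregator can filter on `user_id` or `request_id` without any regex gymnastics.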

Now add AI-specific logging. Create a concern you can include in any service:

# app/concerns/ai_loggable.rb
module AiLoggable
  extend ActiveSupport::Concern

  private

  def log_ai_request(provider:, model:, tokens_in:, tokens_out:, duration_ms:, cost_usd: nil)
    Rails.logger.info({
      event: "ai_request",
      provider: provider,
      model: model,
      tokens_in: tokens_in,
      tokens_out: tokens_out,
      duration_ms: duration_ms,
      cost_usd: cost_usd || estimate_cost(model, tokens_in, tokens_out)
    }.to_json)
  end

  def estimate_cost(model, tokens_in, tokens_out)
    rates = {
      "gpt-4o" => { input: 2.50 / 1_000_000, output: 10.0 / 1_000_000 },
      "gpt-4o-mini" => { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 }
    }
    rate = rates[model] || { input: 0.0, output: 0.0 }
    (tokens_in * rate[:input] + tokens_out * rate[:output]).round(6)
  end
end
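A quick sanity check on the cost arithmetic, using the same per-million-token rates as `estimate_cost` above:

```ruby
# Rates copied from the concern: dollars per single token.
RATES = {
  "gpt-4o"      => { input: 2.50 / 1_000_000, output: 10.0 / 1_000_000 },
  "gpt-4o-mini" => { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 }
}.freeze

def estimate_cost(model, tokens_in, tokens_out)
  rate = RATES.fetch(model, { input: 0.0, output: 0.0 })
  (tokens_in * rate[:input] + tokens_out * rate[:output]).round(6)
end

# 1,000 prompt tokens + 500 completion tokens on gpt-4o-mini:
estimate_cost("gpt-4o-mini", 1_000, 500)  # => 0.00045
# The same traffic on gpt-4o costs roughly 17x more:
estimate_cost("gpt-4o", 1_000, 500)       # => 0.0075
```

Note the fallback to a zero rate for unknown models: it keeps logging from crashing, but it also silently under-reports cost, so keep the rate table current.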

Use it in your AI services:

class ChatService
  include AiLoggable

  def call(prompt)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = client.chat(parameters: { model: "gpt-4o", messages: [{ role: "user", content: prompt }] })
    duration = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round

    usage = response["usage"]
    log_ai_request(
      provider: "openai",
      model: "gpt-4o",
      tokens_in: usage["prompt_tokens"],
      tokens_out: usage["completion_tokens"],
      duration_ms: duration
    )

    response
  end

  private

  # An OpenAI::Client from the ruby-openai gem, memoized per service instance.
  def client
    @client ||= OpenAI::Client.new
  end
end
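An aside on `Process.clock_gettime(Process::CLOCK_MONOTONIC)`: unlike `Time.now`, the monotonic clock never jumps backwards when NTP adjusts the wall clock, so durations measured this way are always non-negative. The timing idiom in isolation:

```ruby
# Measure elapsed milliseconds with the monotonic clock, as ChatService does.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
sleep 0.05 # stand-in for the API call
elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round
# elapsed_ms is ~50 here; a real OpenAI call lands anywhere in the 2_000-30_000 range
```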

Tracking Costs with a Database Table

Logs are great for debugging. For dashboards and alerts, persist the data:

rails generate model AiUsageLog provider:string model:string tokens_in:integer \
  tokens_out:integer duration_ms:integer cost_usd:decimal request_id:string \
  user:references status:string error_message:text
rails db:migrate

One caveat: recent Rails versions generate the `user` reference with `null: false`, and the association below is optional — relax that constraint in the generated migration before running it.

# app/models/ai_usage_log.rb
class AiUsageLog < ApplicationRecord
  belongs_to :user, optional: true

  scope :today, -> { where("created_at >= ?", Time.current.beginning_of_day) }
  scope :this_month, -> { where("created_at >= ?", Time.current.beginning_of_month) }
  scope :failed, -> { where(status: "error") }

  def self.total_cost(scope = all)
    scope.sum(:cost_usd)
  end

  def self.avg_latency(scope = all)
    scope.average(:duration_ms)&.round || 0
  end
end

Query your spend anytime:

AiUsageLog.this_month.total_cost          # => 47.23
AiUsageLog.today.failed.count             # => 3
AiUsageLog.today.avg_latency              # => 2340 (ms)
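A per-model breakdown is one grouping away — `AiUsageLog.this_month.group(:model).sum(:cost_usd)` in ActiveRecord — which is handy when deciding what to downgrade to a cheaper model. The same aggregation in plain Ruby, to show the shape of the result (rows are made up):

```ruby
# Stand-in rows for AiUsageLog records; values are illustrative.
rows = [
  { model: "gpt-4o",      cost_usd: 0.0125 },
  { model: "gpt-4o",      cost_usd: 0.0100 },
  { model: "gpt-4o-mini", cost_usd: 0.0004 }
]

cost_by_model = rows
  .group_by { |r| r[:model] }
  .transform_values { |rs| rs.sum { |r| r[:cost_usd] }.round(6) }
# => {"gpt-4o"=>0.0225, "gpt-4o-mini"=>0.0004}
```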

Error Tracking with Exception Notification

# Gemfile
gem "exception_notification"

# config/initializers/exception_notification.rb
Rails.application.config.middleware.use ExceptionNotification::Rack,
  email: {
    email_prefix: "[AI App ERROR] ",
    sender_address: %("App" <errors@yourapp.com>),
    exception_recipients: %w[you@yourapp.com]
  }

For AI-specific errors, create a custom handler:

# app/services/ai_error_handler.rb
class AiErrorHandler
  RETRIABLE_ERRORS = [Faraday::TimeoutError, Faraday::ConnectionFailed].freeze

  def self.handle(error, context: {})
    AiUsageLog.create!(
      status: "error",
      error_message: error.message,
      provider: context[:provider],
      model: context[:model],
      tokens_in: 0, tokens_out: 0,
      duration_ms: context[:duration_ms] || 0,
      cost_usd: 0
    )

    case error
    when *RETRIABLE_ERRORS
      Rails.logger.warn("AI retriable error: #{error.message}")
    else
      Rails.logger.error("AI fatal error: #{error.message}")
      ExceptionNotifier.notify_exception(error, data: context)
    end
  end
end
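Wiring the handler into a service call might look like the sketch below. The dependencies (`client`, `error_handler`) are injected purely so the example is self-contained; in the app you'd call `AiErrorHandler.handle` directly from a rescue block:

```ruby
# Sketch: route any failure from an AI call through the error handler,
# returning nil so callers can degrade gracefully.
def guarded_chat(client, error_handler, prompt)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  client.chat(parameters: { model: "gpt-4o", messages: [{ role: "user", content: prompt }] })
rescue StandardError => e
  duration = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round
  error_handler.handle(e, context: { provider: "openai", model: "gpt-4o", duration_ms: duration })
  nil # callers must be prepared for a nil response
end
```

Returning nil rather than re-raising is a choice: it fits user-facing paths where you show a fallback message. Inside a background job, re-raise retriable errors instead so your job framework's retry machinery kicks in.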

Health Check Endpoint

Every production app needs a health check. Make yours AI-aware:

# config/routes.rb
get "/health", to: "health#show"

# app/controllers/health_controller.rb
class HealthController < ApplicationController
  skip_before_action :authenticate_user!

  def show
    checks = {
      database: check_database,
      redis: check_redis,
      ai_budget: check_ai_budget
    }

    status = checks.values.all? ? :ok : :service_unavailable
    render json: { status: status, checks: checks }, status: status
  end

  private

  def check_database
    ActiveRecord::Base.connection.execute("SELECT 1")
    true
  rescue StandardError
    false
  end

  def check_redis
    # Redis.current was removed in redis-rb 5; open an explicit connection.
    Redis.new(url: ENV["REDIS_URL"]).ping == "PONG"
  rescue StandardError
    false
  end

  def check_ai_budget
    AiUsageLog.today.total_cost < ENV.fetch("DAILY_AI_BUDGET", 100).to_f
  end
end
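The aggregation rule is deliberately blunt: one failing check flips the whole endpoint to 503, which is exactly what load balancers and uptime monitors key on. In isolation:

```ruby
# One false check is enough to report service_unavailable (values illustrative).
checks = { database: true, redis: true, ai_budget: false }
status = checks.values.all? ? :ok : :service_unavailable
# => :service_unavailable
```

One caveat worth weighing: folding `ai_budget` into the same endpoint means an exhausted budget can pull the whole app out of rotation even though non-AI traffic is fine. If that's not what you want, split it into a separate check like `/health/ai`.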

Budget Alerts

The scariest thing about AI in production: runaway costs. Add a simple guard:

# app/services/budget_guard.rb
class BudgetGuard
  DAILY_LIMIT = ENV.fetch("DAILY_AI_BUDGET", 100).to_f

  def self.allow_request?
    current_spend = AiUsageLog.today.total_cost
    if current_spend >= DAILY_LIMIT
      Rails.logger.error("AI budget exceeded: $#{current_spend}/$#{DAILY_LIMIT}")
      false
    else
      true
    end
  end
end

Check it before every AI call:

def call(prompt)
  raise "Daily AI budget exceeded" unless BudgetGuard.allow_request?
  # ... proceed with API call
end
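A refinement worth considering (a sketch, not part of the code above): alert at 80% of the budget so someone gets paged before requests start failing outright.

```ruby
# Standalone thresholding logic; spend and limit are injected so it runs
# outside Rails. Names here are illustrative.
def budget_state(current_spend, daily_limit, warn_ratio: 0.8)
  return :blocked if current_spend >= daily_limit
  return :warning if current_spend >= daily_limit * warn_ratio
  :ok
end

budget_state(50.0, 100.0)   # => :ok
budget_state(85.0, 100.0)   # => :warning — time to alert, not block
budget_state(100.0, 100.0)  # => :blocked
```

On `:warning` you'd notify (Slack, email, ExceptionNotifier) but still allow the request; only `:blocked` raises.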

What You Have Now

After this post, your Rails AI app has:

  • Structured JSON logging with AI-specific fields
  • Cost tracking in the database with queryable scopes
  • Error handling that distinguishes retriable from fatal failures
  • Health checks that include AI budget status
  • Budget guards that prevent runaway spending

Next up: we leave the code behind and hit the server. Post #29 covers deploying your Rails AI app to a real VPS with Kamal and Docker.
