Welcome back to the Ruby for AI series. You've built AI features — chat interfaces, RAG pipelines, image generation, voice transcription. Now comes the part most tutorials skip: how do you test code that returns different results every time you call it?
AI outputs are non-deterministic. Ask the same question twice, get two different answers. Traditional "assert equals" testing breaks down. But that doesn't mean you skip tests. It means you test smarter.
Let's set up a proper testing strategy for AI-powered Rails apps.
Setting Up RSpec
If you haven't already:
bundle add rspec-rails --group "development, test"
bundle add webmock --group test
bundle add vcr --group test
rails generate rspec:install
Add to spec/rails_helper.rb:
require 'webmock/rspec'
require 'vcr'
VCR.configure do |config|
  config.cassette_library_dir = 'spec/cassettes'
  config.hook_into :webmock
  config.filter_sensitive_data('<OPENAI_KEY>') { ENV['OPENAI_API_KEY'] }
  config.default_cassette_options = { record: :once }
end
Strategy 1: Record Real API Calls with VCR
VCR records HTTP interactions and replays them. One real API call, deterministic tests forever after:
# spec/services/chat_service_spec.rb
RSpec.describe ChatService do
  describe '#ask' do
    it 'returns a response about Ruby' do
      VCR.use_cassette('chat_about_ruby') do
        service = ChatService.new
        response = service.ask("What is Ruby's main strength?")

        expect(response).to be_a(String)
        expect(response.length).to be > 20
        expect(response.downcase).to match(/ruby|programming|language/)
      end
    end
  end
end
The first run hits the real API and saves the response. Every run after that uses the recording. Deterministic, fast, free.
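Cassettes go stale when a prompt or model changes. Delete the YAML file under spec/cassettes to re-record, or override the recording mode for a single spec (these are VCR's built-in recording modes; the cassette name reuses the example above):

```ruby
# Force a fresh recording on the next run, ignoring any existing cassette.
VCR.use_cassette('chat_about_ruby', record: :all) do
  # ... same expectations as before ...
end

# Recording modes:
#   :once         - record if no cassette exists, then always replay (the config default above)
#   :new_episodes - replay known requests, record any new ones
#   :none         - never record; unrecorded requests raise an error
```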
Strategy 2: Stub API Responses
For unit tests, skip the API entirely:
# spec/services/embedding_service_spec.rb
RSpec.describe EmbeddingService do
  let(:fake_embedding) { Array.new(1536) { rand(-1.0..1.0) } }

  before do
    stub_request(:post, 'https://api.openai.com/v1/embeddings')
      .to_return(
        status: 200,
        body: {
          data: [{ embedding: fake_embedding, index: 0 }],
          usage: { prompt_tokens: 8, total_tokens: 8 }
        }.to_json,
        headers: { 'Content-Type' => 'application/json' }
      )
  end

  it 'returns a 1536-dimension vector' do
    result = EmbeddingService.new.embed("test input")

    expect(result.length).to eq(1536)
    expect(result).to all(be_a(Float))
  end
end
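One caveat with rand: if a spec fails in a way that depends on the random vector, the failure is hard to reproduce. Seeding the generator keeps the embedding fake but identical on every run (a small sketch; the helper name is made up):

```ruby
# Deterministic fake embedding: same seed in, same 1536 floats out, every run.
def deterministic_embedding(seed: 42, dimensions: 1536)
  rng = Random.new(seed)
  Array.new(dimensions) { rng.rand(-1.0..1.0) }
end
```

Drop it in spec/support and use `let(:fake_embedding) { deterministic_embedding }` in place of the rand version.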
Strategy 3: Test the Shape, Not the Content
AI responses vary in wording but follow patterns. Test structure:
# spec/services/ai_agent_spec.rb
RSpec.describe AiAgent do
  describe '#execute' do
    it 'returns a properly structured response' do
      VCR.use_cassette('agent_tool_call') do
        result = AiAgent.new.execute("Look up the weather in Tokyo")

        expect(result).to include(:answer, :tools_used, :token_count)
        expect(result[:tools_used]).to be_an(Array)
        expect(result[:token_count]).to be_positive
      end
    end
  end
end
For JSON responses from AI, validate the schema:
it 'returns valid JSON with required fields' do
  VCR.use_cassette('structured_output') do
    result = service.analyze("Summarize this document")
    parsed = JSON.parse(result)

    expect(parsed).to have_key('summary')
    expect(parsed).to have_key('key_points')
    expect(parsed['key_points']).to be_an(Array)
  end
end
Strategy 4: Boundary and Error Tests
Test what happens when the API fails — this is where most AI apps break in production:
describe 'error handling' do
  it 'handles rate limiting gracefully' do
    stub_request(:post, 'https://api.openai.com/v1/chat/completions')
      .to_return(
        status: 429,
        body: { error: { message: 'Rate limit exceeded' } }.to_json,
        headers: { 'Content-Type' => 'application/json' }
      )

    expect { service.ask("anything") }.not_to raise_error
    expect(service.ask("anything")).to eq("Service temporarily unavailable. Please try again.")
  end

  it 'handles timeout' do
    stub_request(:post, 'https://api.openai.com/v1/chat/completions')
      .to_timeout

    result = service.ask("anything")
    expect(result).to include("unavailable")
  end

  it 'handles malformed API response' do
    stub_request(:post, 'https://api.openai.com/v1/chat/completions')
      .to_return(status: 200, body: 'not json at all')

    expect { service.ask("anything") }.not_to raise_error
  end
end
Strategy 5: Contract Tests for Prompts
When you change a prompt, you want to know if it still produces the right kind of output:
# spec/prompts/summarizer_prompt_spec.rb
RSpec.describe 'Summarizer prompt' do
  # Stand-in long input; in a real suite this would likely load a fixture file
  let(:long_document) { "Ruby favors developer happiness and readable code. " * 100 }

  it 'produces output under 200 words' do
    VCR.use_cassette('summarizer_long_doc') do
      result = Summarizer.new.summarize(long_document)
      word_count = result.split.size

      expect(word_count).to be <= 200
    end
  end

  it 'preserves key entities from input' do
    VCR.use_cassette('summarizer_entity_check') do
      input = "The Ruby programming language was created by Yukihiro Matsumoto in 1995."
      result = Summarizer.new.summarize(input)

      expect(result.downcase).to include('ruby')
      expect(result.downcase).to include('matsumoto')
    end
  end
end
Running Your Test Suite
# Run all specs
bundle exec rspec
# Run only AI-related specs
bundle exec rspec spec/services/
# Run with documentation format
bundle exec rspec --format documentation
The Testing Pyramid for AI Apps
Here's how I think about it:
- Unit tests (stubbed): 70% — Fast, test your code logic, error handling, data transformations
- Integration tests (VCR): 25% — Record real API calls once, replay forever
- Live API tests (tagged): 5% — Run occasionally against real API to catch drift
Tag your live tests so they don't run in CI:
it 'gets a real response', :live_api do
  result = service.ask("Say hello")
  expect(result).to be_present
end
# Skip live API tests in CI
bundle exec rspec --tag ~live_api
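To make the exclusion the default instead of a flag you have to remember, spec_helper.rb can filter the tag out unless an environment variable opts in (the LIVE_API variable name is my own convention):

```ruby
RSpec.configure do |config|
  # :live_api examples only run when explicitly requested:
  #   LIVE_API=1 bundle exec rspec
  config.filter_run_excluding :live_api unless ENV['LIVE_API']
end
```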
What's Next
Your AI features are tested. Next up: making them fast. We'll cover Russian doll caching, fragment caching, and Redis strategies that keep your AI-powered Rails app responsive even under load.