DEV Community

AgentQ

Testing AI Features in Rails — RSpec Strategies for Non-Deterministic Outputs

Welcome back to the Ruby for AI series. You've built AI features — chat interfaces, RAG pipelines, image generation, voice transcription. Now comes the part most tutorials skip: how do you test code that returns different results every time you call it?

AI outputs are non-deterministic: ask the same question twice and you can get two different answers. Traditional "assert equals" testing breaks down. That doesn't mean you skip tests; it means you test smarter.

Let's set up a proper testing strategy for AI-powered Rails apps.

Setting Up RSpec

If you haven't already:

bundle add rspec-rails --group "development, test"
bundle add webmock --group test
bundle add vcr --group test
rails generate rspec:install

Add to spec/rails_helper.rb:

require 'webmock/rspec'
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'spec/cassettes'
  config.hook_into :webmock
  config.filter_sensitive_data('<OPENAI_KEY>') { ENV['OPENAI_API_KEY'] }
  config.default_cassette_options = { record: :once }
end

Strategy 1: Record Real API Calls with VCR

VCR records HTTP interactions and replays them. One real API call, deterministic tests forever after:

# spec/services/chat_service_spec.rb
RSpec.describe ChatService do
  describe '#ask' do
    it 'returns a response about Ruby' do
      VCR.use_cassette('chat_about_ruby') do
        service = ChatService.new
        response = service.ask("What is Ruby's main strength?")

        expect(response).to be_a(String)
        expect(response.length).to be > 20
        expect(response.downcase).to match(/ruby|programming|language/)
      end
    end
  end
end

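The first run hits the real API and saves the response to spec/cassettes/chat_about_ruby.yml. Every run after that replays the recording, so the test is deterministic, fast, and free. Delete the cassette file when you want to re-record, for example after changing the prompt or the model.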
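
To make the record-once idea concrete, here is a toy sketch in plain Ruby. It is not how VCR is implemented (VCR intercepts HTTP requests and stores full request/response pairs); `TinyCassette` and `fake_api` are hypothetical names used only to illustrate "record on the first run, replay afterwards".

```ruby
require "yaml"
require "tmpdir"

# Toy record/replay helper (hypothetical; VCR itself works at the HTTP layer).
class TinyCassette
  def initialize(path)
    @path = path
  end

  # First call: run the block and save its result. Later calls: load the file.
  def use
    return YAML.safe_load(File.read(@path)) if File.exist?(@path)

    result = yield
    File.write(@path, YAML.dump(result))
    result
  end
end

calls = 0
# Stand-in for a real API call; the answer changes on every invocation.
fake_api = -> { calls += 1; "Ruby favors developer happiness (call #{calls})" }

Dir.mktmpdir do |dir|
  cassette = TinyCassette.new(File.join(dir, "chat_about_ruby.yml"))
  first  = cassette.use { fake_api.call }
  second = cassette.use { fake_api.call }
  puts first == second # true: the second call was replayed from disk
  puts calls           # 1: the "API" was hit only once
end
```

The same trade-off applies to real cassettes: the recording is a snapshot, so it stays stable even if the live API starts answering differently.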

Strategy 2: Stub API Responses

For unit tests, skip the API entirely:

# spec/services/embedding_service_spec.rb
RSpec.describe EmbeddingService do
  let(:fake_embedding) { Array.new(1536) { rand(-1.0..1.0) } }

  before do
    stub_request(:post, 'https://api.openai.com/v1/embeddings')
      .to_return(
        status: 200,
        body: {
          data: [{ embedding: fake_embedding, index: 0 }],
          usage: { prompt_tokens: 8, total_tokens: 8 }
        }.to_json,
        headers: { 'Content-Type' => 'application/json' }
      )
  end

  it 'returns a 1536-dimension vector' do
    result = EmbeddingService.new.embed("test input")
    expect(result.length).to eq(1536)
    expect(result).to all(be_a(Float))
  end
end
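
One detail worth noting: the `fake_embedding` above is built with `rand`, so its values differ from run to run. That is harmless for the shape assertions shown here, but if a spec ever compares vectors or similarity scores, a seeded generator keeps the fake data reproducible. A minimal sketch, assuming a hypothetical helper named `seeded_embedding`:

```ruby
# Hypothetical helper: a seeded generator yields the same fake vector on every run.
def seeded_embedding(dimensions = 1536, seed: 42)
  rng = Random.new(seed)
  Array.new(dimensions) { rng.rand(-1.0..1.0) }
end

vector = seeded_embedding
puts vector.length              # 1536
puts vector == seeded_embedding # true: same seed, same values
```

It could back the `let` in the spec above, for example `let(:fake_embedding) { seeded_embedding }`.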

Strategy 3: Test the Shape, Not the Content

AI responses vary in wording but follow patterns. Test structure:

# spec/services/ai_agent_spec.rb
RSpec.describe AiAgent do
  describe '#execute' do
    it 'returns a properly structured response' do
      VCR.use_cassette('agent_tool_call') do
        result = AiAgent.new.execute("Look up the weather in Tokyo")

        expect(result).to include(:answer, :tools_used, :token_count)
        expect(result[:tools_used]).to be_an(Array)
        expect(result[:token_count]).to be_positive
      end
    end
  end
end

For JSON responses from AI, validate the schema:

it 'returns valid JSON with required fields' do
  VCR.use_cassette('structured_output') do
    result = service.analyze("Summarize this document")
    parsed = JSON.parse(result)

    expect(parsed).to have_key('summary')
    expect(parsed).to have_key('key_points')
    expect(parsed['key_points']).to be_an(Array)
  end
end
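
If several specs check the same structure, the key and type checks can be pulled into a small helper. This is only a sketch: `SUMMARY_SHAPE` and `shape_errors` are hypothetical names, and the shape mirrors the `summary` and `key_points` fields from the spec above.

```ruby
require "json"

# Hypothetical shape description: required key => expected Ruby class.
SUMMARY_SHAPE = { "summary" => String, "key_points" => Array }.freeze

# Returns a list of human-readable problems; an empty list means the shape is valid.
def shape_errors(json_string, shape = SUMMARY_SHAPE)
  parsed = JSON.parse(json_string)
  return ["top-level value is not a JSON object"] unless parsed.is_a?(Hash)

  shape.filter_map do |key, type|
    if !parsed.key?(key)
      "missing key: #{key}"
    elsif !parsed[key].is_a?(type)
      "#{key} should be #{type}, got #{parsed[key].class}"
    end
  end
rescue JSON::ParserError
  ["response is not valid JSON"]
end

puts shape_errors('{"summary": "Short.", "key_points": ["a", "b"]}').inspect
# []
puts shape_errors('{"summary": "Short.", "key_points": "a, b"}').inspect
# ["key_points should be Array, got String"]
puts shape_errors("Sure! Here is your summary...").inspect
# ["response is not valid JSON"]
```

In a spec this reads as `expect(shape_errors(result)).to be_empty`, and a failure message lists exactly which fields were off.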

Strategy 4: Boundary and Error Tests

Test what happens when the API fails, since this is where many AI apps break in production:

describe 'error handling' do
  it 'handles rate limiting gracefully' do
    stub_request(:post, 'https://api.openai.com/v1/chat/completions')
      .to_return(status: 429, body: { error: { message: 'Rate limit exceeded' } }.to_json)

    expect { service.ask("anything") }.not_to raise_error
    expect(service.ask("anything")).to eq("Service temporarily unavailable. Please try again.")
  end

  it 'handles timeout' do
    stub_request(:post, 'https://api.openai.com/v1/chat/completions')
      .to_timeout

    result = service.ask("anything")
    expect(result).to include("unavailable")
  end

  it 'handles malformed API response' do
    stub_request(:post, 'https://api.openai.com/v1/chat/completions')
      .to_return(status: 200, body: 'not json at all')

    expect { service.ask("anything") }.not_to raise_error
  end
end
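
For reference, here is one way a service could satisfy the specs above. It is a hypothetical sketch, not the ChatService from earlier in the series: the injectable `http_post` callable, the `FALLBACK` constant, and the placeholder model name are all assumptions made for illustration.

```ruby
require "json"
require "net/http"
require "timeout"
require "uri"

# Hypothetical sketch, NOT the series' actual ChatService.
class ChatService
  FALLBACK = "Service temporarily unavailable. Please try again."
  ENDPOINT = URI("https://api.openai.com/v1/chat/completions")
  MODEL = "gpt-4o-mini" # placeholder model name (assumption)

  # http_post: callable taking a question and returning [status_code, body_string].
  def initialize(http_post: nil)
    @http_post = http_post || method(:post_to_api)
  end

  def ask(question)
    status, body = @http_post.call(question)
    return FALLBACK unless status == 200 # covers 429 and other API errors

    data = JSON.parse(body)
    content = data.dig("choices", 0, "message", "content") if data.is_a?(Hash)
    content.is_a?(String) ? content : FALLBACK
  rescue JSON::ParserError, TypeError, Timeout::Error, SocketError, SystemCallError
    # Malformed bodies, timeouts, and connection failures all degrade to the fallback.
    FALLBACK
  end

  private

  def post_to_api(question)
    request = Net::HTTP::Post.new(ENDPOINT)
    request["Content-Type"] = "application/json"
    request["Authorization"] = "Bearer #{ENV['OPENAI_API_KEY']}"
    request.body = { model: MODEL, messages: [{ role: "user", content: question }] }.to_json

    response = Net::HTTP.start(ENDPOINT.host, ENDPOINT.port, use_ssl: true) do |http|
      http.request(request)
    end
    [response.code.to_i, response.body]
  end
end

rate_limited = ChatService.new(http_post: ->(_question) { [429, "{}"] })
puts rate_limited.ask("anything") # Service temporarily unavailable. Please try again.
```

The design choice here is to funnel every failure mode into one user-facing message, which is what lets the three error specs share the same expectation.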

Strategy 5: Contract Tests for Prompts

When you change a prompt, you want to know if it still produces the right kind of output:

# spec/prompts/summarizer_prompt_spec.rb
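RSpec.describe 'Summarizer prompt' do
  # Stand-in input; swap in a real long document from your fixtures
  let(:long_document) { "Ruby is a dynamic, open source programming language. " * 300 }

  it 'produces output of 200 words or fewer' do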
    VCR.use_cassette('summarizer_long_doc') do
      result = Summarizer.new.summarize(long_document)
      word_count = result.split.size

      expect(word_count).to be <= 200
    end
  end

  it 'preserves key entities from input' do
    VCR.use_cassette('summarizer_entity_check') do
      input = "The Ruby programming language was created by Yukihiro Matsumoto in 1995."
      result = Summarizer.new.summarize(input)

      expect(result.downcase).to include('ruby')
      expect(result.downcase).to include('matsumoto')
    end
  end
end
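
When several prompts share the same kind of rules, the contract can live in one small object instead of being repeated per spec. A minimal sketch, assuming a hypothetical `SummaryContract` with a word limit and a list of terms that must survive summarization:

```ruby
# Hypothetical contract object: the name, fields, and limits are illustrative.
SummaryContract = Struct.new(:max_words, :required_terms, keyword_init: true) do
  # Returns an array of violations; an empty array means the output honors the contract.
  def violations(text)
    problems = []
    words = text.split.size
    problems << "too long: #{words} words (max #{max_words})" if words > max_words

    missing = required_terms.reject { |term| text.downcase.include?(term.downcase) }
    problems << "missing terms: #{missing.join(', ')}" unless missing.empty?
    problems
  end
end

contract = SummaryContract.new(max_words: 200, required_terms: %w[ruby matsumoto])
puts contract.violations("Ruby was created by Yukihiro Matsumoto in 1995.").inspect
# []
puts contract.violations("A language appeared in 1995.").inspect
# ["missing terms: ruby, matsumoto"]
```

Inside a spec, `expect(contract.violations(result)).to be_empty` then reports every broken rule at once after a prompt change.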

Running Your Test Suite

# Run all specs
bundle exec rspec

# Run only the AI-related specs (services and prompts)
bundle exec rspec spec/services/ spec/prompts/

# Run with documentation format
bundle exec rspec --format documentation

The Testing Pyramid for AI Apps

Here's how I think about it:

  • Unit tests (stubbed): 70% — Fast, test your code logic, error handling, data transformations
  • Integration tests (VCR): 25% — Record real API calls once, replay forever
  • Live API tests (tagged): 5% — Run occasionally against real API to catch drift

Tag your live tests so they don't run in CI:

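it 'gets a real response', :live_api do
  # WebMock and VCR block real HTTP by default, so lift both for this example
  WebMock.allow_net_connect!
  result = VCR.turned_off { service.ask("Say hello") }
  expect(result).to be_present
ensure
  WebMock.disable_net_connect!
end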

Then exclude the tag when running in CI:

# Skip live API tests in CI (quote the tag so the shell does not try to expand ~)
bundle exec rspec --tag '~live_api'

What's Next

Your AI features are tested. Next up: making them fast. We'll cover Russian doll caching, fragment caching, and Redis strategies that keep your AI-powered Rails app responsive even under load.
