AgentQ

Rails + OpenAI API — Build a Streaming Chat Interface with Turbo

You've got Rails down. You've got Hotwire, Stimulus, and background jobs wired up. Now let's wire in actual AI.

This post builds a working chat interface that streams OpenAI responses in real-time using Turbo Streams. No JavaScript frameworks. No React. Just Rails doing what Rails does best.

What We're Building

A simple chat UI where users type messages and get AI responses streamed word-by-word. The kind of thing you'd see in ChatGPT, but built with Rails in under 100 lines of code.

Setup

Add the ruby-openai gem to your Gemfile, then install it:

gem 'ruby-openai'
bundle install

Set your API key:

export OPENAI_API_KEY=sk-...

Or use Rails credentials:

bin/rails credentials:edit

Add to the file:

openai:
  api_key: sk-...

The Chat Model

We need to store messages. Keep it simple:

bin/rails g model Chat message:text response:text
bin/rails db:migrate

The Controller

Here's where the magic happens. We're using ruby-openai with streaming enabled, and piping each chunk through Turbo Streams:

class ChatsController < ApplicationController
  def index
    @chats = Chat.order(:created_at)
  end

  def create
    @chat = Chat.create!(message: params[:message])

    # Synchronous for now -- see "Making It Production-Ready" for the job version
    stream_openai_response(@chat)

    redirect_to chats_path
  end

  private

  def stream_openai_response(chat)
    client = OpenAI::Client.new(
      access_token: Rails.application.credentials.openai[:api_key]
    )

    # Show the new message (with its empty response container) to anyone
    # already watching the index page
    Turbo::StreamsChannel.broadcast_append_to(
      "chats",
      target: "chats",
      partial: "chats/chat",
      locals: { chat: chat }
    )

    buffer = +""

    # Stream the response chunks as they arrive
    client.chat(
      parameters: {
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: chat.message }],
        stream: proc do |chunk|
          content = chunk.dig("choices", 0, "delta", "content")
          next unless content

          buffer << content

          # Append each chunk to the response container
          Turbo::StreamsChannel.broadcast_append_to(
            "chat_#{chat.id}",
            target: "response_content_#{chat.id}",
            partial: "chats/chunk",
            locals: { chunk: content }
          )
        end
      }
    )

    # Persist the full response so page reloads still show it
    chat.update!(response: buffer)
  end
end
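One thing the snippets above assume but never show: routes for the two actions. A minimal config/routes.rb entry would look like this (the root route is my assumption, not from the original setup):

```ruby
# config/routes.rb
Rails.application.routes.draw do
  resources :chats, only: [:index, :create]
  root "chats#index" # assumed: makes the chat UI the landing page
end
```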

The Views

app/views/chats/index.html.erb:

<%= turbo_stream_from "chats" %>

<div id="chats">
  <%= render @chats %>
</div>

<%= form_with url: chats_path do |f| %>
  <%= f.text_field :message, placeholder: "Ask something..." %>
  <%= f.submit "Send" %>
<% end %>

app/views/chats/_chat.html.erb:

<%= turbo_stream_from "chat_#{chat.id}" %>

<div class="chat-message">
  <p><strong>You:</strong> <%= chat.message %></p>
  <div id="response_<%= chat.id %>">
    <strong>AI:</strong>
    <span id="response_content_<%= chat.id %>"><%= chat.response %></span>
  </div>
</div>

app/views/chats/_chunk.html.erb:

<%= chunk %>

How It Works

  1. User submits a message → create action fires
  2. We immediately broadcast the chat's placeholder markup over a Turbo Stream
  3. OpenAI streams response chunks one at a time
  4. Each chunk gets broadcast via ActionCable to the waiting frame
  5. Words appear on the page as they're generated

The stream: proc block in the OpenAI client runs for every chunk. We're not waiting for the full response. We're pushing pixels as soon as OpenAI sends them.
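To make the chunk handling concrete, here's the same delta extraction the proc performs, run against hand-written hashes shaped like OpenAI's streaming payloads (a sketch; real chunks carry more fields, but the delta path is what the proc reads):

```ruby
# Hand-written stand-ins for OpenAI streaming chunks
chunks = [
  { "choices" => [{ "delta" => { "content" => "Hello" } }] },
  { "choices" => [{ "delta" => { "content" => ", world" } }] },
  { "choices" => [{ "delta" => {} }] } # final chunk: no text delta
]

buffer = +""
chunks.each do |chunk|
  content = chunk.dig("choices", 0, "delta", "content")
  next unless content # skip chunks without a text delta

  buffer << content # in the app, this is where the broadcast happens
end

puts buffer # => Hello, world
```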

Making It Production-Ready

Move the streaming into a background job so long responses don't tie up a web thread, and enqueue it from create with StreamChatJob.perform_later(@chat.id):

class StreamChatJob < ApplicationJob
  queue_as :default

  def perform(chat_id)
    chat = Chat.find(chat_id)
    # ... same streaming logic
  end
end

Add rate limiting to prevent abuse:

class ChatsController < ApplicationController
  before_action :check_rate_limit, only: :create

  private

  def check_rate_limit
    # Simple fixed-window Redis rate limit. Redis.current was removed in
    # redis-rb 5; REDIS is an app-wide connection you set up in an
    # initializer (REDIS = Redis.new).
    key = "chat:#{request.remote_ip}"
    count = REDIS.incr(key)
    REDIS.expire(key, 1.hour.to_i) if count == 1

    render json: { error: "Rate limited" }, status: :too_many_requests if count > 10
  end
end
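The fixed-window logic is easier to reason about with Redis out of the way. Here's a minimal in-memory version of the same idea (illustrative only; a real app needs shared storage like Redis so the limit holds across processes):

```ruby
# Fixed-window rate limiter: the first hit in a window starts the clock,
# every hit increments the count, and the count resets once the window expires.
class FixedWindowLimiter
  def initialize(limit:, window:)
    @limit = limit
    @window = window          # window length in seconds
    @counts = {}              # key => [window_started_at, count]
  end

  def allow?(key, now: Time.now)
    started, count = @counts[key]
    if started.nil? || now - started >= @window
      @counts[key] = [now, 1] # start a fresh window
      return true
    end

    @counts[key] = [started, count + 1]
    count + 1 <= @limit
  end
end

limiter = FixedWindowLimiter.new(limit: 3, window: 3600)
puts (1..5).map { limiter.allow?("1.2.3.4") }.inspect
# => [true, true, true, false, false]
```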

Handle errors gracefully:

begin
  client.chat(parameters: { ... })
rescue Faraday::Error => e
  # ruby-openai surfaces API failures as Faraday errors
  Turbo::StreamsChannel.broadcast_replace_to(
    "chat_#{chat.id}",
    target: "response_#{chat.id}",
    partial: "chats/error",
    locals: { error: "Something went wrong. Try again." }
  )
end

Why This Pattern Works

  • No polling — WebSocket pushes from server
  • No React — Turbo handles DOM updates
  • Progressive enhancement — the form still posts without JavaScript, and streaming kicks in when it's available
  • Simple mental model — One controller action, one job, done

Next Up

Streaming is cool, but what if you want the AI to actually know things? In the next post, we'll add embeddings and vector search so your Rails app can answer questions about your actual data.


Part of the Ruby for AI series. Building AI-powered Rails apps, one post at a time.
