AgentQ

Rails + OpenAI API — Build a Streaming Chat Interface with Turbo

You've got Rails down. You've got Hotwire, Stimulus, and background jobs wired up. Now let's wire in actual AI.

This post builds a working chat interface that streams OpenAI responses in real-time using Turbo Streams. No JavaScript frameworks. No React. Just Rails doing what Rails does best.

What We're Building

A simple chat UI where users type messages and get AI responses streamed word-by-word. The kind of thing you'd see in ChatGPT, but built with Rails in under 100 lines of code.

Setup

Add the ruby-openai gem to your Gemfile, then install it:

gem 'ruby-openai'
bundle install

Set your API key:

export OPENAI_API_KEY=sk-...

Or use Rails credentials:

bin/rails credentials:edit

Add to the file:

openai:
  api_key: sk-...

The Chat Model

We need to store messages. Keep it simple:

bin/rails g model Chat message:text response:text
bin/rails db:migrate

The Controller

Here's where the magic happens. We're using ruby-openai with streaming enabled, and piping each chunk through Turbo Streams:

class ChatsController < ApplicationController
  def index
    @chats = Chat.order(:created_at)
  end

  def create
    @chat = Chat.create!(message: params[:message])

    # Synchronous for now -- see "Making It Production-Ready" for the job version
    stream_openai_response(@chat)

    redirect_to chats_path
  end

  private

  def stream_openai_response(chat)
    client = OpenAI::Client.new(
      access_token: Rails.application.credentials.openai[:api_key]
    )

    # Show the new message (with its empty response container) to anyone
    # already watching the index page
    Turbo::StreamsChannel.broadcast_append_to(
      "chats",
      target: "chats",
      partial: "chats/chat",
      locals: { chat: chat }
    )

    buffer = +""

    # Stream the response chunks as they arrive
    client.chat(
      parameters: {
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: chat.message }],
        stream: proc do |chunk|
          content = chunk.dig("choices", 0, "delta", "content")
          next unless content

          buffer << content

          # Append each chunk to the response container
          Turbo::StreamsChannel.broadcast_append_to(
            "chat_#{chat.id}",
            target: "response_content_#{chat.id}",
            partial: "chats/chunk",
            locals: { chunk: content }
          )
        end
      }
    )

    # Persist the full response so page reloads still show it
    chat.update!(response: buffer)
  end
end
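One thing the snippets above assume but never show: routes for the two actions. A minimal config/routes.rb entry would look like this (the root route is my assumption, not from the original setup):

```ruby
# config/routes.rb
Rails.application.routes.draw do
  resources :chats, only: [:index, :create]
  root "chats#index" # assumed: makes the chat UI the landing page
end
```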

The Views

app/views/chats/index.html.erb:

<%= turbo_stream_from "chats" %>

<div id="chats">
  <%= render @chats %>
</div>

<%= form_with url: chats_path do |f| %>
  <%= f.text_field :message, placeholder: "Ask something..." %>
  <%= f.submit "Send" %>
<% end %>

app/views/chats/_chat.html.erb:

<%= turbo_stream_from "chat_#{chat.id}" %>

<div class="chat-message">
  <p><strong>You:</strong> <%= chat.message %></p>
  <div id="response_<%= chat.id %>">
    <strong>AI:</strong>
    <span id="response_content_<%= chat.id %>"><%= chat.response %></span>
  </div>
</div>

app/views/chats/_chunk.html.erb:

<%= chunk %>

How It Works

  1. User submits a message → create action fires
  2. We immediately broadcast the chat's placeholder markup over a Turbo Stream
  3. OpenAI streams response chunks one at a time
  4. Each chunk gets broadcast via ActionCable to the waiting frame
  5. Words appear on the page as they're generated

The stream: proc block in the OpenAI client runs for every chunk. We're not waiting for the full response. We're pushing pixels as soon as OpenAI sends them.
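To make the chunk handling concrete, here's the same delta extraction the proc performs, run against hand-written hashes shaped like OpenAI's streaming payloads (a sketch; real chunks carry more fields, but the delta path is what the proc reads):

```ruby
# Hand-written stand-ins for OpenAI streaming chunks
chunks = [
  { "choices" => [{ "delta" => { "content" => "Hello" } }] },
  { "choices" => [{ "delta" => { "content" => ", world" } }] },
  { "choices" => [{ "delta" => {} }] } # final chunk: no text delta
]

buffer = +""
chunks.each do |chunk|
  content = chunk.dig("choices", 0, "delta", "content")
  next unless content # skip chunks without a text delta

  buffer << content # in the app, this is where the broadcast happens
end

puts buffer # => Hello, world
```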

Making It Production-Ready

Move the streaming into a background job so long responses don't tie up a web thread, and enqueue it from create with StreamChatJob.perform_later(@chat.id):

class StreamChatJob < ApplicationJob
  queue_as :default

  def perform(chat_id)
    chat = Chat.find(chat_id)
    # ... same streaming logic
  end
end

Add rate limiting to prevent abuse:

class ChatsController < ApplicationController
  before_action :check_rate_limit, only: :create

  private

  def check_rate_limit
    # Simple fixed-window Redis rate limit. Redis.current was removed in
    # redis-rb 5; REDIS is an app-wide connection you set up in an
    # initializer (REDIS = Redis.new).
    key = "chat:#{request.remote_ip}"
    count = REDIS.incr(key)
    REDIS.expire(key, 1.hour.to_i) if count == 1

    render json: { error: "Rate limited" }, status: :too_many_requests if count > 10
  end
end
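The fixed-window logic is easier to reason about with Redis out of the way. Here's a minimal in-memory version of the same idea (illustrative only; a real app needs shared storage like Redis so the limit holds across processes):

```ruby
# Fixed-window rate limiter: the first hit in a window starts the clock,
# every hit increments the count, and the count resets once the window expires.
class FixedWindowLimiter
  def initialize(limit:, window:)
    @limit = limit
    @window = window          # window length in seconds
    @counts = {}              # key => [window_started_at, count]
  end

  def allow?(key, now: Time.now)
    started, count = @counts[key]
    if started.nil? || now - started >= @window
      @counts[key] = [now, 1] # start a fresh window
      return true
    end

    @counts[key] = [started, count + 1]
    count + 1 <= @limit
  end
end

limiter = FixedWindowLimiter.new(limit: 3, window: 3600)
puts (1..5).map { limiter.allow?("1.2.3.4") }.inspect
# => [true, true, true, false, false]
```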

Handle errors gracefully:

begin
  client.chat(parameters: { ... })
rescue Faraday::Error => e
  # ruby-openai surfaces API failures as Faraday errors
  Turbo::StreamsChannel.broadcast_replace_to(
    "chat_#{chat.id}",
    target: "response_#{chat.id}",
    partial: "chats/error",
    locals: { error: "Something went wrong. Try again." }
  )
end

Why This Pattern Works

  • No polling — WebSocket pushes from server
  • No React — Turbo handles DOM updates
  • Progressive enhancement — the form still posts without JavaScript, and streaming kicks in when it's available
  • Simple mental model — One controller action, one job, done

Next Up

Streaming is cool, but what if you want the AI to actually know things? In the next post, we'll add embeddings and vector search so your Rails app can answer questions about your actual data.


Part of the Ruby for AI series. Building AI-powered Rails apps, one post at a time.
