
Working with LLMs in Ruby on Rails: A Simple Guide

Why You Need to Work with LLMs Today

Large Language Models (LLMs) are reshaping how we build apps. Knowing how to use LLMs lets you create smart, interactive tools that understand and generate text. This skill is now key in modern development. Whether you build chatbots or text analyzers, LLMs can add value. So, let’s dive into how to run an LLM server locally and use it in a Ruby on Rails (RoR) project.

Running an LLM Locally with Docker

We will run a Llama 3 model with Ollama inside Docker. Llama 3 is a popular model for personal use, and Docker keeps the Ollama setup simple.

  • Install Docker: Install Docker Desktop (the official Docker client with a UI) or the plain Docker CLI.

  • Run the LLM Server: Use the following command to start the Ollama server, which exposes the HTTP API:
  docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  • Select the LLM model: Pull and run Llama 3 inside the container:
  docker exec -it ollama ollama run llama3
  • Test the server with:
  curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt":"Why is the sky blue? Answer with 10 words"}'

If the result looks something like this, then the server has started successfully:

{"model":"llama3","created_at":"2024-08-28T15:01:07.826076294Z","response":"Short","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.154276586Z","response":" wavelength","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.314917461Z","response":" blue","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.490800211Z","response":" light","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.661478628Z","response":" sc","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.83101417Z","response":"atters","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.002102128Z","response":" more","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.175030712Z","response":" in","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.34067667Z","response":" Earth","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.512882962Z","response":"'s","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.685311962Z","response":" atmosphere","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.87469392Z","response":".","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:10.089219045Z","response":"","done":true,"done_reason":"stop","context":[128006,882,128007,271,10445,374,279,13180,6437,30,22559,449,220,605,4339,128009,128006,78191,128007,271,12755,46406,6437,3177,1156,10385,810,304,9420,596,16975,13],"total_duration":12195522088,"load_duration":7132571086,"prompt_eval_count":21,"prompt_eval_duration":2754452000,"eval_count":13,"eval_duration":2263609000}

See the Ollama server API documentation for the full list of endpoints and options.
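
By default, the /api/generate endpoint streams its answer as the newline-delimited JSON chunks shown above. While experimenting, it is sometimes handy to get the whole answer in a single response instead; the API accepts a stream flag for that. Here is a minimal Ruby sketch, using plain net/http outside of Rails, assuming the server from the previous steps is listening on localhost:11434:

require 'net/http'
require 'json'

uri = URI("http://localhost:11434/api/generate")
request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
request.body = {
  model: "llama3",
  prompt: "Why is the sky blue? Answer with 10 words",
  stream: false # ask for a single JSON object instead of a stream of chunks
}.to_json

response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
puts JSON.parse(response.body)["response"]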

Building a Ruby on Rails App

Let’s create a simple RoR app that connects to our LLM server.

  • Create a New Ruby on Rails Project:
rails new llm-chat
  • Generate a Controller with actions:
rails g controller chat index create
  • Add Routes for Chat: In config/routes.rb, add:
root "chat#index"
post "/", to: "chat#create", controller: :chat
  • Add WebSocket Route: In config/routes.rb, add:
mount ActionCable.server => '/cable'
  • Generate a WebSocket Channel:
rails generate channel Chat
  • Update the Chat Channel: In app/channels/chat_channel.rb, update the code:
class ChatChannel < ApplicationCable::Channel
  def subscribed
    stream_from "chat_channel"
  end

  def unsubscribed
  end
end
  • Update the Controller: In app/controllers/chat_controller.rb, modify the create method:
class ChatController < ApplicationController
  def index; end

  def create
    LlmJob.perform_later("http://localhost:11434/api/generate", params[:chat][:query])

    head :ok
  end
end
  • Create LlmJob:
rails generate job Llm
      invoke  test_unit
      create    test/jobs/llm_job_test.rb
      create  app/jobs/llm_job.rb
  • LlmJob code: In app/jobs/llm_job.rb, call the LLM API and broadcast each streamed chunk over the channel:
require 'net/http'

class LlmJob < ApplicationJob
  queue_as :default

  def perform(api_endpoint, prompt)
    uri = URI(api_endpoint)
    req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
    req.body = { model: "llama3", prompt: prompt }.to_json

    Net::HTTP.start(uri.hostname, uri.port) do |http|
      http.request(req) do |response|
        # The API streams newline-delimited JSON; a chunk may carry one or more lines
        response.read_body do |chunk|
          chunk.each_line do |line|
            next if line.strip.empty?

            parsed_response = JSON.parse(line)
            ActionCable.server.broadcast(
              "chat_channel",
              { message: parsed_response['response'], done: parsed_response['done'] }
            )
          end
        end
      end
    end
  end
end
  • Frontend Chat Channel: In app/javascript/channels/chat_channel.js, subscribe to the channel and append each incoming chunk to the chat box (the view markup and form-submit wiring this code expects are sketched after these steps):
import { createConsumer } from "@rails/actioncable"

const consumer = createConsumer()

consumer.subscriptions.create("ChatChannel", {
  received(data) {
    document.getElementById("send-request").disabled = true;
    const chatBox = document.getElementById('chat-box');

    let botMessageElement = chatBox.querySelector('div[data-status="pending"]');

    if (!botMessageElement) {
      botMessageElement = document.createElement('div');
      botMessageElement.className = 'message bot';
      botMessageElement.setAttribute('data-status', 'pending');
      chatBox.appendChild(botMessageElement);
    }

    // Chunks already include their own leading spaces, so append them verbatim
    botMessageElement.textContent += data.message;

    if (data.done) {
      botMessageElement.setAttribute('data-status', 'done');
      document.getElementById("send-request").disabled = false;
    }

    chatBox.scrollTop = chatBox.scrollHeight;
  }
});
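
The controller reads params[:chat][:query], and the subscription above expects #chat-box and #send-request elements on the page, but the view itself is not shown in this guide. Here is a minimal sketch of what app/views/chat/index.html.erb could look like; the chat-form and chat-input ids are hypothetical names used only for this example:

<div id="chat-box"></div>

<form id="chat-form">
  <input type="text" id="chat-input" name="chat[query]" placeholder="Ask the model something...">
  <button type="submit" id="send-request">Send</button>
</form>

A small handler, for example in app/javascript/application.js, can then intercept the submit, echo the user's message, and POST the query as JSON so that chat#create receives params[:chat][:query]:

document.getElementById("chat-form").addEventListener("submit", (event) => {
  event.preventDefault();

  const input = document.getElementById("chat-input");
  const chatBox = document.getElementById("chat-box");

  // Echo the user's message into the chat box
  const userMessage = document.createElement("div");
  userMessage.className = "message user";
  userMessage.textContent = input.value;
  chatBox.appendChild(userMessage);

  // Rails verifies this token on non-GET requests
  const token = document.querySelector('meta[name="csrf-token"]').content;

  fetch("/", {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-CSRF-Token": token },
    body: JSON.stringify({ chat: { query: input.value } })
  });

  input.value = "";
});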


Conclusion

Now you have a basic RoR app that talks to a local LLM server. The server streams its responses in chunks, and the app displays them in real time. This setup is a simple but powerful way to integrate AI into your apps.

You can find the full code here: GitHub repo


