<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carmine Paolino</title>
    <description>The latest articles on DEV Community by Carmine Paolino (@crmne).</description>
    <link>https://dev.to/crmne</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1861137%2Ff47aebaa-8631-4352-a600-ce151703cc9f.jpg</url>
      <title>DEV Community: Carmine Paolino</title>
      <link>https://dev.to/crmne</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/crmne"/>
    <language>en</language>
    <item>
      <title>Async Ruby is the Future of AI Apps (And It's Already Here)</title>
      <dc:creator>Carmine Paolino</dc:creator>
      <pubDate>Wed, 09 Jul 2025 12:11:37 +0000</pubDate>
      <link>https://dev.to/crmne/async-ruby-is-the-future-of-ai-apps-and-its-already-here-5g28</link>
      <guid>https://dev.to/crmne/async-ruby-is-the-future-of-ai-apps-and-its-already-here-5g28</guid>
      <description>&lt;p&gt;After a decade as an ML engineer/scientist immersed in Python's async ecosystem, returning to Ruby felt like stepping back in time. Where was the async revolution? Why was everyone still using threads for everything? SolidQueue, Sidekiq, GoodJob -- all thread-based. Even newer solutions defaulted to the same concurrency model.&lt;/p&gt;

&lt;p&gt;Coming from Python, where the entire community had reorganized around &lt;code&gt;asyncio&lt;/code&gt;, this seemed bizarre. FastAPI replaced Flask. Every library spawned an async twin. The transformation was total and necessary.&lt;/p&gt;

&lt;p&gt;Then, building &lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; and &lt;a href="https://chatwithwork.com" rel="noopener noreferrer"&gt;Chat with Work&lt;/a&gt;, I noticed that &lt;em&gt;LLM communication is async Ruby's killer app&lt;/em&gt;. The unique demands of streaming AI responses -- long-lived connections, token-by-token delivery, thousands of concurrent conversations -- expose exactly why async matters.&lt;/p&gt;

&lt;p&gt;Here's the exciting bit: once I understood Ruby's approach to async, I realized it's actually &lt;em&gt;superior&lt;/em&gt; to Python's. While Python forced everyone to rewrite their entire stack, Ruby quietly built something better. Your existing code just works. No syntax changes. No library migrations. Just better performance when you need it.&lt;/p&gt;

&lt;p&gt;The async ecosystem that &lt;a href="https://github.com/ioquatix" rel="noopener noreferrer"&gt;Samuel Williams&lt;/a&gt; and others have been building for years suddenly makes perfect sense. We just needed the right use case to see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why LLM Communication Breaks Everything
&lt;/h2&gt;

&lt;p&gt;LLM applications create a perfect storm of challenges that expose every weakness in thread-based concurrency:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Slot Starvation
&lt;/h3&gt;

&lt;p&gt;Configure any thread-based job queue with 25 workers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StreamAIResponseJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationJob&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# This job occupies 1 of your 25 slots...&lt;/span&gt;
    &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="c1"&gt;# ...for the ENTIRE streaming duration (30-60 seconds)&lt;/span&gt;
      &lt;span class="n"&gt;broadcast_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="c1"&gt;# Thread is 99% idle, just waiting for tokens&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="c1"&gt;# Slot only freed here, after full response&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your 26th user? They're waiting in line. Not because your server is busy, but because all your workers are occupied by jobs waiting for tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Resource Multiplication
&lt;/h3&gt;

&lt;p&gt;Each thread needs its own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database connection (25 threads = 25 connections minimum)&lt;/li&gt;
&lt;li&gt;Stack memory allocation&lt;/li&gt;
&lt;li&gt;OS thread management overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 1000 concurrent conversations, you'd need 1000 threads. Each thread needs its own database connection. That's 1000 database connections for threads that are 99% idle.&lt;/p&gt;
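
&lt;p&gt;You can watch the pool run dry in a few lines (a minimal sketch using ActiveRecord's pool API; a pool size of 25 and the 5-second checkout timeout are assumed defaults):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;# Sketch: 26 "streaming" jobs against a pool of 25 connections
threads = 26.times.map do
  Thread.new do
    ActiveRecord::Base.connection_pool.with_connection do
      sleep 30  # stand-in for a 30-second streaming response holding the connection
    end
  end
end
threads.each(&amp;amp;:join)
# Thread 26 waits for a free connection and raises
# ActiveRecord::ConnectionTimeoutError once the checkout timeout expires
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;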

&lt;h3&gt;
  
  
  3. Performance Overhead
&lt;/h3&gt;

&lt;p&gt;Real benchmarks show&lt;sup id="fnref1"&gt;1&lt;/sup&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a thread: ~80μs&lt;/li&gt;
&lt;li&gt;Thread context switch: ~1.3μs&lt;/li&gt;
&lt;li&gt;Maximum throughput: ~5,000 requests/second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you're handling thousands of streaming connections, these microseconds add up to real latency.&lt;/p&gt;
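
&lt;p&gt;You can get a feel for these numbers yourself (a rough micro-benchmark sketch, not the cited benchmark's methodology; results vary by machine and Ruby version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require 'benchmark'

n = 10_000
thread_time = Benchmark.realtime { n.times { Thread.new {}.join } }
fiber_time  = Benchmark.realtime { n.times { Fiber.new {}.resume } }

puts "thread create+join:  #{(thread_time / n * 1_000_000).round(2)}μs"
puts "fiber create+resume: #{(fiber_time / n * 1_000_000).round(2)}μs"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;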

&lt;h3&gt;
  
  
  4. Scalability Challenges
&lt;/h3&gt;

&lt;p&gt;Try creating 10,000 threads and the OS scheduler starts to struggle. The overhead becomes crushing. Yet modern AI apps need to handle thousands of concurrent conversations.&lt;/p&gt;

&lt;p&gt;These aren't separate issues -- they're all symptoms of the same architectural mismatch. LLM communication is fundamentally different from traditional background jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Concurrency: Threads vs Async
&lt;/h2&gt;

&lt;p&gt;To understand why LLM applications are async's perfect use case -- and why Ruby's implementation is so elegant -- we need to build up from first principles.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hierarchy: Processes, Threads, and Fibers
&lt;/h3&gt;

&lt;p&gt;Think of your computer as an office building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Processes&lt;/strong&gt; are like separate offices -- each with its own locked door, furniture, and files. They can't see into each other's spaces (memory isolation).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threads&lt;/strong&gt; are like workers sharing the same office -- they can access the same filing cabinets (shared memory) but need to coordinate to avoid collisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fibers&lt;/strong&gt; are like multiple tasks juggled by one worker at their desk -- switching between them manually when waiting for something (like a phone call). The sketch below shows that manual switching in plain Ruby.&lt;/li&gt;
&lt;/ul&gt;
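
&lt;p&gt;Ruby exposes that juggling directly through the core &lt;code&gt;Fiber&lt;/code&gt; class (a minimal sketch, no gems required):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;fiber = Fiber.new do
  puts "step 1"
  Fiber.yield        # voluntarily hand control back to the caller
  puts "step 2"
end

fiber.resume         # prints "step 1", pauses at Fiber.yield
puts "caller does other work"
fiber.resume         # prints "step 2", fiber finishes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;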

&lt;h3&gt;
  
  
  Scheduling: The Core Difference
&lt;/h3&gt;

&lt;p&gt;The fundamental question in concurrency is: who decides when to switch between tasks?&lt;/p&gt;

&lt;h4&gt;
  
  
  Threads: Preemptive Multitasking
&lt;/h4&gt;

&lt;p&gt;With threads, the operating system is the boss. It forcibly interrupts running threads to give others a turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# You start threads, but the OS controls them&lt;/span&gt;
&lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# This might be interrupted at ANY point&lt;/span&gt;
    &lt;span class="n"&gt;expensive_calculation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fetch_from_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Each thread blocks individually here&lt;/span&gt;
    &lt;span class="n"&gt;process_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each thread:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gets scheduled by the OS kernel&lt;/li&gt;
&lt;li&gt;Can be interrupted mid-execution (in Ruby, after 100ms)&lt;/li&gt;
&lt;li&gt;Blocks individually on I/O operations&lt;/li&gt;
&lt;li&gt;Requires OS resources and kernel data structures&lt;/li&gt;
&lt;li&gt;Needs its own resources (like database connections)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Fibers: Cooperative Concurrency
&lt;/h4&gt;

&lt;p&gt;With fibers, switching is voluntary -- they only yield at I/O boundaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fibers yield control cooperatively&lt;/span&gt;
&lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;fibers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="n"&gt;expensive_calculation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Runs to completion&lt;/span&gt;
      &lt;span class="n"&gt;fetch_from_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# Yields here, other fibers run&lt;/span&gt;
      &lt;span class="n"&gt;process_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# Continues after I/O completes&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each fiber:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schedules itself by yielding during I/O&lt;/li&gt;
&lt;li&gt;Never gets interrupted mid-calculation&lt;/li&gt;
&lt;li&gt;Is managed entirely in userspace (no kernel involvement)&lt;/li&gt;
&lt;li&gt;Shares resources through the event loop&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ruby's GVL: Why Fibers Make Even More Sense
&lt;/h3&gt;

&lt;p&gt;Ruby's Global VM Lock (GVL) means only one thread can execute Ruby code at a time. Threads are preempted after a 100ms time quantum.&lt;/p&gt;

&lt;p&gt;This creates an interesting dynamic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# CPU work: Threads don't help much due to GVL&lt;/span&gt;
&lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;calculate_fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="c1"&gt;# Takes about the same time as sequential execution!&lt;/span&gt;

&lt;span class="c1"&gt;# I/O work: Threads do parallelize (GVL released during I/O)&lt;/span&gt;
&lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="no"&gt;Net&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="c1"&gt;# Takes 1/4 the time of sequential execution&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But here's the thing: if threads only help with I/O anyway, &lt;em&gt;why pay their overhead&lt;/em&gt;?&lt;/p&gt;

&lt;h3&gt;
  
  
  The I/O Multiplexing Advantage
&lt;/h3&gt;

&lt;p&gt;This is where fibers truly shine. Threads use a "one thread, one I/O operation" model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Traditional threading approach&lt;/span&gt;
&lt;span class="n"&gt;thread1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;socket1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Blocks this thread&lt;/span&gt;
&lt;span class="n"&gt;thread2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;socket2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Blocks this thread&lt;/span&gt;
&lt;span class="n"&gt;thread3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;socket3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Blocks this thread&lt;/span&gt;
&lt;span class="c1"&gt;# Need 3 threads for 3 concurrent I/O operations&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fibers use I/O multiplexing -- one thread monitors &lt;em&gt;all&lt;/em&gt; I/O:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Async's approach (simplified)&lt;/span&gt;
&lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="c1"&gt;# One thread, many I/O operations&lt;/span&gt;
  &lt;span class="n"&gt;task1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;socket1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Registers with selector&lt;/span&gt;
  &lt;span class="n"&gt;task2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;socket2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Registers with selector&lt;/span&gt;
  &lt;span class="n"&gt;task3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;socket3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Registers with selector&lt;/span&gt;

  &lt;span class="c1"&gt;# Event loop uses epoll/kqueue to monitor ALL sockets&lt;/span&gt;
  &lt;span class="c1"&gt;# Resumes fibers as data becomes available&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The kernel (via &lt;code&gt;epoll&lt;/code&gt;, &lt;code&gt;kqueue&lt;/code&gt;, or &lt;code&gt;io_uring&lt;/code&gt;) can monitor thousands of file descriptors with a single system call. No thread-per-connection needed.&lt;/p&gt;
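
&lt;p&gt;Ruby's portable expression of this idea is &lt;code&gt;IO.select&lt;/code&gt;; async's event loop does the same job through those more scalable kernel interfaces (a sketch -- the sockets and handler are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;# One call asks the kernel: "wake me when ANY of these has data"
# socket1..socket3 are hypothetical connected sockets
readable, _writable, _errored = IO.select([socket1, socket2, socket3])

readable.each do |io|
  handle_data(io.read_nonblock(4096))  # handle_data is hypothetical
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;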

&lt;h3&gt;
  
  
  Why Fibers Win: The Complete Picture
&lt;/h3&gt;

&lt;p&gt;Let's look at real benchmark data comparing fibers to threads&lt;sup&gt;1&lt;/sup&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance Advantages (Ruby 3.4 data)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;20x faster allocation&lt;/strong&gt;: Creating a fiber takes ~3μs vs ~80μs for a thread&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10x faster context switching&lt;/strong&gt;: Fiber switches in ~0.1μs vs ~1.3μs for threads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15x higher throughput&lt;/strong&gt;: ~80,000 vs ~5,000 requests/second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the real advantage is &lt;strong&gt;scalability&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fewer OS Resources&lt;/strong&gt;: Fibers are managed in userspace, avoiding kernel overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Scheduling&lt;/strong&gt;: No kernel involvement means less overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I/O Multiplexing&lt;/strong&gt;: One thread monitors thousands of I/O operations via &lt;code&gt;epoll&lt;/code&gt;/&lt;code&gt;kqueue&lt;/code&gt;/&lt;code&gt;io_uring&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GVL-Friendly&lt;/strong&gt;: Cooperative scheduling works naturally with Ruby's concurrency model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Sharing&lt;/strong&gt;: Database connections and memory pools are naturally shared&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While memory usage is comparable between fibers and threads, fibers don't consume kernel resources. You can create vastly more fibers than threads, switch between them faster, and manage them more efficiently while monitoring thousands of connections -- all from userspace.&lt;/p&gt;
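
&lt;p&gt;The difference is easy to demonstrate (a minimal sketch; under the async scheduler even &lt;code&gt;Kernel#sleep&lt;/code&gt; yields instead of blocking):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require 'async'

Async do
  10_000.times.map do
    Async { sleep 1 }   # each fiber parks in the event loop, costing almost nothing
  end.each(&amp;amp;:wait)
end
# Finishes in roughly 1 second total -- one thread, 10,000 concurrent waits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;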

&lt;h2&gt;
  
  
  How Async Solves Every LLM Challenge
&lt;/h2&gt;

&lt;p&gt;Remember those four problems? Here's how async addresses each one (a concrete sketch follows the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No More Slot Starvation&lt;/strong&gt;: Fibers are created on-demand and destroyed immediately. No fixed worker pools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Resources&lt;/strong&gt;: One process with a few pooled database connections can handle thousands of conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Performance&lt;/strong&gt;: 20x faster to create, 10x faster to switch, 15x higher throughput (a synthetic upper bound).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Massively Improved Scalability&lt;/strong&gt;: 10,000+ concurrent fibers? No problem. The OS doesn't even know they exist.&lt;/li&gt;
&lt;/ol&gt;
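
&lt;p&gt;Here's the streaming job from the opening, no longer pinning a worker slot (a sketch reusing that example's &lt;code&gt;chat&lt;/code&gt;, &lt;code&gt;message&lt;/code&gt;, and &lt;code&gt;broadcast_chunk&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;Async do
  # The fiber yields while waiting for each token, so the same thread
  # is free to run thousands of other conversations in between
  chat.ask(message) do |chunk|
    broadcast_chunk(chunk)
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;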

&lt;h2&gt;
  
  
  Ruby's Async Ecosystem
&lt;/h2&gt;

&lt;p&gt;The beauty of Ruby's &lt;a href="https://github.com/socketry/async" rel="noopener noreferrer"&gt;async&lt;/a&gt; lies in its transparency. Unlike Python, which requires &lt;code&gt;async&lt;/code&gt;/&lt;code&gt;await&lt;/code&gt; everywhere, Ruby code just works:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Foundation: The &lt;a href="https://github.com/socketry/async" rel="noopener noreferrer"&gt;async&lt;/a&gt; Gem
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'async'&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'net/http'&lt;/span&gt;

&lt;span class="c1"&gt;# This code handles 1000 concurrent requests&lt;/span&gt;
&lt;span class="c1"&gt;# Using ONE thread and minimal memory&lt;/span&gt;
&lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"https://api.openai.com/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="c1"&gt;# Net::HTTP automatically yields during I/O&lt;/span&gt;
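      &lt;span class="c1"&gt;# (data and headers are assumed defined elsewhere: request payload and auth headers)&lt;/span&gt;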
      &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Net&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;# All 1000 requests complete concurrently&lt;/span&gt;
  &lt;span class="n"&gt;process_responses&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No callbacks. No promises. No async/await keywords. Just Ruby code that scales.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why RubyLLM Just Works™
&lt;/h3&gt;

&lt;p&gt;Here's the thing that made me smile when I discovered it: &lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; gets async performance &lt;em&gt;for free&lt;/em&gt;. No special RubyLLM-async version needed. No code changes to the library. No configuration. Nothing.&lt;/p&gt;

&lt;p&gt;Why? Because RubyLLM uses &lt;code&gt;Net::HTTP&lt;/code&gt; under the hood. When you wrap RubyLLM calls in an Async block, &lt;code&gt;Net::HTTP&lt;/code&gt; automatically yields during network I/O, allowing thousands of concurrent LLM conversations to happen on a single thread.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is all you need for concurrent LLM calls&lt;/span&gt;
&lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;Async&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="c1"&gt;# RubyLLM automatically becomes non-blocking&lt;/span&gt;
      &lt;span class="c1"&gt;# because Net::HTTP knows how to yield to fibers&lt;/span&gt;
      &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Explain quantum computing"&lt;/span&gt;
      &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is Ruby at its best. Libraries that follow conventions get superpowers without even trying. It just works because it was built on solid foundations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Rest of the Ecosystem
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/socketry/falcon" rel="noopener noreferrer"&gt;Falcon&lt;/a&gt;&lt;/strong&gt;: Multi-process, multi-fiber web server built for streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/socketry/async-job" rel="noopener noreferrer"&gt;async-job&lt;/a&gt;&lt;/strong&gt;: Background job processing using fibers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/socketry/async-cable" rel="noopener noreferrer"&gt;async-cable&lt;/a&gt;&lt;/strong&gt;: ActionCable replacement with fiber-based concurrency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/socketry/async-http" rel="noopener noreferrer"&gt;async-http&lt;/a&gt;&lt;/strong&gt;: Full-featured HTTP client with streaming support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;... and many more available from &lt;a href="https://github.com/orgs/socketry/repositories" rel="noopener noreferrer"&gt;Socketry&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrate Your Rails App to Async
&lt;/h2&gt;

&lt;p&gt;The migration requires almost no code changes:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Update Your Gemfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Gemfile&lt;/span&gt;
&lt;span class="c1"&gt;# Comment out thread-based gems&lt;/span&gt;
&lt;span class="c1"&gt;# gem "puma"&lt;/span&gt;
&lt;span class="c1"&gt;# gem "sidekiq" / "good_job" / "solid_queue"&lt;/span&gt;
&lt;span class="c1"&gt;# gem "solid_cable"&lt;/span&gt;

&lt;span class="c1"&gt;# Add async gems&lt;/span&gt;
&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s2"&gt;"falcon"&lt;/span&gt;
&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s2"&gt;"async-job-adapter-active_job"&lt;/span&gt;
&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s2"&gt;"async-cable"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: One Configuration Line
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/application.rb&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"async/cable"&lt;/span&gt;

&lt;span class="c1"&gt;# config/environments/production.rb&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;active_job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queue_adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:async_job&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: There's No Step 3!
&lt;/h3&gt;

&lt;p&gt;Your existing jobs work unchanged. Your channels don't need updates.&lt;/p&gt;

&lt;p&gt;Just deploy and watch. You'll get better performance, more capacity, and faster response times.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;p&gt;Let's be practical -- async isn't always the answer, and the two models combine well (see the sketch after these lists):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use threads for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU-intensive work&lt;/li&gt;
&lt;li&gt;Tasks needing true isolation&lt;/li&gt;
&lt;li&gt;Legacy C extensions that aren't fiber-safe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use async for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I/O-bound operations&lt;/li&gt;
&lt;li&gt;API calls&lt;/li&gt;
&lt;li&gt;WebSockets, SSE, and other forms of streaming&lt;/li&gt;
&lt;li&gt;LLM applications&lt;/li&gt;
&lt;/ul&gt;
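
&lt;p&gt;From inside an async task you can hand CPU-heavy work to a real thread so the event loop stays responsive (a minimal sketch; &lt;code&gt;expensive_calculation&lt;/code&gt; and &lt;code&gt;use_results&lt;/code&gt; are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require 'async'
require 'net/http'

Async do
  # I/O-bound work stays on fibers
  page = Async { Net::HTTP.get(URI("https://example.com")) }

  # CPU-bound work goes to a thread so it can't starve the event loop
  crunched = Thread.new { expensive_calculation }

  use_results(page.wait, crunched.value)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;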

&lt;h2&gt;
  
  
  A New Chapter for Ruby
&lt;/h2&gt;

&lt;p&gt;After years in Python's async world, I've seen what happens when a language forces a syntax change on its community as the price of async concurrency. Libraries fragment. Codebases split. Developers struggle with new syntax and concepts.&lt;/p&gt;

&lt;p&gt;Ruby chose a different path -- and it's the right one.&lt;/p&gt;

&lt;p&gt;We're witnessing Ruby's next evolution. Not through breaking changes or ecosystem splits, but through thoughtful additions that make our existing code better. The async ecosystem that seemed unnecessary when compared to traditional threading suddenly becomes essential when you hit the right use case.&lt;/p&gt;

&lt;p&gt;LLM applications are that use case. The combination of long-lived connections, streaming responses, and massive concurrency creates the perfect storm where async's benefits become undeniable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ioquatix" rel="noopener noreferrer"&gt;Samuel Williams&lt;/a&gt; and the &lt;a href="https://github.com/socketry/async" rel="noopener noreferrer"&gt;async&lt;/a&gt; community have given us incredible tools. Unlike Python, you don't have to rewrite everything to use it.&lt;/p&gt;

&lt;p&gt;For those building the next generation of AI-powered applications, &lt;a href="https://github.com/socketry/async" rel="noopener noreferrer"&gt;async&lt;/a&gt; Ruby isn't just an option -- it's a competitive advantage. Lower costs, better performance, simpler operations, and you keep your existing codebase.&lt;/p&gt;

&lt;p&gt;The future is concurrent. The future is streaming. The future is &lt;a href="https://github.com/socketry/async" rel="noopener noreferrer"&gt;async&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And in Ruby, that future works with the code you already have.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; powers &lt;a href="https://chatwithwork.com" rel="noopener noreferrer"&gt;Chat with Work&lt;/a&gt; in production with thousands of concurrent AI conversations using &lt;a href="https://github.com/socketry/async" rel="noopener noreferrer"&gt;async&lt;/a&gt;. Want elegant AI integration in Ruby? Check out &lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="https://github.com/ioquatix" rel="noopener noreferrer"&gt;Samuel Williams&lt;/a&gt; for reviewing this post and providing the &lt;a href="https://github.com/socketry/performance/tree/adfd780c6b4842b9534edfa15e383e5dfd4b4137/fiber-vs-thread" rel="noopener noreferrer"&gt;fiber-vs-thread benchmarks&lt;/a&gt; that substantiate these performance claims.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Join the conversation:&lt;/strong&gt; I'll be speaking about async Ruby and AI at &lt;a href="https://2025.euruko.org/" rel="noopener noreferrer"&gt;EuRuKo 2025&lt;/a&gt;, &lt;a href="https://sfruby.com/" rel="noopener noreferrer"&gt;San Francisco Ruby Conference 2025&lt;/a&gt;, and &lt;a href="https://rubyconfth.com/" rel="noopener noreferrer"&gt;RubyConf Thailand 2026&lt;/a&gt;. Let's build the future together.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://github.com/ioquatix" rel="noopener noreferrer"&gt;Samuel Williams&lt;/a&gt;' &lt;a href="https://github.com/socketry/performance/tree/adfd780c6b4842b9534edfa15e383e5dfd4b4137/fiber-vs-thread" rel="noopener noreferrer"&gt;fiber-vs-thread performance comparison&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
      <category>performance</category>
    </item>
    <item>
      <title>RubyLLM 1.3.0: Just When You Thought the Developer Experience Couldn't Get Any Better 🎉</title>
      <dc:creator>Carmine Paolino</dc:creator>
      <pubDate>Tue, 03 Jun 2025 16:03:37 +0000</pubDate>
      <link>https://dev.to/crmne/rubyllm-130-just-when-you-thought-the-developer-experience-couldnt-get-any-better-1b98</link>
      <guid>https://dev.to/crmne/rubyllm-130-just-when-you-thought-the-developer-experience-couldnt-get-any-better-1b98</guid>
      <description>&lt;p&gt;RubyLLM 1.3.0 is here, and just when you thought the developer experience couldn't get any better, we've made attachments ridiculously simple, added isolated configuration contexts, and officially ended the era of manual model tracking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Attachment Revolution: From Complex to Magical
&lt;/h2&gt;

&lt;p&gt;The biggest transformation in 1.3.0 is how stupidly simple attachments have become. Before, you had to categorize every file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The old way (still works, but why would you?)&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"What's in this image?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;image: &lt;/span&gt;&lt;span class="s2"&gt;"diagram.png"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Describe this meeting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;audio: &lt;/span&gt;&lt;span class="s2"&gt;"meeting.wav"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Summarize this document"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;pdf: &lt;/span&gt;&lt;span class="s2"&gt;"contract.pdf"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now? Just throw files at it and RubyLLM figures out the rest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The new way - pure magic ✨&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"What's in this file?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="s2"&gt;"diagram.png"&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Describe this meeting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="s2"&gt;"meeting.wav"&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Summarize this document"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="s2"&gt;"contract.pdf"&lt;/span&gt;

&lt;span class="c1"&gt;# Multiple files? Mix and match without thinking&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Analyze these files"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"quarterly_report.pdf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"sales_chart.jpg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"customer_interview.wav"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"meeting_notes.txt"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# URLs work too&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"What's in this image?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/chart.png"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what the Ruby way looks like: you shouldn't have to think about file types when the computer can figure it out for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration Contexts: Multi-Tenancy Made Trivial
&lt;/h2&gt;

&lt;p&gt;The global configuration pattern works beautifully for simple applications. But the moment you need different configurations for different customers, environments, or features, that simplicity becomes a liability.&lt;/p&gt;

&lt;p&gt;We could have forced everyone to pass configuration objects around. We could have built some complex dependency injection system. Instead, we built contexts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each tenant gets their own isolated configuration&lt;/span&gt;
&lt;span class="n"&gt;tenant_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;context&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_key&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;anthropic_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;anthropic_key&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request_timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt; &lt;span class="c1"&gt;# This tenant needs more time&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Use it without polluting the global namespace&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tenant_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Process this customer request..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Global configuration remains untouched&lt;/span&gt;
&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"This still uses your default settings"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, elegant, Ruby-like. Your multi-tenant application doesn't need architectural gymnastics. Each context is isolated, thread-safe, and garbage-collected when you're done with it.&lt;/p&gt;

&lt;p&gt;Perfect for multi-tenancy, A/B testing different providers, environment targeting, or any situation where you need temporary configuration changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Models with Ollama
&lt;/h2&gt;

&lt;p&gt;Your development machine shouldn't need to phone home to OpenAI every time you want to test something:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ollama_api_base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'http://localhost:11434/v1'&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Same API, different model&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s1"&gt;'mistral'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;provider: &lt;/span&gt;&lt;span class="s1"&gt;'ollama'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Explain Ruby's eigenclass"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect for privacy-sensitive applications, offline development, or just experimenting with local models. It matters for testing, for compliance, for costs. Sometimes the best model is the one running on your own hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hundreds of Models via OpenRouter
&lt;/h2&gt;

&lt;p&gt;Access models from dozens of providers through a single API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openrouter_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'OPENROUTER_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Access any model through OpenRouter&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s1"&gt;'anthropic/claude-3.5-sonnet'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;provider: &lt;/span&gt;&lt;span class="s1"&gt;'openrouter'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One API key, hundreds of models. Simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  The End of Manual Model Tracking
&lt;/h2&gt;

&lt;p&gt;Here's where things get revolutionary. We've partnered with &lt;a href="https://parsera.org" rel="noopener noreferrer"&gt;Parsera&lt;/a&gt; to create a single source of truth for LLM capabilities and pricing. When you run &lt;code&gt;RubyLLM.models.refresh!&lt;/code&gt;, you're now pulling from the &lt;a href="https://api.parsera.org/v1/llm-specs" rel="noopener noreferrer"&gt;Parsera API&lt;/a&gt; - a continuously updated registry that scrapes model information directly from provider documentation.&lt;/p&gt;

&lt;p&gt;No more manually updating capabilities files every time OpenAI changes its pricing. No more hunting through documentation: context windows, pricing, capabilities, supported modalities - it's all there, always current.&lt;/p&gt;

&lt;p&gt;However, providers don't always document everything perfectly. We discovered plenty of older models still available through their APIs but missing from official docs. That's why we kept our &lt;code&gt;capabilities.rb&lt;/code&gt; files - they fill in the gaps for models the Parsera API doesn't cover yet. Between the two sources, we support virtually every model worth using.&lt;/p&gt;
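
&lt;p&gt;In practice it's a single call (&lt;code&gt;refresh!&lt;/code&gt; is the real API mentioned above; the lookup and attribute names below are hypothetical illustrations of inspecting the result):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;# Pull the current registry (Parsera API plus capabilities.rb gap-fills)
RubyLLM.models.refresh!

# Hypothetical inspection of what comes back
model = RubyLLM.models.find('gpt-4o')
puts model.context_window
puts model.pricing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;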

&lt;p&gt;&lt;a href="https://dev.to/standard-api-llm-capabilities-pricing-live/"&gt;Read more about this revolution in my previous blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rails Integration That Finally Feels Like Rails
&lt;/h2&gt;

&lt;p&gt;The Rails integration now works seamlessly with ActiveStorage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Enable attachment support in your Message model&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;acts_as_message&lt;/span&gt;
  &lt;span class="n"&gt;has_many_attached&lt;/span&gt; &lt;span class="ss"&gt;:attachments&lt;/span&gt; &lt;span class="c1"&gt;# Add this line&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Handle file uploads directly from forms&lt;/span&gt;
&lt;span class="n"&gt;chat_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Analyze this upload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:uploaded_file&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Work with existing ActiveStorage attachments&lt;/span&gt;
&lt;span class="n"&gt;chat_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"What's in my document?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;profile_document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Process multiple uploads at once&lt;/span&gt;
&lt;span class="n"&gt;chat_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Review these files"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:files&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We've brought the Rails attachment handling to complete parity with the plain Ruby implementation. No more "it works in Ruby but not in Rails" friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fine-Tuned Embeddings
&lt;/h2&gt;

&lt;p&gt;Custom embedding dimensions let you optimize for your specific use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generate compact embeddings for memory-constrained environments&lt;/span&gt;
&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;"Ruby is a programmer's best friend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"text-embedding-3-small"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;dimensions: &lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;  &lt;span class="c1"&gt;# Instead of the default 1536&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Enterprise OpenAI Support
&lt;/h2&gt;

&lt;p&gt;Organization and project IDs are now supported for enterprise deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_organization_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_ORG_ID'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_PROJECT_ID'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Rock-Solid Foundation
&lt;/h2&gt;

&lt;p&gt;We now officially support and test against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ruby 3.1 to 3.4&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rails 7.1 to 8.0&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your favorite Ruby version is covered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ship It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s1"&gt;'ruby_llm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1.3.0'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As always, we've maintained full backward compatibility. Your existing code continues to work exactly as before, but now with magical attachment handling and powerful new capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Growing Community
&lt;/h2&gt;

&lt;p&gt;This release includes contributions from 13 new contributors, with merged PRs covering everything from foreign key improvements to HTTP proxy support. The Ruby community continues to amaze me with its thoughtfulness and attention to detail.&lt;/p&gt;

&lt;p&gt;Special thanks to &lt;a class="mentioned-user" href="https://dev.to/papgmez"&gt;@papgmez&lt;/a&gt;, @timaro, @rhys117, @bborn, @xymbol, &lt;a class="mentioned-user" href="https://dev.to/roelbondoc"&gt;@roelbondoc&lt;/a&gt;, @max-power, @itstheraj, &lt;a class="mentioned-user" href="https://dev.to/stadia"&gt;@stadia&lt;/a&gt;, &lt;a class="mentioned-user" href="https://dev.to/tpaulshippy"&gt;@tpaulshippy&lt;/a&gt;, @Sami-Tanquary, and @seemiller for making this release possible. [mentions are based on GitHub handles and may not be accurate on dev.to]&lt;/p&gt;

&lt;h2&gt;
  
  
  This Is Just The Beginning
&lt;/h2&gt;

&lt;p&gt;Want to shape RubyLLM's future? &lt;a href="https://github.com/crmne/ruby_llm" rel="noopener noreferrer"&gt;Join us on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The future of AI development in Ruby has never been brighter. ✨&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing RubyLLM 1.0: A Beautiful Way to Work with AI</title>
      <dc:creator>Carmine Paolino</dc:creator>
      <pubDate>Tue, 11 Mar 2025 10:19:48 +0000</pubDate>
      <link>https://dev.to/crmne/introducing-rubyllm-10-a-beautiful-way-to-work-with-ai-5p0</link>
      <guid>https://dev.to/crmne/introducing-rubyllm-10-a-beautiful-way-to-work-with-ai-5p0</guid>
      <description>&lt;p&gt;I released &lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; 1.0 today. It's a library that makes working with AI in Ruby feel natural, elegant, and enjoyable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;AI should be accessible to Ruby developers without ceremony or complexity. When building &lt;a href="https://chatwithwork.com" rel="noopener noreferrer"&gt;Chat with Work&lt;/a&gt;, I wanted to simply write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"What's the best way to learn Ruby?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And have it work - regardless of which model I'm using, whether I'm streaming responses, or which provider I've chosen. The API should get out of the way and let me focus on building my product.&lt;/p&gt;

&lt;h2&gt;
  
  
  The RubyLLM Philosophy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Beautiful interfaces matter.&lt;/strong&gt; Ruby has always been about developer happiness. Your AI code should reflect that same elegance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Global methods for core operations - simple and expressive&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;
&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Ruby is elegant"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;paint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"a sunset over mountains"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Method chaining that reads like English&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gpt-4o-mini'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_temperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"What's your favorite gem?"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Convention over configuration.&lt;/strong&gt; You shouldn't need to think about providers or remember multiple APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Don't care which model? We'll use a sensible default&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;

&lt;span class="c1"&gt;# Want a specific model? Just say so&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s1"&gt;'claude-3-5-sonnet'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Switch to GPT mid-conversation? Just as easy&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gpt-4o-mini'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Practical tools for real work.&lt;/strong&gt; Function calling should be Ruby-like, not JSON Schema gymnastics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Search&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Tool&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="s2"&gt;"Searches our knowledge base"&lt;/span&gt;
  &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="ss"&gt;:query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;desc: &lt;/span&gt;&lt;span class="s2"&gt;"Search query"&lt;/span&gt;
  &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="ss"&gt;:limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;type: :integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;desc: &lt;/span&gt;&lt;span class="s2"&gt;"Max results"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;required: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="no"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Clean, practical, Ruby-like&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Find our product documentation"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Streaming done right.&lt;/strong&gt; No need to parse different formats for different providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Write a story about Ruby"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="c1"&gt;# No provider-specific parsing - we handle that for you&lt;/span&gt;
  &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Token tracking by default.&lt;/strong&gt; Cost management should be built-in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Explain Ruby modules"&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"This cost &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; tokens"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Meaningful error handling.&lt;/strong&gt; Production apps need proper error types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
  &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Question"&lt;/span&gt;
&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RateLimitError&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Rate limited - backing off"&lt;/span&gt;
&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;UnauthorizedError&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"API key issue - check configuration"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rails as a first-class citizen.&lt;/strong&gt; Because most of us are building Rails apps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Chat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;acts_as_chat&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model_id: &lt;/span&gt;&lt;span class="s1"&gt;'gemini-2.0-flash'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Hello"&lt;/span&gt;  &lt;span class="c1"&gt;# Everything persisted automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
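

&lt;p&gt;Under the hood, &lt;code&gt;acts_as_chat&lt;/code&gt; pairs with companion message and tool call models. A minimal sketch of that wiring, following the Rails integration's default model names (your schema may differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;class Message &amp;lt; ApplicationRecord
  acts_as_message  # belongs to a chat; stores role, content, and token counts
end

class ToolCall &amp;lt; ApplicationRecord
  acts_as_tool_call  # records tool invocations made during the conversation
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;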



&lt;h2&gt;
  
  
  Built for Real Applications
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; supports the features you actually need in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Vision&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"What's in this image?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;image: &lt;/span&gt;&lt;span class="s2"&gt;"photo.jpg"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# PDFs&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Summarize this document"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;pdf: &lt;/span&gt;&lt;span class="s2"&gt;"contract.pdf"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Audio&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Transcribe this recording"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;audio: &lt;/span&gt;&lt;span class="s2"&gt;"meeting.wav"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Multiple files&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt; &lt;span class="s2"&gt;"Compare these diagrams"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;image: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"chart1.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"chart2.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Minimal Dependencies
&lt;/h2&gt;

&lt;p&gt;Just Faraday, Zeitwerk, and a tiny event parser. No dependency hell.&lt;/p&gt;

&lt;h2&gt;
  
  
  Used in Production Today
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; powers &lt;a href="https://chatwithwork.com" rel="noopener noreferrer"&gt;Chat with Work&lt;/a&gt; in production. It's battle-tested with real-world AI integrations and built for serious applications.&lt;/p&gt;

&lt;p&gt;Give it a try today: &lt;code&gt;gem install ruby_llm&lt;/code&gt;&lt;/p&gt;
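
&lt;p&gt;From there, setup is a few lines. A minimal sketch, assuming you keep an OpenAI key in your environment (other providers configure the same way):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require 'ruby_llm'

RubyLLM.configure do |config|
  config.openai_api_key = ENV.fetch('OPENAI_API_KEY')
end

chat = RubyLLM.chat
puts chat.ask("Say hello in one sentence.").content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;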

&lt;p&gt;More details at &lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;rubyllm.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
