<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stokry</title>
    <description>The latest articles on DEV Community by Stokry (@stokry).</description>
    <link>https://dev.to/stokry</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F160180%2Fbe40180c-377f-457b-a5ae-0ec95a66fe2b.png</url>
      <title>DEV Community: Stokry</title>
      <link>https://dev.to/stokry</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stokry"/>
    <language>en</language>
    <item>
      <title>In a world where AI can fake anything....</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Wed, 04 Feb 2026 08:01:39 +0000</pubDate>
      <link>https://dev.to/stokry/in-a-world-where-ai-can-fake-anything-24d</link>
      <guid>https://dev.to/stokry/in-a-world-where-ai-can-fake-anything-24d</guid>
      <description>&lt;p&gt;In a world where AI can fake anything...&lt;/p&gt;

&lt;p&gt;How do you prove what's real?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We're building the answer.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coming soon.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Semantic Caching for RubyLLM: Cut Your AI Costs by 70%</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Tue, 03 Feb 2026 07:39:36 +0000</pubDate>
      <link>https://dev.to/stokry/semantic-caching-for-rubyllm-cut-your-ai-costs-by-70-2ibn</link>
      <guid>https://dev.to/stokry/semantic-caching-for-rubyllm-cut-your-ai-costs-by-70-2ibn</guid>
      <description>&lt;p&gt;If you're using &lt;a href="https://github.com/crmne/ruby_llm" rel="noopener noreferrer"&gt;RubyLLM&lt;/a&gt; to build AI-powered applications in Ruby, you're already enjoying a clean, unified API across multiple providers. But there's one problem RubyLLM doesn't solve: &lt;strong&gt;every API call costs money&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;SemanticCache&lt;/strong&gt; — a semantic caching layer that integrates seamlessly with RubyLLM to dramatically reduce your AI costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Redundant API Calls
&lt;/h2&gt;

&lt;p&gt;Consider a typical chatbot or AI assistant. Users often ask variations of the same question:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What's the capital of France?"&lt;/li&gt;
&lt;li&gt;"What is France's capital city?"&lt;/li&gt;
&lt;li&gt;"Tell me the capital of France"&lt;/li&gt;
&lt;li&gt;"Capital of France?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without caching, each of these triggers a separate API call. With traditional caching, you'd need an exact string match — which almost never happens with natural language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic caching solves this.&lt;/strong&gt; It understands that these questions are semantically identical and returns the cached response instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why SemanticCache + RubyLLM?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Provider Flexibility
&lt;/h3&gt;

&lt;p&gt;RubyLLM already gives you the freedom to switch between OpenAI, Gemini, Mistral, Ollama, Bedrock, and more. SemanticCache extends this flexibility to embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Use Gemini for embeddings (cheaper) while using GPT-4 for completions&lt;/span&gt;
&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gemini_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"GEMINI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:ruby_llm&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"text-embedding-004"&lt;/span&gt;  &lt;span class="c1"&gt;# Gemini's embedding model&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Now cache expensive GPT-4 calls using cheap Gemini embeddings&lt;/span&gt;
&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Explain quantum computing"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"Explain quantum computing"&lt;/span&gt; &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Local Embeddings with Ollama
&lt;/h3&gt;

&lt;p&gt;Running Ollama locally? You can generate embeddings without any API costs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ollama_api_base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:11434/v1"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:ruby_llm&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"nomic-embed-text"&lt;/span&gt;  &lt;span class="c1"&gt;# Free, local embeddings&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development environments (no API costs while testing)&lt;/li&gt;
&lt;li&gt;Privacy-sensitive applications (embeddings never leave your server)&lt;/li&gt;
&lt;li&gt;High-volume applications where embedding costs add up&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Enterprise-Ready with AWS Bedrock
&lt;/h3&gt;

&lt;p&gt;For enterprise deployments, you might need to keep everything within AWS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bedrock_region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:ruby_llm&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"amazon.titan-embed-text-v1"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No data leaves your AWS environment, and you get the compliance benefits of Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;p&gt;Let's do some quick math. Assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10,000 queries per day&lt;/li&gt;
&lt;li&gt;Average query: 50 tokens&lt;/li&gt;
&lt;li&gt;GPT-4o pricing: ~$5/1M input tokens, ~$15/1M output tokens&lt;/li&gt;
&lt;li&gt;Average response: 200 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Without caching:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily cost: ~$35/day&lt;/li&gt;
&lt;li&gt;Monthly cost: ~$1,050/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With SemanticCache (assuming 70% hit rate):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily cost: ~$10.50/day + minimal embedding costs&lt;/li&gt;
&lt;li&gt;Monthly cost: ~$315/month&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: $735/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that's a conservative estimate. Applications with repetitive queries (FAQ bots, customer support, documentation assistants) often see 80-90% hit rates.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query comes in&lt;/strong&gt; → SemanticCache generates an embedding vector using your configured RubyLLM provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity search&lt;/strong&gt; → The cache searches for stored entries with high cosine similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache hit?&lt;/strong&gt; → If similarity exceeds your threshold (default 0.85), return the cached response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache miss?&lt;/strong&gt; → Execute the LLM call, store the result with its embedding for future queries
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;similarity_threshold: &lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# First call - cache miss, calls the API&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"What is Ruby?"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"What is Ruby?"&lt;/span&gt; &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Second call - cache hit! Returns instantly&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Tell me about the Ruby programming language"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"Tell me about Ruby"&lt;/span&gt; &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="c1"&gt;# =&amp;gt; Returns cached response, no API call made&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Add both gems to your Gemfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s2"&gt;"ruby_llm"&lt;/span&gt;
&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s2"&gt;"semantic-cache"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure your providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/initializers/ruby_llm.rb&lt;/span&gt;
&lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openai_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# config/initializers/semantic_cache.rb&lt;/span&gt;
&lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:ruby_llm&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"text-embedding-3-small"&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:redis&lt;/span&gt;  &lt;span class="c1"&gt;# For production&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;url: &lt;/span&gt;&lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"REDIS_URL"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start caching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Check your savings&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savings_report&lt;/span&gt;
&lt;span class="c1"&gt;# =&amp;gt; Total saved: $23.45 (156 cached calls)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tag-Based Invalidation
&lt;/h3&gt;

&lt;p&gt;Group related cache entries for bulk invalidation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cache with tags&lt;/span&gt;
&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Ruby version?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;tags: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:ruby&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:versions&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"What's the latest Ruby version?"&lt;/span&gt; &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# When Ruby 3.4 releases, invalidate all version-related caches&lt;/span&gt;
&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invalidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;tags: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:versions&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  TTL for Time-Sensitive Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# News summaries should expire quickly&lt;/span&gt;
&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Latest tech news?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;ttl: &lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  &lt;span class="c1"&gt;# 1 hour&lt;/span&gt;
  &lt;span class="no"&gt;RubyLLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"Summarize today's tech news"&lt;/span&gt; &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Per-User Namespacing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each user gets their own cache namespace&lt;/span&gt;
&lt;span class="no"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;namespace: &lt;/span&gt;&lt;span class="s2"&gt;"user_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="c1"&gt;# Cached responses are isolated per user&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;SemanticCache and RubyLLM are a natural fit. RubyLLM gives you provider flexibility for LLM calls; SemanticCache extends that flexibility to embeddings while dramatically cutting your costs.&lt;/p&gt;

&lt;p&gt;The combination is particularly powerful because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No vendor lock-in&lt;/strong&gt; — Switch embedding providers without changing your caching logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization&lt;/strong&gt; — Use cheaper providers for embeddings, premium providers for completions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local development&lt;/strong&gt; — Use Ollama for free local embeddings during development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise compliance&lt;/strong&gt; — Keep everything within AWS using Bedrock&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Stop paying for the same answers twice. Add SemanticCache to your RubyLLM project today.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/stokry/semantic-cache" rel="noopener noreferrer"&gt;SemanticCache on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/crmne/ruby_llm" rel="noopener noreferrer"&gt;RubyLLM on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rubyllm.com" rel="noopener noreferrer"&gt;RubyLLM Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Ruby is Excellent for AI Development</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Thu, 29 Jan 2026 08:55:53 +0000</pubDate>
      <link>https://dev.to/stokry/why-ruby-is-excellent-for-ai-development-3hj5</link>
      <guid>https://dev.to/stokry/why-ruby-is-excellent-for-ai-development-3hj5</guid>
      <description>&lt;p&gt;Ruby gets overlooked in AI discussions, but developers who use it know it offers a unique combination of productivity and elegance that makes it an excellent choice for many AI projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Token Efficiency Advantage
&lt;/h2&gt;

&lt;p&gt;Here's something most developers miss: &lt;strong&gt;Ruby is ~40% more token-efficient than Python for equivalent code&lt;/strong&gt;. In the LLM era where context windows and API costs matter, this is a massive advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world measurements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Ruby class: ~83 tokens vs Python: ~140 tokens&lt;/li&gt;
&lt;li&gt;  Rails controller: ~134 tokens vs FastAPI endpoint: ~216 tokens&lt;/li&gt;
&lt;li&gt;  Rails model: ~90 tokens vs Django model: ~199 tokens&lt;/li&gt;
&lt;li&gt;  JavaScript consistently uses the most tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why Ruby wins on token efficiency:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Less boilerplate&lt;/strong&gt; - no &lt;code&gt;self.&lt;/code&gt;, no &lt;code&gt;__init__&lt;/code&gt;, no &lt;code&gt;def __init__(self)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ruby blocks&lt;/strong&gt; are more concise than Python comprehensions&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Rails DSL&lt;/strong&gt; is brutally efficient (validations, scopes, associations)&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Symbols&lt;/strong&gt; (&lt;code&gt;:email&lt;/code&gt;) vs strings (&lt;code&gt;'email'&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Idiomatic Ruby&lt;/strong&gt; is naturally more compact&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What this means for AI/LLM development:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Fit ~40% more code in the same context window&lt;/li&gt;
&lt;li&gt;  Cheaper API calls (fewer input tokens)&lt;/li&gt;
&lt;li&gt;  Faster generation (fewer output tokens)&lt;/li&gt;
&lt;li&gt;  More context available for reasoning and examples&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Expressiveness That Accelerates Experimentation
&lt;/h2&gt;

&lt;p&gt;Ruby's "optimized for developer happiness" philosophy pays dividends in AI development where rapid iteration makes the difference. Clean syntax lets you focus on algorithms instead of boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production-Ready AI Stack
&lt;/h2&gt;

&lt;p&gt;The ecosystem is mature. &lt;strong&gt;Rumale&lt;/strong&gt; brings scikit-learn quality - SVMs, Random Forests, neural networks, all optimized with C extensions. &lt;strong&gt;TensorStream&lt;/strong&gt; runs TensorFlow models directly from Ruby. &lt;strong&gt;Numo/NArray&lt;/strong&gt; offers NumPy-like performance for numerical operations. &lt;strong&gt;LangChain.rb&lt;/strong&gt; provides idiomatic Ruby access to LLMs. &lt;strong&gt;ruby-openai&lt;/strong&gt; gem makes API integration trivial.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rails Advantage for AI Products
&lt;/h2&gt;

&lt;p&gt;The biggest secret: Ruby on Rails is killer for AI applications. While Python devs cobble together Flask/FastAPI with a frontend framework, Rails developers ship complete AI apps with authentication, background jobs, and real-time updates in 30% less code. Hotwire/Turbo enables interactive AI interfaces without JavaScript complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Production Examples
&lt;/h2&gt;

&lt;p&gt;GitHub uses Ruby for ML-based code suggestions. Shopify runs AI recommendations in Rails apps processing billions of transactions. GitLab integrates AI features directly into their Rails monolith. Discourse uses Ruby for spam detection and content moderation AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Code Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'openai'&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'vectra'&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIAssistant&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;
    &lt;span class="vi"&gt;@client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;OpenAI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;
    &lt;span class="vi"&gt;@db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"embeddings.json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;k: &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="vi"&gt;@client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;parameters: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;messages: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="n"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;role: &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;parameters: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="s2"&gt;"text-embedding-3-small"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;input: &lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:metadata&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean. Readable. Production-ready. &lt;strong&gt;And 40% fewer tokens than the Python equivalent.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Reality Check
&lt;/h2&gt;

&lt;p&gt;Yes, Python has faster numerical libraries. But most AI apps spend 90% of time waiting on API calls or GPU operations. Ruby's "slow" execution doesn't matter when OpenAI API latency is 500ms. For heavy compute, call out to C extensions or microservices.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Underrated Benefits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Metaprogramming&lt;/strong&gt;: DSLs for defining neural network architectures feel natural in Ruby. &lt;strong&gt;Concurrency&lt;/strong&gt;: Ractors and async gems handle parallel API calls elegantly. &lt;strong&gt;Testing culture&lt;/strong&gt;: RSpec makes testing AI behavior straightforward. &lt;strong&gt;Gem ecosystem&lt;/strong&gt;: 175,000+ gems mean you're never starting from scratch. &lt;strong&gt;Token efficiency&lt;/strong&gt;: Ship more functionality in fewer tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Ruby Makes Sense
&lt;/h2&gt;

&lt;p&gt;Ruby excels for: AI-powered web apps, RAG applications, LLM orchestration, chatbots, content generation tools, AI automation scripts, and rapid prototyping.&lt;/p&gt;

&lt;p&gt;Ruby struggles with: Training large models from scratch, real-time computer vision, and hardcore numerical optimization (use Python/Julia for these).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Ruby won't replace Python for research or deep learning. But for building and shipping AI products - especially web applications with AI features - Ruby offers a compelling developer experience that deserves serious consideration.&lt;/p&gt;

&lt;p&gt;The token efficiency advantage alone makes Ruby worth evaluating for any project involving LLMs. When you're paying per token and fighting context window limits, &lt;strong&gt;40% efficiency gains aren't just nice to have - they're a competitive advantage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI gold rush needs builders who ship fast. Ruby gives you that superpower.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ruby</category>
      <category>rails</category>
      <category>python</category>
    </item>
    <item>
      <title>Rack-style middleware for vector databases in Ruby: vectra-client 1.1.0</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Thu, 15 Jan 2026 09:04:03 +0000</pubDate>
      <link>https://dev.to/stokry/rack-style-middleware-for-vector-databases-in-ruby-vectra-client-110-2jh3</link>
      <guid>https://dev.to/stokry/rack-style-middleware-for-vector-databases-in-ruby-vectra-client-110-2jh3</guid>
      <description>&lt;p&gt;If you’re building serious AI features in Ruby (semantic search, RAG, recommendations), you quickly realize you don’t just need a &lt;em&gt;client&lt;/em&gt; for your vector database — you need a &lt;strong&gt;production toolkit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;vectra-client 1.1.0&lt;/strong&gt;, the gem gets exactly that: &lt;strong&gt;Rack-style middleware for vector databases&lt;/strong&gt; plus production patterns Ruby/Rails developers already know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Vectra is aiming to be the most complete production toolkit in the Ruby vector DB ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production patterns&lt;/strong&gt;: 7+ built-in patterns (batch upsert, health check, ActiveRecord integration, reindex, caching…)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middleware&lt;/strong&gt;: 5 built-in middlewares (the only gem with this approach)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-provider&lt;/strong&gt;: 5 providers behind a unified API (&lt;code&gt;:pinecone&lt;/code&gt;, &lt;code&gt;:qdrant&lt;/code&gt;, &lt;code&gt;:weaviate&lt;/code&gt;, &lt;code&gt;:pgvector&lt;/code&gt;, &lt;code&gt;:memory&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core idea: everything you already know from Rack/Faraday land, now applied to &lt;strong&gt;AI / vector workloads&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;“The only Ruby gem with Rack-style middleware for vector databases — production patterns you already know, applied to AI workloads.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What’s new in vectra-client 1.1.0?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Rack-style middleware stack for all operations
&lt;/h3&gt;

&lt;p&gt;All public operations on &lt;code&gt;Vectra::Client&lt;/code&gt; now go through a middleware stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;upsert&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt;, &lt;code&gt;update&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list_indexes&lt;/code&gt;, &lt;code&gt;describe_index&lt;/code&gt;, &lt;code&gt;stats&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;create_index&lt;/code&gt;, &lt;code&gt;delete_index&lt;/code&gt;, &lt;code&gt;list_namespaces&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which means you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add logging / retry / instrumentation globally&lt;/li&gt;
&lt;li&gt;add per-client middleware (e.g. PII redaction only for a specific index)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"vectra-client"&lt;/span&gt;

&lt;span class="c1"&gt;# Global middleware&lt;/span&gt;
&lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Middleware&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Logging&lt;/span&gt;
&lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Middleware&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Retry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;max_attempts: &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Middleware&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;CostTracker&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;provider: :qdrant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="s2"&gt;"products"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;namespace: &lt;/span&gt;&lt;span class="s2"&gt;"tenant-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;middleware: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Middleware&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;PIIRedaction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Middleware&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Instrumentation&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every call (&lt;code&gt;client.upsert&lt;/code&gt;, &lt;code&gt;client.query&lt;/code&gt;, …) now flows through the same, consistent pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Built-in middleware (production-ready)
&lt;/h3&gt;

&lt;p&gt;Version &lt;strong&gt;1.1.0&lt;/strong&gt; ships with 5 ready-to-use middlewares:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Logging&lt;/code&gt; – structured logs for all operations + duration in ms&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Retry&lt;/code&gt; – retry logic for rate limit / network / timeout errors&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Instrumentation&lt;/code&gt; – hooks for metrics and APM (Datadog, New Relic, Prometheus…)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PIIRedaction&lt;/code&gt; – PII redaction in metadata before sending to the vector DB&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CostTracker&lt;/code&gt; – simple cost estimation per operation (per provider, per vector)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point: you can &lt;strong&gt;add observability and safety without forking the gem&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A client that understands “context”
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Vectra::Client&lt;/code&gt; now has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;default index &amp;amp; namespace&lt;/strong&gt; (via &lt;code&gt;index:&lt;/code&gt; and &lt;code&gt;namespace:&lt;/code&gt; on init)&lt;/li&gt;
&lt;li&gt;helpers:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"shadow-products"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;vectors: &lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_namespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tenant-42"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;vector: &lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;top_k: &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_index_and_namespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"products"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"tenant-1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;ids: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"products_123"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes multi-tenant, shadow indexes, migrations and A/B testing much easier without duplicating a ton of boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using this in a Rails app
&lt;/h2&gt;

&lt;p&gt;Fastest path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add the gem:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s2"&gt;"vectra-client"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Run the install generator:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rails generate vectra:install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Run the index generator for your model (e.g. &lt;code&gt;Product&lt;/code&gt;):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rails generate vectra:index Product embedding dimension:1536 provider:qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Include the generated concern and use &lt;code&gt;has_vector&lt;/code&gt; + the &lt;code&gt;reindex_vectors&lt;/code&gt; helper:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="kp"&gt;include&lt;/span&gt; &lt;span class="no"&gt;ProductVector&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;automatically index embeddings after &lt;code&gt;save&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;run semantic search via &lt;code&gt;Product.vector_search(...)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;reindex the entire table via &lt;code&gt;Product.reindex_vectors&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When does vectra-client make sense?
&lt;/h2&gt;

&lt;p&gt;If you’re building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;e‑commerce search (semantic + filters)&lt;/li&gt;
&lt;li&gt;blog / docs search (hybrid: keyword + vectors)&lt;/li&gt;
&lt;li&gt;RAG chatbot pulling context from PostgreSQL + pgvector&lt;/li&gt;
&lt;li&gt;multi-tenant SaaS with namespace isolation&lt;/li&gt;
&lt;li&gt;recommendation systems (similar products / articles)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…then &lt;strong&gt;production features&lt;/strong&gt; (middleware, health check, batch, reindex, provider-agnostic API) matter a lot more than a barebones “client”.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;The Ruby ecosystem finally gets a &lt;strong&gt;serious, production-grade toolkit&lt;/strong&gt; for working with vector databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rack-style middleware you already understand
&lt;/li&gt;
&lt;li&gt;production patterns mapped onto AI workloads
&lt;/li&gt;
&lt;li&gt;a multi-provider API that doesn’t lock you into a single vendor
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see it in action, check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://vectra-docs.netlify.app/" rel="noopener noreferrer"&gt;Docs homepage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vectra-docs.netlify.app/guides/middleware/" rel="noopener noreferrer"&gt;Middleware guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vectra-docs.netlify.app/examples/" rel="noopener noreferrer"&gt;Examples overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like an example for a specific stack (Sidekiq, Hanami, dry-rb, …), open an issue — this middleware layer is designed exactly for those kinds of extensions.&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/stokry/vectra" rel="noopener noreferrer"&gt;stokry/vectra&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
      <category>ai</category>
      <category>database</category>
    </item>
    <item>
      <title>Building Enterprise Vector Search in Rails (Part 2/3): Production Resilience &amp; Monitoring</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Tue, 13 Jan 2026 14:03:54 +0000</pubDate>
      <link>https://dev.to/stokry/building-enterprise-vector-search-in-rails-part-23-production-resilience-monitoring-3fpj</link>
      <guid>https://dev.to/stokry/building-enterprise-vector-search-in-rails-part-23-production-resilience-monitoring-3fpj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;*&lt;strong&gt;&lt;em&gt;This is Part 2 of a 3-part series&lt;/em&gt;&lt;/strong&gt;* on building production-ready vector search for enterprise SaaS.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/stokry/building-enterprise-vector-search-in-rails-part-13-architecture-multi-tenant-implementation-4opf"&gt;Part 1: Architecture &amp;amp; Implementation&lt;/a&gt; - Multi-tenant document processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Part 2: Production Resilience &amp;amp; Monitoring&lt;/em&gt;&lt;/strong&gt;* 👈 You are here&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Part 3: Cost Optimization &amp;amp; Lessons Learned (Coming Friday)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;TL;DR&lt;/em&gt;&lt;/strong&gt;*: Production-ready means more than just working code. This part covers circuit breakers, rate limiting, health checks, and monitoring that keep the system running through Black Friday traffic spikes and Qdrant outages.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Black Friday Incident 🚨
&lt;/h2&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Date:&lt;/em&gt;&lt;/strong&gt;* November 24, 2023, 2:47 PM EST&lt;/p&gt;

&lt;p&gt;Our monitoring dashboard lit up red:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
⚠️  Qdrant CPU: 98%

⚠️  Memory: 95%

⚠️  Query latency: 12,000ms (P95)

🔥 Error rate: 45%

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;*&lt;strong&gt;&lt;em&gt;What happened:&lt;/em&gt;&lt;/strong&gt;* A major client launched their compliance platform to 5,000 users simultaneously. Search traffic spiked from 800 req/min → 4,200 req/min. Our Qdrant cluster couldn't handle it.&lt;/p&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Without circuit breakers:&lt;/em&gt;&lt;/strong&gt;* We would have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Hammered the failing Qdrant cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exhausted connection pools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brought down the entire Rails app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;100% outage for all 150 clients&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;With circuit breakers:&lt;/em&gt;&lt;/strong&gt;*&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Detected 5 failures in 30 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Opened the circuit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Served cached results (42% hit rate)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fell back to PostgreSQL full-text search&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;99.2% of searches still worked&lt;/em&gt;&lt;/strong&gt;*&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MTTR: 12 minutes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let me show you how we built this resilience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: Circuit Breaker Pattern
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# lib/circuit_breaker_registry.rb&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreakerRegistry&lt;/span&gt;

&lt;span class="vi"&gt;@breakers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Concurrent&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compute_if_absent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tenant_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;name: &lt;/span&gt;&lt;span class="s2"&gt;"tenant_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;failure_threshold: &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Open after 5 failures&lt;/span&gt;

&lt;span class="ss"&gt;recovery_timeout: &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Try again after 60s&lt;/span&gt;

&lt;span class="ss"&gt;success_threshold: &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# Close after 3 successes&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stats&lt;/span&gt;

&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each_pair&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaker&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;

&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_h&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_all!&lt;/span&gt;

&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:reset!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Circuit Breakers in Vector Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# app/services/vector_indexing_service.rb&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorIndexingService&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="ss"&gt;filters: &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;EmbeddingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;namespace_for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Get circuit breaker for this tenant&lt;/span&gt;

&lt;span class="n"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;CircuitBreakerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Execute with fallback&lt;/span&gt;

&lt;span class="n"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;fallback: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;fallback_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="vi"&gt;@client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="s1"&gt;'compliance_documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;vector: &lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;top_k: &lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;namespace: &lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;filter: &lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;include_metadata: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="kp"&gt;private&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fallback_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Fallback to PostgreSQL full-text search&lt;/span&gt;

&lt;span class="no"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"to_tsvector('english', title || ' ' || content) @@ plainto_tsquery(?)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;convert_to_vector_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_to_vector_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Convert Document to QueryResult format&lt;/span&gt;

&lt;span class="no"&gt;OpenStruct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;id: &lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;score: &lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Fallback score&lt;/span&gt;

&lt;span class="ss"&gt;metadata: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="s1"&gt;'document_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="s1"&gt;'tenant_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="s1"&gt;'title'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="s1"&gt;'chunk_text'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Circuit Breaker States
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
┌──────────┐

│  CLOSED  │  ← Normal operation (requests pass through)

└──────────┘

│

│ (5 failures detected)

↓

┌──────────┐

│ OPEN │  ← Requests fail immediately, use fallback

└──────────┘

│

│ (60 seconds elapsed)

↓

┌──────────┐

│ HALF_OPEN│  ← Limited requests allowed to test recovery

└──────────┘

│

│ (3 successes) → Back to CLOSED

│ (1 failure) → Back to OPEN

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Monitoring Circuit Breaker States
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# app/controllers/admin/circuit_breakers_controller.rb&lt;/span&gt;

&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;Admin&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreakersController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;AdminController&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;

&lt;span class="vi"&gt;@breakers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;CircuitBreakerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stats&lt;/span&gt;



&lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;breakers: &lt;/span&gt;&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;name: &lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;state: &lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:state&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="ss"&gt;failures: &lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:failures&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="ss"&gt;successes: &lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:successes&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="ss"&gt;last_failure_at: &lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:last_failure_at&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="ss"&gt;opened_at: &lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:opened_at&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;summary: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;total: &lt;/span&gt;&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;open: &lt;/span&gt;&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:state&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="ss"&gt;:open&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;

&lt;span class="ss"&gt;half_open: &lt;/span&gt;&lt;span class="vi"&gt;@breakers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:state&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="ss"&gt;:half_open&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset&lt;/span&gt;

&lt;span class="no"&gt;CircuitBreakerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_all!&lt;/span&gt;

&lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;message: &lt;/span&gt;&lt;span class="s1"&gt;'All circuit breakers reset'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Rate Limiting (Per-Tenant)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Per-Tenant Rate Limiting?
&lt;/h3&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Problem:&lt;/em&gt;&lt;/strong&gt;* A single tenant running a poorly-configured script could DOS the entire platform.&lt;/p&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Solution:&lt;/em&gt;&lt;/strong&gt;* Isolate rate limits per tenant.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# lib/rate_limiter_registry.rb&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimiterRegistry&lt;/span&gt;

&lt;span class="vi"&gt;@limiters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Concurrent&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="vi"&gt;@limiters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compute_if_absent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tenant_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="c1"&gt;# 500 requests per minute per tenant&lt;/span&gt;

&lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;requests_per_second: &lt;/span&gt;&lt;span class="mf"&gt;8.33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 500/60&lt;/span&gt;

&lt;span class="ss"&gt;burst_size: &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stats&lt;/span&gt;

&lt;span class="vi"&gt;@limiters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each_pair&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;

&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_h&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Rate Limiting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# app/services/vector_indexing_service.rb&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="ss"&gt;filters: &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get rate limiter for this tenant&lt;/span&gt;

&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RateLimiterRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Acquire rate limit token&lt;/span&gt;

&lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="c1"&gt;# Get circuit breaker&lt;/span&gt;

&lt;span class="n"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;CircuitBreakerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Execute search with resilience&lt;/span&gt;

&lt;span class="n"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;fallback: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;fallback_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="n"&gt;perform_vector_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RateLimitError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="c1"&gt;# Return 429 Too Many Requests&lt;/span&gt;

&lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="no"&gt;SearchRateLimitExceeded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;retry_after: &lt;/span&gt;&lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retry_after&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rate Limit Headers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# app/controllers/api/v1/search_controller.rb&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;

&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RateLimiterRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Add rate limit headers&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'X-RateLimit-Limit'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'500'&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'X-RateLimit-Remaining'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remaining&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'X-RateLimit-Reset'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;



&lt;span class="c1"&gt;# ... search logic ...&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;SearchRateLimitExceeded&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Retry-After'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retry_after&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;

&lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="s1"&gt;'Rate limit exceeded'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;retry_after: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retry_after&lt;/span&gt;

&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="ss"&gt;status: :too_many_requests&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Health Checks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Comprehensive Health Check Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# app/controllers/health_controller.rb&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HealthController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationController&lt;/span&gt;

&lt;span class="n"&gt;skip_before_action&lt;/span&gt; &lt;span class="ss"&gt;:authenticate_user!&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'healthy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;timestamp: &lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iso8601&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;version: &lt;/span&gt;&lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;VERSION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;services: &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;



&lt;span class="c1"&gt;# Check Vector DB&lt;/span&gt;

&lt;span class="k"&gt;begin&lt;/span&gt;

&lt;span class="n"&gt;vectra_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;qdrant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'QDRANT_HOST'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;vectra_health&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectra_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;health_check&lt;/span&gt;



&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:vector_db&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="n"&gt;vectra_health&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;healthy?&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'up'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'down'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;latency_ms: &lt;/span&gt;&lt;span class="n"&gt;vectra_health&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;provider: &lt;/span&gt;&lt;span class="n"&gt;vectra_health&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;indexes: &lt;/span&gt;&lt;span class="n"&gt;vectra_health&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexes_available&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:vector_db&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'down'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'degraded'&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Check Database&lt;/span&gt;

&lt;span class="k"&gt;begin&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;

&lt;span class="no"&gt;ActiveRecord&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'SELECT 1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;db_latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:database&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'up'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;latency_ms: &lt;/span&gt;&lt;span class="n"&gt;db_latency&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:database&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'down'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'unhealthy'&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Check Redis&lt;/span&gt;

&lt;span class="k"&gt;begin&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;

&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ping&lt;/span&gt;

&lt;span class="n"&gt;redis_latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:redis&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'up'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;latency_ms: &lt;/span&gt;&lt;span class="n"&gt;redis_latency&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:redis&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'down'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'degraded'&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Check Embedding Service&lt;/span&gt;

&lt;span class="k"&gt;begin&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;

&lt;span class="no"&gt;EmbeddingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'test'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;embedding_latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:embeddings&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'up'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;latency_ms: &lt;/span&gt;&lt;span class="n"&gt;embedding_latency&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:embeddings&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'down'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'degraded'&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Check Sidekiq&lt;/span&gt;

&lt;span class="k"&gt;begin&lt;/span&gt;

&lt;span class="n"&gt;queue_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Sidekiq&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'document_processing'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;



&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:sidekiq&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="n"&gt;queue_size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'up'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'degraded'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;queue_size: &lt;/span&gt;&lt;span class="n"&gt;queue_size&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:services&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:sidekiq&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'down'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="s1"&gt;'healthy'&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="s1"&gt;'degraded'&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="s1"&gt;'unhealthy'&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Kubernetes Liveness &amp;amp; Readiness Probes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# kubernetes/deployment.yaml&lt;/span&gt;

&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vector-search-api&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rails&lt;/span&gt;

&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vector-search-api:latest&lt;/span&gt;

&lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;



&lt;span class="c1"&gt;# Liveness probe (restart if unhealthy)&lt;/span&gt;

&lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;

&lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;

&lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

&lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

&lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

&lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;



&lt;span class="c1"&gt;# Readiness probe (remove from load balancer if not ready)&lt;/span&gt;

&lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;

&lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;

&lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

&lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

&lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Prometheus Metrics &amp;amp; Monitoring
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Setting Up Prometheus
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# config/initializers/prometheus.rb&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'prometheus/client'&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'prometheus/client/formats/text'&lt;/span&gt;



&lt;span class="c1"&gt;# Global prometheus registry&lt;/span&gt;

&lt;span class="no"&gt;PROMETHEUS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Prometheus&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registry&lt;/span&gt;



&lt;span class="c1"&gt;# Vector search metrics&lt;/span&gt;

&lt;span class="no"&gt;VECTOR_SEARCH_DURATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:vector_search_duration_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Vector search duration in seconds'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="ss"&gt;buckets: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="no"&gt;VECTOR_SEARCH_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:vector_search_total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Total vector searches'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="no"&gt;VECTOR_SEARCH_CACHE_HIT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:vector_search_cache_hit_total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Cache hits for vector searches'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tenant_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Circuit breaker metrics&lt;/span&gt;

&lt;span class="no"&gt;CIRCUIT_BREAKER_STATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:circuit_breaker_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Circuit breaker state (0=closed, 1=open, 2=half_open)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tenant_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="no"&gt;CIRCUIT_BREAKER_FAILURES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:circuit_breaker_failures_total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Total circuit breaker failures'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tenant_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Document processing metrics&lt;/span&gt;

&lt;span class="no"&gt;DOCUMENT_PROCESSING_DURATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:document_processing_duration_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Document processing duration in seconds'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="ss"&gt;buckets: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="no"&gt;DOCUMENT_PROCESSING_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:document_processing_total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Total documents processed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Sidekiq queue metrics&lt;/span&gt;

&lt;span class="no"&gt;SIDEKIQ_QUEUE_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

&lt;span class="ss"&gt;:sidekiq_queue_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;docstring: &lt;/span&gt;&lt;span class="s1"&gt;'Current Sidekiq queue size'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:queue&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Instrumenting Vector Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# app/services/vector_indexing_service.rb&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="ss"&gt;filters: &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;



&lt;span class="k"&gt;begin&lt;/span&gt;

&lt;span class="c1"&gt;# Check cache first&lt;/span&gt;

&lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generate_cache_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cached_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached_result&lt;/span&gt;

&lt;span class="no"&gt;VECTOR_SEARCH_CACHE_HIT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached_result&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Perform search with resilience&lt;/span&gt;

&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RateLimiterRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;CircuitBreakerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="n"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;fallback: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;fallback_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="n"&gt;perform_vector_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Cache result&lt;/span&gt;

&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;expires_in: &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="c1"&gt;# Record success&lt;/span&gt;

&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

&lt;span class="no"&gt;VECTOR_SEARCH_DURATION&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="no"&gt;VECTOR_SEARCH_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;



&lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

&lt;span class="c1"&gt;# Record failure&lt;/span&gt;

&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

&lt;span class="no"&gt;VECTOR_SEARCH_DURATION&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="no"&gt;VECTOR_SEARCH_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;



&lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Metrics Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# config/routes.rb&lt;/span&gt;

&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;application&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;draw&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;

&lt;span class="n"&gt;get&lt;/span&gt; &lt;span class="s1"&gt;'/metrics'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;to: &lt;/span&gt;&lt;span class="s1"&gt;'metrics#show'&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# app/controllers/metrics_controller.rb&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MetricsController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationController&lt;/span&gt;

&lt;span class="n"&gt;skip_before_action&lt;/span&gt; &lt;span class="ss"&gt;:authenticate_user!&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show&lt;/span&gt;

&lt;span class="c1"&gt;# Update circuit breaker metrics&lt;/span&gt;

&lt;span class="no"&gt;CircuitBreakerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;

&lt;span class="n"&gt;state_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:state&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:closed&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:open&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:half_open&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="no"&gt;CIRCUIT_BREAKER_STATE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Update Sidekiq queue metrics&lt;/span&gt;

&lt;span class="no"&gt;Sidekiq&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;

&lt;span class="no"&gt;SIDEKIQ_QUEUE_SIZE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;labels: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;queue: &lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;



&lt;span class="c1"&gt;# Return Prometheus format&lt;/span&gt;

&lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;plain: &lt;/span&gt;&lt;span class="no"&gt;Prometheus&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Formats&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;marshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROMETHEUS&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

&lt;span class="ss"&gt;content_type: &lt;/span&gt;&lt;span class="no"&gt;Prometheus&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Formats&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Text&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;CONTENT_TYPE&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;end&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Grafana Dashboards
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Panels
&lt;/h3&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;1. Search Performance&lt;/em&gt;&lt;/strong&gt;*&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# P50 latency

histogram_quantile(0.50,

rate(vector_search_duration_seconds_bucket[5m]))



# P95 latency

histogram_quantile(0.95,

rate(vector_search_duration_seconds_bucket[5m]))



# P99 latency

histogram_quantile(0.99,

rate(vector_search_duration_seconds_bucket[5m]))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;*&lt;strong&gt;&lt;em&gt;2. Throughput&lt;/em&gt;&lt;/strong&gt;*&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Requests per second

rate(vector_search_total[1m])



# By tenant

sum(rate(vector_search_total[5m])) by (tenant_id)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;*&lt;strong&gt;&lt;em&gt;3. Cache Hit Rate&lt;/em&gt;&lt;/strong&gt;*&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Cache hit rate percentage

100 * (

rate(vector_search_cache_hit_total[5m]) /

rate(vector_search_total[5m])

)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;*&lt;strong&gt;&lt;em&gt;4. Circuit Breaker Status&lt;/em&gt;&lt;/strong&gt;*&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Count of open circuits

count(circuit_breaker_state == 1)



# Circuit state by tenant

circuit_breaker_state

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;*&lt;strong&gt;&lt;em&gt;5. Error Rate&lt;/em&gt;&lt;/strong&gt;*&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Error percentage

100 * (

rate(vector_search_total{status="error"}[5m]) /

rate(vector_search_total[5m])

)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dashboard JSON
&lt;/h3&gt;

&lt;p&gt;See full dashboard configuration: &lt;code&gt;examples/grafana-dashboard.json&lt;/code&gt; in the Vectra repo.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Metrics (Real Numbers)
&lt;/h2&gt;

&lt;p&gt;After implementing resilience patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Search Latency:

- P50: 45ms (cached: 3ms)

- P95: 120ms (cached: 8ms)

- P99: 250ms



Availability:

- Uptime: 99.94% (exceeding 99.9% SLA)

- MTTR: 8 minutes average

- Longest incident: 45 minutes (Qdrant upgrade)



Circuit Breaker Stats:

- Opened: 12 times in 2024

- Average recovery time: 2.5 minutes

- Prevented outages: 8 incidents



Cache Performance:

- Hit rate: 42%

- Memory usage: 2.1GB

- Evictions: 145/day



Rate Limiting:

- Triggered: 234 times/month

- Top abuser: 1,847 requests in 1 minute

- No successful DOS attacks

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What We've Covered
&lt;/h2&gt;

&lt;p&gt;✅ *&lt;strong&gt;&lt;em&gt;Circuit Breakers&lt;/em&gt;&lt;/strong&gt;* - Prevented 8 outages in 2024&lt;/p&gt;

&lt;p&gt;✅ *&lt;strong&gt;&lt;em&gt;Rate Limiting&lt;/em&gt;&lt;/strong&gt;* - Per-tenant isolation, no DOS attacks&lt;/p&gt;

&lt;p&gt;✅ *&lt;strong&gt;&lt;em&gt;Health Checks&lt;/em&gt;&lt;/strong&gt;* - Kubernetes-ready monitoring&lt;/p&gt;

&lt;p&gt;✅ *&lt;strong&gt;&lt;em&gt;Prometheus Metrics&lt;/em&gt;&lt;/strong&gt;* - Real-time observability&lt;/p&gt;

&lt;p&gt;✅ *&lt;strong&gt;&lt;em&gt;Grafana Dashboards&lt;/em&gt;&lt;/strong&gt;* - Beautiful visualizations&lt;/p&gt;




&lt;h2&gt;
  
  
  Coming in Part 3: Cost Optimization 💰
&lt;/h2&gt;

&lt;p&gt;We've built a resilient system. Now let's talk money.&lt;/p&gt;

&lt;p&gt;In *&lt;strong&gt;&lt;em&gt;Part 3&lt;/em&gt;&lt;/strong&gt;* (Friday), we'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Infrastructure Cost Breakdown&lt;/em&gt;&lt;/strong&gt;* - $600/month vs $3,200/month (81% savings)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Cache Optimization&lt;/em&gt;&lt;/strong&gt;* - 42% hit rate = 70% latency reduction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Chunking Strategy&lt;/em&gt;&lt;/strong&gt;* - Why 512 tokens with 128 overlap beats 1000 tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Self-Hosting Embeddings&lt;/em&gt;&lt;/strong&gt;* - $200/month savings vs OpenAI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;5 Key Lessons Learned&lt;/em&gt;&lt;/strong&gt;* - What worked, what didn't&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Spoiler:&lt;/em&gt;&lt;/strong&gt;* The biggest savings came from caching, not from switching providers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Vectra Gem&lt;/em&gt;&lt;/strong&gt;*: &lt;a href="https://github.com/stokry/vectra" rel="noopener noreferrer"&gt;github.com/stokry/vectra&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Grafana Dashboard&lt;/em&gt;&lt;/strong&gt;*: &lt;code&gt;examples/grafana-dashboard.json&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;*&lt;strong&gt;&lt;em&gt;Prometheus Exporter&lt;/em&gt;&lt;/strong&gt;*: &lt;code&gt;examples/prometheus-exporter.rb&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions about production resilience? Share your own war stories in the comments!&lt;/p&gt;

</description>
      <category>rails</category>
      <category>ruby</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Building Enterprise Vector Search in Rails (Part 1/3): Architecture &amp; Multi-Tenant Implementation</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Sun, 11 Jan 2026 15:38:35 +0000</pubDate>
      <link>https://dev.to/stokry/building-enterprise-vector-search-in-rails-part-13-architecture-multi-tenant-implementation-4opf</link>
      <guid>https://dev.to/stokry/building-enterprise-vector-search-in-rails-part-13-architecture-multi-tenant-implementation-4opf</guid>
      <description>&lt;p&gt;&lt;strong&gt;This is Part 1 of a 3-part series&lt;/strong&gt; on building production-ready vector search for enterprise SaaS.&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1: Architecture &amp;amp; Implementation&lt;/strong&gt; 👈 You are here&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/stokry/building-enterprise-vector-search-in-rails-part-23-production-resilience-monitoring-3fpj"&gt;Part 2: Production Resilience &amp;amp; Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: Cost Optimization &amp;amp; Lessons Learned (Coming Friday)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Deep dive into building an enterprise SaaS platform that processes 2M+ compliance documents monthly using vector search. This part covers the architecture, Rails implementation, and multi-tenant isolation patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Business Problem
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The scenario:&lt;/strong&gt; An enterprise SaaS platform serving Fortune 500 companies managing regulatory compliance - banks, healthcare, and pharma companies dealing with complex regulatory requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pain point:&lt;/strong&gt; Organizations receive thousands of regulatory documents monthly (SEC filings, FDA guidelines, ISO standards, GDPR updates). Compliance teams spend 60+ hours/week manually searching through PDFs to find relevant sections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; Build AI-powered semantic search that understands regulatory language and returns precise results across millions of documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The constraints:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;150+ enterprise clients (multi-tenant architecture)&lt;/li&gt;
&lt;li&gt;2.1M documents total, growing 50k/month&lt;/li&gt;
&lt;li&gt;Average document: 200 pages, 500KB&lt;/li&gt;
&lt;li&gt;SOC2 + GDPR compliant (audit logs, data isolation)&lt;/li&gt;
&lt;li&gt;99.9% uptime SLA ($10k/hour penalties)&lt;/li&gt;
&lt;li&gt;Budget: $2,500/month for infrastructure&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Architecture: What We Built
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Level Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmd9czp0jyxl7r2nl3se.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmd9czp0jyxl7r2nl3se.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Components:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rails API&lt;/strong&gt; - Multi-tenant document processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vectra&lt;/strong&gt; - Unified vector DB client (provider-agnostic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt; - Self-hosted vector database (cost + GDPR compliance)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sidekiq&lt;/strong&gt; - Background job processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentence-Transformers&lt;/strong&gt; - Self-hosted embedding generation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Multi-Tenant Document Processing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Challenge: Processing PDFs at Scale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; Client uploads a 300-page PDF (SEC 10-K filing)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract text from PDF&lt;/li&gt;
&lt;li&gt;Split into searchable chunks (with overlap for context)&lt;/li&gt;
&lt;li&gt;Generate embeddings for each chunk&lt;/li&gt;
&lt;li&gt;Index with tenant isolation&lt;/li&gt;
&lt;li&gt;Track processing status&lt;/li&gt;
&lt;li&gt;Audit trail for compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Implementation
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Document Model (Rails)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/models/document.rb&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:tenant&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:document_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;dependent: :destroy&lt;/span&gt;

  &lt;span class="c1"&gt;# Store embeddings as binary&lt;/span&gt;
  &lt;span class="n"&gt;serialize&lt;/span&gt; &lt;span class="ss"&gt;:metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;JSON&lt;/span&gt;

  &lt;span class="n"&gt;enum&lt;/span&gt; &lt;span class="ss"&gt;status: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;pending: &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;processing: &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;indexed: &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;failed: &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Validations&lt;/span&gt;
  &lt;span class="n"&gt;validates&lt;/span&gt; &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:file_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;presence: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
  &lt;span class="n"&gt;validates&lt;/span&gt; &lt;span class="ss"&gt;:tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;presence: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;

  &lt;span class="c1"&gt;# Callbacks&lt;/span&gt;
  &lt;span class="n"&gt;after_create&lt;/span&gt; &lt;span class="ss"&gt;:schedule_processing&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;schedule_processing&lt;/span&gt;
    &lt;span class="no"&gt;DocumentProcessingJob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perform_later&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this model?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tenant_id&lt;/code&gt; ensures every document belongs to a tenant&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;status&lt;/code&gt; enum tracks processing pipeline&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;document_chunks&lt;/code&gt; stores split chunks with embeddings&lt;/li&gt;
&lt;li&gt;Background job keeps upload fast (&amp;lt; 300ms)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Chunk Splitting Strategy
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; You can't embed an entire 300-page document. You need to split it intelligently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Sliding window with overlap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/services/document_chunker.rb&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DocumentChunker&lt;/span&gt;
  &lt;span class="no"&gt;CHUNK_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;        &lt;span class="c1"&gt;# tokens (~400 words)&lt;/span&gt;
  &lt;span class="no"&gt;CHUNK_OVERLAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;     &lt;span class="c1"&gt;# tokens for context continuity&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;
    &lt;span class="vi"&gt;@text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;split&lt;/span&gt;
    &lt;span class="c1"&gt;# Tokenize text&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Sliding window&lt;/span&gt;
    &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt;
      &lt;span class="n"&gt;chunk_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="no"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

      &lt;span class="c1"&gt;# Convert back to text&lt;/span&gt;
      &lt;span class="n"&gt;chunk_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="ss"&gt;text: &lt;/span&gt;&lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;position: &lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;page_start: &lt;/span&gt;&lt;span class="n"&gt;calculate_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="ss"&gt;page_end: &lt;/span&gt;&lt;span class="n"&gt;calculate_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="no"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="c1"&gt;# Slide window with overlap&lt;/span&gt;
      &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;CHUNK_SIZE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="no"&gt;CHUNK_OVERLAP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="n"&gt;chunks&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Using pdf-reader gem&lt;/span&gt;
    &lt;span class="n"&gt;pdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PDF&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Simple whitespace tokenization (use tiktoken for production)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\s+/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token_position&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Average 300 tokens per page&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token_position&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;300.0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why overlap?&lt;/strong&gt; Without it, search misses context at chunk boundaries. With 128-token overlap, we improved precision from &lt;strong&gt;0.73 → 0.89&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Background Processing Job
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/jobs/document_processing_job.rb&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DocumentProcessingJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationJob&lt;/span&gt;
  &lt;span class="n"&gt;queue_as&lt;/span&gt; &lt;span class="ss"&gt;:document_processing&lt;/span&gt;

  &lt;span class="c1"&gt;# Retry with exponential backoff&lt;/span&gt;
  &lt;span class="n"&gt;retry_on&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;wait: :exponentially_longer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;attempts: &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;status: :processing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Split document into chunks&lt;/span&gt;
    &lt;span class="n"&gt;chunker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;DocumentChunker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Generate embeddings for each chunk&lt;/span&gt;
    &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;EmbeddingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;
    &lt;span class="n"&gt;chunk_records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each_with_index&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

      &lt;span class="n"&gt;chunk_record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;DocumentChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="ss"&gt;document: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;text: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="ss"&gt;position: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:position&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="ss"&gt;page_start: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:page_start&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="ss"&gt;page_end: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:page_end&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="ss"&gt;embedding: &lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="n"&gt;chunk_records&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;chunk_record&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Batch index to vector DB (tenant-isolated)&lt;/span&gt;
    &lt;span class="no"&gt;VectorIndexingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Mark as indexed&lt;/span&gt;
    &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;status: :indexed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;chunk_count: &lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;indexed_at: &lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Audit log&lt;/span&gt;
    &lt;span class="no"&gt;AuditLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;event_type: &lt;/span&gt;&lt;span class="s1"&gt;'document_indexed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;resource_type: &lt;/span&gt;&lt;span class="s1"&gt;'Document'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;resource_id: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;metadata: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="ss"&gt;chunk_count: &lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;file_size: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;duration_ms: &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
    &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;status: :failed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;error_message: &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="no"&gt;Sentry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;capture_exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;extra: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;document_id: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes 15 documents/minute&lt;/li&gt;
&lt;li&gt;Average processing time: 4 seconds per document&lt;/li&gt;
&lt;li&gt;Failure rate: 0.3% (mostly PDF parsing issues)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. Embedding Service (Self-Hosted)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Why self-hosted?&lt;/strong&gt; GDPR compliance - we can't send client data to OpenAI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/services/embedding_service.rb&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingService&lt;/span&gt;
  &lt;span class="no"&gt;EMBEDDING_ENDPOINT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'EMBEDDING_SERVICE_URL'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'http://localhost:8080'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="no"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'all-mpnet-base-v2'&lt;/span&gt;  &lt;span class="c1"&gt;# 768 dimensions&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Add retry logic&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Faraday&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;url: &lt;/span&gt;&lt;span class="no"&gt;EMBEDDING_ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt; &lt;span class="ss"&gt;:json&lt;/span&gt;
      &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;response&lt;/span&gt; &lt;span class="ss"&gt;:json&lt;/span&gt;
      &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;adapter&lt;/span&gt; &lt;span class="no"&gt;Faraday&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_adapter&lt;/span&gt;
      &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
      &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open_timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/embeddings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="ss"&gt;text: &lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="no"&gt;MODEL&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="s2"&gt;"Embedding failed: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success?&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'embedding'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;Faraday&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
    &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Embedding service error: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost savings:&lt;/strong&gt; Self-hosting saves &lt;strong&gt;$200/month&lt;/strong&gt; vs OpenAI API (2M embeddings/month).&lt;/p&gt;




&lt;h2&gt;
  
  
  Vector Search with Multi-Tenant Isolation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Challenge: Tenant Data Isolation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Critical requirement:&lt;/strong&gt; Client A must NEVER see Client B's documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach:&lt;/strong&gt; Qdrant namespaces + application-level verification (defense in depth)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/services/vector_indexing_service.rb&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorIndexingService&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;
    &lt;span class="vi"&gt;@client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;build_client&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="c1"&gt;# Index entire document (batched)&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="ss"&gt;id: &lt;/span&gt;&lt;span class="n"&gt;vector_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="ss"&gt;values: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;metadata: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="ss"&gt;document_id: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;title: &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;page_start: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;page_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;page_end: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;page_end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;chunk_text: &lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# Preview only&lt;/span&gt;
          &lt;span class="ss"&gt;indexed_at: &lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iso8601&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="c1"&gt;# Batch upsert with tenant namespace&lt;/span&gt;
    &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="s1"&gt;'compliance_documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;vectors: &lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;namespace: &lt;/span&gt;&lt;span class="n"&gt;namespace_for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;concurrency: &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;"Indexed document &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; for tenant &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: "&lt;/span&gt; &lt;span class="p"&gt;\&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:success&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; chunks"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="c1"&gt;# Search within tenant (isolated)&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="ss"&gt;filters: &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Generate query embedding&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;EmbeddingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Ensure tenant isolation&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;namespace_for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Query vector DB&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="s1"&gt;'compliance_documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;vector: &lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;top_k: &lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;namespace: &lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# CRITICAL: tenant isolation&lt;/span&gt;
      &lt;span class="ss"&gt;filter: &lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;include_metadata: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify tenant_id in results (defense in depth)&lt;/span&gt;
    &lt;span class="n"&gt;verified_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'tenant_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="c1"&gt;# Log potential security issue&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verified_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;
      &lt;span class="no"&gt;SecurityAlert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="ss"&gt;severity: &lt;/span&gt;&lt;span class="s1"&gt;'critical'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;message: &lt;/span&gt;&lt;span class="s2"&gt;"Tenant isolation breach detected"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;details: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;expected: &lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;verified: &lt;/span&gt;&lt;span class="n"&gt;verified_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="n"&gt;verified_results&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_client&lt;/span&gt;
    &lt;span class="c1"&gt;# Cached client with resilience&lt;/span&gt;
    &lt;span class="n"&gt;base_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;qdrant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'QDRANT_HOST'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;api_key: &lt;/span&gt;&lt;span class="no"&gt;ENV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'QDRANT_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="ss"&gt;timeout: &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;max_retries: &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add caching layer&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="ss"&gt;ttl: &lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# 1 hour for search results&lt;/span&gt;
      &lt;span class="ss"&gt;max_size: &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;  &lt;span class="c1"&gt;# Top 5000 queries cached&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;CachedClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;cache: &lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;namespace_for_tenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="s2"&gt;"tenant_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;vector_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="s2"&gt;"chunk_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Security Best Practice: Never Trust, Always Verify
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NEVER trust the namespace alone&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;namespace: &lt;/span&gt;&lt;span class="s2"&gt;"tenant_&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ALWAYS verify in application&lt;/span&gt;
&lt;span class="n"&gt;verified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'tenant_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verified&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;
  &lt;span class="c1"&gt;# SECURITY BREACH - alert immediately&lt;/span&gt;
  &lt;span class="no"&gt;SecurityAlert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Tenant isolation breach detected"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Search API with Enterprise Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Controller
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/controllers/api/v1/search_controller.rb&lt;/span&gt;
&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;Api&lt;/span&gt;
  &lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;V1&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SearchController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApiController&lt;/span&gt;
      &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:authenticate_user!&lt;/span&gt;
      &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:rate_limit_check&lt;/span&gt;

      &lt;span class="c1"&gt;# POST /api/v1/search&lt;/span&gt;
      &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Validate query&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blank?&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="s1"&gt;'Query too short'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="ss"&gt;status: :bad_request&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;

        &lt;span class="c1"&gt;# Build filters from params&lt;/span&gt;
        &lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;build_filters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:filters&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Perform search (with timing)&lt;/span&gt;
        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VectorIndexingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;query: &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;filters: &lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Hydrate results (load Document records)&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hydrate_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Log search for analytics&lt;/span&gt;
        &lt;span class="no"&gt;SearchLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;query: &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;result_count: &lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;duration_ms: &lt;/span&gt;&lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;filters: &lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Audit log for compliance&lt;/span&gt;
        &lt;span class="no"&gt;AuditLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;event_type: &lt;/span&gt;&lt;span class="s1"&gt;'document_search'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;metadata: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="ss"&gt;query: &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="ss"&gt;result_count: &lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="ss"&gt;duration_ms: &lt;/span&gt;&lt;span class="n"&gt;duration_ms&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="ss"&gt;results: &lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;DocumentSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;as_json&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="ss"&gt;metadata: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="ss"&gt;total: &lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="ss"&gt;duration_ms: &lt;/span&gt;&lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="ss"&gt;cached: &lt;/span&gt;&lt;span class="n"&gt;cache_hit?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;Vectra&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
        &lt;span class="c1"&gt;# Handle vector DB errors gracefully&lt;/span&gt;
        &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Vector search error: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="no"&gt;Sentry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;capture_exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Fallback to SQL search&lt;/span&gt;
        &lt;span class="n"&gt;fallback_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fallback_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="ss"&gt;results: &lt;/span&gt;&lt;span class="n"&gt;fallback_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="ss"&gt;metadata: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="ss"&gt;fallback: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="s1"&gt;'Vector search unavailable'&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="ss"&gt;status: :partial_content&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="kp"&gt;private&lt;/span&gt;

      &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_filters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;

        &lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:document_type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:document_type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:document_type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:year&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:regulatory_body&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:regulatory_body&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filter_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:regulatory_body&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;filters&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hydrate_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract document IDs from vector search results&lt;/span&gt;
        &lt;span class="n"&gt;document_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'document_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;uniq&lt;/span&gt;

        &lt;span class="c1"&gt;# Load documents from DB&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;id: &lt;/span&gt;&lt;span class="n"&gt;document_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;tenant_id: &lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                           &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Attach scores to documents&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'document_id'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
          &lt;span class="k"&gt;next&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;

          &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;instance_variable_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:@search_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;instance_variable_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:@matched_chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'chunk_text'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
          &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;instance_variable_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:@matched_pages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'page_start'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'page_end'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

          &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;define_singleton_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:search_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="vi"&gt;@search_score&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;define_singleton_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:matched_chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="vi"&gt;@matched_chunk&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;define_singleton_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:matched_pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="vi"&gt;@matched_pages&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

          &lt;span class="n"&gt;doc&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limit_check&lt;/span&gt;
        &lt;span class="c1"&gt;# 100 searches per hour per user&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"search_rate_limit:&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;expires_in: &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
          &lt;span class="n"&gt;render&lt;/span&gt; &lt;span class="ss"&gt;json: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;error: &lt;/span&gt;&lt;span class="s1"&gt;'Rate limit exceeded'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="ss"&gt;status: :too_many_requests&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;API Response Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SEC 10-K Filing 2024"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"search_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matched_chunk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The company faces significant regulatory risks..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matched_pages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"45-46"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cached"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What We've Built So Far
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Multi-tenant document processing pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF → Text extraction&lt;/li&gt;
&lt;li&gt;Intelligent chunking with overlap&lt;/li&gt;
&lt;li&gt;Self-hosted embeddings (GDPR compliant)&lt;/li&gt;
&lt;li&gt;Background job processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Secure vector search&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Qdrant namespace isolation&lt;/li&gt;
&lt;li&gt;Application-level verification&lt;/li&gt;
&lt;li&gt;Audit logging&lt;/li&gt;
&lt;li&gt;Graceful fallback to SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Production-ready API&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast response times (45ms P50)&lt;/li&gt;
&lt;li&gt;Comprehensive error handling&lt;/li&gt;
&lt;li&gt;Rate limiting&lt;/li&gt;
&lt;li&gt;Search analytics&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Coming in Part 2: Production Resilience 🛡️
&lt;/h2&gt;

&lt;p&gt;Building the search feature is only half the battle. In &lt;strong&gt;Part 2&lt;/strong&gt; (Wednesday), we'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Circuit Breakers&lt;/strong&gt; - How they saved us during Black Friday when Qdrant overloaded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate Limiting&lt;/strong&gt; - Per-tenant throttling to prevent abuse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health Checks&lt;/strong&gt; - Kubernetes-ready monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus Metrics&lt;/strong&gt; - Real-time observability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana Dashboards&lt;/strong&gt; - Visualizing search performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real incident story:&lt;/strong&gt; Our Qdrant cluster hit 98% CPU during a traffic spike. Without circuit breakers, we would have had a complete outage. Instead, 99.2% of searches still worked using cached results and PostgreSQL fallback.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vectra Gem&lt;/strong&gt;: &lt;a href="https://github.com/stokry/vectra" rel="noopener noreferrer"&gt;github.com/stokry/vectra&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://vectra-docs.netlify.app/" rel="noopener noreferrer"&gt;vectra-docs.netlify.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Code&lt;/strong&gt;: &lt;code&gt;examples/comprehensive_demo.rb&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions about the architecture or implementation? Drop a comment below!&lt;/p&gt;




</description>
      <category>rails</category>
      <category>ruby</category>
      <category>ai</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Vectra — The Unified Vector Database Client for Ruby</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Fri, 09 Jan 2026 17:59:04 +0000</pubDate>
      <link>https://dev.to/stokry/vectra-the-unified-vector-database-client-for-ruby-3mj6</link>
      <guid>https://dev.to/stokry/vectra-the-unified-vector-database-client-for-ruby-3mj6</guid>
      <description>&lt;p&gt;If you've ever built AI applications in Ruby — semantic search, RAG, recommendation engines, or embedding queries — you know how quickly it becomes tedious to support multiple vector databases. Each provider has its own API, client, and quirks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vectra&lt;/strong&gt; solves this problem by providing a &lt;strong&gt;single, unified API for all popular vector DBs&lt;/strong&gt;, freeing you from vendor lock-in. Write your code once and switch backends effortlessly — Pinecone, Qdrant, Weaviate, and even &lt;strong&gt;PostgreSQL with pgvector&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/stokry/vectra/" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Why Vectra?
&lt;/h2&gt;

&lt;p&gt;Vector databases are becoming a standard in modern AI applications, yet the Ruby ecosystem lacks a consistent way to work with them. Vectra changes that with:&lt;/p&gt;

&lt;p&gt;✨ &lt;strong&gt;Provider-agnostic API&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One set of methods: &lt;code&gt;upsert&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, and more — no need to learn each vendor's SDK.
&lt;a href="https://vectra-docs.netlify.app/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚙️ &lt;strong&gt;Support for multiple databases&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pinecone
&lt;/li&gt;
&lt;li&gt;Qdrant
&lt;/li&gt;
&lt;li&gt;Weaviate
&lt;/li&gt;
&lt;li&gt;pgvector
All through the same client.
&lt;a href="https://github.com/stokry/vectra/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💪 &lt;strong&gt;Production-ready&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retry logic, configurable backoff, observability through metrics, and ready-to-use classes for production workloads.
&lt;a href="https://vectra-docs.netlify.app/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📦 &lt;strong&gt;Rails Integration&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;has_vector&lt;/code&gt; DSL for ActiveRecord models — embedding fields can be automatically indexed and searched.
&lt;a href="https://github.com/stokry/vectra/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📚 &lt;strong&gt;Well-documented gem&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guides, YARD documentation, and examples for each provider.
&lt;a href="https://vectra-docs.netlify.app/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Example usage:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;require 'vectra'

# Initialize client for any provider
client = Vectra::Client.new(
  provider: :pinecone,
  api_key: ENV['PINECONE_API_KEY'],
  environment: 'us-west-4'
)

# Upsert an embedding
client.upsert(
  vectors: [
    { id: 'doc-1', values: [0.1,0.2,0.3], metadata: { title: 'Hello' } }
  ]
)

# Query for similar vectors
results = client.query(vector: [0.1,0.2,0.3], top_k: 5)
results.each { |m| puts "#{m.id}: #{m.score}" }

# Delete
client.delete(ids: ['doc-1'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;same API works without changes&lt;/strong&gt; if you switch the client to Qdrant, Weaviate, or pgvector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/stokry/vectra/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;💡 When Vectra is Useful&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Building applications with semantic search&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implementing RAG (retrieval-augmented generation) with embeddings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoiding vendor lock-in across vector databases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Having a consistent Ruby API surface for all vector databases&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;📦 Where to Find It&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; [&lt;a href="https://github.com/stokry/vectra" rel="noopener noreferrer"&gt;https://github.com/stokry/vectra&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Documentation:&lt;/strong&gt; [&lt;a href="https://vectra-docs.netlify.app/" rel="noopener noreferrer"&gt;https://vectra-docs.netlify.app/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you’re working with embeddings and vector search in Ruby (especially in Rails apps), &lt;strong&gt;Vectra can save you hours of frustration&lt;/strong&gt; and make your architecture future-proof — letting you switch backends without rewriting code&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>ai</category>
      <category>rails</category>
    </item>
    <item>
      <title>I Found a Copy of My Blog Post Online — Here’s the Tool I Built to Track Every Plagiarized Version</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Fri, 05 Dec 2025 07:58:55 +0000</pubDate>
      <link>https://dev.to/stokry/i-found-a-copy-of-my-blog-post-online-heres-the-tool-i-built-to-track-every-plagiarized-version-18je</link>
      <guid>https://dev.to/stokry/i-found-a-copy-of-my-blog-post-online-heres-the-tool-i-built-to-track-every-plagiarized-version-18je</guid>
      <description>&lt;p&gt;Have you ever googled a paragraph from your own article…&lt;/p&gt;

&lt;p&gt;…and found &lt;strong&gt;5 other websites publishing it as if they wrote it&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;That happened to me.&lt;/p&gt;

&lt;p&gt;More than once.&lt;/p&gt;

&lt;p&gt;After dealing with DMCA requests, unanswered emails, and plagiarism detectors that give you a “score” but no actual proof, I decided to build something of my own.&lt;/p&gt;

&lt;p&gt;Today I’m sharing the tool I built — and use daily — to &lt;strong&gt;find duplicated content across the web and see exactly who copied what&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why I Built It&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I wanted three things that existing tools weren’t giving me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Actual sources&lt;/strong&gt; — not vague percentages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual diff&lt;/strong&gt; — sentence-by-sentence similarities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evidence&lt;/strong&gt; — something I could send to a hosting provider or editor&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And I wanted it to be instant: paste text → get proof.&lt;/p&gt;

&lt;p&gt;That’s how &lt;strong&gt;&lt;a href="//nooth.dev"&gt;Nooth.dev&lt;/a&gt;&lt;/strong&gt; was born.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How It Works (Demo)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You paste any piece of content — a blog post, newsletter, documentation, product description — and Nooth.dev searches the web for matches.&lt;/p&gt;

&lt;p&gt;Within a few seconds, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A list of websites containing similar or identical content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A similarity score&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A split-screen diff view (left = your original, right = the found source)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Highlighted overlaps so you can see the exact copied fragments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Domain metadata (age, authority, region)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can try it here (3 free analyses/day):&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="//nooth.dev"&gt;nooth.dev&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No login required.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;A Real Example: 74% Similarity Match&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A few days ago I tested one of my older tutorials.&lt;/p&gt;

&lt;p&gt;&lt;a href="//nooth.dev"&gt;Nooth&lt;/a&gt; found a website hosting a barely modified version of it at &lt;strong&gt;74% similarity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The diff view clearly showed block-by-block copying.&lt;/p&gt;

&lt;p&gt;That was the moment I realized: &lt;em&gt;okay — this thing is genuinely useful.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What Makes It Different&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most plagiarism tools give you:&lt;/p&gt;

&lt;p&gt;❌ No real source&lt;/p&gt;

&lt;p&gt;❌ No detailed diff&lt;/p&gt;

&lt;p&gt;❌ No transparency&lt;/p&gt;

&lt;p&gt;❌ No usable proof&lt;/p&gt;

&lt;p&gt;Nooth is more like a &lt;strong&gt;forensic tool&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It doesn’t just say “67% copied”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It shows &lt;em&gt;exactly which sentences&lt;/em&gt; match&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And &lt;em&gt;from which site&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r3241aa3eano1fdq2ew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r3241aa3eano1fdq2ew.png" alt="Nooth UI" width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you publish anything online — especially technical content — this helps you see where your work ends up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwg9bbs2jqq83xs1ei5k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwg9bbs2jqq83xs1ei5k.png" alt="Nooth UI" width="800" height="770"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Free Tier (Current MVP Stage)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Right now, during early access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;3 free analyses per day&lt;/strong&gt; (IP-based, no login)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The first results are fully visible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Additional matches are behind the upcoming registration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is fully functional — just limited to prevent abuse.&lt;/p&gt;

&lt;p&gt;Registration + unlimited mode is coming soon.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Want to Test It?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Try the live tool here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://nooth.dev" rel="noopener noreferrer"&gt;https://nooth.dev&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you find an interesting match, I’d love to hear — feel free to DM me on X or leave a comment.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What’s Next&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In the next weeks, I plan to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;User accounts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Saved reports&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Email alerts when new copies appear&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatic DMCA template&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll share weekly progress updates here on Dev.to.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>productivity</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I Built an AI That Finds Every Article Similar to Your Text — Here’s How It Works</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Tue, 02 Dec 2025 12:34:19 +0000</pubDate>
      <link>https://dev.to/stokry/i-built-an-ai-that-finds-every-article-similar-to-your-text-heres-how-it-works-bhb</link>
      <guid>https://dev.to/stokry/i-built-an-ai-that-finds-every-article-similar-to-your-text-heres-how-it-works-bhb</guid>
      <description>&lt;p&gt;Most search engines will show you pages that &lt;em&gt;mention&lt;/em&gt; the same keywords.&lt;/p&gt;

&lt;p&gt;But what if you want to find articles that express the &lt;strong&gt;same ideas&lt;/strong&gt;, even if they’re written completely differently?&lt;/p&gt;

&lt;p&gt;That’s the problem that pushed me into building something new.&lt;/p&gt;

&lt;p&gt;A few months ago, I needed to verify whether parts of my content were being reused online. Google wasn’t helpful — keyword search missed half of the semantically similar pages. Rephrased paragraphs were completely invisible.&lt;/p&gt;

&lt;p&gt;So I built my own engine.&lt;/p&gt;

&lt;p&gt;Today, I’m excited to introduce &lt;strong&gt;&lt;a href="https://nooth.dev/" rel="noopener noreferrer"&gt;Nooth&lt;/a&gt;&lt;/strong&gt;, an AI-powered system that takes &lt;em&gt;any text you enter&lt;/em&gt; and searches the web for content with similar themes, ideas, and semantic meaning.&lt;/p&gt;

&lt;p&gt;Drop in a paragraph → get URLs, similarity percentages, and deep analytical insights.&lt;/p&gt;

&lt;p&gt;Here’s how it works under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;🚀 The Idea&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Traditional search relies on keyword overlap.&lt;/p&gt;

&lt;p&gt;If two texts don’t share the same words, the engine assumes they’re unrelated.&lt;/p&gt;

&lt;p&gt;But humans don’t think that way.&lt;/p&gt;

&lt;p&gt;We recognize similarity through &lt;strong&gt;meaning&lt;/strong&gt; — the structure of ideas, not characters.&lt;/p&gt;

&lt;p&gt;So the goal was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Build an engine that understands what your text is about, then find everything on the internet that expresses the same ideas.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That meant two pieces were crucial:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A way to measure meaning&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A way to scan the web, extract clean text, and compare it&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;🧠 Step 1: Turning Text Into Meaning (Embeddings)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At the core of Nooth is a semantic model that converts text into embeddings — numerical vectors that represent meaning.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How to speed up your Node.js API”&lt;/p&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;p&gt;“Optimizing performance in a JavaScript backend”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These look different, but in vector space, their embeddings are neighbors.&lt;/p&gt;

&lt;p&gt;Every time you paste text into Nooth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;it cleans and normalizes it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;creates sentence-level embeddings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;composes them into a unified “semantic fingerprint”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That fingerprint becomes the reference point for the search.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;🌐 Step 2: Finding Content Across the Web&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Nooth crawls and indexes content using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;lightweight scrapers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;structured metadata&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;boilerplate removal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;language detection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;canonical URL resolution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;and chunking for long-form content&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn’t to store the entire HTML — only the &lt;strong&gt;meaningful part&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;titles, headings, paragraphs, and contextual metadata.&lt;/p&gt;

&lt;p&gt;Each chunk gets its own embedding.&lt;/p&gt;

&lt;p&gt;That allows highly precise, paragraph-level similarity detection.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;📊 Step 3: Computing Semantic Similarity&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once all candidates are scraped, Nooth performs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;cosine similarity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;conceptual overlap scoring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;syntactic variance weighting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;partial-match scoring (for rewritten content)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination is what lets Nooth detect even heavily rephrased articles.&lt;/p&gt;

&lt;p&gt;It doesn’t just ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Do these texts share words?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It asks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Do these texts express the same thing?”&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;🔍 Step 4: Presenting Clear Evidence&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The final result looks like this (example flow):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;You paste a paragraph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Engine breaks it into meaning blocks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It fetches semantically closest content across the web&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You get:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-   URL

-   domain

-   similarity percentage

-   highlighted overlapping ideas

-   reasoning on _why_ the match is close
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc7fbzsdrckb2qtfmd1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc7fbzsdrckb2qtfmd1s.png" alt="Ui Nooth" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This makes it useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;discovering who’s copying your content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;researching related topics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;finding competitors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;understanding what else exists on the same theme&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;generating new ideas based on existing clusters&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyj9aklbsfzco6g1pq9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyj9aklbsfzco6g1pq9x.png" alt="UI NOOTH" width="800" height="748"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;🧭 Why I Built It&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I didn’t want another Google-like keyword search.&lt;/p&gt;

&lt;p&gt;I wanted a lens into the &lt;strong&gt;semantic web&lt;/strong&gt; — the world of ideas, not strings.&lt;/p&gt;

&lt;p&gt;Whether you’re a researcher, writer, builder, or SEO nerd, sometimes you need to find content that’s &lt;em&gt;conceptually&lt;/em&gt; similar, not textually identical.&lt;/p&gt;

&lt;p&gt;Nooth solves that by combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;modern embeddings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;optimized scraping&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;semantic search&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;scoring heuristics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;and clean UI on top of everything&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It shows you what’s related on an idea level, not just word level.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;🧪 Try It Out&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you want to test it:&lt;/p&gt;

&lt;p&gt;👉 Paste any text.&lt;/p&gt;

&lt;p&gt;👉 See what the web holds that’s semantically connected.&lt;/p&gt;

&lt;p&gt;👉 Explore clusters of related ideas you didn’t even know existed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nooth.dev/" rel="noopener noreferrer"&gt;https://nooth.dev/&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;🔮 What’s Next&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I’m currently working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;deeper plagiarism detection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;topic extraction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;content clustering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;semantic timelines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;alerts when new similar content appears&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;full API for developers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of this sounds interesting, I’d love feedback.&lt;/p&gt;

&lt;p&gt;Thanks for reading — and welcome to the world of semantic discovery.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>showdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Django Has Never Let Me Down</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Wed, 13 Aug 2025 12:06:12 +0000</pubDate>
      <link>https://dev.to/stokry/django-has-never-let-me-down-19b8</link>
      <guid>https://dev.to/stokry/django-has-never-let-me-down-19b8</guid>
      <description>&lt;p&gt;I’ve been working with Django for years, and honestly — it has never once let me down.&lt;/p&gt;

&lt;p&gt;It doesn’t matter if it’s a &lt;strong&gt;huge, complex project&lt;/strong&gt; with a million moving parts, a &lt;strong&gt;small personal app&lt;/strong&gt;, or just a &lt;strong&gt;CMS for a simple website, blog, or magazine&lt;/strong&gt; — Django delivers every single time.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Big or Small, It Just Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Whether I’m building a &lt;strong&gt;full-fledged web application&lt;/strong&gt; or using Django purely as an &lt;strong&gt;API backend&lt;/strong&gt;, the experience is the same:&lt;/p&gt;

&lt;p&gt;I know &lt;strong&gt;exactly what I need to do and where to do it&lt;/strong&gt;. The structure is predictable, the patterns are clear, and there’s a kind of quiet confidence that comes with using it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Help Is Always There&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Django community is one of the friendliest and most resourceful I’ve ever encountered.&lt;/p&gt;

&lt;p&gt;If you get stuck, there’s always an answer online — whether it’s in the official docs, Stack Overflow, or countless blog posts and tutorials.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Secure by Default&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Security is not something you have to “bolt on” later with Django.&lt;/p&gt;

&lt;p&gt;It’s there from the start — CSRF protection, SQL injection prevention, secure authentication, and a track record of taking security seriously.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Easy Deployment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Deploying Django is straightforward.&lt;/p&gt;

&lt;p&gt;Whether it’s on a small VPS, a cloud service, or containerized in Docker, it’s a smooth process. No magic tricks required.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why Look Anywhere Else?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I genuinely don’t see a reason to switch to another framework. For me, Django is the perfect mix of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Stability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flexibility&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s the framework I trust, the one I know inside out, and the one that keeps delivering — no matter the scale or complexity of the project.&lt;/p&gt;




&lt;p&gt;If I had to sum it up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Django is my go-to, and I’m not planning to change that anytime soon.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>django</category>
      <category>python</category>
    </item>
    <item>
      <title>We Built a Django App Serving 10k Mobile Users, 2k Admins, and 30 Companies – Here’s What Worked</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Mon, 11 Aug 2025 10:55:03 +0000</pubDate>
      <link>https://dev.to/stokry/we-built-a-django-app-serving-10k-mobile-users-2k-admins-and-30-companies-heres-what-worked-c2k</link>
      <guid>https://dev.to/stokry/we-built-a-django-app-serving-10k-mobile-users-2k-admins-and-30-companies-heres-what-worked-c2k</guid>
      <description>&lt;p&gt;We started with a single &lt;strong&gt;monolithic Django app&lt;/strong&gt; that was buckling under &lt;strong&gt;API traffic&lt;/strong&gt; and &lt;strong&gt;ERP sync jobs&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Here's how we turned it into a &lt;strong&gt;lean, scalable platform&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scope
&lt;/h2&gt;

&lt;p&gt;Our platform now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serves &lt;strong&gt;10,000+ active mobile app users&lt;/strong&gt; daily&lt;/li&gt;
&lt;li&gt;Powers an &lt;strong&gt;admin portal&lt;/strong&gt; for 2,000+ staff&lt;/li&gt;
&lt;li&gt;Runs &lt;strong&gt;30 different company logics&lt;/strong&gt; with their own ERP integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We needed a setup that could &lt;strong&gt;scale independently&lt;/strong&gt;, keep &lt;strong&gt;costs under control&lt;/strong&gt;, and &lt;strong&gt;securely isolate tenants&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture That Worked
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. API/Web separated from Workers
&lt;/h3&gt;

&lt;p&gt;We split the API/web layer from background workers.&lt;br&gt;&lt;br&gt;
Web pods scale on &lt;strong&gt;traffic spikes&lt;/strong&gt;, workers scale on &lt;strong&gt;ERP sync queue depth&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
This means heavy data processing &lt;strong&gt;never slows down API responses&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Multi-tenant PostgreSQL (RLS)
&lt;/h3&gt;

&lt;p&gt;We chose &lt;strong&gt;Row-Level Security&lt;/strong&gt; for tenant isolation.&lt;br&gt;&lt;br&gt;
It’s secure, easier to manage than schema-per-tenant in our case, and allows shared migrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Celery + Redis
&lt;/h3&gt;

&lt;p&gt;Our ERP syncs run hourly per tenant. Celery workers handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heavy data imports&lt;/li&gt;
&lt;li&gt;Parsing and bulk upserts&lt;/li&gt;
&lt;li&gt;Notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. API Gateway
&lt;/h3&gt;

&lt;p&gt;We placed an API gateway in front of Django for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auth&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;Request shaping&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. S3 + CDN
&lt;/h3&gt;

&lt;p&gt;For media and assets. Redis is also used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session store&lt;/li&gt;
&lt;li&gt;Caching hot queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Observability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sentry&lt;/strong&gt; for errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus + Grafana&lt;/strong&gt; for metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ELK stack&lt;/strong&gt; for logs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Zero-downtime deploys
&lt;/h3&gt;

&lt;p&gt;Feature flags let us roll out new ERP parsers without breaking production.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results After 6 Months
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~30% lower cloud spend&lt;/strong&gt; compared to monolithic scaling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5–7× faster ERP syncs&lt;/strong&gt; thanks to bulk upserts &amp;amp; isolated worker scaling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero cross-tenant data leaks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New company onboarding in &amp;lt; 2 days&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Separate workloads early&lt;/strong&gt; – Scaling API and workers independently saves money and avoids performance bottlenecks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invest in observability&lt;/strong&gt; – Catching issues early is cheaper than firefighting later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenancy design is critical&lt;/strong&gt; – Choose your isolation model carefully; it’s hard to change later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature flags&lt;/strong&gt; are invaluable for safe rollouts.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Curious how others have scaled Django for multi-tenant apps — what’s your go-to trick?&lt;/p&gt;

&lt;p&gt;Need a hand with Django? Drop by &lt;a href="https://djangofixer.com/" rel="noopener noreferrer"&gt;djangofixer.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>django</category>
      <category>scalability</category>
      <category>casestudy</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Launched DjangoFixer — Here’s Why</title>
      <dc:creator>Stokry</dc:creator>
      <pubDate>Thu, 07 Aug 2025 11:22:14 +0000</pubDate>
      <link>https://dev.to/stokry/i-launched-djangofixer-heres-why-39p0</link>
      <guid>https://dev.to/stokry/i-launched-djangofixer-heres-why-39p0</guid>
      <description>&lt;p&gt;Over the past few years, I’ve been working with Django on a daily basis — both in teams and solo. And no matter how experienced the developer is, &lt;strong&gt;problems still happen&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Async views breaking in production&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strange deployment errors out of nowhere&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API endpoints returning 500s&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrations gone rogue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Can you just take a look?” messages at 11pm 😅&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After getting &lt;strong&gt;countless messages&lt;/strong&gt; from fellow devs and friends like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Hey, I’ve got this Django bug — can you fix it?”&lt;/p&gt;

&lt;p&gt;I started to realize: &lt;strong&gt;there’s a pattern here&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;People don’t always need a full-time Django consultant.&lt;/p&gt;

&lt;p&gt;They just want &lt;strong&gt;a fast, no-BS fix&lt;/strong&gt; for their current problem — without having to hire a whole agency or dig through StackOverflow for hours.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;So I built DjangoFixer.com ⚡&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://djangofixer.com" rel="noopener noreferrer"&gt;DjangoFixer&lt;/a&gt; is a small service I launched to help developers and teams &lt;strong&gt;fix Django problems quickly and professionally&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;Django emergency room&lt;/strong&gt; — focused only on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bug fixing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment troubleshooting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Async + API issues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance slowdowns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DB migrations, and more&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I don’t do everything. I just fix Django issues — &lt;strong&gt;fast&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No delays, no confusion, no vague consulting contracts.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why this makes sense (at least to me):&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;🧠 Specialization means I get better at these problems every week.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🛠️ Most teams don’t need long-term help — they just need &lt;strong&gt;this one thing fixed&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;⏱️ Turnaround time matters. When stuff is broken, you don’t want to wait 2 weeks for an agency call.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Want to try it out?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Check it out here → &lt;a href="https://djangofixer.com" rel="noopener noreferrer"&gt;DjangoFixer.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re dealing with Django bugs that slow you down — I’d be happy to help.&lt;/p&gt;

&lt;p&gt;And if you’ve built something similar or are running a microservice business, let’s connect! 🙌&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>django</category>
    </item>
  </channel>
</rss>
