<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tilak Raj</title>
    <description>The latest articles on DEV Community by Tilak Raj (@tilak_raj_tech).</description>
    <link>https://dev.to/tilak_raj_tech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843763%2F7ca40159-02b5-4c8a-af8b-a35ee47a408d.jpeg</url>
      <title>DEV Community: Tilak Raj</title>
      <link>https://dev.to/tilak_raj_tech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tilak_raj_tech"/>
    <language>en</language>
    <item>
      <title>Why I Switched From GPT-4 to Small Language Models for Two of My Products</title>
      <dc:creator>Tilak Raj</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:27:16 +0000</pubDate>
      <link>https://dev.to/tilak_raj_tech/why-i-switched-from-gpt-4-to-small-language-models-for-two-of-my-products-4ib4</link>
      <guid>https://dev.to/tilak_raj_tech/why-i-switched-from-gpt-4-to-small-language-models-for-two-of-my-products-4ib4</guid>
      <description>&lt;p&gt;GPT-4 and Claude Sonnet are not always the right model for the job. After 18 months of running AI products in production, I've moved two of my products from frontier models to small language models — and the results have been better latency, lower cost, and in one case, higher accuracy on the specific task. Here is exactly what I did and why. &lt;/p&gt;




&lt;h1&gt;
  
  
  Background: The Two Products That Changed
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Product 1: AgriIntel — Crop recommendation classification
&lt;/h2&gt;

&lt;p&gt;AgriIntel uses AI to classify incoming sensor data events and route them to the appropriate recommendation workflow.&lt;/p&gt;

&lt;p&gt;The classification task is:&lt;/p&gt;

&lt;p&gt;Given a set of sensor readings (soil moisture, temperature, nutrient levels, weather forecast), classify what type of agronomic decision is needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Irrigation&lt;/li&gt;
&lt;li&gt;Fertilization&lt;/li&gt;
&lt;li&gt;Pest management&lt;/li&gt;
&lt;li&gt;Harvest timing&lt;/li&gt;
&lt;li&gt;No action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a classification task with a fixed taxonomy. GPT-4o handled it well, but at &lt;strong&gt;$0.005 per classification&lt;/strong&gt; and &lt;strong&gt;15,000+ classifications per day&lt;/strong&gt;, the bill came to at least $75 per day.&lt;/p&gt;

&lt;p&gt;Latency was also &lt;strong&gt;800ms–1.2s&lt;/strong&gt; for a task where users expect near-instant feedback. &lt;/p&gt;




&lt;h2&gt;
  
  
  Product 2: CanadaCompliance — Regulation change impact classification
&lt;/h2&gt;

&lt;p&gt;CanadaCompliance.ai monitors regulatory changes and classifies each change by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Industry sector affected&lt;/li&gt;
&lt;li&gt;Type of obligation (new requirement, amendment, repeal)&lt;/li&gt;
&lt;li&gt;Urgency level (immediate action, planning horizon, informational)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Again, this is fixed-taxonomy classification at high volume.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Small Language Models Made Sense
&lt;/h1&gt;

&lt;p&gt;The key insight:&lt;/p&gt;

&lt;p&gt;Frontier models are optimized for general capability. For specific classification tasks, that capability is overkill — and you pay for it in cost and latency.&lt;/p&gt;

&lt;p&gt;Small language models (Phi-3, Mistral 7B, Llama 3.2) are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Much faster (50–200ms vs 800ms–2s)&lt;/li&gt;
&lt;li&gt;Much cheaper (10–100× lower cost)&lt;/li&gt;
&lt;li&gt;Fine-tuneable to specific tasks&lt;/li&gt;
&lt;li&gt;Privately hostable for data residency needs &lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  The Fine-Tuning Process for AgriIntel
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Step 1: Build training dataset
&lt;/h2&gt;

&lt;p&gt;I built the labeled dataset with GPT-4o, the model I was replacing, by having it label 3,000 examples.&lt;/p&gt;

&lt;p&gt;This is a common pattern:&lt;br&gt;
Use a strong model to generate training data for a smaller model.&lt;/p&gt;

&lt;p&gt;Example workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate labeled examples&lt;/li&gt;
&lt;li&gt;Format JSONL dataset&lt;/li&gt;
&lt;li&gt;Prepare training pipeline&lt;/li&gt;
&lt;/ul&gt;
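
&lt;p&gt;The "format JSONL dataset" step can be sketched as follows. This is a minimal TypeScript sketch, not the exact pipeline used for AgriIntel: the &lt;code&gt;SensorReading&lt;/code&gt; fields, the system prompt wording, and the &lt;code&gt;toTrainingRecord&lt;/code&gt; and &lt;code&gt;toJsonl&lt;/code&gt; helpers are assumptions for illustration. The layout (one JSON object per line, each with a &lt;code&gt;messages&lt;/code&gt; array) follows the chat format that OpenAI's fine-tuning API accepts.&lt;/p&gt;

```typescript
// Hypothetical shapes: field names are illustrative, not AgriIntel's real schema.
type Label =
  | "irrigation"
  | "fertilization"
  | "pest_management"
  | "harvest_timing"
  | "no_action";

interface SensorReading {
  soilMoisture: number;
  temperatureC: number;
  nitrogenPpm: number;
  forecast: string;
}

interface LabeledExample {
  reading: SensorReading;
  label: Label;
}

// Assumed prompt wording; a real system prompt would be tuned to the task.
const SYSTEM_PROMPT =
  "Classify the agronomic decision needed. Answer with exactly one label.";

// Turn one labeled example into a chat-format training record.
function toTrainingRecord(example: LabeledExample) {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: JSON.stringify(example.reading) },
      { role: "assistant", content: example.label },
    ],
  };
}

// JSONL: one JSON object per line, no enclosing array.
function toJsonl(examples: LabeledExample[]): string {
  return examples.map((e) => JSON.stringify(toTrainingRecord(e))).join("\n");
}

const sample: LabeledExample[] = [
  {
    reading: { soilMoisture: 0.12, temperatureC: 31, nitrogenPpm: 40, forecast: "dry" },
    label: "irrigation",
  },
  {
    reading: { soilMoisture: 0.35, temperatureC: 22, nitrogenPpm: 8, forecast: "rain" },
    label: "fertilization",
  },
];

console.log(toJsonl(sample));
```

&lt;p&gt;Keeping the assistant message to the bare label is a deliberate choice in this sketch: the fine-tuned model then learns to answer with one taxonomy value and nothing else.&lt;/p&gt;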

&lt;h2&gt;
  
  
  Step 2: Fine-tune the model
&lt;/h2&gt;

&lt;p&gt;I fine-tuned GPT-4o-mini using OpenAI’s fine-tuning API.&lt;/p&gt;

&lt;p&gt;Why GPT-4o-mini?&lt;/p&gt;

&lt;p&gt;It is smaller and cheaper than GPT-4o, it can match or beat the larger model on a narrow task once fine-tuned, and it keeps the simplicity of the OpenAI API. &lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Benchmark results
&lt;/h2&gt;

&lt;p&gt;Before switching production traffic, I tested both models on a 500-example dataset:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPT-4o:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 96.2%&lt;/li&gt;
&lt;li&gt;Latency: 1100ms&lt;/li&gt;
&lt;li&gt;Cost: $0.0048 per call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fine-tuned GPT-4o-mini:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 97.1%&lt;/li&gt;
&lt;li&gt;Latency: 280ms&lt;/li&gt;
&lt;li&gt;Cost: $0.00048 per call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost reduction: 90% ($0.0048 to $0.00048 per call)&lt;/li&gt;
&lt;li&gt;Latency reduction: about 75% (1100ms to 280ms)&lt;/li&gt;
&lt;li&gt;Accuracy improvement: +0.9 percentage points (96.2% to 97.1%) &lt;/li&gt;
&lt;/ul&gt;
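
&lt;p&gt;The improvement figures follow directly from the two benchmark rows. A small TypeScript sketch of the arithmetic, using only the numbers reported above (the &lt;code&gt;pctReduction&lt;/code&gt; helper is introduced here for illustration):&lt;/p&gt;

```typescript
// Benchmark figures as reported in the results above.
const gpt4o = { accuracy: 96.2, latencyMs: 1100, costPerCall: 0.0048 };
const fineTunedMini = { accuracy: 97.1, latencyMs: 280, costPerCall: 0.00048 };

// Relative reduction from `before` to `after`, as a percentage.
function pctReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

const costReduction = pctReduction(gpt4o.costPerCall, fineTunedMini.costPerCall);
const latencyReduction = pctReduction(gpt4o.latencyMs, fineTunedMini.latencyMs);
// Accuracy is compared in percentage points, not as a relative change.
const accuracyGainPoints = fineTunedMini.accuracy - gpt4o.accuracy;

console.log(costReduction.toFixed(1));      // 90.0
console.log(latencyReduction.toFixed(1));   // 74.5
console.log(accuracyGainPoints.toFixed(1)); // 0.9
```

&lt;p&gt;The latency figure works out to 74.5%, which rounds to the 75% quoted in the list.&lt;/p&gt;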




&lt;h1&gt;
  
  
  Why the Fine-Tuned Model Performed Better
&lt;/h1&gt;

&lt;p&gt;GPT-4o tries to be helpful and nuanced, which on a fixed-label task sometimes adds unnecessary complexity to the answer.&lt;/p&gt;

&lt;p&gt;The fine-tuned model learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact taxonomy&lt;/li&gt;
&lt;li&gt;Expected output structure&lt;/li&gt;
&lt;li&gt;Domain edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For structured classification tasks, precision beats general capability.&lt;/p&gt;

&lt;p&gt;Fine-tuning teaches the model how to apply its knowledge to your specific domain. &lt;/p&gt;




&lt;h1&gt;
  
  
  When NOT to Use Small Language Models
&lt;/h1&gt;

&lt;p&gt;This approach is &lt;strong&gt;NOT&lt;/strong&gt; a good fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open-ended generation (reports, documents)&lt;/li&gt;
&lt;li&gt;Complex reasoning tasks&lt;/li&gt;
&lt;li&gt;Low-volume workloads&lt;/li&gt;
&lt;li&gt;Rapidly changing taxonomies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use frontier models when flexibility matters more than cost. &lt;/p&gt;




&lt;h1&gt;
  
  
  Decision Framework
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Use fine-tuned SLM when:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Volume &amp;gt; 1,000 calls/day&lt;/li&gt;
&lt;li&gt;Fixed taxonomy&lt;/li&gt;
&lt;li&gt;Stable task definition&lt;/li&gt;
&lt;li&gt;Latency matters&lt;/li&gt;
&lt;li&gt;Cost matters&lt;/li&gt;
&lt;li&gt;You have training data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use frontier models when:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Volume is low&lt;/li&gt;
&lt;li&gt;Task requires reasoning&lt;/li&gt;
&lt;li&gt;Task changes frequently&lt;/li&gt;
&lt;li&gt;No training data exists&lt;/li&gt;
&lt;li&gt;Output quality variance is risky &lt;/li&gt;
&lt;/ul&gt;
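
&lt;p&gt;The two checklists can be read as a simple rule. The TypeScript sketch below encodes them as one function. Treating every item as a hard requirement is a simplification made here, and the &lt;code&gt;TaskProfile&lt;/code&gt; fields and &lt;code&gt;recommendModel&lt;/code&gt; name are hypothetical. Only the 1,000 calls/day threshold comes from the list above.&lt;/p&gt;

```typescript
// Hypothetical task profile; the fields mirror the checklist items above.
interface TaskProfile {
  callsPerDay: number;
  fixedTaxonomy: boolean;
  stableDefinition: boolean;
  needsReasoning: boolean;
  hasTrainingData: boolean;
}

type Recommendation = "fine-tuned SLM" | "frontier model";

// Volume threshold taken from the checklist (more than 1,000 calls/day).
const MIN_CALLS_PER_DAY = 1000;

function recommendModel(task: TaskProfile): Recommendation {
  // Any single frontier-model condition is enough to rule out fine-tuning
  // in this simplified reading of the framework.
  const blockers = [
    !task.hasTrainingData,
    task.needsReasoning,
    !task.stableDefinition,
    !task.fixedTaxonomy,
    !(task.callsPerDay > MIN_CALLS_PER_DAY),
  ];
  return blockers.some((b) => b) ? "frontier model" : "fine-tuned SLM";
}

// AgriIntel-like workload: high volume, fixed labels, labeled data available.
console.log(
  recommendModel({
    callsPerDay: 15000,
    fixedTaxonomy: true,
    stableDefinition: true,
    needsReasoning: false,
    hasTrainingData: true,
  })
); // fine-tuned SLM
```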




&lt;h1&gt;
  
  
  Results Summary
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;AgriIntel improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cost reduction: &lt;strong&gt;90%&lt;/strong&gt;&lt;br&gt;
Latency reduction: &lt;strong&gt;about 75%&lt;/strong&gt;&lt;br&gt;
Accuracy improvement: &lt;strong&gt;+0.9 percentage points&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Monthly savings:&lt;br&gt;
&lt;strong&gt;$3,100/month (~$37,000/year)&lt;/strong&gt; &lt;/p&gt;




&lt;h1&gt;
  
  
  About the Author
&lt;/h1&gt;

&lt;p&gt;Tilak Raj is CEO &amp;amp; Founder of Brainfy AI, building vertical AI SaaS products across agriculture, insurance, aviation compliance, and real estate.&lt;/p&gt;

&lt;p&gt;Website:&lt;br&gt;
&lt;a href="https://tilakraj.info" rel="noopener noreferrer"&gt;https://tilakraj.info&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Projects:&lt;br&gt;
&lt;a href="https://tilakraj.info/projects" rel="noopener noreferrer"&gt;https://tilakraj.info/projects&lt;/a&gt; &lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>openai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building AI Agents That Actually Work in Production: My Technical Approach</title>
      <dc:creator>Tilak Raj</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:24:59 +0000</pubDate>
      <link>https://dev.to/tilak_raj_tech/building-ai-agents-that-actually-work-in-production-my-technical-approach-2oap</link>
      <guid>https://dev.to/tilak_raj_tech/building-ai-agents-that-actually-work-in-production-my-technical-approach-2oap</guid>
      <description>&lt;p&gt;Building an AI agent that works in a demo is easy. Building one that works reliably in production is a completely different engineering challenge.&lt;/p&gt;

&lt;p&gt;Production systems must handle real users, real data, and real consequences when things fail.&lt;/p&gt;

&lt;p&gt;This is the production agent architecture I use across Brainfy AI and Navlyt, along with real code patterns and failure modes I design around. &lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes Production Agents Different From Demo Agents
&lt;/h2&gt;

&lt;p&gt;Demo agents optimize for the happy path.&lt;/p&gt;

&lt;p&gt;Production agents must handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real data variance&lt;/strong&gt;&lt;br&gt;
Production inputs are messy, ambiguous, and full of edge cases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concurrent executions&lt;/strong&gt;&lt;br&gt;
Multiple agent instances running simultaneously with shared state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long-running tasks&lt;/strong&gt;&lt;br&gt;
Agents may run for minutes or hours, so execution state has to be durable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost management&lt;/strong&gt;&lt;br&gt;
Confused agents making unnecessary tool calls can become expensive quickly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;br&gt;
You must understand exactly what the agent decided and why.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Core Architecture: Durable Agent State
&lt;/h2&gt;

&lt;p&gt;The most important production decision:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep agent state in a database — not in memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In-memory state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dies with the server&lt;/li&gt;
&lt;li&gt;Cannot scale horizontally&lt;/li&gt;
&lt;li&gt;Cannot be audited&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Database state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Survives restarts&lt;/li&gt;
&lt;li&gt;Enables horizontal scaling&lt;/li&gt;
&lt;li&gt;Provides observability&lt;/li&gt;
&lt;li&gt;Enables debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Agent execution state table&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;agent_executions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;

 &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;agent_type&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;valid_status&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s1"&gt;'running'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s1"&gt;'completed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s1"&gt;'failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s1"&gt;'cancelled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s1"&gt;'awaiting_review'&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;),&lt;/span&gt;

 &lt;span class="n"&gt;input_data&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="k"&gt;result&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;step_count&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

 &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

 &lt;span class="n"&gt;completed_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;

&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Tool call log for observability&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;agent_tool_calls&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;

 &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;execution_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;agent_executions&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;step_number&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;tool_input&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;tool_output&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;latency_ms&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;called_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Agent Loop With Production Safeguards
&lt;/h2&gt;

&lt;p&gt;Production agents need hard limits.&lt;/p&gt;

&lt;p&gt;Example safeguards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step limits&lt;/li&gt;
&lt;li&gt;Token limits&lt;/li&gt;
&lt;li&gt;Timeout limits&lt;/li&gt;
&lt;li&gt;Failure conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example TypeScript loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/agents/production-agent.ts&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AGENT_LIMITS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

 &lt;span class="na"&gt;maxSteps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="na"&gt;stepTimeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="na"&gt;totalTimeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

 &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SupabaseClient&lt;/span&gt;

&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

 &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;execution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="nx"&gt;supabase&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;running&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="nx"&gt;supabase&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
     &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;

   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;step_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;AGENT_LIMITS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxSteps&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;

     &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;failWithReason&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;MAX_STEPS_EXCEEDED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;supabase&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="k"&gt;return&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;AGENT_LIMITS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;

     &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;failWithReason&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;MAX_TOKENS_EXCEEDED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;supabase&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="k"&gt;return&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;AGENT_LIMITS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;totalTimeoutMs&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;

     &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;failWithReason&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TOTAL_TIMEOUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;supabase&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="k"&gt;return&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
     &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;step_count&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;

   &lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt;
     &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

   &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;persistState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;supabase&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
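
&lt;p&gt;The three limit checks in the loop can also be factored into a pure function, which makes them testable without a database. A minimal sketch, assuming the same limit values as &lt;code&gt;AGENT_LIMITS&lt;/code&gt;. The &lt;code&gt;checkLimits&lt;/code&gt; function and the &lt;code&gt;LimitReason&lt;/code&gt; and &lt;code&gt;Counters&lt;/code&gt; types are hypothetical names, not part of the code above.&lt;/p&gt;

```typescript
// Same values as AGENT_LIMITS in the loop, repeated so this sketch is self-contained.
const LIMITS = { maxSteps: 25, maxTokens: 50_000, totalTimeoutMs: 300_000 };

type LimitReason = "MAX_STEPS_EXCEEDED" | "MAX_TOKENS_EXCEEDED" | "TOTAL_TIMEOUT";

// The two counters the loop persists on each execution row.
interface Counters {
  step_count: number;
  token_count: number;
}

// Returns the first limit that has been hit, or null if the agent may continue.
function checkLimits(counters: Counters, elapsedMs: number): LimitReason | null {
  if (counters.step_count >= LIMITS.maxSteps) return "MAX_STEPS_EXCEEDED";
  if (counters.token_count >= LIMITS.maxTokens) return "MAX_TOKENS_EXCEEDED";
  if (elapsedMs >= LIMITS.totalTimeoutMs) return "TOTAL_TIMEOUT";
  return null;
}

console.log(checkLimits({ step_count: 3, token_count: 1200 }, 5_000));   // null
console.log(checkLimits({ step_count: 25, token_count: 1200 }, 5_000));  // MAX_STEPS_EXCEEDED
console.log(checkLimits({ step_count: 3, token_count: 1200 }, 300_000)); // TOTAL_TIMEOUT
```

&lt;p&gt;The loop would then call this once per iteration and pass any non-null reason to &lt;code&gt;failWithReason&lt;/code&gt;.&lt;/p&gt;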






&lt;h2&gt;
  
  
  The Human-in-the-Loop Gate
&lt;/h2&gt;

&lt;p&gt;For actions that are difficult to reverse, I require human approval.&lt;/p&gt;

&lt;p&gt;The agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prepares the action&lt;/li&gt;
&lt;li&gt;Sets status to awaiting_review&lt;/li&gt;
&lt;li&gt;Stops execution&lt;/li&gt;
&lt;li&gt;Waits for approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;APPROVAL_REQUIRED_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;

 &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;send_email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;update_customer_record&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;generate_compliance_document&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;submit_to_regulator&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;executeToolCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

 &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="nx"&gt;supabase&lt;/span&gt;

&lt;span class="p"&gt;){&lt;/span&gt;

 &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;APPROVAL_REQUIRED_TOOLS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)){&lt;/span&gt;

   &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;awaiting_review&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;supabase&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AgentPausedError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Human approval required&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;

 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Monitoring: What I Track in Production
&lt;/h2&gt;

&lt;p&gt;Metrics I monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step efficiency&lt;/li&gt;
&lt;li&gt;Tool success rate&lt;/li&gt;
&lt;li&gt;Human review escalation rate&lt;/li&gt;
&lt;li&gt;Token cost per completion&lt;/li&gt;
&lt;li&gt;Completion rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example health query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

 &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent_health_metrics&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="p"&gt;{&lt;/span&gt;

   &lt;span class="na"&gt;agent_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compliance_document_generator&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

   &lt;span class="na"&gt;since&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;
       &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
     &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typical results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Completion rate: &lt;strong&gt;94%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Avg steps: &lt;strong&gt;8.3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Human review rate: &lt;strong&gt;3.1%&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
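
&lt;p&gt;For reference, the three figures above can be derived from raw run records along these lines. This is a minimal sketch, not the actual &lt;code&gt;agent_health_metrics&lt;/code&gt; function: the &lt;code&gt;AgentRun&lt;/code&gt; shape and the &lt;code&gt;summarizeRuns&lt;/code&gt; name are assumptions made for illustration.&lt;/p&gt;

```typescript
// Hypothetical run record; the field names are assumptions for illustration.
interface AgentRun {
  completed: boolean;
  steps: number;
  sentToHumanReview: boolean;
}

interface HealthMetrics {
  completionRate: number; // fraction of runs that finished, 0..1
  avgSteps: number; // mean number of steps per run
  humanReviewRate: number; // fraction of runs escalated to a person, 0..1
}

function summarizeRuns(runs: AgentRun[]): HealthMetrics {
  // Avoid dividing by zero when there is no data for the period.
  if (runs.length === 0) {
    return { completionRate: 0, avgSteps: 0, humanReviewRate: 0 };
  }
  const completed = runs.filter((r) => r.completed).length;
  const reviewed = runs.filter((r) => r.sentToHumanReview).length;
  const totalSteps = runs.reduce((sum, r) => sum + r.steps, 0);
  return {
    completionRate: completed / runs.length,
    avgSteps: totalSteps / runs.length,
    humanReviewRate: reviewed / runs.length,
  };
}
```

&lt;p&gt;Doing the same aggregation inside a database function keeps the raw rows on the server, which is presumably why the query above goes through an RPC call.&lt;/p&gt;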




&lt;h2&gt;
  
  
  Key Lessons
&lt;/h2&gt;

&lt;p&gt;Production agents require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Durable state&lt;/li&gt;
&lt;li&gt;Hard execution limits&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Cost controls&lt;/li&gt;
&lt;li&gt;Human approval gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most failures come from missing safeguards, not model quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tilak Raj&lt;/strong&gt;&lt;br&gt;
Founder &amp;amp; CEO — Brainfy AI&lt;/p&gt;

&lt;p&gt;Building vertical AI SaaS across compliance, real estate, agriculture, and aviation.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.tilakraj.info" rel="noopener noreferrer"&gt;https://www.tilakraj.info&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Projects: &lt;a href="https://www.tilakraj.info/projects" rel="noopener noreferrer"&gt;https://www.tilakraj.info/projects&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions about production agents? Drop a comment — I reply to all of them.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>machinelearning</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Compound AI Systems: How I Connect Multiple Models in a Single Production Product</title>
      <dc:creator>Tilak Raj</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:22:06 +0000</pubDate>
      <link>https://dev.to/tilak_raj_tech/compound-ai-systems-how-i-connect-multiple-models-in-a-single-production-product-5cca</link>
      <guid>https://dev.to/tilak_raj_tech/compound-ai-systems-how-i-connect-multiple-models-in-a-single-production-product-5cca</guid>
      <description>&lt;h2&gt;
  
  
  Why Single-Model AI Is Not Enough
&lt;/h2&gt;

&lt;p&gt;Single-model AI calls are increasingly insufficient for production AI products.&lt;/p&gt;

&lt;p&gt;The most capable AI systems today combine multiple models, retrievers, validators, and tools working together.&lt;/p&gt;

&lt;p&gt;This article covers the compound AI architecture I've settled on after building multiple production products, along with patterns from systems that have shipped.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Compound AI System?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;compound AI system&lt;/strong&gt; routes different parts of a task to the most appropriate component instead of sending everything to a single model.&lt;/p&gt;

&lt;p&gt;These components typically include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple language models (different models for different subtasks)&lt;/li&gt;
&lt;li&gt;Retrieval systems (vector databases, search, structured queries)&lt;/li&gt;
&lt;li&gt;Code executors (data analysis, calculations, transformations)&lt;/li&gt;
&lt;li&gt;External tool calls (APIs, databases, file systems)&lt;/li&gt;
&lt;li&gt;Validation and checking components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The orchestration layer decides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which components handle each part of the task&lt;/li&gt;
&lt;li&gt;How context flows between components&lt;/li&gt;
&lt;li&gt;How outputs are combined into a final response&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Architecture I Use: Orchestrator + Specialist Pattern
&lt;/h2&gt;

&lt;p&gt;Across my products, I've found the &lt;strong&gt;orchestrator + specialist pattern&lt;/strong&gt; to be the most reliable compound architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestrator
&lt;/h3&gt;

&lt;p&gt;A planning model that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receives the full task&lt;/li&gt;
&lt;li&gt;Breaks it into subtasks&lt;/li&gt;
&lt;li&gt;Decides which specialist handles each subtask&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical models I use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o&lt;/li&gt;
&lt;li&gt;Claude Sonnet&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Specialists
&lt;/h3&gt;

&lt;p&gt;Purpose-built components for specific subtasks.&lt;/p&gt;

&lt;p&gt;These may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI models&lt;/li&gt;
&lt;li&gt;Deterministic backend code&lt;/li&gt;
&lt;li&gt;Retrieval systems&lt;/li&gt;
&lt;li&gt;Processing pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Validator
&lt;/h3&gt;

&lt;p&gt;A lightweight checking component that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validates outputs&lt;/li&gt;
&lt;li&gt;Catches likely hallucinations&lt;/li&gt;
&lt;li&gt;Ensures format correctness&lt;/li&gt;
&lt;li&gt;Confirms requirements before returning results&lt;/li&gt;
&lt;/ul&gt;
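
&lt;p&gt;The format-checking part of a validator can be very small. The sketch below is one possible shape, assuming the specialist returns a JSON object as a string; &lt;code&gt;validateJsonOutput&lt;/code&gt; and &lt;code&gt;ValidationReport&lt;/code&gt; are hypothetical names. It only covers format and required fields, not hallucination detection.&lt;/p&gt;

```typescript
// Hypothetical validator sketch: checks that a specialist's raw output is a
// JSON object and that every required field is present and non-empty.
interface ValidationReport {
  ok: boolean;
  problems: string[];
}

function validateJsonOutput(raw: string, requiredKeys: string[]): ValidationReport {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, problems: ["output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, problems: ["output is not a JSON object"] };
  }
  const fields = parsed as { [key: string]: unknown };
  const problems: string[] = [];
  for (const key of requiredKeys) {
    const value = fields[key];
    // Treat missing, null, and empty-string values as unmet requirements.
    if (value === undefined || value === null || value === "") {
      problems.push(`missing required field: ${key}`);
    }
  }
  return { ok: problems.length === 0, problems };
}
```

&lt;p&gt;Returning a list of problems rather than a bare boolean lets the orchestrator decide whether to retry the specialist or escalate.&lt;/p&gt;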




&lt;h2&gt;
  
  
  Example TypeScript Architecture
&lt;/h2&gt;

&lt;p&gt;Here is a simplified version of how I structure compound AI orchestration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// types/compound-ai.ts&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Task&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;requiredOutputType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;SubTask&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;parentTaskId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;specialistType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SpecialistType&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;dependsOn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;SpecialistType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rag_retrieval&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;document_extraction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;compliance_check&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;draft_generation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;validation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code_execution&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;structured_extraction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;SpecialistResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;subTaskId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;confidenceScore&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
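
&lt;p&gt;The &lt;code&gt;dependsOn&lt;/code&gt; field in &lt;code&gt;SubTask&lt;/code&gt; implies an execution order. As a rough sketch of how an orchestrator could derive that order, the helper below schedules sub-tasks layer by layer. &lt;code&gt;orderSubTasks&lt;/code&gt; and the trimmed-down &lt;code&gt;SubTaskNode&lt;/code&gt; type are assumptions for illustration, not part of the types above.&lt;/p&gt;

```typescript
// Hypothetical helper: orders sub-tasks so that each one is scheduled only
// after every id listed in its dependsOn array.
interface SubTaskNode {
  id: string;
  dependsOn: string[];
}

function orderSubTasks(subTasks: SubTaskNode[]): string[] {
  const ordered: string[] = [];
  const done: { [id: string]: boolean } = {};
  let remaining = subTasks.slice();

  while (remaining.length > 0) {
    // A sub-task is ready once all of its dependencies are already scheduled.
    const ready = remaining.filter((t) => t.dependsOn.every((dep) => done[dep] === true));
    if (ready.length === 0) {
      // Nothing can make progress: a cycle, or a dependency that does not exist.
      throw new Error("Cyclic or missing dependency among sub-tasks");
    }
    for (const t of ready) {
      ordered.push(t.id);
    }
    for (const t of ready) {
      done[t.id] = true;
    }
    remaining = remaining.filter((t) => done[t.id] !== true);
  }
  return ordered;
}
```

&lt;p&gt;Sub-tasks that become ready in the same pass have no dependencies on each other, so an orchestrator could also run each pass in parallel.&lt;/p&gt;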






&lt;h2&gt;
  
  
  Why This Architecture Works
&lt;/h2&gt;

&lt;p&gt;This pattern works because it mirrors how real engineering systems scale.&lt;/p&gt;

&lt;p&gt;Instead of forcing one model to do everything, you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break problems into smaller parts&lt;/li&gt;
&lt;li&gt;Assign the right tool to each task&lt;/li&gt;
&lt;li&gt;Validate before merging results&lt;/li&gt;
&lt;li&gt;Keep orchestration logic separate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my experience, this improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliability&lt;/li&gt;
&lt;li&gt;Cost efficiency&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Output quality&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Lesson
&lt;/h2&gt;

&lt;p&gt;The biggest improvement in AI systems doesn't come from better prompts.&lt;/p&gt;

&lt;p&gt;It comes from &lt;strong&gt;better architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The teams that win with AI products are not the ones using the newest models.&lt;/p&gt;

&lt;p&gt;They are the ones building &lt;strong&gt;repeatable compound systems&lt;/strong&gt; that combine models, tools, and validation layers effectively.&lt;/p&gt;








</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>openai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Next.js + Supabase + OpenAI. The exact stack I use to ship AI SaaS in 30 days</title>
      <dc:creator>Tilak Raj</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:14:50 +0000</pubDate>
      <link>https://dev.to/tilak_raj_tech/nextjs-supabase-openai-the-exact-stack-i-use-to-ship-ai-saas-in-30-days-59d9</link>
      <guid>https://dev.to/tilak_raj_tech/nextjs-supabase-openai-the-exact-stack-i-use-to-ship-ai-saas-in-30-days-59d9</guid>
      <description>&lt;h2&gt;
  
  
  Next.js + Supabase + OpenAI. The exact stack I use to ship AI SaaS in 30 days
&lt;/h2&gt;

&lt;p&gt;I have shipped 8 production AI SaaS products using this stack. This is not a beginner tutorial. This is the production architecture I use after learning what fails at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Next.js App Router&lt;/strong&gt;&lt;br&gt;
Server components reduce data-fetching complexity. API routes stay close to the features they serve. TypeScript everywhere. Simple Vercel deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supabase&lt;/strong&gt;&lt;br&gt;
PostgreSQL. Auth. Storage. Realtime. Row Level Security. One platform instead of five services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;&lt;br&gt;
Reliable API. Structured outputs. Function calling. I also use Claude and other models but OpenAI patterns remain my baseline.&lt;/p&gt;

&lt;p&gt;The main reason is familiarity. I know the failure points and scaling limits. That removes decision overhead.&lt;/p&gt;
&lt;h2&gt;
  
  
  Project structure
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-ai-saas/

app/
 (auth)/
   login/page.tsx
   signup/page.tsx

 (dashboard)/
   layout.tsx
   page.tsx

 [feature]/
   page.tsx
   _components/

 api/
   ai/
     generate/route.ts
     stream/route.ts

   webhooks/
     stripe/route.ts

lib/
 supabase/
   client.ts
   server.ts

 ai/
   client.ts
   prompts/

types/
 database.types.ts
 api.types.ts

supabase/
 migrations/
 seed.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Supabase patterns that matter
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Row level security from day one
&lt;/h3&gt;

&lt;p&gt;Every table that holds user data gets RLS enabled before the first production data lands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;users_select_own_documents&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this policy in place, a query that forgets to filter by user still cannot return another user's rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server client pattern
&lt;/h3&gt;

&lt;p&gt;Different Supabase clients for server and browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
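&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/supabase/server.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createServerClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@supabase/ssr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cookies&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/headers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cookieStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;createServerClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="c1"&gt;// Non-null assertions: both variables must be set in the environment&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_SUPABASE_URL&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_SUPABASE_ANON_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="na"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
    &lt;span class="c1"&gt;// Read-only cookie access; add setAll if this client must persist refreshed sessions&lt;/span&gt;
    &lt;span class="nf"&gt;getAll&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cookieStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAll&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;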
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  OpenAI patterns I always use
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Structured outputs
&lt;/h3&gt;

&lt;p&gt;Never parse model output by hand. Always validate it against a schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
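&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ClaimDataSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;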

 &lt;span class="na"&gt;claimant_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
 &lt;span class="na"&gt;policy_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
 &lt;span class="na"&gt;date_of_loss&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
 &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This removed parsing bugs across products.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming responses
&lt;/h3&gt;

&lt;p&gt;If generation takes more than 2 seconds, I stream the response.&lt;/p&gt;

&lt;p&gt;Users prefer progressive output instead of waiting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication flow
&lt;/h2&gt;

&lt;p&gt;Supabase middleware protects dashboard routes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside middleware(request), once `user` has been resolved from the session&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nextUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)){&lt;/span&gt;

 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple and repeatable across products.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production checklist before launch
&lt;/h2&gt;

&lt;p&gt;Before every launch I verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RLS enabled on every table&lt;/li&gt;
&lt;li&gt;Environment variables secured&lt;/li&gt;
&lt;li&gt;React error boundaries added&lt;/li&gt;
&lt;li&gt;AI output validation enabled&lt;/li&gt;
&lt;li&gt;Rate limiting on AI routes&lt;/li&gt;
&lt;li&gt;Logging for prompts and tokens&lt;/li&gt;
&lt;li&gt;Database indexes on foreign keys&lt;/li&gt;
&lt;/ul&gt;
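
&lt;p&gt;Of the items above, rate limiting on AI routes is the easiest to sketch in isolation. The example below is a minimal fixed-window limiter kept in process memory; &lt;code&gt;createLimiter&lt;/code&gt; and its parameters are hypothetical names. On serverless platforms each instance has its own memory, so a shared store would be needed for an accurate limit across instances.&lt;/p&gt;

```typescript
// Hypothetical in-memory fixed-window rate limiter (a sketch only).
interface WindowState {
  windowStart: number; // timestamp (ms) at which the current window opened
  count: number; // requests seen in the current window
}

function createLimiter(limit: number, windowMs: number) {
  // Object.create(null) avoids collisions with inherited keys like "constructor".
  const windows: { [key: string]: WindowState } = Object.create(null);

  // Returns true if the request is allowed, false if the caller is over the limit.
  return function allow(key: string, now: number = Date.now()): boolean {
    const state = windows[key];
    if (state === undefined || now - state.windowStart >= windowMs) {
      // First request from this key, or the previous window has expired.
      windows[key] = { windowStart: now, count: 1 };
      return true;
    }
    if (state.count >= limit) {
      return false;
    }
    state.count += 1;
    return true;
  };
}
```

&lt;p&gt;In a route handler, the key would typically be the authenticated user id, with a rejected request answered by HTTP status 429.&lt;/p&gt;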

&lt;h2&gt;
  
  
  Key lesson
&lt;/h2&gt;

&lt;p&gt;Speed comes from repeatable architecture, not from chasing new tools.&lt;/p&gt;

&lt;p&gt;Using the same stack across products lets me ship faster and with fewer mistakes.&lt;/p&gt;


</description>
      <category>nextjs</category>
      <category>supabase</category>
      <category>openai</category>
      <category>saas</category>
    </item>
    <item>
      <title>How I Built an AI Compliance System for Charter Aviation Using RAG and Pinecone</title>
      <dc:creator>Tilak Raj</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:06:30 +0000</pubDate>
      <link>https://dev.to/tilak_raj_tech/how-i-built-an-ai-compliance-system-for-charter-aviation-using-rag-and-pinecone-26o9</link>
      <guid>https://dev.to/tilak_raj_tech/how-i-built-an-ai-compliance-system-for-charter-aviation-using-rag-and-pinecone-26o9</guid>
      <description>&lt;h2&gt;
  
  
  How I Built an AI Compliance System for Charter Aviation Using RAG and Pinecone
&lt;/h2&gt;

&lt;p&gt;Building RAG for compliance-critical domains is not the same as building RAG for a general-purpose chatbot. When the outputs affect an operator's certification status, wrong answers have real consequences.&lt;/p&gt;

&lt;p&gt;This is the full architecture walkthrough of &lt;strong&gt;Navlyt&lt;/strong&gt; — the compliance operating system I built for charter aviation operators.&lt;/p&gt;

&lt;p&gt;Navlyt tracks FAA, Transport Canada, and EASA regulatory requirements for small charter aviation operators. It answers compliance questions, monitors obligation status, and generates required documentation.&lt;/p&gt;

&lt;p&gt;The challenge: regulatory documents are dense, cross-referenced, and version-controlled in ways that break standard RAG approaches.&lt;/p&gt;

&lt;p&gt;This article covers the complete technical implementation — chunking strategy, retrieval architecture, answer generation with citations, and the accuracy validation approach I use for a compliance-critical domain.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With Standard RAG for Regulatory Content
&lt;/h2&gt;

&lt;p&gt;Regulatory documents have four properties that cause standard paragraph-level chunking to retrieve poorly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-references&lt;/strong&gt;&lt;br&gt;
A requirement in one section may reference definitions in another. A chunk containing only the requirement produces incomplete context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Applicability conditions&lt;/strong&gt;&lt;br&gt;
Whether a regulation applies depends on conditions defined elsewhere. Standard chunking separates requirements from applicability criteria.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version control&lt;/strong&gt;&lt;br&gt;
Regulatory documents are amended over time. Retrieval must be version-aware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Term definitions&lt;/strong&gt;&lt;br&gt;
Regulatory language uses precise definitions. Example: &lt;em&gt;Air taxi&lt;/em&gt; has a specific legal meaning.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Chunking Strategy
&lt;/h2&gt;

&lt;p&gt;After significant experimentation, I settled on a four-tier chunking approach:&lt;/p&gt;
&lt;h3&gt;
  
  
  Tier 1 — Section level chunks
&lt;/h3&gt;

&lt;p&gt;Sections that define terms or applicability are kept intact as single chunks (200-800 tokens).&lt;/p&gt;
&lt;h3&gt;
  
  
  Tier 2 — Paragraph level chunks
&lt;/h3&gt;

&lt;p&gt;Individual requirements are chunked at paragraph level with metadata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Section number&lt;/li&gt;
&lt;li&gt;Regulation name&lt;/li&gt;
&lt;li&gt;Version&lt;/li&gt;
&lt;li&gt;Applicability category&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Tier 3 — Manual summary chunks
&lt;/h3&gt;

&lt;p&gt;Some requirements span multiple sections.&lt;/p&gt;

&lt;p&gt;For the most frequently queried requirements, I wrote manual summary chunks that combine the relevant provisions.&lt;/p&gt;

&lt;p&gt;Expensive, but critical for accuracy.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tier 4 — Cross reference chunks
&lt;/h3&gt;

&lt;p&gt;For chunks that contain cross-references, I create composite chunks that include the referenced content.&lt;/p&gt;

&lt;p&gt;This removes the most common failure:&lt;br&gt;
retrieving a rule without its definition.&lt;/p&gt;
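
&lt;p&gt;To make the cross-reference tier concrete, here is a minimal sketch of building a composite chunk. The &lt;code&gt;RegChunk&lt;/code&gt; shape (including the &lt;code&gt;references&lt;/code&gt; field) and the &lt;code&gt;buildCompositeChunk&lt;/code&gt; helper are assumptions for illustration, and the sketch presumes cross-references have already been extracted into section numbers.&lt;/p&gt;

```typescript
// Hypothetical chunk shape; field names are assumptions for illustration.
interface RegChunk {
  id: string;
  sectionNumber: string;
  regulation: string;
  version: string;
  text: string;
  references: string[]; // section numbers this chunk cites
}

// Appends the text of every referenced section that can be resolved, so the
// rule and its definitions are embedded and retrieved together.
function buildCompositeChunk(
  chunk: RegChunk,
  bySection: { [section: string]: RegChunk }
): RegChunk {
  const referenced = chunk.references
    .map((section) => bySection[section])
    .filter((c) => c !== undefined);
  if (referenced.length === 0) {
    return chunk; // nothing to add; keep the original chunk unchanged
  }
  const appendix = referenced
    .map((c) => `[Referenced ${c.sectionNumber}] ${c.text}`)
    .join("\n\n");
  return {
    ...chunk,
    id: `${chunk.id}-composite`,
    text: `${chunk.text}\n\n${appendix}`,
  };
}
```

&lt;p&gt;Unresolved references are skipped silently here; in a compliance setting it may be safer to log them so missing definitions are noticed.&lt;/p&gt;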


&lt;h2&gt;
  
  
  Pinecone Index Architecture
&lt;/h2&gt;

&lt;p&gt;I use a single Pinecone index with namespace separation by regulation type.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
A Transport Canada operator asking about pilot currency does not need FAR Part 135 results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;NAMESPACES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="na"&gt;transport_canada&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tc_cars&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="na"&gt;faa_part_135&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;faa_135&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="na"&gt;faa_part_91&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;faa_91&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="na"&gt;easa_cs23&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;easa_cs23&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="na"&gt;operator_specific&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ops_spec&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;retrieveCompliance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="nx"&gt;operatorContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;OperatorContext&lt;/span&gt;
&lt;span class="p"&gt;){&lt;/span&gt;
 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targetNamespaces&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="nf"&gt;resolveApplicableNamespaces&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;operatorContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embedQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;targetNamespaces&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ns&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="nx"&gt;pinecone&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;navlyt-regulations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
     &lt;span class="na"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="na"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="na"&gt;includeMetadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
      &lt;span class="na"&gt;is_current&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;$eq&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;applicability_categories&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;$in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="nx"&gt;operatorContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificateCategories&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
     &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;mergeAndRerankResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
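
&lt;p&gt;The body of &lt;code&gt;mergeAndRerankResults&lt;/code&gt; is not shown here. As a rough sketch of the merge step only, the helper below flattens the per-namespace results, keeps the best-scoring copy of each id, and sorts by similarity score. The name &lt;code&gt;mergeByScore&lt;/code&gt;, the simplified &lt;code&gt;Match&lt;/code&gt; type, and the &lt;code&gt;limit&lt;/code&gt; parameter are assumptions; a real reranker would likely also use the query text.&lt;/p&gt;

```typescript
// Simplified stand-ins for the query response shape (assumed for this sketch).
interface Match {
  id: string;
  score?: number;
  metadata?: { [key: string]: unknown };
}

interface NamespaceResult {
  matches: Match[];
}

// Hypothetical merge step: dedupe by id, keep the highest score, sort descending.
function mergeByScore(results: NamespaceResult[], limit: number): Match[] {
  const best: { [id: string]: Match } = Object.create(null);
  for (const result of results) {
    for (const match of result.matches) {
      const existing = best[match.id];
      if (existing === undefined || (match.score ?? 0) > (existing.score ?? 0)) {
        best[match.id] = match;
      }
    }
  }
  return Object.keys(best)
    .map((id) => best[id])
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, limit);
}
```

&lt;p&gt;Comparing raw scores across namespaces is reasonable here because every namespace lives in the same index and is queried with the same embedding.&lt;/p&gt;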






&lt;h2&gt;
  
  
  The Answer Generation Pipeline
&lt;/h2&gt;

&lt;p&gt;Compliance RAG has stricter output requirements than general-purpose RAG.&lt;/p&gt;

&lt;p&gt;Every answer must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cite regulatory provisions&lt;/li&gt;
&lt;li&gt;State applicability conditions&lt;/li&gt;
&lt;li&gt;Flag ambiguity&lt;/li&gt;
&lt;li&gt;Never speculate&lt;/li&gt;
&lt;li&gt;Admit uncertainty
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;COMPLIANCE_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
You are a regulatory compliance assistant.

RULES:

1. Cite the regulation sections that support each statement.
2. If the regulations are unclear, say "the regulations do not clearly address this".
3. State the applicability conditions.
4. Never speculate.
5. Flag ambiguity.
6. Tell the user to verify requirements with Transport Canada.
`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
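
&lt;p&gt;As a rough illustration of how a system prompt like the one above might be combined with retrieved provisions, here is a minimal sketch. The &lt;code&gt;RetrievedProvision&lt;/code&gt; and &lt;code&gt;ChatMessage&lt;/code&gt; shapes, the &lt;code&gt;buildMessages&lt;/code&gt; helper, and the abbreviated &lt;code&gt;SYSTEM_PROMPT&lt;/code&gt; are assumptions made for this example, not Navlyt's actual code.&lt;/p&gt;

```typescript
// Hypothetical shape of a retrieved regulation chunk (assumption for this sketch).
interface RetrievedProvision {
  section: string; // section identifier of the provision
  text: string; // provision text as stored in the corpus
}

// Generic chat message shape; not tied to any specific LLM SDK.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Abbreviated stand-in for the full system prompt shown above.
const SYSTEM_PROMPT =
  "You are a regulatory compliance assistant. Cite regulation sections. Never speculate.";

// Put the rules first, then the question with numbered, section-labelled
// context so the model can only cite provisions it was actually given.
function buildMessages(question: string, provisions: RetrievedProvision[]): ChatMessage[] {
  const context = provisions
    .map((p, i) => `[${i + 1}] Section ${p.section}\n${p.text}`)
    .join("\n\n");
  const userContent =
    provisions.length === 0
      ? `Question: ${question}\n\nNo provisions were retrieved.`
      : `Question: ${question}\n\nRetrieved provisions:\n${context}`;
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userContent },
  ];
}
```

&lt;p&gt;Numbering the provisions gives the model stable handles to cite, and stating explicitly that nothing was retrieved makes it easier for the model to follow the "do not clearly address this" rule.&lt;/p&gt;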






&lt;h2&gt;
  
  
  Accuracy Validation
&lt;/h2&gt;

&lt;p&gt;Standard RAG metrics are insufficient for compliance work.&lt;/p&gt;

&lt;p&gt;I built a regulatory validation framework with three parts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Human expert validation
&lt;/h3&gt;

&lt;p&gt;I worked with a Transport Canada aviation consultant to build a 200-question validation set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current accuracy: 94.2%&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Confidence scoring
&lt;/h3&gt;

&lt;p&gt;Each answer's confidence score is based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval similarity&lt;/li&gt;
&lt;li&gt;Direct relevance&lt;/li&gt;
&lt;li&gt;Regulatory ambiguity&lt;/li&gt;
&lt;/ul&gt;
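
&lt;p&gt;One simple way to combine these three signals is a weighted sum. The sketch below is only an illustration: the &lt;code&gt;ConfidenceSignals&lt;/code&gt; shape, the 0 to 1 scaling of each signal, and the weights are all assumptions, not the scoring Navlyt actually uses.&lt;/p&gt;

```typescript
// All three signals are assumed to be normalised to the range 0..1.
interface ConfidenceSignals {
  retrievalSimilarity: number; // similarity of the best retrieved chunks
  directRelevance: number; // how directly the provisions address the question
  regulatoryAmbiguity: number; // higher means the regulation text is more ambiguous
}

// Illustrative weights (assumption); they sum to 1 so the score stays in 0..1.
const WEIGHTS = { similarity: 0.4, relevance: 0.4, clarity: 0.2 };

// Keep out-of-range inputs from pushing the score outside 0..1.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum; ambiguity is inverted so that more ambiguity lowers confidence.
function scoreConfidence(s: ConfidenceSignals): number {
  const clarity = 1 - clamp01(s.regulatoryAmbiguity);
  return (
    WEIGHTS.similarity * clamp01(s.retrievalSimilarity) +
    WEIGHTS.relevance * clamp01(s.directRelevance) +
    WEIGHTS.clarity * clarity
  );
}
```

&lt;p&gt;In practice the weights would need to be tuned against an expert-labelled validation set rather than picked by hand.&lt;/p&gt;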

&lt;h3&gt;
  
  
  Human review triggers
&lt;/h3&gt;

&lt;p&gt;An answer is automatically sent for human review when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidence &amp;lt; 0.75&lt;/li&gt;
&lt;li&gt;The regulations are unclear&lt;/li&gt;
&lt;li&gt;The answer relies on recently amended sections
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ComplianceAnswer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="c1"&gt;// Regulatory provisions that support the answer&lt;/span&gt;
  &lt;span class="nx"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RegulatoryCitation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="c1"&gt;// Answers scoring below 0.75 are sent for human review&lt;/span&gt;
  &lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;requiresHumanReview&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
  &lt;span class="c1"&gt;// Optional notes: applicability conditions and flagged ambiguity&lt;/span&gt;
  &lt;span class="nx"&gt;applicabilityNote&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;ambiguityWarning&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;lastRegUpdateCheck&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
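
&lt;p&gt;The three review triggers listed above could feed a field like &lt;code&gt;requiresHumanReview&lt;/code&gt; roughly as follows. This is a minimal sketch under stated assumptions: the &lt;code&gt;ReviewInput&lt;/code&gt; shape is hypothetical, and the 90-day window for "recently amended" is an illustrative value, since the article does not define it. Only the 0.75 threshold comes from the text.&lt;/p&gt;

```typescript
const CONFIDENCE_THRESHOLD = 0.75; // threshold from the trigger list above
const RECENT_AMENDMENT_DAYS = 90; // assumption: the article does not define "recently"

// Hypothetical input shape for this sketch.
interface ReviewInput {
  confidence: number;
  ambiguityWarning?: string; // set when the regulations are unclear
  citedSectionAmendedAt: Date[]; // last amendment date of each cited section
}

// Returns true if any one of the three triggers fires.
function requiresHumanReview(input: ReviewInput, now: Date): boolean {
  const lowConfidence = CONFIDENCE_THRESHOLD > input.confidence;
  const unclear = input.ambiguityWarning !== undefined;
  const cutoff = now.getTime() - RECENT_AMENDMENT_DAYS * 24 * 60 * 60 * 1000;
  const recentlyAmended = input.citedSectionAmendedAt.some((d) => d.getTime() >= cutoff);
  return lowConfidence || unclear || recentlyAmended;
}
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; in as a parameter keeps the function deterministic, which makes the trigger logic easy to unit test.&lt;/p&gt;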






&lt;h2&gt;
  
  
  Key Lessons For Building Compliance RAG
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lesson 1 — Domain experts are mandatory
&lt;/h3&gt;

&lt;p&gt;I could not have built this without aviation compliance experts.&lt;/p&gt;

&lt;p&gt;Budget for this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 2 — Chunk quality matters most
&lt;/h3&gt;

&lt;p&gt;Biggest gains came from improving chunk quality.&lt;/p&gt;

&lt;p&gt;Not embedding models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 3 — "I don't know" is correct sometimes
&lt;/h3&gt;

&lt;p&gt;Wrong confident answers are dangerous.&lt;/p&gt;

&lt;p&gt;Build strong non-answer logic.&lt;/p&gt;
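
&lt;p&gt;A minimal sketch of what non-answer logic can look like: if no retrieved chunk is similar enough to the question, skip generation and return a fixed non-answer. The &lt;code&gt;ScoredChunk&lt;/code&gt; shape, the &lt;code&gt;decide&lt;/code&gt; helper, the 0.6 cutoff, and the exact wording of the fallback message are assumptions for illustration.&lt;/p&gt;

```typescript
// Fixed fallback text (assumed wording) returned instead of a generated answer.
const NON_ANSWER =
  "The regulations do not clearly address this. Verify requirements with Transport Canada.";

// Assumed cutoff; chunks below it are treated as not relevant enough to answer from.
const MIN_SIMILARITY = 0.6;

// Hypothetical retrieval result shape for this sketch.
interface ScoredChunk {
  section: string;
  similarity: number; // 0..1 similarity between the question and the chunk
}

type Decision =
  | { kind: "answer"; usableChunks: ScoredChunk[] }
  | { kind: "non_answer"; message: string };

// Gate generation on retrieval quality: no usable evidence means no answer.
function decide(chunks: ScoredChunk[]): Decision {
  const usable = chunks.filter((c) => c.similarity >= MIN_SIMILARITY);
  if (usable.length === 0) {
    return { kind: "non_answer", message: NON_ANSWER };
  }
  return { kind: "answer", usableChunks: usable };
}
```

&lt;p&gt;Making the non-answer an explicit branch of the return type forces every caller to handle it, rather than relying on the model to decline on its own.&lt;/p&gt;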

&lt;h3&gt;
  
  
  Lesson 4 — Regulations require maintenance
&lt;/h3&gt;

&lt;p&gt;Regulations change constantly.&lt;/p&gt;

&lt;p&gt;Corpus updates must be part of the system.&lt;/p&gt;
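
&lt;p&gt;One way to make corpus updates routine is to diff each new snapshot of the regulations against the previous one and re-process only what changed. The sketch below assumes each snapshot is a plain map from a section identifier to its text; the &lt;code&gt;SectionTexts&lt;/code&gt; type and &lt;code&gt;diffCorpus&lt;/code&gt; helper are hypothetical names, not part of Navlyt.&lt;/p&gt;

```typescript
// Assumed snapshot format: section identifier mapped to the section's full text.
type SectionTexts = { [sectionId: string]: string };

interface CorpusDiff {
  added: string[]; // sections present only in the new snapshot
  changed: string[]; // sections whose text differs between snapshots
  removed: string[]; // sections present only in the old snapshot
}

// Compare two snapshots so only affected sections need re-chunking and re-embedding.
function diffCorpus(previous: SectionTexts, next: SectionTexts): CorpusDiff {
  const added: string[] = [];
  const changed: string[] = [];
  for (const id of Object.keys(next)) {
    if (!(id in previous)) {
      added.push(id);
    } else if (previous[id] !== next[id]) {
      changed.push(id);
    }
  }
  const removed = Object.keys(previous).filter((id) => !(id in next));
  return { added, changed, removed };
}
```

&lt;p&gt;The &lt;code&gt;changed&lt;/code&gt; list is also a natural source for marking sections as recently amended.&lt;/p&gt;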




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Accuracy:&lt;/strong&gt; 94.2%&lt;br&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; 1.8s&lt;br&gt;
&lt;strong&gt;Human review rate:&lt;/strong&gt; 6.3%&lt;/p&gt;

&lt;p&gt;Navlyt is live at navlyt.com.&lt;/p&gt;

&lt;p&gt;More architecture writing:&lt;br&gt;
tilakraj.info/blog&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Tilak Raj is CEO &amp;amp; Founder of Brainfy AI.&lt;/p&gt;

&lt;p&gt;Building vertical AI SaaS across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agriculture&lt;/li&gt;
&lt;li&gt;Insurance&lt;/li&gt;
&lt;li&gt;Aviation compliance&lt;/li&gt;
&lt;li&gt;Real estate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shipped 8 AI products.&lt;/p&gt;

&lt;p&gt;Writing about AI engineering and SaaS architecture.&lt;/p&gt;

&lt;p&gt;Dev.to: dev.to/tilakraj&lt;br&gt;
Website: tilakraj.info&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
