<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sandeep B Kadam</title>
    <description>The latest articles on DEV Community by Sandeep B Kadam (@sandhu93).</description>
    <link>https://dev.to/sandhu93</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838114%2F9481d8de-7cec-4011-9587-bb8f07e1b8ce.png</url>
      <title>DEV Community: Sandeep B Kadam</title>
      <link>https://dev.to/sandhu93</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sandhu93"/>
    <language>en</language>
    <item>
      <title>Build LangChain Chains Once with Lazy Initialization</title>
      <dc:creator>Sandeep B Kadam</dc:creator>
      <pubDate>Tue, 24 Mar 2026 08:17:33 +0000</pubDate>
      <link>https://dev.to/sandhu93/build-langchain-chains-once-with-lazy-initialization-45bi</link>
      <guid>https://dev.to/sandhu93/build-langchain-chains-once-with-lazy-initialization-45bi</guid>
      <description>&lt;h2&gt;
  
  
  Build LangChain chains once with lazy initialization
&lt;/h2&gt;

&lt;p&gt;Build LangChain chains once, on demand. Guard with a &lt;code&gt;None&lt;/code&gt; check, initialize related singletons together on the first request, and let bad config fail as a clear runtime error instead of a startup crash.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;Building a LangChain chain is not free. Constructing a &lt;code&gt;SQLDatabase&lt;/code&gt; opens a database connection, inspects schema, and may sample rows. Instantiating &lt;code&gt;ChatOpenAI&lt;/code&gt; validates configuration and prepares client state. Calling &lt;code&gt;create_sql_query_chain&lt;/code&gt; wires prompts, models, and parsers into an executable graph.&lt;/p&gt;

&lt;p&gt;In our NL2SQL agent, initialization added several hundred milliseconds.&lt;/p&gt;

&lt;p&gt;If you do that work at import time, a missing DB URL or API key can kill the process before health checks or structured logs help you diagnose it. If you rebuild everything per request, you pay the same setup cost on every query before the model processes a single token.&lt;/p&gt;

&lt;p&gt;Lazy initialization avoids both problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;Without lazy initialization, you usually end up with two bad options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eager import-time loading:&lt;/strong&gt; startup fails immediately if config is wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request initialization:&lt;/strong&gt; identical objects are rebuilt on every call, adding avoidable setup latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Naive approach vs production approach
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Naive: eager import-time loading&lt;/th&gt;
&lt;th&gt;Production: lazy singleton&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;❌ Chains built when module loads&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;None&lt;/code&gt; placeholders declared at module level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Crashes on missing DB or API key&lt;/td&gt;
&lt;td&gt;✅ Bad config fails with a clear runtime error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ No startup without full environment&lt;/td&gt;
&lt;td&gt;✅ Everything initialized on the first request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Hard to unit-test without live DB&lt;/td&gt;
&lt;td&gt;✅ Related singletons initialized together&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Config read before app logs start&lt;/td&gt;
&lt;td&gt;✅ Hot-path guard cost is negligible&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibvnyf87iq76l32yydq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibvnyf87iq76l32yydq.png" alt="Request guard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How I implemented it
&lt;/h3&gt;

&lt;p&gt;The NL2SQL agent keeps module-level placeholders for the database, LLM clients, and chains. &lt;code&gt;_get_chain()&lt;/code&gt; is the single initialization point: the first call builds everything, and later calls return the cached objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudocode
&lt;/span&gt;
&lt;span class="c1"&gt;# Initialized on first request so missing DB config or bad API keys
# fail at runtime with context, not during module import.
&lt;/span&gt;
&lt;span class="n"&gt;_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;_generate_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_execute_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_rephrase_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_select_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_rewrite_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;_llm_semaphore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_chain&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return all chains, initializing them once on first use.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_generate_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_execute_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_select_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_llm_semaphore&lt;/span&gt;

    &lt;span class="c1"&gt;# Guard: if the last-built chain exists, the rest do too.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_generate_query&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;_generate_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_execute_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_select_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;_init_redis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Redis or in-memory fallback
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_llm_semaphore&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_llm_semaphore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm_max_concurrency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SQLDatabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;database_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sample_rows_in_table_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_with_fallbacks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_fast_llm_with_fallbacks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;_generate_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_sql_query_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;_execute_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QuerySQLDataBaseTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;answer_prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;_select_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table_prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;split_fn&lt;/span&gt;
    &lt;span class="n"&gt;_rewrite_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rewrite_prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_generate_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_execute_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_select_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why check &lt;code&gt;_generate_query&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;_generate_query&lt;/code&gt; is the last object created in the initialization sequence. If it exists, the earlier objects should exist too.&lt;/p&gt;

&lt;p&gt;That makes it a safer sentinel than &lt;code&gt;_db&lt;/code&gt;. If initialization fails midway, &lt;code&gt;_db&lt;/code&gt; might already be set while one or more chains are still missing. Guarding on the last-built object reduces the chance of returning a partially initialized state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;_init_redis()&lt;/code&gt; uses the same pattern internally: if the client already exists, return immediately. Both guards are idempotent; only the first successful call performs real work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67k9v8ugo52idrzxcm6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67k9v8ugo52idrzxcm6r.png" alt="Co-initialization order inside _get_chain()" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency note
&lt;/h3&gt;

&lt;p&gt;In multi-threaded or async environments, protect first-time initialization with a lock if two requests can enter the initialization path at the same time; otherwise concurrent first access can build the same objects twice or briefly expose a partially initialized set. Separate worker processes each hold their own copy of the module globals, so a lock only helps within one process.&lt;/p&gt;
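&lt;p&gt;A minimal sketch of the guard with double-checked locking. Names are illustrative; &lt;code&gt;_build_chain&lt;/code&gt; stands in for the real construction step:&lt;/p&gt;

```python
import threading

_chain = None
_init_lock = threading.Lock()

def _build_chain():
    # Placeholder for the expensive construction step (DB, LLM, chains).
    return object()

def get_chain():
    """Return the cached chain, building it at most once even under concurrency."""
    global _chain
    if _chain is not None:        # fast path: no lock taken on the hot path
        return _chain
    with _init_lock:              # slow path: serialize first-time initialization
        if _chain is None:        # re-check: another thread may have won the race
            _chain = _build_chain()
    return _chain
```

&lt;p&gt;The second &lt;code&gt;None&lt;/code&gt; check inside the lock is what prevents two threads that both passed the fast path from each building the chain.&lt;/p&gt;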

&lt;h3&gt;
  
  
  Bug story: lazy singleton without a TTL
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;This pattern worked well for chains, but failed when I used it for mutable data.&lt;br&gt;
The entity resolver cached the players table behind a simple &lt;code&gt;None&lt;/code&gt; guard and kept it for the lifetime of the process. That was fine until a new player was added mid-season after an IPL auction. The resolver kept serving the stale mapping, and lookups for the new player failed until the backend restarted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The lesson is simple: lazy singletons are a good fit for resources that are expensive to build and effectively static for the lifetime of the process. They are a poor fit for data that changes over time.&lt;/p&gt;

&lt;p&gt;If the underlying data can change, add a TTL or explicit invalidation. A bare &lt;code&gt;None&lt;/code&gt; guard will cache forever.&lt;/p&gt;
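&lt;p&gt;A sketch of the TTL variant under illustrative assumptions (the loader, data, and 300-second TTL are stand-ins):&lt;/p&gt;

```python
import time

_players = None
_loaded_at = 0.0
_TTL_SECONDS = 300  # illustrative refresh interval

def _load_players():
    # Stand-in for the real database read.
    return {"rohit": 45}

def get_players():
    """Return the cached players mapping, refreshing after the TTL expires."""
    global _players, _loaded_at
    now = time.monotonic()
    if _players is None or now - _loaded_at > _TTL_SECONDS:
        _players = _load_players()
        _loaded_at = now
    return _players
```

&lt;p&gt;The only change from the bare guard is the timestamp comparison, which is what lets a mid-season insert eventually become visible without a restart.&lt;/p&gt;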

&lt;h3&gt;
  
  
  General pattern
&lt;/h3&gt;

&lt;p&gt;The same idea shows up in many languages under different names: lazy initialization, initialization-on-demand, deferred construction, or memoized setup.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The pattern is always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declare the resource as &lt;code&gt;None&lt;/code&gt; at the shared scope.&lt;/li&gt;
&lt;li&gt;Add a guard that returns the cached object if it already exists.&lt;/li&gt;
&lt;li&gt;Initialize once, store the result, and reuse it on later calls.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;After the first successful call, the setup cost disappears from the hot path.&lt;/p&gt;
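&lt;p&gt;The three steps above can be sketched in a few lines of Python (names are illustrative):&lt;/p&gt;

```python
_client = None  # step 1: declare the resource at shared scope

def _build_client():
    # Stand-in for expensive construction (connection, model load, etc.).
    return {"connected": True}

def get_client():
    global _client
    if _client is not None:    # step 2: guard returns the cached object
        return _client
    _client = _build_client()  # step 3: initialize once and store
    return _client
```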

&lt;h3&gt;
  
  
  When to use it
&lt;/h3&gt;

&lt;p&gt;Use lazy singletons for expensive, effectively immutable resources such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database connections or pools&lt;/li&gt;
&lt;li&gt;HTTP clients&lt;/li&gt;
&lt;li&gt;LLM clients&lt;/li&gt;
&lt;li&gt;LangChain chains&lt;/li&gt;
&lt;li&gt;semaphores&lt;/li&gt;
&lt;li&gt;model weights loaded once per process&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When not to use it
&lt;/h3&gt;

&lt;p&gt;Do not use a bare lazy singleton for mutable data such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lookup tables that can change&lt;/li&gt;
&lt;li&gt;feature flags&lt;/li&gt;
&lt;li&gt;config that may be reloaded&lt;/li&gt;
&lt;li&gt;caches backed by changing database rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those cases need TTL-based refresh, invalidation, or a different caching strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common mistakes
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Checking the wrong sentinel&lt;/strong&gt;
Guarding on &lt;code&gt;_db&lt;/code&gt; instead of the last-built chain can expose partially initialized state after a mid-init failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting &lt;code&gt;global&lt;/code&gt;&lt;/strong&gt;
In Python, assigning to a module-level variable inside a function creates a local unless it is declared &lt;code&gt;global&lt;/code&gt;. The singleton never persists, and initialization repeats on every call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Splitting related initialization across multiple guards&lt;/strong&gt;
If DB, LLM, and chains are initialized in separate paths, concurrent startup can leave them out of sync. Initialize related objects together behind one guard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using the pattern for mutable data&lt;/strong&gt;
The pattern is fine for process-lifetime resources, not for data that needs refresh or invalidation.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>performance</category>
      <category>python</category>
    </item>
    <item>
      <title>Circuit breaker for LLM provider failure</title>
      <dc:creator>Sandeep B Kadam</dc:creator>
      <pubDate>Mon, 23 Mar 2026 06:12:38 +0000</pubDate>
      <link>https://dev.to/sandhu93/circuit-breaker-for-llm-provider-failure-53f6</link>
      <guid>https://dev.to/sandhu93/circuit-breaker-for-llm-provider-failure-53f6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Stop calling a dead API. Shed load fast, recover automatically, and stay consistent across restarts with Redis-backed failure state.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;&lt;br&gt;
Every LLM-powered application depends on an external provider - OpenAI, Anthropic, Google, or a self-hosted model. These providers go down. Rate limits spike. Latency balloons. Without a circuit breaker, your application keeps sending requests into a black hole, burning through your budget, stacking up timeouts, and delivering a terrible experience to every user in the queue.&lt;/p&gt;

&lt;p&gt;A circuit breaker detects that the downstream service is failing and stops trying for a cooldown period. This is not about retrying harder - it's about failing fast and deliberately so the rest of your system stays healthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Without a circuit breaker:&lt;/strong&gt; When your LLM provider starts returning 429s or 500s, every new user request still attempts a full API call. Each call waits for a timeout (often 30-60 seconds). Your concurrency pool fills up. Healthy requests get queued behind doomed ones. Your entire application appears frozen.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Naive approach vs production approach&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Naive: retry and hope&lt;/th&gt;
&lt;th&gt;Production: circuit breaker&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retry every failed request 3 times&lt;/td&gt;
&lt;td&gt;Track failure count in a window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log the error and move on&lt;/td&gt;
&lt;td&gt;Trip open after N failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No memory of past failures&lt;/td&gt;
&lt;td&gt;Reject instantly while open&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Each request rediscovers the outage&lt;/td&gt;
&lt;td&gt;Probe with single test request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timeouts pile up, pool exhausted&lt;/td&gt;
&lt;td&gt;Close when probe succeeds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc55vytp1k92e23bls5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc55vytp1k92e23bls5n.png" alt="Circuit breaker state machine"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I implemented it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I implemented a circuit breaker in the NL2SQL agent that wraps every LLM provider call. When the failure count within a sliding window exceeds a threshold, the breaker trips open and all subsequent requests return an error immediately - no API call, no timeout, no wasted concurrency slot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudo-code: circuit breaker wrapping an LLM call
# not a production-grade circuit breaker, sliding window not shown
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cooldown_sec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLOSED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooldown_sec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cooldown_sec&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_failure_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;time_since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_failure_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooldown_sec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HALF_OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;CircuitOpenError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provider unavailable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HALF_OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ProviderError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_record_failure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_record_failure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_failure_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLOSED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key design choice:&lt;/strong&gt; The circuit breaker state is stored in Redis, not in-process memory. This matters because in a multi-replica deployment, one replica discovering the outage should protect all replicas from burning through the same dead endpoint. Without shared state, each pod independently rediscovers the failure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15ogmajpu6xaissrcr1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15ogmajpu6xaissrcr1s.png" alt="Architecture: Redis-backed circuit breaker across replicas"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug story: the in-process fallback&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bug:&lt;/strong&gt; During local development, Redis wasn't always running. The circuit breaker tried to read state from Redis, failed, and threw an unhandled exception - crashing the entire request before it even reached the LLM provider. &lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; detect Redis connection failure and fall back to an in-process circuit breaker with the same interface. This is a classic example of a reliability mechanism introducing its own failure mode.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The lesson is important: every reliability layer must itself have a fallback. If your circuit breaker depends on Redis, and Redis is down, your circuit breaker shouldn't make things worse. The in-process fallback loses cross-replica consistency but keeps the application functional.&lt;/p&gt;
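&lt;p&gt;A sketch of that fallback. The client here is anything with &lt;code&gt;get&lt;/code&gt;/&lt;code&gt;set&lt;/code&gt; methods; catching a generic &lt;code&gt;ConnectionError&lt;/code&gt; stands in for whatever connection exception the real client raises:&lt;/p&gt;

```python
class BreakerStore:
    """Read/write breaker state from a shared store, degrading to
    in-process memory when the store is unreachable."""

    def __init__(self, shared_client=None):
        self._shared = shared_client
        self._local = {}  # fallback: loses cross-replica consistency, keeps the app up

    def get_state(self, key):
        if self._shared is not None:
            try:
                return self._shared.get(key) or "CLOSED"
            except ConnectionError:
                pass  # shared store down: degrade, don't crash the request
        return self._local.get(key, "CLOSED")

    def set_state(self, key, state):
        if self._shared is not None:
            try:
                self._shared.set(key, state)
                return
            except ConnectionError:
                pass
        self._local[key] = state
```

&lt;p&gt;Every read and write first tries the shared store, and only a connection failure routes to the in-process dictionary, so the breaker keeps working locally while Redis is down.&lt;/p&gt;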

&lt;p&gt;&lt;strong&gt;Generalized lesson&lt;/strong&gt;&lt;br&gt;
Circuit breakers aren't specific to LLM applications. They appear anywhere you call an external service that can fail: payment processors, search indices, notification services, databases. The pattern is the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The general pattern:&lt;/strong&gt; Track failures within a window. When failures cross a threshold, stop calling the service. After a cooldown, send one probe. If it works, resume. If it doesn't, extend the cooldown. Always degrade gracefully - never let a dead dependency take down your entire system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;How to apply in other projects&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're wrapping any external API call, you can introduce a circuit breaker in three steps. First, wrap the call in a try/except that increments a failure counter. Second, before each call, check the counter - if it's above your threshold and the cooldown hasn't elapsed, return an error immediately. Third, after the cooldown, allow one request through and reset if it succeeds.&lt;/p&gt;

&lt;p&gt;For single-process applications, an in-memory counter is sufficient. For distributed systems, shared state lets replicas coordinate breaker behavior, and Redis is a common choice, though per-instance breakers remain sufficient for many deployments depending on traffic shape and failure tolerance.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Common mistakes&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;No cooldown backoff. A fixed 60-second cooldown means the breaker reopens and gets punched again immediately during a sustained outage. Use exponential backoff on the cooldown duration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Counting all errors equally. A 429 (rate limit) is different from a 500 (server error). Rate limits often clear within seconds - tripping a 60-second breaker for a 429 is overkill. Differentiate transient vs persistent failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Forgetting the fallback for the breaker itself. If your circuit breaker state lives in Redis and Redis goes down, you have two things broken instead of one. Always have an in-process fallback.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
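&lt;p&gt;Mistake 3's fix can be sketched as a wrapper that keeps one interface over two stores: a shared one (Redis-backed in the post's case) and an in-process fallback used when the shared store's connection fails. The class and method names here are hypothetical, and a stand-in raising &lt;code&gt;ConnectionError&lt;/code&gt; takes the place of a real Redis client.&lt;/p&gt;

```python
class InMemoryBreakerStore:
    """In-process fallback with the same interface as the shared store."""

    def __init__(self):
        self._counts = {}

    def incr_failures(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def get_failures(self, key):
        return self._counts.get(key, 0)


class ResilientBreakerStore:
    """Try the shared (e.g. Redis-backed) store; degrade to in-process state
    on connection errors instead of letting the reliability layer crash the
    request. Cross-replica consistency is lost, but the app keeps working."""

    def __init__(self, shared_store):
        self._shared = shared_store
        self._fallback = InMemoryBreakerStore()

    def _delegate(self, method, key):
        try:
            return getattr(self._shared, method)(key)
        except ConnectionError:
            return getattr(self._fallback, method)(key)

    def incr_failures(self, key):
        return self._delegate("incr_failures", key)

    def get_failures(self, key):
        return self._delegate("get_failures", key)
```

&lt;p&gt;The breaker logic itself never needs to know which store answered, which is what makes the fallback safe to drop in.&lt;/p&gt;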

&lt;h2&gt;
  
  
  Notes / production caveats
&lt;/h2&gt;

&lt;p&gt;This post focuses on the pattern, not a fully hardened implementation. The pseudo-code is intentionally simplified: it does not show a true sliding window, concurrency control, single-flight probing in half-open, backoff strategy, or differentiated handling for different error classes.&lt;/p&gt;

&lt;p&gt;A few practical caveats are worth calling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared Redis-backed state is useful in multi-replica systems, but half-open coordination needs care. Without guardrails, multiple replicas can probe the dependency at once and create noisy recovery behavior.&lt;/li&gt;
&lt;li&gt;Redis is one valid production design, not the only one. Many systems work well with per-instance breakers combined with load-shedding, jittered retries, and strict client-side timeouts.&lt;/li&gt;
&lt;li&gt;For distributed coordination, Redis is a practical option. A database-backed approach can also work in some systems, but a shared file is usually not a serious production coordination mechanism.&lt;/li&gt;
&lt;li&gt;Failing fast should usually be paired with a fallback path: degraded mode, cached responses, queueing, or explicit messaging that the provider is temporarily unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;I’m still learning these reliability patterns by applying them in real projects. If you have suggestions, corrections, or better ways to think about this, I’d genuinely appreciate your feedback. Thank You!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>redis</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built an NL2SQL Agent for IPL Cricket While Learning How AI Agents Actually Work</title>
      <dc:creator>Sandeep B Kadam</dc:creator>
      <pubDate>Sun, 22 Mar 2026 13:37:52 +0000</pubDate>
      <link>https://dev.to/sandhu93/i-built-an-nl2sql-agent-for-ipl-cricket-while-learning-how-ai-agents-actually-work-9ec</link>
      <guid>https://dev.to/sandhu93/i-built-an-nl2sql-agent-for-ipl-cricket-while-learning-how-ai-agents-actually-work-9ec</guid>
      <description>&lt;p&gt;IPL 2026 starts this month, so I built something around it.&lt;/p&gt;

&lt;p&gt;I have been learning how to build AI agents, and one thing I kept wanting was a project that felt concrete. Not just a chatbot demo, but something that had to deal with real data, real edge cases, and real mistakes.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;IPL Cricket Analyst&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It lets you ask questions about IPL data in plain English and get back a SQL-backed answer in real time from a SQL database, along with charts, follow-up suggestions, and support for multi-turn questions. Users don't have to know SQL; the LLM generates the queries for them.&lt;/p&gt;

&lt;p&gt;Some example questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Who has the best death-over economy since 2020?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Show me a bar chart of the top 10 wicket takers."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What is Virat Kohli's strike rate at Eden Gardens after 2022?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, the agent writes the SQL, validates it, runs it against 278,000+ ball-by-ball deliveries, and streams the result back while it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfgczkrikrpw7eq0lupk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfgczkrikrpw7eq0lupk.jpeg" alt="Architecture" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 14, TypeScript, Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI, Python 3.11, LangChain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL, 9 tables, 278k+ rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector store&lt;/td&gt;
&lt;td&gt;ChromaDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache / History&lt;/td&gt;
&lt;td&gt;Redis 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;GPT-4o for SQL, GPT-4o-mini for rewrite and insights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Charts&lt;/td&gt;
&lt;td&gt;MCP Chart Server with Vega-Lite v5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The basic goal was simple: take a cricket question, turn it into SQL, run it safely, and make the result feel responsive in the UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;Each question goes through a pipeline inside &lt;code&gt;run_agent_stream()&lt;/code&gt; that looks roughly like this:&lt;/p&gt;

&lt;p&gt;Input validation&lt;br&gt;
→ Response cache check&lt;br&gt;
→ Query rewrite + history summarization&lt;br&gt;
→ Entity resolution&lt;br&gt;
→ [Table selection || Cricket RAG]&lt;br&gt;
→ SQL generation&lt;br&gt;
→ SQL validation + semantic check&lt;br&gt;
→ SQL execution&lt;br&gt;
→ [Answer rephrase || Insights || Viz]&lt;br&gt;
→ Streamed NDJSON to frontend&lt;/p&gt;
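&lt;p&gt;The NDJSON streaming at the end of the pipeline can be sketched as a generator that emits one JSON line per completed step. This is a shape sketch only: the event names and placeholder payloads are assumptions, not the project's actual &lt;code&gt;run_agent_stream()&lt;/code&gt; code.&lt;/p&gt;

```python
import json


def run_agent_stream(question: str):
    """Illustrative sketch of the event stream, not the real implementation.
    Each step yields one NDJSON line as soon as it finishes, so the frontend
    can render the SQL before the answer, and the answer before insights and
    charts."""

    def event(step, payload):
        return json.dumps({"step": step, "data": payload}) + "\n"

    # Placeholder stages; in the real pipeline these call LLMs and Postgres.
    sql = "SELECT 1  -- generated SQL would go here"
    yield event("sql", sql)
    yield event("answer", f"Answer for: {question}")
    yield event("insights", ["placeholder insight"])
    yield event("chart", {"spec": "vega-lite placeholder"})
```

&lt;p&gt;With FastAPI, a generator like this could be returned through &lt;code&gt;StreamingResponse&lt;/code&gt; with a media type of &lt;code&gt;application/x-ndjson&lt;/code&gt;, and the frontend parses each line as it arrives.&lt;/p&gt;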

&lt;p&gt;The frontend receives events step by step, so the SQL shows up first, then the answer, then the extra pieces like insights and charts.&lt;/p&gt;

&lt;p&gt;That streaming part made a much bigger difference to the feel of the app than I expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  What turned out to be harder than I expected
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Cricket stats are tricky in ways generic NL2SQL examples do not prepare you for&lt;/strong&gt;&lt;br&gt;
A lot of NL2SQL tutorials work on clean, simple schemas. IPL data is not that.&lt;/p&gt;

&lt;p&gt;For example, a batting average is not just a straightforward aggregation. Dismissals can be subtle because the dismissed player is not always the striker. Ducks are also not a ball-level concept. They have to be computed at the innings level.&lt;/p&gt;

&lt;p&gt;I ran into a lot of cases where the SQL looked reasonable but was still wrong from a cricket point of view.&lt;/p&gt;
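&lt;p&gt;The grain issue can be shown with batting average, computed here in plain Python over ball-level rows rather than SQL. The column names (&lt;code&gt;striker&lt;/code&gt;, &lt;code&gt;batsman_runs&lt;/code&gt;, &lt;code&gt;player_dismissed&lt;/code&gt;) are hypothetical stand-ins for the real schema; the point is that runs accrue to the striker while a dismissal belongs to &lt;code&gt;player_dismissed&lt;/code&gt;, who may be the non-striker.&lt;/p&gt;

```python
def batting_average(deliveries, player):
    """Batting average = runs scored / times dismissed.
    Naive SQL often counts dismissals only on balls the player faced, which
    silently drops run-outs at the non-striker's end. Counting dismissals
    from player_dismissed instead of striker fixes the grain."""
    runs = sum(d["batsman_runs"] for d in deliveries if d["striker"] == player)
    outs = sum(1 for d in deliveries if d.get("player_dismissed") == player)
    # A batting average is undefined when the player was never dismissed.
    return runs / outs if outs else float("inf")
```

&lt;p&gt;A query that filters dismissals by &lt;code&gt;striker&lt;/code&gt; would look perfectly reasonable and still be wrong from a cricket point of view, which is exactly the class of bug the semantic validation step is there to catch.&lt;/p&gt;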

&lt;p&gt;To handle that better, I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a cricket rules document for retrieval&lt;/li&gt;
&lt;li&gt;IPL-specific few-shot SQL examples&lt;/li&gt;
&lt;li&gt;an extra semantic validation step before execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination helped a lot more than just changing prompts randomly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Accuracy improved only after I started measuring it properly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At first I was mostly testing with ad hoc questions, which felt fine until I started noticing inconsistencies.&lt;/p&gt;

&lt;p&gt;So I put together a 50-question ground-truth evaluation set and started running the system against it repeatedly.&lt;/p&gt;

&lt;p&gt;The first version was around 82% accurate.&lt;br&gt;
After a lot of iteration, it got to 98% on that eval.&lt;/p&gt;

&lt;p&gt;Most of the improvements did not come from big architectural changes. They came from fixing very specific failure modes, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using the wrong grain for aggregation&lt;/li&gt;
&lt;li&gt;getting milestone logic wrong&lt;/li&gt;
&lt;li&gt;small cricket-specific details like death overs being overs 16 to 20, which in this dataset meant handling indexing carefully&lt;/li&gt;
&lt;li&gt;selecting columns that made the answer noisier than it needed to be&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was probably the biggest lesson in the whole project. Evaluation made the work much more grounded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Follow-up questions needed more care than I thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the things I wanted was for follow-ups to feel natural.&lt;/p&gt;

&lt;p&gt;Questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Who scored the most runs in 2023?"&lt;/li&gt;
&lt;li&gt;"What was his strike rate?"&lt;/li&gt;
&lt;li&gt;"What about 2022?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds simple from a user perspective, but a lot has to go right for it to work consistently.&lt;/p&gt;

&lt;p&gt;I ended up rewriting follow-up questions into standalone questions before sending them downstream. That made the rest of the pipeline much more reliable.&lt;/p&gt;

&lt;p&gt;It was one of those changes that feels obvious in hindsight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Reliability work matters even in small projects&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I did not want this to be just a cool demo that works once.&lt;/p&gt;

&lt;p&gt;So I added some basic safeguards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-IP rate limiting&lt;/li&gt;
&lt;li&gt;a response cache&lt;/li&gt;
&lt;li&gt;a circuit breaker&lt;/li&gt;
&lt;li&gt;request timeouts&lt;/li&gt;
&lt;li&gt;input validation&lt;/li&gt;
&lt;li&gt;SELECT-only SQL enforcement&lt;/li&gt;
&lt;/ul&gt;
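&lt;p&gt;The SELECT-only enforcement can be sketched as a small gate run before execution. This is a minimal, knowingly incomplete version: a real guard should use a proper SQL parser and a read-only database role as defense in depth, and the function name here is hypothetical.&lt;/p&gt;

```python
import re

# Statements that should never reach a read-only analytics endpoint.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Minimal SELECT-only gate (illustrative; not a substitute for a parser
    plus a read-only Postgres role)."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    # Allow plain SELECTs and CTEs (WITH ... SELECT).
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    return FORBIDDEN.search(stripped) is None
```

&lt;p&gt;Keyword blocklists alone are easy to bypass, which is why pairing this with a database role that simply cannot write is the part that actually makes it safe.&lt;/p&gt;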

&lt;p&gt;None of that is especially flashy, but it made the project feel much more solid.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;This project taught me a lot about building agents in a way that feels less magical and more engineering-focused.&lt;/p&gt;

&lt;p&gt;A few things stood out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluation matters a lot.&lt;/strong&gt;&lt;br&gt;
Without a fixed eval set, it is very easy to convince yourself the system is getting better when it is just getting different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain grounding matters more than I expected.&lt;/strong&gt;&lt;br&gt;
A strong model can generate convincing SQL, but convincing is not the same as correct. The cricket-specific rules and examples made a huge difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming helps the UX a lot.&lt;/strong&gt;&lt;br&gt;
Even when the full pipeline takes a few seconds, showing progress step by step makes the app feel much better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hard part is usually not generation.&lt;/strong&gt;&lt;br&gt;
A lot of the work ended up being around validation, edge cases, memory, retries, and handling the weird questions cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I liked building this
&lt;/h2&gt;

&lt;p&gt;I started this mainly as a way to learn more about AI agents, but it turned into a really useful exercise in building around failure cases.&lt;/p&gt;

&lt;p&gt;It is easy to make an agent look smart in a short demo.&lt;br&gt;
It is much harder to make it dependable when the inputs are messy, the domain has tricky rules, and the answer actually needs to be right.&lt;/p&gt;

&lt;p&gt;That is what made this project fun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it yourself&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The dataset is public on Kaggle:&lt;br&gt;
&lt;a href="https://www.kaggle.com/datasets/sandeepbkadam/ipl-cricket-dataset-20082025-postgresql" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/sandeepbkadam/ipl-cricket-dataset-20082025-postgresql&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Sandhu93/nl2sql-agent" rel="noopener noreferrer"&gt;https://github.com/Sandhu93/nl2sql-agent&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want to improve next
&lt;/h2&gt;

&lt;p&gt;Right now I want to get better visibility into how the system behaves in practice.&lt;/p&gt;

&lt;p&gt;Monitoring and observability are next on the list, especially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latency by pipeline step&lt;/li&gt;
&lt;li&gt;better failure logging&lt;/li&gt;
&lt;li&gt;more structured evaluation runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have worked on NL2SQL, agent reliability, or evaluation workflows, I would genuinely love to hear what has worked for you.&lt;/p&gt;

&lt;p&gt;Happy to answer questions in the comments. Happy learning!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvrigarfiizlscsy86r9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvrigarfiizlscsy86r9.png" alt="Preview 1" width="800" height="943"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmbaae0ggfuyi9vdggcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmbaae0ggfuyi9vdggcb.png" alt="Preview 2" width="800" height="934"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8suapdtoby4vwc0f9mw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8suapdtoby4vwc0f9mw.png" alt="Preview 3" width="800" height="902"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>multiagent</category>
      <category>python</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
