<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sandeep B Kadam</title>
    <description>The latest articles on DEV Community by Sandeep B Kadam (@sandhu93).</description>
    <link>https://dev.to/sandhu93</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838114%2F9481d8de-7cec-4011-9587-bb8f07e1b8ce.png</url>
      <title>DEV Community: Sandeep B Kadam</title>
      <link>https://dev.to/sandhu93</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sandhu93"/>
    <language>en</language>
    <item>
      <title>Build LangChain Chains Once with Lazy Initialization</title>
      <dc:creator>Sandeep B Kadam</dc:creator>
      <pubDate>Tue, 24 Mar 2026 08:17:33 +0000</pubDate>
      <link>https://dev.to/sandhu93/build-langchain-chains-once-with-lazy-initialization-45bi</link>
      <guid>https://dev.to/sandhu93/build-langchain-chains-once-with-lazy-initialization-45bi</guid>
      <description>&lt;h2&gt;
  
  
  Build LangChain chains once with lazy initialization
&lt;/h2&gt;

&lt;p&gt;Build LangChain chains once, on demand. Guard with a &lt;code&gt;None&lt;/code&gt; check, initialize related singletons together on the first request, and let bad config fail as a clear runtime error instead of a startup crash.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;Building a LangChain chain is not free. Constructing a &lt;code&gt;SQLDatabase&lt;/code&gt; opens a database connection, inspects schema, and may sample rows. Instantiating &lt;code&gt;ChatOpenAI&lt;/code&gt; validates configuration and prepares client state. Calling &lt;code&gt;create_sql_query_chain&lt;/code&gt; wires prompts, models, and parsers into an executable graph.&lt;/p&gt;

&lt;p&gt;In our NL2SQL agent, initialization added several hundred milliseconds.&lt;/p&gt;

&lt;p&gt;If you do that work at import time, a missing DB URL or API key can kill the process before health checks or structured logs help you diagnose it. If you rebuild everything per request, you pay the same setup cost on every query before the model processes a single token.&lt;/p&gt;

&lt;p&gt;Lazy initialization avoids both problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;Without lazy initialization, you usually end up with two bad options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eager import-time loading:&lt;/strong&gt; startup fails immediately if config is wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request initialization:&lt;/strong&gt; identical objects are rebuilt on every call, adding avoidable setup latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Naive approach vs production approach
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Naive: eager import-time loading&lt;/th&gt;
&lt;th&gt;Production: lazy singleton&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;❌ Chains built when module loads&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;None&lt;/code&gt; placeholders declared at module level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Crashes on missing DB or API key&lt;/td&gt;
&lt;td&gt;✅ Bad config fails with a clear runtime error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ No startup without full environment&lt;/td&gt;
&lt;td&gt;✅ Everything initialized on the first request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Hard to unit-test without live DB&lt;/td&gt;
&lt;td&gt;✅ Related singletons initialized together&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Config read before app logs start&lt;/td&gt;
&lt;td&gt;✅ Hot-path guard cost is negligible&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibvnyf87iq76l32yydq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibvnyf87iq76l32yydq.png" alt="Request guard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How I implemented it
&lt;/h3&gt;

&lt;p&gt;The NL2SQL agent keeps module-level placeholders for the database, LLM clients, and chains. &lt;code&gt;_get_chain()&lt;/code&gt; is the single initialization point: the first call builds everything, and later calls return the cached objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudocode
&lt;/span&gt;
&lt;span class="c1"&gt;# Initialized on first request so missing DB config or bad API keys
# fail at runtime with context, not during module import.
&lt;/span&gt;
&lt;span class="n"&gt;_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;_generate_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_execute_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_rephrase_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_select_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_rewrite_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;_llm_semaphore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_chain&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return all chains, initializing them once on first use.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_generate_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_execute_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_select_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_llm_semaphore&lt;/span&gt;

    &lt;span class="c1"&gt;# Guard: if the last-built chain exists, the rest do too.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_generate_query&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;_generate_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_execute_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_select_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;_init_redis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Redis or in-memory fallback
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_llm_semaphore&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_llm_semaphore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm_max_concurrency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SQLDatabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;database_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sample_rows_in_table_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_with_fallbacks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_fast_llm_with_fallbacks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;_generate_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_sql_query_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;_execute_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QuerySQLDataBaseTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;answer_prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;_select_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table_prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;split_fn&lt;/span&gt;
    &lt;span class="n"&gt;_rewrite_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rewrite_prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;_fast_llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_generate_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_execute_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_rephrase_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_select_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why check &lt;code&gt;_generate_query&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;_generate_query&lt;/code&gt; is the last object created in the initialization sequence. If it exists, the earlier objects should exist too.&lt;/p&gt;

&lt;p&gt;That makes it a safer sentinel than &lt;code&gt;_db&lt;/code&gt;. If initialization fails midway, &lt;code&gt;_db&lt;/code&gt; might already be set while one or more chains are still missing. Guarding on the last-built object reduces the chance of returning a partially initialized state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;_init_redis()&lt;/code&gt; uses the same pattern internally: if the client already exists, return immediately. Both guards are idempotent; only the first successful call performs real work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67k9v8ugo52idrzxcm6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67k9v8ugo52idrzxcm6r.png" alt="Co-initialization order inside _get_chain()" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency note
&lt;/h3&gt;

&lt;p&gt;In multi-threaded or async environments, protect first-time initialization with a lock if two requests can enter the initialization path at the same time; otherwise concurrent first access can build the same objects twice or briefly expose a partially initialized set. Separate worker processes each hold their own copy of the module globals, so a lock only helps within one process.&lt;/p&gt;
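&lt;p&gt;A minimal sketch of the guard with double-checked locking. Names are illustrative; &lt;code&gt;_build_chain&lt;/code&gt; stands in for the real construction step:&lt;/p&gt;

```python
import threading

_chain = None
_init_lock = threading.Lock()

def _build_chain():
    # Placeholder for the expensive construction step (DB, LLM, chains).
    return object()

def get_chain():
    """Return the cached chain, building it at most once even under concurrency."""
    global _chain
    if _chain is not None:        # fast path: no lock taken on the hot path
        return _chain
    with _init_lock:              # slow path: serialize first-time initialization
        if _chain is None:        # re-check: another thread may have won the race
            _chain = _build_chain()
    return _chain
```

&lt;p&gt;The second &lt;code&gt;None&lt;/code&gt; check inside the lock is what prevents two threads that both passed the fast path from each building the chain.&lt;/p&gt;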

&lt;h3&gt;
  
  
  Bug story: lazy singleton without a TTL
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;This pattern worked well for chains, but failed when I used it for mutable data.&lt;br&gt;
The entity resolver cached the players table behind a simple &lt;code&gt;None&lt;/code&gt; guard and kept it for the lifetime of the process. That was fine until a new player was added mid-season after an IPL auction. The resolver kept serving the stale mapping, and lookups for the new player failed until the backend restarted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The lesson is simple: lazy singletons are a good fit for resources that are expensive to build and effectively static for the lifetime of the process. They are a poor fit for data that changes over time.&lt;/p&gt;

&lt;p&gt;If the underlying data can change, add a TTL or explicit invalidation. A bare &lt;code&gt;None&lt;/code&gt; guard will cache forever.&lt;/p&gt;
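&lt;p&gt;A sketch of the TTL variant under illustrative assumptions (the loader, data, and 300-second TTL are stand-ins):&lt;/p&gt;

```python
import time

_players = None
_loaded_at = 0.0
_TTL_SECONDS = 300  # illustrative refresh interval

def _load_players():
    # Stand-in for the real database read.
    return {"rohit": 45}

def get_players():
    """Return the cached players mapping, refreshing after the TTL expires."""
    global _players, _loaded_at
    now = time.monotonic()
    if _players is None or now - _loaded_at > _TTL_SECONDS:
        _players = _load_players()
        _loaded_at = now
    return _players
```

&lt;p&gt;The only change from the bare guard is the timestamp comparison, which is what lets a mid-season insert eventually become visible without a restart.&lt;/p&gt;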

&lt;h3&gt;
  
  
  General pattern
&lt;/h3&gt;

&lt;p&gt;The same idea shows up in many languages under different names: lazy initialization, initialization-on-demand, deferred construction, or memoized setup.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The pattern is always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declare the resource as &lt;code&gt;None&lt;/code&gt; at the shared scope.&lt;/li&gt;
&lt;li&gt;Add a guard that returns the cached object if it already exists.&lt;/li&gt;
&lt;li&gt;Initialize once, store the result, and reuse it on later calls.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;After the first successful call, the setup cost disappears from the hot path.&lt;/p&gt;
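&lt;p&gt;The three steps above can be sketched in a few lines of Python (names are illustrative):&lt;/p&gt;

```python
_client = None  # step 1: declare the resource at shared scope

def _build_client():
    # Stand-in for expensive construction (connection, model load, etc.).
    return {"connected": True}

def get_client():
    global _client
    if _client is not None:    # step 2: guard returns the cached object
        return _client
    _client = _build_client()  # step 3: initialize once and store
    return _client
```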

&lt;h3&gt;
  
  
  When to use it
&lt;/h3&gt;

&lt;p&gt;Use lazy singletons for expensive, effectively immutable resources such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database connections or pools&lt;/li&gt;
&lt;li&gt;HTTP clients&lt;/li&gt;
&lt;li&gt;LLM clients&lt;/li&gt;
&lt;li&gt;LangChain chains&lt;/li&gt;
&lt;li&gt;semaphores&lt;/li&gt;
&lt;li&gt;model weights loaded once per process&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When not to use it
&lt;/h3&gt;

&lt;p&gt;Do not use a bare lazy singleton for mutable data such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lookup tables that can change&lt;/li&gt;
&lt;li&gt;feature flags&lt;/li&gt;
&lt;li&gt;config that may be reloaded&lt;/li&gt;
&lt;li&gt;caches backed by changing database rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those cases need TTL-based refresh, invalidation, or a different caching strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common mistakes
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Checking the wrong sentinel&lt;/strong&gt;
Guarding on &lt;code&gt;_db&lt;/code&gt; instead of the last-built chain can expose partially initialized state after a mid-init failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting &lt;code&gt;global&lt;/code&gt;&lt;/strong&gt;
In Python, assigning to a module-level variable inside a function creates a local unless it is declared &lt;code&gt;global&lt;/code&gt;. The singleton never persists, and initialization repeats on every call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Splitting related initialization across multiple guards&lt;/strong&gt;
If DB, LLM, and chains are initialized in separate paths, concurrent startup can leave them out of sync. Initialize related objects together behind one guard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using the pattern for mutable data&lt;/strong&gt;
The pattern is fine for process-lifetime resources, not for data that needs refresh or invalidation.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>performance</category>
      <category>python</category>
    </item>
    <item>
      <title>Circuit breaker for LLM provider failure</title>
      <dc:creator>Sandeep B Kadam</dc:creator>
      <pubDate>Mon, 23 Mar 2026 06:12:38 +0000</pubDate>
      <link>https://dev.to/sandhu93/circuit-breaker-for-llm-provider-failure-53f6</link>
      <guid>https://dev.to/sandhu93/circuit-breaker-for-llm-provider-failure-53f6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Stop calling a dead API. Shed load fast, recover automatically, and stay consistent across restarts with Redis-backed failure state.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;&lt;br&gt;
Every LLM-powered application depends on an external provider - OpenAI, Anthropic, Google, or a self-hosted model. These providers go down. Rate limits spike. Latency balloons. Without a circuit breaker, your application keeps sending requests into a black hole, burning through your budget, stacking up timeouts, and delivering a terrible experience to every user in the queue.&lt;/p&gt;

&lt;p&gt;A circuit breaker detects that the downstream service is failing and stops trying for a cooldown period. This is not about retrying harder - it's about failing fast and deliberately so the rest of your system stays healthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Without a circuit breaker:&lt;/strong&gt; When your LLM provider starts returning 429s or 500s, every new user request still attempts a full API call. Each call waits for a timeout (often 30-60 seconds). Your concurrency pool fills up. Healthy requests get queued behind doomed ones. Your entire application appears frozen.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Naive approach vs production approach&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Naive: retry and hope&lt;/th&gt;
&lt;th&gt;Production: circuit breaker&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retry every failed request 3 times&lt;/td&gt;
&lt;td&gt;Track failure count in a window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log the error and move on&lt;/td&gt;
&lt;td&gt;Trip open after N failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No memory of past failures&lt;/td&gt;
&lt;td&gt;Reject instantly while open&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Each request rediscovers the outage&lt;/td&gt;
&lt;td&gt;Probe with single test request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timeouts pile up, pool exhausted&lt;/td&gt;
&lt;td&gt;Close when probe succeeds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc55vytp1k92e23bls5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc55vytp1k92e23bls5n.png" alt="Circuit breaker state machine"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I implemented it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I implemented a circuit breaker in the NL2SQL agent that wraps every LLM provider call. When the failure count within a sliding window exceeds a threshold, the breaker trips open and all subsequent requests return an error immediately - no API call, no timeout, no wasted concurrency slot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudo-code: circuit breaker wrapping an LLM call
# not a production-grade circuit breaker, sliding window not shown
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cooldown_sec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLOSED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooldown_sec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cooldown_sec&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_failure_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;time_since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_failure_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooldown_sec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HALF_OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;CircuitOpenError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provider unavailable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HALF_OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ProviderError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_record_failure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_record_failure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_failure_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLOSED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key design choice:&lt;/strong&gt; The circuit breaker state is stored in Redis, not in-process memory. This matters because in a multi-replica deployment, one replica discovering the outage should protect all replicas from burning through the same dead endpoint. Without shared state, each pod independently rediscovers the failure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15ogmajpu6xaissrcr1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15ogmajpu6xaissrcr1s.png" alt="Architecture: Redis-backed circuit breaker across replicas"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug story: the in-process fallback&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bug:&lt;/strong&gt; During local development, Redis wasn't always running. The circuit breaker tried to read state from Redis, failed, and threw an unhandled exception - crashing the entire request before it even reached the LLM provider. &lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; detect Redis connection failure and fall back to an in-process circuit breaker with the same interface. This is a classic example of a reliability mechanism introducing its own failure mode.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The lesson is important: every reliability layer must itself have a fallback. If your circuit breaker depends on Redis, and Redis is down, your circuit breaker shouldn't make things worse. The in-process fallback loses cross-replica consistency but keeps the application functional.&lt;/p&gt;
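&lt;p&gt;A sketch of that fallback. The client here is anything with &lt;code&gt;get&lt;/code&gt;/&lt;code&gt;set&lt;/code&gt; methods; catching a generic &lt;code&gt;ConnectionError&lt;/code&gt; stands in for whatever connection exception the real client raises:&lt;/p&gt;

```python
class BreakerStore:
    """Read/write breaker state from a shared store, degrading to
    in-process memory when the store is unreachable."""

    def __init__(self, shared_client=None):
        self._shared = shared_client
        self._local = {}  # fallback: loses cross-replica consistency, keeps the app up

    def get_state(self, key):
        if self._shared is not None:
            try:
                return self._shared.get(key) or "CLOSED"
            except ConnectionError:
                pass  # shared store down: degrade, don't crash the request
        return self._local.get(key, "CLOSED")

    def set_state(self, key, state):
        if self._shared is not None:
            try:
                self._shared.set(key, state)
                return
            except ConnectionError:
                pass
        self._local[key] = state
```

&lt;p&gt;Every read and write first tries the shared store, and only a connection failure routes to the in-process dictionary, so the breaker keeps working locally while Redis is down.&lt;/p&gt;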

&lt;p&gt;&lt;strong&gt;Generalized lesson&lt;/strong&gt;&lt;br&gt;
Circuit breakers aren't specific to LLM applications. They appear anywhere you call an external service that can fail: payment processors, search indices, notification services, databases. The pattern is the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The general pattern:&lt;/strong&gt; Track failures within a window. When failures cross a threshold, stop calling the service. After a cooldown, send one probe. If it works, resume. If it doesn't, extend the cooldown. Always degrade gracefully - never let a dead dependency take down your entire system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;How to apply in other projects&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're wrapping any external API call, you can introduce a circuit breaker in three steps. First, wrap the call in a try/except that increments a failure counter. Second, before each call, check the counter - if it's above your threshold and the cooldown hasn't elapsed, return an error immediately. Third, after the cooldown, allow one request through and reset if it succeeds.&lt;/p&gt;

&lt;p&gt;For single-process applications, an in-memory counter is sufficient. For distributed systems, shared state lets replicas coordinate breaker behavior, and Redis is a common choice, though per-instance breakers remain sufficient for many deployments depending on traffic shape and failure tolerance.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Common mistakes&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;No cooldown backoff. A fixed 60-second cooldown means the breaker reopens and gets punched again immediately during a sustained outage. Use exponential backoff on the cooldown duration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Counting all errors equally. A 429 (rate limit) is different from a 500 (server error). Rate limits often clear within seconds - tripping a 60-second breaker for a 429 is overkill. Differentiate transient vs persistent failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Forgetting the fallback for the breaker itself. If your circuit breaker state lives in Redis and Redis goes down, you have two things broken instead of one. Always have an in-process fallback.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
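&lt;p&gt;Mistake 3's fix can be sketched as a wrapper that keeps one interface over two stores: a shared one (Redis-backed in the post's case) and an in-process fallback used when the shared store's connection fails. The class and method names here are hypothetical, and a stand-in raising &lt;code&gt;ConnectionError&lt;/code&gt; takes the place of a real Redis client.&lt;/p&gt;

```python
class InMemoryBreakerStore:
    """In-process fallback with the same interface as the shared store."""

    def __init__(self):
        self._counts = {}

    def incr_failures(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def get_failures(self, key):
        return self._counts.get(key, 0)


class ResilientBreakerStore:
    """Try the shared (e.g. Redis-backed) store; degrade to in-process state
    on connection errors instead of letting the reliability layer crash the
    request. Cross-replica consistency is lost, but the app keeps working."""

    def __init__(self, shared_store):
        self._shared = shared_store
        self._fallback = InMemoryBreakerStore()

    def _delegate(self, method, key):
        try:
            return getattr(self._shared, method)(key)
        except ConnectionError:
            return getattr(self._fallback, method)(key)

    def incr_failures(self, key):
        return self._delegate("incr_failures", key)

    def get_failures(self, key):
        return self._delegate("get_failures", key)
```

&lt;p&gt;The breaker logic itself never needs to know which store answered, which is what makes the fallback safe to drop in.&lt;/p&gt;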

&lt;h2&gt;
  
  
  Notes / production caveats
&lt;/h2&gt;

&lt;p&gt;This post focuses on the pattern, not a fully hardened implementation. The pseudo-code is intentionally simplified: it does not show a true sliding window, concurrency control, single-flight probing in half-open, backoff strategy, or differentiated handling for different error classes.&lt;/p&gt;

&lt;p&gt;A few practical caveats are worth calling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared Redis-backed state is useful in multi-replica systems, but half-open coordination needs care. Without guardrails, multiple replicas can probe the dependency at once and create noisy recovery behavior.&lt;/li&gt;
&lt;li&gt;Redis is one valid production design, not the only one. Many systems work well with per-instance breakers combined with load-shedding, jittered retries, and strict client-side timeouts.&lt;/li&gt;
&lt;li&gt;For distributed coordination, Redis is a practical option. A database-backed approach can also work in some systems, but a shared file is usually not a serious production coordination mechanism.&lt;/li&gt;
&lt;li&gt;Failing fast should usually be paired with a fallback path: degraded mode, cached responses, queueing, or explicit messaging that the provider is temporarily unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;I’m still learning these reliability patterns by applying them in real projects. If you have suggestions, corrections, or better ways to think about this, I’d genuinely appreciate your feedback. Thank You!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>redis</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built an NL2SQL Agent for IPL Cricket While Learning How AI Agents Actually Work</title>
      <dc:creator>Sandeep B Kadam</dc:creator>
      <pubDate>Sun, 22 Mar 2026 13:37:52 +0000</pubDate>
      <link>https://dev.to/sandhu93/i-built-an-nl2sql-agent-for-ipl-cricket-while-learning-how-ai-agents-actually-work-9ec</link>
      <guid>https://dev.to/sandhu93/i-built-an-nl2sql-agent-for-ipl-cricket-while-learning-how-ai-agents-actually-work-9ec</guid>
      <description>&lt;p&gt;IPL 2026 starts this month, so I built something around it.&lt;/p&gt;

&lt;p&gt;I have been learning how to build AI agents, and one thing I kept wanting was a project that felt concrete. Not just a chatbot demo, but something that had to deal with real data, real edge cases, and real mistakes.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;IPL Cricket Analyst&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It lets you ask questions about IPL data in plain English and get back a SQL-backed answer in real time from a SQL database, along with charts, follow-up suggestions, and support for multi-turn questions. Users don't have to know SQL; the LLM generates the queries for them.&lt;/p&gt;

&lt;p&gt;Some example questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Who has the best death-over economy since 2020?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Show me a bar chart of the top 10 wicket takers."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What is Virat Kohli's strike rate at Eden Gardens after 2022?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, the agent writes the SQL, validates it, runs it against 278,000+ ball-by-ball deliveries, and streams the result back while it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfgczkrikrpw7eq0lupk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfgczkrikrpw7eq0lupk.jpeg" alt="Architecture" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 14, TypeScript, Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI, Python 3.11, LangChain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL, 9 tables, 278k+ rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector store&lt;/td&gt;
&lt;td&gt;ChromaDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache / History&lt;/td&gt;
&lt;td&gt;Redis 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;GPT-4o for SQL, GPT-4o-mini for rewrite and insights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Charts&lt;/td&gt;
&lt;td&gt;MCP Chart Server with Vega-Lite v5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The basic goal was simple: take a cricket question, turn it into SQL, run it safely, and make the result feel responsive in the UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;Each question goes through a pipeline inside &lt;code&gt;run_agent_stream()&lt;/code&gt; that looks roughly like this:&lt;/p&gt;

&lt;p&gt;Input validation&lt;br&gt;
→ Response cache check&lt;br&gt;
→ Query rewrite + history summarization&lt;br&gt;
→ Entity resolution&lt;br&gt;
→ [Table selection || Cricket RAG]&lt;br&gt;
→ SQL generation&lt;br&gt;
→ SQL validation + semantic check&lt;br&gt;
→ SQL execution&lt;br&gt;
→ [Answer rephrase || Insights || Viz]&lt;br&gt;
→ Streamed NDJSON to frontend&lt;/p&gt;
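&lt;p&gt;The NDJSON streaming at the end of the pipeline can be sketched as a generator that emits one JSON line per completed step. This is a shape sketch only: the event names and placeholder payloads are assumptions, not the project's actual &lt;code&gt;run_agent_stream()&lt;/code&gt; code.&lt;/p&gt;

```python
import json


def run_agent_stream(question: str):
    """Illustrative sketch of the event stream, not the real implementation.
    Each step yields one NDJSON line as soon as it finishes, so the frontend
    can render the SQL before the answer, and the answer before insights and
    charts."""

    def event(step, payload):
        return json.dumps({"step": step, "data": payload}) + "\n"

    # Placeholder stages; in the real pipeline these call LLMs and Postgres.
    sql = "SELECT 1  -- generated SQL would go here"
    yield event("sql", sql)
    yield event("answer", f"Answer for: {question}")
    yield event("insights", ["placeholder insight"])
    yield event("chart", {"spec": "vega-lite placeholder"})
```

&lt;p&gt;With FastAPI, a generator like this could be returned through &lt;code&gt;StreamingResponse&lt;/code&gt; with a media type of &lt;code&gt;application/x-ndjson&lt;/code&gt;, and the frontend parses each line as it arrives.&lt;/p&gt;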

&lt;p&gt;The frontend receives events step by step, so the SQL shows up first, then the answer, then the extra pieces like insights and charts.&lt;/p&gt;

&lt;p&gt;That streaming part made a much bigger difference to the feel of the app than I expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  What turned out to be harder than I expected
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Cricket stats are tricky in ways generic NL2SQL examples do not prepare you for&lt;/strong&gt;&lt;br&gt;
A lot of NL2SQL tutorials work on clean, simple schemas. IPL data is not that.&lt;/p&gt;

&lt;p&gt;For example, a batting average is not just a straightforward aggregation. Dismissals can be subtle because the dismissed player is not always the striker. Ducks are also not a ball-level concept. They have to be computed at the innings level.&lt;/p&gt;

&lt;p&gt;I ran into a lot of cases where the SQL looked reasonable but was still wrong from a cricket point of view.&lt;/p&gt;
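&lt;p&gt;The grain issue can be shown with batting average, computed here in plain Python over ball-level rows rather than SQL. The column names (&lt;code&gt;striker&lt;/code&gt;, &lt;code&gt;batsman_runs&lt;/code&gt;, &lt;code&gt;player_dismissed&lt;/code&gt;) are hypothetical stand-ins for the real schema; the point is that runs accrue to the striker while a dismissal belongs to &lt;code&gt;player_dismissed&lt;/code&gt;, who may be the non-striker.&lt;/p&gt;

```python
def batting_average(deliveries, player):
    """Batting average = runs scored / times dismissed.
    Naive SQL often counts dismissals only on balls the player faced, which
    silently drops run-outs at the non-striker's end. Counting dismissals
    from player_dismissed instead of striker fixes the grain."""
    runs = sum(d["batsman_runs"] for d in deliveries if d["striker"] == player)
    outs = sum(1 for d in deliveries if d.get("player_dismissed") == player)
    # A batting average is undefined when the player was never dismissed.
    return runs / outs if outs else float("inf")
```

&lt;p&gt;A query that filters dismissals by &lt;code&gt;striker&lt;/code&gt; would look perfectly reasonable and still be wrong from a cricket point of view, which is exactly the class of bug the semantic validation step is there to catch.&lt;/p&gt;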

&lt;p&gt;To handle that better, I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a cricket rules document for retrieval&lt;/li&gt;
&lt;li&gt;IPL-specific few-shot SQL examples&lt;/li&gt;
&lt;li&gt;an extra semantic validation step before execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination helped a lot more than just changing prompts randomly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Accuracy improved only after I started measuring it properly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At first I was mostly testing with ad hoc questions, which felt fine until I started noticing inconsistencies.&lt;/p&gt;

&lt;p&gt;So I put together a 50-question ground-truth evaluation set and started running the system against it repeatedly.&lt;/p&gt;

&lt;p&gt;The first version was around 82% accurate.&lt;br&gt;
After a lot of iteration, it got to 98% on that eval.&lt;/p&gt;

&lt;p&gt;Most of the improvements did not come from big architectural changes. They came from fixing very specific failure modes, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using the wrong grain for aggregation&lt;/li&gt;
&lt;li&gt;getting milestone logic wrong&lt;/li&gt;
&lt;li&gt;small cricket-specific details like death overs being overs 16 to 20, which in this dataset meant handling indexing carefully&lt;/li&gt;
&lt;li&gt;selecting columns that made the answer noisier than it needed to be&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was probably the biggest lesson in the whole project. Evaluation made the work much more grounded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Follow-up questions needed more care than I thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the things I wanted was for follow-ups to feel natural.&lt;/p&gt;

&lt;p&gt;Questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Who scored the most runs in 2023?"&lt;/li&gt;
&lt;li&gt;"What was his strike rate?"&lt;/li&gt;
&lt;li&gt;"What about 2022?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds simple from a user perspective, but a lot has to go right for it to work consistently.&lt;/p&gt;

&lt;p&gt;I ended up rewriting follow-up questions into standalone questions before sending them downstream. That made the rest of the pipeline much more reliable.&lt;/p&gt;

&lt;p&gt;It was one of those changes that feels obvious in hindsight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Reliability work matters even in small projects&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I did not want this to be just a cool demo that works once.&lt;/p&gt;

&lt;p&gt;So I added some basic safeguards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-IP rate limiting&lt;/li&gt;
&lt;li&gt;a response cache&lt;/li&gt;
&lt;li&gt;a circuit breaker&lt;/li&gt;
&lt;li&gt;request timeouts&lt;/li&gt;
&lt;li&gt;input validation&lt;/li&gt;
&lt;li&gt;SELECT-only SQL enforcement&lt;/li&gt;
&lt;/ul&gt;
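&lt;p&gt;The SELECT-only enforcement can be sketched as a small gate run before execution. This is a minimal, knowingly incomplete version: a real guard should use a proper SQL parser and a read-only database role as defense in depth, and the function name here is hypothetical.&lt;/p&gt;

```python
import re

# Statements that should never reach a read-only analytics endpoint.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Minimal SELECT-only gate (illustrative; not a substitute for a parser
    plus a read-only Postgres role)."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    # Allow plain SELECTs and CTEs (WITH ... SELECT).
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    return FORBIDDEN.search(stripped) is None
```

&lt;p&gt;Keyword blocklists alone are easy to bypass, which is why pairing this with a database role that simply cannot write is the part that actually makes it safe.&lt;/p&gt;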

&lt;p&gt;None of that is especially flashy, but it made the project feel much more solid.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;This project taught me a lot about building agents in a way that feels less magical and more engineering-focused.&lt;/p&gt;

&lt;p&gt;A few things stood out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluation matters a lot.&lt;/strong&gt;&lt;br&gt;
Without a fixed eval set, it is very easy to convince yourself the system is getting better when it is just getting different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain grounding matters more than I expected.&lt;/strong&gt;&lt;br&gt;
A strong model can generate convincing SQL, but convincing is not the same as correct. The cricket-specific rules and examples made a huge difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming helps the UX a lot.&lt;/strong&gt;&lt;br&gt;
Even when the full pipeline takes a few seconds, showing progress step by step makes the app feel much better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hard part is usually not generation.&lt;/strong&gt;&lt;br&gt;
A lot of the work ended up being around validation, edge cases, memory, retries, and handling the weird questions cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I liked building this
&lt;/h2&gt;

&lt;p&gt;I started this mainly as a way to learn more about AI agents, but it turned into a really useful exercise in building around failure cases.&lt;/p&gt;

&lt;p&gt;It is easy to make an agent look smart in a short demo.&lt;br&gt;
It is much harder to make it dependable when the inputs are messy, the domain has tricky rules, and the answer actually needs to be right.&lt;/p&gt;

&lt;p&gt;That is what made this project fun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it yourself&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The dataset is public on Kaggle:&lt;br&gt;
&lt;a href="https://www.kaggle.com/datasets/sandeepbkadam/ipl-cricket-dataset-20082025-postgresql" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/sandeepbkadam/ipl-cricket-dataset-20082025-postgresql&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Sandhu93/nl2sql-agent" rel="noopener noreferrer"&gt;https://github.com/Sandhu93/nl2sql-agent&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want to improve next
&lt;/h2&gt;

&lt;p&gt;Right now I want to get better visibility into how the system behaves in practice.&lt;/p&gt;

&lt;p&gt;Monitoring and observability are next on the list, especially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latency by pipeline step&lt;/li&gt;
&lt;li&gt;better failure logging&lt;/li&gt;
&lt;li&gt;more structured evaluation runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have worked on NL2SQL, agent reliability, or evaluation workflows, I would genuinely love to hear what has worked for you.&lt;/p&gt;

&lt;p&gt;Happy to answer questions in the comments. Happy learning!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvrigarfiizlscsy86r9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvrigarfiizlscsy86r9.png" alt="Preview 1" width="800" height="943"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmbaae0ggfuyi9vdggcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmbaae0ggfuyi9vdggcb.png" alt="Preview 2" width="800" height="934"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8suapdtoby4vwc0f9mw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8suapdtoby4vwc0f9mw.png" alt="Preview 3" width="800" height="902"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>multiagent</category>
      <category>python</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
