Build LangChain chains once with lazy initialization
Build LangChain chains once, on demand. Guard with a None check, initialize related singletons together on the first request, and let bad config fail as a clear runtime error instead of a startup crash.
Why this matters
Building a LangChain chain is not free. Constructing a SQLDatabase opens a database connection, inspects schema, and may sample rows. Instantiating ChatOpenAI validates configuration and prepares client state. Calling create_sql_query_chain wires prompts, models, and parsers into an executable graph.
In our NL2SQL agent, initialization added several hundred milliseconds.
If you do that work at import time, a missing DB URL or API key can kill the process before health checks or structured logs help you diagnose it. If you rebuild everything per request, you pay the same setup cost on every query before the model processes a single token.
Lazy initialization avoids both problems.
The problem
Without lazy initialization, you usually end up with two bad options:
- Eager import-time loading: startup fails immediately if config is wrong.
- Per-request initialization: identical objects are rebuilt on every call, adding avoidable setup latency.
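The difference can be sketched with a config lookup standing in for chain construction (`DATABASE_URL` and `get_db_url` are illustrative names, not from the agent):

```python
import os

# Eager (import-time) version, shown commented out because it would
# raise KeyError at import if DATABASE_URL is unset, killing the
# process before logging is up:
#
#   DB_URL = os.environ["DATABASE_URL"]

# Lazy version: nothing runs at import; a missing variable surfaces on
# the first call, inside a request where it can be caught and logged.
_db_url = None

def get_db_url():
    global _db_url
    if _db_url is None:
        _db_url = os.environ["DATABASE_URL"]
    return _db_url
```

The lazy version also pays the lookup cost only once; later calls hit the cached value.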
Naive approach vs production approach
| Naive: eager import-time loading | Production: lazy singleton |
|---|---|
| ❌ Chains built when module loads | ✅ `None` placeholders declared at module level |
| ❌ Crashes on missing DB or API key | ✅ Bad config fails with a clear runtime error |
| ❌ No startup without full environment | ✅ Related singletons initialized together on the first request |
| ❌ Hard to unit-test without live DB | ✅ Modules import without a live environment |
| ❌ Config read before app logs start | ✅ Hot-path guard is effectively negligible |
How I implemented it
The NL2SQL agent keeps module-level placeholders for the database, LLM clients, and chains. _get_chain() is the single initialization point: the first call builds everything, and later calls return the cached objects.
```python
# Pseudocode. `settings`, the prompts, and the builder helpers come from
# elsewhere in the agent. Assumed imports (LangChain paths vary by version):
#   from langchain_community.utilities import SQLDatabase
#   from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
#   from langchain.chains import create_sql_query_chain
#   from langchain_core.output_parsers import StrOutputParser

# Initialized on first request so missing DB config or bad API keys
# fail at runtime with context, not during module import.
_db = None
_llm = None
_fast_llm = None
_generate_query = None
_execute_query = None
_rephrase_answer = None
_select_table = None
_rewrite_query = None
_llm_semaphore = None


def _get_chain():
    """Return all chains, initializing them once on first use."""
    global _db, _llm, _fast_llm
    global _generate_query, _execute_query, _rephrase_answer
    global _select_table, _rewrite_query, _llm_semaphore

    # Guard: if the last-built chain exists, the rest do too.
    if _generate_query is not None:
        return (
            _generate_query,
            _execute_query,
            _rephrase_answer,
            _select_table,
            _rewrite_query,
        )

    _init_redis()  # Redis or in-memory fallback
    if _llm_semaphore is None:
        _llm_semaphore = Semaphore(settings.llm_max_concurrency)

    _db = SQLDatabase.from_uri(
        settings.database_url,
        sample_rows_in_table_info=3,
    )
    _llm = build_llm_with_fallbacks()
    _fast_llm = build_fast_llm_with_fallbacks()

    _execute_query = QuerySQLDataBaseTool(db=_db)
    _rephrase_answer = answer_prompt | _fast_llm | StrOutputParser()
    _select_table = table_prompt | _fast_llm | StrOutputParser() | split_fn
    _rewrite_query = rewrite_prompt | _fast_llm | StrOutputParser()
    # Built last on purpose: this is the guard's sentinel, so its
    # existence implies everything above it exists too.
    _generate_query = create_sql_query_chain(_llm, _db, prompt=build_prompt())

    return (
        _generate_query,
        _execute_query,
        _rephrase_answer,
        _select_table,
        _rewrite_query,
    )
```
Why check _generate_query?
_generate_query is the last object created in the initialization sequence. If it exists, the earlier objects should exist too.
That makes it a safer sentinel than _db. If initialization fails midway, _db might already be set while one or more chains are still missing. Guarding on the last-built object reduces the chance of returning a partially initialized state.
_init_redis() uses the same pattern internally: if the client already exists, return immediately. Both guards are idempotent; only the first successful call performs real work.
Concurrency note
The pattern as written is not thread-safe: if two requests enter the initialization path at the same time, they can build duplicate objects or observe partially initialized state. In multi-worker or highly concurrent environments, protect first-time initialization with a lock.
Bug story: lazy singleton without a TTL
This pattern worked well for chains, but failed when I used it for mutable data.
The entity resolver cached the players table behind a simple `None` guard and kept it for the lifetime of the process. That was fine until a new player was added mid-season after an IPL auction. The resolver kept serving the stale mapping, and lookups for the new player failed until the backend restarted.
The lesson is simple: lazy singletons are a good fit for resources that are expensive to build and effectively static for the lifetime of the process. They are a poor fit for data that changes over time.
If the underlying data can change, add a TTL or explicit invalidation. A bare None guard will cache forever.
General pattern
The same idea shows up in many languages under different names: lazy initialization, initialization-on-demand, deferred construction, or memoized setup.
The pattern is always the same:
- Declare the resource as None at the shared scope.
- Add a guard that returns the cached object if it already exists.
- Initialize once, store the result, and reuse it on later calls.
After the first successful call, the setup cost disappears from the hot path.
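In Python, the three steps can even be collapsed into a memoized function; a sketch using `functools.cache`, with the dict standing in for an expensive resource:

```python
from functools import cache

@cache
def get_resource():
    # Steps 2 and 3 are handled by the decorator: the first call builds
    # and stores the result, later calls return it from the hot path.
    return {"ready": True}  # stand-in for expensive setup
```

The trade-off is less control: `cache` has no built-in TTL or invalidation hook beyond `cache_clear()`, so the same mutable-data caveats apply.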
When to use it
Use lazy singletons for expensive, effectively immutable resources such as:
- database connections or pools
- HTTP clients
- LLM clients
- LangChain chains
- semaphores
- model weights loaded once per process
When not to use it
Do not use a bare lazy singleton for mutable data such as:
- lookup tables that can change
- feature flags
- config that may be reloaded
- caches backed by changing database rows
Those cases need TTL-based refresh, invalidation, or a different caching strategy.
Common mistakes
- Checking the wrong sentinel. Guarding on `_db` instead of the last-built chain can expose partially initialized state after a mid-init failure.
- Forgetting `global`. In Python, assigning to a module-level variable inside a function creates a local unless declared `global`. The singleton never persists, and initialization repeats on every call.
- Splitting related initialization across multiple guards. If DB, LLM, and chains are initialized in separate paths, concurrent startup can leave them out of sync. Initialize related objects together behind one guard.
- Using the pattern for mutable data. The pattern is fine for process-lifetime resources, not for data that needs refresh or invalidation.

