DEV Community: Priyank Agrawal

Your Multi-Agent SSE Stream Works in Dev. Here's What Kills It in Production.

Priyank Agrawal — Thu, 09 Apr 2026 19:35:44 +0000

Methodology note: Intent classification numbers (25% → 8% misclassification) were measured over ~1,400 messages across two weeks using a 12-intent test set with human-labelled ground truth. Abandonment rates (68% → 31%) are session-level. Small dataset — treat as directional signals, not benchmarks.

The 90-second version

Before the deep dives — here are the six mistakes and fixes. If you only read this far, you still leave with something useful.

#	Mistake	Fix
01	Flat text streams break the moment structured data appears	Use typed SSE events. Design the schema before writing a single agent.
02	Loose `dict` state in LangGraph crashes two agents downstream, mid-stream	Pydantic with `extra='forbid'` at every agent boundary
03	`run_in_executor` introduces race conditions with shared mutable state	Use `ainvoke` instead. If you can't, use per-request graph instances.
04	Catching only two exception types leaves zombie connections open	Broad `except Exception` as the last handler + `finally` block. Not lazy: required.
05	Free-text conversation summaries hallucinate user constraints	Structured Pydantic output for summarisation — explicit fields, not prose
06	Mobile browser kills `EventSource` silently on app switch	`ReconnectingEventSource` with Last-Event-ID resume, not just retry

1. Typed SSE events — design the contract before the code

Streaming a text response is trivial. Streaming structured data, UI signals, and text simultaneously over one connection is not.

When a multi-agent system runs, you have free-text chunks, typed job objects, quick-action buttons, and agent metadata all in flight at once. Streaming everything as flat text fails the moment structured data appears — you can't reliably parse a JSON object out of a partial token stream.

The mistake: Designing the SSE event schema after the agents are built. Every agent output has to be rewired to match the new contract.

The fix: Define the typed event schema first. It becomes the contract every agent writes to and every frontend listener reads from.

# backend/app/api/v1/chat.py
async def stream_agent_response(orchestrator_result):
    async for event in orchestrator_result:
        event_type = event.get("type")

        if event_type == "chunk":        # streaming text token
            yield f"event: chunk\ndata: {json.dumps({'content': event['content'], 'agent': event['agent']})}\n\n"
        elif event_type == "jobs_data":  # structured JSON — not text
            yield f"event: jobs_data\ndata: {json.dumps({'jobs': event['jobs']})}\n\n"
        elif event_type == "complete":
            yield f"event: complete\ndata: {json.dumps({'conversation_id': event['conversation_id']})}\n\n"

// frontend — subscribe by type, not raw text
eventSource.addEventListener('chunk', (e) => appendTextChunk(JSON.parse(e.data).content));
eventSource.addEventListener('jobs_data', (e) => setJobResults(JSON.parse(e.data).jobs));
eventSource.addEventListener('complete', () => eventSource.close());

⚠️ What this code is missing: No backpressure — a slow client + fast generator = unbounded memory buffering. No SSE heartbeat — load balancers silently kill idle connections after 30–60s. Add ": heartbeat\n\n" every 15 seconds alongside the main stream.

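# Production addition: heartbeat so proxies don't kill idle connections
async def stream_with_heartbeat(generator, interval: float = 15):
    it = generator.__aiter__()
    pending = asyncio.ensure_future(it.__anext__())
    try:
        while True:
            # Wake up after `interval` seconds of silence, even if no event arrived
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield ": heartbeat\n\n"  # SSE comment line, ignored by clients
                continue
            yield pending.result()  # raises StopAsyncIteration when the stream ends
            pending = asyncio.ensure_future(it.__anext__())
    except StopAsyncIteration:
        pass
    finally:
        pending.cancel()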

2. Pydantic state in LangGraph — errors at write time, not read time

The Ranking Agent reads jobs_found. But there's no contract saying what fields it contains. If the Job Search Agent wrote malformed data, the crash happens at the Ranking Agent — mid-stream, with a live user waiting, and no easy way to trace it back to the source.

The mistake: Loose dict and list types everywhere in your LangGraph state. Crashes surface two agents downstream with no clear origin.

The fix: Pydantic models at every agent boundary. Add extra='forbid' — unexpected fields become hard errors at the producing agent, not silent pass-throughs.

# state.py
from pydantic import BaseModel, ConfigDict
from typing import Optional, List

class JobResult(BaseModel):
    model_config = ConfigDict(extra='forbid')  # unexpected fields = hard error
    job_id: str
    title: str
    match_score: float
    matched_skills: List[str]

class AgentState(BaseModel):
    model_config = ConfigDict(extra='forbid')
    user_message: str
    intent: Optional[IntentData] = None
    jobs_found: List[JobResult] = []
    error: Optional[str] = None
    is_complete: bool = False

Rule: If data crosses an agent boundary, it gets a type. No exceptions.

3. `run_in_executor` has a hidden race condition

FastAPI is async. LangGraph's LLM calls are, in some configurations, synchronous. Calling synchronous code inside an async endpoint blocks the entire event loop for the duration of that LLM call. The symptom: SSE connections open, the thinking event fires, then nothing for 10–15 seconds.

Before the fix: P95 latency was 3× P50.

After: 1.4× P50.

But here's what most guides skip:

⚠️ Critical caveat: run_in_executor moves code to a thread pool. If your LangGraph setup has any shared mutable state — a shared cache, a shared agent instance, anything not thread-safe — you've introduced race conditions that are invisible at low traffic and catastrophic at scale.

Prefer this:

# Use the async method if your setup supports it
result = await langgraph_chain.ainvoke(state)

Safe fallback when ainvoke is unavailable:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import partial

_executor = ThreadPoolExecutor(max_workers=10)

async def stream_chat(request):
    # Build a FRESH graph per request so no state is shared across threads
    graph = build_agent_graph()
    # `state` is the per-request input built from `request` (not shown here)
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine
    result = await loop.run_in_executor(_executor, partial(graph.invoke, state))

The one-liner fix is only safe if you're certain there is no shared mutable state anywhere in the graph.
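To make the event-loop stall concrete, here is a minimal sketch that counts how often a background task gets to run while a slow call is in progress. `slow_sync_call` is a hypothetical stand-in for a synchronous LLM call, and the 0.2 second duration is arbitrary; nothing here comes from the original system.

```python
import asyncio
import time


def slow_sync_call(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous LLM call."""
    time.sleep(seconds)
    return "done"


async def count_ticks(run_call) -> int:
    """Run `run_call` while a background ticker tries to make progress."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await run_call()
    task.cancel()
    return ticks


async def blocking() -> None:
    # Runs on the event loop thread: no other task can run until it returns
    slow_sync_call(0.2)


async def offloaded() -> None:
    # Runs in a worker thread: the event loop keeps serving other tasks
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, slow_sync_call, 0.2)


blocked_ticks = asyncio.run(count_ticks(blocking))
offloaded_ticks = asyncio.run(count_ticks(offloaded))
print(f"ticks while blocking: {blocked_ticks}, while offloaded: {offloaded_ticks}")
```

In the blocking case the ticker never gets a turn, which is the same starvation that makes open SSE streams go silent. The offloaded case keeps ticking, but only the thread-safety caveat above tells you whether it is safe.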

4. LLM down mid-stream — catch everything, not just two exceptions

The SSE connection is open. Intent classified. Then the LLM returns a 503. Without the right handling, the connection hangs indefinitely. After a few of these, zombie connections accumulate — open SSE connections that will never close.

The original mistake: Catching only httpx.HTTPStatusError and groq.APIStatusError. A Pydantic validation error, a plain Python bug, any other library's timeout — all bubble up uncaught, leaving the connection open forever.

async def safe_agent_stream(agent_fn, state, fallback: str):
    try:
        async for event in agent_fn(state):
            yield event
    except (httpx.HTTPStatusError, groq.APIStatusError):
        yield {"type": "error", "content": fallback, "retry": True}
    except asyncio.TimeoutError:
        yield {"type": "error", "content": "Response timed out.", "retry": True}
    except Exception:
        # Not lazy — this is the safety net for everything unexpected
        logger.exception("Unexpected agent error")
        yield {"type": "error", "content": "Something went wrong.", "retry": True}
    finally:
        # Runs regardless — frontend can always close the connection
        yield {"type": "complete", "conversation_id": state.conversation_id}

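The finally block is not optional. It guarantees the frontend receives a complete event even when everything above it fails, so it can close the connection cleanly. One caveat: if the generator itself is closed early (for example, the server calls aclose() on it after a client disconnect), a yield inside finally makes Python raise RuntimeError("async generator ignored GeneratorExit"), so treat that path as an already-closed connection rather than a delivery guarantee.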

5. Free-text conversation summaries hallucinate — use structured output

At turn 20, injecting full conversation history into every call costs too much. The fix is to summarise old messages and keep recent ones verbatim.

But free-text summarisation introduces a subtle failure: the summary model says "user prefers small companies". The user said "no startups under 50 people". These are opposite. The downstream agent makes recommendations based on hallucinated preferences with no way to debug it from the recent context alone.

# Wrong — free text is hard to audit and easy to distort
summary = "User is looking for Python roles at mid-sized companies..."

# Right — structured output forces explicit, auditable extraction
class ConversationSummary(BaseModel):
    job_preferences: List[str]      # ["Python", "remote", "Series B+"]
    hard_constraints: List[str]     # ["no startups under 50 people"]
    mentioned_companies: List[str]  # ["Stripe", "Notion"]
    current_goal: str

summary = summarize_to_struct(older_messages, ConversationSummary)

summarize_to_struct is a separate LLM call using a fast, cheap model (haiku or gpt-3.5-turbo). The result is cached in your session store — don't re-summarise every turn. Structured output makes it auditable: you can log and diff summaries across turns to catch distortions before they affect downstream agents.
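The "log and diff summaries across turns" idea can be sketched as follows. To stay dependency-free this uses a plain dataclass instead of the Pydantic model above, and `Summary` and `diff_summaries` are hypothetical names, not part of the original code.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class Summary:
    job_preferences: List[str] = field(default_factory=list)
    hard_constraints: List[str] = field(default_factory=list)
    mentioned_companies: List[str] = field(default_factory=list)
    current_goal: str = ""


def diff_summaries(old: Summary, new: Summary) -> Dict[str, Dict[str, list]]:
    """Report items added to or dropped from each list field between turns."""
    changes: Dict[str, Dict[str, list]] = {}
    for f in fields(Summary):
        before, after = getattr(old, f.name), getattr(new, f.name)
        if not isinstance(before, list):
            continue  # only list fields are diffed in this sketch
        added = [item for item in after if item not in before]
        dropped = [item for item in before if item not in after]
        if added or dropped:
            changes[f.name] = {"added": added, "dropped": dropped}
    return changes


turn_10 = Summary(job_preferences=["Python"],
                  hard_constraints=["no startups under 50 people"])
turn_11 = Summary(job_preferences=["Python", "small companies"],
                  hard_constraints=[])
changes = diff_summaries(turn_10, turn_11)
print(changes)
```

A dropped entry under hard_constraints is the signal worth alerting on: the user never withdrew the constraint, so the summariser lost or inverted it.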

6. Mobile SSE dies silently — you need resume, not just retry

On desktop, SSE is reliable. On mobile, background/foreground app switching kills EventSource silently — no error event, no warning. The user returns to a chat with no response.

A naive retry loop reconnects from scratch and replays events the user already saw. What you actually need is resume from the last received event.

class ReconnectingEventSource {
    constructor(url, options = {}) {
        this.url = url;
        this.lastEventId = null;
        this.retryDelay = options.retryDelay ?? 1000;
        this.maxDelay = options.maxDelay ?? 30000;
        this.maxAttempts = options.maxAttempts ?? 5;
        this.attempts = 0;
        this.closed = false;
        this.handlers = {};
        this.connect();
    }

    connect() {
        // A brand-new EventSource does not send the Last-Event-ID header,
        // so pass the position as a query parameter the server can read
        const url = new URL(this.url, window.location.href);
        if (this.lastEventId) url.searchParams.set('lastEventId', this.lastEventId);

        this.es = new EventSource(url.toString());

        // Reset the backoff once a connection succeeds
        this.es.onopen = () => { this.attempts = 0; };

        this.es.addEventListener('complete', () => {
            this.closed = true;
            this.es.close();
        });

        this.es.onerror = () => {
            if (!this.closed) { this.es.close(); this.scheduleReconnect(); }
        };

        // Re-attach every registered handler to the new connection
        Object.keys(this.handlers).forEach((type) => this.attach(type));
    }

    attach(type) {
        this.es.addEventListener(type, (e) => {
            if (e.lastEventId) this.lastEventId = e.lastEventId; // track position
            this.handlers[type](e);
        });
    }

    scheduleReconnect() {
        if (this.attempts >= this.maxAttempts) {
            this.handlers['give_up']?.();
            return;
        }
        const delay = Math.min(this.retryDelay * 2 ** this.attempts++, this.maxDelay);
        setTimeout(() => this.connect(), delay);
    }

    addEventListener(type, fn) {
        const isNew = !(type in this.handlers);
        this.handlers[type] = fn;
        // Attach the wrapped listener so lastEventId is tracked on the first connection too
        if (isNew) this.attach(type);
    }
}

Server side: Emit an id: field with every SSE event. On reconnect with lastEventId, replay only the missed events from a short-lived Redis buffer (60s TTL). If resume complexity isn't worth it, the honest fallback is a "connection dropped — tap to retry" button with the last message pre-filled.
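The server side of resume can be sketched like this. It is a minimal in-memory illustration under stated assumptions: the article's buffer lives in Redis, while `ReplayBuffer` and `format_sse` here are hypothetical names that only show the id assignment, the TTL, and the replay filter.

```python
import json
import time
from collections import deque
from typing import Deque, List, Optional, Tuple


def format_sse(event_id: int, event_type: str, data: dict) -> str:
    """Serialise one SSE frame with an `id:` line so clients can resume."""
    return f"id: {event_id}\nevent: {event_type}\ndata: {json.dumps(data)}\n\n"


class ReplayBuffer:
    """Keeps recent frames for one conversation so a reconnect can resume."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl = ttl_seconds
        self.next_id = 1
        self.frames: Deque[Tuple[int, float, str]] = deque()

    def append(self, event_type: str, data: dict, now: Optional[float] = None) -> str:
        """Assign the next id, store the frame, and return it for sending."""
        now = time.monotonic() if now is None else now
        frame = format_sse(self.next_id, event_type, data)
        self.frames.append((self.next_id, now, frame))
        self.next_id += 1
        self._expire(now)
        return frame

    def replay_after(self, last_event_id: int, now: Optional[float] = None) -> List[str]:
        """Return frames the client has not seen yet, oldest first."""
        self._expire(time.monotonic() if now is None else now)
        return [frame for event_id, _, frame in self.frames if event_id > last_event_id]

    def _expire(self, now: float) -> None:
        # Drop frames older than the TTL from the front of the deque
        while self.frames and now - self.frames[0][1] > self.ttl:
            self.frames.popleft()


buf = ReplayBuffer(ttl_seconds=60)
buf.append("chunk", {"content": "Hel"}, now=0.0)
buf.append("chunk", {"content": "lo"}, now=1.0)
buf.append("complete", {"conversation_id": "c1"}, now=2.0)
missed = buf.replay_after(1, now=3.0)  # client saw event 1 before dropping
```

The `now` parameter exists only to make the expiry logic testable; in a request handler you would omit it and let the buffer read the clock.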

7. Intent classification — the hard cases and fixes

"Find me Python jobs" → job search. "Write a cold email" → email generator. The hard cases:

"I applied to Google last week, any updates?" — followup or general?
"Help me with the Stripe interview" — interview_prep or general?
"Yo" — ???

Initial misclassification rate on ambiguous inputs: ~25%. Three changes reduced it to under 8%:

1. Explicit "when to use" descriptions in the classifier prompt:

INTENT_DESCRIPTIONS = {
    "followup": "User asking about an application they ALREADY submitted. Keywords: 'I applied', 'heard back', 'status'.",
    "job_search": "User wants to DISCOVER new jobs not yet applied to. Keywords: 'find', 'show me', 'any jobs'.",
    "skill_gap": "User wants to know what's MISSING. Keywords: 'missing', 'improve', 'gap'.",
}

2. Confidence scores with fallback:

class IntentClassification(BaseModel):
    intent: str
    confidence: float   # 0.0 to 1.0
    reasoning: str      # forces the model to show its work

if classification.confidence < 0.70:
    return route_to_general(message, context)

3. Previous intent as context:

context_hint = f"Previous intents: {', '.join(recent_intents[-3:])}"
# If last intent was "followup" and user replies "any updates?" —
# prior context makes correct classification almost automatic

8. Low confidence + open question = 68% abandonment

When intent confidence falls below 0.70 and the system asks "what are you looking for?" — 68% of sessions end there. Replacing the open question with three specific quick-reply buttons dropped abandonment to 31%.

if classification.confidence < 0.70:
    return {
        "text": "I can help with a few things — pick one:",
        "suggestions": [
            "Find me jobs matching my profile",
            "Write a cold email for a role",
            "What skills should I add to my resume?"
        ]
        # Personalise from user_profile in production
        # Don't show "add skills" to a user whose resume is already uploaded
    }

On the numbers: ~1,400 sessions, two weeks, single product. Directional, not benchmarks. What generalises: open question → blank form feeling → abandonment. Specific options → momentum → engagement.

9. Build these on day one — retrofitting is painful

Six decisions that are cheap upfront and expensive to add later:

SSE event schema first. Changing the contract mid-build means rewiring every agent output and every frontend listener simultaneously.
Pydantic + extra='forbid' everywhere. Retrofitting types into a working dict-based state means rewriting agents that already have subtle state-sharing bugs you haven't found yet.
Broad except Exception + finally from day one. Zombie connections are hard to detect (they look like slow users) and hard to purge once accumulating.
Log every intent classification with confidence score. After two weeks of logs, the classifier's weak spots become obvious. You can't improve what you didn't measure.
Structured summarisation, not free text. Switching mid-product means auditing every cached summary in your session store for distortions.
ReconnectingEventSource before mobile launch. Retrofitting reconnection touches every event handler in the frontend. It's never a small change once you have production listeners.
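For the logging item in the list above, a rough sketch of what two weeks of logs can be reduced to: the share of low-confidence classifications per intent. The record shape `(intent, confidence)` and the function name are assumptions for illustration; the 0.70 threshold is the one used earlier in the article.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def low_confidence_rate(
    records: Iterable[Tuple[str, float]], threshold: float = 0.70
) -> Dict[str, float]:
    """Fraction of classifications below `threshold`, grouped by predicted intent."""
    totals: Dict[str, int] = defaultdict(int)
    low: Dict[str, int] = defaultdict(int)
    for intent, confidence in records:
        totals[intent] += 1
        if confidence < threshold:
            low[intent] += 1
    return {intent: low[intent] / totals[intent] for intent in totals}


logged = [("followup", 0.90), ("followup", 0.50), ("job_search", 0.95)]
rates = low_confidence_rate(logged)
print(rates)
```

Intents with a high rate are the ones whose "when to use" descriptions most likely need rewriting.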

Quick poll for the comments — which of these hit you in production?

Zombie SSE connections that never closed

Loose dict state crashing mid-stream

The run_in_executor race condition

Mobile EventSource dying silently

Free-text summaries hallucinating user constraints

Something not on this list ⬇️

Found a failure mode I missed? Drop it in the comments — I'll add it to the article with credit.

Context Engineering: The Production Problem Nobody Writes About

Priyank Agrawal — Mon, 06 Apr 2026 17:41:17 +0000

Andrej Karpathy called it "the new prompt engineering." Everyone wrote a definition article. Nobody wrote about the actual infrastructure.

When I built the ESG Analytics Chatbot at Planet Sustech — a RAG system serving 10+ organizations with 95%+ query accuracy — the hardest engineering challenge wasn't the model. It wasn't the retrieval. It was figuring out what to put in the context window before every single LLM call, and why getting that wrong destroyed accuracy in ways that were nearly impossible to debug.

Context engineering is about one thing: building the right input to the model, every call, for every state of your application. It's not about writing better prompts. It's about the pipeline that assembles the prompt at runtime.

This is the post I wish existed when I was building that system.

Why This Is Harder Than It Sounds

Here's a simplified view of what goes into a single LLM call in a production RAG system:

[System Prompt] + [Retrieved Documents] + [Conversation History] + [User Query] + [Tool Schemas]
= Total Context

Each component competes for the same finite token budget. And each one has a different freshness, relevance, and cost profile.

The naive approach: include everything. Stuff the context window as full as you can.

The result: accuracy drops, costs spike, and you hit a phenomenon called "lost in the middle" — where the model pays disproportionately less attention to content in the middle of a long context. Research shows up to 24.2% accuracy degradation when relevant information is buried in the middle of a long context rather than placed at the beginning or end.

We saw this in the ESG chatbot. When an analyst asked about a specific company's water usage data, our retrieval was returning 8 relevant documents. But we were appending them in retrieval order, so the most relevant chunk was often sitting in positions 3–6. Accuracy on those queries was 15–20 points lower than queries where the best chunk landed first or last.

The Four Layers of Agent Context

Before optimizing anything, it helps to think about context in layers:

Layer 1: Working Memory
The current task. User's query, the agent's current goal, any in-progress state. This is always present and always goes at the beginning of the context (primacy effect — models weight early content heavily).

Layer 2: Episodic Memory
Recent conversation history. Not all of it — a compressed or windowed version. For a chatbot, this is the last N turns. For a long-running agent, this is a summary of recent steps taken.

Layer 3: Semantic Memory
Retrieved knowledge — documents, database results, external data. This is what RAG injects. It's the most expensive layer to get right because it's query-dependent and changes every call.

Layer 4: Tool State
The results of previous tool calls within the current session. If your agent called search_jobs two steps ago, that result might still be relevant now.

Most systems treat these as a flat list appended in whatever order they're collected. Production systems need to treat them as a priority stack with a budget.

Building a Context Budget

In our ESG chatbot, we set an explicit token budget per call:

CONTEXT_BUDGET = {
    "system_prompt": 800,       # Fixed — organizational context, instructions
    "working_memory": 400,      # Current query + task state
    "episodic_memory": 600,     # Recent conversation (compressed)
    "semantic_memory": 1800,    # Retrieved documents
    "tool_results": 400,        # Previous tool call outputs
    "response_reserve": 1000,   # Leave room for model's response
    # Total: ~5000 tokens — fits in most model context windows with headroom
}

This forces you to make explicit tradeoffs instead of letting the context grow unbounded until you hit an error.
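One way to enforce such a budget is to check every assembled section before the call goes out. This is a minimal sketch, not the original system's code: `count_tokens` is a crude whitespace approximation standing in for a real tokenizer, and `over_budget` and `truncate_to_budget` are hypothetical helpers.

```python
from typing import Dict


def count_tokens(text: str) -> int:
    """Rough stand-in: whitespace-separated words. Swap in a real tokenizer."""
    return len(text.split())


def over_budget(sections: Dict[str, str], budget: Dict[str, int]) -> Dict[str, int]:
    """Return how many tokens each section exceeds its budget by."""
    overruns: Dict[str, int] = {}
    for name, text in sections.items():
        excess = count_tokens(text) - budget.get(name, 0)
        if excess > 0:
            overruns[name] = excess
    return overruns


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep the first `max_tokens` words. A last resort; prefer re-ranking."""
    return " ".join(text.split()[:max_tokens])


budget = {"system_prompt": 3, "working_memory": 2}
sections = {"system_prompt": "a b c d e", "working_memory": "x y"}
overruns = over_budget(sections, budget)
trimmed = truncate_to_budget(sections["system_prompt"], budget["system_prompt"])
```

Logging the overruns per section, rather than silently trimming, shows which layer keeps outgrowing its allocation.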

When retrieved documents exceed the semantic memory budget, you don't just truncate — you re-rank and select:

def build_semantic_context(query: str, retrieved_docs: list, budget: int) -> str:
    """
    Select and rank documents to fit within token budget.
    Prioritize by relevance score, place highest-scoring chunk first.
    """
    # Sort by relevance score descending
    ranked = sorted(retrieved_docs, key=lambda d: d['score'], reverse=True)

    selected = []
    used_tokens = 0

    for doc in ranked:
        doc_tokens = count_tokens(doc['content'])
        if used_tokens + doc_tokens > budget:
            break
        selected.append(doc)
        used_tokens += doc_tokens

    # Critical: place highest-relevance content first (primacy effect)
    return "\n\n".join([f"[Source: {d['source']}]\n{d['content']}" for d in selected])

This simple re-ranking step improved our query accuracy from the mid-70s to 95%+ on complex multi-document queries.
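Since the observation earlier was that queries did well when the best chunk landed first or last, a variant worth sketching places the strongest chunks at both edges and the weakest in the middle. This is an alternative ordering, not what the function above does, and `place_best_at_edges` is a hypothetical name.

```python
from typing import List, TypeVar

T = TypeVar("T")


def place_best_at_edges(ranked: List[T]) -> List[T]:
    """Reorder best-first items so the strongest sit at the start and end.

    Items at even ranks fill the front in order; items at odd ranks fill
    the back in reverse, which leaves the weakest items in the middle.
    """
    front: List[T] = []
    back: List[T] = []
    for position, item in enumerate(ranked):
        (front if position % 2 == 0 else back).append(item)
    return front + back[::-1]


ordered = place_best_at_edges(["a", "b", "c", "d", "e"])  # "a" is the best chunk
```

Whether this beats plain best-first ordering depends on the model and context length, so it is worth measuring on your own queries before adopting it.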

The KV-Cache Problem (and Why It's a Cost Issue, Not Just a Performance Issue)

If you're running in production at scale, KV-cache hit rate is one of the most important metrics you're probably not tracking.

When you make an LLM API call, the model can cache the key-value pairs computed for a prefix of your context. If your next call starts with the same prefix, it reuses the cache instead of recomputing — dramatically reducing latency and cost (cached tokens are typically 10x cheaper than prompt tokens on Anthropic's API).

The problem: if your system prompt or retrieved documents change slightly on every call, you get 0% cache hit rate. You pay full price every time.

Our ESG chatbot was rebuilding the organization's schema context from scratch on every call — even though that schema barely changed. Cache hit rate was near zero. When we moved the static organizational context to a fixed prefix that never changed, and only appended dynamic content after it, our cache hit rate jumped to ~65%.

def build_context(org_schema: str, query: str, retrieved_docs: list) -> list[dict]:
    """
    Structure context so static content comes first (cacheable prefix).
    Dynamic content comes after (changes per request).
    """
    return [
        # STATIC — same for all queries in this org. Gets cached.
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Organization schema:\n{org_schema}"},

        # DYNAMIC — changes per query. Not cached.
        {"role": "user", "content": f"Retrieved context:\n{build_semantic_context(query, retrieved_docs, 1800)}"},
        {"role": "user", "content": query},
    ]

The rule: put what doesn't change first. Put what changes last.
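A cheap way to catch a prefix that drifts is to fingerprint the static messages and log the value with every call. The sketch below assumes the static part is the first `static_count` messages; `prefix_fingerprint` and the sample prompts are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def prefix_fingerprint(messages: List[Dict[str, str]], static_count: int) -> str:
    """Hash the first `static_count` messages, which should be identical across calls."""
    static = json.dumps(messages[:static_count], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(static.encode("utf-8")).hexdigest()[:16]


SYSTEM = {"role": "system", "content": "You are an ESG analyst."}
SCHEMA = {"role": "system", "content": "Organization schema: emissions, water_usage"}

call_a = [SYSTEM, SCHEMA, {"role": "user", "content": "Water usage for 2023?"}]
call_b = [SYSTEM, SCHEMA, {"role": "user", "content": "Scope 2 emissions?"}]

# A timestamp in the system prompt changes the prefix on every call
drifting = [
    {"role": "system", "content": "You are an ESG analyst. Time: 10:42"},
    SCHEMA,
    {"role": "user", "content": "Water usage for 2023?"},
]

stable = prefix_fingerprint(call_a, 2) == prefix_fingerprint(call_b, 2)
drifted = prefix_fingerprint(drifting, 2) != prefix_fingerprint(call_a, 2)
```

If the fingerprint changes between calls for the same organization, something dynamic has leaked into the prefix and the cache cannot be reused.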

Episodic Memory: When You Can't Afford Full History

For multi-turn conversations, you can't keep appending the full history. At turn 20, your conversation history alone might be 5,000 tokens.

Two patterns that work in production:

Pattern 1: Rolling Window
Keep only the last N turns verbatim. Simple, predictable, no information loss on recent turns.

def get_episodic_context(conversation_history: list, max_turns: int = 6) -> list:
    """Keep the most recent N turns."""
    return conversation_history[-max_turns * 2:]  # *2 for user+assistant pairs

Pattern 2: Hierarchical Summarization
For longer sessions, summarize older turns and keep recent ones verbatim.

async def build_episodic_context(history: list, budget_tokens: int) -> str:
    recent_turns = history[-6:]  # Last 3 exchanges verbatim
    older_turns = history[:-6]

    if not older_turns:
        return format_turns(recent_turns)

    # Summarize everything older
    summary = await llm.summarize(
        f"Summarize this conversation context concisely:\n{format_turns(older_turns)}",
        max_tokens=200
    )

    return f"Earlier in this conversation: {summary}\n\nRecent exchanges:\n{format_turns(recent_turns)}"

This kept our ESG chatbot coherent across 30+ turn sessions while staying well under our token budget.
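A fixed turn count does not bound tokens, so a window can also be sized by budget. This sketch keeps the most recent turns that fit; `count_tokens` is again a whitespace approximation and `window_by_budget` is a hypothetical helper, not part of the original system.

```python
from typing import Dict, List


def count_tokens(text: str) -> int:
    """Rough stand-in: whitespace-separated words. Swap in a real tokenizer."""
    return len(text.split())


def window_by_budget(history: List[Dict[str, str]], budget_tokens: int) -> List[Dict[str, str]]:
    """Keep the most recent turns whose combined size fits the budget."""
    kept: List[Dict[str, str]] = []
    used = 0
    for turn in reversed(history):  # walk from newest to oldest
        cost = count_tokens(turn["content"])
        if used + cost > budget_tokens:
            break  # stop at the first turn that does not fit, keeping the window contiguous
        kept.append(turn)
        used += cost
    return kept[::-1]  # restore chronological order


history = [
    {"role": "user", "content": "a b c"},
    {"role": "assistant", "content": "d e"},
    {"role": "user", "content": "f"},
]
window = window_by_budget(history, budget_tokens=3)
```

Everything that falls outside the window is what the summarisation step would then compress.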

What to Measure

If you're not measuring these, you're flying blind:

# Log this for every LLM call
context_metrics = {
    "total_tokens": count_tokens(full_context),
    "system_tokens": count_tokens(system_prompt),
    "retrieved_tokens": count_tokens(semantic_context),
    "history_tokens": count_tokens(episodic_context),
    "cache_hit": response.usage.cache_read_input_tokens > 0,  # Anthropic API
    "retrieval_count": len(retrieved_docs),
    "retrieval_top_score": retrieved_docs[0]['score'] if retrieved_docs else 0,
    "query_id": query_id,
    "org_id": org_id,
}

The metrics that predicted accuracy problems in our system:

retrieval_top_score < 0.7 → likely hallucination risk
retrieved_tokens > 2000 → "lost in the middle" risk on complex queries
cache_hit = False on high-traffic routes → unnecessary cost
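The three thresholds above can be turned into flags attached to each logged call. This is a small sketch: the flag names and `flag_context_risks` are hypothetical, while the thresholds are the ones listed above.

```python
from typing import Any, Dict, List


def flag_context_risks(metrics: Dict[str, Any]) -> List[str]:
    """Map one call's context metrics to the risk flags described above."""
    flags: List[str] = []
    if metrics["retrieval_top_score"] < 0.7:
        flags.append("low_retrieval_score")
    if metrics["retrieved_tokens"] > 2000:
        flags.append("lost_in_the_middle_risk")
    if not metrics["cache_hit"]:
        flags.append("cache_miss")
    return flags


risky = flag_context_risks(
    {"retrieval_top_score": 0.55, "retrieved_tokens": 2400, "cache_hit": False}
)
healthy = flag_context_risks(
    {"retrieval_top_score": 0.91, "retrieved_tokens": 1200, "cache_hit": True}
)
```

Counting flags per route over time makes it easier to see which endpoints need attention first.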

The Mental Model Shift

Prompt engineering asks: "What do I write in the prompt?"

Context engineering asks: "What information does the model need, in what order, with what budget, at this specific point in this specific conversation, for this specific user?"

The second question has an engineering answer. It involves data structures, caching strategies, budget allocation, and retrieval pipelines. It's not about being clever with words — it's about building infrastructure.

When we treated our context pipeline as an engineering artifact — with explicit budgets, priority layers, and metrics — our ESG chatbot went from inconsistent to consistent. 95%+ accuracy wasn't the result of a better model or better prompts. It was the result of better context architecture.

Conclusion

Context engineering is not a new concept, but the production implementation is genuinely underexplored. Most of what's written is definitional. The infrastructure details — budget allocation, cache optimization, memory layers, retrieval re-ranking — are things teams figure out on their own after expensive mistakes.

If you're building a production RAG system or a long-running agent and you're hitting accuracy walls or cost spikes, audit your context pipeline first. The answer is probably there.

What patterns have you found that work? Drop them in the comments — I'm genuinely curious what others are doing here.

I built the ESG Analytics Chatbot at Planet Sustech — a multi-tenant RAG system serving 10+ organizations — and these patterns came directly from debugging production failures in that system.

Tags: #ai #llm #rag #machinelearning #python

🔍 How MongoDB Indexing Works Internally: B+Tree, B-Tree Structure, Performance Impact & Best Practices

Priyank Agrawal — Sun, 04 May 2025 15:43:34 +0000

Indexing is the backbone of database performance. In MongoDB, indexes are not just a luxury—they're essential for building scalable, performant applications. But how do they really work under the hood?

In this deep dive, we'll explore:

The core architecture of MongoDB indexes
Internal algorithms and data structures
How indexing affects read vs write operations
Practical indexing strategies and best practices

🧠 What Is Indexing in MongoDB?

An index in MongoDB is a special data structure that stores a subset of a collection's data in an efficient, sorted format. This allows the database engine to locate documents without scanning the entire collection.

MongoDB automatically creates an index on the _id field. You can (and should) define additional indexes to optimize specific queries.

🌳 Internal Index Structure: B-Trees

MongoDB uses B-Trees to manage its indexes. Here's how they work:

🔍 What's a B-Tree?

A self-balancing tree data structure
Keeps data sorted for logarithmic-time lookups
Both internal and leaf nodes can store data
Supports range queries, prefix matching, and sorted access

💡 Why B-Trees in MongoDB?

Enables fast insertions, deletions, and lookups (O(log n))
Allows range scans for $gt, $gte, $lt, $lte, and index-backed point lookups for $in
Efficient balancing as data changes
Well-suited for disk-based storage systems
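To illustrate why sorted keys make range queries cheap, here is a toy index in Python. It is a sketch only: a sorted list with binary search stands in for the B-Tree, `SortedIndex` is a hypothetical class, and list insertion here is O(n) where a real B-Tree insert is O(log n).

```python
import bisect
from typing import List, Tuple


class SortedIndex:
    """Toy single-field index: sorted (key, doc_id) pairs searched with bisect."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []

    def insert(self, key: int, doc_id: int) -> None:
        """Insert while keeping the entries sorted by key."""
        bisect.insort(self.entries, (key, doc_id))

    def range_scan(self, low: int, high: int) -> List[int]:
        """Return doc_ids with low <= key <= high, in key order."""
        start = bisect.bisect_left(self.entries, (low, float("-inf")))
        end = bisect.bisect_right(self.entries, (high, float("inf")))
        return [doc_id for _, doc_id in self.entries[start:end]]


ages = SortedIndex()
for age, doc_id in [(30, 1), (25, 2), (40, 3), (30, 4)]:
    ages.insert(age, doc_id)

matches = ages.range_scan(26, 35)  # similar in spirit to { age: { $gte: 26, $lte: 35 } }
```

Two binary searches find the ends of the range, and everything between them is a contiguous slice; that is the property a range scan on an index relies on.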

🔁 Index Lifecycle: How MongoDB Maintains Indexes

Every time a document is inserted, updated, or deleted, all relevant indexes must be updated. Here's what happens internally:

✅ Insert:

MongoDB finds the correct location in the B-Tree
A new key is inserted
Tree rebalancing may occur if necessary

✏️ Update:

If the indexed field changes:
- MongoDB updates the key in the tree
- May involve removing and reinserting keys
This causes write amplification if there are many indexes

❌ Delete:

Keys are removed from all applicable indexes

⚠️ Indexes help read performance but can affect write performance due to additional maintenance operations.

⚡ Types of Indexes in MongoDB and Their Internals

Index Type	Internals	Use Case
Single Field	B-tree (WiredTiger storage engine)	Basic filters and sorts
Compound	B-tree with multi-part keys	Queries with multiple filters/sorts
Multikey	B-tree with separate entry per array element	Indexing arrays
Text Index	B-tree of lexicographically sorted terms	Full-text search
TTL Index	Single field index + background deletion proc	Auto-expiring documents
Sparse/Partial	B-tree with filtered document set	Conditional indexing
Geospatial	B-tree (2d) or B-tree+S2 (2dsphere)	Location-based queries
Hashed	B-tree of hashed values	Hash-based sharding

📊 Query Execution with Indexes

🧠 The Query Planner

MongoDB's query optimizer evaluates different query execution plans using available indexes. It selects the most efficient plan based on:

Index selectivity (how well an index narrows results)
Query predicates and their matching to indexes
Sort requirements and whether indexes can satisfy them
Statistics about data distribution

The optimizer may periodically re-evaluate plans as collection data changes over time.

🔀 Index Intersection

MongoDB can use multiple indexes to resolve a single query when:

Different indexes match different query conditions
The intersection would be more selective than using a single index
No single index exists that fully covers the query

However, index intersection isn't always more efficient and has its limitations, especially with large collections.

📦 Covered Queries

If all fields required by the query (both in the query criteria and in the projection) are included in an index, MongoDB can fulfill the query using only the index without accessing the documents—these "covered" queries are extremely fast!

// Example of a covered query (assuming there's an index on {age: 1, name: 1})
db.users.find({ age: 30 }, { age: 1, name: 1, _id: 0 })

⚖️ Read vs. Write Trade-offs

✅ When Indexes Help:

High-frequency reads
Filters and sorts
Joins using $lookup
Range queries and pagination

❌ When Indexes Hurt:

High-frequency writes (inserts/updates)
Frequent indexed field changes
Low cardinality fields (e.g., gender)

Rule of Thumb: Use indexes on collections primarily accessed for reads. Be strategic with indexing on collections with high write throughput.

🧱 WiredTiger Storage Engine & Indexing

MongoDB's default engine, WiredTiger:

Stores collection data in separate data files
Uses B-trees for the _id index and all other indexes
Each index is maintained in its own file

🧬 Compression:

Prefix compression on index keys
Block compression for data
Reduces disk usage, improves cache efficiency

🛠 Hidden & Background Builds

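Foreground (MongoDB 4.0 and earlier): Locks the collection for the whole build (faster, blocking)
Background (MongoDB 4.0 and earlier): Non-blocking (slower, safe for production)
MongoDB 4.2 and later: One build process that holds an exclusive lock only at the start and end of the build; the background option is ignored
Hidden indexes: Still maintained on writes but invisible to the query planner, so you can check the impact of dropping an index before actually dropping it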

✅ Indexing Best Practices

Index fields used in filtering and sorting
Avoid indexing low-cardinality fields
Keep indexes narrow (fewer fields)
Use compound indexes in the correct field order
Use .explain() with verbosity modes to validate
Monitor index usage with MongoDB Atlas or profiler
Drop unused indexes

   db.collection.dropIndex("index_name")

Balance indexing on write-heavy collections

🧪 Real-World Example: Compound Index

// Create a compound index
db.orders.createIndex({ customerId: 1, createdAt: -1 })

// Efficient for:
db.orders.find({ customerId: "123" }).sort({ createdAt: -1 })

// Not efficient for:
db.orders.find({ createdAt: { $gte: ISODate() } })
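The prefix rule behind this example can be sketched with a toy compound index in Python. Assumptions: the index is a sorted list of (customerId, createdAt) tuples, createdAt is a plain integer, both fields sort ascending for simplicity, and the function names are hypothetical.

```python
import bisect
from typing import List, Tuple

Entry = Tuple[str, int]  # (customerId, createdAt)

index: List[Entry] = sorted([("456", 30), ("123", 20), ("456", 5), ("123", 10)])


def by_customer(entries: List[Entry], customer_id: str) -> List[Entry]:
    """Prefix lookup: two binary searches bound one customer's contiguous run."""
    start = bisect.bisect_left(entries, (customer_id, float("-inf")))
    end = bisect.bisect_right(entries, (customer_id, float("inf")))
    return entries[start:end]


def created_since(entries: List[Entry], since: int) -> List[Entry]:
    """No customerId given: matches are scattered, so every entry is examined."""
    return [entry for entry in entries if entry[1] >= since]


one_customer = by_customer(index, "123")
recent = created_since(index, 20)
```

Because entries are ordered by customerId first, one customer's rows sit together and arrive already sorted by createdAt. A filter on createdAt alone has no such contiguous run to jump to, which is why the second query cannot use the index efficiently.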

🧠 Developer Insight

"Use indexing strategically by understanding your access patterns. For read-heavy collections, comprehensive indexing can dramatically improve performance. For write-heavy collections, be selective to avoid unnecessary index maintenance overhead."

📘 Conclusion

MongoDB indexing is a sophisticated system built on B-tree data structures, efficient compression techniques, and intelligent query planning.

By understanding:

B-Tree mechanics and limitations
Read/write trade-offs
Query planner decisions

You can architect highly optimized applications that balance performance across various workloads.

👨‍💻 Author: Priyank Agrawal

Software Developer | Node.js | MongoDB

📌 Follow for More

If you found this useful, follow me on Dev.to or connect with me on LinkedIn for more deep-dive technical articles.

Optimizing MongoDB Aggregations with the $function Operator: Reducing Time Complexity

Priyank Agrawal — Thu, 24 Apr 2025 19:29:49 +0000

When working with large datasets in MongoDB, performance optimization is critical. One common challenge arises when you need to apply custom logic, such as mapping data values or performing sorting, which can become inefficient when done in multiple steps. In this blog, we'll explore how MongoDB's $function operator can significantly reduce time complexity compared to traditional methods.

Before $function: Multiple Steps, Increased Complexity
Before MongoDB introduced the $function operator in version 4.4, custom logic like mapping data values and sorting on the result often had to be split between the aggregation pipeline and application code. This could increase the overall query time, especially with large datasets, since each step (for example, a mapping pass followed by a sort) was processed separately. Let’s consider an example where you fetch employees for a particular department and sort them by their employee number (their position within the department). Without $function, you would first retrieve the employee data, then map each employee to a position, and finally sort the result.

Use Case: Sorting Employees in a Department
Suppose you need to retrieve employees for a specific department, and you have a map (employeePositionMap) that stores the position (employee number) of each employee. In the past, you would have to perform this task in two separate steps:

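// Assumes Mongoose, as implied by connection.models
const { ObjectId } = require("mongoose").Types;

async function getDepartmentEmployees(connection, departmentId, userId) {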
  const { Department, Employee } = connection.models;

  // Check if the department exists
  const departmentExists = await Department.exists({ _id: new ObjectId(departmentId) });
  if (!departmentExists) {
    throw new Error("Department not found");
  }

  // Get the department's employees
  const department = await Department.findById(departmentId, {
    employees: { $elemMatch: { user: new ObjectId(userId) } }
  });

  // Extract the employee IDs
  const employeeIds = department.employees.length > 0 ? 
    department.employees[0].employeesList.map(e => e) : [];

  // Create a map of employee IDs to their position (employeeNumber)
  const employeePositionMap = {};
  if (department.employees.length > 0) {
    department.employees[0].employeesList.forEach((employeeId, index) => {
      employeePositionMap[employeeId.toString()] = index + 1; // Add 1 for 1-based indexing
    });
  }

  // Query employees and add employeeNumber manually
  const employees = await Employee.aggregate([
    { $match: { _id: { $in: employeeIds } } },
    {
      $lookup: {
        from: "employeeSalary",
        let: { employeeId: "$_id", departmentId: new ObjectId(departmentId) },
        pipeline: [
          { $match: { $expr: { $eq: ["$employeeId", "$$employeeId"] } } }
        ],
        as: "salaryDetails"
      }
    },
    { $unwind: "$salaryDetails" },
  ]);

  // Adding employeeNumber manually after fetching all employees
  const employeesWithNumbers = employees.map(employee => {
    const employeeId = employee._id.toString();
    return {
      ...employee,
      employeeNumber: employeePositionMap[employeeId] || null
    };
  });

  // Sort by employeeNumber
  employeesWithNumbers.sort((a, b) => {
    if (a.employeeNumber === null) return 1;
    if (b.employeeNumber === null) return -1;
    return a.employeeNumber - b.employeeNumber;
  });

  return employeesWithNumbers;
}

Time Complexity in the Old Approach
Two Steps: First, we retrieve all the employees and manually add employeeNumber to each employee. Then, we sort the employees based on this employeeNumber.

Increased Complexity: The mapping (employeePositionMap) and sorting require separate iterations, resulting in O(n log n) time complexity for sorting and O(n) for mapping.

For large datasets, this can quickly become inefficient as the number of documents grows, especially when the employeeNumber mapping and sorting happen outside the database.

After $function: Reduced Complexity
Now, let’s see how MongoDB's $function operator simplifies this process and improves performance. By using $function, we can add the employeeNumber directly in the aggregation pipeline and perform sorting in one step, avoiding the need for multiple iterations over the data.

Updated Code Using $function:

async function getDepartmentEmployeesOptimized(connection, departmentId, userId) {
  const { Department, Employee } = connection.models;

  // Check if the department exists
  const departmentExists = await Department.exists({ _id: new ObjectId(departmentId) });
  if (!departmentExists) {
    throw new Error("Department not found");
  }

  // Get the department's employees
  const department = await Department.findById(departmentId, {
    employees: { $elemMatch: { user: new ObjectId(userId) } }
  });

  // Extract the employee IDs
  const employeeIds = department.employees.length > 0 ? 
    department.employees[0].employeesList.map(e => e) : [];

  // Create a map of employee IDs to their position (employeeNumber)
  const employeePositionMap = {};
  if (department.employees.length > 0) {
    department.employees[0].employeesList.forEach((employeeId, index) => {
      employeePositionMap[employeeId.toString()] = index + 1;
    });
  }

  // Query employees with $function to add employeeNumber directly in aggregation pipeline
  const employees = await Employee.aggregate([
    { $match: { _id: { $in: employeeIds } } },
    {
      $lookup: {
        from: "employeeSalary",
        let: { employeeId: "$_id", departmentId: new ObjectId(departmentId) },
        pipeline: [
          { $match: { $expr: { $eq: ["$employeeId", "$$employeeId"] } } }
        ],
        as: "salaryDetails"
      }
    },
    { $unwind: "$salaryDetails" },
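    {
      $addFields: {
        employeeNumber: {
          $function: {
            // The body runs on the MongoDB server, so it cannot see Node.js
            // variables; the position map is passed in through args instead.
            body: "function (employeeId, positionMap) { return positionMap[employeeId] || null; }",
            args: [{ $toString: "$_id" }, { $literal: employeePositionMap }],
            lang: "js"
          }
        }
      }
    },
    { $sort: { employeeNumber: 1 } } // Ascending; note that $sort places null before numbers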
  ]);

  return employees;
}
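
One detail worth keeping in mind: the `$function` body is executed by the MongoDB server, not by the Node.js process, so it cannot read application variables such as `employeePositionMap` through a closure. Anything the body needs has to travel through `args`. The sketch below imitates that boundary in plain Node.js. `runRemotely` and `demo` are hypothetical names, and rebuilding the function from its source text is only a stand-in for what happens when the function is sent to the server.

```javascript
// Simulate shipping a function to another process: only its source text travels.
function runRemotely(fn, args) {
  const source = fn.toString();
  // new Function builds the copy in the global scope, so closures are lost.
  const rebuilt = new Function(`return (${source});`)();
  return rebuilt(...args);
}

function demo() {
  const employeePositionMap = { a1: 1, b2: 2 }; // local to the "application"

  let closureResult;
  try {
    closureResult = runRemotely(function (id) {
      return employeePositionMap[id] || null; // relies on the closure
    }, ["a1"]);
  } catch (err) {
    closureResult = err.name; // the rebuilt copy cannot find the variable
  }

  const argsResult = runRemotely(function (id, positionMap) {
    return positionMap[id] || null; // everything arrives as an argument
  }, ["b2", employeePositionMap]);

  return { closureResult, argsResult };
}

console.log(demo()); // { closureResult: 'ReferenceError', argsResult: 2 }
```

The same reasoning applies to `$function`: pass the lookup data in `args` (wrapped in `$literal` so it is treated as plain data) and keep the body free of outside references.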

Time Complexity After $function
Single Aggregation Pipeline: The custom logic for adding the employeeNumber is applied directly in the aggregation pipeline, eliminating the need for separate mapping and sorting steps.

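Complexity: Handling everything in the aggregation query does not change the asymptotic cost. Computing employeeNumber is still O(n), and because $sort runs on a computed field it cannot use an index, so sorting remains O(n log n). What changes is where the work runs: the transformation and the sort happen in a single pipeline on the server instead of in extra passes in application code.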

Benefits of Using $function:
Less Post-Processing: With $function, MongoDB applies the custom transformation inside the aggregation query, so the result arrives ready to use instead of being reshaped in application code.

Potentially Faster Execution: By combining operations like adding fields and sorting within the aggregation pipeline, you avoid extra iterations in application code. Server-side JavaScript has its own overhead, so measure both versions on realistic data.

Cleaner Code: The logic is embedded directly into the aggregation pipeline, making the code easier to maintain and reducing the need for additional processing steps.
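
MongoDB's documentation cautions that running JavaScript inside an aggregation expression may decrease performance, and recommends `$function` only when the built-in pipeline operators cannot express the logic. For this particular use case, a position lookup can also be written with the built-in `$indexOfArray` operator. The sketch below is one possible alternative, not the approach used in this post: `buildPositionStages` is a hypothetical helper, and the string IDs stand in for the ObjectId values the real pipeline would receive.

```javascript
// Build pipeline stages that number and sort employees without server-side JavaScript.
function buildPositionStages(employeeIds) {
  return [
    {
      $addFields: {
        // $indexOfArray is 0-based and returns -1 when the value is absent,
        // so adding 1 gives a 1-based position (0 would mean "not in the list").
        employeeNumber: { $add: [{ $indexOfArray: [employeeIds, "$_id"] }, 1] },
      },
    },
    { $sort: { employeeNumber: 1 } },
  ];
}

const stages = buildPositionStages(["id-a", "id-b", "id-c"]);
console.log(JSON.stringify(stages, null, 2));
```

Because the earlier `$match` stage already restricts `_id` to values in `employeeIds`, the "not in the list" case should not occur in this pipeline.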

Conclusion
By using MongoDB’s $function operator, we simplify the logic and move the custom transformation and sorting out of application code. Instead of performing multiple steps after the query returns, we execute custom JavaScript directly in the aggregation pipeline and get back results that are already numbered and sorted. The asymptotic cost of the sort is unchanged, so measure both versions on your own data before assuming a speedup. If you’re working with complex data transformations and large datasets, the $function operator is a useful tool for consolidating that logic in your MongoDB queries.

Have you used the $function operator in your projects? Share your thoughts and experiences in the comments below!