DEV Community: Ishwar

I Rebuilt My AI Legal Assistant After Learning Why Vector-Only RAG Wasn't Enough

Ishwar — Sat, 11 Jul 2026 11:18:00 +0000

This is a submission for Weekend Challenge: Passion Edition

What I Built

LawDecoder is an AI-powered legal assistant that explains Indian laws in plain language while showing the exact legal provisions used to generate each answer.

Unlike many dense-only retrieval prototypes, LawDecoder focuses on retrieval quality. It was born out of a real-world failure. One day I asked my original prototype:

"Someone forged my signature."

The first law it retrieved wasn't about forgery. It was about counterfeit coins.

That was the moment I realized the problem wasn't the LLM—it was my retrieval pipeline.

The surprising part: I didn't change the LLM. Nearly all of the retrieval accuracy gains came from redesigning the search architecture to combine semantic search, keyword search, Reciprocal Rank Fusion (RRF), and deterministic domain reranking.

Demo

Here is a visual walkthrough of the production UI, showing how the hybrid search handles citations and developer metrics:

1️⃣ User Chat Interface

Clean, legal explanation interface for end users:

2️⃣ Structured Offence & Citation Details

Deduplicated citations with developer metrics visible in Developer Mode:

3️⃣ System Evaluation Dashboard

Performance comparisons and technical architecture story:

Code

The complete implementation—including SQLite ingestion, FTS5 indexing, Reciprocal Rank Fusion, evaluation queries, and benchmark samples—is available on GitHub:

ishwar170695 / LawDecoder

⚖️ LawDecoder: A Case Study in Hybrid Retrieval for Legal AI

Unlike many dense-only retrieval demos, LawDecoder focuses on retrieval quality. It combines semantic search, keyword search, Reciprocal Rank Fusion (RRF), and deterministic reranking to improve legal citation accuracy.

🚀 Features

🔍 Hybrid Retrieval: Fuses SQLite FTS5 (sparse BM25 keyword matching) and local dense vector embeddings.
🔀 Reciprocal Rank Fusion (RRF): Fuses sparse and dense search rankings to prioritize matches returned by both.
🎯 Domain Reranker: Deterministically demotes irrelevant matches (like counterfeit stamp/coin sections) and boosts direct offences (like document forgery acts) for signature queries.
🧾 Citation Transparency: Shows the exact acts, sections, and selection details for every explanation.
🔧 Developer Mode: Toggle view to inspect RRF ranks and retrieval selection reasons.
🤖 Empathetic AI: Structured…

How I Built It

The Failure: Tracing the Cause

The original vector-only (v1) search failed on a simple forgery query:

Query:
"Someone forged my signature"

Top result (v1):
❌ BNS Section 180 — Possession of counterfeit coin

Expected:
✅ BNS Section 336 — Forgery
✅ BNS Section 340 — Using forged document

The embedding model wasn't "wrong"—it placed semantically close concepts close together in vector space. But in legal search, semantically similar isn't the same as legally relevant.

Because "forgery" and "counterfeit" ended up close in embedding space, the retriever ranked counterfeit coin and government stamp provisions above the actual definition of document forgery. The retriever generalized too aggressively, missing the direct definition of forgery (BNS Section 336) because the word "signature" was semantically distant from generic statutory descriptions of the offence.

Additionally, caching large raw text strings (text, titles, act names) in a JavaScript array caused the Node.js process to consume over 320 MB of RAM at startup.

The Redesigned Retrieval Pipeline

I redesigned the retrieval pipeline to combine sparse and dense search methods.

The LLM remained almost unchanged. Nearly all improvements came from redesigning retrieval.

1. SQLite + FTS5 Sparse Indexing

I moved all 4,892 legal sections out of JSON files and into a local SQLite database. A virtual FTS5 table provides keyword search ranked by BM25, so terms like "Section 65", "forgery", or "signature" match the sections that literally contain them.
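To make the BM25 part concrete, here is a toy Okapi BM25 ranker in plain JavaScript. It is a sketch for illustration only. It is not the repo's code and not SQLite's implementation. The tokenizer is a naive lowercase split, and any document containing at least one query term counts as a match. The three "documents" are section titles taken from this post, not statutory text. The constants k1 = 1.2 and b = 0.75 are the ones the FTS5 documentation gives for its bm25() function. Note that bm25() returns negated scores (lower is better), whereas this sketch returns positive ones (higher is better).

```javascript
// Toy Okapi BM25 ranker (illustration only, not SQLite's implementation).
// k1 = 1.2 and b = 0.75 are the constants the FTS5 docs give for bm25().
function tokenize(text) {
  // Naive lowercase split on non-alphanumerics, loosely like FTS5's default tokenizer.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function bm25Rank(docs, query, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map((doc) => tokenize(doc.text));
  const N = docs.length;
  const avgLen = tokenized.reduce((sum, toks) => sum + toks.length, 0) / N;
  const terms = [...new Set(tokenize(query))];

  // Document frequency: number of docs containing each query term.
  const df = new Map(
    terms.map((term) => [term, tokenized.filter((toks) => toks.includes(term)).length]),
  );

  return docs
    .map((doc, i) => {
      const toks = tokenized[i];
      let score = 0;
      for (const term of terms) {
        const tf = toks.filter((t) => t === term).length;
        if (tf === 0) continue; // a term absent from this doc contributes nothing
        const n = df.get(term);
        // Floor the IDF at a tiny positive value so very common terms never lower a score.
        const idf = Math.max(Math.log((N - n + 0.5) / (n + 0.5)), 1e-6);
        score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * toks.length) / avgLen));
      }
      return { id: doc.id, score };
    })
    .filter((hit) => hit.score > 0) // keep docs containing at least one query term
    .sort((x, y) => y.score - x.score);
}
```

Even this toy shows the limit of keyword matching on its own. "forged" hits BNS 340's title, but BNS 336 ("Forgery") scores nothing without stemming. That gap is what the dense retriever is there to cover.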

2. Lightweight Vector Memory Cache

I stripped all text metadata from Node.js memory. At startup, the script now loads only two things into RAM: each section's id (a string) and its embedding vector, pre-processed into a compact Float32Array. The text itself stays on disk in SQLite and is hydrated on demand for the top 5 matched sections, reducing memory usage by 85% (from 320 MB to 48 MB).
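Here is a minimal sketch of that layout. It assumes embeddings are precomputed at ingestion time and arrive as `{ id, vector }` rows. `buildVectorCache` and `searchVectorCache` are hypothetical names. A brute-force cosine scan is assumed because a few thousand vectors are cheap to scan on CPU. Only the winning ids would then be used to hydrate text from SQLite.

```javascript
// Hypothetical flat vector cache: ids in a plain array, all embeddings packed
// back to back in a single Float32Array (dim floats per section).
function l2Norm(vector) {
  let sum = 0;
  for (const x of vector) sum += x * x;
  return Math.sqrt(sum) || 1; // avoid dividing by zero for all-zero vectors
}

function buildVectorCache(rows, dim) {
  const ids = rows.map((row) => row.id);
  const data = new Float32Array(rows.length * dim);
  rows.forEach((row, i) => {
    // Normalize once at startup so a dot product equals cosine similarity.
    const norm = l2Norm(row.vector);
    for (let j = 0; j < dim; j++) data[i * dim + j] = row.vector[j] / norm;
  });
  return { ids, data, dim };
}

// Brute-force cosine search over the packed array; returns the top-k ids, best first.
function searchVectorCache(cache, query, k) {
  const qNorm = l2Norm(query);
  const hits = [];
  for (let i = 0; i < cache.ids.length; i++) {
    let dot = 0;
    const offset = i * cache.dim;
    for (let j = 0; j < cache.dim; j++) dot += cache.data[offset + j] * query[j];
    hits.push({ id: cache.ids[i], score: dot / qNorm });
  }
  return hits.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Packing everything into one Float32Array stores each value in 4 bytes within a single contiguous buffer. Nested plain JS arrays typically cost 8-byte doubles plus per-array overhead. The cache also holds no strings besides the ids.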

3. Reciprocal Rank Fusion (RRF)

The RRF step fuses the top 50 semantic (dense) matches and the top 50 keyword (sparse) matches into a single list by summing each section's reciprocal ranks across both retrievers. Here is the core JS implementation of the RRF merge:

const rrfScores = new Map();
const k = 60; // Standard constant for RRF

// Process vector ranking positions (dense)
vectorRankings.forEach((item, index) => {
  rrfScores.set(item.id, 1 / (k + index + 1));
});

// Process FTS5 keyword ranking positions (sparse) and add to scores
ftsRankings.forEach((item, index) => {
  const existingScore = rrfScores.get(item.id) || 0;
  rrfScores.set(item.id, existingScore + (1 / (k + index + 1)));
});

// Sort matched IDs based on fused RRF score
const mergedRanking = Array.from(rrfScores.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 20); // Top 20 candidates for reranking
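The merge above is the standard RRF formula. Here rank_r(d) is section d's 1-based position in retriever r's list (`index + 1` in the code), and a retriever that did not return d contributes nothing. A worked example with k = 60 shows why agreement between retrievers wins:

```latex
\begin{aligned}
\mathrm{RRF}(d) &= \sum_{r \in \{\mathrm{dense},\,\mathrm{sparse}\}} \frac{1}{k + \mathrm{rank}_r(d)}, \qquad k = 60 \\
\text{rank 1 in dense only:} \quad \mathrm{RRF} &= \frac{1}{60 + 1} \approx 0.0164 \\
\text{rank 3 in both:} \quad \mathrm{RRF} &= \frac{1}{60 + 3} + \frac{1}{60 + 3} = \frac{2}{63} \approx 0.0317
\end{aligned}
```

A section that both retrievers place third therefore outscores one that only the dense retriever places first. At no point do BM25 scores and cosine similarities need to share a scale.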

4. Domain Reranker (Deterministic Guardrail)

The reranker evaluates the top 20 candidates returned by the RRF step. It is a deterministic, rule-based reranker tailored to the legal domain, not a learned neural cross-encoder:

If the query is related to document or signature forgery, it checks whether a retrieved section concerns counterfeit coins, stamps, currency, or bank-notes. If so, it cuts the score by 99% (* 0.01).
Otherwise, it triples the score (* 3.0) of direct document-forgery offences, i.e. sections whose title contains "forgery" or "forged".
It filters out duplicate sections using content-snippet prefixes.

// Heuristic Domain Reranking
if (isDocumentForgeryRelated) {
  const isCoinOrStampOrCurrency = 
    titleLower.includes('coin') || titleLower.includes('stamp') || 
    titleLower.includes('currency') || titleLower.includes('bank-note') ||
    contentLower.includes('coin') || contentLower.includes('stamp') || 
    contentLower.includes('currency-note');

  if (isCoinOrStampOrCurrency) {
    adjustedScore *= 0.01; // heavily penalize counterfeit coin/stamps (reduce by 99%)
  } else if (titleLower.includes('forgery') || titleLower.includes('forged')) {
    adjustedScore *= 3.0; // strong boost for direct forgery definitions/offences
  }
}
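The third rule, deduplication by content-snippet prefix, isn't shown in the snippet above. Here is one way it could work. This is a sketch under stated assumptions: the function name, the `content` field, and the 120-character prefix length are all hypothetical, and candidates are assumed to arrive already sorted best-first.

```javascript
// Hypothetical dedup pass: candidates arrive best-first; a candidate whose
// normalized content starts with the same prefix as an earlier one is dropped.
// The 120-character prefix length is an illustrative guess, not the repo's value.
function dedupeByContentPrefix(candidates, prefixLength = 120) {
  const seen = new Set();
  const unique = [];
  for (const candidate of candidates) {
    // Lowercase and collapse whitespace so formatting differences don't hide duplicates.
    const key = candidate.content
      .toLowerCase()
      .replace(/\s+/g, " ")
      .trim()
      .slice(0, prefixLength);
    if (seen.has(key)) continue; // keep only the higher-ranked copy
    seen.add(key);
    unique.push(candidate);
  }
  return unique;
}
```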

Performance & Evaluation Benchmarks

To evaluate the redesign, I assembled a benchmark of 100 manually verified legal queries spanning criminal law, cybercrime, family law, consumer protection, and procedural law.

Metric	v1 (Naive Vector RAG)	v2.1 (Hybrid Search - Current)	Change
Search Engine	Dense Vector (Linear JSON scan)	Hybrid (SQLite FTS5 + Dense Vector + RRF + Reranker)	Major retrieval precision upgrade
Avg. Query Latency	`466 ms`	`12 ms`	97.4% lower (~39× faster)
Memory Cache Footprint	`~320 MB`	`~48 MB`	85.0% RAM savings
Duplicate Citations	Present (up to 40% overlaps)	Deduplicated (0% overlaps)	Verified
Top-5 Relevant Retrieval Rate	~68%	~91%	+23 percentage points

Latency is averaged over the 100 benchmark queries. Memory is the process-level heap size at startup. Accuracy is measured on top-5 target matches across the same 100 manually verified queries.

None of these improvements required changing the language model; the gains came almost entirely from retrieval engineering. For the forgery query ("Someone forged my signature"), the top retrieved references aligned with the expected legal provisions:

BNS 340: Forged document and using it as genuine
BNS 336: Forgery definition and penalty
BNS 339: Possession of forged document
BNS 335: Making a false document
Evidence Act Section 65: Proof of signature and handwriting

Lessons Learned

Retrieval quality sets the upper bound for RAG quality: The LLM can only reason over what you retrieve. Improving retrieval turned out to be far more impactful than switching models.
Dense embeddings alone are rarely enough for domain-specific search: Hybrid retrieval is often a better default because it combines exact keyword matching with semantic understanding.
SQLite + FTS5 is a powerhouse: On this corpus of 4,892 sections, SQLite FTS5 plus typed arrays in Node.js averaged about 12 ms per query on CPU, with no separate search service to operate.
Simple deterministic rerankers work: They can eliminate domain-specific retrieval errors without requiring another neural model.

What's Next

If I continue evolving this project, the next improvements I'd explore are:

Cross-encoder reranking: Integrate lightweight cross-encoders (e.g. BGE reranker) for advanced ranking.
Metadata-aware retrieval: Allow users to filter queries by Act or category before searching.
Legal Case Retrieval: Expand indexing to cover legal precedents and court cases.
Multilingual support: Support query translation for regional languages.

Prize Categories

Best Use of Google AI

LawDecoder integrates the Google Gemini API (gemini-3.5-flash or custom models) for generating context-grounded, empathetic, and structured legal advice derived from the hybrid SQLite-FTS5 retrieval output.

I Rebuilt My AI Legal Assistant After Learning Why Vector-Only RAG Wasn't Enough

Ishwar — Sat, 11 Jul 2026 11:11:49 +0000

I built a legal AI assistant last year.

One day I asked it:

"Someone forged my signature."

The first law it retrieved wasn't about forgery.

It was about counterfeit coins.

That was the moment I realized the problem wasn't the LLM.

It was my retrieval pipeline.

The surprising part: I didn't change the LLM.

Nearly all of the improvement came from redesigning the retrieval pipeline.

Here is the story of how I debugged my search system, why dense vector search alone falls apart on domain-specific datasets, and how I redesigned the retrieval pipeline.

The Failure

To understand why the system was failing, we need to look at the mismatch between what was queried and what was actually retrieved in the original vector-only (v1) search:

Query:
"Someone forged my signature"

Top result (v1):
❌ BNS Section 180 — Possession of counterfeit coin

Expected:
✅ BNS Section 336 — Forgery
✅ BNS Section 340 — Using forged document

Tracing the Cause

I started tracing the retrieval pipeline step by step to understand why the wrong statutes were consistently ranking first.

The embedding model wasn't "wrong." It was doing exactly what it was trained to do: place semantically similar concepts close together in vector space.

Unfortunately, in legal search, semantically similar isn't the same as legally relevant.

Because "forgery" and "counterfeit" ended up close in embedding space, the retriever ranked counterfeit coin and government stamp provisions above the actual definition of document forgery. The retriever generalized too aggressively. It missed the specific statutory definition of forgery (BNS Section 336) because the word "signature" was semantically distant from generic statutory descriptions of the offence.

The Secondary Issues: RAM Bloat & Duplicates

In addition to generalizing incorrectly, the v1 search cached large raw text strings (text, titles, act names) in a JavaScript array, which made the Node.js process consume over 320 MB of RAM at startup. And because legal codes are highly repetitive, the search regularly returned duplicate entries of identical sections across different personal laws, cluttering the LLM's context window.

Redesigning the Retrieval Pipeline

I realized that making the assistant trustworthy was a search-quality problem, not an LLM-capability problem, so I redesigned the retrieval pipeline into a structured, hybrid search engine.

![LawDecoder hybrid retrieval architecture diagram: SQLite FTS5 sparse keyword index and dense vector ONNX pipeline merged via Reciprocal Rank Fusion (RRF) and Domain Reranker]

The LLM remained almost unchanged. Nearly all improvements came from redesigning retrieval.

Step 1: SQLite + FTS5

I moved all 4,892 legal sections out of JSON files and persisted them in a local SQLite database. I created a virtual table using the FTS5 extension to index all chapters, titles, and text contents.

Exact terms are now matched through this sparse keyword index (ranked via BM25), so queries containing specific statutory terms surface the sections that contain those terms.
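One practical detail arises when feeding free text like "Someone forged my signature" into FTS5. Bare terms separated by spaces are implicitly AND-ed, so a long natural-language query can easily match nothing. A common workaround is to quote each token and join the tokens with OR, letting bm25() rank sections by how many (and how rare) terms they contain. It is sketched here as an assumption, not as what the repo does. `toFts5OrQuery` and the stopword list are hypothetical.

```javascript
// Hypothetical helper: free text -> FTS5 MATCH expression.
// Bare space-separated terms are implicitly AND-ed in FTS5, so quoting each
// token and joining with OR lets bm25() rank partial matches instead.
// The stopword list is illustrative only.
const STOPWORDS = new Set(["a", "an", "the", "my", "of", "to"]);

function toFts5OrQuery(userQuery) {
  const tokens = userQuery
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // keep Unicode letters and digits only
    .filter((token) => token && !STOPWORDS.has(token));
  // Doubling '"' is FTS5's escape inside a quoted string; the split above
  // already strips quotes, so this is purely defensive. An empty result means
  // there is nothing to search for, so the caller can skip the keyword search.
  return [...new Set(tokens)].map((token) => `"${token.replace(/"/g, '""')}"`).join(" OR ");
}
```

The result can be bound as the parameter of a query such as `SELECT rowid FROM sections_fts WHERE sections_fts MATCH ? ORDER BY bm25(sections_fts) LIMIT 50`, where `sections_fts` is a hypothetical table name. Ascending order is correct because bm25() returns lower values for better matches.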

Step 2: Lightweight Vector Cache

I stripped all text metadata from Node.js memory. At startup, the script now loads only two things into RAM: each section's id (a string) and its embedding vector, pre-processed into a compact Float32Array.

The actual text content remains on disk in SQLite and is only hydrated for the top 5 matched sections. This dropped the memory footprint by 85% (from 320 MB to 48 MB).

Step 3: Reciprocal Rank Fusion (RRF)

Instead of relying on either vector search or keyword search alone, I fused them. The backend runs both searches, takes the top 50 matches from each, and combines their rankings using standard Reciprocal Rank Fusion (RRF). Because RRF uses only rank positions, it rewards documents that rank highly in both methods without having to normalize sparse BM25 scores against dense cosine-similarity scores.

Here is the core JS implementation of the RRF merge:

const rrfScores = new Map();
const k = 60; // Standard constant for RRF

// Process vector ranking positions (dense)
vectorRankings.forEach((item, index) => {
  rrfScores.set(item.id, 1 / (k + index + 1));
});

// Process FTS5 keyword ranking positions (sparse) and add to scores
ftsRankings.forEach((item, index) => {
  const existingScore = rrfScores.get(item.id) || 0;
  rrfScores.set(item.id, existingScore + (1 / (k + index + 1)));
});

// Sort matched IDs based on fused RRF score
const mergedRanking = Array.from(rrfScores.entries())
  .sort((a, b) => b[1] - a[1])
  .slice(0, 20); // Top 20 candidates for reranking
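The same merge is easier to test in isolation when wrapped as a self-contained function that accepts any number of ranked lists. This is a sketch, not the repo's code. `reciprocalRankFusion` and its options object are hypothetical, while k = 60 and the top-20 cut mirror the snippet above. The test fixture puts the counterfeit-coin section first in the dense list, as in the v1 failure. The forgery section overtakes it by ranking third in both lists.

```javascript
// Self-contained RRF sketch. Each ranking is an array of { id } objects ordered
// best-first; an item at 1-based rank r adds 1 / (k + r) to its fused score,
// and ids missing from a list get nothing from that list.
function reciprocalRankFusion(rankings, { k = 60, topN = 20 } = {}) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((item, index) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, topN)
    .map(([id, score]) => ({ id, score }));
}
```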

Step 4: Domain Reranker (Deterministic Guardrail)

To remove the counterfeit-coin noise without running a heavy, slow transformer cross-encoder, I built a lightweight Domain Reranker in JavaScript. It is a deterministic, rule-based reranker tailored to the legal domain, not a learned neural cross-encoder.

It loads the top 20 candidates returned by the RRF step and checks for specific intent signals:

If the query is related to document or signature forgery, it checks whether a retrieved section concerns counterfeit coins, stamps, currency, or bank-notes. If so, it cuts the score by 99% (* 0.01).
Otherwise, it triples the score (* 3.0) of direct document-forgery offences, i.e. sections whose title contains "forgery" or "forged".
It filters out duplicate sections using content-snippet prefixes.

// Heuristic Domain Reranking
if (isDocumentForgeryRelated) {
  const isCoinOrStampOrCurrency = 
    titleLower.includes('coin') || titleLower.includes('stamp') || 
    titleLower.includes('currency') || titleLower.includes('bank-note') ||
    contentLower.includes('coin') || contentLower.includes('stamp') || 
    contentLower.includes('currency-note');

  if (isCoinOrStampOrCurrency) {
    adjustedScore *= 0.01; // heavily penalize counterfeit coin/stamps (reduce by 99%)
  } else if (titleLower.includes('forgery') || titleLower.includes('forged')) {
    adjustedScore *= 3.0; // strong boost for direct forgery definitions/offences
  }
}
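Here is the rule above as a runnable function, so it can be checked against a small fixture. It is a sketch under assumptions. `rerankCandidates` is a hypothetical name, and candidates are assumed to carry `title`, `content`, and a fused `score`. The post doesn't show how `isDocumentForgeryRelated` is computed, so a simple keyword regex stands in for that intent check.

```javascript
// Hypothetical self-contained version of the reranking rule above.
// The intent regex is an assumption; the original intent check isn't shown.
const FORGERY_INTENT = /\b(forg\w*|signature)\b/i;
const COUNTERFEIT_TITLE_TERMS = ["coin", "stamp", "currency", "bank-note"];
const COUNTERFEIT_CONTENT_TERMS = ["coin", "stamp", "currency-note"];

function rerankCandidates(query, candidates) {
  const isDocumentForgeryRelated = FORGERY_INTENT.test(query);
  return candidates
    .map((candidate) => {
      const titleLower = candidate.title.toLowerCase();
      const contentLower = candidate.content.toLowerCase();
      let adjustedScore = candidate.score; // fused RRF score
      if (isDocumentForgeryRelated) {
        const isCoinOrStampOrCurrency =
          COUNTERFEIT_TITLE_TERMS.some((term) => titleLower.includes(term)) ||
          COUNTERFEIT_CONTENT_TERMS.some((term) => contentLower.includes(term));
        if (isCoinOrStampOrCurrency) {
          adjustedScore *= 0.01; // cut counterfeit coin/stamp/currency sections by 99%
        } else if (titleLower.includes("forgery") || titleLower.includes("forged")) {
          adjustedScore *= 3.0; // triple direct forgery offences
        }
      }
      return { ...candidate, adjustedScore };
    })
    .sort((a, b) => b.adjustedScore - a.adjustedScore);
}
```

For queries without forgery intent the scores pass through unchanged, so the RRF order is preserved.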

Performance & Evaluation

To evaluate the redesign, I assembled a benchmark of 100 manually verified legal queries spanning criminal law, cybercrime, family law, consumer protection, and procedural law.

Metric	v1 (Naive Vector RAG)	v2.1 (Hybrid Search - Current)	Change
Search Engine	Dense Vector (Linear JSON scan)	Hybrid (SQLite FTS5 + Dense Vector + RRF + Reranker)	Major retrieval precision upgrade
Avg. Query Latency	`466 ms`	`12 ms`	97.4% lower (~39× faster)
Memory Cache Footprint	`~320 MB`	`~48 MB`	85.0% RAM savings
Duplicate Citations	Present (up to 40% overlaps)	Deduplicated (0% overlaps)	Verified
Top-5 Relevant Retrieval Rate	~68%	~91%	+23 percentage points

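Latency is averaged over the 100 benchmark queries. Memory is the process-level heap size at startup. Accuracy is measured on top-5 target matches across the same 100 manually verified queries.

For the forgery query ("Someone forged my signature"), the top retrieved references aligned with the expected legal provisions: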

BNS 340: Forged document and using it as genuine
BNS 336: Forgery definition and penalty
BNS 339: Possession of forged document
BNS 335: Making a false document
Evidence Act Section 65: Proof of signature and handwriting

Production Walkthrough

1️⃣ User Chat Interface

Clean, legal explanation interface for end users:

2️⃣ Structured Offence & Citation Details

Deduplicated citations with developer metrics visible in Developer Mode:

3️⃣ System Evaluation Dashboard

Performance comparisons and technical architecture story:

Lessons Learned

Retrieval quality sets the upper bound for RAG quality.
Dense embeddings alone are rarely enough for domain-specific search.
SQLite + FTS5 can be an excellent retrieval engine for small-to-medium corpora.
Simple deterministic rerankers can eliminate domain-specific retrieval errors without requiring another neural model.

If I continue evolving this project, the next improvements I'd explore are:

Cross-encoder reranking: Integrate lightweight cross-encoders (e.g. BGE reranker) for advanced ranking.
Metadata-aware retrieval: Allow users to filter queries by Act or category before searching.
Legal Case Retrieval: Expand indexing to cover legal precedents and court cases in addition to statutory acts.
Multilingual support: Support query translation for regional languages.

Final Thoughts

Going into this project, I assumed improving the LLM would improve the assistant.

Instead, I learned that retrieval quality determines the ceiling of any RAG system.

The LLM can only reason over what you retrieve.

Improving retrieval turned out to be far more impactful than switching models.

If you're building domain-specific AI—whether for legal, medical, or enterprise search—I'd recommend spending as much time on retrieval engineering as prompt engineering.

Repository

The complete implementation—including SQLite ingestion, FTS5 indexing, Reciprocal Rank Fusion, evaluation queries, and benchmark samples—is available on GitHub.

GitHub: https://github.com/ishwar170695/LawDecoder

I Audited $753 of Coding-Agent Usage. I Found 94.5% Context Reuse.

Ishwar — Fri, 19 Jun 2026 11:10:25 +0000

I expected prompt caching to be one of the biggest cost optimizations for coding agents.

After all, every request carries system prompts, tool definitions, and instructions that rarely change. Caching those felt like free money.

I also expected a second lever: unused retrieved context. Agents constantly read files, fetch logs, inspect directories, and explore codebases. Surely a meaningful fraction of that context never actually influences the final output.

So I built a small tool, context-audit, and ran it across 27 real coding-agent sessions representing $753.24 of input spend.

Neither assumption survived the benchmark.

What I Measured

Across 27 sessions:

Metric	Value
Context Reuse	94.5%
Novel Context	5.5%
Prompt Cache Savings	1.0%
Unused Context Cost	0.4%

Prompt caching accounted for only 1.0% of potential savings.

Unused retrieved context accounted for just 0.4%.

Meanwhile, 94.5% of all context tokens were repeated content.

The dominant cost wasn't unused retrieval.

It wasn't static prompts.

It was accumulated conversation history.

The same information kept appearing again and again as sessions grew longer.

The Bigger the Session, the More Repetitive It Became

Context reuse increased sharply with session size:

Final Context Size	Avg Reuse
< 5k tokens	66.3%
5k–20k tokens	92.5%
20k–50k tokens	96.8%
> 50k tokens	99.2%

The longer a session ran, the more repetitive it became.
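The post doesn't spell out how context-audit computes reuse. Here is one simple definition that reproduces the trend, assumed purely for illustration: a token counts as reused if it sits in the shared prefix with the previous request. Consider append-only history where each turn adds n new tokens. Then T turns send n·T(T+1)/2 tokens, of which only n·T are new, so reuse is 1 - 2/(T+1). That gives 50% after 3 turns and 95% after 39. `contextReuse` and `simulateSession` are hypothetical names.

```javascript
// Hypothetical reuse metric: for each request (an array of token ids),
// tokens in the common prefix with the previous request count as reused.
function sharedPrefixLength(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

function contextReuse(requests) {
  let total = 0;
  let reused = 0;
  for (let i = 0; i < requests.length; i++) {
    total += requests[i].length;
    if (i > 0) reused += sharedPrefixLength(requests[i - 1], requests[i]);
  }
  return total === 0 ? 0 : reused / total;
}

// Simulate an append-only session: each turn resends all history plus new tokens.
function simulateSession(turns, tokensPerTurn) {
  const requests = [];
  let history = [];
  let next = 0;
  for (let t = 0; t < turns; t++) {
    history = history.concat(Array.from({ length: tokensPerTurn }, () => next++));
    requests.push(history);
  }
  return requests;
}
```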

That was not what I expected to find.

My optimization instincts were pointing at the wrong bottleneck.

The Insight: Coding Agents Have Two Memory Systems

While digging through transcripts, I started thinking about coding agents differently from chatbots.

They appear to operate with two distinct memory systems.

Workspace Memory

Examples:

Files
Logs
Terminal outputs
Build artifacts
Compiler errors
Directory structures

This information often exists on disk.

The agent can frequently reconstruct it by reading the workspace again.

Conversational Memory

Examples:

User preferences
Design decisions
Rejected ideas
Constraints
Trade-offs
Architectural rationale

This information exists only inside the conversation.

Once it's removed, it may be gone for good.

That distinction changed how I think about context management.

Not all context is equally disposable.

A Pruning Failure Mode

Picture a long coding session.

Early in the conversation, the user decides:

No embeddings
No LLM-as-a-judge
No HTML dashboards

The team discusses alternatives and agrees on a simpler approach.

Fifty turns later, those decisions may no longer exist anywhere except the conversation history.

The workspace still contains the code.

But it doesn't contain every rejected path.

A naive pruning strategy removes what appears to be old conversation noise.

Unfortunately, it may also remove the reasoning behind the project.

The result is an agent that suddenly starts recommending ideas the user explicitly rejected earlier.

Concretely, a summarizer that compresses primarily by recency may preserve:

the latest file tree,
recent command outputs,
recent compiler logs,

while dropping a critical design decision made dozens of turns earlier.

The expensive-looking context survives.

The cheap-looking context that actually mattered disappears.

Why This Matters

If you're manually pruning, summarizing, or compressing long-running coding-agent sessions, you may be deleting the rationale behind decisions rather than the expensive parts of the context.

Workspace state is often reconstructable.

Conversational decisions frequently are not.

That doesn't mean pruning is wrong.

It means pruning needs to distinguish between:

Technical execution history that can be recovered from the workspace.
Alignment history that only exists in the conversation.

Treating both categories the same can produce subtle regressions.
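A pruning pass that respects this split might stub out older workspace outputs, which can be re-read from disk, while keeping every user and assistant message verbatim. This is a hypothetical sketch, not part of context-audit. The `{ role, kind, text }` message shape, the `tool_output` kind, and the placeholder text are all assumptions.

```javascript
// Hypothetical pruning policy: older tool outputs (files, logs, compiler
// errors) are replaced with a placeholder because the agent can re-read them;
// conversational messages (decisions, constraints, rejected ideas) are kept.
function pruneHistory(messages, keepRecentToolOutputs = 2) {
  const toolIndexes = messages
    .map((m, i) => (m.kind === "tool_output" ? i : -1))
    .filter((i) => i >= 0);
  // Everything except the most recent `keepRecentToolOutputs` tool outputs is droppable.
  const droppable = new Set(
    toolIndexes.slice(0, Math.max(0, toolIndexes.length - keepRecentToolOutputs)),
  );
  return messages.map((m, i) =>
    droppable.has(i)
      ? { ...m, text: "[workspace output pruned; re-read from disk if needed]" }
      : m,
  );
}
```

A real policy would also need to catch decisions that surface inside tool calls. Even this crude split, though, never deletes the "No embeddings" decision, which a recency-only summarizer might.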

What This Doesn't Prove

This isn't a universal law.

Twenty-seven sessions are enough for an interesting observation, not enough to claim every coding agent behaves this way.

The benchmark covers coding-agent workflows with disk-backed state.

It does not cover:

RAG systems
General chatbots
Research agents
Customer support agents
Other non-coding workflows

The findings should be interpreted within that scope.

But they were enough to overturn my expectations.

I started this project expecting prompt caching and retrieval waste to dominate.

In this dataset, they barely moved the needle.

Try It Yourself

# Audit a single transcript
context-audit run transcript.jsonl

# Benchmark an entire directory
context-audit benchmark ~/.claude/projects

The tool reports:

Context reuse ratios
Estimated costs
Repeated blocks
Context growth patterns
Potential caching savings

Repository

GitHub: context-audit

The benchmark changed how I think about coding-agent memory.

I started this project looking for waste in static prompts and retrieval.

Instead, I found a system spending most of its context budget carrying forward its own history.

If you're running long Claude Code, Cursor, Aider, or similar coding-agent workflows, I'd love to know whether you're seeing the same pattern.

Why your devcontainer fails on corporate networks (and how to fix it)

Ishwar — Sun, 31 May 2026 09:06:56 +0000

You set up a devcontainer, try to run npm install or pip install, and it just fails. SSL error. Certificate verify failed. You Google it for an hour and find nothing useful. If you're on a corporate network, this is almost certainly your company's proxy intercepting HTTPS traffic with its own certificate, and your container has no idea that certificate exists.

Your host machine trusts that proxy cert because IT installed it in your OS cert store. But your devcontainer is a fresh Linux environment. It doesn't inherit anything from your host. So every HTTPS request your tools make inside the container fails verification.

I kept seeing this problem come up in devcontainer issues and Discord threads with no clean fix. Every solution involved editing Dockerfiles or committing certs to repos.

So I built CertSync to handle it properly. It scans your host cert store, detects corporate/MITM certs automatically, and injects them into your devcontainer: no Dockerfile changes and no certs committed to your repo. One command, and your container trusts the same roots your host does.
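For Node-based tooling, the fix boils down to adding the corporate root to the set of CAs Node trusts. By default, Node verifies TLS against its own bundled Mozilla CA list (exposed as `tls.rootCertificates`), and passing a `ca` option replaces that list rather than extending it. Below is a minimal sketch, assuming you already have the corporate root as a PEM string. The helper names are hypothetical, and this is not how CertSync works internally.

```javascript
const tls = require("node:tls");
const https = require("node:https");

// Hypothetical helper: Node's bundled roots plus one extra PEM (e.g. the
// corporate proxy's root). The bundled roots are included explicitly because
// a `ca` option replaces the default trust list.
function caWithCorporateRoot(corporatePem) {
  return [...tls.rootCertificates, corporatePem];
}

// Usage sketch: an agent whose TLS connections trust both sets of roots.
function makeCorporateAgent(corporatePem) {
  return new https.Agent({ ca: caWithCorporateRoot(corporatePem) });
}
```

Setting `NODE_EXTRA_CA_CERTS=/path/to/corp-root.pem` in the container's environment gives every Node process the same extension at startup without code changes. Other tools, such as pip, use their own trust settings.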

github.com/ishwar170695/certsync