DEV Community: Muaz

The Clause Nobody Caught: How I Built Missing-Clause Detection for Contracts

Muaz — Tue, 23 Jun 2026 10:56:09 +0000

Most contract-analysis tools start with the same basic question:

What is wrong with the clauses in this document?

That question is useful, but incomplete. Sometimes the biggest risk is not a badly written clause. It is a clause that does not exist.

I ran into this while building AuditGuard, an AI-assisted compliance analysis tool. A client used it to review an agreement before signing. The system flagged risky language, cited potentially relevant requirements, and suggested draft replacements for human review.

The finding that changed the conversation, however, came from a different section of the report: Missing Clauses.

The agreement appeared not to address a provision that could be required under the selected compliance framework. There was no suspicious sentence for a reviewer to highlight because the relevant language was absent.

The client did not treat that output as a legal conclusion. He used it as a focused question to raise during the contract review and postponed signing until the issue had been addressed.

That case captures an interesting engineering problem: How do you search for text that is not there?

Why ordinary clause analysis misses this

A conventional contract-analysis pipeline usually looks like this:

Split the document into clauses.
Classify each clause by topic.
Retrieve regulations or policies related to its text.
Ask a model whether the clause creates a potential issue.
Generate an explanation and possible remediation.

This pipeline can identify problematic wording. It cannot reliably identify an omitted requirement because retrieval begins with the document's existing text.

If a contract says nothing about breach notification, for example, there is no breach-notification language to use as a query, so the corresponding requirement is never retrieved.

Missing-clause detection has to reverse the direction of the search:

Instead of asking which requirements match each clause, ask whether every applicable required provision is covered by any clause.

The two-stage approach

I implemented gap analysis as a separate pass after clause extraction. It combines a deterministic retrieval stage with a constrained model review.

At a high level, the process is:

contract
  -> extract clause inventory
  -> load required provisions for selected frameworks
  -> score each provision against every clause
  -> shortlist provisions with weak or no coverage
  -> ask a model to verify the shortlist against the clause inventory
  -> report only high-confidence, apparently unaddressed provisions

The split matters. Comparing every regulation with every clause using a large model would be slow, expensive, and difficult to control. Pure similarity search would be cheaper, but it would produce too many false positives.

Stage 1: deterministic candidate selection

The first stage uses TF-IDF similarity as a fast coverage screen.

For every required provision, the system records its best similarity score across all extracted clauses:

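coverage = {}  # provision id -> best similarity score seen so far
for clause in clauses:
    # search() returns (provision, score) pairs similar to the clause text
    for provision, score in search(clause.text):
        coverage[provision.id] = max(
            coverage.get(provision.id, 0),
            score,
        )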

A required, high-impact provision becomes a gap candidate when no clause reaches the configured coverage threshold.

candidates = [
    provision
    for provision in required_provisions
    if coverage.get(provision.id, 0) < COVERAGE_THRESHOLD
]

This is deliberately only a screening step. Low lexical similarity does not prove that a provision is missing. Contracts often express the same obligation using different vocabulary.
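To make the screening step concrete, here is a minimal, self-contained sketch in plain Python. It is not AuditGuard's implementation: the tokenizer, the smoothed TF-IDF weighting, the sample provisions and clauses, and the `COVERAGE_THRESHOLD` value of 0.2 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

COVERAGE_THRESHOLD = 0.2  # assumed value; tune against labeled contracts


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(texts)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage_scores(provisions, clauses):
    """Best similarity of each provision against any clause."""
    vectors = tfidf_vectors(list(provisions.values()) + clauses)
    provision_vecs = vectors[:len(provisions)]
    clause_vecs = vectors[len(provisions):]
    return {
        pid: max((cosine(pv, cv) for cv in clause_vecs), default=0.0)
        for pid, pv in zip(provisions, provision_vecs)
    }


# Hypothetical provisions and clauses, for illustration only.
provisions = {
    "breach-notification": "The processor shall notify the controller of a personal data breach without undue delay.",
    "confidentiality": "Each party shall keep confidential information secret and not disclose it.",
}
clauses = [
    "Each party shall keep the other party's confidential information secret and shall not disclose it to third parties.",
    "Fees are payable within thirty days of the invoice date.",
]

coverage = coverage_scores(provisions, clauses)
candidates = [pid for pid, score in coverage.items() if score < COVERAGE_THRESHOLD]
print(candidates)
```

With this toy input, only the breach-notification provision falls below the threshold. Note that common words such as "the" and "shall" still contribute a small nonzero score, which is one reason the threshold needs tuning and the result is only a shortlist.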

To keep the next stage bounded, candidates are prioritized by severity and low coverage, then capped per framework.
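One way to sketch that prioritization is shown below. The `Candidate` fields, the integer severity scale, the framework names, and the cap of two per framework are hypothetical; the article does not specify how AuditGuard represents them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provision_id: str
    framework: str
    severity: int    # assumed scale: higher means more severe
    coverage: float  # best similarity score from the screening stage


def shortlist(candidates, max_per_framework):
    """Order by severity (high first), then coverage (low first), and cap per framework."""
    ordered = sorted(candidates, key=lambda c: (-c.severity, c.coverage))
    kept = []
    per_framework = defaultdict(int)
    for candidate in ordered:
        if per_framework[candidate.framework] < max_per_framework:
            kept.append(candidate)
            per_framework[candidate.framework] += 1
    return kept


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("a-1", "framework-a", severity=3, coverage=0.05),
    Candidate("a-2", "framework-a", severity=3, coverage=0.12),
    Candidate("a-3", "framework-a", severity=1, coverage=0.00),
    Candidate("b-1", "framework-b", severity=2, coverage=0.10),
]
selected = shortlist(candidates, max_per_framework=2)
print([c.provision_id for c in selected])
```

Here `a-3` is dropped even though it has zero coverage, because two more severe candidates from the same framework already fill the cap. The cap is what bounds the number of provisions sent to the model.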

Stage 2: conservative model verification

The second stage gives the model two things:

an inventory of the contract's clauses; and
the shortlisted required provisions.

For each provision, the model must return structured data:

{
  "regulation_ref": "...",
  "addressed": false,
  "confidence": 0.91,
  "rationale": "No clause appears to address ...",
  "matched_clause": null
}

The prompt is intentionally conservative. A provision counts as addressed when any clause covers its subject matter, even partially or with different wording. The model is also told to judge coverage—not final legal sufficiency.

The system reports a candidate only when the model says it is unaddressed and its confidence clears a minimum threshold.

This second pass filters cases where TF-IDF missed a semantic match. It also produces a short rationale that a human reviewer can verify.
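The reporting rule can be sketched as a small filter over the model's structured output. This is an illustration under assumptions: the response is taken to be a JSON array of objects shaped like the example above, and `MIN_CONFIDENCE = 0.8` is a made-up threshold, not AuditGuard's setting.

```python
import json

MIN_CONFIDENCE = 0.8  # assumed threshold
REQUIRED_FIELDS = {"regulation_ref", "addressed", "confidence", "rationale", "matched_clause"}


def reportable_gaps(raw_response, min_confidence=MIN_CONFIDENCE):
    """Return only well-formed, unaddressed, high-confidence findings."""
    findings = json.loads(raw_response)
    gaps = []
    for item in findings:
        if not isinstance(item, dict) or not REQUIRED_FIELDS <= item.keys():
            continue  # malformed entries are dropped, never reported
        if item["addressed"] is not False:
            continue  # addressed, or an unexpected non-boolean value
        confidence = item["confidence"]
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            continue
        if confidence >= min_confidence:
            gaps.append(item)
    return gaps


# Hypothetical model output, for illustration only.
raw = json.dumps([
    {"regulation_ref": "ref-1", "addressed": False, "confidence": 0.91,
     "rationale": "No clause appears to address this.", "matched_clause": None},
    {"regulation_ref": "ref-2", "addressed": False, "confidence": 0.55,
     "rationale": "Possibly covered indirectly.", "matched_clause": None},
    {"regulation_ref": "ref-3", "addressed": True, "confidence": 0.95,
     "rationale": "Clause 4 covers the subject.", "matched_clause": "4"},
])
gaps = reportable_gaps(raw)
print([g["regulation_ref"] for g in gaps])
```

Dropping malformed or low-confidence entries keeps the filter biased toward silence rather than false alarms, which matches the conservative prompt.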

Why this is not a legal conclusion

An important distinction is easy to lose in product copy:

“The system did not find coverage” is not the same as “the contract violates the law.”

Whether a provision is actually required depends on facts outside the text, including jurisdiction, the parties' roles, the data involved, and the purpose of the agreement. A model can also miss indirect coverage or misunderstand a cross-reference.

That is why the output should be presented as a review queue, not a verdict. In AuditGuard, the finding includes the source reference, rationale, confidence, and suggested draft language. The user still needs to verify applicability and wording, and should involve qualified counsel when the decision carries legal risk.

What I learned

Three design decisions made the feature more useful.

1. Treat absence detection as its own retrieval problem

Clause-by-clause analysis and gap analysis answer opposite questions. Trying to handle both in one prompt makes the logic harder to inspect and test.

2. Use models for verification, not exhaustive search

Deterministic retrieval reduces the search space. The model then handles the narrower semantic question that similarity scoring cannot answer reliably.

3. Optimize for reviewability

A missing-clause warning without a citation or rationale is difficult to trust. Each finding should tell the reviewer what may be missing, why it was selected, and which source requirement triggered it.

The broader pattern

This approach is not limited to contracts. The same pattern can help find missing controls in security policies, absent sections in technical specifications, or unaddressed requirements in procurement responses:

Build an inventory of what exists.
Define the set of requirements expected to be covered.
Cheaply shortlist weakly covered requirements.
Semantically verify those candidates.
Send uncertain results to a human.

Finding problematic text is classification. Finding missing text is coverage analysis. Treating them as separate problems makes the retrieval logic easier to test, bounds model usage, and keeps the output reviewable.

Disclosure: I built AuditGuard, the product discussed in this article. AuditGuard provides AI-assisted compliance analysis for informational purposes and does not provide legal advice. I used AI to help edit this article and reviewed its technical claims against the implementation before publication.

Where Are You Storing Your API Keys? (And Why Slack Isn't It)

Muaz — Fri, 19 Jun 2026 13:04:27 +0000

Be honest for a second.

Where are your API keys right now?

Not the answer you'd write in a security audit. The real answer.

Pinned message in your team's #dev-private Slack channel?
A .env file someone scp'd from a colleague's laptop last summer?
That one Notion page titled "secrets — don't share"?
A shared 1Password vault that hasn't been audited since 2023?
An email thread from when the newest hire was onboarded?

If even one of those gave you a tiny twinge of "yeah… that's where mine are," keep reading. There's a pattern across dev teams that's worth naming, and there are tools that fix it without costing you a year of engineering or a four-figure annual bill.

The 30-second pattern almost every startup hits

Talk to any startup CTO who's onboarded more than five engineers and the story is the same.

A lead dev quits. No drama, they just leave. And suddenly:

Half the team's API keys lived in that person's head
The other half are spread across DMs and .env files on four laptops
Nobody knows which OpenAI key is charging which project
Someone shipped a feature using a Stripe test key by accident because they copied the wrong line from a screenshot

Cue four days of auditing keys, rotating secrets, and quietly hoping nothing leaks before rotation completes.

The fix is obvious in hindsight. Treat API keys like the production assets they are. Don't share them in chat. Don't email them. Don't put them in a Notion page.

But here's the question nobody answers honestly: where, then?

How bad is "keys in random places" actually?

Two stats are enough.

31% of breaches over the past decade involved stolen credentials (Verizon 2024 Data Breach Investigations Report)
23+ million secrets exposed in public GitHub commits in 2024 alone (GitGuardian, State of Secrets Sprawl 2025)

The pattern: it's almost never a James Bond villain. It's a .env file accidentally committed to a public repo. A Slack export shared with a contractor. A laptop left on a train. A junior dev who pushed a hotfix in a panic and forgot to add .env to .gitignore.

The tools to prevent this all exist. They've existed for a decade. So why do most teams still have keys in Slack?

Because the existing tools are either expensive, overengineered, or both.

The honest landscape, ranked by what dev teams actually feel

Let's walk through the real options. No marketing fluff.

1. HashiCorp Vault

The "real" answer that everyone respects and nobody on a 5-person team actually deploys.

Aspect	Reality
Price	"Free" OSS, but you self-host. Add EC2, maintenance, on-call.
Setup time	Hours to days. Then more days to learn the policy language.
Team UI	Minimal. Mostly CLI + HCL policies.
Who it's for	Enterprises with a dedicated security team.

Vault is genuinely a great tool, but it's a tool for operations people who do nothing but security. Not for a frontend dev who just needs to share the staging Stripe key with the new backend hire.

2. AWS Secrets Manager

$0.40 per secret per month. Plus API calls.

Let that sink in.

Fifty keys across dev / staging / prod for a dozen services — and let's be honest, you have at least that many — is $240/year minimum, before request charges. You're also welded to AWS. If your team is on Vercel, Fly, Render, or Cloudflare Workers, congratulations: you're calling a cross-cloud API every time you need to read your own credentials.

And the UI? IAM policies. Eight pages of documentation to grant one engineer read access to one secret. Bring snacks.

3. 1Password / Bitwarden Teams

Honestly, decent. Real encryption, real teams, real UX.

But they're built for passwords, not API keys. There's no first-class concept of "this is the staging Stripe secret for the payments project." It's folders, items, custom fields. You can make it work. People do. It feels like using a hammer to drive a screw — it sort of goes in, but you can tell something's wrong.

Also: $7–$9 per user per month. A 10-person team is $70–$90/month. That's $840–$1,080 a year, every year, forever, for a tool that wasn't designed for the job.

4. The default: Slack, email, `.env` files

Zero cost
Zero encryption at rest
Zero access control — if Bob can see the channel, Bob can see every key forever, even after he leaves
Zero audit trail. "Who used that key last and when?" "Uh."

Pretty much the highest-risk path on this list, and the most popular one in the wild. Be honest: this is what your team is using.

What "good" actually looks like (in plain English)

If you sat down and wrote out what a sane API-key sharing tool should do, the list looks something like this:

Encrypt at rest. Not "we use TLS." TLS is for the network. The actual blob in the database is ciphertext, and the encryption key isn't in the same place as the data.
Per-project, per-environment organization. "Payments-staging" and "Payments-production" are two different things. Stop treating them the same.
Role-based access. Read, write, admin. The intern needs read on one project, not the keys to the kingdom.
Read-only members should not see plaintext. Underrated. If you give someone "read" access, the value should be ••••••••••••. They can see the key exists. They can't copy it. The server doesn't even send the real value.
An audit log. Who looked at what, when. No exceptions. This is what saves you when someone leaks a key — you can prove rotation and trace blast radius.
Team invitations that can't be hijacked. If you send "you've been added to the team" via email, and someone forwards that email, the recipient should NOT auto-join the org. Most invitation systems get this wrong.
Cheap or free for small teams. A 4-person startup should not be paying $300/year to not put keys in Slack.

Almost no commercial tool ticks all seven.

The tool that's getting traction in 2026: KeyVault

A free option that's been showing up in dev forums lately is KeyVault (apisharing.vercel.app). Worth a closer look because it's the first thing in this category that's actually built for small teams.

What it does, in one paragraph

You sign up and you're the boss of a fresh organization (it creates one automatically). You make a project "payments-production." You add an API key to it. It's encrypted with Fernet (AES-128-CBC + HMAC-SHA256). You invite a teammate by email. You give them read, write, or admin access per project. Read-only members see masked values. Every action is logged.

That's the product. No HCL, no IAM, no $0.40 per secret per month, no "talk to sales."

The boring details that matter (and that competitors get wrong)

A few things to highlight because they're rare in this space:

Read-only really means read-only. The masking happens on the server, not in the browser. If you give a teammate read access, the API response sends •••••••••••• with a masked: true flag. The actual ciphertext never leaves the database for that role. You can't right-click → Inspect → see the secret.
Invitation links can't be replayed against existing accounts. If your email already has a KeyVault account, accepting an invitation requires you to be signed in with that email first. The token alone doesn't grant access. (Many invitation systems are exploitable here.)
Tenant isolation is enforced in SQL, not in app code. Every query has AND organization_id = %s baked in. A bug in a route handler can't accidentally leak another org's data, because the SQL itself refuses to return rows.
Login is constant-time. A dummy bcrypt comparison runs even when the email doesn't exist, so an attacker can't probe "is this email registered?" by timing the response.
Lockout is per (email, source-IP). Lock by email alone and any attacker can pre-lock arbitrary accounts as a denial-of-service. Per-IP keeps the legit user working from their own network.
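The server-side masking point from the list above can be sketched as follows. This is a hypothetical illustration, not KeyVault's code: the function name, the `plaintext_loader` callback, and the assumption that write and admin roles receive the real value are mine.

```python
MASK = "••••••••••••"


def serialize_key(name, plaintext_loader, role):
    """Build the API response body for one stored key.

    plaintext_loader is only called for roles allowed to see the value,
    so a read-only request never triggers decryption at all.
    """
    if role in ("write", "admin"):  # assumed: these roles may see plaintext
        return {"name": name, "value": plaintext_loader(), "masked": False}
    return {"name": name, "value": MASK, "masked": True}


# Track whether the (stand-in) decryption step was ever invoked.
decrypt_calls = []


def load_plaintext():
    decrypt_calls.append("called")
    return "example-secret-value"


read_response = serialize_key("stripe-staging", load_plaintext, role="read")
calls_after_read = len(decrypt_calls)
admin_response = serialize_key("stripe-staging", load_plaintext, role="admin")
print(read_response, calls_after_read)
```

Because the mask is applied before the response is built, there is nothing for a browser inspector to reveal. Unknown roles fall through to the masked branch, which is the safer default.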

A full technical breakdown lives at apisharing.vercel.app/llms-full.txt. It doubles as a public transparency report on how the system is built.
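As an illustration of the tenant-isolation point, here is a small sketch using Python's built-in sqlite3 module. The table and column layout are assumptions, and sqlite3 uses `?` placeholders where the article's `%s` suggests a different database driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_keys ("
    "id INTEGER PRIMARY KEY, organization_id INTEGER NOT NULL, name TEXT NOT NULL)"
)
# Hypothetical rows: key 1 belongs to org 1, key 2 belongs to org 2.
conn.executemany(
    "INSERT INTO api_keys (organization_id, name) VALUES (?, ?)",
    [(1, "stripe-staging"), (2, "openai-prod")],
)


def get_key(conn, key_id, organization_id):
    """Fetch a key only if it belongs to the caller's organization."""
    return conn.execute(
        "SELECT id, name FROM api_keys WHERE id = ? AND organization_id = ?",
        (key_id, organization_id),
    ).fetchone()


own_key = get_key(conn, key_id=2, organization_id=2)
other_orgs_key = get_key(conn, key_id=2, organization_id=1)
print(own_key, other_orgs_key)
```

Even if a route handler passes an ID that belongs to another organization, the query returns no row, so the mistake surfaces as "not found" rather than as a leak.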

Pricing, plain

Free: 1 project. Unlimited keys, unlimited team members, all the security features. Forever. No card required.
Pro — $10/month for the whole org. Unlimited projects. That's the only difference.

Compare to AWS Secrets Manager at ~$240/year for the same number of secrets. KeyVault Pro is $120/year flat, for the entire team. The free tier is enough for a side project or a 2-person team that just wants to stop sharing the OpenAI key in Slack.
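The arithmetic behind these figures is easy to check. The sketch below uses only the list prices quoted in this article and works in cents to avoid floating-point noise; it ignores AWS request charges, self-hosting costs, and any discounts.

```python
def annual_cost_usd(cents_per_unit_per_month, units):
    """Yearly cost in dollars for a per-unit monthly price given in cents."""
    return cents_per_unit_per_month * units * 12 / 100


aws_secrets_manager = annual_cost_usd(40, units=50)      # 50 secrets at $0.40
password_manager_low = annual_cost_usd(700, units=10)    # 10 users at $7
password_manager_high = annual_cost_usd(900, units=10)   # 10 users at $9
keyvault_pro = annual_cost_usd(1000, units=1)            # flat $10 per org

print(aws_secrets_manager, password_manager_low, password_manager_high, keyvault_pro)
```

The per-secret and per-user models scale with usage, while the flat price does not, so the gap widens as the number of secrets or team members grows.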

"Can I trust some new tool with my keys?"

Fair question. The honest answers:

Don't trust anyone blindly. Look at what they do, not what they say. The Fernet scheme is published. The masking behavior described above is testable: make a read member and watch the API response. The audit log is queryable. The tenant isolation is in the SQL.
The encryption key isn't theirs. In production, KeyVault refuses to start without an explicit ENCRYPTION_KEY environment variable. The server-side admin can't dump plaintext keys without it.
The tradeoff is honest. Self-host Vault if there's time and a team for it. Use AWS Secrets Manager if the whole stack is AWS and budget isn't a factor. Use KeyVault if the team is small, tired of Slack, and doesn't want to spend a quarter setting up Vault.

The only thing worse than overpaying for security is not having it at all.

TL;DR — pick your fit

Team Size	Best Fit	Why
Solo / 2-person side project	KeyVault free tier	Encrypted, free, 30-second setup
2–10 person startup	KeyVault Pro ($10/mo flat)	Unlimited projects, beats $240+/yr on AWS Secrets Manager
10–50 person team on AWS	AWS Secrets Manager	If budget allows and stack is fully AWS
50+ engineers with security ops	HashiCorp Vault	Worth the setup cost at this scale
Mixed-tool team that already has 1Password	1Password Teams	Not ideal for keys, but acceptable if it's already paid for

Try the cheapest option first

The honest move: spend 30 seconds at apisharing.vercel.app/signup before committing to anything more expensive. Free tier is enough to migrate one project off Slack today. If the team grows, $10/month covers everyone forever.

The worst-case outcome is finding out it's not for you — total time invested, two minutes.

The best case: it's the last time you ever paste an API key into Slack.

Found this useful? Drop a 💜 or share it with the teammate who keeps pasting prod keys into the wrong channel.

About the Author

I am a freelance AI engineer. I build AI agents, RAG systems, and AI tools for real businesses. I have shipped more than 20 AI systems across 7 countries, and I finished every single project I started.

I am open for AI consulting, RAG work, AI agent work, and LLM app work. Most first versions are ready in 2 to 4 weeks.

Portfolio: muazashraf.org/portfolio
Case studies: muazashraf.org/case-studies
Hire me: muazashraf.org/contact

If this helped you, follow me here. I share simple lessons from real AI work.

5 Reasons Your RAG System Will Fail in Production (And the Patterns I Use to Fix Each One)

Muaz — Sun, 17 May 2026 19:15:36 +0000

The 80% Problem

Most RAG demos look magical. You drop in 10 PDFs, ask 3 questions, get clean answers. Ship it.

Then production hits. The document corpus grows from 10 to 10,000. Users ask questions the demo never anticipated. Edge cases stack up. Accuracy drops from 95% to 60% in two weeks. The team starts apologising to the client.

I've built 20+ production RAG systems for clients across the USA, UK, UAE, Canada, Australia, Switzerland, and Pakistan. About 80% of the RAG projects I audit before clients hire me are in this exact failure mode — they passed the demo, then collapsed under real data.

The fixes aren't more complex models. They're architectural patterns designed for failure modes from day one. Here are the five that matter most.

Failure 1: Hallucinations on edge cases

A vanilla RAG pipeline does this: embed the user query, retrieve top-k documents, stuff them into a prompt, ask the LLM to answer. When retrieval finds something, the LLM dutifully constructs an answer — even when the retrieved context is unrelated to the question.

In production, you get confident-sounding nonsense on the long tail of queries.

The fix: a self-correction loop. Before the LLM answers, force it to grade the retrieved context against the question. If the grade is poor, rewrite the query or fall back to an "I don't have enough information" response.

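import re
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

# Assumed to be defined elsewhere: llm (a LangChain chat model) and the
# retrieve, rewrite_query, and generate_answer node functions.

class RAGState(TypedDict, total=False):
    question: str
    documents: list
    relevance_score: int

def grade_relevance(state):
    question = state["question"]
    # Join the documents first so the 3000 limit applies to characters,
    # not to the number of documents.
    context = "\n\n".join(
        getattr(d, "page_content", str(d)) for d in state["documents"]
    )[:3000]
    prompt = f"""Given the question and retrieved documents, score 0-10 how
    relevant the documents are to answering the question. Be strict.
    Question: {question}
    Documents: {context}
    Respond with just a number."""
    reply = llm.invoke(prompt).content
    match = re.search(r"\d+", reply)
    # Treat an unparseable reply as irrelevant instead of crashing.
    score = min(int(match.group()), 10) if match else 0
    return {"relevance_score": score}

def route_after_grading(state):
    if state["relevance_score"] < 6:
        return "rewrite_query"
    return "generate_answer"

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("grade", grade_relevance)
graph.add_node("rewrite_query", rewrite_query)
graph.add_node("generate_answer", generate_answer)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_conditional_edges("grade", route_after_grading)
graph.add_edge("rewrite_query", "retrieve")  # retry; cap retries in production
graph.add_edge("generate_answer", END)
app = graph.compile()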

I built exactly this pattern for an enterprise client — full breakdown in my Agentic RAG case study. It moved accuracy from ~70% to 90%+ on real questions, and dropped hallucinations to single digits.

Failure 2: Stale retrieval as your data changes

You ship a RAG system on Monday with 500 documents. By Friday, 50 of those documents have been edited. Your vector store still has the old embeddings.

Users ask questions about the new content. The system retrieves the old version. They lose trust.

The fix: incremental re-indexing with content hashing, not full re-builds. Hash each source document. On a schedule (or webhook), only re-embed documents whose hash changed.

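import hashlib
from datetime import datetime, timezone

# embed() is assumed to be defined elsewhere and to return the embedding
# as a list of floats.

def document_hash(text, metadata):
    # Hash the text together with its metadata so a metadata-only edit
    # also triggers re-indexing.
    payload = text + str(sorted(metadata.items()))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def upsert_if_changed(doc_id, text, metadata, pinecone_index):
    new_hash = document_hash(text, metadata)
    existing = pinecone_index.fetch([doc_id]).vectors.get(doc_id)
    if existing and (existing.metadata or {}).get("hash") == new_hash:
        return False  # unchanged, skip
    embedding = embed(text)
    indexed_at = datetime.now(timezone.utc).isoformat()  # stored as a string
    pinecone_index.upsert([{
        "id": doc_id,
        "values": embedding,
        "metadata": {**metadata, "hash": new_hash, "indexed_at": indexed_at}
    }])
    return True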

This single pattern saved a client 70% on embedding API costs and kept their knowledge base accurate without manual intervention.

Failure 3: Bad retrieval ranking

Top-k retrieval over pure semantic similarity has a known weakness: it rewards documents that sound similar to the question, not documents that answer the question. Worse, exact keyword matches (product codes, names, error codes) often get ranked below conceptually-similar-but-wrong chunks.

The fix: hybrid search + a reranker. Combine dense vector search with sparse keyword search (BM25), then run the merged candidates through a cross-encoder reranker.

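from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

# corpus (a list of documents with a .text attribute) and vector_store are
# assumed to be defined elsewhere and to return the same document type.

def tokenize(text):
    return text.lower().split()  # lowercase so "ERR-42" matches "err-42"

bm25 = BM25Okapi([tokenize(doc.text) for doc in corpus])
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def dedupe(docs):
    # Keep the first occurrence of each distinct text, preserving order.
    seen = set()
    return [d for d in docs if not (d.text in seen or seen.add(d.text))]

def hybrid_retrieve(query, k=20):
    dense_hits = vector_store.similarity_search(query, k=k)
    sparse_hits = bm25.get_top_n(tokenize(query), corpus, n=k)
    candidates = dedupe(dense_hits + sparse_hits)
    pairs = [(query, c.text) for c in candidates]
    scores = reranker.predict(pairs)  # one relevance score per pair
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return [c for c, _ in ranked[:5]]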

Why this matters: in financial, legal, and medical use cases, missing a specific code or term means missing the entire answer. Pure semantic search misses these constantly. Hybrid + rerank fixed this for a healthcare client managing 10,000+ patient records.

Failure 4: Multimodal blindspots

Most RAG systems can't read the charts, diagrams, screenshots, or tables inside PDFs. They OCR the text and lose 40% of the information.

If your domain has visual content — research papers, technical docs, medical scans, financial reports — text-only RAG is broken by design.

The fix: vision-language embeddings (ColPali, CLIP) for image regions alongside text chunks. Index both. Let the retriever match queries against both modalities.

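import torch
from colpali_engine.models import ColPali, ColPaliProcessor

processor = ColPaliProcessor.from_pretrained("vidore/colpali")
model = ColPali.from_pretrained("vidore/colpali").eval()

def embed_page_image(pdf_page_image):
    # pdf_page_image is a PIL image of one rendered PDF page.
    batch = processor.process_images([pdf_page_image]).to(model.device)
    with torch.no_grad():
        patch_embeddings = model(**batch)  # shape: (1, n_patches, dim)
    # Mean-pool the per-patch vectors into one vector per page. This trades
    # ColPali's late-interaction scoring for a single-vector index entry.
    return patch_embeddings.mean(dim=1)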

# Store both text embeddings AND image embeddings in the same vector store
# with a 'modality' tag. Retrieve from both, then merge.
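
The "retrieve from both, then merge" step in the comment above can be done in several ways. The sketch below uses reciprocal rank fusion, which is my choice for illustration and not necessarily what the original system does; the page IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Merge ranked ID lists; items ranked high in several lists score best."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical results from the text index and the image index.
text_hits = ["page-12", "page-3", "page-7"]
image_hits = ["page-7", "page-12", "page-40"]

merged = reciprocal_rank_fusion([text_hits, image_hits])
print(merged)
```

Rank-based fusion avoids comparing raw similarity scores across modalities, which are usually on different scales.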

I built this for a research firm searching 10,000+ pages of mixed-content PDFs. Asking "show me the Q3 conversion funnel chart" actually returns the right chart now. Full writeup: Multimodal RAG with ColPali & CLIP.

Failure 5: No evaluation harness = no improvement

Most teams ship RAG without an evaluation pipeline. Then when accuracy degrades, they can't tell:

Did retrieval get worse?
Did the LLM get worse?
Did the data get harder?
Was it always this bad and we just didn't notice?

You can't fix what you can't measure.

The fix: a golden dataset + automated eval. 50–100 hand-curated questions, each with its expected source documents and expected answer, covering your edge cases. Run them through the system nightly and on every deploy. Track three metrics:

def evaluate_rag(golden_dataset, rag_system):
    results = {
        "retrieval_hit_rate": 0,    # did retrieval find the right doc?
        "answer_correctness": 0,    # did the final answer match?
        "faithfulness": 0,          # was the answer grounded in retrieved docs?
    }
    for q, expected_doc_ids, expected_answer in golden_dataset:
        retrieved = rag_system.retrieve(q)
        answer = rag_system.answer(q)
        results["retrieval_hit_rate"] += any(d.id in expected_doc_ids for d in retrieved)
        results["answer_correctness"] += llm_judge(answer, expected_answer)
        results["faithfulness"] += llm_judge_grounding(answer, retrieved)
    return {k: v / len(golden_dataset) for k, v in results.items()}

This is the single highest-leverage thing you can build. Every RAG improvement I've shipped started with one of these metrics moving in the wrong direction.
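
To act on those metrics automatically, the averaged results can be compared against a stored baseline. The sketch below is a hypothetical gate, not part of the harness above: the baseline numbers, the current numbers, and the 0.02 tolerance are invented for illustration.

```python
def find_regressions(current, baseline, tolerance=0.02):
    """Return metrics that dropped more than `tolerance` below the baseline."""
    return {
        name: (baseline[name], current.get(name, 0.0))
        for name in baseline
        if current.get(name, 0.0) < baseline[name] - tolerance
    }


# Hypothetical metric values, for illustration only.
baseline = {"retrieval_hit_rate": 0.90, "answer_correctness": 0.85, "faithfulness": 0.92}
current = {"retrieval_hit_rate": 0.91, "answer_correctness": 0.80, "faithfulness": 0.91}

regressions = find_regressions(current, baseline)
for name, (before, after) in regressions.items():
    print(f"{name} regressed: {before:.2f} -> {after:.2f}")
```

In a deploy pipeline, a non-empty result would fail the build. The tolerance absorbs small run-to-run variation from the LLM judge.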

The Pattern: Design for failure on day 1

If I had to compress all 20 RAG projects into one sentence: the production-ready systems are the ones designed for failure from the first commit. Self-correction loops, hash-based incremental indexing, hybrid retrieval, multimodal embeddings, and an evaluation harness aren't optimizations you add later — they're load-bearing infrastructure.

Most "AI demos that broke in production" stories are really "demos without failure handling that met production." The fix isn't a smarter model. It's better architecture.

If you're building a RAG system that needs to survive real data, look at every component and ask: what happens when this fails? If you don't have an answer, that's the next thing to build.

About the Author

I'm Muaz Ashraf, a freelance AI engineer specialising in production-ready RAG systems, AI agents, and AI integration. I've shipped 20+ AI systems across 7 countries with a 100% project completion rate.

🔗 Portfolio: muazashraf.org/portfolio
📖 Case studies: muazashraf.org/case-studies
✉️ Hire me: muazashraf.org/contact

Open for AI consulting, RAG system development, AI agent development, and LLM application work. Typical MVP delivery: 2–4 weeks.

If you found this useful, follow me here on dev.to — I publish field notes from real production AI work.

I tested Claude Code, Gemini CLI, and OpenAI Codex for 3 months – here's the verdict

Muaz — Thu, 04 Sep 2025 18:15:53 +0000

Model Intelligence & Code Quality

Claude Code CLI ⭐⭐⭐⭐⭐
Powered by Claude Sonnet 4 (1M context)
Consistently produces the highest quality, most maintainable code
Exceptional at capturing coding style and project conventions
Best-in-class for complex architectural decisions

Gemini CLI ⭐⭐⭐⭐☆
Uses Gemini 2.5 Pro with massive 1M token context
63.8% on SWE-bench Verified (trailing Claude Code)
Strong multimodal capabilities
Excellent at handling large codebases due to context size

OpenAI Codex CLI ⭐⭐⭐⭐☆
GPT-5 achieves 74.9% on SWE-bench Verified
Highly accurate, idiomatically correct code on first try
Strong at rapid prototyping and bug fixing
Good reasoning capabilities with o4-mini integration

Developer Features & Integration

Claude Code CLI ⭐⭐⭐⭐⭐
Complete GitHub/GitLab integration
Multi-file editing capabilities
Agentic codebase search and understanding
CLI-first design works with any editor
Advanced hooks system for workflow customization
Real-time progress tracking with TodoWrite

Gemini CLI ⭐⭐⭐⭐☆
Built-in Google Search grounding
MCP (Model Context Protocol) support
File operations and shell commands
ReAct loop for complex task completion
Strong containerized environment support

OpenAI Codex CLI ⭐⭐⭐⭐☆
Multimodal inputs (text, screenshots, diagrams)
Local sandboxed execution
Multiple approval modes (suggest, auto-edit, full-auto)
Comprehensive testing integration
Built-in security with command review

Real-World Performance Comparison

Scenario 1: Large Codebase Refactoring

Winner: Claude Code CLI
Superior architectural understanding
Better handling of cross-file dependencies
Most maintainable code output

Scenario 2: Quick Bug Fixing

Winner: OpenAI Codex CLI
Fastest time to resolution
Highly accurate first-try fixes
Excellent at understanding error contexts

Scenario 3: Learning New Framework

Winner: Gemini CLI
Free tier allows extensive experimentation
Excellent documentation search capabilities
Large context window for comprehensive examples

Scenario 4: Team Collaboration

Winner: Claude Code CLI
Best integration with existing workflows
Superior code style consistency
Professional-grade reliability

Verdict


Advanced RAG vs Basic RAG: When simple retrieval isn't enough (LangChain + LangGraph implementation)

Muaz — Thu, 04 Sep 2025 18:07:13 +0000

Introduction

RAG (Retrieval-Augmented Generation) has changed how AI systems work with information. But while basic RAG gets the job done, advanced RAG takes it to the next level. In this post, I'll show you the difference between basic and advanced RAG, and how modern tools like LangChain and LangGraph make building smart AI systems much easier.

I've been working with RAG systems and noticed basic retrieval fails for complex queries. Here's what I learned about advanced techniques:

Problems with Basic RAG

Can't handle multi-step reasoning
Poor context understanding
No query refinement

Advanced RAG Solutions:

Self-correcting retrieval loops
Multi-agent reasoning with LangGraph
Contextual re-ranking

Why I chose LangChain + LangGraph

I first tried writing my own custom logic, but the code became complex and difficult to manage.
LangChain provides built-in components that are effective and easy to maintain.
You can apply this advanced RAG approach in any sector: education, finance, healthcare, you name it.

Has anyone else run into these limitations? Would love to hear your experiences.

Full technical breakdown: Advance RAG
