<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muaz</title>
    <description>The latest articles on DEV Community by Muaz (@muazashraf).</description>
    <link>https://dev.to/muazashraf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1067809%2F6afb56d1-68c0-421f-90a4-aab7f4e8d13a.jpg</url>
      <title>DEV Community: Muaz</title>
      <link>https://dev.to/muazashraf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/muazashraf"/>
    <language>en</language>
    <item>
      <title>The Clause Nobody Caught: How I Built Missing-Clause Detection for Contracts</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Tue, 23 Jun 2026 10:56:09 +0000</pubDate>
      <link>https://dev.to/muazashraf/the-clause-nobody-caught-how-i-built-missing-clause-detection-for-contracts-3eef</link>
      <guid>https://dev.to/muazashraf/the-clause-nobody-caught-how-i-built-missing-clause-detection-for-contracts-3eef</guid>
      <description>&lt;p&gt;Most contract-analysis tools start with the same basic question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What is wrong with the clauses in this document?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question is useful, but incomplete. Sometimes the biggest risk is not a badly written clause. It is a clause that does not exist.&lt;/p&gt;

&lt;p&gt;I ran into this while building &lt;a href="https://auditguard.org" rel="noopener noreferrer"&gt;AuditGuard&lt;/a&gt;, an AI-assisted compliance analysis tool. A client used it to review an agreement before signing. The system flagged risky language, cited potentially relevant requirements, and suggested draft replacements for human review.&lt;/p&gt;

&lt;p&gt;The finding that changed the conversation, however, came from a different section of the report: &lt;strong&gt;Missing Clauses&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The agreement appeared not to address a provision that could be required under the selected compliance framework. There was no suspicious sentence for a reviewer to highlight because the relevant language was absent.&lt;/p&gt;

&lt;p&gt;The client did not treat that output as a legal conclusion. He used it as a focused question to raise during the contract review and postponed signing until the issue had been addressed.&lt;/p&gt;

&lt;p&gt;That case captures an interesting engineering problem: &lt;strong&gt;How do you search for text that is not there?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ordinary clause analysis misses this
&lt;/h2&gt;

&lt;p&gt;A conventional contract-analysis pipeline usually looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Split the document into clauses.&lt;/li&gt;
&lt;li&gt;Classify each clause by topic.&lt;/li&gt;
&lt;li&gt;Retrieve regulations or policies related to its text.&lt;/li&gt;
&lt;li&gt;Ask a model whether the clause creates a potential issue.&lt;/li&gt;
&lt;li&gt;Generate an explanation and possible remediation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pipeline can identify problematic wording. It cannot reliably identify an omitted requirement because retrieval begins with the document's existing text.&lt;/p&gt;

&lt;p&gt;If a contract says nothing about breach notification, for example, there is no breach-notification language to serve as a query, so the corresponding requirement may never be retrieved.&lt;/p&gt;

&lt;p&gt;Missing-clause detection has to reverse the direction of the search:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of asking which requirements match each clause, ask whether every applicable required provision is covered by any clause.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The two-stage approach
&lt;/h2&gt;

&lt;p&gt;I implemented gap analysis as a separate pass after clause extraction. It combines a deterministic retrieval stage with a constrained model review.&lt;/p&gt;

&lt;p&gt;At a high level, the process is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;contract
  -&amp;gt; extract clause inventory
  -&amp;gt; load required provisions for selected frameworks
  -&amp;gt; score each provision against every clause
  -&amp;gt; shortlist provisions with weak or no coverage
  -&amp;gt; ask a model to verify the shortlist against the clause inventory
  -&amp;gt; report only high-confidence, apparently unaddressed provisions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The split matters. Comparing every regulation with every clause using a large model would be slow, expensive, and difficult to control. Pure similarity search would be cheaper, but it would produce too many false positives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: deterministic candidate selection
&lt;/h3&gt;

&lt;p&gt;The first stage uses TF-IDF similarity as a fast coverage screen.&lt;/p&gt;

&lt;p&gt;For every required provision, the system records its best similarity score across all extracted clauses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
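&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Best similarity score seen so far, keyed by provision ID.&lt;/span&gt;
&lt;span class="n"&gt;coverage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c1"&gt;# search() stands for the TF-IDF lookup; it yields (provision, score) pairs.&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;clause&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;clauses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;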
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;provision&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;coverage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;provision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;coverage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A required, high-impact provision becomes a gap candidate when no clause reaches the configured coverage threshold.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;provision&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;provision&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required_provisions&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;coverage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;COVERAGE_THRESHOLD&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is deliberately only a screening step. Low lexical similarity does not prove that a provision is missing. Contracts often express the same obligation using different vocabulary.&lt;/p&gt;
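&lt;p&gt;To make the limitation concrete, here is a toy, standard-library-only TF-IDF scorer. It is a sketch, not AuditGuard's implementation: the tokenizer, the smoothed IDF formula, and the sample provision and clauses are all assumptions made for illustration. A paraphrased clause that covers the same obligation scores far lower than one that reuses the provision's vocabulary.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; numbers and punctuation are dropped.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (a dict) per text, using a smoothed IDF."""
    token_lists = [tokenize(t) for t in texts]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(texts)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, written for this sketch.
provision = "The processor must notify the controller of a personal data breach without undue delay."
clauses = [
    # Same obligation, different vocabulary.
    "Supplier shall promptly inform Customer after becoming aware of any security incident affecting Customer information.",
    # Same obligation, nearly the same vocabulary.
    "The processor shall notify the controller of any personal data breach within 72 hours.",
]

vectors = tfidf_vectors([provision] + clauses)
paraphrase_score = cosine(vectors[0], vectors[1])
literal_score = cosine(vectors[0], vectors[2])
print(round(paraphrase_score, 3), round(literal_score, 3))
```

&lt;p&gt;Both clauses arguably cover the provision, yet only the second would clear a typical lexical threshold. That is the gap the second stage is meant to close.&lt;/p&gt;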

&lt;p&gt;To keep the next stage bounded, candidates are prioritized by severity and low coverage, then capped per framework.&lt;/p&gt;
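&lt;p&gt;The prioritize-then-cap step could look roughly like the sketch below. The &lt;code&gt;Candidate&lt;/code&gt; fields, the integer severity scale, and the cap value are assumptions for illustration; the article does not describe the real data model.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provision_id: str
    framework: str
    severity: int    # hypothetical scale: higher means more severe
    coverage: float  # best similarity score from stage 1


def prioritize(candidates, cap_per_framework=5):
    """Order by severity (high first), then coverage (low first), and cap per framework."""
    ranked = sorted(candidates, key=lambda c: (-c.severity, c.coverage))
    kept = []
    taken = defaultdict(int)
    for candidate in ranked:
        if taken[candidate.framework] >= cap_per_framework:
            continue  # this framework already has its full quota
        taken[candidate.framework] += 1
        kept.append(candidate)
    return kept


candidates = [
    Candidate("breach-notification", "framework-a", severity=3, coverage=0.02),
    Candidate("data-retention", "framework-a", severity=2, coverage=0.10),
    Candidate("audit-rights", "framework-a", severity=3, coverage=0.08),
    Candidate("subprocessor-list", "framework-b", severity=1, coverage=0.00),
]
shortlist = prioritize(candidates, cap_per_framework=2)
print([c.provision_id for c in shortlist])
```

&lt;p&gt;With a cap of two, the lower-severity &lt;code&gt;data-retention&lt;/code&gt; candidate is dropped from the first framework, while the second framework keeps its single candidate.&lt;/p&gt;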

&lt;h3&gt;
  
  
  Stage 2: conservative model verification
&lt;/h3&gt;

&lt;p&gt;The second stage gives the model two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an inventory of the contract's clauses; and&lt;/li&gt;
&lt;li&gt;the shortlisted required provisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each provision, the model must return structured data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"regulation_ref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"addressed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"No clause appears to address ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"matched_clause"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt is intentionally conservative. A provision counts as addressed when any clause covers its subject matter, even partially or with different wording. The model is also told to judge coverage—not final legal sufficiency.&lt;/p&gt;

&lt;p&gt;The system reports a candidate only when the model says it is unaddressed and its confidence clears a minimum threshold.&lt;/p&gt;
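&lt;p&gt;That reporting rule is small enough to sketch directly. The field names follow the JSON shape shown above; the threshold value and the function name are assumptions, since the article does not state them.&lt;/p&gt;

```python
import json

MIN_CONFIDENCE = 0.8  # hypothetical value; the real threshold is not given in the article


def reportable_gaps(raw_response, min_confidence=MIN_CONFIDENCE):
    """Keep only results the model marks as unaddressed with enough confidence."""
    results = json.loads(raw_response)
    gaps = []
    for item in results:
        # Anything other than an explicit false is treated as "not a gap".
        if item.get("addressed") is not False:
            continue
        confidence = item.get("confidence")
        # Reject missing or non-numeric confidence (bool is excluded on purpose).
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            continue
        if confidence >= min_confidence:
            gaps.append(item)
    return gaps


raw = json.dumps([
    {"regulation_ref": "REQ-1", "addressed": False, "confidence": 0.91,
     "rationale": "No clause appears to address ...", "matched_clause": None},
    {"regulation_ref": "REQ-2", "addressed": False, "confidence": 0.55,
     "rationale": "Possibly covered indirectly.", "matched_clause": None},
    {"regulation_ref": "REQ-3", "addressed": True, "confidence": 0.97,
     "rationale": "Covered by an existing clause.", "matched_clause": "Clause 7"},
])
gaps = reportable_gaps(raw)
print([g["regulation_ref"] for g in gaps])
```

&lt;p&gt;Malformed or ambiguous entries are dropped rather than reported, which matches the conservative intent of the prompt.&lt;/p&gt;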

&lt;p&gt;This second pass filters out cases where TF-IDF missed a semantic match. It also produces a short rationale that a human reviewer can verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is not a legal conclusion
&lt;/h2&gt;

&lt;p&gt;An important distinction is easy to lose in product copy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The system did not find coverage” is not the same as “the contract violates the law.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Whether a provision is actually required depends on facts outside the text, including jurisdiction, the parties' roles, the data involved, and the purpose of the agreement. A model can also miss indirect coverage or misunderstand a cross-reference.&lt;/p&gt;

&lt;p&gt;That is why the output should be presented as a review queue, not a verdict. In AuditGuard, the finding includes the source reference, rationale, confidence, and suggested draft language. The user still needs to verify applicability and wording, and should involve qualified counsel when the decision carries legal risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;Three design decisions made the feature more useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Treat absence detection as its own retrieval problem
&lt;/h3&gt;

&lt;p&gt;Clause-by-clause analysis and gap analysis answer opposite questions. Trying to handle both in one prompt makes the logic harder to inspect and test.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use models for verification, not exhaustive search
&lt;/h3&gt;

&lt;p&gt;Deterministic retrieval reduces the search space. The model then handles the narrower semantic question that similarity scoring cannot answer reliably.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Optimize for reviewability
&lt;/h3&gt;

&lt;p&gt;A missing-clause warning without a citation or rationale is difficult to trust. Each finding should tell the reviewer what may be missing, why it was selected, and which source requirement triggered it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The broader pattern
&lt;/h2&gt;

&lt;p&gt;This approach is not limited to contracts. The same pattern can help find missing controls in security policies, absent sections in technical specifications, or unaddressed requirements in procurement responses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build an inventory of what exists.&lt;/li&gt;
&lt;li&gt;Define the set of requirements expected to be covered.&lt;/li&gt;
&lt;li&gt;Cheaply shortlist weakly covered requirements.&lt;/li&gt;
&lt;li&gt;Semantically verify those candidates.&lt;/li&gt;
&lt;li&gt;Send uncertain results to a human.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Finding problematic text is classification. Finding missing text is coverage analysis. Treating them as separate problems makes the retrieval logic easier to test, bounds model usage, and keeps the output reviewable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: I built AuditGuard, the product discussed in this article. AuditGuard provides AI-assisted compliance analysis for informational purposes and does not provide legal advice. I used AI to help edit this article and reviewed its technical claims against the implementation before publication.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>legaltech</category>
      <category>showdev</category>
      <category>abotwrotethis</category>
    </item>
    <item>
      <title>Where Are You Storing Your API Keys? (And Why Slack Isn't It)</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Fri, 19 Jun 2026 13:04:27 +0000</pubDate>
      <link>https://dev.to/muazashraf/where-are-you-storing-your-api-keys-and-why-slack-isnt-it-hn</link>
      <guid>https://dev.to/muazashraf/where-are-you-storing-your-api-keys-and-why-slack-isnt-it-hn</guid>
      <description>&lt;p&gt;Be honest for a second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where are your API keys right now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not the answer you'd write in a security audit. The real answer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pinned message in your team's &lt;code&gt;#dev-private&lt;/code&gt; Slack channel?&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;.env&lt;/code&gt; file someone &lt;code&gt;scp&lt;/code&gt;'d from a colleague's laptop last summer?&lt;/li&gt;
&lt;li&gt;That one Notion page titled "secrets — don't share"?&lt;/li&gt;
&lt;li&gt;A shared 1Password vault that hasn't been audited since 2023?&lt;/li&gt;
&lt;li&gt;An email thread from when the newest hire was onboarded?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If even one of those gave you a tiny twinge of "yeah… that's where mine are," keep reading. There's a pattern across dev teams that's worth naming, and there are tools that fix it without costing you a year of engineering or a four-figure annual bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-second pattern almost every startup hits
&lt;/h2&gt;

&lt;p&gt;Talk to any startup CTO who's onboarded more than five engineers and the story is the same.&lt;/p&gt;

&lt;p&gt;A lead dev quits. No drama, they just leave. And suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Half the team's API keys lived in &lt;strong&gt;that&lt;/strong&gt; person's head&lt;/li&gt;
&lt;li&gt;The other half are spread across DMs and &lt;code&gt;.env&lt;/code&gt; files on four laptops&lt;/li&gt;
&lt;li&gt;Nobody knows which OpenAI key is charging which project&lt;/li&gt;
&lt;li&gt;Someone shipped a feature using a Stripe &lt;strong&gt;test&lt;/strong&gt; key by accident because they copied the wrong line from a screenshot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cue four days of auditing keys, rotating secrets, and quietly hoping nothing leaks before rotation completes.&lt;/p&gt;

&lt;p&gt;The fix is obvious in hindsight. Treat API keys like the production assets they are. Don't share them in chat. Don't email them. Don't put them in a Notion page.&lt;/p&gt;

&lt;p&gt;But here's the question nobody answers honestly: &lt;strong&gt;where, then?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How bad is "keys in random places" actually?
&lt;/h2&gt;

&lt;p&gt;Two stats are enough.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;31%&lt;/strong&gt; of breaches over the past decade involved stolen credentials (Verizon 2024 Data Breach Investigations Report)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;23+ million secrets&lt;/strong&gt; exposed in public GitHub commits in 2024 alone (GitGuardian's State of Secrets Sprawl 2025)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern: it's almost never a James Bond villain. It's a &lt;code&gt;.env&lt;/code&gt; file accidentally committed to a public repo. A Slack export shared with a contractor. A laptop left on a train. A junior dev who pushed a hotfix in a panic and forgot to add &lt;code&gt;.env&lt;/code&gt; to &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The tools to prevent this all exist. They've existed for a decade. So why do most teams still have keys in Slack?&lt;/p&gt;

&lt;p&gt;Because the existing tools are either &lt;strong&gt;expensive&lt;/strong&gt;, &lt;strong&gt;overengineered&lt;/strong&gt;, or both.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest landscape, ranked by what dev teams actually feel
&lt;/h2&gt;

&lt;p&gt;Let's walk through the real options. No marketing fluff.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. HashiCorp Vault
&lt;/h3&gt;

&lt;p&gt;The "real" answer that everyone respects and nobody on a 5-person team actually deploys.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reality&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;"Free" OSS, but you self-host. Add EC2, maintenance, on-call.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;Hours to days. Then more days to learn the policy language.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team UI&lt;/td&gt;
&lt;td&gt;Minimal. Mostly CLI + HCL policies.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who it's for&lt;/td&gt;
&lt;td&gt;Enterprises with a dedicated security team.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Vault is genuinely a great tool, but it's a tool for &lt;strong&gt;operations people who do nothing but security&lt;/strong&gt;. Not for a frontend dev who just needs to share the staging Stripe key with the new backend hire.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. AWS Secrets Manager
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;$0.40 per secret per month. Plus API calls.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let that sink in.&lt;/p&gt;

&lt;p&gt;Fifty keys across dev / staging / prod for a dozen services (and let's be honest, you have at least that many) works out to 50 × $0.40 × 12 months = &lt;strong&gt;$240/year minimum&lt;/strong&gt;, before request charges. You're also welded to AWS. If your team is on Vercel, Fly, Render, or Cloudflare Workers, congratulations: you're calling a cross-cloud API every time you need to read your own credentials.&lt;/p&gt;

&lt;p&gt;And the UI? IAM policies. Eight pages of documentation to grant one engineer read access to one secret. Bring snacks.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. 1Password / Bitwarden Teams
&lt;/h3&gt;

&lt;p&gt;Honestly, decent. Real encryption, real teams, real UX.&lt;/p&gt;

&lt;p&gt;But they're built for &lt;strong&gt;passwords&lt;/strong&gt;, not API keys. There's no first-class concept of "this is the staging Stripe secret for the payments project." It's folders, items, custom fields. You can make it work. People do. It feels like using a hammer to drive a screw — it sort of goes in, but you can tell something's wrong.&lt;/p&gt;

&lt;p&gt;Also: &lt;strong&gt;$7–$9 per user per month&lt;/strong&gt;. A 10-person team is $70–$90/month. That's &lt;strong&gt;$840–$1,080 a year&lt;/strong&gt;, every year, forever, for a tool that wasn't designed for the job.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The default: Slack, email, &lt;code&gt;.env&lt;/code&gt; files
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Zero cost&lt;/li&gt;
&lt;li&gt;Zero encryption at rest&lt;/li&gt;
&lt;li&gt;Zero access control — if Bob can see the channel, Bob can see every key forever, even after he leaves&lt;/li&gt;
&lt;li&gt;Zero audit trail. "Who used that key last and when?" "Uh."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pretty much &lt;strong&gt;the highest-risk path on this list&lt;/strong&gt;, and the most popular one in the wild. Be honest: this is what your team is using.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "good" actually looks like (in plain English)
&lt;/h2&gt;

&lt;p&gt;If you sat down and wrote out what a sane API-key sharing tool should do, the list looks something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Encrypt at rest.&lt;/strong&gt; Not "we use TLS." TLS is for the network. The actual blob in the database is ciphertext, and the encryption key isn't in the same place as the data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-project, per-environment organization.&lt;/strong&gt; "Payments-staging" and "Payments-production" are two different things. Stop treating them the same.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role-based access.&lt;/strong&gt; Read, write, admin. The intern needs read on one project, not the keys to the kingdom.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only members should not see plaintext.&lt;/strong&gt; Underrated. If you give someone "read" access, the value should be &lt;code&gt;••••••••••••&lt;/code&gt;. They can see the key exists. They can't copy it. The server doesn't even send the real value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An audit log.&lt;/strong&gt; Who looked at what, when. No exceptions. This is what saves you when someone leaks a key — you can prove rotation and trace blast radius.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team invitations that can't be hijacked.&lt;/strong&gt; If you send "you've been added to the team" via email, and someone forwards that email, the recipient should NOT auto-join the org. Most invitation systems get this wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cheap or free for small teams.&lt;/strong&gt; A 4-person startup should not be paying $300/year to not put keys in Slack.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Almost no commercial tool ticks all seven.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tool that's getting traction in 2026: KeyVault
&lt;/h2&gt;

&lt;p&gt;A free option that's been showing up in dev forums lately is &lt;a href="https://apisharing.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;KeyVault&lt;/strong&gt;&lt;/a&gt; (&lt;code&gt;apisharing.vercel.app&lt;/code&gt;). Worth a closer look because it's the first thing in this category that's actually built for small teams.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does, in one paragraph
&lt;/h3&gt;

&lt;p&gt;You sign up and you're the boss of a fresh organization (it creates one automatically). You create a project, say "payments-production." You add an API key to it. It's encrypted with &lt;strong&gt;Fernet&lt;/strong&gt; (AES-128-CBC + HMAC-SHA256). You invite a teammate by email. You give them &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, or &lt;code&gt;admin&lt;/code&gt; access &lt;strong&gt;per project&lt;/strong&gt;. Read-only members see masked values. Every action is logged.&lt;/p&gt;

&lt;p&gt;That's the product. No HCL, no IAM, no $0.40 per secret per month, no "talk to sales."&lt;/p&gt;

&lt;h3&gt;
  
  
  The boring details that matter (and that competitors get wrong)
&lt;/h3&gt;

&lt;p&gt;A few things to highlight because they're rare in this space:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-only really means read-only.&lt;/strong&gt; The masking happens on the server, not in the browser. If you give a teammate &lt;code&gt;read&lt;/code&gt; access, the API response sends &lt;code&gt;••••••••••••&lt;/code&gt; with a &lt;code&gt;masked: true&lt;/code&gt; flag. The actual ciphertext never leaves the database for that role. You can't right-click → Inspect → see the secret.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invitation links can't be replayed against existing accounts.&lt;/strong&gt; If your email already has a KeyVault account, accepting an invitation requires you to be &lt;strong&gt;signed in with that email first&lt;/strong&gt;. The token alone doesn't grant access. (Many invitation systems are exploitable here.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant isolation enforced in SQL, not in app code.&lt;/strong&gt; Every query has &lt;code&gt;AND organization_id = %s&lt;/code&gt; baked in. A bug in a route handler can't accidentally leak another org's data, because the SQL itself refuses to return rows from other organizations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Login is constant-time.&lt;/strong&gt; A dummy bcrypt comparison runs even when the email doesn't exist, so an attacker can't probe "is this email registered?" by timing the response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lockout is per &lt;code&gt;(email, source-IP)&lt;/code&gt;.&lt;/strong&gt; Lock by email alone and any attacker can pre-lock arbitrary accounts as a denial-of-service. Per-IP keeps the legit user working from their own network.&lt;/li&gt;
&lt;/ul&gt;
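&lt;p&gt;The server-side masking point in the list above can be sketched as a serializer that decides what to return before anything reaches the browser. This is a hedged illustration, not KeyVault's code: &lt;code&gt;serialize_key&lt;/code&gt;, the record layout, and &lt;code&gt;fake_decrypt&lt;/code&gt; are hypothetical, and it assumes that &lt;code&gt;write&lt;/code&gt; and &lt;code&gt;admin&lt;/code&gt; roles may see plaintext.&lt;/p&gt;

```python
MASK = "\u2022" * 12  # twelve bullet characters, like the masked value shown in the article


def serialize_key(record, role, decrypt):
    """Build the API response for one stored key, based on the caller's role."""
    response = {"id": record["id"], "name": record["name"]}
    if role in ("write", "admin"):
        response["value"] = decrypt(record["ciphertext"])
        response["masked"] = False
    else:
        # Read-only or unknown roles: never decrypt, never include the ciphertext.
        response["value"] = MASK
        response["masked"] = True
    return response


def fake_decrypt(ciphertext):
    # Stand-in for real decryption so the sketch runs without a crypto library.
    return ciphertext[len("enc:"):]


record = {"id": 1, "name": "PAYMENTS_API_KEY", "ciphertext": "enc:example-secret"}
print(serialize_key(record, "read", fake_decrypt))
print(serialize_key(record, "admin", fake_decrypt))
```

&lt;p&gt;Because the read-only branch never calls &lt;code&gt;decrypt&lt;/code&gt; and never copies the ciphertext into the response, there is nothing for browser dev tools to reveal. Defaulting unknown roles to the masked branch keeps the failure mode safe.&lt;/p&gt;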

&lt;p&gt;A full technical breakdown lives at &lt;a href="https://apisharing.vercel.app/llms-full.txt" rel="noopener noreferrer"&gt;apisharing.vercel.app/llms-full.txt&lt;/a&gt;. It doubles as a public transparency report on how the system is built.&lt;/p&gt;
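&lt;p&gt;The constant-time login idea from the list above can be illustrated with the standard library. Note the swap: the article mentions bcrypt, while this sketch uses PBKDF2 from &lt;code&gt;hashlib&lt;/code&gt; so it runs without extra packages. The function names, the in-memory user store, and the iteration count are assumptions for illustration only.&lt;/p&gt;

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor for this sketch


def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Throwaway credentials to hash against when the email is unknown.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = hash_password("not-a-real-password", DUMMY_SALT)


def verify_login(users, email, password):
    """Do the same hashing work whether or not the email exists."""
    user = users.get(email)
    if user is None:
        salt, expected = DUMMY_SALT, DUMMY_HASH
    else:
        salt, expected = user["salt"], user["hash"]
    candidate = hash_password(password, salt)
    matches = hmac.compare_digest(candidate, expected)
    # An unknown email can never succeed, even if the dummy hash happens to match.
    return user is not None and matches


salt = os.urandom(16)
users = {"dev@example.com": {"salt": salt, "hash": hash_password("correct horse", salt)}}

print(verify_login(users, "dev@example.com", "correct horse"))
print(verify_login(users, "dev@example.com", "wrong guess"))
print(verify_login(users, "nobody@example.com", "not-a-real-password"))
```

&lt;p&gt;Both branches run one key derivation and one &lt;code&gt;hmac.compare_digest&lt;/code&gt; call, so response time no longer depends on whether the account exists. Treat this as approximate: database lookups and other surrounding code can still leak timing differences.&lt;/p&gt;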

&lt;h3&gt;
  
  
  Pricing, plain
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free&lt;/strong&gt;: 1 project. Unlimited keys, unlimited team members, all the security features. Forever. No card required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro — $10/month&lt;/strong&gt; for the whole org. Unlimited projects. That's the only difference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compare that to AWS Secrets Manager at ~$240/year for the fifty-secret example above. KeyVault Pro is &lt;strong&gt;$120/year flat, for the entire team&lt;/strong&gt;. The free tier is enough for a side project or a 2-person team that just wants to stop sharing the OpenAI key in Slack.&lt;/p&gt;
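&lt;p&gt;For anyone who wants to plug in their own numbers, the comparison is simple arithmetic. The sketch below uses only the prices quoted in this article, works in cents to avoid floating-point rounding, and ignores AWS request charges; the function names are made up for the example.&lt;/p&gt;

```python
MONTHS = 12

# Prices in cents, taken from the figures quoted in the article.
AWS_CENTS_PER_SECRET_MONTH = 40      # $0.40 per secret per month, request charges excluded
KEYVAULT_PRO_CENTS_PER_MONTH = 1000  # $10 per month, flat for the whole org


def aws_annual_cents(secret_count):
    return secret_count * AWS_CENTS_PER_SECRET_MONTH * MONTHS


def keyvault_pro_annual_cents():
    return KEYVAULT_PRO_CENTS_PER_MONTH * MONTHS


def dollars(cents):
    return f"${cents / 100:,.2f}"


aws_50 = aws_annual_cents(50)
pro = keyvault_pro_annual_cents()
# Secret count at which per-secret pricing equals the flat plan.
break_even = pro // (AWS_CENTS_PER_SECRET_MONTH * MONTHS)

print("AWS Secrets Manager, 50 secrets:", dollars(aws_50))
print("KeyVault Pro:", dollars(pro))
print("Break-even secret count:", break_even)
```

&lt;p&gt;On these figures the two options cost the same at 25 secrets, so the flat plan only wins on price once a team stores more than that.&lt;/p&gt;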

&lt;h2&gt;
  
  
  "Can I trust some new tool with my keys?"
&lt;/h2&gt;

&lt;p&gt;Fair question. The honest answers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust anyone blindly.&lt;/strong&gt; Look at what they do, not what they say. The Fernet scheme is published. The masking behavior described above is testable: make a &lt;code&gt;read&lt;/code&gt; member and watch the API response. The audit log is queryable. The tenant isolation is in the SQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The encryption key isn't theirs.&lt;/strong&gt; In production, KeyVault refuses to start without an explicit &lt;code&gt;ENCRYPTION_KEY&lt;/code&gt; environment variable. The server-side admin can't dump plaintext keys without it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The tradeoff is honest.&lt;/strong&gt; Self-host Vault if there's time and a team for it. Use AWS Secrets Manager if the whole stack is AWS and budget isn't a factor. Use KeyVault if the team is small, tired of Slack, and doesn't want to spend a quarter setting up Vault.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The only thing worse than overpaying for security is not having it at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR — pick your fit
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team Size&lt;/th&gt;
&lt;th&gt;Best Fit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Solo / 2-person side project&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KeyVault free tier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Encrypted, free, 30-second setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2–10 person startup&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KeyVault Pro ($10/mo flat)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unlimited projects, beats $240+/yr on AWS Secrets Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10–50 person team on AWS&lt;/td&gt;
&lt;td&gt;AWS Secrets Manager&lt;/td&gt;
&lt;td&gt;If budget allows and stack is fully AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50+ engineers with security ops&lt;/td&gt;
&lt;td&gt;HashiCorp Vault&lt;/td&gt;
&lt;td&gt;Worth the setup cost at this scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed-tool team that already has 1Password&lt;/td&gt;
&lt;td&gt;1Password Teams&lt;/td&gt;
&lt;td&gt;Not ideal for keys, but acceptable if it's already paid for&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try the cheapest option first
&lt;/h2&gt;

&lt;p&gt;The honest move: spend 30 seconds at &lt;a href="https://apisharing.vercel.app/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;apisharing.vercel.app/signup&lt;/strong&gt;&lt;/a&gt; before committing to anything more expensive. Free tier is enough to migrate one project off Slack today. If the team grows, $10/month covers everyone forever.&lt;/p&gt;

&lt;p&gt;The worst-case outcome is finding out it's not for you — total time invested, two minutes.&lt;/p&gt;

&lt;p&gt;The best case: it's the last time you ever paste an API key into Slack.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Drop a 💜 or share it with the teammate who keeps pasting prod keys into the wrong channel.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  About the Author
&lt;/h3&gt;

&lt;p&gt;I am a freelance AI engineer. I build AI agents, RAG systems, and AI tools for real businesses. I have shipped more than 20 AI systems across 7 countries, and I finished every single project I started.&lt;/p&gt;

&lt;p&gt;I am open for AI consulting, RAG work, AI agent work, and LLM app work. Most first versions are ready in 2 to 4 weeks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Portfolio: &lt;a href="https://muazashraf.org/portfolio" rel="noopener noreferrer"&gt;muazashraf.org/portfolio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Case studies: &lt;a href="https://muazashraf.org/case-studies" rel="noopener noreferrer"&gt;muazashraf.org/case-studies&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hire me: &lt;a href="https://muazashraf.org/contact" rel="noopener noreferrer"&gt;muazashraf.org/contact&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this helped you, follow me here. I share simple lessons from real AI work.&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>apikeys</category>
      <category>opensource</category>
    </item>
    <item>
      <title>5 Reasons Your RAG System Will Fail in Production (And the Patterns I Use to Fix Each One)</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Sun, 17 May 2026 19:15:36 +0000</pubDate>
      <link>https://dev.to/muazashraf/5-reasons-your-rag-system-will-fail-in-production-and-the-patterns-i-use-to-fix-each-one-34ac</link>
      <guid>https://dev.to/muazashraf/5-reasons-your-rag-system-will-fail-in-production-and-the-patterns-i-use-to-fix-each-one-34ac</guid>
      <description>&lt;h2&gt;
  
  
  The 80% Problem
&lt;/h2&gt;

&lt;p&gt;Most RAG demos look magical. You drop in 10 PDFs, ask 3 questions, get clean answers. Ship it.&lt;/p&gt;

&lt;p&gt;Then production hits. The document corpus grows from 10 to 10,000. Users ask questions the demo never anticipated. Edge cases stack up. Accuracy drops from 95% to 60% in two weeks. The team starts apologising to the client.&lt;/p&gt;

&lt;p&gt;I've built &lt;strong&gt;20+ production RAG systems&lt;/strong&gt; for clients across the USA, UK, UAE, Canada, Australia, Switzerland, and Pakistan. About 80% of the RAG projects I audit before clients hire me are in this exact failure mode — they passed the demo, then collapsed under real data.&lt;/p&gt;

&lt;p&gt;The fixes aren't more complex models. They're &lt;strong&gt;architectural patterns&lt;/strong&gt; designed for failure modes from day one. Here are the five that matter most.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 1: Hallucinations on edge cases
&lt;/h2&gt;

&lt;p&gt;A vanilla RAG pipeline does this: embed the user query, retrieve top-k documents, stuff them into a prompt, ask the LLM to answer. When retrieval finds &lt;em&gt;something&lt;/em&gt;, the LLM dutifully constructs an answer — even when the retrieved context is unrelated to the question.&lt;/p&gt;

&lt;p&gt;In production, you get confident-sounding nonsense on the long tail of queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: a self-correction loop.&lt;/strong&gt; Before the LLM answers, force it to grade the retrieved context against the question. If the grade is poor, rewrite the query or fall back to an "I don't have enough information" response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;grade_relevance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Given the question and retrieved documents, score 0-10 how
    relevant the documents are to answering the question. Be strict.
    Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Documents: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Respond with just a number.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_after_grading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RAGState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grade&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grade_relevance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rewrite_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
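&lt;span class="c1"&gt;# Wire the loop: retrieve, grade, then either rewrite and retry or answer
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grade&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grade&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;route_after_grading&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;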
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I built exactly this pattern for an enterprise client — full breakdown in my &lt;a href="https://muazashraf.vercel.app/case-studies/agentic-rag-chatbot-langchain-langgraph" rel="noopener noreferrer"&gt;Agentic RAG case study&lt;/a&gt;. It moved accuracy from ~70% to 90%+ on real questions, and dropped hallucinations to single digits.&lt;/p&gt;
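
&lt;p&gt;One thing the router above leaves open is what happens when a question never grades well: it would keep rewriting. A minimal sketch of a capped version follows. The &lt;code&gt;rewrite_count&lt;/code&gt; state field, the &lt;code&gt;fallback&lt;/code&gt; node, and the retry budget are assumptions for illustration, not part of the original graph.&lt;/p&gt;

```python
MAX_REWRITES = 2    # assumed retry budget, tune per workload
MIN_RELEVANCE = 6   # same threshold as the router above


def route_with_fallback(state: dict) -> str:
    """Pick the next node name from the grading result."""
    if state["relevance_score"] >= MIN_RELEVANCE:
        return "generate_answer"
    # Hypothetical counter; the rewrite node would increment it on each pass.
    if state.get("rewrite_count", 0) >= MAX_REWRITES:
        return "fallback"
    return "rewrite_query"


def fallback(state: dict) -> dict:
    """Hypothetical node that declines instead of guessing."""
    return {"answer": "I don't have enough information to answer that."}
```

&lt;p&gt;Registering &lt;code&gt;fallback&lt;/code&gt; as its own node keeps the refusal path visible in traces, which makes it easier to count how often the system declines.&lt;/p&gt;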

&lt;h2&gt;
  
  
  Failure 2: Stale retrieval as your data changes
&lt;/h2&gt;

&lt;p&gt;You ship a RAG system on Monday with 500 documents. By Friday, 50 of those documents have been edited. Your vector store still has the old embeddings.&lt;/p&gt;

&lt;p&gt;Users ask questions about the new content. The system retrieves the old version. They lose trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: incremental re-indexing with content hashing, not full re-builds.&lt;/strong&gt; Hash each source document. On a schedule (or webhook), only re-embed documents whose hash changed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;document_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upsert_if_changed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pinecone_index&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;new_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;document_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pinecone_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;new_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# unchanged, skip
&lt;/span&gt;    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pinecone_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indexed_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="p"&gt;}])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single pattern saved a client 70% on embedding API costs and kept their knowledge base accurate without manual intervention.&lt;/p&gt;
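
&lt;p&gt;For a scheduled sync, the same hash comparison can be run over the whole corpus to decide what needs work before any embedding call is made. The sketch below is self-contained and uses plain dictionaries in place of a vector store. The function names are hypothetical, and deleting vectors whose source document has disappeared is my assumption, since the snippet above only covers edits.&lt;/p&gt;

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 of the document text, used as a change fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(source_docs: dict, stored_hashes: dict) -> dict:
    """Compare current sources with the hashes recorded at index time.

    source_docs:   doc_id -> current text
    stored_hashes: doc_id -> hash saved alongside the vector
    """
    to_embed = sorted(
        doc_id
        for doc_id, text in source_docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    )
    # Assumption: vectors whose source document is gone should be removed.
    to_delete = sorted(set(stored_hashes) - set(source_docs))
    return {"embed": to_embed, "delete": to_delete}
```

&lt;p&gt;Only the IDs under &lt;code&gt;embed&lt;/code&gt; cost an embedding call; unchanged documents are never touched.&lt;/p&gt;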

&lt;h2&gt;
  
  
  Failure 3: Bad retrieval ranking
&lt;/h2&gt;

&lt;p&gt;Top-k retrieval over pure semantic similarity has a known weakness: it rewards documents that &lt;em&gt;sound similar&lt;/em&gt; to the question, not documents that &lt;em&gt;answer&lt;/em&gt; the question. Worse, exact keyword matches (product codes, names, error codes) often get ranked below conceptually-similar-but-wrong chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: hybrid search + a reranker.&lt;/strong&gt; Combine dense vector search with sparse keyword search (BM25), then run the merged candidates through a cross-encoder reranker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rank_bm25&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BM25Okapi&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrossEncoder&lt;/span&gt;

&lt;span class="n"&gt;bm25&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BM25Okapi&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;reranker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrossEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hybrid_retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dense_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sparse_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_top_n&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dedupe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dense_hits&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sparse_hits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters: in financial, legal, and medical use cases, missing a specific code or term means missing the entire answer. Pure semantic search misses these constantly. Hybrid + rerank fixed this for a healthcare client managing 10,000+ patient records.&lt;/p&gt;
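
&lt;p&gt;The &lt;code&gt;dedupe&lt;/code&gt; helper in the snippet above is not shown. One possible version is sketched here, assuming every hit exposes a stable &lt;code&gt;id&lt;/code&gt; that is shared between the dense and sparse indexes. That shared ID is an assumption; if your two retrievers return different object types, pass a &lt;code&gt;key&lt;/code&gt; function that maps both to the same identifier.&lt;/p&gt;

```python
def dedupe(hits, key=lambda hit: hit.id):
    """Drop repeated hits, keeping the first occurrence of each key."""
    seen = set()
    unique = []
    for hit in hits:
        k = key(hit)
        if k not in seen:
            seen.add(k)
            unique.append(hit)
    return unique
```

&lt;p&gt;Order is preserved, so dense hits listed first stay ahead of sparse duplicates until the reranker rescoring step decides the final order.&lt;/p&gt;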

&lt;h2&gt;
  
  
  Failure 4: Multimodal blindspots
&lt;/h2&gt;

&lt;p&gt;Most RAG systems can't read the charts, diagrams, screenshots, or tables inside PDFs. They OCR the text and lose 40% of the information.&lt;/p&gt;

&lt;p&gt;If your domain has visual content — research papers, technical docs, medical scans, financial reports — text-only RAG is broken by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: vision-language embeddings (ColPali, CLIP) for image regions alongside text chunks.&lt;/strong&gt; Index both. Let the retriever match queries against both modalities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;colpali_engine.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ColPali&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ColPaliProcessor&lt;/span&gt;

&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ColPaliProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vidore/colpali&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ColPali&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vidore/colpali&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_page_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_page_image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_images&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;pdf_page_image&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="c1"&gt;# ColPali returns one vector per image patch; mean-pooling keeps a single vector per page
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store both text embeddings AND image embeddings in the same vector store
# with a 'modality' tag. Retrieve from both, then merge.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I built this for a research firm searching 10,000+ pages of mixed-content PDFs. Asking "show me the Q3 conversion funnel chart" actually returns the right chart now. Full writeup: &lt;a href="https://muazashraf.vercel.app/case-studies/multimodal-rag-colpali-clip" rel="noopener noreferrer"&gt;Multimodal RAG with ColPali &amp;amp; CLIP&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 5: No evaluation harness = no improvement
&lt;/h2&gt;

&lt;p&gt;Most teams ship RAG without an evaluation pipeline. Then when accuracy degrades, they can't tell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did retrieval get worse?&lt;/li&gt;
&lt;li&gt;Did the LLM get worse?&lt;/li&gt;
&lt;li&gt;Did the data get harder?&lt;/li&gt;
&lt;li&gt;Was it always this bad and we just didn't notice?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can't fix what you can't measure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: a golden dataset + automated eval.&lt;/strong&gt; Hand-curate 50–100 question/answer pairs covering your edge cases. Run them through the system nightly and on every deploy. Track three metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;golden_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_hit_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# did retrieval find the right doc?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer_correctness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# did the final answer match?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# was the answer grounded in retrieved docs?
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_doc_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_answer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;golden_dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_hit_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expected_doc_ids&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer_correctness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;llm_judge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;llm_judge_grounding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;golden_dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the single highest-leverage thing you can build. Every RAG improvement I've shipped started with one of these metrics moving in the wrong direction.&lt;/p&gt;
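
&lt;p&gt;To turn those numbers into a signal, the output of &lt;code&gt;evaluate_rag&lt;/code&gt; can be compared against a stored baseline on each run. This is a rough sketch: the function name and the default tolerance of 0.02 are assumptions, and the right tolerance depends on how noisy your LLM judge is.&lt;/p&gt;

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.02) -> dict:
    """Return metrics that dropped by more than `tolerance` versus baseline."""
    regressions = {}
    for name, old in baseline.items():
        # A metric missing from the current run is treated as a score of 0.
        new = current.get(name, 0.0)
        if old - new > tolerance:
            regressions[name] = round(old - new, 4)
    return regressions
```

&lt;p&gt;A CI step could fail the deploy whenever the returned dictionary is non-empty, and log which metric moved so you know whether to look at retrieval or generation first.&lt;/p&gt;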

&lt;h2&gt;
  
  
  The Pattern: Design for failure on day 1
&lt;/h2&gt;

&lt;p&gt;If I had to compress all 20 RAG projects into one sentence: &lt;strong&gt;the production-ready systems are the ones designed for failure from the first commit.&lt;/strong&gt; Self-correction loops, hash-based incremental indexing, hybrid retrieval, multimodal embeddings, and an evaluation harness aren't optimizations you add later — they're load-bearing infrastructure.&lt;/p&gt;

&lt;p&gt;Most "AI demos that broke in production" stories are really "demos without failure handling that met production." The fix isn't a smarter model. It's better architecture.&lt;/p&gt;

&lt;p&gt;If you're building a RAG system that needs to survive real data, look at every component and ask: &lt;em&gt;what happens when this fails?&lt;/em&gt; If you don't have an answer, that's the next thing to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm &lt;strong&gt;&lt;a href="https://muazashraf.vercel.app" rel="noopener noreferrer"&gt;Muaz Ashraf&lt;/a&gt;&lt;/strong&gt;, a freelance AI engineer specialising in production-ready RAG systems, AI agents, and AI integration. I've shipped 20+ AI systems across 7 countries with a 100% project completion rate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 Portfolio: &lt;a href="https://muazashraf.org/portfolio" rel="noopener noreferrer"&gt;muazashraf.org/portfolio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Case studies: &lt;a href="https://muazashraf.org/case-studies" rel="noopener noreferrer"&gt;muazashraf.org/case-studies&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;✉️ Hire me: &lt;a href="https://muazashraf.org/contact" rel="noopener noreferrer"&gt;muazashraf.org/contact&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open for AI consulting, RAG system development, AI agent development, and LLM application work. Typical MVP delivery: 2–4 weeks.&lt;/p&gt;

&lt;p&gt;If you found this useful, follow me here on dev.to — I publish field notes from real production AI work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I tested Claude Code, Gemini CLI, and OpenAI Codex for 3 months – here's the verdict</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Thu, 04 Sep 2025 18:15:53 +0000</pubDate>
      <link>https://dev.to/muazashraf/i-tested-claude-code-gemini-cli-and-openai-codex-for-3-months-heres-the-verdict-g3n</link>
      <guid>https://dev.to/muazashraf/i-tested-claude-code-gemini-cli-and-openai-codex-for-3-months-heres-the-verdict-g3n</guid>
      <description>&lt;h2&gt;
  
  
  Model Intelligence &amp;amp; Code Quality
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code CLI&lt;/strong&gt; ⭐⭐⭐⭐⭐&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Powered by Claude Sonnet 4 (1M context)&lt;/li&gt;
&lt;li&gt;Consistently produces the highest quality, most maintainable code&lt;/li&gt;
&lt;li&gt;Exceptional at capturing coding style and project conventions&lt;/li&gt;
&lt;li&gt;Best-in-class for complex architectural decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; ⭐⭐⭐⭐☆&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Gemini 2.5 Pro with massive 1M token context&lt;/li&gt;
&lt;li&gt;63.8% on SWE-bench Verified (trailing Claude Code)&lt;/li&gt;
&lt;li&gt;Strong multimodal capabilities&lt;/li&gt;
&lt;li&gt;Excellent at handling large codebases due to context size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Codex CLI&lt;/strong&gt; ⭐⭐⭐⭐☆&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5 achieves 74.9% on SWE-bench Verified&lt;/li&gt;
&lt;li&gt;Highly accurate, idiomatically correct code on first try&lt;/li&gt;
&lt;li&gt;Strong at rapid prototyping and bug fixing&lt;/li&gt;
&lt;li&gt;Good reasoning capabilities with o4-mini integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Developer Features &amp;amp; Integration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code CLI&lt;/strong&gt; ⭐⭐⭐⭐⭐&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete GitHub/GitLab integration&lt;/li&gt;
&lt;li&gt;Multi-file editing capabilities&lt;/li&gt;
&lt;li&gt;Agentic codebase search and understanding&lt;/li&gt;
&lt;li&gt;CLI-first design works with any editor&lt;/li&gt;
&lt;li&gt;Advanced hooks system for workflow customization&lt;/li&gt;
&lt;li&gt;Real-time progress tracking with TodoWrite&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; ⭐⭐⭐⭐☆&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in Google Search grounding&lt;/li&gt;
&lt;li&gt;MCP (Model Context Protocol) support&lt;/li&gt;
&lt;li&gt;File operations and shell commands&lt;/li&gt;
&lt;li&gt;ReAct loop for complex task completion&lt;/li&gt;
&lt;li&gt;Strong containerized environment support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Codex CLI&lt;/strong&gt; ⭐⭐⭐⭐☆&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multimodal inputs (text, screenshots, diagrams)&lt;/li&gt;
&lt;li&gt;Local sandboxed execution&lt;/li&gt;
&lt;li&gt;Multiple approval modes (suggest, auto-edit, full-auto)&lt;/li&gt;
&lt;li&gt;Comprehensive testing integration&lt;/li&gt;
&lt;li&gt;Built-in security with command review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-World Performance Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scenario 1: Large Codebase Refactoring&lt;br&gt;
Winner: Claude Code CLI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Superior architectural understanding&lt;/li&gt;
&lt;li&gt;Better handling of cross-file dependencies&lt;/li&gt;
&lt;li&gt;Most maintainable code output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scenario 2: Quick Bug Fixing&lt;br&gt;
Winner: OpenAI Codex CLI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fastest time to resolution&lt;/li&gt;
&lt;li&gt;Highly accurate first-try fixes&lt;/li&gt;
&lt;li&gt;Excellent at understanding error contexts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scenario 3: Learning New Framework&lt;br&gt;
Winner: Gemini CLI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free tier allows extensive experimentation&lt;/li&gt;
&lt;li&gt;Excellent documentation search capabilities&lt;/li&gt;
&lt;li&gt;Large context window for comprehensive examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scenario 4: Team Collaboration&lt;br&gt;
Winner: Claude Code CLI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Best integration with existing workflows&lt;/li&gt;
&lt;li&gt;Superior code style consistency&lt;/li&gt;
&lt;li&gt;Professional-grade reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;Read the full verdict on my blog: &lt;a href="https://muazashraf.vercel.app/blog/10-claude-code-vs-gemini-cli-vs-codex" rel="noopener noreferrer"&gt;MuazAshraf&lt;/a&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>gemini</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>Advanced RAG vs Basic RAG: When simple retrieval isn't enough (LangChain + LangGraph implementation)</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Thu, 04 Sep 2025 18:07:13 +0000</pubDate>
      <link>https://dev.to/muazashraf/advanced-rag-vs-basic-rag-when-simple-retrieval-isnt-enough-langchain-langgraph-implementation-5bem</link>
      <guid>https://dev.to/muazashraf/advanced-rag-vs-basic-rag-when-simple-retrieval-isnt-enough-langchain-langgraph-implementation-5bem</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) has changed how AI systems work with information. Basic RAG gets the job done for simple lookups, but it struggles with complex, multi-step queries. In this post, I'll show you the difference between basic and advanced RAG, and how tools like LangChain and LangGraph make advanced RAG systems much easier to build.&lt;/p&gt;

&lt;p&gt;I've been working with RAG systems and noticed that basic retrieval fails for complex queries. Here's what I learned about advanced techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problems with Basic RAG:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can't handle multi-step reasoning&lt;/li&gt;
&lt;li&gt;Poor context understanding&lt;/li&gt;
&lt;li&gt;No query refinement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advanced RAG Solutions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-correcting retrieval loops&lt;/li&gt;
&lt;li&gt;Multi-agent reasoning with LangGraph&lt;/li&gt;
&lt;li&gt;Contextual re-ranking&lt;/li&gt;
&lt;/ul&gt;
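
&lt;p&gt;As a rough illustration of the first item, here is a self-correcting retrieval loop in plain Python. It deliberately avoids LangChain and LangGraph so that no library API is assumed. &lt;code&gt;DOCS&lt;/code&gt;, &lt;code&gt;retrieve&lt;/code&gt;, &lt;code&gt;grade&lt;/code&gt;, and &lt;code&gt;rewrite&lt;/code&gt; are hypothetical toy stand-ins; in a real system the grader and the rewriter would usually be LLM calls and the retriever a vector store.&lt;/p&gt;

```python
from typing import List, Tuple

# Hypothetical toy corpus, for illustration only.
DOCS = [
    "LangGraph models agent workflows as graphs of nodes and edges.",
    "Re-ranking reorders retrieved passages by relevance to the query.",
    "Basic RAG embeds the query once and returns the nearest chunks.",
]


def retrieve(query: str, k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words.intersection(d.lower().split())), d) for d in DOCS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def grade(query: str, docs: List[str]) -> bool:
    """Toy grader: accept when at least one document was retrieved."""
    return len(docs) > 0


def rewrite(query: str) -> str:
    """Toy rewriter: normalise one known spelling variant."""
    return query.replace("reranking", "re-ranking")


def self_correcting_retrieve(query: str, max_rounds: int = 3) -> Tuple[List[str], str]:
    """Retrieve, grade the result, and rewrite the query until it passes."""
    current = query
    for _ in range(max_rounds):
        docs = retrieve(current)
        if grade(current, docs):
            return docs, current
        current = rewrite(current)
    # Give up explicitly instead of answering from bad context.
    return [], current


docs, final_query = self_correcting_retrieve("how does reranking work")
print(final_query)
print(docs)
```

&lt;p&gt;The first retrieval finds nothing because the query spells the term differently from the corpus. The rewrite step corrects the query, and the second pass succeeds. Capping the loop with &lt;code&gt;max_rounds&lt;/code&gt; keeps an unanswerable query from cycling forever.&lt;/p&gt;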

&lt;p&gt;&lt;strong&gt;Why I chose LangChain + LangGraph:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I tried writing my own custom logic, but the code quickly became complex and difficult to manage.&lt;/li&gt;
&lt;li&gt;LangChain provides built-in libraries that are effective and easy to manage.&lt;/li&gt;
&lt;li&gt;You can apply this advanced RAG approach in any sector: education, finance, healthcare, you name it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Has anyone else run into these limitations? Would love to hear your experiences.&lt;/p&gt;

&lt;p&gt;Full technical breakdown: &lt;a href="https://muazashraf.vercel.app/blog/11-advanced-rag-techniques" rel="noopener noreferrer"&gt;Advanced RAG&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>langgraph</category>
    </item>
  </channel>
</rss>
