<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Siddharth Pandey</title>
    <description>The latest articles on DEV Community by Siddharth Pandey (@siddharth_pandey_27).</description>
    <link>https://dev.to/siddharth_pandey_27</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1878784%2Fc5935d13-014b-42fe-a1ab-a9374ff32cbb.jpg</url>
      <title>DEV Community: Siddharth Pandey</title>
      <link>https://dev.to/siddharth_pandey_27</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/siddharth_pandey_27"/>
    <language>en</language>
    <item>
      <title>Your DB Is Still Red After Adding a Cache — Here's Why</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Tue, 23 Jun 2026 12:21:17 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/your-db-is-still-red-after-adding-a-cache-heres-why-2e4e</link>
      <guid>https://dev.to/siddharth_pandey_27/your-db-is-still-red-after-adding-a-cache-heres-why-2e4e</guid>
      <description>&lt;p&gt;You deployed a cache in front of your database three weeks ago. The DB is still running at 90% utilization. Traffic doubled last month and you're wondering if the cache is doing anything at all.&lt;/p&gt;

&lt;p&gt;It is — just not as much as you expected, because cache hit rate is not something you configure. It emerges from two things: how much of your working set fits in memory, and how skewed your access patterns are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://paperstack-app.vercel.app/" rel="noopener noreferrer"&gt;Paperstack&lt;/a&gt; is a free system design simulator that makes this visible. Sketch an architecture, press play, watch utilization numbers and node colors update live. The demo below walks through the cache problem using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hit Rate Isn't a Setting
&lt;/h2&gt;

&lt;p&gt;A cache absorbs reads by serving them from memory instead of forwarding them to the database. The fraction it absorbs — hit rate — depends on one thing: whether the data a request needs is in memory.&lt;/p&gt;

&lt;p&gt;Two variables determine that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working-set vs memory.&lt;/strong&gt; If your active data is 100,000 keys and your cache holds 50,000, then under uniform access only about half the requests can hit; the rest miss and are forwarded to the DB. Your cache isn't broken. It's undersized for the working set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access skew.&lt;/strong&gt; If 80% of requests hit 10% of keys (common in social or content workloads), a much smaller cache can achieve a high hit rate because the hot keys stay warm and rarely get evicted. Paperstack models this directly via the &lt;code&gt;skew&lt;/code&gt; parameter on the Cache node: with an LFU eviction policy, higher skew boosts hit rate beyond raw memory coverage. With LRU, the simulator applies no skew benefit: the policy tracks recency rather than access frequency, so in this model hot keys get evicted as readily as cold ones.&lt;/p&gt;

&lt;p&gt;This is why you can't just set &lt;code&gt;hitRate: 0.9&lt;/code&gt; in the config panel — Paperstack doesn't expose hit rate as a field. You set &lt;code&gt;memory&lt;/code&gt; and &lt;code&gt;workingSet&lt;/code&gt;; hit rate is computed. If the simulation let you enter a hit rate directly, it would be lying to you about what your architecture actually does.&lt;/p&gt;
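
&lt;p&gt;To see why both inputs matter, here is a minimal TypeScript sketch. It is not Paperstack's actual formula: it assumes key popularity follows a Zipf distribution with exponent &lt;code&gt;skew&lt;/code&gt; and an idealized frequency-based cache that always holds the hottest keys. The names &lt;code&gt;idealHitRate&lt;/code&gt; and &lt;code&gt;dbReadsPerSecond&lt;/code&gt; are hypothetical.&lt;/p&gt;

```typescript
// Illustrative model only (assumption): Zipf popularity and an ideal
// LFU-style cache that always holds the `memory` most popular keys.
function idealHitRate(workingSet: number, memory: number, skew: number): number {
  const cachedKeys = Math.min(memory, workingSet);
  // Relative request weight of the key at popularity rank r (1-based): 1 / r^skew.
  const weights = Array.from({ length: workingSet }, (_, i) => 1 / Math.pow(i + 1, skew));
  const total = weights.reduce((sum, w) => sum + w, 0);
  const cached = weights.slice(0, cachedKeys).reduce((sum, w) => sum + w, 0);
  return cached / total;
}

// Reads that still reach the database after the cache absorbs its share.
function dbReadsPerSecond(readsPerSecond: number, hitRate: number): number {
  return readsPerSecond * (1 - hitRate);
}

const uniform = idealHitRate(100000, 50000, 0); // no skew: memory coverage alone
const skewed = idealHitRate(100000, 50000, 1);  // Zipf exponent 1
console.log(uniform, skewed, dbReadsPerSecond(10000, uniform));
```

&lt;p&gt;Under these assumptions, the same half-sized cache absorbs exactly half the reads with no skew and more than 90% of them at a Zipf exponent of 1. The memory setting did not change; only the access pattern did.&lt;/p&gt;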

&lt;h2&gt;
  
  
  The Comparative in Action
&lt;/h2&gt;

&lt;p&gt;Here's the experiment worth running in Paperstack: sketch &lt;code&gt;Traffic → App → Cache → Database&lt;/code&gt;. Run the simulation. Watch which nodes go red.&lt;/p&gt;

&lt;p&gt;With cache &lt;code&gt;memory&lt;/code&gt; smaller than &lt;code&gt;workingSet&lt;/code&gt;, most reads miss and forward to the DB. The DB stays red. The cache stays green — it has throughput headroom, it's just not absorbing much.&lt;/p&gt;

&lt;p&gt;Now increase the cache memory past the working-set size. The hit rate climbs. Fewer reads reach the DB. At some threshold the DB color shifts from red to orange to green — and a different node becomes the bottleneck. Maybe the App Server. Maybe the cache's own throughput cap.&lt;/p&gt;

&lt;p&gt;This is what Paperstack calls the comparative: when you change one variable, the bottleneck doesn't disappear — it &lt;em&gt;moves&lt;/em&gt;. That relocation is the lesson. Scaling the cache fixed your DB problem and revealed your next one.&lt;/p&gt;

&lt;p&gt;The inverse is just as instructive. Remove the Cache node and rerun. Watch the DB immediately redline. This is how you build intuition for what a cache is actually doing — not by reading about hit rates, but by watching the utilization delta before and after.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Write Policy Matters
&lt;/h2&gt;

&lt;p&gt;Paperstack exposes three write patterns on the Cache node: &lt;strong&gt;Cache-aside&lt;/strong&gt;, &lt;strong&gt;Write-through&lt;/strong&gt;, and &lt;strong&gt;Write-behind&lt;/strong&gt;. The choice affects both latency and what happens when you kill the cache node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache-aside&lt;/strong&gt; (the default) separates reads from writes entirely. Reads check the cache first; misses go to the DB. Writes bypass the cache and go directly to the DB. The cache is populated on read-miss, not on write. Kill the cache: reads start missing entirely, DB load spikes, but the write path was already going to DB — no disruption there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write-through&lt;/strong&gt; keeps the cache and DB in sync on every write. Writes pay the cache's latency &lt;em&gt;and&lt;/em&gt; the DB's latency on the write path, making writes more expensive than cache-aside. Kill the cache: reads fall through to DB, but every write was already reaching the DB, so nothing is lost.&lt;/p&gt;

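&lt;p&gt;&lt;strong&gt;Write-behind&lt;/strong&gt; is where the kill scenario gets interesting. In Paperstack's model of this mode, the cache absorbs writes entirely: they never reach the DB during normal operation. Only read-misses reach the DB, so the DB is effectively shielded from the write load. (A production write-behind cache still flushes to the DB asynchronously, typically in batches, so a real DB sees a smoothed write load rather than none.)&lt;/p&gt;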

&lt;p&gt;Kill the cache node in write-behind mode: Paperstack's &lt;code&gt;passThroughOnKill&lt;/code&gt; behavior makes the cache transparent — all traffic falls straight through. The DB suddenly receives the write workload that was never reaching it before. If the DB was sized assuming writes were handled by the cache, it may not have the &lt;code&gt;writeCap&lt;/code&gt; headroom to absorb the sudden change. The simulation shows this directly as DB utilization spiking and requests dropping.&lt;/p&gt;
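
&lt;p&gt;The three policies and the kill scenario can be summarized in a few lines of TypeScript. This is a sketch of the behaviour described above, not Paperstack's source code; the &lt;code&gt;dbLoad&lt;/code&gt; function, its parameters, and the traffic numbers are made up for illustration.&lt;/p&gt;

```typescript
type WritePolicy = "cache-aside" | "write-through" | "write-behind";

interface DbLoad {
  reads: number;  // reads per second reaching the DB
  writes: number; // writes per second reaching the DB
}

// Hypothetical helper: how much traffic reaches the DB for a given policy.
function dbLoad(
  policy: WritePolicy,
  cacheAlive: boolean,
  readsPerSecond: number,
  writesPerSecond: number,
  hitRate: number,
): DbLoad {
  if (!cacheAlive) {
    // Pass-through on kill: every read and write falls straight to the DB.
    return { reads: readsPerSecond, writes: writesPerSecond };
  }
  const reads = readsPerSecond * (1 - hitRate);
  // Only write-behind keeps writes away from the DB while the cache is up.
  const writes = policy === "write-behind" ? 0 : writesPerSecond;
  return { reads, writes };
}

const cacheUp = dbLoad("write-behind", true, 8000, 2000, 0.75);
const cacheKilled = dbLoad("write-behind", false, 8000, 2000, 0.75);
console.log(cacheUp, cacheKilled);
```

&lt;p&gt;With the cache up, the DB in this example sees 2,000 reads and no writes per second. After the kill it sees 8,000 reads and 2,000 writes. For cache-aside and write-through the write figure is 2,000 in both states, which is why killing the cache changes only their read load.&lt;/p&gt;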

&lt;p&gt;This failure mode is invisible on a static architecture diagram. The diagram shows cache → DB regardless of write policy. The simulation shows what breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The cache-doesn't-help problem is usually a mismatch between memory and working set, not a configuration error. Once hit rate is computed from real inputs rather than typed in, the DB utilization behavior makes sense.&lt;/p&gt;

&lt;p&gt;Paperstack makes the relationship between working-set size, cache memory, write policy, and DB utilization visible without deploying anything. Sketch the architecture, tune the numbers, kill nodes, and watch the bottleneck move. When the DB finally turns green, you know exactly why.&lt;/p&gt;

&lt;p&gt;Try it at &lt;a href="https://paperstack-app.vercel.app/" rel="noopener noreferrer"&gt;Paperstack&lt;/a&gt; — it runs in the browser, no account needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cache hit rate emerges from &lt;code&gt;memory&lt;/code&gt; vs &lt;code&gt;workingSet&lt;/code&gt; and access &lt;code&gt;skew&lt;/code&gt; — it's computed, not configured. Undersizing cache memory caps hit rate regardless of traffic volume.&lt;/li&gt;
&lt;li&gt;The comparative (change one variable, watch the bottleneck move) is how you build cache intuition: adding cache makes the DB green, revealing the next bottleneck.&lt;/li&gt;
&lt;li&gt;LFU eviction benefits high-skew workloads (popular keys stay warm); LRU does not — &lt;code&gt;skew&lt;/code&gt; only matters with the right eviction policy.&lt;/li&gt;
&lt;li&gt;Write-behind shields the DB from writes during normal operation; kill the cache and the DB suddenly receives the write load it was never sized for.&lt;/li&gt;
&lt;li&gt;Write-through and Cache-aside are safe to kill (writes were already reaching DB); write-behind changes the DB's workload profile on failure.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>Why Your Reranker Isn't Helping Your RAG Pipeline (And How to Prove It)</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Sun, 21 Jun 2026 07:57:40 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/why-your-reranker-isnt-helping-your-rag-pipeline-and-how-to-prove-it-5a4h</link>
      <guid>https://dev.to/siddharth_pandey_27/why-your-reranker-isnt-helping-your-rag-pipeline-and-how-to-prove-it-5a4h</guid>
      <description>&lt;p&gt;You add a cross-encoder reranker to your RAG pipeline, measure answer quality on a test set, see a marginal improvement on 3 of 8 questions, and ship it. Six weeks later your p99 retrieval latency has climbed 200ms per query and you're paying Cohere API costs on every call. Nobody has revisited the decision because there's no data to revisit. The reranker is in the pipeline now. It probably helps.&lt;/p&gt;

&lt;p&gt;That "probably" is the problem. &lt;a href="https://github.com/Sidd27/ragscope" rel="noopener noreferrer"&gt;RAGScope&lt;/a&gt; (&lt;a href="https://www.npmjs.com/package/ragscope" rel="noopener noreferrer"&gt;npm&lt;/a&gt;) gives you a per-query metric that tells you whether your reranker is earning its cost or actively making things worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Reranker Gain" Actually Measures
&lt;/h2&gt;

&lt;p&gt;When your RAG app runs a query, it emits OpenTelemetry spans: a retrieval span carrying chunk IDs, scores, and content; an optional reranking span; and an LLM span containing the full prompt text. RAGScope receives these via OTLP on port 4321 and analyzes the full trace end-to-end.&lt;/p&gt;

&lt;p&gt;The rerank-gain metric answers one question: did the reranker pull the chunks the LLM actually used toward the top of the list? RAGScope compares each chunk's retrieval rank (its position before reranking) against its reranked rank (its position after), then measures the average rank improvement of the chunks that ended up in the LLM's prompt. A chunk that was retrieved 8th but reranked to 2nd and appeared in the prompt counts as a large positive gain. A chunk that was retrieved 2nd, reranked to 9th, and got dropped from the prompt counts as a loss.&lt;/p&gt;
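
&lt;p&gt;The core of that comparison fits in a short TypeScript sketch. The &lt;code&gt;RankedChunk&lt;/code&gt; shape and the &lt;code&gt;averageRankGain&lt;/code&gt; function are hypothetical, not RAGScope's real types. The sketch averages only over chunks that reached the prompt and does not model how RAGScope treats dropped chunks or how it maps the average onto a 0 to 100 score.&lt;/p&gt;

```typescript
// Hypothetical shape for illustration; RAGScope's real types may differ.
interface RankedChunk {
  id: string;
  retrievalRank: number; // 1-based position before reranking
  rerankedRank: number;  // 1-based position after reranking
  inPrompt: boolean;     // did the chunk appear in the LLM prompt?
}

// Average number of positions the reranker moved the chunks the LLM used.
// Positive means promoted, negative means demoted.
function averageRankGain(chunks: RankedChunk[]): number {
  const used = chunks.filter((c) => c.inPrompt);
  if (used.length === 0) return 0;
  const totalGain = used.reduce((sum, c) => sum + (c.retrievalRank - c.rerankedRank), 0);
  return totalGain / used.length;
}

const chunks: RankedChunk[] = [
  { id: "a", retrievalRank: 8, rerankedRank: 2, inPrompt: true },  // gain of 6
  { id: "b", retrievalRank: 3, rerankedRank: 1, inPrompt: true },  // gain of 2
  { id: "c", retrievalRank: 4, rerankedRank: 3, inPrompt: true },  // gain of 1
  { id: "d", retrievalRank: 2, rerankedRank: 9, inPrompt: false }, // not in the prompt, skipped here
];
console.log(averageRankGain(chunks));
```

&lt;p&gt;The three used chunks gain 6, 2, and 1 positions, so the average is 3 ranks promoted.&lt;/p&gt;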

&lt;p&gt;The metric only appears in the score when the trace contains a reranker span. When it does, the weights renormalize automatically — precision drops from 40% to 35%, efficiency from 30% to 25%, and rerank-gain takes a 15% slice alongside uniqueness at 15% and coverage at 10%. Traces without a reranker span score exactly as before, so you can compare directly.&lt;/p&gt;
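
&lt;p&gt;As a worked example of the renormalized weights, the sketch below combines five per-metric scores into one number. It assumes the overall score is a plain weighted average, and the sub-scores passed in are invented for illustration; &lt;code&gt;overallScore&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
// Weights quoted above for traces that contain a reranker span (percent).
const WEIGHTS_WITH_RERANKER = {
  precision: 35,
  efficiency: 25,
  rerankGain: 15,
  uniqueness: 15,
  coverage: 10,
};

type MetricName = keyof typeof WEIGHTS_WITH_RERANKER;
type MetricScores = { [K in MetricName]: number }; // each metric scored 0 to 100

// Assumption: the overall score is a linear combination of the metrics.
function overallScore(scores: MetricScores): number {
  const names = Object.keys(WEIGHTS_WITH_RERANKER) as MetricName[];
  const weighted = names.reduce(
    (sum, name) => sum + WEIGHTS_WITH_RERANKER[name] * scores[name],
    0,
  );
  return weighted / 100;
}

console.log(
  overallScore({ precision: 80, efficiency: 70, rerankGain: 88, uniqueness: 100, coverage: 100 }),
);
```

&lt;p&gt;For these made-up inputs the result is 28 + 17.5 + 13.2 + 15 + 10 = 83.7. The five weights sum to 100, so a trace that scores 100 on every metric still scores 100 overall.&lt;/p&gt;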

&lt;h2&gt;
  
  
  Reading the Signal — What Good and Bad Look Like
&lt;/h2&gt;

&lt;p&gt;A reranker earning its cost looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;│  ✓  rerank-gain  88  █████████░  used chunks promoted avg +3.0 ranks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The chunks the LLM actually used were promoted an average of 3 positions by the reranker. That means the reranker is doing its job: surfacing the relevant material higher so it reaches the prompt and lands near the edges where the LLM attends most.&lt;/p&gt;

&lt;p&gt;A reranker not earning its cost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;│  ✗  rerank-gain  25  ███░░░░░░░  used chunks demoted avg -2.0 ranks
│  → Reranker is not surfacing the chunks the LLM actually uses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The chunks the LLM used were demoted by the reranker. They reached the prompt despite the reranker, not because of it. The reranker added latency and cost and then moved the useful material further back in the queue.&lt;/p&gt;

&lt;p&gt;The key insight is that RAGScope measures gain on the chunks that actually appeared in the LLM's prompt, not on the full ranked list. A reranker can shuffle 10 results around impressively while consistently pushing the 3 chunks the LLM uses toward positions 7, 8, and 9. That's not a reranker working; that's a reranker actively degrading retrieval for this query type.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Do When the Reranker Is Hurting
&lt;/h2&gt;

&lt;p&gt;The first step is query segmentation. RAGScope scores every query your pipeline processes individually. Run it for a day and you'll have a distribution of rerank-gain scores broken down by query type. If your reranker earns a score of 80+ on factual lookups but consistently scores below 30 on comparison queries, you have a model-query-type mismatch, not a broken reranker.&lt;/p&gt;

&lt;p&gt;The second step is checking your reranker's training domain. Cross-encoders trained on MS MARCO work well for web-search-style queries. If your documents are internal API docs, legal contracts, or medical literature, the reranker may be applying a relevance signal that's semantically misaligned with your content. A low rerank-gain score on a specific document type is a strong signal to evaluate a domain-specific model.&lt;/p&gt;

&lt;p&gt;If the rerank-gain score is consistently low across query types, the simplest intervention is removing the reranker entirely and routing that latency budget into a higher TOP_K with tighter similarity thresholds. RAGScope's precision metric will tell you immediately whether that trade works: if precision improves and efficiency holds, you've recovered the latency without losing quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A reranker is not always additive. It introduces latency, API cost, and an additional failure mode on every query — and most teams have no per-query signal to determine whether it's paying for itself. Aggregate quality metrics on a test set don't expose query-level degradation.&lt;/p&gt;

&lt;p&gt;RAGScope's rerank-gain metric gives you that signal query by query, live in your terminal as the pipeline runs. Start it with &lt;code&gt;npx ragscope start&lt;/code&gt;, add OTLP instrumentation to your retrieval and reranker calls, and you'll know within the first few queries whether the reranker is earning its place in your pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/ragscope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/ragscope" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;rerank-gain measures the average rank improvement of the chunks the LLM actually used — not the full ranked list, which can mask per-query degradation.&lt;/li&gt;
&lt;li&gt;The metric only appears when the trace contains a reranker span; weights renormalize automatically (precision 35%, efficiency 25%, rerank-gain 15%, uniqueness 15%, coverage 10%).&lt;/li&gt;
&lt;li&gt;A reranker with consistently negative rerank-gain is demoting the chunks the LLM uses — adding cost and latency for a net-negative retrieval outcome.&lt;/li&gt;
&lt;li&gt;Query segmentation reveals whether the reranker works for some query types but not others, pointing to model-query-type mismatch.&lt;/li&gt;
&lt;li&gt;If rerank-gain is consistently low across query types, removing the reranker and increasing TOP_K is often a better trade — RAGScope's precision score will validate it immediately.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>opensource</category>
      <category>llm</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Fix N+1 Trigger Patterns Where Lambda Functions Hammer the Same DynamoDB Partition Key</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Sat, 20 Jun 2026 18:15:52 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/fix-n1-trigger-patterns-where-lambda-functions-hammer-the-same-dynamodb-partition-key-1m60</link>
      <guid>https://dev.to/siddharth_pandey_27/fix-n1-trigger-patterns-where-lambda-functions-hammer-the-same-dynamodb-partition-key-1m60</guid>
      <description>&lt;p&gt;You add a sixth Lambda trigger to your &lt;code&gt;OrderEvents&lt;/code&gt; table, deploy it, and within 20 minutes your SLA dashboard goes red. Latency on order writes jumps from 4ms to 40ms. The function itself is fine. The table is fine. The problem is that five other Lambdas are already hitting the same partition key on every write, and you just made it six. DynamoDB's internal partition throttling doesn't care that each function looks clean in isolation.&lt;/p&gt;

&lt;p&gt;This is an N+1 trigger problem, and your AI coding assistant cannot catch it. Not because it lacks intelligence, but because the fact that five Lambdas already target that table lives in your AWS account and your full codebase — not in the file your assistant has open.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;Infrawise&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the LLM Can't See the Pattern
&lt;/h2&gt;

&lt;p&gt;When you ask Claude to write a new order processing Lambda, it reads the file you have open and generates code that looks correct — because in the context of that one file, it is correct. It doesn't know about &lt;code&gt;ProcessRefundsLambda&lt;/code&gt;, &lt;code&gt;NotifyFulfillmentLambda&lt;/code&gt;, &lt;code&gt;SyncInventoryLambda&lt;/code&gt;, &lt;code&gt;UpdateAnalyticsLambda&lt;/code&gt;, and &lt;code&gt;AuditTrailLambda&lt;/code&gt;, all of which you wrote in previous sprints and which all write to the &lt;code&gt;Orders&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;This is a category of failure that model quality doesn't fix. A better model produces a more fluent explanation for why your latency spiked. The fact that five functions converge on the same table is a lookup, not a prediction. The source of truth is a combination of your code (which functions exist) and your infrastructure (what they access).&lt;/p&gt;

&lt;p&gt;Infrawise draws that boundary explicitly. It extracts the answer from your code using AST parsing and from your infrastructure using API calls, then hands that graph to the model as structured context — it never generates the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Infrawise Traces Trigger Chains to the Same Table
&lt;/h2&gt;

&lt;p&gt;When Infrawise scans your repository, it uses &lt;a href="https://ts-morph.com" rel="noopener noreferrer"&gt;ts-morph&lt;/a&gt; to walk every &lt;code&gt;CallExpression&lt;/code&gt; in every source file. It's not searching for the string "DynamoDB" — it matches call structure against a known set of SDK patterns in a &lt;code&gt;DYNAMO_OPERATIONS&lt;/code&gt; set: both v2 method names (&lt;code&gt;getItem&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;putItem&lt;/code&gt;, &lt;code&gt;updateItem&lt;/code&gt;, &lt;code&gt;deleteItem&lt;/code&gt;, &lt;code&gt;batchWriteItem&lt;/code&gt;) and v3 command classes (&lt;code&gt;QueryCommand&lt;/code&gt;, &lt;code&gt;PutItemCommand&lt;/code&gt;, &lt;code&gt;UpdateItemCommand&lt;/code&gt;, &lt;code&gt;DeleteItemCommand&lt;/code&gt;). Each matched call becomes an extracted operation: this function performs this operation against this table.&lt;/p&gt;

&lt;p&gt;That list feeds into a &lt;code&gt;SystemGraph&lt;/code&gt;. Nodes represent tables, functions, indexes, queues, and topics. Edges represent query, scan, and write relationships. The graph is what makes the N+1 pattern visible: not just "six functions exist" and "a table exists," but "six functions all write to &lt;code&gt;Orders&lt;/code&gt; with no distribution across paths."&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;HotPartitionAnalyzer&lt;/code&gt; walks the graph and fires when a table receives five or more distinct access edges from separate code paths. The threshold is configurable per-table via &lt;code&gt;hotPartitionThresholds&lt;/code&gt; in &lt;code&gt;infrawise.yaml&lt;/code&gt; — Issue #57 resolved false positives on high fan-in systems by making this a per-table setting rather than a single global value. A finding looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Medium severity
Potential hot partition detected on DynamoDB table "Orders"
  Table "Orders" is accessed by 6 distinct code paths, which may create
  hot partition issues at scale. High access concentration on the same
  partition key can throttle requests.
  Recommendation: Consider adding a random suffix or timestamp to partition
  keys (write sharding). Use DynamoDB DAX for read-heavy workloads.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs deterministically. Feed it the same graph, get the same findings. There's no sampling temperature involved.&lt;/p&gt;
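
&lt;p&gt;The counting rule is simple enough to sketch in TypeScript. This is an illustration of the idea, not Infrawise's implementation: the &lt;code&gt;AccessEdge&lt;/code&gt; shape, the &lt;code&gt;findHotTables&lt;/code&gt; function, and the way overrides are passed in are all assumptions, and &lt;code&gt;NewOrderLambda&lt;/code&gt; is a made-up name for the sixth function.&lt;/p&gt;

```typescript
// Hypothetical edge shape; Infrawise's real SystemGraph types may differ.
interface AccessEdge {
  fn: string;    // function (code path) name
  table: string; // DynamoDB table it touches
}

const DEFAULT_THRESHOLD = 5; // "five or more distinct access edges"

// Returns the tables whose number of distinct accessing functions meets
// the threshold, using a per-table override when one is supplied.
function findHotTables(
  edges: AccessEdge[],
  perTableThresholds: { [table: string]: number } = {},
): string[] {
  const callers: { [table: string]: string[] } = {};
  edges.forEach(({ fn, table }) => {
    const seen = callers[table] ?? [];
    // Count each function once per table, however many calls it makes.
    callers[table] = seen.indexOf(fn) === -1 ? [...seen, fn] : seen;
  });
  return Object.keys(callers)
    .filter((table) => callers[table].length >= (perTableThresholds[table] ?? DEFAULT_THRESHOLD))
    .sort(); // fixed ordering keeps the output deterministic
}

const lambdas = [
  "ProcessRefundsLambda", "NotifyFulfillmentLambda", "SyncInventoryLambda",
  "UpdateAnalyticsLambda", "AuditTrailLambda", "NewOrderLambda",
];
const edges: AccessEdge[] = lambdas.map((fn) => ({ fn, table: "Orders" }));
console.log(findHotTables(edges));                 // Orders is flagged
console.log(findHotTables(edges, { Orders: 10 })); // nothing is flagged
```

&lt;p&gt;Because the function only counts and compares, the same edge list always produces the same result, which is the property that makes it safe to gate CI on.&lt;/p&gt;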

&lt;p&gt;The &lt;code&gt;infrawise check --fail-on medium&lt;/code&gt; command gates CI on this finding. Since &lt;code&gt;HotPartitionAnalyzer&lt;/code&gt; emits medium severity, you need &lt;code&gt;--fail-on medium&lt;/code&gt; (the default &lt;code&gt;--fail-on high&lt;/code&gt; won't catch it). When violations are found, &lt;code&gt;infrawise check&lt;/code&gt; exits with code 1 — your build fails before the sixth Lambda merges, and the engineer who wrote it sees the finding in the PR, not on a latency dashboard at 11pm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing It — Restructuring the Key or Sharding the Access Pattern
&lt;/h2&gt;

&lt;p&gt;Once Infrawise surfaces the pattern, you have two practical options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write sharding&lt;/strong&gt; adds a random suffix to the partition key — distributing writes across logical partitions. Reads require scatter-gather or a deterministic suffix derived from the order ID. This is the right choice when all six functions are pure writers and reads are handled by a separate query path.&lt;/p&gt;
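
&lt;p&gt;A deterministic suffix can be as small as the TypeScript sketch below. The shard count, the &lt;code&gt;ORDER#&lt;/code&gt; key prefix, and the simple string hash are all assumptions chosen for illustration; a production system would size the shard count to its write rate.&lt;/p&gt;

```typescript
const SHARD_COUNT = 8; // assumed number of logical shards

// Deterministic suffix: the same order ID always maps to the same shard,
// so a read for a known order needs no scatter-gather.
function shardFor(orderId: string, shardCount: number = SHARD_COUNT): number {
  return orderId
    .split("")
    .reduce((hash, ch) => (hash * 31 + ch.charCodeAt(0)) % shardCount, 0);
}

// Builds the sharded partition key for one order (hypothetical key format).
function shardedPartitionKey(orderId: string): string {
  return "ORDER#" + shardFor(orderId);
}

// Reading across all orders means querying every shard key (scatter-gather).
function allShardKeys(shardCount: number = SHARD_COUNT): string[] {
  return Array.from({ length: shardCount }, (_, i) => "ORDER#" + i);
}

console.log(shardedPartitionKey("order-1001"), allShardKeys());
```

&lt;p&gt;Writers call &lt;code&gt;shardedPartitionKey&lt;/code&gt; with the order ID, point reads recompute the same key, and any query that needs every order fans out over &lt;code&gt;allShardKeys()&lt;/code&gt; and merges the results.&lt;/p&gt;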

&lt;p&gt;&lt;strong&gt;Access pattern separation&lt;/strong&gt; restructures which functions need direct table access at all. If &lt;code&gt;SyncInventoryLambda&lt;/code&gt; and &lt;code&gt;UpdateAnalyticsLambda&lt;/code&gt; are consuming state that flows through the &lt;code&gt;Orders&lt;/code&gt; table, they shouldn't write to it directly — they should react to a DynamoDB stream and write to their own tables. The fan-in often exists because multiple services treat the same source-of-truth table as a synchronization point when they should be downstream consumers.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;analyze_function&lt;/code&gt; tool helps here. Point it at any function and it traces the full access path: which tables the function reads and writes, which indexes it uses, what event shapes trigger it, and what queues or topics it publishes to. That trace makes it clear which functions can be moved to stream consumption and which genuinely need direct write access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The N+1 trigger problem is invisible to any tool that works only from your open files. It's not a reasoning failure — no amount of context about a single Lambda reveals that five others already saturate the same table. That fact lives in the intersection of your code and your infrastructure.&lt;/p&gt;

&lt;p&gt;Infrawise puts that intersection in a graph, runs deterministic analyzers over it, and surfaces the finding before it becomes a production incident. The model's job is to decide what to do — restructure the key, introduce a stream, separate the access pattern. The detection is never generated; it's extracted.&lt;/p&gt;

&lt;p&gt;If your AI assistant is writing Lambda functions against DynamoDB, give it the access graph first: &lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Detecting a hot partition problem requires knowing how many code paths hit the same table. That fact lives in your AWS account and your full codebase, not in the file your AI assistant has open.&lt;/li&gt;
&lt;li&gt;Infrawise's &lt;code&gt;HotPartitionAnalyzer&lt;/code&gt; counts distinct code paths hitting each DynamoDB table and fires at a configurable threshold, with per-table overrides via &lt;code&gt;hotPartitionThresholds&lt;/code&gt; in &lt;code&gt;infrawise.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Hot partition findings emit &lt;strong&gt;medium&lt;/strong&gt; severity; use &lt;code&gt;infrawise check --fail-on medium&lt;/code&gt; to gate CI builds on them (the default &lt;code&gt;--fail-on high&lt;/code&gt; won't catch them).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;analyze_function&lt;/code&gt; traces the full access path for any function — tables, indexes, event shapes, queues — making it easy to separate writers from downstream consumers.&lt;/li&gt;
&lt;li&gt;Write sharding and event-stream separation are the two practical fixes; which one to pick depends on whether converging functions genuinely need to write or are just consuming state.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>typescript</category>
      <category>aws</category>
      <category>dynamodb</category>
    </item>
    <item>
      <title>Stop Paying For Retrieval Latency On Chunks You Never Use In The Prompt</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Tue, 16 Jun 2026 08:12:31 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/stop-paying-for-retrieval-latency-on-chunks-you-never-use-in-the-prompt-4kh5</link>
      <guid>https://dev.to/siddharth_pandey_27/stop-paying-for-retrieval-latency-on-chunks-you-never-use-in-the-prompt-4kh5</guid>
      <description>&lt;h2&gt;
  
  
  Your pipeline fetched 10 chunks. Your LLM saw 3.
&lt;/h2&gt;

&lt;p&gt;You set &lt;code&gt;TOP_K=10&lt;/code&gt; on your vector store. Ten candidate chunks means more signal for the model — that's the logic. Then you run &lt;code&gt;npx ragscope&lt;/code&gt; and the audit prints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  WARN   51/100  █████░░░░░  my-rag-service
  │  "what are the pricing tiers?"
  │
  │  ✗  precision     30  ███░░░░░░░  3/10 chunks used
  │  ✗  efficiency    30  ███░░░░░░░  70% tokens wasted
  │  ✓  uniqueness   100  ██████████  chunks are distinct
  │  ✓  coverage     100  ██████████  all chunks scored
  │
  │  → Reduce TOP_K 10→3 (only 3 chunks reached LLM)
  │  → 70% of retrieved tokens never reached the LLM
  │
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three of ten chunks made it into the final prompt. You paid to search for, score, and transfer ten chunks in that vector store query, and seven of them went nowhere: they were fetched, scored, and discarded somewhere between your retrieval step and your prompt assembly code.&lt;/p&gt;

&lt;p&gt;This is the gap &lt;a href="https://github.com/Sidd27/ragscope" rel="noopener noreferrer"&gt;ragscope&lt;/a&gt; (&lt;a href="https://www.npmjs.com/package/ragscope" rel="noopener noreferrer"&gt;npm&lt;/a&gt;) was built to surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Chunks Disappear
&lt;/h2&gt;

&lt;p&gt;Retrieval and prompt assembly are separate steps, but most teams treat them as one. The gap is where the waste lives.&lt;/p&gt;

&lt;p&gt;A typical RAG pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embed the query&lt;/li&gt;
&lt;li&gt;Fetch &lt;code&gt;TOP_K&lt;/code&gt; chunks from the vector store&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;(Optional)&lt;/em&gt; Rerank&lt;/li&gt;
&lt;li&gt;Assemble the prompt — filter by score threshold, truncate to fit the context window&lt;/li&gt;
&lt;li&gt;Send to the LLM&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 4 is where chunks disappear. Post-retrieval filtering (a score threshold, a hard token budget, a deduplication pass) silently drops chunks you already paid to retrieve. If your prompt assembly filters out anything below a certain confidence score and your vector store returns six chunks that don't clear it, the work of finding, scoring, and transferring those six chunks was wasted. The latency still accumulated.&lt;/p&gt;

&lt;p&gt;The problem compounds over time. &lt;code&gt;TOP_K=10&lt;/code&gt; gets set as a safe default, the pipeline ships, and the setting never gets revisited. LLM eval scores look fine because the three chunks that do reach the prompt are the right ones. The waste is invisible in your evals — it only shows up in latency and cost.&lt;/p&gt;

&lt;p&gt;Retrieval latency typically grows with &lt;code&gt;TOP_K&lt;/code&gt;, because the store has to rank, load, and serialize more results. Fetching ten results usually takes longer than fetching three, especially at tail latencies. When seven of those ten are discarded before the prompt, you're paying that latency premium on every query for nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ragscope Measures &lt;code&gt;precision&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;ragscope runs as a local server (default port 4321) and receives spans from your pipeline via OTLP at &lt;code&gt;http://localhost:4321/v1/traces&lt;/code&gt;. No changes to your RAG code needed — configure your OpenTelemetry exporter to point there.&lt;/p&gt;

&lt;p&gt;When both a retrieval span and an LLM span arrive for the same trace, ragscope compares them. The key field is &lt;code&gt;inContext&lt;/code&gt; on each &lt;code&gt;RagChunk&lt;/code&gt;. It inspects the full text of the LLM span's assembled prompt and checks whether each retrieved chunk's content appears in it — positionally, by string match. A chunk either appears in the prompt or it doesn't. No LLM, no heuristics, no sampling.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;precision&lt;/code&gt; starts as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;base = (chunks where inContext is true / total retrieved chunks) × 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's one additional penalty: if high-retrieval-rank chunks land in the middle of a long context window — the zone where LLM attention typically falls off — ragscope subtracts 12 points per buried chunk, capped at 36. This is the lost-in-the-middle detection. A pipeline where 3/10 chunks are used but two of those three are buried mid-context will score lower than one where the same 3 chunks sit at the prompt edges.&lt;/p&gt;
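
&lt;p&gt;Put together, the base formula and the penalty look roughly like the TypeScript sketch below. The &lt;code&gt;RetrievedChunk&lt;/code&gt; shape is hypothetical, the &lt;code&gt;buried&lt;/code&gt; flag stands in for whatever position check ragscope actually performs, and flooring the result at 0 is an assumption.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; ragscope's real types may differ.
interface RetrievedChunk {
  content: string;
  buried: boolean; // assumed flag: high-rank chunk placed mid-context
}

const PENALTY_PER_BURIED = 12;
const MAX_PENALTY = 36;

function precision(chunks: RetrievedChunk[], prompt: string): number {
  if (chunks.length === 0) return 0;
  // inContext: plain string match of the chunk text against the prompt.
  const used = chunks.filter((c) => prompt.includes(c.content));
  const base = (used.length * 100) / chunks.length;
  const buriedCount = used.filter((c) => c.buried).length;
  const penalty = Math.min(buriedCount * PENALTY_PER_BURIED, MAX_PENALTY);
  return Math.max(0, base - penalty); // flooring at 0 is an assumption
}

const retrieved: RetrievedChunk[] = Array.from({ length: 10 }, (_, i) => ({
  content: "text of chunk " + i + ".",
  buried: false,
}));
const prompt =
  "Context: text of chunk 0. text of chunk 1. text of chunk 2. " +
  "Question: what are the pricing tiers?";
console.log(precision(retrieved, prompt));
```

&lt;p&gt;Three of the ten chunks appear in the prompt and none is buried, so the result is 30, matching the audit above. If two of those three were flagged as buried, the penalty would be 24 and the score would fall to 6.&lt;/p&gt;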

&lt;p&gt;When the base falls below 60, ragscope generates a specific recommendation with the exact numbers: &lt;code&gt;Reduce TOP_K 10→3 (only 3 chunks reached LLM)&lt;/code&gt;. It prints this directly below the score bars, in the terminal, at the time the trace arrives — not in a cloud dashboard after a deploy.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;efficiency&lt;/code&gt; is a related metric: the fraction of retrieved tokens that reached the LLM. If &lt;code&gt;precision&lt;/code&gt; is 30 and your chunks are roughly uniform in size, &lt;code&gt;efficiency&lt;/code&gt; will also be around 30 — meaning 70% of the tokens you transferred from the vector store never appeared in a prompt. That shows up in retrieval latency and, depending on your pipeline, in processing time downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tuning TOP_K Based on the Audit
&lt;/h2&gt;

&lt;p&gt;Once you have &lt;code&gt;precision&lt;/code&gt; data, tuning is mechanical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Follow the recommendation.&lt;/strong&gt; ragscope prints &lt;code&gt;Reduce TOP_K 10→3&lt;/code&gt; because &lt;code&gt;Math.max(used_chunks, 3)&lt;/code&gt; gives you the minimum viable retrieval count with a floor of 3. Start there and re-run the audit against a few real queries.&lt;/p&gt;
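&lt;p&gt;As a sketch, the recommendation logic might look like this. The function name is hypothetical; the 60-point threshold, the floor of 3, and the message format come from the text above. The guard that skips a suggestion which would not lower &lt;code&gt;TOP_K&lt;/code&gt; is an added assumption.&lt;/p&gt;

```typescript
// Hypothetical helper; threshold (60) and floor (3) are taken from the article.
function topKRecommendation(usedChunks: number, topK: number): string | null {
  if (topK === 0) return null; // nothing was retrieved, nothing to tune
  const base = (usedChunks / topK) * 100;
  if (base >= 60) return null; // recommendations only fire below 60
  const suggested = Math.max(usedChunks, 3);
  if (suggested >= topK) return null; // assumption: never suggest a non-reduction
  return `Reduce TOP_K ${topK}→${suggested} (only ${usedChunks} chunks reached LLM)`;
}
```

&lt;p&gt;Calling &lt;code&gt;topKRecommendation(3, 10)&lt;/code&gt; returns &lt;code&gt;Reduce TOP_K 10→3 (only 3 chunks reached LLM)&lt;/code&gt;, while &lt;code&gt;topKRecommendation(9, 10)&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;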

&lt;p&gt;&lt;strong&gt;Check &lt;code&gt;efficiency&lt;/code&gt; alongside.&lt;/strong&gt; Low &lt;code&gt;efficiency&lt;/code&gt; paired with low &lt;code&gt;precision&lt;/code&gt; means your unused chunks are large — a chunking strategy problem as much as a &lt;code&gt;TOP_K&lt;/code&gt; problem. If &lt;code&gt;efficiency&lt;/code&gt; is high but &lt;code&gt;precision&lt;/code&gt; is low, your unused chunks are small and the token cost is modest; fixing &lt;code&gt;TOP_K&lt;/code&gt; will mostly recover the latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check &lt;code&gt;uniqueness&lt;/code&gt; too.&lt;/strong&gt; If the chunks that do reach your prompt include near-duplicates, the &lt;code&gt;uniqueness&lt;/code&gt; metric will flag them. Two near-duplicate chunks in the prompt means one is redundant regardless of your &lt;code&gt;TOP_K&lt;/code&gt; setting — that's a deduplication-at-ingest problem, not a retrieval count problem. ragscope computes overlap between chunk pairs and surfaces high-overlap counts in the audit output.&lt;/p&gt;
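&lt;p&gt;The article doesn't say which overlap measure ragscope uses. One simple way to approximate pairwise overlap is word-level Jaccard similarity, sketched below with hypothetical function names and an arbitrary 0.8 threshold.&lt;/p&gt;

```typescript
// Lowercased word set for a chunk.
function wordSet(text: string) {
  return new Set(text.toLowerCase().split(/\s+/).filter((w) => w.length > 0));
}

// Word-level Jaccard similarity: shared words / all distinct words.
function jaccard(a: string, b: string): number {
  const wordsA = wordSet(a);
  const wordsB = wordSet(b);
  if (wordsA.size === 0 || wordsB.size === 0) return 0;
  let shared = 0;
  wordsA.forEach((w) => {
    if (wordsB.has(w)) shared += 1;
  });
  return shared / (wordsA.size + wordsB.size - shared);
}

// Count chunk pairs whose overlap meets the threshold (0.8 is illustrative only).
function countNearDuplicatePairs(chunks: string[], threshold = 0.8): number {
  let pairs = 0;
  chunks.forEach((a, i) => {
    chunks.slice(i + 1).forEach((b) => {
      if (jaccard(a, b) >= threshold) pairs += 1;
    });
  });
  return pairs;
}
```

&lt;p&gt;Comparing every pair is quadratic in the number of chunks, which is fine at &lt;code&gt;TOP_K&lt;/code&gt; scale but would need a cheaper scheme at ingest time.&lt;/p&gt;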

&lt;p&gt;The typical path after a low-precision audit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Audit shows &lt;code&gt;3/10 chunks used&lt;/code&gt; — follow the &lt;code&gt;Reduce TOP_K 10→3&lt;/code&gt; recommendation&lt;/li&gt;
&lt;li&gt;Re-run against real queries — &lt;code&gt;precision&lt;/code&gt; should move above 75 (PASS)&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;uniqueness&lt;/code&gt; also flagged duplicates, fix chunking and re-run&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;efficiency&lt;/code&gt; is still low after fixing &lt;code&gt;TOP_K&lt;/code&gt;, chunks may be too large for the token budget&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Going from &lt;code&gt;TOP_K=10&lt;/code&gt; to &lt;code&gt;TOP_K=3&lt;/code&gt; on a pipeline that was only ever using 3 chunks means fetching 70% fewer chunks from the vector store on every query. No model changes, no prompt rewriting, no reranker added.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Number You Should Check First
&lt;/h2&gt;

&lt;p&gt;The assumption that more retrieval means better answers is almost never validated against what the model actually sees. &lt;code&gt;TOP_K&lt;/code&gt; defaults get set once and forgotten. Eval scores stay flat because the chunks that do reach the LLM are the right ones — the waste doesn't affect quality, just cost and latency.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx ragscope&lt;/code&gt; gives you &lt;code&gt;precision&lt;/code&gt;, &lt;code&gt;efficiency&lt;/code&gt;, and &lt;code&gt;uniqueness&lt;/code&gt; in the time it takes to run a dev command. If you haven't checked what fraction of your retrieved chunks survive to the prompt, that's the number to look at first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/ragscope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/ragscope" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;precision&lt;/code&gt; measures what fraction of retrieved chunks appear in the final LLM prompt — anything below 60/100 triggers a &lt;code&gt;Reduce TOP_K&lt;/code&gt; recommendation with exact numbers&lt;/li&gt;
&lt;li&gt;Vector store payload and retrieval latency typically scale with &lt;code&gt;TOP_K&lt;/code&gt;; fetching chunks you discard is pure overhead with no quality benefit&lt;/li&gt;
&lt;li&gt;ragscope detects in-prompt chunk presence by matching content against the OTLP LLM span's prompt text — deterministic, no LLM needed&lt;/li&gt;
&lt;li&gt;Low &lt;code&gt;efficiency&lt;/code&gt; paired with low &lt;code&gt;precision&lt;/code&gt; points to a chunking strategy problem, not just a &lt;code&gt;TOP_K&lt;/code&gt; problem&lt;/li&gt;
&lt;li&gt;Low &lt;code&gt;uniqueness&lt;/code&gt; means near-duplicate chunks in the prompt — fix at ingest, not by adjusting retrieval count&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>opensource</category>
      <category>llm</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Why Infrawise Uses Deterministic Analysis Instead of an LLM</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Sat, 13 Jun 2026 10:58:44 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/why-infrawise-uses-deterministic-analysis-instead-of-an-llm-15fk</link>
      <guid>https://dev.to/siddharth_pandey_27/why-infrawise-uses-deterministic-analysis-instead-of-an-llm-15fk</guid>
      <description>&lt;p&gt;Ask your AI coding assistant which Global Secondary Indexes exist on your &lt;code&gt;Orders&lt;/code&gt; table. It will read your repository, find a few &lt;code&gt;QueryCommand&lt;/code&gt; calls, and answer — fluent, specific, and confident. It also has no way to know. GSI definitions live in AWS, not in your source files. The model isn't lying; the fact simply isn't available to it, so it generates the most statistically plausible substitute and delivers it in the same tone it uses for things it actually knows.&lt;/p&gt;

&lt;p&gt;That failure mode is why &lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;Infrawise&lt;/a&gt; (&lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;) — an MCP server that gives AI coding assistants infrastructure context — contains no LLM calls at all. Every answer it serves comes from AST parsing, schema introspection, rule-based analyzers, and graph correlation. The LLM is only ever a consumer of that context, never a producer of it. This post is about why that boundary exists, and what it looks like in code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure questions are lookups, not generation
&lt;/h2&gt;

&lt;p&gt;There are two kinds of questions you can ask a tool. "How should I model sessions in DynamoDB?" is a judgment question — many defensible answers, context matters, an LLM is genuinely useful. "Does the &lt;code&gt;Sessions&lt;/code&gt; table have a GSI on &lt;code&gt;userId&lt;/code&gt;?" is a fact question. It has exactly one correct answer, and that answer is sitting in a &lt;code&gt;DescribeTable&lt;/code&gt; response.&lt;/p&gt;

&lt;p&gt;When you route a fact question through a generative model, you convert a lookup with a perfectly accurate source into a prediction with an unknown error rate. The motivating examples in the Infrawise README are all of this shape: an assistant suggesting a &lt;code&gt;.scan()&lt;/code&gt; on an &lt;code&gt;Orders&lt;/code&gt; table with 50 million rows, recommending a GSI on &lt;code&gt;status&lt;/code&gt; that already exists, or not noticing that five functions are already hammering the same partition key. None of these are reasoning failures. They are missing-fact failures, and no amount of model quality fixes them — a better model just produces a more convincing wrong answer.&lt;/p&gt;

&lt;p&gt;So Infrawise draws a hard line: facts get extracted deterministically, and the model receives them through MCP tool calls instead of guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What deterministic extraction looks like
&lt;/h2&gt;

&lt;p&gt;Infrawise builds its picture of your system from three sources, none of which involve a model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your code, through the compiler's eyes.&lt;/strong&gt; &lt;code&gt;scanRepository()&lt;/code&gt; in &lt;code&gt;src/context/index.ts&lt;/code&gt; loads the repo with &lt;a href="https://ts-morph.com" rel="noopener noreferrer"&gt;ts-morph&lt;/a&gt; — using your own &lt;code&gt;tsconfig.json&lt;/code&gt; when one exists — and walks every &lt;code&gt;CallExpression&lt;/code&gt; node in every source file. It doesn't regex for the word "scan". It matches call structure against known client patterns: a &lt;code&gt;DYNAMO_OPERATIONS&lt;/code&gt; set covering both SDK v2 method names (&lt;code&gt;query&lt;/code&gt;, &lt;code&gt;scan&lt;/code&gt;, &lt;code&gt;getItem&lt;/code&gt;) and SDK v3 command classes (&lt;code&gt;ScanCommand&lt;/code&gt;, &lt;code&gt;QueryCommand&lt;/code&gt;, &lt;code&gt;PutItemCommand&lt;/code&gt;), &lt;code&gt;query&lt;/code&gt;/&lt;code&gt;execute&lt;/code&gt;/&lt;code&gt;exec&lt;/code&gt; calls on PostgreSQL and MySQL clients, and MongoDB collection methods — where &lt;code&gt;find&lt;/code&gt; and &lt;code&gt;aggregate&lt;/code&gt; are classified as scan-type operations and the rest as queries. The output is a list of extracted operations: this function performs this operation type against this table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your databases, through their own catalogs.&lt;/strong&gt; The PostgreSQL adapter doesn't ask a model to summarize your schema. It runs the same introspection queries you would run by hand — &lt;code&gt;information_schema.tables&lt;/code&gt; for tables, &lt;code&gt;information_schema.columns&lt;/code&gt; for columns, &lt;code&gt;pg_indexes&lt;/code&gt; for indexes, and the constraint tables for keys. The docs recommend pointing it at a dedicated read-only user, and the DynamoDB side needs only &lt;code&gt;dynamodb:ListTables&lt;/code&gt; and &lt;code&gt;dynamodb:DescribeTable&lt;/code&gt; permissions. What comes back isn't a description of your schema; it &lt;em&gt;is&lt;/em&gt; your schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Correlation, through a graph.&lt;/strong&gt; Both streams land in a &lt;code&gt;SystemGraph&lt;/code&gt;: typed nodes for tables, functions, indexes, queues, topics, lambdas, buckets, secrets, parameters, and log groups, connected by typed edges like &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;scan&lt;/code&gt;, and &lt;code&gt;uses_index&lt;/code&gt;. The graph is what turns two boring fact lists into something an analyzer can interrogate — not just "this table exists" and "this function scans something," but "&lt;code&gt;listAllOrders()&lt;/code&gt; scans the &lt;code&gt;Orders&lt;/code&gt; table, and no index covers that access."&lt;/p&gt;

&lt;h2&gt;
  
  
  Rules, not vibes
&lt;/h2&gt;

&lt;p&gt;The analysis layer is where most tools would reach for a model — and where Infrawise stays deterministic. The analyzer index exports 27 rule classes covering DynamoDB, PostgreSQL, MySQL, MongoDB, SQS, S3, Lambda, RDS, secrets, log retention, and Terraform drift. Each one is an ordinary class with an &lt;code&gt;analyze(graph)&lt;/code&gt; method that walks the graph and emits findings.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FullTableScanAnalyzer&lt;/code&gt; follows scan-type edges to DynamoDB table nodes and emits a high-severity finding naming the table and every calling function. &lt;code&gt;MissingGSIAnalyzer&lt;/code&gt; flags tables that receive query edges but have no &lt;code&gt;uses_index&lt;/code&gt; edge — medium severity, because it might be intentional. &lt;code&gt;HotPartitionAnalyzer&lt;/code&gt; fires when a table is accessed by five or more distinct code paths (the threshold is a constructor parameter, defaulting to 5).&lt;/p&gt;
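&lt;p&gt;To make the shape concrete, here is a simplified sketch of a rule in this style. The graph and finding types are hypothetical stand-ins for Infrawise's real &lt;code&gt;SystemGraph&lt;/code&gt;, and the severity and message text are illustrative; only the &lt;code&gt;analyze(graph)&lt;/code&gt; method and the default threshold of 5 come from the description above.&lt;/p&gt;

```typescript
// Hypothetical, simplified shapes; Infrawise's real SystemGraph is richer.
type Edge = { from: string; to: string; kind: "query" | "scan" | "uses_index" };
type SystemGraph = { tables: string[]; edges: Edge[] };
type Finding = {
  severity: "high" | "medium" | "low" | "verify";
  table: string;
  message: string;
};

class HotPartitionAnalyzer {
  private readonly threshold: number;

  // The threshold is a constructor parameter, defaulting to 5.
  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  analyze(graph: SystemGraph): Finding[] {
    const findings: Finding[] = [];
    for (const table of graph.tables) {
      // Distinct functions with a query or scan edge into this table.
      const callers = new Set(
        graph.edges
          .filter((e) => e.to === table)
          .filter((e) => e.kind !== "uses_index")
          .map((e) => e.from),
      );
      if (callers.size >= this.threshold) {
        findings.push({
          severity: "medium",
          table,
          message: `${table} accessed by ${callers.size} distinct code paths`,
        });
      }
    }
    return findings;
  }
}
```

&lt;p&gt;Because the rule reads only its &lt;code&gt;graph&lt;/code&gt; argument, a hand-built fixture with six callers on one table is enough to exercise it.&lt;/p&gt;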

&lt;p&gt;Two properties fall out of this design that a model can't give you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Findings are testable.&lt;/strong&gt; Every analyzer is a pure function of the graph. Feed it a fixture, assert on the output, done. There's no eval harness, no sampling temperature, no "run it three times and hope." If &lt;code&gt;FullTableScanAnalyzer&lt;/code&gt; regresses, a unit test catches it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failures are contained and honest.&lt;/strong&gt; &lt;code&gt;runAllAnalyzers()&lt;/code&gt; wraps each analyzer in its own try/catch — one analyzer crashing logs a warning while the rest keep running. The combined findings are then sorted by a fixed severity order: &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, and notably &lt;code&gt;verify&lt;/code&gt; — a severity that exists precisely so a deterministic system can say "I detected a pattern but can't confirm the intent" instead of bluffing. An LLM has no equivalent of &lt;code&gt;verify&lt;/code&gt;; everything it says arrives with the same confident fluency.&lt;/p&gt;
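&lt;p&gt;A minimal sketch of that runner pattern follows, under the assumption that each analyzer exposes a name and an &lt;code&gt;analyze&lt;/code&gt; method. The types and the warning text are hypothetical, not Infrawise's actual signatures.&lt;/p&gt;

```typescript
// Hypothetical types for illustration.
type Severity = "high" | "medium" | "low" | "verify";
type Finding = { severity: Severity; message: string };
type Analyzer = { name: string; analyze: (graph: unknown) => Finding[] };

// Fixed ordering: confirmed problems first, "verify" last.
const SEVERITY_ORDER: Severity[] = ["high", "medium", "low", "verify"];

function runAllAnalyzers(analyzers: Analyzer[], graph: unknown): Finding[] {
  const findings: Finding[] = [];
  for (const analyzer of analyzers) {
    try {
      findings.push(...analyzer.analyze(graph));
    } catch (err) {
      // One failing rule logs a warning; the remaining rules still run.
      console.warn(`analyzer ${analyzer.name} failed:`, err);
    }
  }
  return findings.sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}
```

&lt;p&gt;If the second of three analyzers throws, the findings from the first and third still come back, ordered by severity.&lt;/p&gt;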

&lt;h2&gt;
  
  
  The LLM is the consumer, not the analyst
&lt;/h2&gt;

&lt;p&gt;None of this means LLMs are useless here. It means they belong at a specific layer. Infrawise exposes the graph and findings through 15 MCP tools: &lt;code&gt;get_infra_overview&lt;/code&gt; for a quick snapshot, &lt;code&gt;analyze_function&lt;/code&gt; to trace a single function's tables, queues, secrets, and trigger event shapes, &lt;code&gt;suggest_gsi&lt;/code&gt; to generate a ready-to-use GSI definition for a table and attribute, &lt;code&gt;postgres_index_suggestions&lt;/code&gt; for index advice, and so on. The assistant decides &lt;em&gt;when&lt;/em&gt; to ask and &lt;em&gt;what to do&lt;/em&gt; with the answer. It never produces the answer.&lt;/p&gt;

&lt;p&gt;The plumbing is deliberately boring: analysis results are cached as JSON files under &lt;code&gt;.infrawise/cache&lt;/code&gt;, and the &lt;code&gt;infrawise stdio&lt;/code&gt; process your editor spawns re-runs the analysis when the cache is older than 24 hours. Run &lt;code&gt;infrawise start --claude&lt;/code&gt; once and it writes &lt;code&gt;.mcp.json&lt;/code&gt; so Claude Code reconnects automatically on every future launch.&lt;/p&gt;

&lt;p&gt;This division of labor generalizes well beyond one project. The model handles intent ("the user wants this query to be cheaper") and synthesis ("given these findings, here's the migration plan"). The deterministic layer handles every claim that has a ground truth. The test is simple: if asking the same question twice should yield the same answer, don't generate the answer — look it up.&lt;/p&gt;

&lt;p&gt;If your AI assistant writes code against AWS or a database, give it facts instead of letting it guess: &lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A fact question routed through a generative model turns a lookup with a perfect source into a prediction with an unknown error rate. Route facts around the model, not through it.&lt;/li&gt;
&lt;li&gt;AST-level extraction (ts-morph walking &lt;code&gt;CallExpression&lt;/code&gt; nodes) catches what schema introspection alone can't see — which function scans which table, and how.&lt;/li&gt;
&lt;li&gt;Rule-based analyzers are unit-testable and fail loudly per rule; model-based analysis is neither.&lt;/li&gt;
&lt;li&gt;A deterministic system can emit a &lt;code&gt;verify&lt;/code&gt; severity when it isn't sure. A model can't reliably tell you when it's guessing.&lt;/li&gt;
&lt;li&gt;Put the LLM at the boundary: it consumes structured facts over MCP and decides what to do next — it never gets to invent the facts.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>typescript</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>Give Your AI Assistant Infrastructure Eyes Before It Writes Another Query</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Tue, 09 Jun 2026 18:18:02 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/give-your-ai-assistant-infrastructure-eyes-before-it-writes-another-query-p0o</link>
      <guid>https://dev.to/siddharth_pandey_27/give-your-ai-assistant-infrastructure-eyes-before-it-writes-another-query-p0o</guid>
      <description>&lt;p&gt;You asked Claude Code to add pagination to your order history endpoint. It generated a clean function — &lt;code&gt;listOrdersByUser()&lt;/code&gt; — using a DynamoDB &lt;code&gt;Scan&lt;/code&gt; with a &lt;code&gt;Limit&lt;/code&gt; parameter. It compiled. Tests passed. You shipped it.&lt;/p&gt;

&lt;p&gt;Three days later your AWS bill had a line item you didn't recognize: 47 million read capacity units consumed in 72 hours. The Orders table has 50M rows. &lt;code&gt;Scan&lt;/code&gt; walks them in storage order and applies any filter only after the items are read. &lt;code&gt;Limit&lt;/code&gt; caps how many items each request evaluates, not how many matching results come back, so collecting one user's orders meant paging through the whole table and paying for every read.&lt;/p&gt;

&lt;p&gt;Claude Code didn't know your table had 50M rows. It didn't know you had a GSI on &lt;code&gt;userId&lt;/code&gt;. It guessed, and the guess was expensive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;infrawise&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Assistants Don't Know About Your Infrastructure
&lt;/h2&gt;

&lt;p&gt;AI coding assistants read your source files. They understand function signatures, TypeScript types, and import chains. What they cannot see is the infrastructure those functions run against.&lt;/p&gt;

&lt;p&gt;When Claude Code looks at a file that calls &lt;code&gt;dynamoClient.scan({ TableName: "Orders" })&lt;/code&gt;, it has no idea that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Orders table has 50M items&lt;/li&gt;
&lt;li&gt;There is already a GSI named &lt;code&gt;userId-index&lt;/code&gt; on the &lt;code&gt;userId&lt;/code&gt; attribute&lt;/li&gt;
&lt;li&gt;Three other functions are already using &lt;code&gt;Query&lt;/code&gt; against that same GSI&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Sessions&lt;/code&gt; table is accessed by 6 separate code paths, making it a hot partition candidate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that context, the assistant fills the gap with generic patterns. It recommends &lt;code&gt;Scan&lt;/code&gt; because it has no reason not to. It suggests adding a GSI on &lt;code&gt;status&lt;/code&gt; because it doesn't know one exists. It writes &lt;code&gt;SELECT *&lt;/code&gt; because it has no idea which columns are expensive to pull.&lt;/p&gt;

&lt;p&gt;This isn't a bug in the model. It's a missing input. The model was never given your infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens When infrawise Is in the Loop
&lt;/h2&gt;

&lt;p&gt;infrawise statically analyzes your codebase, introspects your DynamoDB tables and PostgreSQL schemas, then exposes that context to your editor through MCP. Claude Code gets 15 tools that answer questions like: which tables exist, what are their partition keys and sort keys, which GSIs are already defined, which functions are already scanning, and which patterns are flagged as high severity.&lt;/p&gt;

&lt;p&gt;The difference in output is concrete. Here's what infrawise surfaces before any code gets written:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Findings  3 total

  1.  HIGH   Full table scan detected on DynamoDB table "Orders"
             listAllOrders() scans without any filter — reads every item in the table.
             → Replace Scan with Query using a partition key or add a GSI.

  2.  MED    PostgreSQL table "users" has no index on column "email"
             Filtering on "email" causes sequential scans.
             → CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

  3.  MED    DynamoDB table "Sessions" accessed by 6 distinct code paths
             Hot partition risk — multiple functions hammer the same key.
             → Review access patterns and consider partition key design.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Claude Code has this context, its suggestions change. It knows &lt;code&gt;userId-index&lt;/code&gt; exists and recommends &lt;code&gt;Query&lt;/code&gt; against it instead of &lt;code&gt;Scan&lt;/code&gt;. It knows the &lt;code&gt;email&lt;/code&gt; column has no index and includes the exact &lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; statement rather than a generic suggestion. It knows which functions are already hitting a partition hard before it adds another one.&lt;/p&gt;

&lt;p&gt;The recommendations become specific to your actual tables, not generic advice copied from documentation.&lt;/p&gt;

&lt;p&gt;infrawise does none of this with an LLM. The analysis is entirely deterministic: TypeScript AST parsing via ts-morph for the code graph, schema introspection for the database layer, rule-based analyzers for pattern detection, and graph correlation to connect code paths to tables. No model is involved in the analysis — models are only consumers of the output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring It Up — infrawise start --claude
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; infrawise
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
infrawise start &lt;span class="nt"&gt;--claude&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On first run, infrawise asks a few questions and generates &lt;code&gt;infrawise.yaml&lt;/code&gt;. It then scans your AWS services, databases, and codebase, writes &lt;code&gt;.mcp.json&lt;/code&gt; so your editor auto-connects, and opens Claude Code with all 15 MCP tools ready.&lt;/p&gt;

&lt;p&gt;Every session after that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No infrawise command needed. The editor manages the MCP connection. Analysis is cached for 24 hours; when the cache goes stale, &lt;code&gt;infrawise stdio&lt;/code&gt; — spawned automatically at session start — refreshes it. File changes are detected within the session and the code graph updates automatically without re-running AWS extraction.&lt;/p&gt;

&lt;p&gt;For PostgreSQL, infrawise uses a read-only connection. Create the user with these four statements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;infrawise_ro&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'yourpassword'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;yourdb&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;infrawise_ro&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;infrawise_ro&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;infrawise_ro&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to check findings without opening an editor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;infrawise analyze &lt;span class="nt"&gt;--severity&lt;/span&gt; high
infrawise analyze &lt;span class="nt"&gt;--severity&lt;/span&gt; high &lt;span class="nt"&gt;--output&lt;/span&gt; report.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--severity&lt;/code&gt; flag accepts &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, or &lt;code&gt;verify&lt;/code&gt;. The &lt;code&gt;--output&lt;/code&gt; flag saves findings as a markdown report.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The problem isn't that AI coding assistants write bad code. The problem is that they write code for an infrastructure they've never seen. A &lt;code&gt;Scan&lt;/code&gt; on an empty dev table and a &lt;code&gt;Scan&lt;/code&gt; on a 50M-row production table look identical in source — the model has no way to tell them apart unless something provides that context.&lt;/p&gt;

&lt;p&gt;infrawise makes that context available deterministically, before the code gets written. The assistant stops guessing about your GSIs, your partition keys, and your missing indexes because it no longer needs to guess.&lt;/p&gt;

&lt;p&gt;Try it: &lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI coding assistants have no knowledge of your actual infrastructure — they infer from source files and fill gaps with generic patterns&lt;/li&gt;
&lt;li&gt;A filtered &lt;code&gt;Scan&lt;/code&gt; still reads items before the filter runs, and &lt;code&gt;Limit&lt;/code&gt; caps items evaluated per request rather than matches returned. The model can't weigh that cost unless it knows your table's size and access patterns&lt;/li&gt;
&lt;li&gt;infrawise exposes your exact schemas, GSIs, partition keys, and flagged patterns to your editor through MCP — 15 tools Claude Code can query before writing a single line&lt;/li&gt;
&lt;li&gt;All analysis is deterministic: TypeScript AST parsing, schema introspection, rule-based detection — no LLM in the analysis path&lt;/li&gt;
&lt;li&gt;Setup is one command: &lt;code&gt;infrawise start --claude&lt;/code&gt; generates config, writes &lt;code&gt;.mcp.json&lt;/code&gt;, and opens your editor with full context ready&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>typescript</category>
      <category>aws</category>
      <category>dynamodb</category>
    </item>
    <item>
      <title>Add a PASS/WARN/FAIL Quality Gate to Your RAG Pipeline in 30 Seconds</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Sat, 06 Jun 2026 11:34:50 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/add-a-passwarnfail-quality-gate-to-your-rag-pipeline-in-30-seconds-d4o</link>
      <guid>https://dev.to/siddharth_pandey_27/add-a-passwarnfail-quality-gate-to-your-rag-pipeline-in-30-seconds-d4o</guid>
      <description>&lt;p&gt;You deployed a RAG chatbot. The answers are vague. You bump the LLM from GPT-3.5 to GPT-4. The answers are still vague. You double the chunk size. Still vague. You spend three hours tuning prompts. Still. Vague.&lt;/p&gt;

&lt;p&gt;The real problem isn't the model. It's that your pipeline is retrieving 10 chunks and the LLM is only seeing 3 of them — and nothing in your logs tells you that.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Actually Breaking (and Why You Can't See It)
&lt;/h2&gt;

&lt;p&gt;A RAG pipeline has at least two moving parts between a user query and an answer: a retrieval step that fetches relevant chunks from a vector store, and an LLM call that uses those chunks to generate a response.&lt;/p&gt;

&lt;p&gt;The failure mode that kills most RAG quality work is invisible: chunks are retrieved, then silently discarded before they reach the LLM prompt.&lt;/p&gt;

&lt;p&gt;This happens because of &lt;code&gt;TOP_K&lt;/code&gt;. You set &lt;code&gt;TOP_K=10&lt;/code&gt; thinking more context is better. But your LLM has a token budget. The orchestration layer (LangChain, LlamaIndex, your custom code) fills the prompt until it hits the limit — and quietly drops whatever didn't fit. The LLM never saw chunks 4 through 10. Your logs show a successful retrieval. Your logs show a successful LLM call. Nothing reports that 70% of your retrieved context was thrown away.&lt;/p&gt;

&lt;p&gt;There are three failure patterns that account for most bad RAG answers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TOP_K too high.&lt;/strong&gt; You retrieve 10 chunks, the LLM uses 3. The 7 you paid to embed, store, and retrieve contribute nothing. Worse: if the 3 that fit aren't the 3 most relevant, your answer quality is determined by which chunks happened to survive token truncation rather than which ones actually matched the query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Near-duplicate chunks.&lt;/strong&gt; Sliding-window chunking creates overlapping segments. If chunk 3 ends with &lt;em&gt;"...chlorophyll to capture light and convert it into chemical energy"&lt;/em&gt; and chunk 4 starts with the same phrase, you're spending context window on a repeated sentence, and the cost compounds with every overlapping pair. The model sees it twice and may over-weight it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing similarity scores.&lt;/strong&gt; Some vector stores (notably Chroma with L2 distance) return raw distance values, not normalized [0, 1] similarity scores. Your retrieval logs show scores like &lt;code&gt;1.42&lt;/code&gt; and &lt;code&gt;0.93&lt;/code&gt; with no indication which is better. Without normalized scores you can't tune thresholds or understand ranking.&lt;/p&gt;
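&lt;p&gt;If your store hands back distances, a small conversion step makes the logs readable. The two mappings below are common conventions rather than anything specific to ragscope or Chroma, and the function names are hypothetical. The second applies only when embeddings are unit-length and the reported value is a squared L2 distance.&lt;/p&gt;

```typescript
// Monotone mapping from a distance of 0 or more to a similarity in (0, 1]:
// smaller distance gives a higher score. Negative inputs are treated as 0.
const distanceToSimilarity = (distance: number): number =>
  1 / (1 + Math.max(0, distance));

// For unit-length vectors, squared L2 distance d relates to cosine similarity
// by d = 2 - 2 * cos, so cos = 1 - d / 2.
const squaredL2ToCosine = (squaredDistance: number): number =>
  1 - squaredDistance / 2;
```

&lt;p&gt;Under the first mapping, a raw distance of &lt;code&gt;0.93&lt;/code&gt; scores higher than &lt;code&gt;1.42&lt;/code&gt;, which makes explicit that the smaller distance is the better match.&lt;/p&gt;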

&lt;p&gt;These are all measurable. You just need something to measure them.&lt;/p&gt;




&lt;h2&gt;
  
  
  One Command to Add a Quality Gate
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx ragscope start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That starts a local OTLP receiver on port 4321. Then point your pipeline at it with one environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:4321
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're using Traceloop or OpenLLMetry for instrumentation, that's all you need — they auto-instrument LangChain, LlamaIndex, OpenAI, Qdrant, and Cohere out of the box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Traceloop&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@traceloop/node-server-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;OTLPTraceExporter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/exporter-trace-otlp-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;Traceloop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;exporter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OTLPTraceExporter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:4321/v1/traces&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Vercel AI SDK or custom pipelines, add two span attributes: one on the retrieval span listing your chunks, and one on the LLM span with the full prompt text. That's the minimum RAGScope needs to score a trace.&lt;/p&gt;

&lt;p&gt;Run your app, fire a few queries, and your terminal shows this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;  PASS  90/100  █████████░  my-rag-app
  │  "What is RAG?"
  │
  │  ✓  precision    90  █████████░  9/10 chunks used
  │  ✓  efficiency   80  ████████░░  20% tokens wasted
  │  ✓  uniqueness  100  ██████████  chunks are distinct
  │  ✓  coverage    100  ██████████  all chunks scored
  │

  WARN  54/100  █████░░░░░  my-rag-app
  │  "What is dense passage retrieval?"
  │
  │  ✗  precision    40  ████░░░░░░  4/10 chunks used
  │  ~  efficiency   50  █████░░░░░  50% tokens wasted
  │  ~  uniqueness   65  ███████░░░  2 near-duplicate pairs
  │  ✓  coverage    100  ██████████  all chunks scored
  │
  │  → Reduce TOP_K 10→4 (only 4 chunks reached LLM)
  │  → 50% of retrieved tokens never reached the LLM
  │  → 2 near-duplicate chunks — deduplicate at ingest time
  │

  ──────────────────────────────────────────────────
  Session  2 queries  ·  avg 72/100  ↓
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every trace is scored the moment it arrives. No dashboard to open, no query to run — just a PASS/WARN/FAIL with specific numbers sitting next to the query that produced them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reading the Score — What Each Number Means
&lt;/h2&gt;

&lt;p&gt;Four sub-scores combine into a single 0–100 overall score as a weighted sum. The weights reflect how directly each metric affects answer quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Precision (40%)&lt;/strong&gt;: the fraction of retrieved chunks that appeared in the LLM prompt. This is weighted highest because a chunk that doesn't reach the LLM has zero value to the answer. It cost retrieval latency and vector bandwidth, and then got dropped. A precision of 40 with &lt;code&gt;TOP_K&lt;/code&gt; set to 10 produces &lt;code&gt;Reduce TOP_K 10→4 (only 4 chunks reached LLM)&lt;/code&gt;: RAGScope tells you the exact number to set it to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Efficiency (30%)&lt;/strong&gt;: the fraction of retrieved tokens the LLM actually consumed. Low precision and low efficiency usually appear together, but they can diverge: if you retrieve three large chunks and the LLM fits two and a half, efficiency drops even though precision is decent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Uniqueness (20%)&lt;/strong&gt;: how distinct your chunks are from each other. It is computed from exact text overlap between adjacent chunks (sorted by retrieval rank). A score of 100 means all chunks are fully distinct. A score of 65 with &lt;code&gt;2 near-duplicate pairs&lt;/code&gt; means your chunking strategy is creating redundant segments: deduplicate at ingest time or increase your chunk step size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coverage (10%)&lt;/strong&gt;: whether your chunks carry non-zero similarity scores. This is a signal flag: if it fires, your vector store is returning raw values that couldn't be normalized, which means you also can't tune retrieval thresholds. RAGScope normalizes Chroma distances automatically, so this usually only fires when scores are genuinely missing from the trace.&lt;/p&gt;
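
&lt;p&gt;As a rough illustration of the precision check, here is a minimal sketch. It assumes a chunk counts as "used" when its exact text appears in the prompt; &lt;code&gt;scorePrecision&lt;/code&gt; and its result shape are hypothetical names, not RAGScope's API, and the real matching logic may differ.&lt;/p&gt;

```typescript
// Hypothetical sketch, not RAGScope's implementation.
interface PrecisionResult {
  used: number; // chunks whose exact text appears in the prompt
  retrieved: number; // chunks returned by the retriever
  score: number; // 0-100, rounded to the nearest integer
  suggestion: string | null;
}

function scorePrecision(chunks: string[], promptText: string): PrecisionResult {
  // Assumption: an exact substring match decides whether a chunk reached the LLM.
  const used = chunks.filter((chunk) => promptText.indexOf(chunk) !== -1).length;
  const retrieved = chunks.length;
  const score = retrieved === 0 ? 0 : Math.round((used / retrieved) * 100);
  const suggestion =
    retrieved > used
      ? `Reduce TOP_K ${retrieved}→${used} (only ${used} chunks reached LLM)`
      : null;
  return { used, retrieved, score, suggestion };
}

// Ten retrieved chunks, of which only the first four are placed in the prompt.
const retrievedChunks = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].map((i) => `[passage ${i}]`);
const assembledPrompt = retrievedChunks.slice(0, 4).join('\n');
const precision = scorePrecision(retrievedChunks, assembledPrompt);
console.log(precision.score, precision.suggestion);
```

&lt;p&gt;With four of ten chunks in the prompt, the sketch yields a precision of 40 and the same style of recommendation shown in the terminal output.&lt;/p&gt;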

&lt;p&gt;The overall label maps to:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Label&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;≥ 75&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Retrieval is healthy for this query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50–74&lt;/td&gt;
&lt;td&gt;WARN&lt;/td&gt;
&lt;td&gt;Issues present — review the recommendations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 50&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Significant retrieval problems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
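
&lt;p&gt;The weighting and label thresholds above can be sketched in a few lines. The function names are hypothetical and rounding to the nearest integer is an assumption; the weights and thresholds are the ones stated in this section.&lt;/p&gt;

```typescript
interface SubScores {
  precision: number;
  efficiency: number;
  uniqueness: number;
  coverage: number;
}

type Label = 'PASS' | 'WARN' | 'FAIL';

// Percent weights from this section; they sum to 100.
const WEIGHTS: SubScores = { precision: 40, efficiency: 30, uniqueness: 20, coverage: 10 };

function overallScore(s: SubScores): number {
  const weighted =
    s.precision * WEIGHTS.precision +
    s.efficiency * WEIGHTS.efficiency +
    s.uniqueness * WEIGHTS.uniqueness +
    s.coverage * WEIGHTS.coverage;
  // Assumption: the overall score is rounded to the nearest integer.
  return Math.round(weighted / 100);
}

function toLabel(score: number): Label {
  if (score >= 75) return 'PASS';
  if (score >= 50) return 'WARN';
  return 'FAIL';
}

// Sub-scores taken from the two sample traces shown earlier.
const healthy = overallScore({ precision: 90, efficiency: 80, uniqueness: 100, coverage: 100 });
const wasteful = overallScore({ precision: 40, efficiency: 50, uniqueness: 65, coverage: 100 });
console.log(healthy, toLabel(healthy)); // 90 PASS
console.log(wasteful, toLabel(wasteful)); // 54 WARN
```

&lt;p&gt;Both results match the sample terminal output: 90/100 PASS and 54/100 WARN.&lt;/p&gt;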

&lt;p&gt;Each WARN or FAIL comes with a concrete recommendation. Not "consider reducing TOP_K" — &lt;code&gt;Reduce TOP_K 10→4 (only 4 chunks reached LLM)&lt;/code&gt;. The actual number, derived from your actual trace.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Adding a quality gate to your RAG pipeline takes one command and one environment variable. From that point, every query you run during development is scored — precision, efficiency, uniqueness, coverage — with specific recommendations when something is wrong.&lt;/p&gt;

&lt;p&gt;You stop guessing whether a model upgrade or a prompt rewrite will fix the vague answers. You see whether the retrieval pipeline is the problem, and exactly where it's breaking.&lt;/p&gt;

&lt;p&gt;RAGScope runs entirely locally. No accounts, no configuration files, no data leaving your machine. Trace data lives in memory for the session lifetime. It's the same category of tool as a linter: runs while you build, catches problems before users see them.&lt;/p&gt;

&lt;p&gt;Try it on your next RAG session: &lt;a href="https://github.com/Sidd27/ragscope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/ragscope" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Most RAG quality problems are retrieval mechanics problems, not model problems — and they're invisible without tracing&lt;/li&gt;
&lt;li&gt;TOP_K too high is the most common culprit: chunks are retrieved, then silently dropped before the LLM prompt is assembled&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npx ragscope start&lt;/code&gt; + one env var adds a live PASS/WARN/FAIL score to every query during development&lt;/li&gt;
&lt;li&gt;Precision (40% weight) measures chunk utilization and efficiency (30%) measures token utilization; reducing TOP_K usually improves both&lt;/li&gt;
&lt;li&gt;Near-duplicate chunks from sliding-window chunking waste context window space and can bias model outputs&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>opensource</category>
      <category>llm</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Three Commands to Make Claude Code Stop Guessing Your Infra</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Fri, 05 Jun 2026 18:30:45 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/three-commands-to-make-claude-code-stop-guessing-your-infra-2onj</link>
      <guid>https://dev.to/siddharth_pandey_27/three-commands-to-make-claude-code-stop-guessing-your-infra-2onj</guid>
      <description>&lt;p&gt;You asked Claude Code to add a query for orders by customer status. It generated a &lt;code&gt;.scan()&lt;/code&gt; with a &lt;code&gt;FilterExpression&lt;/code&gt;. Your Orders table has 50M rows and three functions already hammering the same partition key. Claude Code had no idea — it read your TypeScript files, not your AWS account.&lt;/p&gt;

&lt;p&gt;That's the problem. AI coding assistants are literate in your source code. They are blind to your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Claude Code Actually Sees (and What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;When Claude Code reads your codebase, it builds a model of your application: function names, variable patterns, the string &lt;code&gt;"Orders"&lt;/code&gt; passed to &lt;code&gt;DynamoDB.DocumentClient&lt;/code&gt;. It can follow call chains, infer intent, and generate syntactically correct code.&lt;/p&gt;

&lt;p&gt;What it cannot do is describe your actual infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn't know which GSIs exist on your DynamoDB tables&lt;/li&gt;
&lt;li&gt;It doesn't know how your tables are partitioned or what sort keys you use&lt;/li&gt;
&lt;li&gt;It doesn't know that &lt;code&gt;listAllOrders()&lt;/code&gt; already does a full scan and costs $40/day&lt;/li&gt;
&lt;li&gt;It doesn't know that 5 functions already write to the same partition key on &lt;code&gt;Sessions&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when you ask it to add a new query, it generates something that looks correct. It might even use &lt;code&gt;.query()&lt;/code&gt; instead of &lt;code&gt;.scan()&lt;/code&gt;. But it has no way to know which attributes are indexed, so it may key the query on an attribute that has no index. Or it will write a &lt;code&gt;FilterExpression&lt;/code&gt;, which DynamoDB applies only after the items have been read: you pay for every item read, and on a &lt;code&gt;.scan()&lt;/code&gt; that is the whole table.&lt;/p&gt;

&lt;p&gt;The code compiles. Tests pass. The problem ships.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Commands That Close the Gap
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;infrawise&lt;/strong&gt; gives Claude Code deterministic knowledge of your infrastructure through the Model Context Protocol. Three commands get you there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;infrawise init&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
infrawise init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs once per project. Detects your AWS profile and region, asks which databases you use, and writes a single file: &lt;code&gt;infrawise.yaml&lt;/code&gt;. That's the only file it creates in your repository — one config, no framework, no SDK changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;infrawise doctor&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;infrawise doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before you trust any analysis, verify that infrawise can actually reach your infrastructure. Doctor checks every configured adapter — DynamoDB, PostgreSQL, Lambda, S3 — and reports what's reachable.&lt;/p&gt;

&lt;p&gt;This step matters more than it sounds. If your AWS credentials are stale or your DB password rotated, &lt;code&gt;infrawise analyze&lt;/code&gt; will run against cached metadata from last week and give you confident-but-wrong context. Doctor catches this before you feed stale data to Claude Code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;✓ DynamoDB  Connected (profile: default)
✓ PostgreSQL  Connected
✗ Lambda  credentials expired
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix what's broken, then move on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;infrawise dev&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;infrawise dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command you run while you work. It runs a fresh analysis of your repository and infrastructure, then starts an MCP server that Claude Code queries during code generation.&lt;/p&gt;

&lt;p&gt;If no analysis cache exists, it runs one automatically. If your infrastructure changes — you add a GSI, a new Lambda, a new table — run &lt;code&gt;infrawise dev&lt;/code&gt; again and the context updates.&lt;/p&gt;

&lt;p&gt;Register it with Claude Code once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="nt"&gt;--transport&lt;/span&gt; http infrawise http://localhost:3000/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From that point, every time Claude Code generates infrastructure-touching code, it queries this server first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Claude Code Can See After You Run Them
&lt;/h2&gt;

&lt;p&gt;Here's what &lt;code&gt;infrawise analyze&lt;/code&gt; surfaces on a real project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Findings  3 total

1.  HIGH  Full table scan detected on DynamoDB table "Orders"
         listAllOrders() scans without any filter — reads every item in the table.
       → Replace Scan with Query using a partition key or add a GSI.

2.  MED   PostgreSQL table "users" has no index on column "email"
         Filtering on "email" causes sequential scans.
       → CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

3.  MED   DynamoDB table "Sessions" accessed by 6 distinct code paths
         High access concentration may create hot partition issues at scale.
       → Consider write sharding or DynamoDB DAX for read-heavy workloads.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't generic warnings. They name the function (&lt;code&gt;listAllOrders&lt;/code&gt;), the table (&lt;code&gt;Orders&lt;/code&gt;), and the exact fix (&lt;code&gt;CREATE INDEX CONCURRENTLY idx_users_email ON users(email)&lt;/code&gt;). The GSI recommendation for DynamoDB includes the exact config — attribute name, key type, projection — not a suggestion to "consider adding an index."&lt;/p&gt;

&lt;p&gt;When Claude Code queries the MCP server and gets this context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It won't suggest &lt;code&gt;.scan()&lt;/code&gt; on &lt;code&gt;Orders&lt;/code&gt; — it knows the table has 50M rows and an existing high-severity finding&lt;/li&gt;
&lt;li&gt;It will use the correct partition key and GSI name when building a &lt;code&gt;.query()&lt;/code&gt; — because it knows both&lt;/li&gt;
&lt;li&gt;It won't recommend adding a GSI on &lt;code&gt;status&lt;/code&gt; — because it knows that GSI already exists&lt;/li&gt;
&lt;li&gt;It will note that &lt;code&gt;Sessions&lt;/code&gt; already has 6 access paths before adding a seventh&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're not getting a smarter model. You're giving the existing model the facts it was missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The gap between AI-generated code and production-safe code is mostly an information gap. Claude Code is capable of writing correct infrastructure queries — it just doesn't have your infrastructure. &lt;code&gt;infrawise init&lt;/code&gt; connects it. &lt;code&gt;infrawise doctor&lt;/code&gt; validates the connection. &lt;code&gt;infrawise dev&lt;/code&gt; keeps it current.&lt;/p&gt;

&lt;p&gt;Three commands. One config file. No changes to your application code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Claude Code reads code, not cloud — that's the gap infrawise fills&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;infrawise init&lt;/code&gt; runs once and writes a single &lt;code&gt;infrawise.yaml&lt;/code&gt; to your project&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;infrawise doctor&lt;/code&gt; prevents you from trusting analysis built on stale or broken connections&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;infrawise dev&lt;/code&gt; refreshes infra context each time you start it and serves it over MCP&lt;/li&gt;
&lt;li&gt;Findings are specific: function name, table name, exact SQL or GSI config — not generic advice&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Stop AI From Recommending Redundant Indexes on Existing GSIs</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Wed, 03 Jun 2026 17:15:28 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/stop-ai-from-recommending-redundant-indexes-on-existing-gsis-5lo</link>
      <guid>https://dev.to/siddharth_pandey_27/stop-ai-from-recommending-redundant-indexes-on-existing-gsis-5lo</guid>
      <description>&lt;h2&gt;
  
  
  The GSI Your AI Doesn't Know About
&lt;/h2&gt;

&lt;p&gt;You asked Claude Code to fix a slow query on your &lt;code&gt;Orders&lt;/code&gt; table. It came back with a recommendation: add a GSI on &lt;code&gt;customerId&lt;/code&gt; — index name &lt;code&gt;Orders-customerId-index&lt;/code&gt;, projection type &lt;code&gt;ALL&lt;/code&gt;. Clean, well-formatted, ready to paste into Terraform.&lt;/p&gt;

&lt;p&gt;Your &lt;code&gt;Orders&lt;/code&gt; table already has &lt;code&gt;Orders-customerId-index&lt;/code&gt;. Has had it for eight months.&lt;/p&gt;

&lt;p&gt;The AI read your code. It saw a &lt;code&gt;.query()&lt;/code&gt; call filtering on &lt;code&gt;customerId&lt;/code&gt;, noticed you weren't explicitly referencing an index name, and concluded one was missing. It never checked your actual DynamoDB table. It couldn't — it had no way to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;infrawise&lt;/a&gt; fixes this by reading your real infrastructure first, before any code gets written.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI Gets GSIs Wrong Every Time
&lt;/h2&gt;

&lt;p&gt;AI coding assistants are good at reading code. They're not reading your AWS account.&lt;/p&gt;

&lt;p&gt;When Claude Code or Copilot sees this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;docClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Orders&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customerId = :cid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;:cid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;customerId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It has two choices: assume you're using the table's partition key, or flag a potential missing index. Without an explicit index name in the code, a cautious AI will suggest one. It's the right instinct but the wrong answer, because the index already exists.&lt;/p&gt;

&lt;p&gt;The damage isn't just a wasted suggestion. It's the next step: a junior engineer applies the Terraform diff, the apply fails on a duplicate index name, and now you've got an incident ticket. Or worse: the AI generates a second index with a slightly different name (&lt;code&gt;Orders-customerId-gsi&lt;/code&gt;), and now you're paying for duplicate write capacity on every &lt;code&gt;Orders&lt;/code&gt; write.&lt;/p&gt;




&lt;h2&gt;
  
  
  How infrawise Reads Your Actual GSI Definitions
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;infrawise analyze&lt;/code&gt;, the DynamoDB adapter calls &lt;code&gt;DescribeTable&lt;/code&gt; on every table in your account. The response includes &lt;code&gt;GlobalSecondaryIndexes&lt;/code&gt; — the full list of indexes that actually exist, right now, in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;POST /  →  DescribeTable { TableName&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Orders'&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;

&lt;span class="na"&gt;Response&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;GlobalSecondaryIndexes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;IndexName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Orders-customerId-index&lt;/span&gt;
      &lt;span class="na"&gt;KeySchema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[{&lt;/span&gt; &lt;span class="nv"&gt;AttributeName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;customerId&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;KeyType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;HASH&lt;/span&gt; &lt;span class="pi"&gt;}]&lt;/span&gt;
      &lt;span class="na"&gt;Projection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;ProjectionType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;ALL&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;IndexName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Orders-status-date-index&lt;/span&gt;
      &lt;span class="na"&gt;KeySchema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[{&lt;/span&gt; &lt;span class="nv"&gt;AttributeName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;KeyType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;HASH&lt;/span&gt; &lt;span class="pi"&gt;},&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;AttributeName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;createdAt&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;KeyType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;RANGE&lt;/span&gt; &lt;span class="pi"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These index names go directly into the graph as &lt;code&gt;uses_index&lt;/code&gt; edges on the table node. The graph now knows: &lt;code&gt;Orders&lt;/code&gt; has two GSIs, covering &lt;code&gt;customerId&lt;/code&gt; and the &lt;code&gt;status + createdAt&lt;/code&gt; composite pattern.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;MissingGSIAnalyzer&lt;/code&gt; checks for tables with query edges but zero &lt;code&gt;uses_index&lt;/code&gt; edges — tables your code queries that genuinely have no indexes at all. If &lt;code&gt;Orders&lt;/code&gt; has &lt;code&gt;uses_index&lt;/code&gt; edges, the analyzer doesn't fire for it. No false alarm, no redundant suggestion.&lt;/p&gt;
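
&lt;p&gt;To make the rule concrete, here is a minimal sketch of that check over a flat edge list. The &lt;code&gt;GraphEdge&lt;/code&gt; shape and the &lt;code&gt;tablesMissingGsi&lt;/code&gt; function are assumptions made for illustration; infrawise's internal graph types are not shown in this post and may look different.&lt;/p&gt;

```typescript
type EdgeType = 'scan' | 'query' | 'get' | 'uses_index';

// Hypothetical edge shape, assumed for this sketch.
interface GraphEdge {
  type: EdgeType;
  table: string; // table node the edge is attached to
  source: string; // function name for access edges, index name for uses_index
}

// Returns tables that have at least one query edge and zero uses_index edges.
function tablesMissingGsi(edges: GraphEdge[]): string[] {
  const indexed = edges.filter((e) => e.type === 'uses_index').map((e) => e.table);
  const missing: string[] = [];
  for (const edge of edges) {
    if (edge.type !== 'query') continue;
    const hasIndex = indexed.indexOf(edge.table) !== -1;
    const alreadyListed = missing.indexOf(edge.table) !== -1;
    if (hasIndex || alreadyListed) continue;
    missing.push(edge.table);
  }
  return missing;
}

const demoEdges: GraphEdge[] = [
  { type: 'query', table: 'Orders', source: 'getOrdersByCustomer' },
  { type: 'uses_index', table: 'Orders', source: 'Orders-customerId-index' },
  { type: 'uses_index', table: 'Orders', source: 'Orders-status-date-index' },
  { type: 'query', table: 'UserSessions', source: 'getSession' },
  { type: 'query', table: 'UserSessions', source: 'listSessions' },
];
const flaggedTables = tablesMissingGsi(demoEdges);
console.log(flaggedTables);
```

&lt;p&gt;In this sample data only &lt;code&gt;UserSessions&lt;/code&gt; is flagged; &lt;code&gt;Orders&lt;/code&gt; is skipped because it already carries &lt;code&gt;uses_index&lt;/code&gt; edges.&lt;/p&gt;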




&lt;h2&gt;
  
  
  What the MCP Tools Surface Before You Write Anything
&lt;/h2&gt;

&lt;p&gt;Once &lt;code&gt;infrawise dev&lt;/code&gt; is running, Claude Code connects to it and the workflow changes. Before writing any query logic, the first call is &lt;code&gt;get_infra_overview&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;→ get_infra_overview

Tables:
  Orders          dynamodb
  Products        dynamodb
  UserSessions    dynamodb

High-severity findings: 0
Medium-severity findings: 1
  → UserSessions has no GSIs but is queried by 3 functions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Orders&lt;/code&gt; is there. No finding next to it — because it has indexes. The AI sees this and knows not to suggest new ones.&lt;/p&gt;

&lt;p&gt;If you then call &lt;code&gt;analyze_function&lt;/code&gt; on the function that queries &lt;code&gt;Orders&lt;/code&gt;, the response includes the existing &lt;code&gt;uses_index&lt;/code&gt; edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;→ analyze_function { function: "getOrdersByCustomer" }

Services accessed:
  Orders  (query, uses_index: Orders-customerId-index)

Findings: none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The index name is right there. The AI writes the query with &lt;code&gt;IndexName: 'Orders-customerId-index'&lt;/code&gt; — not because it's smart, but because it's reading real data.&lt;/p&gt;
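
&lt;p&gt;For reference, a query that names the index explicitly might look like the sketch below. It builds a plain object with the standard DynamoDB Query fields rather than importing an SDK, so it runs on its own; &lt;code&gt;buildOrdersByCustomerQuery&lt;/code&gt; is a hypothetical helper, and the plain-string attribute value assumes a document client that marshals values for you.&lt;/p&gt;

```typescript
// Minimal subset of the DynamoDB Query input used in this sketch.
interface OrdersQueryInput {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeValues: { [placeholder: string]: string };
}

// Hypothetical helper: table and index names come from the analyze_function output above.
function buildOrdersByCustomerQuery(customerId: string): OrdersQueryInput {
  return {
    TableName: 'Orders',
    IndexName: 'Orders-customerId-index', // without this, the query targets the base table keys
    KeyConditionExpression: 'customerId = :cid',
    ExpressionAttributeValues: { ':cid': customerId },
  };
}

const ordersQuery = buildOrdersByCustomerQuery('cust-123');
console.log(JSON.stringify(ordersQuery, null, 2));
```

&lt;p&gt;The resulting object can be passed to the same &lt;code&gt;docClient.query(...)&lt;/code&gt; call shown earlier in this post.&lt;/p&gt;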

&lt;p&gt;The &lt;code&gt;suggest_gsi&lt;/code&gt; tool is explicit about its own limitation. Its description reads: &lt;em&gt;"Does not verify whether the GSI already exists; check the table schema in &lt;code&gt;get_infra_overview&lt;/code&gt; first."&lt;/em&gt; It's intentionally a generation tool, not a verification tool. Verification is &lt;code&gt;get_infra_overview&lt;/code&gt;. The workflow is: look first, generate only if it's missing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The problem isn't that AI is careless. It's that AI is working from code, and code doesn't contain your infrastructure state. A &lt;code&gt;.query()&lt;/code&gt; call doesn't tell you whether the table has an index. A function name doesn't tell you what's deployed.&lt;/p&gt;

&lt;p&gt;infrawise bridges that gap by pulling live infrastructure state (&lt;code&gt;DescribeTable&lt;/code&gt; output, real index names, real projection types) and exposing it through MCP before any code gets written. The AI stops suggesting indexes that already exist because it can now see them.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npm install -g infrawise&lt;/code&gt;, run &lt;code&gt;infrawise init&lt;/code&gt; in your repo, then &lt;code&gt;infrawise dev&lt;/code&gt;. The first time Claude Code calls &lt;code&gt;get_infra_overview&lt;/code&gt; and sees your actual table schema, the redundant GSI suggestions stop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI suggests GSIs based on query patterns in code — it has no visibility into indexes that already exist in your AWS account&lt;/li&gt;
&lt;li&gt;infrawise calls &lt;code&gt;DescribeTable&lt;/code&gt; on every DynamoDB table and extracts the full &lt;code&gt;GlobalSecondaryIndexes&lt;/code&gt; list into the infrastructure graph&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MissingGSIAnalyzer&lt;/code&gt; fires only on tables with &lt;strong&gt;zero&lt;/strong&gt; GSI coverage — tables that already have indexes don't trigger it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_infra_overview&lt;/code&gt; surfaces existing index names before any code is written; &lt;code&gt;analyze_function&lt;/code&gt; shows which index a specific query uses&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;suggest_gsi&lt;/code&gt; is a generation tool — call it only after &lt;code&gt;get_infra_overview&lt;/code&gt; confirms the index doesn't exist&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>typescript</category>
      <category>dynamodb</category>
      <category>aws</category>
    </item>
    <item>
      <title>How infrawise Catches the DynamoDB Scan You Didn't Know You Were Making</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Mon, 01 Jun 2026 15:12:25 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/how-infrawise-catches-the-dynamodb-scan-you-didnt-know-you-were-making-40og</link>
      <guid>https://dev.to/siddharth_pandey_27/how-infrawise-catches-the-dynamodb-scan-you-didnt-know-you-were-making-40og</guid>
      <description>&lt;p&gt;Your Orders table has 50 million rows. Claude Code wrote a &lt;code&gt;listAllOrders()&lt;/code&gt; function that calls &lt;code&gt;.scan()&lt;/code&gt; with no filter. It compiled. Tests passed. Friday morning, your DynamoDB bill had a new line item.&lt;/p&gt;

&lt;p&gt;The problem isn't the AI — it's that the AI had no way to know. &lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;infrawise&lt;/a&gt; solves this by building a deterministic model of your actual infrastructure and exposing it through MCP before any code gets written. This post is about how the scan detection actually works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Scanning the Repository with ts-morph
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;infrawise analyze&lt;/code&gt;, the first pass is a TypeScript/JavaScript AST scan using ts-morph. infrawise walks every source file looking for database client call expressions — DynamoDB &lt;code&gt;DocumentClient.scan&lt;/code&gt;, &lt;code&gt;.query&lt;/code&gt;, &lt;code&gt;.get&lt;/code&gt;; PostgreSQL &lt;code&gt;pg.query&lt;/code&gt;; Mongoose model methods.&lt;/p&gt;

&lt;p&gt;For each call site it finds, it records three things: the containing function name, the target table or collection, and the operation type. A &lt;code&gt;.scan()&lt;/code&gt; call becomes an edge with type &lt;code&gt;scan&lt;/code&gt;. A &lt;code&gt;.query()&lt;/code&gt; call becomes a &lt;code&gt;query&lt;/code&gt; edge. These edges are the raw material for the graph.&lt;/p&gt;

&lt;p&gt;The limitation is real and documented: only TypeScript and JavaScript are supported. Dynamically constructed queries — where the table name or operation is assembled at runtime from a variable — may not resolve. infrawise handles what static analysis can handle and flags the rest.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Infrastructure Introspection
&lt;/h2&gt;

&lt;p&gt;In parallel, infrawise calls your AWS APIs directly. For DynamoDB it reads every table's actual schema: partition key, sort key, every GSI with its projection type and key schema, item count, billing mode. For Lambda it reads function configurations, memory, timeouts, and event source mappings. SQS queues, SNS topics, SSM parameters, Secrets Manager secrets, RDS instances, and CloudWatch log groups are all pulled the same way — deterministic API calls, no inference.&lt;/p&gt;

&lt;p&gt;This is what separates it from passing your Terraform files to an AI. Reading a &lt;code&gt;.tf&lt;/code&gt; file tells you what &lt;em&gt;should&lt;/em&gt; exist. Calling &lt;code&gt;dynamodb.describeTable&lt;/code&gt; tells you what &lt;em&gt;does&lt;/em&gt; exist, right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Building the Graph
&lt;/h2&gt;

&lt;p&gt;The graph engine connects the AST output to the infrastructure metadata. Each DynamoDB table, Lambda function, SQS queue, and RDS instance becomes a node. The call sites from the AST scan become typed edges between function nodes and table nodes: &lt;code&gt;scan&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;publishes_to&lt;/code&gt;, &lt;code&gt;uses_index&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The result is a queryable graph. You can ask: which function nodes have &lt;code&gt;scan&lt;/code&gt; edges pointing to the &lt;code&gt;Orders&lt;/code&gt; table node? That's exactly the query the &lt;code&gt;FullTableScanAnalyzer&lt;/code&gt; runs.&lt;/p&gt;
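
&lt;p&gt;A minimal sketch of that kind of graph query, assuming edges are stored as a flat list. The &lt;code&gt;GraphEdge&lt;/code&gt; shape and &lt;code&gt;functionsWithEdge&lt;/code&gt; are hypothetical names used for illustration, not infrawise's actual types.&lt;/p&gt;

```typescript
type EdgeType = 'scan' | 'query' | 'get' | 'publishes_to' | 'uses_index';

// Hypothetical edge shape, assumed for this sketch.
interface GraphEdge {
  from: string; // function node
  to: string; // table, queue, or topic node
  type: EdgeType;
}

// Which function nodes have an edge of the given type pointing at the given node?
function functionsWithEdge(edges: GraphEdge[], target: string, type: EdgeType): string[] {
  return edges
    .filter((edge) => edge.to === target)
    .filter((edge) => edge.type === type)
    .map((edge) => edge.from);
}

const demoGraph: GraphEdge[] = [
  { from: 'listAllOrders', to: 'Orders', type: 'scan' },
  { from: 'getOrdersByCustomer', to: 'Orders', type: 'query' },
  { from: 'getSession', to: 'Sessions', type: 'get' },
];
const scanners = functionsWithEdge(demoGraph, 'Orders', 'scan');
console.log(scanners); // only listAllOrders scans Orders in this sample graph
```

&lt;p&gt;Because the answer comes from a filter over recorded edges, the same input always produces the same finding.&lt;/p&gt;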




&lt;h2&gt;
  
  
  Step 4: The 24 Analyzers
&lt;/h2&gt;

&lt;p&gt;infrawise ships 24 rule-based analyzers. Each one is a graph traversal or a schema comparison — no model, no inference.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FullTableScanAnalyzer&lt;/code&gt; calls &lt;code&gt;getScanEdges&lt;/code&gt;, which filters all graph edges where &lt;code&gt;type === 'scan'&lt;/code&gt;. For each edge that points to a DynamoDB table node, it records the table and the calling function, then emits a &lt;code&gt;HIGH&lt;/code&gt; severity finding. No threshold, no heuristic — any &lt;code&gt;.scan()&lt;/code&gt; on a DynamoDB table is flagged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  1.  HIGH   Full table scan detected on DynamoDB table "Orders"
             The table "Orders" is being scanned without any filter,
             which reads every item. This is expensive and slow for
             large tables. Called from: listAllOrders
             → Replace Scan with a Query operation using a partition
               key or GSI. If filtering is required on non-key
               attributes, add a Global Secondary Index (GSI).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
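
&lt;p&gt;The rule described above can be sketched in a few lines. This is an illustration under assumptions, not infrawise's source: the &lt;code&gt;Edge&lt;/code&gt; and &lt;code&gt;Finding&lt;/code&gt; shapes and the &lt;code&gt;findFullTableScans&lt;/code&gt; name are hypothetical, and &lt;code&gt;getScanEdges&lt;/code&gt; here is a stand-in with a guessed signature:&lt;/p&gt;

```typescript
// Hypothetical shapes; only the behaviour mirrors the description above.
interface Edge {
  from: string; // calling function
  to: string;   // target node id
  type: string; // 'scan', 'query', 'get', ...
}

interface Finding {
  severity: 'HIGH';
  table: string;
  calledFrom: string;
}

// Stand-in for getScanEdges: keep only edges whose type is 'scan'.
function getScanEdges(edges: Edge[]): Edge[] {
  return edges.filter((e) => e.type === 'scan');
}

// One HIGH finding per scan edge that targets a DynamoDB table. No threshold.
function findFullTableScans(edges: Edge[], dynamoTables: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const edge of getScanEdges(edges)) {
    if (dynamoTables.indexOf(edge.to) === -1) continue; // not a DynamoDB table
    findings.push({ severity: 'HIGH', table: edge.to, calledFrom: edge.from });
  }
  return findings;
}

const scanFindings = findFullTableScans(
  [
    { from: 'listAllOrders', to: 'Orders', type: 'scan' },
    { from: 'getOrder', to: 'Orders', type: 'get' },
    { from: 'exportAudit', to: 'audit_log', type: 'scan' }, // not DynamoDB here
  ],
  ['Orders'],
);

console.log(scanFindings);
// [ { severity: 'HIGH', table: 'Orders', calledFrom: 'listAllOrders' } ]
```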



&lt;p&gt;The other analyzers follow the same pattern. &lt;code&gt;MissingGSIAnalyzer&lt;/code&gt; finds tables that have &lt;code&gt;query&lt;/code&gt; edges but no &lt;code&gt;uses_index&lt;/code&gt; edges, meaning tables that are queried with no GSI coverage. &lt;code&gt;HotPartitionAnalyzer&lt;/code&gt; counts distinct function nodes with edges to the same table; at five or more, it emits a &lt;code&gt;MEDIUM&lt;/code&gt; finding. &lt;code&gt;MissingIndexAnalyzer&lt;/code&gt; compares PostgreSQL query predicates against the introspected &lt;code&gt;pg_indexes&lt;/code&gt; view. &lt;code&gt;NplusOneAnalyzer&lt;/code&gt; looks for repeated query edges from the same function in a loop pattern. Every rule is structural.&lt;/p&gt;
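
&lt;p&gt;Two of these rules are simple enough to sketch as edge filters. The code below is an assumption-laden illustration: the function names &lt;code&gt;tablesMissingGsi&lt;/code&gt; and &lt;code&gt;hotTables&lt;/code&gt; are hypothetical, and it assumes a &lt;code&gt;uses_index&lt;/code&gt; edge points at the table it covers, which the real graph may model differently:&lt;/p&gt;

```typescript
// Hypothetical edge shape; infrawise's real graph may differ.
interface Edge {
  from: string;
  to: string;
  type: string;
}

// Tables that have `query` edges but no `uses_index` edges.
function tablesMissingGsi(edges: Edge[]): string[] {
  const queried = edges.filter((e) => e.type === 'query').map((e) => e.to);
  const indexed = edges.filter((e) => e.type === 'uses_index').map((e) => e.to);
  const result: string[] = [];
  for (const table of queried) {
    if (indexed.indexOf(table) !== -1) continue; // some caller uses an index
    if (result.indexOf(table) === -1) result.push(table);
  }
  return result;
}

// Tables reached from `threshold` or more distinct function nodes.
function hotTables(edges: Edge[], threshold: number): string[] {
  const callers: { [table: string]: string[] } = {};
  for (const e of edges) {
    const seen = callers[e.to] || (callers[e.to] = []);
    if (seen.indexOf(e.from) === -1) seen.push(e.from);
  }
  return Object.keys(callers).filter((t) => callers[t].length >= threshold);
}

const sampleEdges: Edge[] = [
  { from: 'f1', to: 'Events', type: 'query' },
  { from: 'f1', to: 'Events', type: 'uses_index' },
  { from: 'f2', to: 'Events', type: 'get' },
  { from: 'f3', to: 'Events', type: 'query' },
  { from: 'f4', to: 'Events', type: 'scan' },
  { from: 'f5', to: 'Events', type: 'get' },
  { from: 'getUser', to: 'Users', type: 'query' },
];

console.log(tablesMissingGsi(sampleEdges)); // [ 'Users' ]
console.log(hotTables(sampleEdges, 5));     // [ 'Events' ]
```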




&lt;h2&gt;
  
  
  How This Reaches Your AI Assistant
&lt;/h2&gt;

&lt;p&gt;Running &lt;code&gt;infrawise dev&lt;/code&gt; starts a Fastify MCP server on Streamable HTTP. Claude Code connects to it and can call 13 tools, including &lt;code&gt;get_infra_overview&lt;/code&gt;, &lt;code&gt;analyze_function&lt;/code&gt;, &lt;code&gt;suggest_gsi&lt;/code&gt;, and &lt;code&gt;postgres_index_suggestions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When Claude Code is about to write a query against &lt;code&gt;Orders&lt;/code&gt;, it calls &lt;code&gt;analyze_function&lt;/code&gt; first. The response includes the table schema, any existing GSIs, and the scan finding if one was detected. The AI writes a &lt;code&gt;query&lt;/code&gt; with the correct partition key instead of a &lt;code&gt;scan&lt;/code&gt; — not because it's smarter, but because it now has the same information a senior engineer would check before touching the table.&lt;/p&gt;

&lt;p&gt;For Claude Desktop, &lt;code&gt;infrawise stdio&lt;/code&gt; starts the same server on stdio transport.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The scan finding is the most visible output, but the real work is the graph: AST edges from ts-morph connecting function call sites to infrastructure nodes from live AWS APIs, traversed by 24 deterministic rules. No LLM touches the analysis path.&lt;/p&gt;

&lt;p&gt;If you're running Claude Code against a codebase with DynamoDB tables, run &lt;code&gt;npm install -g infrawise&lt;/code&gt;, then &lt;code&gt;infrawise init&lt;/code&gt; in your repo. The first &lt;code&gt;infrawise analyze&lt;/code&gt; usually finds something your AI assistant would have gotten wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Sidd27/infrawise" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/infrawise" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;infrawise uses ts-morph to parse TypeScript/JavaScript source into a graph of function-to-table edges, typed by operation (&lt;code&gt;scan&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;get&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;AWS infrastructure metadata comes from live API calls — not Terraform, not static files — so the graph reflects what actually exists.&lt;/li&gt;
&lt;li&gt;24 rule-based analyzers traverse the graph deterministically; &lt;code&gt;FullTableScanAnalyzer&lt;/code&gt; flags any &lt;code&gt;.scan()&lt;/code&gt; edge to a DynamoDB table as &lt;code&gt;HIGH&lt;/code&gt; with no threshold.&lt;/li&gt;
&lt;li&gt;Context is exposed through an MCP server (Streamable HTTP for Claude Code, stdio for Claude Desktop) so AI tools see findings before they generate code.&lt;/li&gt;
&lt;li&gt;The analysis path contains zero LLMs — every finding is a graph query or schema comparison.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>typescript</category>
      <category>dynamodb</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Git Knows Who. AI Knows What. Nobody Knows Why.</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Sun, 31 May 2026 11:24:18 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/git-knows-who-ai-knows-what-nobody-knows-why-1lha</link>
      <guid>https://dev.to/siddharth_pandey_27/git-knows-who-ai-knows-what-nobody-knows-why-1lha</guid>
      <description>&lt;p&gt;Modern software development has achieved incredible things.&lt;/p&gt;

&lt;p&gt;AI can generate entire features.&lt;/p&gt;

&lt;p&gt;Editors can autocomplete your thoughts before you've finished having them.&lt;/p&gt;

&lt;p&gt;Agents can open PRs while you're still reading the ticket.&lt;/p&gt;

&lt;p&gt;And yet, despite all this progress, there is still one question capable of ruining a senior engineer's afternoon:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why the hell does this code exist?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Consider this innocent little gem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks simple.&lt;/p&gt;

&lt;p&gt;AI can explain it.&lt;/p&gt;

&lt;p&gt;Git can tell you who wrote it.&lt;/p&gt;

&lt;p&gt;The PR can tell you when it was merged.&lt;/p&gt;

&lt;p&gt;But nobody can tell you why it was added in the first place.&lt;/p&gt;

&lt;p&gt;Was it reducing GPS noise?&lt;/p&gt;

&lt;p&gt;Was it preventing duplicate events?&lt;/p&gt;

&lt;p&gt;Was it added because thousands of devices were rapidly entering and exiting the same geofence?&lt;/p&gt;

&lt;p&gt;Was it a workaround for a production incident that woke up three engineers on a Sunday morning?&lt;/p&gt;

&lt;p&gt;We may never know.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Archaeology Phase of Software Engineering
&lt;/h2&gt;

&lt;p&gt;Every mature codebase eventually turns developers into archaeologists.&lt;/p&gt;

&lt;p&gt;You discover a mysterious piece of code and begin the sacred ritual:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check Git blame.&lt;/li&gt;
&lt;li&gt;Open commit.&lt;/li&gt;
&lt;li&gt;Open PR.&lt;/li&gt;
&lt;li&gt;Open linked ticket.&lt;/li&gt;
&lt;li&gt;Ticket references a Slack discussion.&lt;/li&gt;
&lt;li&gt;Slack link is dead.&lt;/li&gt;
&lt;li&gt;Original author left the company.&lt;/li&gt;
&lt;li&gt;Team lead moved to another startup.&lt;/li&gt;
&lt;li&gt;Nobody remembers anything.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Congratulations.&lt;/p&gt;

&lt;p&gt;You have reached the end of the knowledge graph.&lt;/p&gt;

&lt;p&gt;The only remaining documentation is a comment that says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Don't remove this&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thank you, mysterious engineer from 2023.&lt;/p&gt;

&lt;p&gt;Very helpful.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Is About To Make This Much Worse
&lt;/h2&gt;

&lt;p&gt;The old workflow looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Human thinks
    ↓
Human writes code
    ↓
Human forgets why
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Human writes ticket
    ↓
AI writes code
    ↓
Human edits code
    ↓
Another AI refactors code
    ↓
Reviewer requests changes
    ↓
Code reaches production
    ↓
Everyone forgets why
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have successfully automated everything except remembering our decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  We Have Documentation Everywhere
&lt;/h2&gt;

&lt;p&gt;The funny thing is that companies already have mountains of documentation.&lt;/p&gt;

&lt;p&gt;The requirement is in Linear.&lt;/p&gt;

&lt;p&gt;The discussion is in Slack.&lt;/p&gt;

&lt;p&gt;The design is in Notion.&lt;/p&gt;

&lt;p&gt;The implementation is in GitHub.&lt;/p&gt;

&lt;p&gt;The AI conversation is in Cursor.&lt;/p&gt;

&lt;p&gt;The meeting notes are somewhere in a folder named:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Final_v2_Updated_Final_Real_Final
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem isn't missing information.&lt;/p&gt;

&lt;p&gt;The problem is that all the information lives in different universes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Git Answers The Wrong Question
&lt;/h2&gt;

&lt;p&gt;Git is fantastic.&lt;/p&gt;

&lt;p&gt;Ask Git:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Who changed this?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Git:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Bob.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ask Git:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Git:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;February 14th, 2026.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ask Git:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Git:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Commit message: "fix stuff"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fantastic.&lt;/p&gt;

&lt;p&gt;Outstanding.&lt;/p&gt;

&lt;p&gt;Truly the pinnacle of human knowledge preservation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shortcut We Actually Need
&lt;/h2&gt;

&lt;p&gt;We already have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go To Definition&lt;/li&gt;
&lt;li&gt;Find References&lt;/li&gt;
&lt;li&gt;Rename Symbol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we don't have is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ctrl+Y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why is this here?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Imagine clicking a line of code and seeing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reason:
Ignore GPS drift within 50 meters.

Requirement:
TRACKING-123

Discussion:
Customers reported phantom arrivals and departures.

Implementation:
PR #482

Generated by:
Cursor

Reviewer:
Sarah

Business Assumption:
Location updates within 50 meters are considered noise.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that's useful.&lt;/p&gt;

&lt;p&gt;Because most bugs aren't caused by developers not understanding code.&lt;/p&gt;

&lt;p&gt;They're caused by developers not understanding decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Layer
&lt;/h2&gt;

&lt;p&gt;For decades we've been obsessed with source code.&lt;/p&gt;

&lt;p&gt;Then we became obsessed with documentation.&lt;/p&gt;

&lt;p&gt;Now we're obsessed with AI code generation.&lt;/p&gt;

&lt;p&gt;Meanwhile the most valuable thing keeps disappearing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The reasoning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The chain actually looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requirement
    ↓
Discussion
    ↓
Decision
    ↓
AI Generation
    ↓
Code Review
    ↓
Code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Today we only preserve the last two steps.&lt;/p&gt;

&lt;p&gt;Then six months later we hold a meeting to rediscover the first four.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maybe This Is the Next Developer Tool
&lt;/h2&gt;

&lt;p&gt;What if a VS Code extension could answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why does this code exist?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not by hallucinating.&lt;/p&gt;

&lt;p&gt;Not by guessing.&lt;/p&gt;

&lt;p&gt;But by building a traceable chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requirement
    ↓
Discussion
    ↓
AI Session
    ↓
Code Review
    ↓
Code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click a line.&lt;/p&gt;

&lt;p&gt;Press Ctrl+Y.&lt;/p&gt;

&lt;p&gt;Get the story.&lt;/p&gt;

&lt;p&gt;Not just the syntax.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Technical Debt
&lt;/h2&gt;

&lt;p&gt;People think AI-generated code is the next challenge.&lt;/p&gt;

&lt;p&gt;I disagree.&lt;/p&gt;

&lt;p&gt;The next challenge is AI-generated code with missing context.&lt;/p&gt;

&lt;p&gt;Code can be read.&lt;/p&gt;

&lt;p&gt;Logic can be reverse-engineered.&lt;/p&gt;

&lt;p&gt;Intent is much harder.&lt;/p&gt;

&lt;p&gt;And every year we're generating more code while preserving less reasoning.&lt;/p&gt;

&lt;p&gt;That's a dangerous trade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Git knows who.&lt;/p&gt;

&lt;p&gt;AI knows what.&lt;/p&gt;

&lt;p&gt;Nobody knows why.&lt;/p&gt;

&lt;p&gt;And somewhere inside your production codebase is a line that nobody dares delete because the original author left three companies ago and the only documentation says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Trust me&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which, historically, has never caused any problems.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How RAGScope Knows Which Chunks Your LLM Actually Used</title>
      <dc:creator>Siddharth Pandey</dc:creator>
      <pubDate>Sun, 31 May 2026 09:46:24 +0000</pubDate>
      <link>https://dev.to/siddharth_pandey_27/how-ragscope-knows-which-chunks-your-llm-actually-used-e5k</link>
      <guid>https://dev.to/siddharth_pandey_27/how-ragscope-knows-which-chunks-your-llm-actually-used-e5k</guid>
      <description>&lt;p&gt;Your retriever fetched 10 chunks. Your LLM only used 3. RAGScope shows a precision score of 30 out of 100. The question every new user asks: how does it know?&lt;/p&gt;

&lt;p&gt;There is no OpenTelemetry attribute that says "this chunk was in the context window." RAGScope infers it — and the way it does this is the most consequential piece of engineering in the whole tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  There Is No "In Context" Attribute in OTel
&lt;/h2&gt;

&lt;p&gt;The OpenTelemetry semantic conventions for generative AI (&lt;code&gt;gen_ai.*&lt;/code&gt;) define attributes for model, input/output tokens, and retrieved documents. They do not define anything like &lt;code&gt;gen_ai.chunk.reached_llm&lt;/code&gt; or &lt;code&gt;gen_ai.retrieval.used_document_ids&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When your RETRIEVER span fires, you get a list of documents. When your LLM span fires, you get a prompt and a completion. The two spans are connected by a parent-child trace relationship — but there is no attribute that maps which retrieved documents appear in which prompt.&lt;/p&gt;

&lt;p&gt;This gap matters. A reranker might drop 5 of your 10 chunks. Your application code might then apply a token budget and truncate 2 more. From the trace alone, you cannot tell which chunks survived.&lt;/p&gt;

&lt;p&gt;RAGScope needs this information to compute the precision sub-score — the highest-weighted metric at 40% of the overall score. Getting it wrong would make precision meaningless.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Substring Match — How &lt;code&gt;assembleContext&lt;/code&gt; Works
&lt;/h2&gt;

&lt;p&gt;RAGScope's answer is in &lt;code&gt;src/enrichment/pipeline.ts&lt;/code&gt;, in a function called &lt;code&gt;assembleContext&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;assembleContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RagChunk&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;llmSpans&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ParsedSpan&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="nx"&gt;RagChunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llmPrompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;llmSpans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmPrompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;position&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;llmPrompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;inContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;contextPosition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;position&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;inContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;contextPosition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The approach: collect the raw prompt strings from every LLM span in the trace, then check whether each chunk's content appears as a literal substring of any of those prompts.&lt;/p&gt;

&lt;p&gt;If your LLM span records its prompt in the &lt;code&gt;input&lt;/code&gt; attribute (which TraceAI, Traceloop, and OpenTelemetry's gen_ai conventions all do) and your retriever span records the chunk content in &lt;code&gt;gen_ai.retrieval.documents&lt;/code&gt;, RAGScope has everything it needs.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;contextPosition&lt;/code&gt; counter assigns an incrementing index to each in-context chunk in the order they are encountered during the &lt;code&gt;chunks.map()&lt;/code&gt; iteration — which follows retrieval rank, not prompt position. It tracks which retrieved chunks are in context and their relative order among in-context chunks.&lt;/p&gt;
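
&lt;p&gt;If you need the order in which chunks actually appear in the prompt, the same substring idea extends naturally with &lt;code&gt;indexOf&lt;/code&gt;. The sketch below is not part of RAGScope; &lt;code&gt;promptOrder&lt;/code&gt; and the &lt;code&gt;Chunk&lt;/code&gt; shape are hypothetical, and it assumes a single prompt string:&lt;/p&gt;

```typescript
// Hypothetical chunk shape for illustration.
interface Chunk {
  id: string;
  content: string;
}

// Ids of chunks found in the prompt, sorted by where they first appear.
function promptOrder(chunks: Chunk[], promptText: string): string[] {
  return chunks
    .filter((c) => c.content.length > 0) // an empty string would match at 0
    .map((c) => ({ id: c.id, at: promptText.indexOf(c.content) }))
    .filter((hit) => hit.at !== -1)
    .sort((a, b) => a.at - b.at)
    .map((hit) => hit.id);
}

const retrieved: Chunk[] = [
  { id: 'rank-0', content: 'Refunds take 5 days.' },
  { id: 'rank-1', content: 'Shipping is free over $50.' },
  { id: 'rank-2', content: 'Support is open 9 to 5.' },
];

// The application placed rank-2 before rank-0 and dropped rank-1.
const promptText =
  'Context:\nSupport is open 9 to 5.\nRefunds take 5 days.\n\nQuestion: ...';

console.log(promptOrder(retrieved, promptText)); // [ 'rank-2', 'rank-0' ]
```

&lt;p&gt;Here retrieval rank would number the in-context chunks as rank-0 then rank-2, while prompt order is the reverse. Which one you want depends on whether you are auditing the retriever or the prompt template.&lt;/p&gt;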

&lt;h3&gt;
  
  
  Why substring matching works
&lt;/h3&gt;

&lt;p&gt;Frameworks like LangChain and LlamaIndex build LLM prompts by concatenating retrieved chunk contents, often wrapped in minimal formatting like &lt;code&gt;Context:\n{chunk}\n&lt;/code&gt;. The chunk text itself is usually present verbatim. As long as the chunk content recorded on the RETRIEVER span matches what was injected into the prompt string — which it does when both come from the same retrieval call — substring matching is reliable.&lt;/p&gt;

&lt;p&gt;The constraint: &lt;code&gt;chunk.content&lt;/code&gt; must be non-empty and non-null. RAGScope only stores content when the RETRIEVER span includes it in the documents array. If your instrumentation omits content and only records chunk IDs, &lt;code&gt;assembleContext&lt;/code&gt; cannot match, and precision will read 0% until content is included.&lt;/p&gt;
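
&lt;p&gt;Because the check is a literal substring match, any reformatting between the retriever span and the prompt breaks it. The sketch below contrasts the literal check with a whitespace-normalizing variant. The variant is an assumption shown for comparison only, not something RAGScope is described as doing, and &lt;code&gt;appearsVerbatim&lt;/code&gt;, &lt;code&gt;normalize&lt;/code&gt;, and &lt;code&gt;appearsNormalized&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```typescript
// Literal substring check, the same idea assembleContext relies on.
function appearsVerbatim(content: string, promptText: string): boolean {
  return promptText.indexOf(content) !== -1;
}

// Hypothetical helper: collapse runs of whitespace before comparing.
function normalize(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

function appearsNormalized(content: string, promptText: string): boolean {
  return normalize(promptText).indexOf(normalize(content)) !== -1;
}

// Content as recorded on the retriever span (two lines).
const recorded = 'Refunds take 5 days.\nContact support to start one.';
// The prompt template re-wrapped the chunk onto a single line.
const assembled = 'Context:\nRefunds take 5 days. Contact support to start one.\n';

console.log(appearsVerbatim(recorded, assembled));   // false
console.log(appearsNormalized(recorded, assembled)); // true
```

&lt;p&gt;If your template rewrites chunk text, the simpler fix is usually to record on the span exactly the string you inject into the prompt.&lt;/p&gt;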




&lt;h2&gt;
  
  
  What This Means for Your Precision Score
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;scoreRetrieval&lt;/code&gt; in &lt;code&gt;src/audit/scorer.ts&lt;/code&gt; uses the &lt;code&gt;inContext&lt;/code&gt; flag directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scoreRetrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RagChunk&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="nx"&gt;SubScore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inContext&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;used&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;precision&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;finding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;used&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; chunks used`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`Reduce TOP_K &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;→&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; (only &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;used&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; chunks reached LLM)`&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If 3 of 10 chunks appear in the LLM prompt, precision = 30. The recommendation fires automatically: &lt;code&gt;Reduce TOP_K 10→3&lt;/code&gt;. Precision contributes 40% of the overall score, and a 30 on precision alone caps your overall score at 43, even if efficiency, redundancy, and coverage are perfect.&lt;/p&gt;

&lt;p&gt;This is the most common cause of FAIL scores: teams set &lt;code&gt;TOP_K=10&lt;/code&gt; during early experimentation and never reduce it. Ten chunks get retrieved. Three reach the LLM. The other seven waste token budget and push the efficiency score down too.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--verbose&lt;/code&gt; flag makes this explicit. Each sub-score prints with its finding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ✗  precision    30/100  3/10 chunks used
   ✗  efficiency   45/100  55% tokens wasted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the Recommendations section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Recommendations
   → Reduce TOP_K 10→3 (only 3 chunks reached LLM)
   → 55% of retrieved tokens never reached the LLM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When precision reads 0% unexpectedly
&lt;/h3&gt;

&lt;p&gt;If your trace has no LLM spans — for example, you're testing your retriever in isolation — &lt;code&gt;llmPrompts&lt;/code&gt; will be empty and &lt;code&gt;assembleContext&lt;/code&gt; returns all chunks unchanged with &lt;code&gt;inContext: false&lt;/code&gt;. In that case, &lt;code&gt;scoreRetrieval&lt;/code&gt; sees zero used chunks over a non-zero total, and precision reads 0.&lt;/p&gt;

&lt;p&gt;If your trace has no chunks at all, &lt;code&gt;scoreRetrieval&lt;/code&gt; short-circuits to a score of 100 with the finding &lt;code&gt;no chunks&lt;/code&gt; — the assumption being that a trace with no retrieved chunks represents a non-retrieval query that shouldn't be penalized.&lt;/p&gt;
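
&lt;p&gt;The &lt;code&gt;scoreRetrieval&lt;/code&gt; excerpt shown earlier does not include that short-circuit, and without it an empty array would divide by zero and produce &lt;code&gt;NaN&lt;/code&gt;. A minimal sketch of how the guard might look, using the hypothetical names &lt;code&gt;precisionWithGuard&lt;/code&gt;, &lt;code&gt;ScoredChunk&lt;/code&gt;, and &lt;code&gt;PrecisionResult&lt;/code&gt; and a reduced return shape:&lt;/p&gt;

```typescript
// Reduced, hypothetical shapes; the real SubScore has more fields.
interface ScoredChunk {
  inContext?: boolean;
}

interface PrecisionResult {
  score: number;
  finding: string;
}

function precisionWithGuard(chunks: ScoredChunk[]): PrecisionResult {
  // No retrieved chunks: treat as a non-retrieval query, do not penalize.
  if (chunks.length === 0) {
    return { score: 100, finding: 'no chunks' };
  }
  const used = chunks.filter((c) => c.inContext === true).length;
  const score = Math.round((used / chunks.length) * 100);
  return { score, finding: `${used}/${chunks.length} chunks used` };
}

// Three of ten chunks reached the LLM.
const tenChunks = [true, true, true, false, false, false, false, false, false, false]
  .map((inContext) => ({ inContext }));

console.log(precisionWithGuard([]));           // { score: 100, finding: 'no chunks' }
console.log(precisionWithGuard([{}, {}, {}])); // { score: 0, finding: '0/3 chunks used' }
console.log(precisionWithGuard(tenChunks));    // { score: 30, finding: '3/10 chunks used' }
```

&lt;p&gt;The middle call mirrors the retriever-only case: chunks exist but none carry &lt;code&gt;inContext: true&lt;/code&gt;, so precision reads 0.&lt;/p&gt;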




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;RAGScope's precision score is only meaningful because &lt;code&gt;assembleContext&lt;/code&gt; solves the hardest observability problem in RAG pipelines: figuring out which retrieved chunks actually reached the model. It does this by checking chunk content against LLM prompt strings — no extra instrumentation, no special attributes, no embeddings.&lt;/p&gt;

&lt;p&gt;The implication for your setup: include chunk content in your RETRIEVER spans. Without it, &lt;code&gt;assembleContext&lt;/code&gt; cannot match, precision stays at zero, and the most impactful metric in your audit is blind. With it, you get the exact number that tells you whether your &lt;code&gt;TOP_K&lt;/code&gt; setting is costing you context budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; &lt;a href="https://github.com/Sidd27/ragscope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/ragscope" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OTel has no "in context" attribute — RAGScope determines LLM context inclusion by checking if chunk content is a substring of the LLM span's prompt string&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;assembleContext&lt;/code&gt; in &lt;code&gt;src/enrichment/pipeline.ts&lt;/code&gt; performs this matching; &lt;code&gt;contextPosition&lt;/code&gt; tracks relative order among in-context chunks (by retrieval rank, not prompt position)&lt;/li&gt;
&lt;li&gt;Precision is 40% of the overall score — a low precision score is the most common cause of FAIL labels&lt;/li&gt;
&lt;li&gt;If chunk content is missing from your RETRIEVER spans, precision will read 0%; include content in your instrumentation to get accurate scores&lt;/li&gt;
&lt;li&gt;The automatic recommendation (&lt;code&gt;Reduce TOP_K N→M&lt;/code&gt;) fires when precision &amp;lt; 60%, giving a concrete action to take immediately&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>opensource</category>
      <category>observability</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
