DEV Community: Siddharth Pandey

The Right Way to Start Claude Code on an AWS Project

Siddharth Pandey — Tue, 14 Jul 2026 09:40:36 +0000

You know the drill for adding an MCP server to a project: dig the exact command string out of the docs, hand-write a .mcp.json with an absolute path you'll typo once, restart the editor, and discover no tools showed up because the server expected a config file you haven't created yet. Plenty of MCP servers lose their would-be users somewhere inside that loop.

Infrawise collapses the whole loop into one command. It's an open-source tool (npm) that statically analyzes your codebase, AWS infrastructure, and database schemas, then exposes that context to AI coding assistants over MCP — so Claude Code knows your actual partition keys, GSIs, and indexes instead of guessing from source files. This post is about the part that usually kills tools like this before they deliver any value: setup.

Section 1: One command, four steps

npm install -g infrawise   # or skip install and use npx
cd your-project
infrawise start --claude

start does four things, in order:

1. Probes your environment. If there's no infrawise.yaml in the project, it generates one. It reads AWS_PROFILE if set; otherwise it looks at your configured AWS profiles — one profile means zero questions, several means one prompt asking which to use. That's the entire interview. (If you want the full guided wizard instead, infrawise start --interactive runs it.)

2. Runs the analysis. It scans your AWS services, database schemas, and codebase, builds a graph of services, tables, indexes, and query patterns, and runs rule-based analyzers over it. No LLM is involved in this step — extraction and analysis are deterministic, so the same infrastructure always produces the same graph.

3. Writes .mcp.json to your project root. This is the file you'd otherwise write by hand:

{
  "mcpServers": {
    "infrawise": {
      "command": "infrawise",
      "args": [
        "serve",
        "--stdio",
        "--config",
        "/absolute/path/to/infrawise.yaml"
      ]
    }
  }
}

4. Opens Claude Code. Claude Code reads .mcp.json automatically and starts the session with all 21 infrawise tools available — schema lookups, per-function analysis, GSI suggestions, queue and topic details, the lot.

If the claude CLI isn't installed, start tells you exactly that and where to get it, instead of failing silently.
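To make step 3 concrete, here is a minimal sketch of building that file's contents in code. The helper name buildMcpConfig and the POSIX-style absolute-path check are my own assumptions for illustration, not how infrawise implements it; only the resulting JSON shape comes from the example above.

```typescript
// Shape of the project-level MCP config shown in step 3.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper (not part of infrawise): builds the config object.
// The path check is POSIX-only and exists to catch the classic mistake of
// writing a relative path that resolves differently once the editor spawns
// the server.
function buildMcpConfig(configPath: string): McpConfig {
  if (configPath.charAt(0) !== "/") {
    throw new Error(`expected an absolute path, got: ${configPath}`);
  }
  return {
    mcpServers: {
      infrawise: {
        command: "infrawise",
        args: ["serve", "--stdio", "--config", configPath],
      },
    },
  };
}

// Serialized form, ready to be written to .mcp.json.
const mcpJson = JSON.stringify(
  buildMcpConfig("/absolute/path/to/infrawise.yaml"),
  null,
  2,
);
```

The point of the sketch is the failure mode it guards against: the one field people get wrong by hand is the config path, so that is the only thing validated.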

Section 2: Day two — you never type `infrawise` again

This is the part that matters more than the first run. .mcp.json doesn't point at a long-running server you have to remember to start. It tells the editor how to spawn one: every time you launch Claude Code in that project, the editor itself runs infrawise serve --stdio as a child process. There is no port to keep free, no background daemon to babysit, no "is the server running?" debugging session.

So the second-day workflow is:

claude

That's it. No infrawise command anywhere.

Two mechanisms keep the context from going stale underneath you:

A 24-hour analysis cache. The analysis from start is cached. When the editor spawns the server and the cache is fresh, the session begins instantly with the existing graph. Once the cache is older than 24 hours, the next session start refreshes it. The freshness is also visible to the assistant itself: get_infra_overview returns an analyzedAt timestamp, an ageSeconds value, and a stale flag, so Claude can tell you — or decide on its own — when the picture it has is old. (This came out of a reader question on an earlier post about deterministic analysis; it shipped as a proper feature.)

A file watcher inside the session. While the server is running, it watches the repository for file changes and re-runs code analysis on save. Write a new function that calls DynamoDB.scan() mid-session, and the code graph reflects it — you don't need to restart anything for the assistant to see what you just wrote.

Infrastructure changes are the one thing that won't propagate automatically mid-session — if you just added a table or changed a GSI in AWS, run infrawise analyze to force a full re-scan rather than waiting out the cache. And if the config itself has drifted from reality (new AWS account, different profile), infrawise start --rediscover deletes infrawise.yaml and the .infrawise/ cache directory and rebuilds both from scratch.
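The freshness contract is simple enough to sketch. The field names analyzedAt, ageSeconds, and stale are the ones get_infra_overview is described as returning; the helper computeFreshness and the exact comparison are my assumptions about how such a check could be written, not the tool's actual code.

```typescript
// Field names mirror the ones attributed to get_infra_overview above.
interface Freshness {
  analyzedAt: string; // ISO-8601 timestamp of the cached analysis
  ageSeconds: number;
  stale: boolean;
}

const CACHE_TTL_SECONDS = 24 * 60 * 60; // the 24-hour window described above

// Hypothetical helper: derives the freshness fields from the analysis time.
// Assumes "older than 24 hours" means strictly greater than the TTL.
function computeFreshness(analyzedAt: Date, now: Date): Freshness {
  const elapsedMs = now.getTime() - analyzedAt.getTime();
  const ageSeconds = Math.max(0, Math.floor(elapsedMs / 1000));
  return {
    analyzedAt: analyzedAt.toISOString(),
    ageSeconds,
    stale: ageSeconds > CACHE_TTL_SECONDS,
  };
}
```

An assistant that sees stale: true can suggest running infrawise analyze before trusting table or index details.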

Section 3: Not a Claude Code shop? Same command, different flag

The --claude flag is one of three editor targets, and none of them changes what gets analyzed — only where the MCP config lands:

infrawise start --cursor writes .cursor/mcp.json and opens Cursor.
infrawise start --vscode writes into .vscode/mcp.json and opens VS Code. This one merges rather than overwrites: if you already have other MCP servers registered in that file, they survive. Infrawise adds its own entry alongside them.
infrawise start with no flag writes .mcp.json, prints the launch command for each editor, and exits — for any other MCP-capable editor, point it at infrawise serve --stdio --config /path/to/infrawise.yaml.
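The "merges rather than overwrites" behavior for VS Code is worth a quick illustration. This is a sketch of the idea only: mergeServer is a hypothetical helper, and it works on the map of server entries without assuming what the surrounding file looks like for any particular editor.

```typescript
interface ServerEntry {
  command: string;
  args: string[];
}

type ServerMap = Record<string, ServerEntry>;

// Hypothetical merge step: keeps every server already registered and adds or
// replaces only the entry under `name`. The input map is not mutated.
function mergeServer(existing: ServerMap, name: string, entry: ServerEntry): ServerMap {
  return { ...existing, [name]: entry };
}

// Example: a project that already had another MCP server registered.
// "some-other-server" is a made-up name for illustration.
const before: ServerMap = {
  "some-other-server": { command: "other", args: ["--stdio"] },
};
const after = mergeServer(before, "infrawise", {
  command: "infrawise",
  args: ["serve", "--stdio", "--config", "/absolute/path/to/infrawise.yaml"],
});
```

An overwrite would have dropped some-other-server; the merge leaves it in place next to the new entry.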

And for the teammates who don't use an AI editor at all, the same analysis runs as a CI gate:

infrawise check --fail-on high

check runs a fresh analysis and exits non-zero when any finding reaches the threshold severity (high is the default; medium and low tighten it). A full-table scan on a production DynamoDB table fails the build whether or not anyone on the team has ever opened Claude Code.
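The threshold logic can be sketched in a few lines. The Finding shape and the exit code value of 1 are assumptions on my part (the text above only promises a non-zero exit); the ordering of low, medium, and high is the obvious one.

```typescript
type Severity = "low" | "medium" | "high";

// Assumed shape of a finding; only the severity matters for the gate.
interface Finding {
  id: string;
  severity: Severity;
}

const SEVERITY_RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

// Returns 1 when any finding is at or above the threshold, else 0.
// `failOn` defaults to "high", matching the default described above.
function exitCodeFor(findings: Finding[], failOn: Severity = "high"): number {
  const limit = SEVERITY_RANK[failOn];
  return findings.some((f) => SEVERITY_RANK[f.severity] >= limit) ? 1 : 0;
}
```

Lowering the threshold to medium or low makes the gate stricter, because more findings clear the bar.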

Section 4: What the assistant actually does with it

The payoff for the setup being this short is that the context is there before you need it. Without it, an AI assistant reading only your source files will happily suggest a .scan() on a table with 50 million rows, or recommend adding a GSI you already have. With the MCP tools connected, it can call get_table_schema for the exact columns and keys, analyze_function for a Lambda's real trigger event shape, or suggest_gsi for a ready-to-use index definition matched to your table's billing mode — before writing the query, not after you've reviewed it.

I've written about those analysis capabilities in earlier posts; the point of this one is narrower. A context tool only works if it's actually connected, and "actually connected" has to cost less than the problem it solves. Pasting a schema into a prompt costs thirty seconds, every session, forever. infrawise start --claude costs one command, once.

Conclusion

The gap between "this MCP server exists" and "this MCP server is running in my editor" is where most infrastructure context tools die. Infrawise's answer is to make the editor own the server lifecycle: one start command probes, analyzes, writes the editor config, and launches; from then on the editor spawns infrawise serve --stdio itself, a 24-hour cache keeps session starts instant, and a file watcher keeps the code graph current while you work.

Try it on a real project — the first start on an actual AWS account is where it gets interesting: GitHub · npm

Key Takeaways

infrawise start --claude is the entire setup: probe environment, generate infrawise.yaml, analyze, write .mcp.json, open Claude Code with 21 MCP tools.
After the first run, just launch your editor — it spawns infrawise serve --stdio from .mcp.json on its own. No daemon, no port, no second command.
Analysis is cached for 24 hours and refreshed at session start when stale; the assistant can check freshness itself via get_infra_overview.
File changes are picked up mid-session by a watcher; infrastructure changes need infrawise analyze, and --rediscover rebuilds config from scratch.
--cursor and --vscode target other editors (the VS Code writer merges with existing MCP servers), and infrawise check --fail-on high gates CI with the same analysis.

Why SNS Silently Drops Your Messages and How to Catch It Before You Ship

Siddharth Pandey — Sat, 11 Jul 2026 08:53:46 +0000

Your checkout service publishes an OrderRefunded event to an SNS topic. The publish call returns a MessageId. No exception, no retry, nothing in the dead-letter queue. Three days later a customer emails asking where their refund is, and you discover the refund Lambda was never invoked.

The message wasn't lost. It was filtered. One of the topic's subscriptions has a filter policy that requires an eventType message attribute, and your publish call didn't include it. SNS did exactly what it was configured to do: it evaluated the filter, found no match, and skipped delivery for that subscription. From the publisher's side, everything looks like success.

This is one of the nastiest failure modes in event-driven AWS architectures, because every tool you'd normally reach for reports green. This post walks through why it happens, why neither your code review nor your AI coding assistant can catch it from source code alone, and how to make the contract visible at coding time instead of incident time.

The contract nobody wrote down

SNS filter policies live on the subscription, not the topic. A subscriber says "only deliver messages where the eventType attribute is one of these values":

{
  "eventType": ["order.refunded", "order.cancelled"]
}

For that subscription to ever receive anything, the publisher must include the attribute:

await sns.send(
  new PublishCommand({
    TopicArn: process.env.ORDER_EVENTS_TOPIC,
    Message: JSON.stringify(refund),
    MessageAttributes: {
      eventType: { DataType: 'String', StringValue: 'order.refunded' },
    },
  }),
);

Omit MessageAttributes and the publish still succeeds — SNS accepts the message and returns a MessageId regardless of whether any subscription matches. The message simply isn't delivered to the filtered subscription.
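To see why the omission is silent, it helps to model the filter check itself. The sketch below is a deliberately simplified stand-in for what SNS does: it handles only exact string matching on message attributes, while real filter policies also support operators such as prefix, numeric, and anything-but matching. The function name matchesPolicy is mine.

```typescript
// Simplified model: each policy key maps to a list of allowed string values.
type FilterPolicy = Record<string, string[]>;
type MessageAttributes = Record<string, { DataType: string; StringValue: string }>;

// A subscription receives the message only if every policy key is present
// on the message and its value is one of the allowed values for that key.
function matchesPolicy(policy: FilterPolicy, attrs: MessageAttributes): boolean {
  return Object.keys(policy).every((key) => {
    const attr = attrs[key];
    return attr !== undefined && policy[key].indexOf(attr.StringValue) !== -1;
  });
}

const refundPolicy: FilterPolicy = {
  eventType: ["order.refunded", "order.cancelled"],
};

// Publish without attributes: no match, so no delivery to this subscription.
const withoutAttributes = matchesPolicy(refundPolicy, {});

// Publish with the attribute: match, so the message is delivered.
const withAttributes = matchesPolicy(refundPolicy, {
  eventType: { DataType: "String", StringValue: "order.refunded" },
});
```

Notice that the function returns a boolean and never throws. That is the whole problem in miniature: a non-match is a normal outcome, not an error.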

Three things make this failure quiet:

No error surface. A filter mismatch is not a delivery failure. The subscription's redrive policy never kicks in, so the message never lands in a DLQ. There is nothing to retry and nothing to alert on by default.
The only trace is a metric. SNS increments a NumberOfNotificationsFilteredOut CloudWatch metric, but unless you already suspect filtering, nobody looks at it — and by the time you do, you're debugging in production.
Tests pass. Unit tests mock the SNS client and assert that PublishCommand was called. The mock has no filter policy. The gap between your code and the subscription's configuration is exactly the part the test can't see.

There is a second variant of the same trap: filter policies can also match against the message body (FilterPolicyScope: MessageBody) instead of message attributes. Then the required keys must appear in your JSON payload itself. Same silence, different location.

Why your AI assistant writes the broken version

Ask an AI coding assistant to "publish an order refunded event to the order-events topic" and it will produce a perfectly reasonable PublishCommand call — topic ARN from an environment variable, JSON.stringify on the payload, maybe a comment. What it will almost never produce is the MessageAttributes block with exactly the keys your subscriptions filter on.

It can't. The filter policy is not in your repository. It's a JSON document attached to a subscription in your AWS account, set by whoever wired up the consumer — possibly you, six months ago. The assistant reads your source files, sees other publish calls (which may themselves be missing attributes), and generates code that matches the pattern it found. If the pattern is wrong, the wrongness propagates.

The manual fix is miserable: open the AWS console, navigate to the topic, open each subscription one by one, read each filter policy, and paste the required attribute names into your prompt. Every session. For every topic. This is the copy-paste loop that breaks flow — and if you skip it once, you ship the silent drop.

Pulling the contract into the coding session

This is the problem infrawise is built for: it extracts your live infrastructure into a graph and serves it to your AI assistant over MCP, so the assistant queries facts instead of guessing.

cd your-project
npx infrawise start --claude

That analyzes your AWS account and codebase, writes .mcp.json, and opens Claude Code with 21 MCP tools connected. For SNS specifically, the extraction is read-only and direct: infrawise lists your topics, then for every confirmed subscription fetches its attributes, parses the FilterPolicy JSON, and records which attribute keys the policy requires and whether the policy scope is MessageAttributes or MessageBody.

The result is exposed through a tool called get_topic_details. For each topic it returns the subscription count, encryption status, and a filterPolicies array:

{
  "name": "order-events",
  "provider": "aws",
  "subscriptionCount": 3,
  "encrypted": true,
  "filterPolicies": [
    {
      "subscriptionArn": "arn:aws:sns:...:order-events:a1b2...",
      "protocol": "sqs",
      "requiredAttributes": ["eventType"],
      "scope": "MessageAttributes"
    }
  ]
}

Now the workflow changes. When you ask your assistant to write a publish call, it calls get_topic_details first, sees that a subscription on order-events filters on eventType, and writes the MessageAttributes block into the code on the first attempt. The contract that used to live invisibly in a subscription's configuration is now part of the context every generated publish call is checked against.

The tool description itself tells the assistant when to use it — before writing any SNS publish code — so you don't have to remember to prompt for it. And because infrawise also scans your application code (an AST pass that recognizes PublishCommand, PublishBatchCommand, and SDK publish calls and resolves their target topic ARNs), the graph knows which of your functions publish to which topics. Reviewing an existing publisher works the same way: the assistant can cross-reference what the function sends against what the topic's subscriptions require.
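The cross-reference step is easy to picture as code. This sketch takes the filterPolicies array in the shape shown above and a list of attribute keys a publisher sends, then reports what is missing per subscription. The function findMissingAttributes and the MissingReport type are hypothetical; in practice the assistant does this reasoning itself from the tool output.

```typescript
// Shapes follow the get_topic_details example above (subset of fields).
interface FilterPolicyInfo {
  subscriptionArn: string;
  protocol: string;
  requiredAttributes: string[];
  scope: "MessageAttributes" | "MessageBody";
}

interface TopicDetails {
  name: string;
  filterPolicies: FilterPolicyInfo[];
}

interface MissingReport {
  subscriptionArn: string;
  missing: string[];
}

// For each attribute-scoped subscription, list the required keys that the
// publisher does not send. Body-scoped policies are skipped here because
// they constrain the JSON payload, not MessageAttributes.
function findMissingAttributes(topic: TopicDetails, sentKeys: string[]): MissingReport[] {
  return topic.filterPolicies
    .filter((p) => p.scope === "MessageAttributes")
    .map((p) => ({
      subscriptionArn: p.subscriptionArn,
      missing: p.requiredAttributes.filter((key) => sentKeys.indexOf(key) === -1),
    }))
    .filter((report) => report.missing.length > 0);
}
```

An empty result means every attribute-scoped subscription could match the publish call; a non-empty one names the exact subscription that would be skipped.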

A few boundaries worth stating, because tools that read your AWS account should be explicit about them. Infrawise is read-only — the SNS extraction uses GetTopicAttributes, ListSubscriptionsByTopic, and GetSubscriptionAttributes, nothing that writes. It never reads message contents, secret values, or parameter values. And the analysis is deterministic: AST parsing and API introspection, no LLM deciding what your infrastructure looks like. The LLM is only a consumer of the extracted context.

Conclusion

Silent message drops are a configuration-versus-code mismatch, and those never show up in the layer you're staring at. The filter policy is correct. The publish code is correct. Only the combination is broken, and the combination exists nowhere in your repository — until you put it there.

You can do that manually every session by reading subscriptions in the console, or you can have it extracted once and served to your assistant automatically. If you're on the second option: GitHub · npm — npx infrawise start --claude and ask your assistant what the order-events topic requires.

Key Takeaways

An SNS publish that misses a subscription's filter policy succeeds silently: MessageId returned, no DLQ entry, no error. The redrive policy only covers delivery failures, and a filter mismatch isn't one.
The publisher's contract lives on the subscription, in AWS, not in your code — so tests with mocked SNS clients and AI assistants reading source files both miss it.
Check FilterPolicyScope: policies can require message attributes or keys in the message body. The fix is different for each.
Before writing any publish call, enumerate the target topic's filter policies and include every required attribute. With infrawise, get_topic_details gives your AI assistant that list automatically.
The NumberOfNotificationsFilteredOut CloudWatch metric is your post-hoc signal, but the goal is to never need it — catch the missing attribute at coding time.

Stop Sending Your AI Assistant 40 Tables When It Only Needs 3

Siddharth Pandey — Tue, 07 Jul 2026 08:35:15 +0000

Say your service has 40 tables. You ask Claude Code to fix a bug in checkout — a function that touches exactly three of them: orders, payments, inventory_reservations. If your MCP server hands the model your whole schema graph on every call just to answer that, you've spent a few thousand tokens of context on 37 tables nobody asked about, before the model has written a single line of code.

Multiply that by every tool call, every session, every developer on the team, and "just give it the schema" turns into a real line item — slower responses, a noisier context window, and a model more likely to get distracted by a campaigns table that has nothing to do with the bug you're fixing.

Infrawise's MCP tools are built around a specific answer to this: never send more schema than the task needs, and give the agent an explicit way to ask for exactly what it's missing.

The two bad defaults

Without something like this, an AI coding assistant reading your codebase has two options, and both are wrong in a different direction.

Option one: no schema at all. The model reads your source files, sees a DocumentClient.scan() call, and has no way to know that table has 50 million rows and a GSI it isn't using. It writes code that compiles and looks reasonable and is wrong the moment it touches production data — because "wrong" here isn't a syntax problem, it's a missing fact about your infrastructure that isn't in any file it can read.

Option two: dump everything, every time. Paste the full schema — every table, every column, every foreign key, every DynamoDB GSI — into the prompt so the model definitely has what it needs. This works, in the sense that the model now has the fact it's missing. It also means every single request pays for the full graph whether the task touches one table or fifteen. Context windows aren't free, and neither is the model's attention: the more irrelevant schema it has to read past, the more likely it latches onto the wrong table or a stale column name from a service you don't even own.

Infrawise's MCP server is designed around a third option: give the agent a cheap way to see what exists, then let it ask for detail only on the things it's actually going to touch.

How the lookup actually works

The server exposes 21 tools, but three of them define the whole pattern.

get_infra_overview is the entry point. It returns a compact snapshot — every table and collection by name and database type, queue and topic names, secret names, Lambda names, high-severity findings — no columns, no foreign keys, no indexes. It's meant to answer "what exists here" in a few hundred tokens, not "give me everything about it."

get_table_schema is where the actual detail lives, and it's scoped on purpose: it takes a list of 1 to 20 table or collection names and returns, per name, the columns with data types and nullability, primary keys, foreign keys (so an agent building a join knows the path without guessing), indexes, DynamoDB partition/sort keys, or a MongoDB estimated document count. Row data is never included — this is schema, not a data dump. Names are matched case-insensitively and by suffix, so asking for orders matches public.orders without the agent needing to know your schema prefix in advance. Ask for a table that doesn't exist and instead of a bare failure, you get up to five closest name matches back — useful when the agent guessed order instead of orders.

get_graph_summary is still there, and it still returns everything: every node, every edge, every finding, no filtering. Its tool description makes the intent explicit: it is the tool for when you genuinely need the full picture across services, not the one an agent should use to answer "what does the orders table look like." The default path is get_infra_overview for orientation, get_table_schema for the two or three tables actually in scope, and get_graph_summary only when the task is broad enough to need it, such as reviewing an entire service or tracing relationships across five different tables and functions at once.
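The name matching in get_table_schema is the part most worth seeing in code. What follows is my own sketch of the behavior described above, not infrawise's implementation: I assume "by suffix" means the part after the last dot, and I rank suggestions by plain edit distance. The real tool only suggests when near matches exist; this version always returns the five closest names.

```typescript
// Plain Levenshtein edit distance, used here only to rank suggestions.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface Lookup {
  requested: string;
  found: boolean;
  matches: string[];
  suggestions: string[];
}

// Case-insensitive match on the full name or on the part after the last dot.
function resolveTable(requested: string, inventory: string[]): Lookup {
  const want = requested.toLowerCase();
  const shortName = (name: string): string =>
    name.toLowerCase().split(".").pop() as string;
  const matches = inventory.filter(
    (name) => name.toLowerCase() === want || shortName(name) === want,
  );
  if (matches.length > 0) {
    return { requested, found: true, matches, suggestions: [] };
  }
  // No match: offer up to five closest names from the real inventory.
  const suggestions = inventory
    .map((name) => ({ name, distance: editDistance(want, shortName(name)) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, 5)
    .map((x) => x.name);
  return { requested, found: false, matches: [], suggestions };
}
```

With an inventory of public.orders and public.payments, asking for Orders resolves to public.orders, and asking for order comes back not found with public.orders as the top suggestion.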

What this looks like on a real task

Take the checkout bug from the top. An agent working through it calls get_infra_overview once and learns there are 40 tables across Postgres and DynamoDB, plus a queue and a couple of Lambdas — cheap, one call, no column data yet. It's now looking at orders.ts, sees a query joining on payment_id, and calls get_table_schema with ["orders", "payments"]. Back comes exactly two schemas: column types, the foreign key from orders.payment_id to payments.id, and the indexes on both tables. That's the entire fetched context for the task — two tables, not forty.

If the bug turns out to touch inventory_reservations too, the agent just adds it to the next get_table_schema call. It never had to ask for it upfront, and it never had to eat the cost of the other 37 tables it was never going to look at.

Compare that to the alternative most teams reach for without a tool like this: pasting the schema export once, keeping it in context, and hoping it doesn't drift as the schema changes. Infrawise's tables come from a fresh analysis (get_infra_overview even reports a freshness field with an age and a stale flag that flips once the cached analysis passes 24 hours), so the agent knows when to ask for a re-run instead of working from something that quietly went out of date three deploys ago.

Why this matters beyond the token count

The token savings are the visible part, but the more important effect is on accuracy. A model given forty tables' worth of columns has to do implicit filtering: figure out which of them are relevant, hold the rest as noise. Give it three tables that are actually in scope, and there's no filtering step: everything in context is something it's going to use. That's a smaller, sharper problem than "here's a schema, find what you need," and models tend to handle smaller, sharper problems more reliably.

It also composes with what AGENTS.md calls out directly for large-database use cases — a text-to-SQL or query-writing agent should call get_infra_overview once per session for the table inventory, then get_table_schema only for the tables the current query touches, and treat get_graph_summary as the tool of last resort, not the default. That's the same pattern the checkout example walks through, just named as the recommended path for exactly the case where dumping everything hurts most — a database with hundreds of tables where "just paste the schema" was never realistic in the first place.

None of this requires the developer to think about it. You don't decide when to call get_table_schema versus get_graph_summary — the agent does, because the tool descriptions say when to reach for each one. The developer experience is just: ask Claude Code to fix the bug, and it already knows which two tables matter.

Key Takeaways

Two bad defaults exist for schema context: give an AI assistant nothing (it guesses and gets it wrong) or give it everything (it pays for and gets distracted by tables it will never touch).
get_infra_overview is a compact, column-free snapshot meant for orientation — names and types, not detail.
get_table_schema fetches full column, key, and index detail for up to 20 named tables at a time, matched case-insensitively by short name, with fuzzy suggestions on a miss.
get_graph_summary still returns the full graph — it's the explicit escape hatch for cross-service work, not the tool an agent should reach for to answer a question about two tables.
The pattern scales up naturally to large databases: fetch the inventory once per session, then pull schemas only for the tables the current task actually touches.

GitHub · npm

Stop Pasting Your Database Schema Into Every AI Prompt

Siddharth Pandey — Mon, 06 Jul 2026 06:04:42 +0000

Your team builds an internal agent that answers questions like "revenue by product category last quarter." It generates SQL against a Postgres database with 240 tables. The first version does the obvious thing: dump the entire schema into the system prompt. It mostly works, and it is quietly terrible — every single question pays the token cost of 240 table definitions, the agent confuses users.name with the actual column full_name because the real signal is buried in noise, and the schema snapshot in the prompt drifts out of date the moment someone runs a migration.

The fix is not a bigger context window. It is giving the agent the same thing a human analyst has: a way to look up the tables it needs, when it needs them.

This is a use case I did not originally design Infrawise for — it started as infrastructure context for AI coding assistants — but as of v0.15.0 it handles this pattern end to end, for SQL and NoSQL databases. (GitHub · npm)

The schema dump is the problem, not the model

When a query-writing agent gets the whole schema up front, three things go wrong:

Token cost scales with your database, not your question. A 240-table schema rendered as DDL is tens of thousands of tokens. You pay that on every request, forever, even when the question touches two tables.

Irrelevant schema actively hurts accuracy. Models pick columns from tables that merely look plausible. The more near-duplicate tables in context (orders, orders_archive, orders_v2), the more confidently wrong the generated SQL gets.

The dump goes stale. Someone pastes the schema into the prompt template in March. In May a column is renamed. The agent keeps generating SQL against the March schema, and the failures look like model errors instead of what they are: stale context.

The alternative is progressive disclosure — a small inventory up front, full detail on demand. Infrawise exposes exactly that over MCP, the protocol most agent SDKs can already speak.

Inventory first, schema on demand

Infrawise runs a read-only analysis of your infrastructure — Postgres, MySQL, and MongoDB via schema introspection, DynamoDB via the AWS API — and serves the result through an MCP server. It never reads row data, only metadata.

The agent flow is two calls.

Call 1, once per session: get_infra_overview. This returns a compact inventory — every table name with its database type, plus counts and high-severity findings. For a 240-table database this is a fraction of the tokens of a full dump, because it is names, not definitions. The response also carries a freshness object with analyzedAt, ageSeconds, and a stale flag that flips after 24 hours, so the agent knows whether the analysis is current or should be refreshed with infrawise analyze. That freshness contract means you can cache the inventory for the session and trust the server to tell you when it has gone stale.

Call 2, per question: get_table_schema. When the agent decides the question needs orders and payments, it asks for exactly those (up to 20 names per call):

{
  "name": "get_table_schema",
  "arguments": { "tables": ["orders", "payments"] }
}

Names are matched case-insensitively, and short names work — orders matches public.orders, so the agent does not need to know your schema-qualification convention. The response is the full picture for just those tables:

{
  "note": "Row data is never included.",
  "tables": [
    {
      "requested": "orders",
      "found": true,
      "matches": [
        {
          "name": "public.orders",
          "databaseType": "postgres",
          "columns": [
            { "name": "id", "dataType": "integer", "nullable": false },
            { "name": "user_id", "dataType": "integer", "nullable": false },
            { "name": "status", "dataType": "character varying", "nullable": false },
            { "name": "total", "dataType": "numeric", "nullable": false }
          ],
          "primaryKeys": ["id"],
          "indexes": ["orders_pkey"]
        }
      ]
    }
  ]
}

Column names, data types, nullability, primary keys, indexes. If the agent asks for a table that does not exist — a hallucinated name, a typo — it gets found: false, and when near matches exist, up to five suggestions from the real inventory. That turns a silent wrong-table failure into a correctable one.

The prompt for any given question now contains the inventory plus two or three real schemas instead of 240 speculative ones. Costs drop, and the model stops choosing columns from tables that were never relevant.
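If you are wiring this into your own agent, the response is easy to turn into compact prompt context. Here is a sketch that renders one line per matched table from the response shape shown above. The types cover only the fields visible in that example, and renderSchemas and its output format are my own choices, not part of the tool.

```typescript
// Types cover only the fields visible in the example response above.
interface Column {
  name: string;
  dataType: string;
  nullable: boolean;
}

interface TableMatch {
  name: string;
  databaseType: string;
  columns: Column[];
  primaryKeys: string[];
  indexes: string[];
}

interface TableResult {
  requested: string;
  found: boolean;
  matches: TableMatch[];
}

interface SchemaResponse {
  note: string;
  tables: TableResult[];
}

// Renders one compact line per matched table for use in a prompt.
// Tables that were not found are skipped before `matches` is touched.
function renderSchemas(response: SchemaResponse): string[] {
  const lines: string[] = [];
  for (const table of response.tables) {
    if (!table.found) continue;
    for (const match of table.matches) {
      const cols = match.columns
        .map((c) => `${c.name} ${c.dataType}${c.nullable ? "" : " NOT NULL"}`)
        .join(", ");
      lines.push(`${match.name} (${cols}) PK(${match.primaryKeys.join(", ")})`);
    }
  }
  return lines;
}
```

For the orders example this yields a single line starting with public.orders, which is a lot cheaper to carry in context than the raw JSON.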

Foreign keys are the context your agent actually needed

Column lists get you valid syntax. Join paths get you correct queries — and join paths are exactly what schema dumps usually omit, because people paste \d output or ORM models without constraint details.

Infrawise extracts foreign keys directly from information_schema (Postgres and MySQL) during analysis, and get_table_schema returns them per table:

"foreignKeys": [
  {
    "column": "order_id",
    "referencesTable": "public.orders",
    "referencesColumn": "id"
  }
]

When the agent fetches payments and sees order_id → public.orders.id, the join is no longer a guess. Chains compose the same way: fetch the tables the question mentions, follow the referencesTable values one hop out, and the agent has the join graph for the query without ever seeing the other 230 tables.
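The "one hop out" step can be written down directly. This sketch takes the tables already fetched, each with the foreignKeys array in the shape shown above, and returns the referenced tables that still need fetching. nextHopTables is a hypothetical helper on the agent side, not a tool the server exposes.

```typescript
// Foreign key shape as shown in the get_table_schema example above.
interface ForeignKey {
  column: string;
  referencesTable: string;
  referencesColumn: string;
}

interface TableWithKeys {
  name: string;
  foreignKeys?: ForeignKey[];
}

// Returns referenced tables that have not been fetched yet (one hop out),
// without duplicates and in the order they are first encountered.
function nextHopTables(fetched: TableWithKeys[]): string[] {
  const have = fetched.map((t) => t.name);
  const next: string[] = [];
  for (const table of fetched) {
    for (const fk of table.foreignKeys || []) {
      const target = fk.referencesTable;
      if (have.indexOf(target) === -1 && next.indexOf(target) === -1) {
        next.push(target);
      }
    }
  }
  return next;
}
```

Feed the result into the next get_table_schema call and repeat until it comes back empty or the question's tables are covered.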

The same flow works for NoSQL

The tool is polymorphic across database types, because "what does my agent need to know before writing this query" differs by engine:

DynamoDB — the response carries partitionKey and sortKey, which decide whether the access pattern is a cheap Query or an expensive Scan. Index names cover the GSIs.
MongoDB — collections return their indexes and an estimated document count, so the agent can tell a 100-row lookup table from a 50M-document collection before choosing a filter strategy.

One inventory call, one schema tool, four database engines.
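For DynamoDB, the decision the key schema enables looks roughly like this. A Query requires an equality condition on the partition key, so an agent can check its planned conditions against partitionKey before writing anything. The DynamoKeys type and chooseOperation helper are illustrative assumptions; a GSI has its own key schema, and the same check would apply to each index.

```typescript
// Key fields as described for DynamoDB tables above.
interface DynamoKeys {
  partitionKey: string;
  sortKey?: string;
}

// A Query needs an equality condition on the partition key. Without one,
// reading through this key schema falls back to a Scan.
function chooseOperation(keys: DynamoKeys, equalityAttributes: string[]): "Query" | "Scan" {
  return equalityAttributes.indexOf(keys.partitionKey) !== -1 ? "Query" : "Scan";
}

// Hypothetical table keyed by customerId with createdAt as the sort key.
const ordersKeys: DynamoKeys = { partitionKey: "customerId", sortKey: "createdAt" };
const byCustomer = chooseOperation(ordersKeys, ["customerId"]);
const byStatus = chooseOperation(ordersKeys, ["status"]);
```

Here byCustomer is a Query and byStatus is a Scan, which is exactly the case where the agent should look for a GSI keyed on status before proceeding.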

Wiring it into an agent

Setup is one command in the project that has your infrawise.yaml (or lets start generate one):

npx infrawise start    # analyze + write .mcp.json for stdio-based editors/agents

For agents that speak HTTP instead of stdio:

infrawise serve        # MCP server at http://localhost:3000/mcp

Any MCP-capable agent framework can then list the tools and call them. The system prompt shrinks to a policy instead of a schema:

Call get_infra_overview once to learn what tables exist. Before writing a query, call get_table_schema with only the tables the question needs. Use the returned foreign keys for joins. Never guess column names.
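If your agent framework doesn't wrap MCP for you, the two calls are ordinary JSON-RPC requests using the protocol's tools/call method. The sketch below only builds the request bodies; the toolCall helper is mine, and it leaves out the transport and the session initialization that an MCP client performs before calling tools.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Call 1, once per session: the inventory.
const overviewRequest = toolCall(1, "get_infra_overview", {});

// Call 2, per question: only the tables the question needs.
const schemaRequest = toolCall(2, "get_table_schema", {
  tables: ["orders", "payments"],
});
```

The params object of the second request is the same name and arguments pair shown in the get_table_schema example earlier.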

Two cautions from building this. First, the server has no authentication — it is designed to run locally or inside a private network, next to the agent. Keep it off the public internet; it exposes schema metadata (never data, never secrets, but your table names are your business). Second, the analysis is a snapshot: honor the stale flag rather than assuming the graph is live.

Conclusion

"Text-to-SQL accuracy" problems are very often context problems wearing a trench coat. An agent that receives 240 table definitions is being asked to find a needle in a haystack it paid for by the token. An agent that can look up exactly the three tables a question touches — with types, primary keys, and foreign-key join paths — is doing what a competent analyst does with a schema browser open.

Infrawise was built to stop AI coding assistants from guessing about infrastructure. It turns out the same deterministic extraction, served over MCP, is a schema context provider for any query-writing agent. That flow shipped in v0.15.0: GitHub · npm.

Key Takeaways

Full-schema prompts scale cost with database size, not question size — and the irrelevant 95% of the schema makes column hallucination worse, not better.
Use progressive disclosure: get_infra_overview once per session for the table inventory, get_table_schema for only the tables each question needs.
Foreign keys are the highest-value schema context for SQL generation — they turn joins from guesses into lookups. Infrawise extracts them from information_schema automatically.
Unknown table names return suggestions instead of silent failure, catching hallucinated tables before they become broken SQL.
Respect the built-in freshness contract: the overview reports analysis age and flips a stale flag at 24 hours, so agents know when to trigger a re-analysis instead of trusting old schema.

Your SQS Queue Is Redelivering Messages Your Lambda Is Still Processing

Siddharth Pandey — Sun, 05 Jul 2026 09:41:35 +0000

Your order-processing Lambda starts sending duplicate confirmation emails. Not always — maybe one order in twenty. CloudWatch shows more invocations than messages published. The function code hasn't changed in weeks. What changed is that someone added a fraud check that pushed processing time from 25 seconds to around 45, and your SQS queue is still running the default 30-second visibility timeout.

That combination is the whole bug. When a Lambda pulls a message from SQS, the message isn't deleted — it's hidden for the duration of the visibility timeout. If the function is still working when that window closes, SQS assumes the consumer died and hands the same message to another invocation. Now two Lambdas are processing the same order, both will "succeed," and both will send the email. Nothing errors. Nothing retries. There is no log line that says "this message was delivered twice because your timeouts are misconfigured."

Infrawise (npm) flags this exact mismatch as a high-severity finding before it costs you an afternoon of staring at idempotency-free handler code. This post walks through why the bug is so hard to see, how the detection works, and how to keep an AI assistant from reintroducing it.

Why you never catch this one yourself

Three things make this misconfiguration nearly invisible:

It passes every test. In local tests and staging, your handler processes a synthetic message in two seconds. The 30-second visibility window never comes close to expiring. The bug only exists under production conditions — real payload sizes, real downstream latency, cold starts stacking on top of slow dependencies.

The defaults set the trap. SQS queues default to a 30-second visibility timeout. Lambda functions routinely get their timeout bumped to 60, 120, or 900 seconds as they grow. Nobody bumps the queue at the same time, because the two settings live in different consoles, different IaC resources, and usually different pull requests.

The failure signature points elsewhere. Duplicate processing looks like an application bug. You'll audit your handler for accidental double-sends, check whether the producer published twice, and read DynamoDB conditional-write docs before anyone thinks to compare two timeout values across two services.

The fix is one line of IaC. Finding out you need it is the expensive part.

How infrawise detects the mismatch

Infrawise builds a graph of your actual AWS account and runs rule-based analyzers over it — no LLM involved in the analysis, so a finding either fires or it doesn't (deterministic by design).

For this check, two extractions matter:

Queue attributes. For every queue returned by ListQueues, infrawise calls GetQueueAttributes and records VisibilityTimeout alongside the redrive policy, encryption status, and approximate message counts. A queue node in the graph carries its visibilityTimeoutSec.

Event source mappings. Infrawise paginates through ListEventSourceMappings and attaches each mapping to its Lambda. Every SQS-type mapping becomes a triggers edge in the graph: queue:aws:order-events → lambda:aws:process-order. Disabled mappings are skipped — a queue that used to feed a Lambda doesn't generate noise.

Then the VisibilityTimeoutMismatchAnalyzer walks the graph: for each queue that actually triggers a Lambda, it compares the queue's visibility timeout against that function's configured timeout. If the visibility timeout is smaller, you get a high-severity finding.

Two details keep this precise rather than noisy:

Only wired-up queues are checked. A queue with no active event source mapping is ignored. The analyzer isn't pattern-matching on names or guessing at architecture — it follows the same edge SQS itself uses to deliver messages.
The comparison uses real values from both sides. The Lambda timeout comes from the function configuration; the visibility timeout comes from the live queue attributes. If your Terraform says one thing and someone changed the queue in the console, infrawise reports what's actually deployed.

Running infrawise analyze against the scenario above prints:

  1.  HIGH   Queue "order-events" visibility timeout (30s) is less than Lambda "process-order" timeout (120s)
       If the Lambda takes longer than the visibility timeout, SQS will re-deliver the message to another consumer while the original invocation is still running, causing duplicate processing.
       → Set the visibility timeout for "order-events" to at least 720s (6× the Lambda timeout of 120s), per AWS best practice.

The 6× multiplier follows AWS's own guidance for Lambda event source mappings: the extra headroom leaves time for Lambda to retry a batch when the function is throttled, not just for a single invocation.
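The comparison described above can be sketched in a few lines. This is an illustration of the rule, not infrawise's actual analyzer code: the `QueueNode`, `LambdaNode`, and `TriggerEdge` types and the function name are assumptions, and only `visibilityTimeoutSec` is a field name the post mentions.

```typescript
// Simplified stand-ins for graph nodes; shapes are illustrative.
interface QueueNode { name: string; visibilityTimeoutSec: number; }
interface LambdaNode { name: string; timeoutSec: number; }
interface TriggerEdge { queue: QueueNode; lambda: LambdaNode; enabled: boolean; }

interface TimeoutFinding {
  severity: "high";
  message: string;
  recommendedTimeoutSec: number;
}

const MULTIPLIER = 6; // AWS guidance cited in the post

function checkVisibilityTimeouts(edges: TriggerEdge[]): TimeoutFinding[] {
  const findings: TimeoutFinding[] = [];
  for (const { queue, lambda, enabled } of edges) {
    if (!enabled) continue; // disabled mappings generate no findings
    if (queue.visibilityTimeoutSec < lambda.timeoutSec) {
      findings.push({
        severity: "high",
        message:
          `Queue "${queue.name}" visibility timeout (${queue.visibilityTimeoutSec}s) ` +
          `is less than Lambda "${lambda.name}" timeout (${lambda.timeoutSec}s)`,
        recommendedTimeoutSec: lambda.timeoutSec * MULTIPLIER,
      });
    }
  }
  return findings;
}
```

Fed the order-events queue (30s) and process-order function (120s) from the output above, the sketch yields one finding with a recommended timeout of 720 seconds.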

The fix — and keeping it fixed when AI writes your infra

The immediate fix is mechanical. In CDK:

const orderEvents = new sqs.Queue(this, 'OrderEvents', {
  visibilityTimeout: cdk.Duration.seconds(720), // 6× the consumer's 120s timeout
});

The longer-term problem is that this class of bug gets reintroduced constantly — and increasingly by AI assistants. Ask an assistant to "add an SQS queue that triggers the order processor" and it will happily emit a queue with default settings wired to a 120-second function. The code is syntactically perfect. It deploys. It has the bug from paragraph one built in.

This is where infrawise's MCP server changes the workflow. Run once:

infrawise start --claude

It scans your account, writes .mcp.json to the project root, and opens Claude Code with the analysis available as tools. From then on the assistant can call get_queue_details, which returns every queue with its visibilityTimeoutSec, DLQ presence, encryption, FIFO flag, and message counts — plus any findings attached to that queue. The tool description explicitly tells the model to verify visibility timeout against Lambda timeout when reviewing messaging architecture, so an assistant asked to touch queue infrastructure checks the real numbers instead of emitting defaults.

Concretely, the before/after looks like this:

Before: "add a queue for order events" → assistant generates new sqs.Queue(...) with no visibility timeout → default 30s ships → duplicates appear weeks later when processing slows down.
After: the assistant calls get_queue_details and get_lambda_overview, sees the consumer's 120s timeout, and generates the queue with visibilityTimeout: Duration.seconds(720) — citing the existing high-severity finding if one is already live.

Two adjacent findings tend to show up in the same report, and they compound. If the mismatched queue also has no dead-letter queue — another high-severity check — your duplicate-processing problem coexists with silent message loss after retries are exhausted. And note the fix's flip side: a properly long visibility timeout means a genuinely failed message stays hidden for that full window before retry, which is exactly why the DLQ check matters alongside this one. Duplicate delivery can still happen in rare cases even with correct timeouts (SQS standard queues are at-least-once), so idempotent handlers remain good practice — but there's a difference between designing for a rare edge case and misconfiguring your way into hitting it on every slow invocation.
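For the idempotency point, a minimal sketch of a handler that tolerates duplicate delivery. Everything here is illustrative: the `OrderMessage` shape, the `handleOrder` name, and the in-memory `Set`, which stands in for a shared durable store with an atomic insert-if-absent operation (for example, a conditional write).

```typescript
interface OrderMessage { messageId: string; orderId: string; }

// In-memory stand-in for a durable dedupe store (assumption for illustration).
// Concurrent Lambda invocations do not share memory, so a real handler needs
// shared storage here.
const processed = new Set<string>();
const sentEmails: string[] = [];

function handleOrder(msg: OrderMessage): "processed" | "skipped" {
  if (processed.has(msg.messageId)) return "skipped"; // duplicate delivery
  processed.add(msg.messageId); // claim the message before any side effect
  sentEmails.push(msg.orderId); // stand-in for sending the confirmation email
  return "processed";
}
```

Claiming before the side effect trades a possible missed email (if the handler crashes after the claim) for never sending two. Keying on the message ID only covers redelivery; a producer that publishes the same order twice needs a business key such as the order ID.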

Analysis results are cached for 24 hours, and the MCP server refreshes stale analysis at session start, so the numbers your assistant reads track what's actually deployed.

Key takeaways

A Lambda timeout longer than its queue's visibility timeout sets up duplicate processing. SQS redelivers any message whose consumer is still running when the visibility window closes, so every invocation that outlasts the window gets a second delivery.
The defaults create the bug: queues start at 30 seconds, Lambda timeouts grow over time, and the two values live in different resources that rarely change in the same PR.
Set visibility timeout to 6× the consumer Lambda's timeout, per AWS guidance — and pair every triggered queue with a dead-letter queue so retries can't silently discard messages.
Check deployed values, not IaC intent. Infrawise compares the live queue attribute against the live function configuration, so console drift gets caught too.
Give your AI assistant the real numbers. With infrawise's MCP tools connected, an assistant generating queue infrastructure reads actual timeouts instead of shipping defaults.

Try it against your own account — or against a free LocalStack sandbox if you want to see the findings without touching real AWS: GitHub · npm.

Block AI-Generated Infrastructure Mistakes in CI Before They Hit Production

Siddharth Pandey — Sun, 28 Jun 2026 17:57:30 +0000

Last Tuesday your team merged a PR with 40 lines of clean TypeScript. Code review passed — the function was readable, typed correctly, and had a unit test. Twenty minutes after deploy, CloudWatch alerted: your Orders table was being fully scanned on every request. The merged function called .scan() without a partition key filter. Claude Code wrote it; nobody caught it because nobody — not the reviewer, not the tests, not the linter — had any way to know that Orders has 8 million items.

This is the gap infrawise check fills. It's a CI step that reads your actual DynamoDB schemas, PostgreSQL indexes, and query patterns, then fails the build when AI-generated code introduces anti-patterns against your real infrastructure.

The Gap Between Code Review and Infrastructure Reality

Static analysis tools like ESLint, TypeScript, and Semgrep analyze the code. They can't tell you that listAllOrders() is doing a full table scan, or that your PostgreSQL users table has no indexes and the new query will degrade from milliseconds to seconds as data grows.

AI assistants compound this. They write syntactically correct code that compiles and passes tests — but they're working from source files, not live infrastructure. They don't know your table's partition key distribution. They don't know you already have a GSI on status. They generate code against an imagined infrastructure and get it wrong in ways that only surface at scale.

The consequence isn't just performance. A DynamoDB full scan in a high-traffic Lambda consumes read capacity proportional to table size on every invocation. A missing index makes query time grow linearly with row count. Unit tests won't catch either of these; they show up in production.

infrawise check — a Build Step That Knows Your Tables

infrawise check runs a fresh analysis of your codebase and cloud infrastructure, then sets exit code 1 when blocking findings exist. That exit code is all a CI pipeline needs.

infrawise check

Actual output on violations:

  Blocking findings  2 at or above high

  1.  HIGH   Full table scan detected on DynamoDB table "Orders"
             The table "Orders" is being scanned without any filter, which reads every item.
             → Replace Scan with a Query operation using a partition key or GSI.

  2.  HIGH   Full table scan detected on DynamoDB table "Sessions"
             The table "Sessions" is being scanned without any filter. Called from: cleanupExpiredSessions
             → Replace Scan with a Query operation using a partition key or GSI.

  ✗ Check failed
    2 high+ finding(s) must be resolved before deploy

Findings name the exact table and, when the output includes one, the calling function (cleanupExpiredSessions in the second finding). The recommendation isn't generic: it points to the specific operation to replace.

Add it to GitHub Actions in one step:

- name: Infrastructure check
  run: infrawise check --fail-on high

--fail-on high means only HIGH findings block the build. Medium and low appear in the log but don't fail the step. Use --fail-on medium for stricter enforcement.

Available options for --fail-on: high (default), medium, low.
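The threshold logic is simple enough to sketch. This is an assumption about how a severity gate like this typically works, not infrawise's source; the `CheckFinding` type and `exitCode` function are hypothetical names.

```typescript
type Severity = "low" | "medium" | "high";

// Numeric ordering so severities can be compared against the threshold.
const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

interface CheckFinding { severity: Severity; title: string; }

// Returns the process exit code: 1 when any finding is at or above the
// threshold, 0 otherwise. The default mirrors the documented default, high.
function exitCode(findings: CheckFinding[], failOn: Severity = "high"): 0 | 1 {
  const blocking = findings.filter(f => RANK[f.severity] >= RANK[failOn]);
  return blocking.length > 0 ? 1 : 0;
}
```

With this ordering, a medium finding passes under --fail-on high and fails under --fail-on medium, which matches the behavior described above.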

Writing infrawise.yaml for CI

In local development, infrawise start --claude generates the config and connects your editor. In CI, commit the config and pass credentials through environment variables:

project: payments-service

aws:
  profile: default
  region: us-east-1

dynamodb:
  enabled: true
  includeTables:
    - Orders
    - Users

postgres:
  enabled: true
  connectionString: postgresql://infrawise_ro:${DB_PASSWORD}@host:5432/mydb

lambda:
  enabled: true

The ${DB_PASSWORD} substitution keeps credentials out of the committed config — infrawise expands environment variables at runtime. Set DB_PASSWORD as a CI secret.
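A minimal sketch of that kind of expansion, assuming a plain ${NAME} syntax with no defaults or escaping. The `expandEnv` function is hypothetical, and failing loudly on a missing variable is a choice of this sketch, not documented infrawise behavior.

```typescript
// Expands ${NAME} placeholders from an explicit environment map.
function expandEnv(
  template: string,
  env: Record<string, string | undefined>,
): string {
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match, name: string) => {
      const value = env[name];
      // Throwing beats silently producing a broken connection string.
      if (value === undefined) {
        throw new Error(`Missing environment variable: ${name}`);
      }
      return value;
    },
  );
}
```

Passing the environment as an argument keeps the function pure, so it can be exercised in a test without touching the real process environment.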

For AWS access, infrawise is read-only. The minimum IAM policy it needs for DynamoDB is dynamodb:ListTables and dynamodb:DescribeTable. Create an IAM role for CI with just these permissions and configure OIDC trust for your GitHub Actions runner.

What Blocks, What Warns, What Can Wait

Not every finding warrants a hard block. A starting policy:

Block on HIGH: Full table scans, hotspot patterns where multiple functions hammer the same partition key. These have immediate cost and availability impact.

Warn on MEDIUM: Tables with no GSIs queried by multiple functions, missing indexes on filterable columns in PostgreSQL. These degrade as data grows — log them and schedule remediation.

Log only for LOW: Non-critical index suggestions, rarely-queried pattern gaps. Useful signal for the next sprint.

Start with --fail-on high. Once the team has cleared existing HIGH findings, tighten to --fail-on medium. The goal is catching regressions — new findings introduced by merged code — not blocking work on pre-existing debt.

The includeTables field in infrawise.yaml lets you scope analysis to specific tables. Roll out the gate incrementally: start with the tables your AI assistant touches most.

Conclusion

The table scan that paged your on-call engineer wasn't wrong code. It was a gap in tooling — your pipeline understood TypeScript but not DynamoDB. infrawise check closes that gap by giving CI the same infrastructure context your AI assistant should have had when it wrote the query.

One step in the pipeline. One committed config file. The build fails when generated code would fail in production.

GitHub · npm

Key Takeaways

AI-generated code passes code review and unit tests but can introduce DynamoDB full scans and missing-index queries that only appear at scale.
infrawise check runs deterministic infrastructure analysis and exits non-zero on blocking findings, making it a drop-in CI gate.
--fail-on high|medium|low controls what severity blocks the build; high is the default.
infrawise.yaml uses ${ENV_VAR} substitution so database passwords never need to be committed.
infrawise is read-only; the minimum IAM for DynamoDB is dynamodb:ListTables and dynamodb:DescribeTable.

Your DB Is Still Red After Adding a Cache — Here's Why

Siddharth Pandey — Tue, 23 Jun 2026 12:21:17 +0000

You deployed a cache in front of your database three weeks ago. The DB is still running at 90% utilization. Traffic doubled last month and you're wondering if the cache is doing anything at all.

It is — just not as much as you expected, because cache hit rate is not something you configure. It emerges from two things: how much of your working set fits in memory, and how skewed your access patterns are.

Paperstack is a free system design simulator that makes this visible. Sketch an architecture, press play, watch utilization numbers and node colors update live. The demo below walks through the cache problem using it.

Hit Rate Isn't a Setting

A cache absorbs reads by serving them from memory instead of forwarding them to the database. The fraction it absorbs — hit rate — depends on one thing: whether the data a request needs is in memory.

Two variables determine that:

Working-set vs memory. If your active data is 100,000 keys and your cache holds 50,000, then under uniform access only about half the requests can hit; the rest miss and forward to the DB. Your cache isn't broken. It's undersized for the working set.

Access skew. If 80% of requests hit 10% of keys (common in social or content workloads), a much smaller cache can achieve a high hit rate because the hot keys stay warm and rarely get evicted. Paperstack models this directly via the skew parameter on the Cache node: with an LFU eviction policy, higher skew boosts hit rate beyond raw memory coverage. With LRU, the model applies no skew benefit: that policy tracks recency rather than access frequency, so it gets no credit for keeping hot keys resident.

This is why you can't just set hitRate: 0.9 in the config panel — Paperstack doesn't expose hit rate as a field. You set memory and workingSet; hit rate is computed. If the simulation let you enter a hit rate directly, it would be lying to you about what your architecture actually does.
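To make "computed, not configured" concrete, here is a uniform-access baseline. This is not Paperstack's formula, which the post doesn't give; it ignores skew and eviction policy entirely, and the function names are made up for the example.

```typescript
// Uniform-access baseline: every key in the working set is equally likely,
// so the hit rate is the fraction of the working set that fits in memory.
function uniformHitRate(cacheKeys: number, workingSetKeys: number): number {
  if (workingSetKeys <= 0) return 1; // nothing to miss
  return Math.min(1, cacheKeys / workingSetKeys);
}

// Reads per second that miss the cache and reach the database.
function dbReadsPerSec(readsPerSec: number, hitRate: number): number {
  return readsPerSec * (1 - hitRate);
}
```

With 50,000 cached keys against a 100,000-key working set, the baseline hit rate is 0.5, so half of all reads still land on the database no matter how fast the cache is.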

The Comparative in Action

Here's the experiment worth running in Paperstack: sketch Traffic → App → Cache → Database. Run the simulation. Watch which nodes go red.

With cache memory smaller than workingSet, most reads miss and forward to the DB. The DB stays red. The cache stays green — it has throughput headroom, it's just not absorbing much.

Now increase the cache memory past the working-set size. The hit rate climbs. Fewer reads reach the DB. At some threshold the DB color shifts from red to orange to green — and a different node becomes the bottleneck. Maybe the App Server. Maybe the cache's own throughput cap.

This is what Paperstack calls the comparative: when you change one variable, the bottleneck doesn't disappear — it moves. That relocation is the lesson. Scaling the cache fixed your DB problem and revealed your next one.

The inverse is just as instructive. Remove the Cache node and rerun. Watch the DB immediately redline. This is how you build intuition for what a cache is actually doing — not by reading about hit rates, but by watching the utilization delta before and after.

When Write Policy Matters

Paperstack exposes three write patterns on the Cache node: Cache-aside, Write-through, and Write-behind. The choice affects both latency and what happens when you kill the cache node.

Cache-aside (the default) separates reads from writes entirely. Reads check the cache first; misses go to the DB. Writes bypass the cache and go directly to the DB. The cache is populated on read-miss, not on write. Kill the cache: reads start missing entirely, DB load spikes, but the write path was already going to DB — no disruption there.

Write-through keeps the cache and DB in sync on every write. Writes pay the cache's latency and the DB's latency on the write path, making writes more expensive than cache-aside. Kill the cache: reads fall through to DB, but every write was already reaching the DB, so nothing is lost.

Write-behind is where the kill scenario gets interesting. In this mode, the cache absorbs writes entirely — they never reach the DB during normal operation. Only read-misses reach the DB. The DB is effectively shielded from the write load.

Kill the cache node in write-behind mode: Paperstack's passThroughOnKill behavior makes the cache transparent — all traffic falls straight through. The DB suddenly receives the write workload that was never reaching it before. If the DB was sized assuming writes were handled by the cache, it may not have the writeCap headroom to absorb the sudden change. The simulation shows this directly as DB utilization spiking and requests dropping.

This failure mode is invisible on a static architecture diagram. The diagram shows cache → DB regardless of write policy. The simulation shows what breaks.
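The three policies and the kill scenario can be written down as a small load model. This sketch follows the post's description only (write-behind absorbs writes entirely, a killed cache passes everything through); the type and function names are hypothetical, and it is not Paperstack's simulation code.

```typescript
type WritePolicy = "cache-aside" | "write-through" | "write-behind";

interface Traffic { readsPerSec: number; writesPerSec: number; hitRate: number; }
interface DbLoad { reads: number; writes: number; }

function dbLoad(policy: WritePolicy, cacheAlive: boolean, t: Traffic): DbLoad {
  if (!cacheAlive) {
    // Pass-through on kill: every read and write reaches the DB.
    return { reads: t.readsPerSec, writes: t.writesPerSec };
  }
  const reads = t.readsPerSec * (1 - t.hitRate); // only misses reach the DB
  // Write-behind absorbs writes in the cache (as described in the post);
  // cache-aside and write-through send every write to the DB.
  const writes = policy === "write-behind" ? 0 : t.writesPerSec;
  return { reads, writes };
}
```

Comparing the alive and killed results per policy shows the asymmetry: for cache-aside and write-through only the read load changes, while for write-behind the write load jumps from zero to the full write rate.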

Conclusion

The cache-doesn't-help problem is usually a mismatch between memory and working set, not a configuration error. Once hit rate is computed from real inputs rather than typed in, the DB utilization behavior makes sense.

Paperstack makes the relationship between working-set size, cache memory, write policy, and DB utilization visible without deploying anything. Sketch the architecture, tune the numbers, kill nodes, and watch the bottleneck move. When the DB finally turns green, you know exactly why.

Try it at Paperstack — it runs in the browser, no account needed.

Key Takeaways

Cache hit rate emerges from memory vs workingSet and access skew — it's computed, not configured. Undersizing cache memory caps hit rate regardless of traffic volume.
The comparative (change one variable, watch the bottleneck move) is how you build cache intuition: adding cache makes the DB green, revealing the next bottleneck.
LFU eviction benefits high-skew workloads (popular keys stay warm); LRU does not — skew only matters with the right eviction policy.
Write-behind shields the DB from writes during normal operation; kill the cache and the DB suddenly receives the write load it was never sized for.
Write-through and Cache-aside are safe to kill (writes were already reaching DB); write-behind changes the DB's workload profile on failure.

Why Your Reranker Isn't Helping Your RAG Pipeline (And How to Prove It)

Siddharth Pandey — Sun, 21 Jun 2026 07:57:40 +0000

You add a cross-encoder reranker to your RAG pipeline, measure answer quality on a test set, see a marginal improvement on 3 of 8 questions, and ship it. Six weeks later your p99 retrieval latency has climbed by 200ms and you're paying Cohere API costs on every call. Nobody has revisited the decision because there's no data to revisit. The reranker is in the pipeline now. It probably helps.

That "probably" is the problem. RAGScope · npm gives you a per-query metric that tells you exactly whether your reranker is earning its cost — or actively making things worse.

What "Reranker Gain" Actually Measures

When your RAG app runs a query, it emits OpenTelemetry spans: a retrieval span carrying chunk IDs, scores, and content; an optional reranking span; and an LLM span containing the full prompt text. RAGScope receives these via OTLP on port 4321 and analyzes the full trace end-to-end.

The rerank-gain metric answers one question: did the reranker pull the chunks the LLM actually used toward the top of the list? RAGScope compares each chunk's retrieval rank (its position before reranking) against its reranked rank (its position after), then measures the average rank improvement of the chunks that ended up in the LLM's prompt. A chunk that was retrieved 8th but reranked to 2nd and appeared in the prompt counts as a large positive gain. A chunk that was retrieved 2nd, reranked to 9th, and got dropped from the prompt counts as a loss.
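The rank comparison can be sketched as an average over the chunks that reached the prompt. The `RankedChunk` shape and `averageRankGain` name are assumptions for illustration; how RAGScope maps this average onto its 0 to 100 score is not described in the post, so the sketch stops at the raw rank change.

```typescript
interface RankedChunk {
  id: string;
  retrievalRank: number; // 1-based position before reranking
  rerankedRank: number;  // 1-based position after reranking
  usedInPrompt: boolean; // whether the chunk appeared in the LLM prompt
}

// Positive result: used chunks moved up on average. Returns null when no
// chunk was used, since the average is undefined in that case.
function averageRankGain(chunks: RankedChunk[]): number | null {
  const used = chunks.filter(c => c.usedInPrompt);
  if (used.length === 0) return null;
  const total = used.reduce(
    (sum, c) => sum + (c.retrievalRank - c.rerankedRank),
    0,
  );
  return total / used.length;
}
```

A used chunk promoted from 8th to 2nd contributes +6, and one that stayed at 3rd contributes 0, for an average of +3. Chunks that never reached the prompt are excluded from the average.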

The metric only appears in the score when the trace contains a reranker span. When it does, the weights renormalize automatically — precision drops from 40% to 35%, efficiency from 30% to 25%, and rerank-gain takes a 15% slice alongside uniqueness at 15% and coverage at 10%. Traces without a reranker span score exactly as before, so you can compare directly.

Reading the Signal — What Good and Bad Look Like

A reranker earning its cost looks like this:

│  ✓  rerank-gain  88  █████████░  used chunks promoted avg +3.0 ranks

The chunks the LLM actually used were promoted an average of 3 positions by the reranker. That means the reranker is doing its job: surfacing the relevant material higher so it reaches the prompt and lands near the edges where the LLM attends most.

A reranker not earning its cost:

│  ✗  rerank-gain  25  ███░░░░░░░  used chunks demoted avg -2.0 ranks
│  → Reranker is not surfacing the chunks the LLM actually uses

The chunks the LLM used were demoted by the reranker. They reached the prompt despite the reranker, not because of it. The reranker added latency and cost and then moved the useful material further back in the queue.

The key insight is that RAGScope measures gain on the chunks that actually appeared in the LLM's prompt, not on the full ranked list. A reranker can shuffle 10 results around impressively while consistently pushing the 3 chunks the LLM uses toward positions 7, 8, and 9. That's not a reranker working; that's a reranker actively degrading retrieval for this query type.

What to Do When the Reranker Is Hurting

The first step is query segmentation. RAGScope scores every query your pipeline processes individually. Run it for a day and you'll have a distribution of rerank-gain scores broken down by query type. If your reranker earns a score of 80+ on factual lookups but consistently scores below 30 on comparison queries, you have a model-query-type mismatch, not a broken reranker.
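A sketch of the segmentation step, assuming you label each query with a type yourself (the post doesn't say how types are assigned). The `ScoredQuery` shape and `meanGainByType` function are hypothetical.

```typescript
interface ScoredQuery {
  queryType: string;  // caller-supplied label, e.g. "factual" or "comparison"
  rerankGain: number; // per-query rerank-gain score, 0 to 100
}

// Mean rerank-gain score per query type.
function meanGainByType(queries: ScoredQuery[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const q of queries) {
    const entry = totals[q.queryType] ?? { sum: 0, count: 0 };
    entry.sum += q.rerankGain;
    entry.count += 1;
    totals[q.queryType] = entry;
  }
  const means: Record<string, number> = {};
  for (const type of Object.keys(totals)) {
    const entry = totals[type];
    if (entry) means[type] = entry.sum / entry.count;
  }
  return means;
}
```

A large gap between two types in the result, such as factual lookups averaging in the high 80s while comparison queries average in the 20s, is the mismatch pattern described above.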

The second step is checking your reranker's training domain. Cross-encoders trained on MS MARCO work well for web-search-style queries. If your documents are internal API docs, legal contracts, or medical literature, the reranker may be applying a relevance signal that's semantically misaligned with your content. A low rerank-gain score on a specific document type is a strong signal to evaluate a domain-specific model.

If the rerank-gain score is consistently low across query types, the simplest intervention is removing the reranker entirely and routing that latency budget into a higher TOP_K with tighter similarity thresholds. RAGScope's precision metric will tell you immediately whether that trade works: if precision improves and efficiency holds, you've recovered the latency without losing quality.

Conclusion

A reranker is not always additive. It introduces latency, API cost, and an additional failure mode on every query — and most teams have no per-query signal to determine whether it's paying for itself. Aggregate quality metrics on a test set don't expose query-level degradation.

RAGScope's rerank-gain metric gives you that signal query by query, live in your terminal as the pipeline runs. Start it with npx ragscope start, add OTLP instrumentation to your retrieval and reranker calls, and you'll know within the first few queries whether the reranker is earning its place in your pipeline.

GitHub · npm

Key Takeaways

rerank-gain measures the average rank improvement of the chunks the LLM actually used — not the full ranked list, which can mask per-query degradation.
The metric only appears when the trace contains a reranker span; weights renormalize automatically (precision 35%, efficiency 25%, rerank-gain 15%, uniqueness 15%, coverage 10%).
A reranker with consistently negative rerank-gain is demoting the chunks the LLM uses — adding cost and latency for a net-negative retrieval outcome.
Query segmentation reveals whether the reranker works for some query types but not others, pointing to model-query-type mismatch.
If rerank-gain is consistently low across query types, removing the reranker and increasing TOP_K is often a better trade — RAGScope's precision score will validate it immediately.

Fix N+1 Trigger Patterns Where Lambda Functions Hammer the Same DynamoDB Partition Key

Siddharth Pandey — Sat, 20 Jun 2026 18:15:52 +0000

You add a sixth Lambda trigger to your OrderEvents table, deploy it, and within 20 minutes your SLA dashboard goes red. Latency on order writes jumps from 4ms to 40ms. The function itself is fine. The table is fine. The problem is that five other Lambdas are already hitting the same partition key on every write, and you just made it six. DynamoDB's internal partition throttling doesn't care that each function looks clean in isolation.

This is an N+1 trigger problem, and your AI coding assistant cannot catch it. Not because it lacks intelligence, but because the fact that five Lambdas already target that table lives in your AWS account and your full codebase — not in the file your assistant has open.

Infrawise · npm

Why the LLM Can't See the Pattern

When you ask Claude to write a new order processing Lambda, it reads the file you have open and generates code that looks correct — because in the context of that one file, it is correct. It doesn't know about ProcessRefundsLambda, NotifyFulfillmentLambda, SyncInventoryLambda, UpdateAnalyticsLambda, and AuditTrailLambda, all of which you wrote in previous sprints and which all write to the Orders table.

This is a category of failure that model quality doesn't fix. A better model produces a more fluent explanation for why your latency spiked. The fact that five functions converge on the same table is a lookup, not a prediction. The source of truth is a combination of your code (which functions exist) and your infrastructure (what they access).

Infrawise draws that boundary explicitly. It extracts the answer from your code using AST parsing and from your infrastructure using API calls, then hands that graph to the model as structured context — it never generates the answer.

How Infrawise Traces Trigger Chains to the Same Table

When Infrawise scans your repository, it uses ts-morph to walk every CallExpression in every source file. It's not searching for the string "DynamoDB" — it matches call structure against a known set of SDK patterns in a DYNAMO_OPERATIONS set: both v2 method names (getItem, query, putItem, updateItem, deleteItem, batchWriteItem) and v3 command classes (QueryCommand, PutItemCommand, UpdateItemCommand, DeleteItemCommand). Each matched call becomes an extracted operation: this function performs this operation against this table.

That list feeds into a SystemGraph. Nodes represent tables, functions, indexes, queues, and topics. Edges represent query, scan, and write relationships. The graph is what makes the N+1 pattern visible: not just "six functions exist" and "a table exists," but "six functions all write to Orders with no distribution across paths."

The HotPartitionAnalyzer walks the graph and fires when a table receives five or more distinct access edges from separate code paths. The threshold is configurable per-table via hotPartitionThresholds in infrawise.yaml — Issue #57 resolved false positives on high fan-in systems by making this a per-table setting rather than a single global value. A finding looks like:

Medium severity
Potential hot partition detected on DynamoDB table "Orders"
  Table "Orders" is accessed by 6 distinct code paths, which may create
  hot partition issues at scale. High access concentration on the same
  partition key can throttle requests.
  Recommendation: Consider adding a random suffix or timestamp to partition
  keys (write sharding). Use DynamoDB DAX for read-heavy workloads.

This runs deterministically. Feed it the same graph, get the same findings. There's no sampling temperature involved.
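The counting rule can be sketched as follows. This is an illustration under stated assumptions, not the HotPartitionAnalyzer's source: the `AccessEdge` and `HotPartitionFinding` shapes and the `findHotTables` name are made up, and the per-table override map stands in for whatever infrawise reads from hotPartitionThresholds.

```typescript
interface AccessEdge { functionName: string; table: string; }
interface HotPartitionFinding { severity: "medium"; table: string; codePaths: number; }

const DEFAULT_THRESHOLD = 5; // "five or more" from the post

function findHotTables(
  edges: AccessEdge[],
  perTableThresholds: Record<string, number> = {},
): HotPartitionFinding[] {
  // Collect the distinct functions that touch each table.
  const callers: Record<string, string[]> = {};
  for (const { functionName, table } of edges) {
    const list = callers[table] ?? [];
    if (list.indexOf(functionName) === -1) list.push(functionName);
    callers[table] = list;
  }
  const findings: HotPartitionFinding[] = [];
  for (const table of Object.keys(callers)) {
    const count = (callers[table] ?? []).length;
    const threshold = perTableThresholds[table] ?? DEFAULT_THRESHOLD;
    if (count >= threshold) {
      findings.push({ severity: "medium", table, codePaths: count });
    }
  }
  return findings;
}
```

Because the function is a pure count over a fixed edge list, the same input always produces the same findings, which is the determinism the paragraph above describes.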

The infrawise check --fail-on medium command gates CI on this finding. Since HotPartitionAnalyzer emits medium severity, you need --fail-on medium (the default --fail-on high won't catch it). When violations are found, infrawise check exits with code 1 — your build fails before the sixth Lambda merges, and the engineer who wrote it sees the finding in the PR, not on a latency dashboard at 11pm.

Fixing It — Restructuring the Key or Sharding the Access Pattern

Once Infrawise surfaces the pattern, you have two practical options.

Write sharding adds a random suffix to the partition key — distributing writes across logical partitions. Reads require scatter-gather or a deterministic suffix derived from the order ID. This is the right choice when all six functions are pure writers and reads are handled by a separate query path.

Access pattern separation restructures which functions need direct table access at all. If SyncInventoryLambda and UpdateAnalyticsLambda are consuming state that flows through the Orders table, they shouldn't write to it directly — they should react to a DynamoDB stream and write to their own tables. The fan-in often exists because multiple services treat the same source-of-truth table as a synchronization point when they should be downstream consumers.

The analyze_function tool helps here. Point it at any function and it traces the full access path: which tables the function reads and writes, which indexes it uses, what event shapes trigger it, and what queues or topics it publishes to. That trace makes it clear which functions can be moved to stream consumption and which genuinely need direct write access.
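For the write-sharding option, a minimal sketch of a deterministic suffix derived from the order ID. The shard count, key format, and function names are assumptions for illustration; the hash is FNV-1a computed over UTF-16 code units, chosen only because it needs no dependencies.

```typescript
const SHARD_COUNT = 10; // illustrative; size to your write volume

// FNV-1a 32-bit hash: small, dependency-free, and stable across runs.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Deterministic suffix: the same order ID always maps to the same shard,
// so a point read can recompute the key instead of fanning out.
function shardedKey(basePartitionKey: string, orderId: string): string {
  const shard = fnv1a(orderId) % SHARD_COUNT;
  return `${basePartitionKey}#${shard}`;
}

// Keys to query when a read must cover every shard (scatter-gather).
function allShardKeys(basePartitionKey: string): string[] {
  const keys: string[] = [];
  for (let i = 0; i < SHARD_COUNT; i++) keys.push(`${basePartitionKey}#${i}`);
  return keys;
}
```

A random suffix spreads writes just as well but forces every read into scatter-gather; deriving the suffix from the order ID keeps single-order reads to one key at the cost of some skew if order IDs hash unevenly.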

Conclusion

The N+1 trigger problem is invisible to any tool that works only from your open files. It's not a reasoning failure — no amount of context about a single Lambda reveals that five others already saturate the same table. That fact lives in the intersection of your code and your infrastructure.

Infrawise puts that intersection in a graph, runs deterministic analyzers over it, and surfaces the finding before it becomes a production incident. The model's job is to decide what to do — restructure the key, introduce a stream, separate the access pattern. The detection is never generated; it's extracted.

If your AI assistant is writing Lambda functions against DynamoDB, give it the access graph first: GitHub · npm.

Key Takeaways

A hot partition problem requires knowing how many code paths hit the same table — that fact lives in your AWS account and your full codebase, not in the file your AI assistant has open.
Infrawise's HotPartitionAnalyzer counts distinct code paths hitting each DynamoDB table and fires at a configurable threshold, with per-table overrides via hotPartitionThresholds in infrawise.yaml.
Hot partition findings emit medium severity; use infrawise check --fail-on medium to gate CI builds on them (the default --fail-on high won't catch them).
analyze_function traces the full access path for any function — tables, indexes, event shapes, queues — making it easy to separate writers from downstream consumers.
Write sharding and event-stream separation are the two practical fixes; which one to pick depends on whether converging functions genuinely need to write or are just consuming state.

Stop Paying For Retrieval Latency On Chunks You Never Use In The Prompt

Siddharth Pandey — Tue, 16 Jun 2026 08:12:31 +0000

Your pipeline fetched 10 chunks. Your LLM saw 3.

You set TOP_K=10 on your vector store. Ten candidate chunks means more signal for the model — that's the logic. Then you run npx ragscope and the audit prints:

  WARN   51/100  █████░░░░░  my-rag-service
  │  "what are the pricing tiers?"
  │
  │  ✗  precision     30  ███░░░░░░░  3/10 chunks used
  │  ✗  efficiency    30  ███░░░░░░░  70% tokens wasted
  │  ✓  uniqueness   100  ██████████  chunks are distinct
  │  ✓  coverage     100  ██████████  all chunks scored
  │
  │  → Reduce TOP_K 10→3 (only 3 chunks reached LLM)
  │  → 70% of retrieved tokens never reached the LLM
  │

Three of ten chunks made it into the final prompt. You paid to retrieve ten chunks and seven went nowhere: they were fetched, scored, and discarded somewhere between your retrieval step and your prompt assembly code.

This is the gap ragscope (npm) was built to surface.

Where the Chunks Disappear

Retrieval and prompt assembly are separate steps, but most teams treat them as one. The gap is where the waste lives.

A typical RAG pipeline:

1. Embed the query
2. Fetch TOP_K chunks from the vector store
3. (Optional) Rerank
4. Assemble the prompt: filter by score threshold, truncate to fit the context window
5. Send to the LLM

Step 4 is where chunks disappear. Post-retrieval filtering — a score threshold, a hard token budget, a deduplication pass — silently drops chunks you already paid to retrieve. If your prompt assembly filters out anything below a certain confidence score and your vector store returns six chunks that don't clear it, those six fetches were wasted. The network round-trip still happened. The latency still accumulated.

The problem compounds over time. TOP_K=10 gets set as a safe default, the pipeline ships, and the setting never gets revisited. LLM eval scores look fine because the three chunks that do reach the prompt are the right ones. The waste is invisible in your evals — it only shows up in latency and cost.

Vector stores typically scale retrieval latency with TOP_K. Fetching ten results takes measurably longer than fetching three, especially at tail latencies. When seven of those ten are discarded before the prompt, you're paying that latency premium on every query for nothing.

How ragscope Measures `precision`

ragscope runs as a local server (default port 4321) and receives spans from your pipeline via OTLP at http://localhost:4321/v1/traces. No changes to your RAG code needed — configure your OpenTelemetry exporter to point there.

When both a retrieval span and an LLM span arrive for the same trace, ragscope compares them. The key field is inContext on each RagChunk. To set it, ragscope takes the full text of the LLM span's assembled prompt and checks whether each retrieved chunk's content appears in it (positionally, by string match). A chunk either appears in the prompt or it doesn't. No LLM, no heuristics, no sampling.
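The check can be sketched in a few lines. RagChunk and inContext come from the text; the other field names, the RagChunkSketch type, and the markInContext helper are assumptions, and ragscope's real matching may normalize text in ways this sketch does not.

```typescript
// Illustrative shape; only the inContext field is named in the text.
interface RagChunkSketch {
  id: string;
  content: string;
  inContext: boolean;
}

// Marks each retrieved chunk by plain substring match against the assembled prompt.
function markInContext(
  retrieved: { id: string; content: string }[],
  promptText: string,
): RagChunkSketch[] {
  return retrieved.map((chunk) => ({
    id: chunk.id,
    content: chunk.content,
    // Empty content would match any prompt, so treat it as not in context (assumption).
    inContext: chunk.content.length > 0 && promptText.includes(chunk.content),
  }));
}
```

Because the match is a plain substring test, the same trace always produces the same inContext flags.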

precision starts as:

base = (chunks where inContext is true / total retrieved chunks) × 100

There's one additional penalty: if high-retrieval-rank chunks land in the middle of a long context window — the zone where LLM attention typically falls off — ragscope subtracts 12 points per buried chunk, capped at 36. This is the lost-in-the-middle detection. A pipeline where 3/10 chunks are used but two of those three are buried mid-context will score lower than one where the same 3 chunks sit at the prompt edges.

When the base falls below 60, ragscope generates a specific recommendation with the exact numbers: Reduce TOP_K 10→3 (only 3 chunks reached LLM). It prints this directly below the score bars, in the terminal, at the time the trace arrives — not in a cloud dashboard after a deploy.
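Put together, the scoring rules above fit in a short sketch. The constants 12, 36, and 60 come from the text, and the floor of 3 is the Math.max(used_chunks, 3) rule cited in the tuning section. The function names, the rounding, the clamp at zero, and the handling of edge cases are assumptions.

```typescript
// Constants taken from the prose; everything else is illustrative.
const BURIED_PENALTY = 12; // points subtracted per buried chunk
const MAX_BURIED_PENALTY = 36; // cap on the total penalty
const RECOMMEND_BELOW = 60; // base score that triggers a recommendation
const MIN_TOP_K = 3; // floor on the suggested TOP_K

function basePrecision(usedChunks: number, retrievedChunks: number): number {
  // Assumption: an empty retrieval scores 0 rather than dividing by zero.
  if (retrievedChunks === 0) return 0;
  return (usedChunks / retrievedChunks) * 100;
}

function precisionScore(usedChunks: number, retrievedChunks: number, buriedChunks: number = 0): number {
  const penalty = Math.min(buriedChunks * BURIED_PENALTY, MAX_BURIED_PENALTY);
  // Clamping at 0 and rounding are assumptions made for this sketch.
  return Math.max(0, Math.round(basePrecision(usedChunks, retrievedChunks) - penalty));
}

function topKRecommendation(usedChunks: number, currentTopK: number): string | null {
  if (basePrecision(usedChunks, currentTopK) >= RECOMMEND_BELOW) return null;
  const target = Math.max(usedChunks, MIN_TOP_K);
  // Assumption: stay silent when the suggestion would not lower TOP_K.
  if (target >= currentTopK) return null;
  return `Reduce TOP_K ${currentTopK}→${target} (only ${usedChunks} chunks reached LLM)`;
}
```

For the audit shown earlier, precisionScore(3, 10) gives 30, and topKRecommendation(3, 10) produces the same "Reduce TOP_K 10→3" line that the terminal output prints.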

efficiency is a related metric: the fraction of retrieved tokens that reached the LLM. If precision is 30 and your chunks are roughly uniform in size, efficiency will also be around 30 — meaning 70% of the tokens you transferred from the vector store never appeared in a prompt. That shows up in retrieval latency and, depending on your pipeline, in processing time downstream.
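A sketch of that ratio, assuming each chunk carries a token count and a flag for whether it reached the prompt. SizedChunk and efficiencyScore are hypothetical names, and the rounding is an assumption.

```typescript
// Hypothetical per-chunk record: size in tokens plus whether it reached the prompt.
interface SizedChunk {
  tokens: number;
  reachedPrompt: boolean;
}

// Percentage of retrieved tokens that ended up in the prompt.
function efficiencyScore(chunks: SizedChunk[]): number {
  const totalTokens = chunks.reduce((sum, chunk) => sum + chunk.tokens, 0);
  if (totalTokens === 0) return 0; // assumption: nothing retrieved scores 0
  const usedTokens = chunks
    .filter((chunk) => chunk.reachedPrompt)
    .reduce((sum, chunk) => sum + chunk.tokens, 0);
  return Math.round((usedTokens / totalTokens) * 100);
}
```

With ten 200-token chunks and three of them used, this returns 30, matching precision. If the seven discarded chunks were larger than the three that survived, efficiency would drop below precision.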

Tuning TOP_K Based on the Audit

Once you have precision data, tuning is mechanical.

Follow the recommendation. ragscope prints Reduce TOP_K 10→3 because Math.max(used_chunks, 3) gives you the minimum viable retrieval count with a floor of 3. Start there and re-run the audit against a few real queries.

Check efficiency alongside. When efficiency sits well below precision, your unused chunks are larger than the ones that reached the prompt, which makes it a chunking strategy problem as much as a TOP_K problem. If efficiency is higher than precision, your unused chunks are small and the token cost is modest; fixing TOP_K will mostly recover the latency.

Check uniqueness too. If the chunks that do reach your prompt include near-duplicates, the uniqueness metric will flag them. Two near-duplicate chunks in the prompt means one is redundant regardless of your TOP_K setting — that's a deduplication-at-ingest problem, not a retrieval count problem. ragscope computes overlap between chunk pairs and surfaces high-overlap counts in the audit output.
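The text doesn't say how ragscope computes overlap, so the following is only one plausible way to count near-duplicate pairs: word-level Jaccard similarity with a hypothetical threshold of 0.8. None of these names or values come from ragscope.

```typescript
// Lower-cased set of whitespace-separated words in a chunk.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
}

// Jaccard similarity: shared words divided by the union of words.
function jaccardOverlap(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Counts chunk pairs whose overlap meets the (hypothetical) threshold.
function countHighOverlapPairs(chunks: string[], threshold: number = 0.8): number {
  let pairs = 0;
  chunks.forEach((first, index) => {
    chunks.slice(index + 1).forEach((second) => {
      if (jaccardOverlap(first, second) >= threshold) pairs++;
    });
  });
  return pairs;
}
```

Any nonzero pair count among the chunks that reached the prompt is a hint to look at deduplication at ingest rather than at TOP_K.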

The typical path after a low-precision audit:

1. Audit shows 3/10 chunks used: follow the Reduce TOP_K 10→3 recommendation
2. Re-run against real queries; precision should move above 75 (PASS)
3. If uniqueness also flagged duplicates, fix chunking and re-run
4. If efficiency is still low after fixing TOP_K, chunks may be too large for the token budget

Going from TOP_K=10 to TOP_K=3 on a pipeline that was only ever using 3 chunks means 70% fewer chunks fetched and transferred on every query. No model changes, no prompt rewriting, no reranker added.

The Number You Should Check First

The assumption that more retrieval means better answers is almost never validated against what the model actually sees. TOP_K defaults get set once and forgotten. Eval scores stay flat because the chunks that do reach the LLM are the right ones — the waste doesn't affect quality, just cost and latency.

npx ragscope gives you precision, efficiency, and uniqueness in the time it takes to run a dev command. If you haven't checked what fraction of your retrieved chunks survive to the prompt, that's the number to look at first.

GitHub · npm

Key Takeaways

precision measures what fraction of retrieved chunks appear in the final LLM prompt — anything below 60/100 triggers a Reduce TOP_K recommendation with exact numbers
Vector store retrieval latency typically scales with TOP_K; fetching chunks you discard is pure overhead with no quality benefit
ragscope detects in-prompt chunk presence by matching content against the OTLP LLM span's prompt text — deterministic, no LLM needed
Efficiency well below precision means the discarded chunks are the large ones, which points to a chunking strategy problem, not just a TOP_K problem
Low uniqueness means near-duplicate chunks in the prompt — fix at ingest, not by adjusting retrieval count

Why Infrawise Uses Deterministic Analysis Instead of an LLM

Siddharth Pandey — Sat, 13 Jun 2026 10:58:44 +0000

Ask your AI coding assistant which Global Secondary Indexes exist on your Orders table. It will read your repository, find a few QueryCommand calls, and answer — fluent, specific, and confident. It also has no way to know. GSI definitions live in AWS, not in your source files. The model isn't lying; the fact simply isn't available to it, so it generates the most statistically plausible substitute and delivers it in the same tone it uses for things it actually knows.

That failure mode is why Infrawise (npm) — an MCP server that gives AI coding assistants infrastructure context — contains no LLM calls at all. Every answer it serves comes from AST parsing, schema introspection, rule-based analyzers, and graph correlation. The LLM is only ever a consumer of that context, never a producer of it. This post is about why that boundary exists, and what it looks like in code.

Infrastructure questions are lookups, not generation

There are two kinds of questions you can ask a tool. "How should I model sessions in DynamoDB?" is a judgment question — many defensible answers, context matters, an LLM is genuinely useful. "Does the Sessions table have a GSI on userId?" is a fact question. It has exactly one correct answer, and that answer is sitting in a DescribeTable response.

When you route a fact question through a generative model, you convert a lookup with a perfectly accurate source into a prediction with an unknown error rate. The motivating examples in the Infrawise README are all of this shape: an assistant suggesting a .scan() on an Orders table with 50 million rows, recommending a GSI on status that already exists, or not noticing that five functions are already hammering the same partition key. None of these are reasoning failures. They are missing-fact failures, and no amount of model quality fixes them — a better model just produces a more convincing wrong answer.

So Infrawise draws a hard line: facts get extracted deterministically, and the model receives them through MCP tool calls instead of guessing.

What deterministic extraction looks like

Infrawise builds its picture of your system from three sources, none of which involve a model.

Your code, through the compiler's eyes. scanRepository() in src/context/index.ts loads the repo with ts-morph — using your own tsconfig.json when one exists — and walks every CallExpression node in every source file. It doesn't regex for the word "scan". It matches call structure against known client patterns: a DYNAMO_OPERATIONS set covering both SDK v2 method names (query, scan, getItem) and SDK v3 command classes (ScanCommand, QueryCommand, PutItemCommand), query/execute/exec calls on PostgreSQL and MySQL clients, and MongoDB collection methods — where find and aggregate are classified as scan-type operations and the rest as queries. The output is a list of extracted operations: this function performs this operation type against this table.

Your databases, through their own catalogs. The PostgreSQL adapter doesn't ask a model to summarize your schema. It runs the same introspection queries you would run by hand — information_schema.tables for tables, information_schema.columns for columns, pg_indexes for indexes, and the constraint tables for keys. The docs recommend pointing it at a dedicated read-only user, and the DynamoDB side needs only dynamodb:ListTables and dynamodb:DescribeTable permissions. What comes back isn't a description of your schema; it is your schema.

Correlation, through a graph. Both streams land in a SystemGraph: typed nodes for tables, functions, indexes, queues, topics, lambdas, buckets, secrets, parameters, and log groups, connected by typed edges like query, scan, and uses_index. The graph is what turns two boring fact lists into something an analyzer can interrogate — not just "this table exists" and "this function scans something," but "listAllOrders() scans the Orders table, and no index covers that access."
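The classification step from the first source can be sketched without the AST walk. The operation names below are the ones quoted above; the real DYNAMO_OPERATIONS set is presumably larger, and the rule that only scan and ScanCommand count as scan-type DynamoDB operations is an assumption. The function names and the constant's _SKETCH suffix are illustrative.

```typescript
type OperationKind = "scan" | "query";

// Names quoted in the text; the real set is presumably larger.
const DYNAMO_OPERATIONS_SKETCH = new Set([
  "query", "scan", "getItem", // SDK v2 method names
  "ScanCommand", "QueryCommand", "PutItemCommand", // SDK v3 command classes
]);

// Per the text, these MongoDB collection methods are treated as scan-type.
const MONGO_SCAN_METHODS = new Set(["find", "aggregate"]);

// Returns null for calls that are not recognised DynamoDB operations.
// Assumption: only scan / ScanCommand are scan-type; the rest are grouped as "query".
function classifyDynamoCall(calleeName: string): OperationKind | null {
  if (!DYNAMO_OPERATIONS_SKETCH.has(calleeName)) return null;
  return calleeName === "scan" || calleeName === "ScanCommand" ? "scan" : "query";
}

function classifyMongoCall(methodName: string): OperationKind {
  return MONGO_SCAN_METHODS.has(methodName) ? "scan" : "query";
}
```

In the real tool the callee name would come from a CallExpression node, not from a bare string. The point of the sketch is that the match is a set lookup on call structure, not a text search.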

Rules, not vibes

The analysis layer is where most tools would reach for a model — and where Infrawise stays deterministic. The analyzer index exports 27 rule classes covering DynamoDB, PostgreSQL, MySQL, MongoDB, SQS, S3, Lambda, RDS, secrets, log retention, and Terraform drift. Each one is an ordinary class with an analyze(graph) method that walks the graph and emits findings.

FullTableScanAnalyzer follows scan-type edges to DynamoDB table nodes and emits a high-severity finding naming the table and every calling function. MissingGSIAnalyzer flags tables that receive query edges but have no uses_index edge — medium severity, because it might be intentional. HotPartitionAnalyzer fires when a table is accessed by five or more distinct code paths (the threshold is a constructor parameter, defaulting to 5).
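To make the shape concrete, here is a stripped-down sketch of two such rules over a toy graph. The edge kinds, severities, and the default threshold of 5 follow the text; MiniGraph, the class names, and the message wording are assumptions and do not mirror Infrawise's actual SystemGraph types.

```typescript
type EdgeKind = "query" | "scan" | "uses_index";
interface GraphEdge { from: string; to: string; kind: EdgeKind }
interface MiniGraph { tables: string[]; edges: GraphEdge[] }
interface SketchFinding { severity: "high" | "medium"; table: string; message: string }

// Distinct functions that touch a table through the given edge kinds.
function callersOf(graph: MiniGraph, table: string, kinds: EdgeKind[]): string[] {
  const callers = graph.edges
    .filter((edge) => edge.to === table && kinds.includes(edge.kind))
    .map((edge) => edge.from);
  return Array.from(new Set(callers));
}

class FullTableScanSketch {
  analyze(graph: MiniGraph): SketchFinding[] {
    const findings: SketchFinding[] = [];
    for (const table of graph.tables) {
      const scanners = callersOf(graph, table, ["scan"]);
      if (scanners.length > 0) {
        const message = `Full table scan on "${table}" by ${scanners.join(", ")}`;
        findings.push({ severity: "high", table, message });
      }
    }
    return findings;
  }
}

class HotPartitionSketch {
  private readonly threshold: number;
  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }
  analyze(graph: MiniGraph): SketchFinding[] {
    const findings: SketchFinding[] = [];
    for (const table of graph.tables) {
      const callers = callersOf(graph, table, ["query", "scan"]);
      if (callers.length >= this.threshold) {
        const message = `"${table}" accessed by ${callers.length} distinct code paths`;
        findings.push({ severity: "medium", table, message });
      }
    }
    return findings;
  }
}
```

Each rule is a plain function of the graph, so a fixture with five callers and an assertion on the returned findings is all a unit test needs.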

Two properties fall out of this design that a model can't give you:

Findings are testable. Every analyzer is a pure function of the graph. Feed it a fixture, assert on the output, done. There's no eval harness, no sampling temperature, no "run it three times and hope." If FullTableScanAnalyzer regresses, a unit test catches it.

Failures are contained and honest. runAllAnalyzers() wraps each analyzer in its own try/catch — one analyzer crashing logs a warning while the rest keep running. The combined findings are then sorted by a fixed severity order: high, medium, low, and notably verify — a severity that exists precisely so a deterministic system can say "I detected a pattern but can't confirm the intent" instead of bluffing. An LLM has no equivalent of verify; everything it says arrives with the same confident fluency.
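The containment and ordering behaviour can be sketched like this. The severity order comes from the text; the RunAnalyzer interface, the label field, and the warning format are assumptions, not Infrawise's actual runAllAnalyzers() signature.

```typescript
type RunSeverity = "high" | "medium" | "low" | "verify";
interface RunFinding { severity: RunSeverity; message: string }
interface RunAnalyzer { label: string; analyze(graph: unknown): RunFinding[] }

// Fixed order described in the text.
const SEVERITY_ORDER: RunSeverity[] = ["high", "medium", "low", "verify"];

function runAllSketch(
  analyzers: RunAnalyzer[],
  graph: unknown,
  warn: (message: string) => void = console.warn,
): RunFinding[] {
  const findings: RunFinding[] = [];
  for (const analyzer of analyzers) {
    try {
      findings.push(...analyzer.analyze(graph));
    } catch (error) {
      // One failing rule is reported and skipped; the rest still run.
      warn(`Analyzer ${analyzer.label} failed: ${String(error)}`);
    }
  }
  return findings.sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}
```

Passing the warn callback in makes the failure path as easy to assert on as the findings themselves.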

The LLM is the consumer, not the analyst

None of this means LLMs are useless here. It means they belong at a specific layer. Infrawise exposes the graph and findings through 15 MCP tools: get_infra_overview for a quick snapshot, analyze_function to trace a single function's tables, queues, secrets, and trigger event shapes, suggest_gsi to generate a ready-to-use GSI definition for a table and attribute, postgres_index_suggestions for index advice, and so on. The assistant decides when to ask and what to do with the answer. It never produces the answer.

The plumbing is deliberately boring: analysis results are cached as JSON files under .infrawise/cache, and the infrawise stdio process your editor spawns re-runs the analysis when the cache is older than 24 hours. Run infrawise start --claude once and it writes .mcp.json so Claude Code reconnects automatically on every future launch.
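The staleness rule amounts to a single comparison. A minimal sketch, assuming the cache age is measured from a write timestamp in milliseconds; CACHE_TTL_MS and isCacheStale are hypothetical names.

```typescript
// 24 hours, the cache lifetime stated in the text.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

// True when the cache was written more than 24 hours before "now".
function isCacheStale(cacheWrittenAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - cacheWrittenAtMs > CACHE_TTL_MS;
}
```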

This division of labor generalizes well beyond one project. The model handles intent ("the user wants this query to be cheaper") and synthesis ("given these findings, here's the migration plan"). The deterministic layer handles every claim that has a ground truth. The test is simple: if asking the same question twice should yield the same answer, don't generate the answer — look it up.

If your AI assistant writes code against AWS or a database, give it facts instead of letting it guess: GitHub · npm.

Key takeaways

A fact question routed through a generative model turns a lookup with a perfect source into a prediction with an unknown error rate. Route facts around the model, not through it.
AST-level extraction (ts-morph walking CallExpression nodes) catches what schema introspection alone can't see — which function scans which table, and how.
Rule-based analyzers are unit-testable and fail loudly per rule; model-based analysis is neither.
A deterministic system can emit a verify severity when it isn't sure. A model can't reliably tell you when it's guessing.
Put the LLM at the boundary: it consumes structured facts over MCP and decides what to do next — it never gets to invent the facts.

Give Your AI Assistant Infrastructure Eyes Before It Writes Another Query

Siddharth Pandey — Tue, 09 Jun 2026 18:18:02 +0000

You asked Claude Code to add pagination to your order history endpoint. It generated a clean function — listOrdersByUser() — using a DynamoDB Scan with a Limit parameter. It compiled. Tests passed. You shipped it.

Three days later your AWS bill had a line item you didn't recognize: 47 million read capacity units consumed in 72 hours. The Orders table has 50M rows. Limit doesn't make a Scan cheap: it caps how many items each request evaluates, not how many matching results come back, and any filter runs only after the items have been read and billed. Paging through a Scan to collect one user's orders can read the whole table.
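A toy model shows why a filtered, paginated Scan gets expensive. It is not AWS SDK code: it just simulates pages that evaluate up to pageLimit items each, apply the filter after the read, and keep going until enough matches turn up. OrderRow and itemsReadByFilteredScan are hypothetical names.

```typescript
interface OrderRow {
  orderId: string;
  userId: string;
}

// Counts how many items a paginated, filtered scan reads before it has
// collected `wanted` matches for one user (or runs out of table).
function itemsReadByFilteredScan(
  table: OrderRow[],
  userId: string,
  wanted: number,
  pageLimit: number,
): number {
  if (pageLimit <= 0) throw new Error("pageLimit must be positive");
  let itemsRead = 0;
  let matches = 0;
  let cursor = 0;
  while (cursor < table.length && matches < wanted) {
    const page = table.slice(cursor, cursor + pageLimit); // items evaluated by this request
    itemsRead += page.length; // every evaluated item is read, match or not
    matches += page.filter((row) => row.userId === userId).length; // filter applies after the read
    cursor += page.length;
  }
  return itemsRead;
}
```

If the user's two orders sit at the end of a 1,000-row table, collecting them with a page limit of 25 reads all 1,000 items. A Query against an index keyed on userId would read only the matching items.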

Claude Code didn't know your table had 50M rows. It didn't know you had a GSI on userId. It guessed, and the guess was expensive.

infrawise · npm

What AI Assistants Don't Know About Your Infrastructure

AI coding assistants read your source files. They understand function signatures, TypeScript types, and import chains. What they cannot see is the infrastructure those functions run against.

When Claude Code looks at a file that calls dynamoClient.scan({ TableName: "Orders" }), it has no idea that:

The Orders table has 50M items
There is already a GSI named userId-index on the userId attribute
Three other functions are already using Query against that same GSI
The Sessions table is accessed by 6 separate code paths, making it a hot partition candidate

Without that context, the assistant fills the gap with generic patterns. It recommends Scan because it has no reason not to. It suggests adding a GSI on status because it doesn't know one exists. It writes SELECT * because it has no idea which columns are expensive to pull.

This isn't a bug in the model. It's a missing input. The model was never given your infrastructure.

What Happens When infrawise Is in the Loop

infrawise statically analyzes your codebase, your DynamoDB tables, and your PostgreSQL schemas, then exposes that context to your editor through MCP. Claude Code gets 15 tools that answer questions like: which tables exist, what are their partition keys and sort keys, which GSIs are already defined, which functions are already scanning, and which patterns are flagged as high severity.

The difference in output is concrete. Here's what infrawise surfaces before any code gets written:

Findings  3 total

  1.  HIGH   Full table scan detected on DynamoDB table "Orders"
             listAllOrders() scans without any filter — reads every item in the table.
             → Replace Scan with Query using a partition key or add a GSI.

  2.  MED    PostgreSQL table "users" has no index on column "email"
             Filtering on "email" causes sequential scans.
             → CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

  3.  MED    DynamoDB table "Sessions" accessed by 6 distinct code paths
             Hot partition risk — multiple functions hammer the same key.
             → Review access patterns and consider partition key design.

When Claude Code has this context, its suggestions change. It knows userId-index exists and recommends Query against it instead of Scan. It knows the email column has no index and includes the exact CREATE INDEX CONCURRENTLY statement rather than a generic suggestion. It knows which functions are already hitting a partition hard before it adds another one.

The recommendations become specific to your actual tables, not generic advice copied from documentation.

infrawise does none of this with an LLM. The analysis is entirely deterministic: TypeScript AST parsing via ts-morph for the code graph, schema introspection for the database layer, rule-based analyzers for pattern detection, and graph correlation to connect code paths to tables. No model is involved in the analysis — models are only consumers of the output.

Wiring It Up — infrawise start --claude

npm install -g infrawise
cd your-project
infrawise start --claude

On first run, infrawise asks a few questions and generates infrawise.yaml. It then scans your AWS services, databases, and codebase, writes .mcp.json so your editor auto-connects, and opens Claude Code with all 15 MCP tools ready.

Every session after that:

claude

No infrawise command needed. The editor manages the MCP connection. Analysis is cached for 24 hours; when the cache goes stale, infrawise stdio — spawned automatically at session start — refreshes it. File changes are detected within the session and the code graph updates automatically without re-running AWS extraction.

For PostgreSQL, infrawise uses a read-only connection. Create the user with these four statements:

CREATE USER infrawise_ro WITH PASSWORD 'yourpassword';
GRANT CONNECT ON DATABASE yourdb TO infrawise_ro;
GRANT USAGE ON SCHEMA public TO infrawise_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO infrawise_ro;

If you want to check findings without opening an editor:

infrawise analyze --severity high
infrawise analyze --severity high --output report.md

The --severity flag accepts high, medium, low, or verify. The --output flag saves findings as a markdown report.

Conclusion

The problem isn't that AI coding assistants write bad code. The problem is that they write code for an infrastructure they've never seen. A Scan on an empty dev table and a Scan on a 50M-row production table look identical in source — the model has no way to tell them apart unless something provides that context.

infrawise makes that context available deterministically, before the code gets written. The assistant stops guessing about your GSIs, your partition keys, and your missing indexes because it no longer needs to guess.

Try it: GitHub · npm

Key Takeaways

AI coding assistants have no knowledge of your actual infrastructure — they infer from source files and fill gaps with generic patterns
A Scan with Limit still pays for every item it evaluates, and paging for matches can walk the whole table; the model can't weigh that cost unless it knows your table's size and access patterns
infrawise exposes your exact schemas, GSIs, partition keys, and flagged patterns to your editor through MCP — 15 tools Claude Code can query before writing a single line
All analysis is deterministic: TypeScript AST parsing, schema introspection, rule-based detection — no LLM in the analysis path
Setup is one command: infrawise start --claude generates config, writes .mcp.json, and opens your editor with full context ready
