Audit your AI agent's blind spots in 4 commands

#ruby #rails #ai #tooling

Don't take my word for any of it. Run this on a repo you know cold and read the diff yourself. Four commands, about five minutes, and you'll know exactly what your coding agent can and can't see in your own codebase.

No theory below this line. Just the steps.

Pick the model first

Open the app you know best. Pick the one model you'd schedule a careful afternoon to refactor, the hub half the app leans on. Inbox, Order, Account, Booking, MergeRequest. The one nobody volunteers to touch on a Friday because nobody's sure what depends on it.

That's your test subject. Hold it in mind for all four steps.

Step 1, ask your agent cold

In your normal agent (Claude Code, Cursor, Copilot, whatever you drive), ask the structural question with no tools beyond what it has today:

before I change how the Inbox model is torn down,
find every place in the codebase that depends on it.

Swap Inbox for your model. Then watch how it answers, not just what it says. It'll grep a token, open what it lands in, sample a few neighbors, and rebuild the relationships in its head. You'll see a chain that looks like this:

grep -rin "inbox" app/ lib/                                  # hundreds of hits, most irrelevant
grep -rinE "belongs_to :inbox|has_many :inboxes" app/ lib/   # the named associations, found fast
grep -rin "inbox_id" app/ lib/                               # and now the guessing starts

Save the answer it gives you. Count the dependents it lists. It'll read as complete, confident, well-structured. That feeling is the thing we're about to test.
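If the saved answer is free text, a few lines of Ruby can turn it into a countable list. This is a minimal sketch: the regex is an assumption about how paths show up in your agent's output, and the sample answer is invented for illustration.

```ruby
# Pull every Ruby file path out of an agent's free-text answer.
# The pattern assumes paths start with app/, lib/ or config/ and end in .rb.
def dependent_paths(answer_text)
  answer_text.scan(%r{\b(?:app|lib|config)/[\w./-]+\.rb\b}).uniq.sort
end

# Hypothetical answer text, standing in for whatever your agent returned.
cold_answer = <<~TEXT
  1. app/models/message.rb: belongs_to :inbox
  2. app/controllers/inboxes_controller.rb loads Inbox by id
  Also relevant: `app/models/message.rb` (mentioned twice).
TEXT

paths = dependent_paths(cold_answer)
puts paths
puts "#{paths.size} dependents listed"
```

Write the sorted list to a file and keep it. You'll want the same shape for the second answer.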

Step 2, install the map

curl -fsSL https://luuuc.github.io/sense/install.sh | sh

One binary. It builds a local index of your codebase's call-and-dependency graph and serves it to your agent over MCP (Model Context Protocol), the standard protocol agents use to call external tools. Nothing leaves the machine. It never edits a line.

Step 3, scan the repo

sense scan

Run it from the root of your app. This is the one-time build. When it finishes, it prints the shape of your codebase: files, symbols, and the edges between them. That's the graph your agent was trying to reconstruct from greps a moment ago. On a large app those counts are more than any human holds in their head, which is the whole reason the cold run missed.

Step 4, connect your agent and ask again

sense setup

This wires the structural tools into your agent so it reaches for them on its own. (That last part matters: an agent that never calls the map scores like one that never had it.)

Now ask the identical question from Step 1 again. This time the agent makes one structural call instead of a hundred greps. Something like:

> sense_blast Inbox
Inbox  (app/models/inbox.rb)
  110 symbols in blast radius
  app/services/...    app/workers/...    lib/...

It gets the resolved dependent set back in one shot and spends its budget reading and pinning each file instead of hunting for it.
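To make "blast radius" concrete, here is a rough sketch of the idea: a breadth-first walk over reversed dependency edges. It is not Sense's implementation, which this post doesn't show. The edge list and every class name in it are hypothetical.

```ruby
require "set"

# Hypothetical edge list: each pair is [dependent, dependency].
EDGES = [
  ["Message", "Inbox"],
  ["InboxesController", "Inbox"],
  ["NotifyService", "Message"],
  ["CleanupWorker", "NotifyService"],
  ["Account", "Billing"]
].freeze

# Walk the edges backwards from `target`, collecting everything that
# depends on it directly or transitively.
def blast_radius(edges, target)
  dependents_of = Hash.new { |hash, key| hash[key] = [] }
  edges.each { |dependent, dependency| dependents_of[dependency] << dependent }

  seen = Set.new
  queue = [target]
  until queue.empty?
    current = queue.shift
    dependents_of[current].each do |dependent|
      # Set#add? returns nil when the element is already present,
      # so each node is queued at most once (cycles terminate).
      queue << dependent if seen.add?(dependent)
    end
  end
  seen.delete(target).sort
end

puts blast_radius(EDGES, "Inbox").inspect
```

In this toy graph CleanupWorker is two hops away from Inbox and never names it. That's the kind of dependent a resolved graph returns and a token search has no way to reach.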

Now diff the two answers

Put the cold audit from Step 1 next to the mapped one. The gap between them is the answer.
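If both answers are reduced to lists of file paths, the comparison is a set difference. A minimal sketch, with hypothetical paths standing in for your own two lists:

```ruby
require "set"

# Compare two dependent lists and report where they disagree.
def audit_gap(cold_paths, mapped_paths)
  cold = cold_paths.to_set
  mapped = mapped_paths.to_set
  {
    missed_cold: (mapped - cold).sort, # dependents the cold run never listed
    only_cold: (cold - mapped).sort,   # listed cold but absent from the map: check by hand
    shared: (cold & mapped).sort       # found both ways
  }
end

# Hypothetical lists; substitute the paths from your two runs.
cold = %w[app/models/message.rb app/controllers/inboxes_controller.rb]
mapped = %w[app/models/message.rb app/controllers/inboxes_controller.rb app/workers/cleanup_worker.rb]

gap = audit_gap(cold, mapped)
puts "missed cold: #{gap[:missed_cold].join(', ')}"
puts "only cold:   #{gap[:only_cold].join(', ')}"
```

The missed_cold bucket is the number this experiment is about. Anything in only_cold deserves a manual look before you trust either answer.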

The gap won't be the obvious associations. The agent found those cold; they're the named ones grep catches. It'll be the dependents that reach your model sideways: through a concern, a service, a worker that loads it by id three calls deep, a config-string registry. The ones with no shared token to search for. On the benchmark that produced this experiment, a frontier agent found 2 of 11 of those cold on one well-built Rails app and 11 of 11 with the map. Your numbers will be your own. The shape will be familiar.
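Here is what "no shared token" can look like in practice. Every name below is a hypothetical stand-in, and holding the worker's source in a string is only a demo device so the same text can be both searched and loaded. The worker tears down an Inbox at runtime, yet its source never contains the word.

```ruby
# All names here are hypothetical, not taken from any real app.
module Teardownable
  def teardown!
    "#{self.class.name} torn down"
  end
end

class Inbox
  include Teardownable
end

# The worker's source, kept as a string so it can be searched and then loaded.
WORKER_SOURCE = <<~RUBY
  class CleanupWorker
    # The class name arrives as data in the payload, never as a token in this file.
    def perform(payload)
      Object.const_get(payload.fetch("type")).new.teardown!
    end
  end
RUBY

Object.class_eval(WORKER_SOURCE)

puts WORKER_SOURCE.match?(/inbox/i)                    # what a token search sees: false
puts CleanupWorker.new.perform({ "type" => "Inbox" })  # what actually runs: Inbox torn down
```

Change how Inbox is torn down and this worker breaks, but no grep for "inbox" will ever open its file.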

Two things to notice in your own diff:

→ The cold agent almost certainly didn't invent anything. Everything it listed was most likely real. Its failure is omission, not invention, and nothing in the output warns you it stopped early.

→ If your repo is small and colocated, the diff might be tiny or zero. That's not a failed experiment. That's a real, trustworthy answer: your agent can already see your whole structure, and you can stop wondering. The gap opens on the big apps, which is exactly where you can least afford it.

What you just proved

The map didn't make your model smarter. It gave it something to read that was never in any single file: the edges between files. And it'll keep doing that across model upgrades, because it's built from your code at this commit (which no model memorized), it re-indexes as your code changes, and it serves whatever agent you point at it. It's not a bet on today's model. It's the structure your agent was missing under any model.

The diff you just ran is the part of your codebase your agent has been quietly guessing at. Now you've seen the number.

The benchmark behind this, the methodology, and the raw data for thirteen real repos are all public.

Disclosure: I build the map (Sense). It's all open, so check it instead of trusting me. But the only check that matters here is the one you just ran on your own code.