The Bug Code Review Often Misses
Imagine implementing a feature perfectly — clean code, passing tests, approved PR. Now imagine that feature is completely unreachable because the service that was supposed to call it never did. No error. No failing test. Just silent, dead code sitting in production.
This is the class of bug that lives at integration boundaries — between repositories, services, teams, async producers and consumers, or config layers. It's largely invisible to traditional code review because reviewers usually operate closer to the PR boundary than the architectural boundary. The failure often isn't obvious in any one file — it's in the missing link between pieces of code that each look correct in isolation.
What I want to dig into here is a structured verification pattern I've found effective for AI-assisted audits of these gaps. You can apply the same logic manually or build tooling around it.
Why Grep Isn't Enough (But You Still Need It)
When auditing integration boundaries, one naive approach is to grep for method names across repos. Found it? Great, it exists. Didn't find it? Flag it as missing. This misses half the problem.
Existence isn't the same as reachability. A method can exist but be private. It can exist but only be called in a test context. It can exist but be conditionally gated behind a feature flag that's disabled everywhere. Verification has to go deeper than a filename match.
The 4-Step Verification Pattern
When auditing an integration boundary, I use a structured checklist that works well whether you're prompting an AI agent or doing it manually. It follows this chain:
Step 1: Confirm the Contract Exists
Start with the simplest question: does the method, endpoint, or hook actually exist in the codebase? What you're searching for depends on the boundary type:
- In-process: method definition exists
- HTTP/API: route and controller action exist
- Async: event is emitted, consumer/worker is subscribed
- Config: flag or env var is defined and wired into deployment
- Data: writers and readers agree on keys and shape
# Does the method exist anywhere in this repo?
grep -r "def render_for_all_platforms" ./services/
# Does a route/action exist for it?
grep -rn "render_for_all_platforms\|render_all" ./config/routes.rb ./app/controllers/
This is table stakes. If the target doesn't exist, nothing else matters. But passing this step means almost nothing on its own.
Step 2: Check the Access Modifier
This is the step most manual reviews skip. In ordinary Ruby code, a private method can't be called with an explicit receiver — so it's a strong signal that it's not intended as a cross-boundary entry point. (Ruby being Ruby, send and metaprogramming can bypass this, but if you're relying on that for a planned integration, something has gone wrong.)
# Red flag: method is private but expected to be called from a job
class VideoRenderService
  def trigger_render(platform:)
    # public, callable — but only handles single-platform renders
  end

  private

  def render_for_all_platforms
    # intended to be called by RenderJob, but private makes it unreachable
  end
end
An agent or reviewer auditing this would flag it: the plan says render_for_all_platforms should be invoked from the job scheduler, but the access modifier prevents normal invocation. A reviewer looking only at the job scheduler PR would likely miss this.
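To make the failure mode concrete, here's a minimal runnable sketch. The class below is an illustrative stand-in for the service above (simplified signatures, not the real code): calling the private method with an explicit receiver raises, which is exactly what the job scheduler would hit.

```ruby
# Illustrative stand-in for the service above, not the real code.
class VideoRenderService
  def trigger_render
    "single-platform render"
  end

  private

  def render_for_all_platforms
    "multi-platform render"
  end
end

service = VideoRenderService.new
service.trigger_render  # works: the public entry point

begin
  service.render_for_all_platforms  # what the job scheduler would attempt
rescue NoMethodError => e
  puts e.message  # e.g. "private method 'render_for_all_platforms' called for ..."
end

# Metaprogramming can bypass the modifier, but relying on this for a
# planned integration is a design smell, not a fix:
service.send(:render_for_all_platforms)  # => "multi-platform render"
```

The `send` escape hatch is why static checks should flag the modifier rather than silently accept it: if the only way to reach the method is bypassing visibility, the integration was never wired up as designed.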
Step 3: Find the Caller
Even if the method is public, you need to verify something actually calls it. Search across all affected repositories, not just the one where the method lives.
# Search across multiple repos (run from a parent directory)
grep -r "render_for_all_platforms" ./cms-service/
grep -r "render_for_all_platforms" ./analytics-service/
grep -r "render_for_all_platforms" ./renderer/
# If using a monorepo or shared tooling:
grep -r "render_for_all_platforms" . --include="*.rb" --include="*.ts"
Zero results across all repos when you expect a caller? That's a strong signal the feature may be unreachable. It's implemented, it works in isolation, but nothing triggers it. (Caveats: dynamic dispatch, reflection, generated clients, or external callers outside the repos you searched can produce false negatives — but zero grep hits should always trigger investigation.) This is exactly the kind of issue that slips through when PRs are reviewed in sequence by different engineers across different weeks.
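One concrete way the dynamic-dispatch caveat bites: callers that assemble the method name at runtime. A grep for the literal name finds zero hits even though the method is invoked. The classes below are hypothetical, purely to show the shape of the false negative:

```ruby
# A caller grep will never find: the method name is built at runtime,
# so "render_for_all_platforms" never appears as one literal string here.
class RenderDispatcher
  def dispatch(service, mode)
    service.public_send("render_for_#{mode}")
  end
end

# Hypothetical target service for the demonstration.
class FakeRenderService
  def render_for_all_platforms
    :rendered
  end
end

RenderDispatcher.new.dispatch(FakeRenderService.new, "all_platforms")  # => :rendered
```

This is why zero grep hits should trigger investigation rather than an automatic "dead code" verdict.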
Step 4: Trace the Wiring
The final step is confirming the plumbing — environment variables, feature flags, configuration values — actually connects the caller to the implementation.
# Does the calling code depend on a flag that's never set?
class RenderJob
  def perform(video_id)
    if ENV['MULTI_PLATFORM_RENDERING_ENABLED'] == 'true' # ← Is this set?
      VideoRenderService.new.render_for_all_platforms
    else
      VideoRenderService.new.render_single_platform
    end
  end
end
# Check if the env var is defined anywhere
grep -r "MULTI_PLATFORM_RENDERING_ENABLED" ./config/
grep -r "MULTI_PLATFORM_RENDERING_ENABLED" ./.env*
grep -r "MULTI_PLATFORM_RENDERING_ENABLED" ./infrastructure/
Finding the caller but not the configuration wiring means the feature is structurally present but effectively disabled. Depending on your defaults, this could mean the new code never runs — or worse, always runs when it shouldn't.
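One mitigation is to make the default explicit rather than letting an unset variable silently pick a branch. A small hypothetical helper (names are illustrative, not from the codebase above):

```ruby
# Makes the fallback for unset config an explicit, stated default,
# instead of an accidental consequence of `nil == 'true'` being false.
module FeatureFlags
  def self.enabled?(name, default: false)
    raw = ENV[name]
    return default if raw.nil?  # unset: fall back to the declared default
    raw == 'true'
  end
end

# With nothing set, the feature is off by a documented default:
FeatureFlags.enabled?('MULTI_PLATFORM_RENDERING_ENABLED')
# And the audit question becomes "is this default intended?",
# not "why does this branch never run?":
FeatureFlags.enabled?('MULTI_PLATFORM_RENDERING_ENABLED', default: true)
```

The helper doesn't remove the need for step 4, but it turns a silent misconfiguration into a greppable, reviewable default.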
Applying This to AI Agent Design
If you're building or prompting AI agents to do this kind of audit, the pattern translates directly into a structured task definition. Here's a minimal prompt template for a phase-level audit agent:
You are auditing Phase [N]: [Phase Name].
For each integration point listed in the plan document:
1. Confirm the method/endpoint exists (exact name match)
2. Check its access modifier — is it callable by the expected caller?
3. Search all provided repositories for callers of this method
4. Verify any required configuration (env vars, flags) is wired up
Report findings as JSON with fields:
- integration_point: string
- exists: boolean
- access_modifier: "public" | "private" | "protected" | "unknown"
- callers_found: string[] (file paths)
- config_wired: boolean
- risk: "none" | "low" | "medium" | "high"
- notes: string
Asking the agent to return structured JSON means you can aggregate findings across multiple agents running in parallel on different phases, deduplicate overlapping issues, and prioritise by risk before handing off to human engineers for validation.
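The aggregation step itself can be mechanical. Here's a sketch of merging findings from parallel agents: deduplicate by integration point, keep the worst risk rating, union the caller lists, and sort worst-first. Field names match the JSON template above; the sample payloads are invented for illustration.

```ruby
require 'json'

RISK_ORDER = { 'none' => 0, 'low' => 1, 'medium' => 2, 'high' => 3 }.freeze

# Merge findings from several agents: dedupe by integration_point,
# keep the highest-risk finding, union the callers, sort worst-first.
def aggregate(agent_outputs)
  findings = agent_outputs.flat_map { |raw| JSON.parse(raw) }
  merged = findings.group_by { |f| f['integration_point'] }.map do |_point, group|
    worst = group.max_by { |f| RISK_ORDER.fetch(f['risk'], 0) }
    worst.merge('callers_found' => group.flat_map { |f| f['callers_found'] }.uniq)
  end
  merged.sort_by { |f| -RISK_ORDER.fetch(f['risk'], 0) }
end

# Invented sample output from two agents auditing different phases:
agent_a = '[{"integration_point":"render_for_all_platforms","exists":true,' \
          '"access_modifier":"private","callers_found":[],"config_wired":false,' \
          '"risk":"high","notes":"private, no callers found"}]'
agent_b = '[{"integration_point":"render_for_all_platforms","exists":true,' \
          '"access_modifier":"private","callers_found":["renderer/app/jobs/render_job.rb"],' \
          '"config_wired":false,"risk":"medium","notes":"caller exists but flag unset"}]'

aggregate([agent_a, agent_b]).first['risk']  # => "high"
```

Keeping the worst rating on conflict is a deliberate bias: for an investigation list handed to humans, a false "high" costs a few minutes of triage, while a false "low" can bury a real bug.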
The Data Path Variant: JSONB and Schema Drift
The 4-step pattern works for method-level integrations, but there's a nastier variant: data contract mismatches. In systems that share a database — or where multiple code paths within one application write to the same semi-structured column — schema drift can be silent and painful.
# Service A writes metadata like this:
video.update!(metadata: { narration_url: url, timing_data: timings })
# Service B reads metadata like this:
video.metadata['tts_url'] # ← Different key, silently returns nil
For this variant, the verification pattern shifts. Instead of tracing callers, you trace writers and readers of the same data structure. Search for all write paths to the column, all read paths from the column, and map out which keys are written versus which keys are read. Mismatches are your bugs.
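That writer/reader diff can be roughed out with regexes over source text. The patterns below are naive and Ruby-specific (they only catch `update!(metadata: { ... })` writes and `metadata['key']` reads), so treat hits as candidates to investigate, not proof:

```ruby
# Naive static scan: collect metadata keys that are written vs read,
# then diff the two sets. Deliberately simple patterns; hits are
# candidates for investigation, not confirmed bugs.
WRITE_BLOCK = /metadata:\s*\{\s*([^}]*)\}/   # update!(metadata: { ... })
READ_KEY    = /metadata\[['"](\w+)['"]\]/    # metadata['some_key']

def key_drift(source)
  written = source.scan(WRITE_BLOCK)
                  .flat_map { |(body)| body.scan(/(\w+):/).flatten }
                  .uniq
  read = source.scan(READ_KEY).flatten.uniq
  { written_never_read: written - read, read_never_written: read - written }
end

code = <<~RUBY
  video.update!(metadata: { narration_url: url, timing_data: timings })
  video.metadata['tts_url']
  video.metadata['timing_data']
RUBY

key_drift(code)
# => {:written_never_read=>["narration_url"], :read_never_written=>["tts_url"]}
```

In the sample, `timing_data` pairs up cleanly, while the `narration_url`/`tts_url` mismatch surfaces exactly the silent-nil bug from the example above.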
Running Agents in Parallel, Not Sequence
One design choice that can improve coverage: run agents for different phases simultaneously rather than one after another. Sequential auditing risks each agent's findings influencing the framing of the next, and it's just slower.
Parallel agents produce independent findings that you aggregate afterward. This catches issues that a single sequential pass might rationalise away — if one agent flags a dead method and a separate agent (with no knowledge of that finding) independently identifies that the expected caller has a bug, those two findings together paint a much clearer picture than either does alone.
Human Validation Is Non-Negotiable
It's worth being direct about this: AI agents surface candidates for issues. They're not a replacement for a human engineer confirming the finding against a live codebase.
Static search — whether by grep, an AI agent, or an AST tool — has real blind spots: dynamic dispatch, reflection, metaprogramming, framework conventions that invoke methods by naming pattern, generated clients, runtime config injection, and external callers that live outside the repos you searched. Any of these can make reachable code look unreachable, or vice versa.
Treat agent output as a prioritised investigation list, not a definitive bug report. The value is in the structured, cross-repo coverage — catching the categories of issues that would slip through — not in the infallibility of each individual finding.
Building Your Own Cross-Repo Audit
You don't need a sophisticated AI setup to apply this pattern today. Even a rough script that runs the first few verification steps across repos will surface issues. Here's an illustrative starting point (not production-grade, but enough to demonstrate the approach):
#!/usr/bin/env bash
set -euo pipefail
# Usage: ./audit_integration.sh render_for_all_platforms ./cms ./analytics ./renderer

METHOD="${1:?method name required}"
shift

printf '=== Auditing: %s ===\n' "$METHOD"

for REPO in "$@"; do
  printf '\n--- Searching in %s ---\n' "$REPO"

  # Step 1: Definitions
  printf 'Definitions:\n'
  grep -R --include='*.rb' -nE "def[[:space:]]+${METHOD}\b" "$REPO" || true

  # Step 3: References (excluding specs)
  printf 'References:\n'
  grep -R --include='*.rb' -n "${METHOD}" "$REPO" \
    | grep -vE '(_spec\.rb|/spec/)' || true
done
This is obviously a starting point — you'd want to add access modifier parsing, config checking, and structured output for aggregation. But it demonstrates the core idea: systematically checking existence and reachability across repo boundaries.
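The access-modifier step can also be approximated statically. A crude Ruby heuristic: walk a file's lines, track the visibility section in effect, and report the modifier where the method is defined. It ignores `private :name`, inline `private def`, and nested classes, so it's a triage tool, not an analyzer:

```ruby
# Heuristic: track the current visibility section while scanning lines,
# and return the modifier in effect where the method is defined.
# Ignores `private :name`, `private def ...`, and nested classes.
def access_modifier(source, method_name)
  visibility = 'public'
  source.each_line do |line|
    stripped = line.strip
    visibility = stripped if %w[public private protected].include?(stripped)
    return visibility if stripped.start_with?("def #{method_name}")
  end
  'unknown'  # method not defined in this file
end

source = <<~RUBY
  class VideoRenderService
    def trigger_render
    end

    private

    def render_for_all_platforms
    end
  end
RUBY

access_modifier(source, 'trigger_render')            # => "public"
access_modifier(source, 'render_for_all_platforms')  # => "private"
```

Wired into the shell script's output, this is enough to populate the `access_modifier` field in the JSON report format from earlier, even before reaching for a proper AST parser.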
The Payoff
The methodology described here found ten real issues in one internal audit — including a fully-implemented feature that was never invoked. That's not a minor edge case catch; that's weeks of engineering work sitting invisible in production.
Integration boundary bugs are expensive because they're hard to find and easy to miss at review time. A structured, phase-level verification approach — whether run by AI agents or methodical engineers — is one of the more effective ways to catch this class of issue before it reaches users.
This article builds on a deeper exploration at devgab.com.
Have you run into integration bugs that slipped through code review because the gap lived between PRs? What's your current approach to auditing work that spans multiple repos — and would you trust an AI agent to surface those gaps?