DEV Community

Amanda


Catching agent repo drift before evals

In my previous post I covered basic linting checks. There is another layer worth adding before the more costly behavioral evals.

You can catch repo drift and convention violations with deterministic checks before paying for slower behavioral eval runs.

Reference integrity check

One place to start is reference integrity, for example checking that:

  • referenced code still exists
  • referenced code still contains a real implementation
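# reference integrity
# $ref is a path cited in the context file $relctx; $resolved_ref is that path
# on disk, or empty if the file no longer exists. log_* are the script's helpers.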
if [ -n "$resolved_ref" ]; then
  case "$ref" in
    *.ts|*.js|*.tsx|*.jsx)
      if rg -q '^\s*(export\s+)?(async\s+)?(function|class|interface|type|const|let|var|enum)\s' "$resolved_ref"; then
        log_pass "$relctx -> $ref exists and has declarations"
      else
        log_error "$relctx -> $ref exists but has no clear declarations"
      fi
      ;;
  esac
else
  log_warn "$relctx references '$ref' which no longer exists"
fi

This matters because a file can survive a refactor yet no longer hold the code it used to, so it stops being a useful reference for your agent.
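The snippet above assumes something has already pulled `$ref` out of a context file and resolved it to `$resolved_ref`. Here is a minimal sketch of that step, assuming references are written in `AGENTS.md` as backticked repo-relative paths such as `src/services/user.ts`. The function name `list_context_refs` and the backtick convention are hypothetical, not necessarily how the original script finds references, and it uses plain `grep -E` so it has no ripgrep dependency.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ref<TAB>resolved" for each backticked path in a
# context file. "resolved" is empty when the path no longer exists on disk.
# Assumes references look like `src/services/user.ts` (relative to repo root,
# containing at least one directory and a file extension).
list_context_refs() {
  local repo_root="$1" context_file="$2" ref resolved
  # -o prints each backticked token on its own line; tr strips the backticks
  grep -oE '`[A-Za-z0-9_./-]+/[A-Za-z0-9_.-]+\.[A-Za-z0-9]+`' "$context_file" 2>/dev/null \
    | tr -d '`' \
    | sort -u \
    | while IFS= read -r ref; do
        resolved=""
        if [ -e "$repo_root/$ref" ]; then
          resolved="$repo_root/$ref"
        fi
        printf '%s\t%s\n' "$ref" "$resolved"
      done
}

# Hypothetical loop feeding the reference-integrity check:
#   relctx="AGENTS.md"
#   while IFS=$'\t' read -r ref resolved_ref; do
#     : # the declaration check from the snippet above goes here
#   done < <(list_context_refs "$REPO_ROOT" "$REPO_ROOT/$relctx")
```

Because the tab is an IFS whitespace character, a missing file leaves `resolved_ref` empty, which is exactly what the `-n "$resolved_ref"` test expects.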

Architecture drift checks

Agents tend to crawl the directory structure to infer where new code should go. If the repo drifts from its architecture rules, the agent may generate code in the wrong location.

# architecture rule in this repo: routes should live in src/routes
if [ -d "$REPO_ROOT/src/routes" ]; then
  log_pass "src/routes exists"
else
  log_fail "Missing src/routes"
fi

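# architecture rule: import from ./routes, not ./route
if rg -n "from ['\"]\\./route/" "$REPO_ROOT/src/app.ts" >/dev/null 2>&1; then
  log_fail "app.ts imports from ./route; use ./routes"
elif rg -n "from ['\"]\\./routes/" "$REPO_ROOT/src/app.ts" >/dev/null 2>&1; then
  log_pass "app.ts imports routes from ./routes"
else
  log_fail "app.ts route imports do not match ./routes architecture"
fi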
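Once you have more than one layout rule, a small table keeps them in one place. This is a minimal sketch, assuming the required directories are `src/routes`, `src/services`, and `src/test` (the directories the other checks in this post scan) and treating a singular `src/route` folder as forbidden. The array names and `check_layout` are hypothetical, and the function prints results instead of calling the post's `log_*` helpers.

```shell
#!/usr/bin/env bash
# Hypothetical layout rules for this repo; adjust per project.
REQUIRED_DIRS=(src/routes src/services src/test)
FORBIDDEN_DIRS=(src/route)   # e.g. a singular folder an agent might invent

# Print PASS/FAIL lines; return 1 if any rule is violated.
check_layout() {
  local repo_root="$1" dir status=0
  for dir in "${REQUIRED_DIRS[@]}"; do
    if [ -d "$repo_root/$dir" ]; then
      echo "PASS $dir exists"
    else
      echo "FAIL missing $dir"
      status=1
    fi
  done
  for dir in "${FORBIDDEN_DIRS[@]}"; do
    if [ -e "$repo_root/$dir" ]; then
      echo "FAIL unexpected $dir"
      status=1
    fi
  done
  return "$status"
}
```

Checking for forbidden paths as well as required ones catches the case where the agent creates a parallel folder next to the correct one, which a "does src/routes exist" check alone would miss.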

Instruction drift checks

This example check looks for contradictory guidance, where instructions in AGENTS.md no longer match the repo. Depending on how an agent interprets its prompt, guidance like this can confuse it.

# ensure AGENTS.md route guidance matches repo structure
if rg -n "src/routes" "$REPO_ROOT/AGENTS.md" >/dev/null 2>&1 \
  && [ ! -d "$REPO_ROOT/src/routes" ]; then
  log_warn "AGENTS.md references src/routes but the directory no longer exists"
else
  log_pass "Route guidance matches repo structure"
fi
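The same idea works for any instruction that names something checkable. As a hypothetical example, if `AGENTS.md` tells the agent to write Jest tests, you can confirm `package.json` still declares `jest`, so the instructions and the dependencies can't silently disagree. This sketch assumes both files sit at the repo root, and it greps `package.json` instead of parsing the JSON, so treat it as a rough heuristic. `check_test_framework_guidance` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical instruction-drift check: if AGENTS.md mentions Jest,
# package.json should contain a "jest" key (dependency or config).
# Prints one line; returns 1 on drift.
check_test_framework_guidance() {
  local repo_root="$1"
  if grep -qi 'jest' "$repo_root/AGENTS.md" 2>/dev/null \
    && ! grep -q '"jest"[[:space:]]*:' "$repo_root/package.json" 2>/dev/null; then
    echo "WARN AGENTS.md mentions Jest but package.json does not declare jest"
    return 1
  fi
  echo "PASS test framework guidance matches package.json"
}
```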

Deterministic anti-pattern checks

I think we've all seen coding agents clearly favor certain frameworks based on their training data. You can use these checks to enforce project conventions, especially where violations are common.

Some examples of patterns could be:

  • no raw try/catch in service business logic
  • no NestJS decorators in an Express codebase
  • no chai in tests where Jest is the standard

These checks will vary greatly by project, but here are some example snippets.

# no try/catch in services
if rg -n 'try\s*\{' "$REPO_ROOT/src/services/" >/dev/null 2>&1; then
  log_fail "Service files contain try/catch blocks (should use Result<T>)"
else
  log_pass "No try/catch in service files"
fi

# no chai in tests
if rg -n "from ['\"]chai['\"]" "$REPO_ROOT/src/test/" >/dev/null 2>&1; then
  log_fail "Test files import chai, should use Jest"
else
  log_pass "Test files don't use chai"
fi

# no NestJS decorators in routes/services
if rg -n '@(Controller|Get|Post|Injectable)' "$REPO_ROOT/src/routes/" "$REPO_ROOT/src/services/" >/dev/null 2>&1; then
  log_fail "Found NestJS decorators in Express layers"
else
  log_pass "No NestJS decorator drift"
fi
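Because these rules vary so much by project, it can help to keep them as data instead of copy-pasting `if` blocks. A minimal sketch, assuming a hypothetical one-rule-per-line format of `path|message|pattern`. The pattern goes last because `read` assigns the rest of the line, separators included, to the final variable, so patterns like the NestJS alternation can contain `|`. It uses `grep -rE` with POSIX character classes only so the sketch has no ripgrep dependency.

```shell
#!/usr/bin/env bash
# Hypothetical rule table, one rule per line: path|message|ERE pattern.
ANTI_PATTERN_RULES=$(cat <<'EOF'
src/services|Service files contain try/catch blocks (should use Result<T>)|try[[:space:]]*[{]
src/test|Test files import chai, should use Jest|from ['"]chai['"]
src/routes|Found NestJS decorators in Express routes|@(Controller|Get|Post|Injectable)
src/services|Found NestJS decorators in Express services|@(Controller|Get|Post|Injectable)
EOF
)

# Print PASS/FAIL/SKIP per rule; return 1 if any rule matched.
check_anti_patterns() {
  local repo_root="$1" path message pattern status=0
  while IFS='|' read -r path message pattern; do
    [ -n "$path" ] || continue                  # ignore blank lines
    if [ ! -e "$repo_root/$path" ]; then
      echo "SKIP $path not found"
    elif grep -rqE -- "$pattern" "$repo_root/$path"; then
      echo "FAIL $message"
      status=1
    else
      echo "PASS $path has no match for $pattern"
    fi
  done <<< "$ANTI_PATTERN_RULES"
  return "$status"
}
```

Adding a convention then means adding a line to the table rather than another block of shell.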

These checks enforce repo contracts before you run behavioral LLM-as-judge evals.
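For the checks to actually gate the evals, the script's exit status has to reflect failures. The snippets call `log_pass`, `log_warn`, `log_fail`, and `log_error` without showing them, so here is a hypothetical minimal version: it counts failures and errors, treats warnings as non-blocking (a design choice, not something the post specifies), and lets the script exit non-zero so a CI step like `./repo-checks.sh && <your eval command>` only runs evals on a clean repo. The file name `repo-checks.sh` and `finish_checks` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical log helpers; the post's own implementations are not shown.
# Counters only update when these run in the current shell, not inside $(...) or a pipe.
FAILURES=0
WARNINGS=0

log_pass()  { printf 'PASS  %s\n' "$*"; }
log_warn()  { printf 'WARN  %s\n' "$*"; WARNINGS=$((WARNINGS + 1)); }
log_fail()  { printf 'FAIL  %s\n' "$*"; FAILURES=$((FAILURES + 1)); }
log_error() { printf 'ERROR %s\n' "$*"; FAILURES=$((FAILURES + 1)); }

# Print a summary; return 1 if any check failed so callers can gate evals on it.
finish_checks() {
  printf '%d failure(s), %d warning(s)\n' "$FAILURES" "$WARNINGS"
  [ "$FAILURES" -eq 0 ]
}

# At the end of the hypothetical repo-checks.sh:
#   finish_checks || exit 1
```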

When to add an eval platform

This post is intentionally script-first and focused on things that can be checked deterministically in your environment.

Capability drift is different: measuring it tells you whether the agent is getting better or worse over time. That gets into behavioral evals and deserves its own exploration.

Either way, keep your deterministic repo checks. They are low-lift and valuable!

I'd love to hear more about how you are approaching checks and evals in your projects. Leave me a comment below.
