After covering basic linting checks in my previous post, I want to add another layer that sits before the more costly behavioral evals.
You can catch repo drift and convention violations with deterministic checks before paying for slower behavioral eval runs.
Reference integrity check
One place to start is reference integrity, for example:
- referenced code still exists
- referenced code still contains real implementation
# reference integrity
if [ -n "$resolved_ref" ]; then
  case "$ref" in
    *.ts|*.js|*.tsx|*.jsx)
      if rg -q '^\s*(export\s+)?(async\s+)?(function|class|interface|type|const|let|var|enum)\s' "$resolved_ref"; then
        log_pass "$relctx -> $ref exists and has declarations"
      else
        log_error "$relctx -> $ref exists but has no clear declarations"
      fi
      ;;
  esac
else
  log_warn "$relctx references '$ref' which no longer exists"
fi
This is useful because a file can survive a refactor and still stop being a useful reference for your agent, for example when it has been hollowed out into a file with no declarations.
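The snippets in this post call log_pass, log_warn, log_fail and log_error and read $resolved_ref without showing where they come from. Here is a minimal sketch of what those pieces could look like. The function bodies, the counters, resolve_ref and finish_checks are my assumptions for illustration, not the original implementation, and log_error is simply treated as another failure.

```shell
# Hypothetical helpers: names match the calls in this post, bodies are assumed.
PASS_COUNT=0
WARN_COUNT=0
FAIL_COUNT=0

log_pass()  { PASS_COUNT=$((PASS_COUNT + 1)); printf 'PASS  %s\n' "$1"; }
log_warn()  { WARN_COUNT=$((WARN_COUNT + 1)); printf 'WARN  %s\n' "$1"; }
log_fail()  { FAIL_COUNT=$((FAIL_COUNT + 1)); printf 'FAIL  %s\n' "$1" >&2; }
# In this sketch log_error counts as a failure, same as log_fail.
log_error() { log_fail "$1"; }

# resolve_ref ROOT REF
# Prints ROOT/REF if that file exists, prints nothing otherwise,
# so the caller can test the result with [ -n "$resolved_ref" ].
resolve_ref() {
  if [ -f "$1/$2" ]; then
    printf '%s\n' "$1/$2"
  fi
}

# Print a summary and return non-zero if any check failed,
# so a CI job can use the return status directly.
finish_checks() {
  printf '%s passed, %s warnings, %s failed\n' \
    "$PASS_COUNT" "$WARN_COUNT" "$FAIL_COUNT"
  [ "$FAIL_COUNT" -eq 0 ]
}
```

With helpers like these, the check above would be preceded by something like resolved_ref=$(resolve_ref "$REPO_ROOT" "$ref"), and the script would end with finish_checks || exit 1. Call the log helpers directly rather than inside $(...), otherwise the counters are updated in a subshell and lost.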
Architecture drift checks
Agents tend to crawl the directory structure to infer where new code should be placed. If the architecture rules drift, the agent may generate code in the wrong location.
# architecture rule in this repo: routes should live in src/routes
if [ -d "$REPO_ROOT/src/routes" ]; then
  log_pass "src/routes exists"
else
  log_fail "Missing src/routes"
fi
# architecture rule: import from ./routes, not ./route
if rg -n "from ['\"]\\./routes/" "$REPO_ROOT/src/app.ts" >/dev/null 2>&1; then
  log_pass "app.ts imports routes from ./routes"
else
  log_fail "app.ts route imports do not match ./routes architecture"
fi
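The checks above confirm that the expected location exists. The reverse question is whether route code has started appearing somewhere else. Here is a rough sketch of that idea. It assumes a hypothetical naming convention where route modules end in .routes.ts, which may not match your repo, and the function name stray_route_files is mine. It uses find rather than rg so it runs without extra tooling.

```shell
# stray_route_files ROOT
# Lists files named *.routes.ts (assumed convention) that live under
# ROOT/src but outside ROOT/src/routes. Prints nothing when all is well.
stray_route_files() (
  root=$1
  find "$root/src" -type f -name '*.routes.ts' \
    ! -path "$root/src/routes/*" 2>/dev/null | sort
)
```

A non-empty result is the drift signal, so a caller could store the output in a variable and log a failure when it is not empty.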
Instruction drift checks
This example check looks for contradictory guidance, which can confuse agents depending on how they interpret the prompt.
# ensure AGENTS.md route guidance matches repo structure
if rg -n "src/routes" "$REPO_ROOT/AGENTS.md" >/dev/null 2>&1 \
  && [ ! -d "$REPO_ROOT/src/routes" ]; then
  log_warn "AGENTS.md references src/routes but the directory no longer exists"
else
  log_pass "Route guidance matches repo structure"
fi
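The check above hard-codes a single path. A more general sketch pulls every src/... path out of a guidance file and reports the ones that no longer exist on disk. The function name missing_doc_paths and the src/ prefix are my assumptions. It uses grep -o, which is widely available but not strictly POSIX, in place of rg.

```shell
# missing_doc_paths ROOT DOC
# Extracts tokens that look like src/... paths from DOC and prints each
# one that does not exist under ROOT. Prints nothing if all paths resolve.
missing_doc_paths() (
  root=$1
  doc=$2
  # "|| true" keeps a no-match grep from failing the pipeline under pipefail.
  { grep -oE 'src/[A-Za-z0-9_./-]+' "$doc" 2>/dev/null || true; } |
    sed 's|[./]*$||' |   # drop trailing dots/slashes, e.g. sentence-ending "."
    sort -u |
    while IFS= read -r path; do
      [ -e "$root/$path" ] || printf '%s\n' "$path"
    done
)
```

Called as missing_doc_paths "$REPO_ROOT" "$REPO_ROOT/AGENTS.md", each printed line is a path the guidance still mentions but the repo no longer has. The pattern is deliberately loose, so expect to tune it if your docs mention paths in prose that are not meant literally.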
Deterministic anti-pattern checks
I think we've all seen coding agents clearly favor certain frameworks based on training data. You can use these types of checks to enforce project conventions, especially for the violations you see most often.
Some examples of patterns could be:
- no raw try/catch in service business logic
- no NestJS decorators in an Express codebase
- no chai in tests where Jest is the standard
These checks will vary greatly by project, but here are some example snippets.
# no try/catch in services
if rg -n 'try\s*\{' "$REPO_ROOT/src/services/" >/dev/null 2>&1; then
  log_fail "Service files contain try/catch blocks (should use Result<T>)"
else
  log_pass "No try/catch in service files"
fi
# no chai in tests
if rg -n "from ['\"]chai['\"]" "$REPO_ROOT/src/test/" >/dev/null 2>&1; then
  log_fail "Test files import chai, should use Jest"
else
  log_pass "Test files don't use chai"
fi
# no NestJS decorators in routes/services
if rg -n '@(Controller|Get|Post|Injectable)' "$REPO_ROOT/src/routes/" "$REPO_ROOT/src/services/" >/dev/null 2>&1; then
  log_fail "Found NestJS decorators in Express layers"
else
  log_pass "No NestJS decorator drift"
fi
These checks enforce repo contracts before you run behavioral LLM-as-judge evals.
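Since the three checks above share one shape (search for a pattern, fail if it is found), they can be folded into a single helper. This is a sketch under my own assumptions: the name forbid_pattern is hypothetical, and it uses grep -rnE instead of rg, so the patterns use [[:space:]] rather than \s.

```shell
# forbid_pattern LABEL REGEX PATH...
# Fails (returns 1) if REGEX matches anywhere under the given paths,
# printing the matching lines so the offending code is easy to find.
forbid_pattern() {
  fp_label=$1
  fp_regex=$2
  shift 2
  if grep -rnE -- "$fp_regex" "$@" 2>/dev/null; then
    printf 'FAIL  %s\n' "$fp_label"
    return 1
  fi
  printf 'PASS  %s\n' "$fp_label"
}

# Example calls (paths assume the layout used in this post):
#   forbid_pattern "no try/catch in services" 'try[[:space:]]*\{' "$REPO_ROOT/src/services"
#   forbid_pattern "no chai in tests" "from ['\"]chai['\"]" "$REPO_ROOT/src/test"
#   forbid_pattern "no NestJS decorators" '@(Controller|Get|Post|Injectable)' \
#     "$REPO_ROOT/src/routes" "$REPO_ROOT/src/services"
```

One caveat applies to this helper and to the snippets above: if a directory is missing, the search errors out quietly and the check reads as a pass. Pairing each anti-pattern check with a directory existence check avoids that false sense of safety.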
When to add an eval platform
This post is intentionally script-first and focused on things that can be checked deterministically in your environment.
Capability drift is different. Measuring it tells you whether the agent is getting better or worse over time, which gets into behavioral evals and deserves its own exploration.
Regardless, keep your deterministic repo checks. They are low lift and valuable!
I'd love to hear more about how you are approaching checks and evals in your projects. Leave me a comment below.