137Foundry

How to Build a Code Quality Gate for AI-Assisted Pull Requests

Code quality gates exist to automate the mechanical checks so reviewers can focus on judgment calls. That premise becomes more valuable when a significant portion of the code is AI-generated, because AI tools produce more code per developer than before, and the failure modes are different from what reviewers are trained to look for.

This guide covers how to build a quality gate pipeline specifically calibrated to AI-assisted development: what to automate, what to leave to human review, and how to sequence the checks to keep feedback loops fast. The goal is a process that scales with increased PR volume without requiring proportionally more review time.

Step 1: Define What AI-Assisted Means for Your Team

Before building anything, agree on what counts as AI-generated or AI-assisted code in your workflow. The practical definition matters for deciding which checks to apply at which thresholds, and it creates accountability that wouldn't otherwise exist.

Some teams require authors to tag PRs as AI-assisted when more than 50% of the diff is AI-generated. Others apply the same checks to all PRs. The labeling approach has a useful side effect: it makes explicit what was AI-generated, which changes how reviewers approach the diff.

A simple PR template addition:

## AI Assistance
- [ ] This PR contains significant AI-generated code (>25% of diff)
- [ ] I have verified all external library method calls against the installed version
- [ ] I have run all tests locally and reviewed test output, not just pass/fail status

The checkboxes can be enforced as required gates before merging: add a CI check that fails while any template item is left unchecked, then mark that check as required in branch protection (branch protection on its own does not inspect checkbox state). This creates a lightweight but meaningful author accountability checkpoint.
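A minimal sketch of such a check, assuming the PR body is passed in from the workflow (for example via the `github.event.pull_request.body` context); the parsing logic is illustrative:

```python
import re

def unchecked_items(pr_body: str) -> list[str]:
    """Return the text of any unchecked '- [ ]' checkboxes in a PR body."""
    return re.findall(r"^\s*[-*] \[ \] (.+)$", pr_body, flags=re.MULTILINE)

# In CI, fail the job when any items come back:
#   if unchecked_items(body): sys.exit(1)
```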

Step 2: Set Up Static Analysis in CI

Static analysis should run on every PR. The rule configuration can be tuned for AI-generated failure modes specifically, beyond general code quality checks.

For JavaScript/TypeScript projects, combine ESLint with TypeScript's type-aware rules. Type-aware rules catch method calls on incorrect types - a common AI generation error. Run on changed files only to keep CI time under two minutes:

# .github/workflows/lint.yml
# Note: actions/checkout needs fetch-depth: 0 (or an explicit fetch of main)
# for the three-dot diff against origin/main to work
- name: Lint changed files
  run: |
    CHANGED=$(git diff --name-only origin/main...HEAD -- '*.ts' '*.tsx')
    if [ -n "$CHANGED" ]; then
      # Type-aware rules come from parserOptions.project in the ESLint config
      npx eslint $CHANGED
    fi

For Python projects, add Semgrep alongside flake8 or pylint. Semgrep's community rules include checks for patterns common in AI-generated code, such as deprecated API usage and security antipatterns. The configuration is minimal:

- name: Semgrep
  uses: returntocorp/semgrep-action@v1
  with:
    config: p/default p/security-audit

Step 3: Require Branch Coverage in Tests

Line coverage misses a category of behavioral errors that AI-generated code commonly contains: correct handling of one branch but absent handling of another. Switching to branch coverage requirements catches these gaps automatically.
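An illustrative example of the gap (the function and discount code are made up for demonstration):

```python
def apply_discount(price, code=None):
    """Apply a 10% discount when a valid code is supplied."""
    if code == "SAVE10":
        price = price * 0.9
    return price

# A single test, apply_discount(100.0, "SAVE10"), executes every line,
# so line coverage reports 100% -- yet the code=None fall-through branch
# is never taken. Branch coverage flags it as unexercised.
```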

For Python:

pytest --cov=src --cov-branch --cov-report=term-missing --cov-fail-under=85 tests/

For JavaScript with Jest:

// jest.config.js - coverage thresholds
module.exports = {
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 85,
      lines: 90,
      statements: 90
    }
  }
};

Set the threshold at what your current codebase achieves, then enforce it as a minimum. AI-generated code that significantly drops coverage metrics is a signal that the tests don't exercise the new branches. The branch coverage report also shows which specific conditions aren't tested, making it actionable for reviewers.

Step 4: Automate Dependency Verification

AI coding tools sometimes generate import statements for library versions that differ from what's pinned in your dependency file, or for packages that are similar-sounding but incorrect. Add dependency audit steps to CI:

# For Node.js projects
- name: Dependency audit
  run: npm audit --audit-level=moderate

# For Python projects  
- name: pip-audit
  run: pip install pip-audit && pip-audit

Additionally, verify that any new import introduced in the PR is explicitly listed in the dependency file. This catches transitive dependencies that AI tools sometimes treat as if they were direct:

# pip check verifies that every installed package's declared dependencies
# are present and version-compatible - it does not compare imports
# against requirements.txt
python -m pip check

# A dedicated checker such as deptry can flag undeclared imports directly
deptry .

A package that appears in code but not in the dependency file is either a transitive dependency the AI incorrectly treated as direct, or a package name that doesn't exist under that name.

Step 5: Add Integration Test Requirements for System Boundaries

Static analysis and unit tests verify code in isolation. The highest-value checks for AI-generated code verify behavior at system boundaries, where the new code interacts with a database, an external API, or another service. AI models often miss implicit assumptions about system state, concurrent access, and error propagation across boundaries.

Add a label-triggered CI workflow for integration tests on code touching system boundaries:

# Label-based CI trigger for integration tests
- name: Run integration tests if needed
  if: contains(github.event.pull_request.labels.*.name, 'touches-system-boundary')
  run: pytest tests/integration/ -v

Require that PRs touching database models, API clients, or message queue producers and consumers carry the label. Reviewers add it during the review process when they identify that a system boundary is involved. The label triggers the integration test suite for that PR.
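The label can also be applied automatically from changed paths with GitHub's actions/labeler, so reviewers only need to add it when auto-detection misses a case. A sketch assuming the v5 config format and illustrative directory names:

```yaml
# .github/labeler.yml (actions/labeler v5 syntax; paths are illustrative)
touches-system-boundary:
  - changed-files:
      - any-glob-to-any-file:
          - 'src/models/**'
          - 'src/clients/**'
          - 'src/queues/**'
```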

Step 6: Set Up Code Complexity Tracking

AI models often generate higher-complexity code than the problem requires, because they optimize for completeness rather than simplicity. Tracking cognitive complexity over time reveals whether AI adoption is increasing technical debt at the function level.

SonarQube Community Edition reports cognitive complexity out of the box. For smaller teams, radon for Python is a lightweight alternative - it measures cyclomatic rather than cognitive complexity, but tracks the same trend:

# Flag functions whose cyclomatic complexity rank is B or worse
radon cc src/ --min B

The goal isn't to block PRs on complexity - it's to track whether average complexity is trending upward as AI-generated code accumulates. Establishing a baseline before AI adoption and reviewing the trend quarterly provides early warning before complexity becomes a maintenance problem.
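For teams that want a trend number with no extra tooling at all, a crude stdlib-only proxy can be scripted: counting branching constructs per function is not true cognitive complexity, but it moves in the same direction and is enough for a quarterly trend line. The metric definition here is illustrative:

```python
import ast

def branch_counts(source: str) -> dict[str, int]:
    """Rough complexity proxy: 1 + number of branching constructs
    per function. Not cognitive complexity, just a trend signal."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(
                isinstance(n, (ast.If, ast.For, ast.While, ast.Try,
                               ast.BoolOp, ast.IfExp))
                for n in ast.walk(node)
            )
            counts[node.name] = 1 + branches
    return counts
```

Appending the per-file average to a log on each main-branch build gives you the baseline and the quarterly trend this step describes.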

Step 7: Write a Focused Pre-Merge Checklist for Reviewers

Automation handles the mechanical checks. Human reviewers handle the things automation can't: system-level context, business rule correctness, and whether the code does what the system actually needs. A focused checklist directs reviewer attention to these categories specifically.

For AI-assisted PRs, a five-item checklist covers the high-value review work:

Pre-merge checklist for AI-assisted code:
[ ] Verified all new external library calls against installed versions
[ ] Traced the primary error path from start to finish
[ ] Confirmed test names describe behavior, not implementation
[ ] Checked integration points: what calls this? What does this call?
[ ] Read the PR description in the author's own words (not AI-generated)

The last item - requiring a PR description in the author's own words - is a lightweight accountability check. An engineer who can't explain AI-generated code in a paragraph is merging code they don't understand. That accountability gap surfaces as expensive debugging work later.

"Quality gates work when they redirect attention, not just add gates. The checklist should tell reviewers where to look, not just give them more boxes to check." - Dennis Traina, founder of 137Foundry

Putting It Together

A complete quality gate for AI-assisted PRs includes: PR template with author confirmation checkboxes, static analysis on changed files in CI, branch coverage thresholds, dependency auditing, label-triggered integration tests for system boundary changes, complexity trend tracking, and a focused five-item reviewer checklist.

The automation handles the mechanical verification. The human checklist handles the judgment calls. Together they address the specific categories of issues that AI-generated code introduces without adding significant overhead to the review process.

For the broader organizational and process questions around AI coding tools in production - how to set team norms, handle AI-generated code in security-sensitive areas, and measure whether tools are improving or degrading quality over time - see A Practical Framework for Using AI Coding Tools in Production Codebases.

137Foundry helps engineering teams design and implement processes for AI-assisted development that maintain production quality.
