zaxion

I Broke 50 PRs With One Config Change. Here's How I Built a Time Machine to Prevent It.

We've all been there. You decide it's time to improve code quality. "No more console.log in production code," you declare. You add a simple ESLint rule, push the config, and merge.

Ten minutes later, your Slack blows up.

"Why is the build failing on my PR?"
"I can't deploy the hotfix!"
"Who turned on the fun police?"

You just broke 50 open pull requests because you didn't know how widespread the "violation" was. You revert the change, apologize, and the codebase remains messy.

This fear of "Policy Shock"—the disruption caused by enforcing new rules—is why many teams are afraid to tighten their governance.

But what if you could time-travel? What if you could test your new rule against the last 100 PRs in your repo before you merged it?

That's exactly what we built. Here is the technical deep dive into how we created a Policy Impact Simulator for GitHub.

The Problem: Governance is a Guessing Game

Most CI/CD pipelines are binary: pass or fail. When you introduce a new check, it applies to everything immediately. There is no "try before you buy."

We needed a system that could:

  1. Draft a policy (e.g., "Max PR size: 20 files").
  2. Fetch historical data (snapshots of past PRs).
  3. Replay the draft policy against that history.
  4. Visualize the "Blast Radius"—how many legit PRs would have been blocked?

The Architecture

We built this using a Node.js backend (Express) and a React frontend. The core logic resides in a PolicySimulationService that acts as our time machine.

1. The Snapshot Engine

The first challenge is getting data. We don't want to clone repos and run npm install 100 times—that's too slow. Instead, we fetch metadata snapshots via the GitHub API.

We treat a PR as a collection of facts:

  • File count
  • Extensions used (.ts, .js, .py)
  • Test coverage ratios
  • Diff stats (additions/deletions)

Here is a simplified view of our snapshot collector:

```js
// backend/src/services/policySimulation.service.js
const path = require('path');

async function collectSnapshots(repo, daysBack) {
  // 1. Fetch merged PRs from the last N days
  const prs = await github.fetchHistoricalPRs(repo, daysBack);

  // 2. Extract lightweight "Fact Snapshots"
  return prs.map(pr => ({
    id: pr.number,
    files_count: pr.changed_files,
    has_tests: pr.files.some(f => f.filename.includes('.test.')),
    extensions: [...new Set(pr.files.map(f => path.extname(f.filename)))],
    // ... other metadata
  }));
}
```

By abstracting the code into metadata "facts," we can run thousands of simulations in seconds without touching the filesystem.

2. The Simulation Loop (The "Judge")

Once we have the snapshots, we feed them into our evaluation engine. This is where the magic happens. We call this "The Judge."

The Judge takes a Draft Policy (JSON logic) and a Snapshot, and returns a verdict: PASS or BLOCK.
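
The post doesn't show evaluate itself, so here's a minimal sketch of what such a judge could look like, assuming the snapshot shape from the collector above and a {type, value} rule (rule names are illustrative):

```typescript
type Verdict = 'PASS' | 'BLOCK';

interface Snapshot {
  id: number;
  files_count: number;
  has_tests: boolean;
}

interface Rule {
  type: 'pr_size' | 'require_tests';
  value: number;
}

// Map each rule type to a pure predicate over the snapshot metadata.
function evaluate(rule: Rule, snapshot: Snapshot): Verdict {
  switch (rule.type) {
    case 'pr_size':
      return snapshot.files_count > rule.value ? 'BLOCK' : 'PASS';
    case 'require_tests':
      return snapshot.has_tests ? 'PASS' : 'BLOCK';
    default:
      // Unknown rule types fail open, so a typo never blocks real PRs.
      return 'PASS';
  }
}
```

Keeping the judge pure (no I/O, no clock) is what makes replaying it over hundreds of snapshots trivially fast and deterministic.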

```js
// The core simulation loop
async function executeSimulation(draftRules, snapshots) {
  const results = {
    blocked: 0,
    passed: 0,
    impacted_prs: []
  };

  for (const snapshot of snapshots) {
    // The Judge evaluates the rule
    const verdict = evaluate(draftRules, snapshot);

    if (verdict === 'BLOCK') {
      results.blocked++;
      results.impacted_prs.push({
        pr: snapshot.id,
        reason: `Violated rule: ${draftRules.type} (Limit: ${draftRules.value})`
      });
    } else {
      results.passed++;
    }
  }

  return results;
}
```

This deterministic loop allows us to tweak a threshold—say, changing max file count from 20 to 50—and see the impact graph update instantly.
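
Because the loop is deterministic and metadata-only, a threshold sweep is just a map over candidate limits. A toy example (the snapshot data is made up):

```typescript
// Hypothetical snapshots: only the file count matters for this rule.
const sampleSnapshots = [
  { id: 101, files_count: 3 },
  { id: 102, files_count: 18 },
  { id: 103, files_count: 42 },
];

// For each candidate limit, count how many past PRs it would have blocked.
const sweep = [10, 20, 50].map(limit => ({
  limit,
  blocked: sampleSnapshots.filter(s => s.files_count > limit).length,
}));
// limit 10 → 2 blocked, limit 20 → 1, limit 50 → 0
```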

3. Frontend Visualization

On the frontend, we use React to make this data actionable. We built a PolicySimulation component that lets users:

  1. Select a target repo.
  2. Configure a draft policy (e.g., "Require 2 reviewers").
  3. Hit "Simulate".

The results are rendered using Recharts to show the "Blast Radius."

```tsx
// frontend/src/components/governance/PolicySimulation.tsx

export const PolicySimulation = () => {
  const [result, setResult] = useState<SimulationResult | null>(null);

  // ... setup logic ...

  return (
    <div className="grid grid-cols-3 gap-6">
      <Card>
        <CardTitle>Simulation Configuration</CardTitle>
        <Select onValueChange={setPolicy}>
          <SelectItem value="pr_size">Max PR Size</SelectItem>
          <SelectItem value="coverage">Test Coverage</SelectItem>
        </Select>
        <Button onClick={runSimulation}>
          <Play className="mr-2" /> Simulate Impact
        </Button>
      </Card>

      <div className="col-span-2">
        {result && (
           <Alert variant={result.blast_radius > 50 ? "destructive" : "default"}>
             <AlertTitle>Blast Radius Alert</AlertTitle>
             <AlertDescription>
               This policy would have blocked {result.total_blocked} out of {result.total_scanned} PRs.
               {result.blast_radius > 50 ? " This is too disruptive!" : " Safe to merge."}
             </AlertDescription>
           </Alert>
        )}
        {/* Charts go here */}
      </div>
    </div>
  );
};
```

We also calculate a "Friction Index": the share of historical PRs a draft policy would have blocked. If a policy blocks more than 20% of them, we flag it as "High Friction." This simple heuristic has saved us from merging overly aggressive rules countless times.
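
The heuristic itself is a one-liner. A sketch of the check (the function names are ours for illustration; the 20% cutoff is the one described above):

```typescript
// Friction Index: fraction of scanned historical PRs the draft policy blocks.
function frictionIndex(blocked: number, totalScanned: number): number {
  return totalScanned === 0 ? 0 : blocked / totalScanned;
}

// Flag anything over the 20% cutoff as High Friction.
function frictionLabel(index: number): 'High Friction' | 'Acceptable' {
  return index > 0.2 ? 'High Friction' : 'Acceptable';
}
```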

Lessons Learned

Building this tool taught us three key lessons about developer experience (DX):

  1. Metadata > Source Code: You rarely need the full AST to make high-level governance decisions. Metadata (file types, sizes, authors) covers 80% of use cases and is 100x faster to process.
  2. Feedback Loops Matter: When you can see the impact of a rule immediately, you write better rules. It turns governance from a bureaucratic "gate" into a design problem.
  3. JSON Schema is Powerful: Defining policies as JSON (rather than hardcoded functions) allows us to version them, diff them, and—crucially—simulate them without deploying code.
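
Lesson 3 in practice: because a policy is plain data, you can sanity-check a draft before it ever reaches CI. A minimal hand-rolled validator (no schema library; the shape is illustrative):

```typescript
interface Policy {
  type: string;
  value: number;
}

// Type guard: reject drafts that would be meaningless at simulation time.
function isValidPolicy(input: unknown): input is Policy {
  if (typeof input !== 'object' || input === null) return false;
  const p = input as Record<string, unknown>;
  return typeof p.type === 'string' && typeof p.value === 'number' && p.value > 0;
}
```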

Future Work: AI Analysis

Our next step is integrating LLMs to explain why a policy failed. Instead of just saying "Blocked," we want the system to look at the PR description and say, "Blocked because this PR touches the payment gateway but lacks a 'Security' label."

We have a prototype running using a translate-natural-language endpoint that converts plain English ("Block PRs with no tests") into our JSON schema.

```js
// Transforming English to Policy Config
const result = await api.post('/v1/policies/translate-natural-language', {
  description: "Block huge PRs"
});
// Output: { type: "pr_size", max_files: 50 }
```

Try It Yourself

This simulator is part of our broader initiative to make governance invisible and helpful, rather than painful.

If you're tired of guessing whether your new lint rule will cause a revolt, I highly recommend building a simple "dry run" script for your CI. Even a basic script that greps through your last 50 PRs can save you a headache.
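
For the Node crowd, a bare-bones dry run might look like this. The list endpoint is the real GitHub REST route, but note it omits per-PR stats like changed_files, so each PR needs a follow-up request; everything else here (names, the file-count rule) is illustrative. The rule check is kept pure so you can test it without the network:

```typescript
interface PrMeta {
  number: number;
  changed_files: number;
}

// Pure part: which of the recent PRs would the new rule have blocked?
function wouldBlock(prs: PrMeta[], maxFiles: number): number[] {
  return prs.filter(pr => pr.changed_files > maxFiles).map(pr => pr.number);
}

// I/O part: fetch recent closed PRs from the GitHub REST API (Node 18+ fetch).
// The list response lacks changed_files, so fetch each PR individually.
async function fetchRecentPrs(owner: string, repo: string): Promise<PrMeta[]> {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/pulls?state=closed&per_page=50`
  );
  const list: { number: number }[] = await res.json();
  return Promise.all(
    list.map(async ({ number }) => {
      const pr = await (
        await fetch(`https://api.github.com/repos/${owner}/${repo}/pulls/${number}`)
      ).json();
      return { number, changed_files: pr.changed_files };
    })
  );
}
```

Run it once against your repo before merging the rule, and you'll know the blast radius instead of guessing it.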

What tools do you use to test your dev processes? Let me know in the comments—I'd love to see how others are solving the "Policy Shock" problem.


Thanks for reading! If you found this technical breakdown useful, drop a star or comment below.
