Technical deep-dive on building a GitHub Action that prevents institutional amnesia by surfacing past architectural decisions on Pull Requests
I spent three years building systems, and in that time I watched teams waste months re-debating architectural decisions they'd already made. We use ADRs (Architecture Decision Records) at our company, but enforcement is inconsistent: only the senior devs remember to update them.
So I built Decision Guardian - a GitHub Action that surfaces past architectural decisions directly on Pull Requests.
Here's how I did it, and the technical challenges I solved along the way.
The Solution: Decision Guardian
Decision Guardian is a GitHub Action that:
- Parses architectural decisions from markdown files
- Matches them against PR file changes
- Surfaces relevant context automatically as a comment
Example decision file (.decispher/decisions.md):
<!-- DECISION-DB-001 -->
## Decision: Database Choice for Billing
**Status**: Active
**Date**: 2024-03-15
**Severity**: Critical
**Files**:
- `src/db/pool.ts`
- `config/database.yml`
### Context
We chose Postgres over MongoDB because billing requires
ACID compliance for financial transactions.
**Alternatives rejected:**
- MongoDB: No ACID guarantees
- Redis: Added unnecessary complexity
**Before modifying:** Consult with @tech-lead
---
When someone opens a PR touching src/db/pool.ts, Decision Guardian reacts.
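The real Decision Guardian parser is AST-based (built on remark, as described later), but the shape of the extracted data is easy to illustrate. Here's a hypothetical regex-based sketch that pulls the ID, title, status, and file list out of the format above; the `ParsedDecision` type and all function names are mine, not the project's:

```typescript
// Hypothetical minimal parser for the decision format shown above.
// The production parser is AST-based; this sketch only illustrates
// the structure of what gets extracted.
interface ParsedDecision {
  id: string;
  title: string;
  status: string;
  files: string[];
}

function parseDecisions(markdown: string): ParsedDecision[] {
  const decisions: ParsedDecision[] = [];
  // Split on HTML comment markers like <!-- DECISION-DB-001 -->;
  // the capturing group keeps each ID in the resulting array.
  const blocks = markdown.split(/<!--\s*(DECISION-[A-Z0-9-]+)\s*-->/).slice(1);
  for (let i = 0; i < blocks.length; i += 2) {
    const id = blocks[i];
    const body = blocks[i + 1] ?? '';
    const title = body.match(/^## Decision: (.+)$/m)?.[1] ?? '';
    const status = body.match(/\*\*Status\*\*: (.+)$/m)?.[1] ?? 'Unknown';
    // Collect backticked paths listed as bullets under **Files**:
    const files = [...body.matchAll(/^- `([^`]+)`$/gm)].map((m) => m[1]);
    decisions.push({ id, title, status, files });
  }
  return decisions;
}
```

A decision without the expected headings simply yields empty fields, which the real parser would reject during validation.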
Technical Challenge #1: Pattern Matching at Scale
The Naive Approach (O(N×M))
My first implementation was simple:
// For each file in PR
for (const file of prFiles) {
// Check against every decision
for (const decision of decisions) {
if (decision.files.includes(file.path)) {
matches.push(decision);
}
}
}
Problem: with 100 files and 500 decisions, that's 50,000 comparisons.
On a large PR (3,000 files), this took 12+ seconds and sometimes hit the GitHub Actions timeout.
The Solution: Prefix Trie
I built a prefix trie to index decisions by file patterns:
interface TrieNode {
children: Map<string, TrieNode>;
decisions: Decision[];
wildcardDecisions: Decision[];
}
class PatternTrie {
private root: TrieNode;
constructor(decisions: Decision[]) {
this.root = this.createNode();
for (const decision of decisions) {
for (const pattern of decision.files) {
this.insert(pattern, decision);
}
}
}
private createNode(): TrieNode {
return {
children: new Map(),
decisions: [],
wildcardDecisions: [],
};
}
private insert(pattern: string, decision: Decision): void {
const parts = pattern.split('/');
this.insertRecursive(this.root, parts, decision);
}
private insertRecursive(node: TrieNode, parts: string[], decision: Decision): void {
if (parts.length === 0) {
node.decisions.push(decision);
return;
}
const part = parts[0];
const remaining = parts.slice(1);
// Handle wildcards specially
if (part === '**') {
node.wildcardDecisions.push(decision);
if (remaining.length > 0) {
this.insertRecursive(node, remaining, decision);
}
return;
}
// Handle glob patterns
if (
part.includes('*') ||
part.includes('?') ||
part.includes('{') ||
part.includes('}') ||
part.includes('[') ||
part.includes(']')
) {
node.wildcardDecisions.push(decision);
return;
}
// Exact match - traverse deeper
let child = node.children.get(part);
if (!child) {
child = this.createNode();
node.children.set(part, child);
}
this.insertRecursive(child, remaining, decision);
}
/**
* Returns a set of unique decisions that *might* match the given file path.
*/
findCandidates(file: string): Set<Decision> {
const parts = file.split('/');
const candidates = new Set<Decision>();
this.collectCandidates(this.root, parts, candidates);
return candidates;
}
private collectCandidates(node: TrieNode, parts: string[], candidates: Set<Decision>): void {
// Collect wildcard matches at this level
for (const decision of node.wildcardDecisions) {
candidates.add(decision);
}
if (parts.length === 0) {
// Reached the end - collect exact matches
for (const decision of node.decisions) {
candidates.add(decision);
}
return;
}
const part = parts[0];
const child = node.children.get(part);
if (child) {
this.collectCandidates(child, parts.slice(1), candidates);
}
}
}
Performance improvement:
- Before: O(N×M) nested scan → 12 seconds for 3,000 files
- After: per-file trie lookup proportional to path depth, not decision count → 2.8 seconds for the same PR
That's a 4.3x speedup on large PRs.
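To see why indexing beats the nested loop, here is a stripped-down version of the idea that handles exact paths only (no globs or `**`, which the real PatternTrie also covers). Build a Map from path to decisions once, then each PR file becomes a single hash lookup instead of a scan over every decision. The simplified `Decision` shape here is just `{ id, files }`:

```typescript
// Simplified illustration of the indexing idea: exact paths only.
interface Decision {
  id: string;
  files: string[];
}

// Build the index once, up front: path -> decisions touching it.
function buildIndex(decisions: Decision[]): Map<string, Decision[]> {
  const index = new Map<string, Decision[]>();
  for (const decision of decisions) {
    for (const path of decision.files) {
      const bucket = index.get(path) ?? [];
      bucket.push(decision);
      index.set(path, bucket);
    }
  }
  return index;
}

// Each changed file is one lookup, so matching is O(files) overall
// instead of O(files × decisions).
function findMatches(index: Map<string, Decision[]>, prFiles: string[]): Decision[] {
  const matches = new Set<Decision>();
  for (const file of prFiles) {
    for (const decision of index.get(file) ?? []) {
      matches.add(decision);
    }
  }
  return [...matches];
}
```

The trie generalizes this: path segments become trie levels, so glob and `**` patterns can sit at the right depth instead of forcing a full scan.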
Technical Challenge #2: Security (ReDoS Protection)
The Problem
Users can define custom patterns using regex:
{
"type": "file",
"pattern": "src/**/*.ts",
"content_rules": [{
"mode": "regex",
"pattern": "(a+)+b" // ⚠️ Evil regex
}]
}
That pattern (a+)+b is vulnerable to ReDoS (Regular Expression Denial of Service).
When tested against aaaaaaaaaaaaaaaaaaaa (no 'b'), it creates exponential backtracking:
- 20 'a's: ~1 second
- 25 'a's: ~30 seconds
- 30 'a's: effectively never finishes
This could DOS the entire GitHub Action.
The Solution: Multi-Layer Protection
Layer 1: Safe-Regex Check
import safeRegex from 'safe-regex';
function validatePattern(pattern: string): void {
if (!safeRegex(pattern)) {
throw new Error(`Unsafe regex pattern: ${pattern}`);
}
}
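Under the hood, safe-regex works by bounding the pattern's star height (how deeply repetition operators nest). A toy version of that check, far cruder than the real library and shown purely for illustration, might flag any quantified group that is itself quantified:

```typescript
// Toy heuristic: flag a pattern as risky when a group whose body
// already contains a quantifier is itself followed by a quantifier,
// e.g. (a+)+ or (a*)*. safe-regex performs a real star-height
// analysis; this sketch only catches the textbook shape.
function looksUnsafe(pattern: string): boolean {
  const nestedQuantifier = /\([^()]*[+*][^()]*\)[+*{]/;
  return nestedQuantifier.test(pattern);
}
```

A heuristic like this has false negatives (e.g. overlapping alternations such as `(a|a)+`), which is exactly why the sandbox layer below exists as a backstop.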
Layer 2: VM Sandbox with Timeout
Even safe-regex can miss some cases, so I added a VM sandbox:
import vm from 'vm';
function runRegexWithTimeout(pattern: string, flags: string, text: string, timeoutMs: number): boolean {
const sandbox = Object.create(null);
sandbox.result = false;
sandbox.text = String(text);
sandbox.pattern = String(pattern);
sandbox.flags = String(flags || '');
const context = vm.createContext(sandbox, {
name: 'RegexSandbox',
codeGeneration: {
strings: false,
wasm: false,
},
});
const code = `
'use strict';
try {
const regex = new RegExp(pattern, flags);
result = regex.test(text);
} catch (e) {
result = false;
}
`;
try {
vm.runInContext(code, context, {
timeout: timeoutMs,
displayErrors: false,
});
return Boolean(sandbox.result);
} catch (e) {
return false;
}
}
Key security features:
- ✅ Isolated VM context: no access to Node.js globals or the filesystem
- ✅ Hard timeout: kills execution after 5 seconds
- ✅ No code generation: blocks eval() and WebAssembly escapes
- ✅ String coercion: prevents prototype-pollution tricks via the inputs
Layer 3: Input Size Limits
const MAX_CONTENT_SIZE = 1_000_000; // 1MB
const MAX_REGEX_LENGTH = 1000;
if (content.length > MAX_CONTENT_SIZE) {
throw new Error('Content too large for regex matching');
}
if (pattern.length > MAX_REGEX_LENGTH) {
throw new Error('Regex pattern too long');
}
Layer 4: Result Caching
import crypto from 'crypto';
class ContentMatchers {
private resultCache = new Map<string, boolean>();
private readonly MAX_CACHE_SIZE = 500;
private createCacheKey(pattern: string, flags: string, content: string): string {
const contentHash = crypto
.createHash('sha256')
.update(content)
.digest('hex')
.substring(0, 16);
return `${pattern}:${flags}:${contentHash}`;
}
async matchRegex(rule: ContentRule, fileDiff: FileDiff): Promise<{ matched: boolean; matchedPatterns: string[] }> {
const changedContent = this.getChangedLines(fileDiff.patch).join('\n');
const cacheKey = this.createCacheKey(rule.pattern!, rule.flags || '', changedContent);
const cached = this.resultCache.get(cacheKey);
if (cached !== undefined) {
return { matched: cached, matchedPatterns: cached ? [rule.pattern!] : [] };
}
try {
const matched = this.runRegexWithTimeout(
rule.pattern!,
rule.flags,
changedContent,
5000
);
this.updateCache(cacheKey, matched);
return { matched, matchedPatterns: matched ? [rule.pattern!] : [] };
} catch (error) {
return { matched: false, matchedPatterns: [] };
}
}
private updateCache(key: string, value: boolean): void {
if (this.resultCache.size >= this.MAX_CACHE_SIZE) {
// Evict the oldest-inserted entry (FIFO: a Map iterates in insertion order)
const firstKey = this.resultCache.keys().next().value;
if (firstKey) this.resultCache.delete(firstKey);
}
this.resultCache.set(key, value);
}
}
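Note that the eviction above drops the oldest-inserted key, which is FIFO rather than true LRU: nothing refreshes an entry's position when it is read. A genuine LRU on top of a plain Map is a small tweak, since re-inserting a key moves it to the back of the Map's iteration order. A sketch (the `LruCache` class is mine, not the project's):

```typescript
// A genuine LRU built on Map's insertion-order iteration: reads
// re-insert the key, so the first key is always least recently used.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private readonly maxSize: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key); // refresh recency
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.maxSize) {
      // First key in iteration order is the least recently used.
      const lru = this.map.keys().next().value as K;
      this.map.delete(lru);
    }
    this.map.set(key, value);
  }
}
```

For this workload the difference barely matters (cache hits cluster within a single run), which is a fair reason to keep the simpler FIFO.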
Result: Zero ReDoS vulnerabilities in production. ✅
Technical Challenge #3: Handling Massive PRs
The Problem
Some PRs modify 3000+ files (dependency updates, refactors, migrations).
GitHub's API returns all changed files, but:
- Loading 3000 file diffs into memory → OOM (Out of Memory)
- Processing them serially → timeout
- Posting a comment with all matches → exceeds GitHub's 65KB limit
Solution 1: Streaming Processing
Instead of loading all files at once:
async function* streamFileDiffs(
token: string
): AsyncGenerator<FileDiff[]> {
const octokit = github.getOctokit(token);
const { owner, repo, pull_number } = github.context;
let page = 1;
const MAX_PAGES = 30;
while (page <= MAX_PAGES) {
const { data } = await octokit.rest.pulls.listFiles({
owner,
repo,
pull_number,
per_page: 100,
page,
});
if (data.length === 0) break;
yield data.map((f) => ({
filename: f.filename.replace(/\\/g, '/'),
status: f.status as FileDiff['status'],
additions: f.additions,
deletions: f.deletions,
changes: f.changes,
patch: f.patch || '',
previous_filename: f.previous_filename,
}));
if (data.length < 100) break;
page++;
}
}
// Usage
const matches: DecisionMatch[] = [];
for await (const batch of streamFileDiffs(token)) {
const batchMatches = await matcher.findMatchesWithDiffs(batch);
matches.push(...batchMatches);
core.info(`Processed ${batch.length} files, found ${matches.length} matches so far`);
}
Memory usage:
- Before: High memory usage for 3000 files → OOM risk
- After: Constant memory (processes 100 files at a time) → No crashes ✅
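The same pagination pattern works against any page-based source. Here's a self-contained sketch with a fake in-memory "API" standing in for octokit's listFiles endpoint (the `fakeListFiles` helper and batch sizes are assumptions for the demo):

```typescript
// Hypothetical stand-in for a paginated API: returns up to pageSize
// items for a given 1-based page, like listFiles with per_page.
function fakeListFiles(all: string[], page: number, pageSize: number): string[] {
  return all.slice((page - 1) * pageSize, page * pageSize);
}

// Yields one page-sized batch at a time, so only pageSize items are
// ever held in memory regardless of total PR size.
async function* streamBatches(all: string[], pageSize: number): AsyncGenerator<string[]> {
  let page = 1;
  while (true) {
    const batch = fakeListFiles(all, page, pageSize);
    if (batch.length === 0) break;
    yield batch;
    // A short page means we've reached the end.
    if (batch.length < pageSize) break;
    page++;
  }
}
```

Consuming it with `for await` mirrors the usage shown above: each batch is matched and discarded before the next page is fetched.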
Solution 2: Progressive Truncation
If the comment exceeds GitHub's limit, truncate intelligently:
function truncateComment(decisions: Decision[], maxLength = 65000): string {
let comment = formatComment(decisions);
if (comment.length <= maxLength) {
return comment;
}
// Layers 1-4: show progressively fewer decisions in detail,
// summarizing the rest
for (const detailLimit of [20, 10, 5, 2]) {
comment = formatComment(decisions, { detailLimit });
if (comment.length <= maxLength) return comment;
}
// Layer 5: show counts only
comment = formatCommentCounts(decisions);
if (comment.length <= maxLength) return comment;
// Layer 6: hard truncate as a last resort
return hardTruncate(comment, maxLength);
}
Result: Never hit comment size limit, even with 1000+ matched decisions. ✅
Technical Challenge #4: Idempotent Comments
The Problem
GitHub Actions can run multiple times for a single PR:
- New commit pushed
- Workflow re-run
- Manual trigger
Without proper handling:
- 3 runs = 3 duplicate comments
- Spams the PR thread
- Confuses reviewers
The Solution: Content Hash
async function upsertComment(
prNumber: number,
content: string
): Promise<void> {
const hash = crypto
.createHash('sha256')
.update(content)
.digest('hex')
.substring(0, 16);
const marker = '<!-- decision-guardian-v1 -->';
const hashMarker = `<!-- hash:${hash} -->`;
const fullContent = `${marker}\n${hashMarker}\n\n${content}`;
// Find existing comment
const comments = await octokit.issues.listComments({
owner,
repo,
issue_number: prNumber,
});
const existing = comments.data.find(c =>
c.body?.includes('decision-guardian-v1')
);
if (existing) {
const existingHash = existing.body?.match(/hash:([a-f0-9]+)/)?.[1];
if (existingHash === hash) {
console.log('Comment unchanged, skipping update');
return;
}
// Update existing comment
await octokit.issues.updateComment({
owner,
repo,
comment_id: existing.id,
body: fullContent,
});
} else {
// Create new comment
await octokit.issues.createComment({
owner,
repo,
issue_number: prNumber,
body: fullContent,
});
}
}
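The markers live inside HTML comments, which GitHub renders invisibly, so readers never see them. The embed/extract roundtrip can be sketched on its own, outside the API calls (function names here are mine):

```typescript
import crypto from 'crypto';

// Embed an invisible identity marker plus a content hash at the top
// of the comment body; GitHub does not render HTML comments.
function buildCommentBody(content: string): string {
  const hash = crypto
    .createHash('sha256')
    .update(content)
    .digest('hex')
    .substring(0, 16);
  return `<!-- decision-guardian-v1 -->\n<!-- hash:${hash} -->\n\n${content}`;
}

// Pull the hash back out of an existing comment so the action can
// decide whether an update is actually needed.
function extractHash(body: string): string | undefined {
  return body.match(/<!-- hash:([a-f0-9]+) -->/)?.[1];
}
```

Because the hash is derived from the rendered content, any change to the matched decisions produces a new hash and triggers an in-place update; identical runs are no-ops.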
Result:
✅ Single comment per PR
✅ Updates in-place when decisions change
✅ No spam, no duplicates
Architecture Overview
┌────────────────────────────────────────────────────┐
│ DECISION GUARDIAN │
├────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ DECISION PARSER (AST-based) │ │
│ │ - Markdown parsing with remark │ │
│ │ - JSON rule extraction & validation │ │
│ └──────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────┐ │
│ │ DECISION INDEX (Prefix Trie) │ │
│ │ - O(log n) file lookup │ │
│ │ - Wildcard pattern optimization │ │
│ └──────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────┐ │
│ │ FILE MATCHER (Rule Evaluator) │ │
│ │ - Glob pattern matching │ │
│ │ - Content diff analysis │ │
│ │ - ReDoS protection │ │
│ └──────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────┐ │
│ │ COMMENT MANAGER (Idempotent) │ │
│ │ - Hash-based update detection │ │
│ │ - Progressive truncation │ │
│ │ - Retry with exponential backoff │ │
│ └──────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────┘
High-level flow:
PR Created → Parse Decisions → Match Files → Post Comment → Check Status
Key components:
- Parser (parser.ts): Markdown → structured data
- Matcher (matcher.ts): trie-based file matching
- Rule Evaluator (rule-evaluator.ts): advanced rules
- Comment Manager (comment.ts): idempotent PR comments
Lessons Learned
1. Performance matters from day 1
I could have shipped with the O(N×M) algorithm and optimized later.
But teams with large PRs would have hit timeouts immediately and never come back.
Lesson: Build for scale early, especially in tools that run on every PR.
2. Security is not optional
The ReDoS vulnerability wasn't theoretical - during testing, a user accidentally created a pattern that froze the action for 5 minutes.
Lesson: Validate all user input, especially anything that can loop or recurse.
3. Idempotency prevents pain
Early versions created duplicate comments. Users reported it as "spammy" and disabled the action.
Adding content hashing fixed this and improved adoption.
Lesson: Make actions side-effect-free and repeatable.
4. Documentation > features
I spent 60% of development time on README, examples, and error messages.
Users still ask "how do I use this?" constantly.
Lesson: You can never document enough.
Try It Yourself
Install:
- uses: DecispherHQ/decision-guardian@v1
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
Example decision:
## Decision: Database Choice
**Status**: Active
**Date**: 2024-03-15
**Files**: `src/db/**`
### Context
We chose Postgres for ACID compliance.
Rejected: MongoDB (no ACID), Redis (complexity)
Links:
- GitHub: https://github.com/DecispherHQ/decision-guardian
- Docs: https://decision-guardian.decispher.com
- Marketplace: https://github.com/marketplace/actions/decision-guardian
What's Next?
Short-term:
- GitLab/Bitbucket support (if demand exists)
- Decision templates
Long-term:
- VS Code extension (show decisions inline)
- Analytics dashboard
- Cross-repository rules
Want to contribute? Open an issue or start a discussion.
Conclusion
Decision Guardian is free, open source (MIT), and takes 2 minutes to set up.
What architectural decisions does your team repeatedly debate?
Drop a comment - I'd love to hear your stories.
Made with ❤️ by Ali Abbas
