Engineering Safety: A Layered Governance Architecture for GitHub

#aisafety #githubcopilot #aiguardrails #agenticai

Building safe AI agents requires more than just a good system prompt. It requires infrastructure that enforces constraints at every stage of the development lifecycle.

This week, we merged three contributions into the github/awesome-copilot repository (#755, #756, #757). Together, they implement a layered governance architecture designed to help developers build secure agentic workflows by default.

Here is the technical breakdown of the implementation.

Layer 1: Pre-Computation Safety (The Hook)

Component: governance-audit

We implemented a client-side hook that intercepts userPromptSubmitted events. This is a shell-based scanner that analyzes prompts against a regex library of known threat signatures before the request leaves the developer's machine.

Threat Categorization: We classify signals into 5 buckets: data_exfiltration ("curl -d"), privilege_escalation ("chmod 777"), system_destruction ("rm -rf"), prompt_injection, and credential_exposure.
Local Execution: Privacy was a strict constraint. The scanning logic (audit-prompt.sh) runs entirely locally, ensuring no prompt data is sent to a third-party logger.
Configurable Severity: The hook supports four governance levels (open, standard, strict, locked), allowing teams to balance friction vs. safety.

This layer prevents “accidental” unsafe code generation by catching intent before it reaches the model.

Layer 2: In-Context Pattern Matching (The Skill)

Component: agent-governance

To generate secure code, the model needs to understand valid security patterns. We added a skill definition that injects specific governance context into Copilot's retrieval path.

Key patterns covered:

Policy-as-Code: Standardizing on declarative YAML for allowlists/blocklists rather than hardcoding logic.
Trust Scoring: Implementing decay-based trust models for multi-agent delegation. (e.g., If Agent A fails a task, its score degrades; if it succeeds, it increments).
Auditability: Enforcing append-only logging for all tool invocations.

By formalizing these as a “Skill,” we ensure Copilot retrieves high-quality examples for PydanticAI and CrewAI rather than hallucinating insecure implementations.

Layer 3: Post-Generation Verification (The Agent)

Component: agent-governance-reviewer

The final layer is verification. We introduced a specialized Copilot agent (agents/agent-governance-reviewer.agent.md) configured to act as a security linter.

Unlike a standard linter, this agent reviews semantic safety:

Decorator Audits: Checks if sensitive tools are wrapped with the @govern decorator.
Secret Detection: Scans for hardcoded secrets in agent configuration blocks.
Trust Boundary Analysis: Verifies that multi-agent handoffs include explicit identity verification steps.

Conclusion

This work represents a shift from “ad-hoc” safety to structural safety. By embedding these patterns directly into the developer’s IDE via the awesome-copilot standard, we reduce the friction of implementing robust governance.

This aligns with our broader work on Agent-OS, creating a standardized control plane for autonomous systems.