Copilot Workflow Composer: I built an 8-Layer Safety Engine that generates RLHF data

#devchallenge #githubchallenge #cli #githubcopilot

GitHub Copilot CLI Challenge Submission

*This is a submission for the GitHub Copilot CLI Challenge.*
What I Built
I built Copilot Workflow Composer (CWC), a production-grade orchestration engine that turns AI code suggestions into safe, multi-step workflows.

Tools like GitHub Copilot CLI are great at generating single commands, but running those commands in production requires safeguards. CWC wraps the AI in an 8-Layer Safety Architecture that validates every command before it executes.

The "Killer Feature": CWC isn't just a safety tool; it's a data engine. Every time a human intervenes to fix or steer the AI (via our "Steering Interface"), CWC logs that correction. Over time, this builds a proprietary RLHF (Reinforcement Learning from Human Feedback) dataset that captures your organization's specific engineering culture.
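To make the idea concrete, here is a minimal sketch of what one logged correction could look like. The `SteeringRecord` shape and the `recordCorrection` helper are hypothetical names chosen for illustration, not CWC's actual schema; the sketch only assumes that each human intervention can be stored as a rejected/chosen pair, which is the usual shape of preference data.

```typescript
// Hypothetical record shape: one human correction = one preference pair.
interface SteeringRecord {
  prompt: string;    // what the workflow step was trying to do
  rejected: string;  // the command the AI proposed
  chosen: string;    // the command the human approved instead
  reason: string;    // free-text justification from the reviewer
  timestamp: string; // ISO 8601
}

// Appends one JSON line per correction to `log` (an in-memory stand-in for a
// JSONL file). Returns null when the human accepted the proposal unchanged,
// since that carries no preference signal.
function recordCorrection(
  log: string[],
  prompt: string,
  proposed: string,
  corrected: string,
  reason: string,
  now: Date = new Date()
): SteeringRecord | null {
  if (proposed.trim() === corrected.trim()) return null;
  const record: SteeringRecord = {
    prompt,
    rejected: proposed,
    chosen: corrected,
    reason,
    timestamp: now.toISOString(),
  };
  log.push(JSON.stringify(record));
  return record;
}
```

Storing the rejected and chosen commands side by side keeps each line usable as a training pair without any later join against execution logs.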

Key Technical Highlights:

Architect-Builder Pattern: Routes planning to fast models (Haiku) and execution to smart models (Sonnet), reducing AI costs by 62%.

8-Layer Safety: Includes schema validation, recursive descent condition parsing (O(n) complexity), and 18+ malicious pattern detectors.

1,200+ Tools: Integrates with the Model Context Protocol (MCP) to access 1,241 external tools.

Production Ready: 100% test coverage with 434 passing tests.
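As a rough illustration of the layered idea, the sketch below chains independent checks and stops at the first one that rejects a command. The `SafetyLayer` interface, the `validate` function, and the two stand-in layers are assumptions made for this example; they are not CWC's real eight layers.

```typescript
// Result of running a command through the pipeline.
type Verdict = { ok: true } | { ok: false; layer: string; reason: string };

// Hypothetical layer contract: return null to pass, or a rejection reason.
interface SafetyLayer {
  name: string;
  check(command: string): string | null;
}

// Two stand-in layers, for illustration only.
const layers: SafetyLayer[] = [
  {
    name: "non-empty",
    check: (c) => (c.trim() === "" ? "empty command" : null),
  },
  {
    name: "destructive-rm",
    // Matches `rm -rf /` but not `rm -rf ./build`.
    check: (c) => (/\brm\s+-rf\s+\/(?:\s|$)/.test(c) ? "recursive delete of /" : null),
  },
];

// Runs layers in order and fails fast on the first rejection.
function validate(command: string, pipeline: SafetyLayer[]): Verdict {
  for (const layer of pipeline) {
    const reason = layer.check(command);
    if (reason !== null) return { ok: false, layer: layer.name, reason };
  }
  return { ok: true };
}
```

Failing fast means cheap checks can run first, and the verdict names the layer that blocked the command, which is useful context for whoever has to steer the workflow.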

Demo
You can find the full source code and documentation here:
https://github.com/Ayush-CS-89112521/Copilot-Workflow-Composer-CWC-

The "Hero" Demo
Here is CWC in action. Watch as the Architect plans a complex workflow, the Safety Layer validates it, and the Steering Interface allows me to guide the execution.

My Experience with GitHub Copilot CLI
Building a security-critical tool required precision, and GitHub Copilot CLI was my partner in "defense-in-depth."

Generating Regular Expressions for Safety
The hardest part of this project was Layer 5 (The Pattern Library). I needed to detect malicious, obfuscated bash commands (such as base64-encoded payloads). I used gh copilot suggest to generate robust regex patterns that catch these edge cases without blocking legitimate code.
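For a sense of what such patterns can look like, here is a small sketch. The pattern names and regexes below are illustrative assumptions, not the ones in CWC's Pattern Library, and three patterns are nowhere near exhaustive.

```typescript
// Hypothetical pattern entry: a label plus the regex that triggers it.
interface DangerPattern {
  name: string;
  regex: RegExp;
}

const PATTERNS: DangerPattern[] = [
  {
    // Decoded base64 piped straight into a shell, e.g. `... | base64 -d | sh`
    name: "base64-pipe-to-shell",
    regex: /base64\s+(?:-d|--decode|-D)\b[^|;&]*\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b/,
  },
  {
    // Remote script piped into a shell, e.g. `curl ... | bash`
    name: "curl-pipe-to-shell",
    regex: /\b(?:curl|wget)\b[^|;&]*\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b/,
  },
  {
    // eval of a command substitution, e.g. `eval "$(...)"`
    name: "eval-substitution",
    regex: /\beval\s+["']?(?:\$\(|`)/,
  },
];

// Returns the names of every pattern the command matches (empty = clean).
function scanCommand(command: string): string[] {
  return PATTERNS.filter((p) => p.regex.test(command)).map((p) => p.name);
}
```

Under these patterns, `echo "cm0gLXJmIC8=" | base64 -d | sh` is flagged, while `base64 -d input.b64 > out.bin` passes because the decoded output never reaches a shell. Anchoring each pattern on the pipe into a shell, rather than on `base64` alone, is what keeps legitimate decoding from being blocked.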
Optimizing the Parser
For Layer 3, I needed a recursive descent parser for condition evaluation that wouldn't crash the event loop. I asked Copilot to "Explain how to implement an LL(1) grammar parser in TypeScript," and it helped me structure the tokenization logic to ensure O(n) time complexity.
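The sketch below shows the general technique on a small, assumed condition grammar (`||`, `&&`, `!`, parentheses, `==`, `!=`). The grammar, the `evaluateCondition` name, and the dotted variable names are hypothetical and may differ from CWC's actual condition syntax. Each token is consumed exactly once and the parser never backtracks, which is where the O(n) bound comes from.

```typescript
type Token = { kind: "op" | "ident" | "str" | "num"; text: string };
type Context = Record<string, string | number | boolean>;

// Single left-to-right scan over the input.
function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const c = src.charAt(i);
    if (c === " " || c === "\t" || c === "\n") { i++; continue; }
    const two = src.slice(i, i + 2);
    if (two === "&&" || two === "||" || two === "==" || two === "!=") {
      tokens.push({ kind: "op", text: two }); i += 2; continue;
    }
    if (c === "!" || c === "(" || c === ")") {
      tokens.push({ kind: "op", text: c }); i++; continue;
    }
    if (c === "'") {
      const end = src.indexOf("'", i + 1);
      if (end < 0) throw new SyntaxError("Unterminated string");
      tokens.push({ kind: "str", text: src.slice(i + 1, end) });
      i = end + 1; continue;
    }
    let j = i;
    while (j < src.length && /[\w.]/.test(src.charAt(j))) j++;
    if (j === i) throw new SyntaxError(`Unexpected character '${c}' at ${i}`);
    const text = src.slice(i, j);
    tokens.push({ kind: /^\d/.test(text) ? "num" : "ident", text });
    i = j;
  }
  return tokens;
}

// Assumed grammar, one function per rule:
//   or      := and ('||' and)*
//   and     := unary ('&&' unary)*
//   unary   := '!' unary | '(' or ')' | compare
//   compare := operand (('==' | '!=') operand)?
function evaluateCondition(src: string, ctx: Context): boolean {
  const tokens = tokenize(src);
  let pos = 0;

  // One token of lookahead: consume the operator only if it is next.
  const accept = (op: string): boolean => {
    const t = tokens[pos];
    if (t !== undefined && t.kind === "op" && t.text === op) { pos++; return true; }
    return false;
  };

  function parseOr(): boolean {
    let left = parseAnd();
    while (accept("||")) { const right = parseAnd(); left = left || right; }
    return left;
  }
  function parseAnd(): boolean {
    let left = parseUnary();
    while (accept("&&")) { const right = parseUnary(); left = left && right; }
    return left;
  }
  function parseUnary(): boolean {
    if (accept("!")) return !parseUnary();
    if (accept("(")) {
      const value = parseOr();
      if (!accept(")")) throw new SyntaxError("Expected ')'");
      return value;
    }
    return parseCompare();
  }
  function parseCompare(): boolean {
    const left = parseOperand();
    if (accept("==")) return left === parseOperand();
    if (accept("!=")) return left !== parseOperand();
    return Boolean(left);
  }
  function parseOperand(): string | number | boolean | undefined {
    const t = tokens[pos++];
    if (t === undefined || t.kind === "op") throw new SyntaxError("Expected a value");
    if (t.kind === "str") return t.text;
    if (t.kind === "num") return Number(t.text);
    if (t.text === "true") return true;
    if (t.text === "false") return false;
    return ctx[t.text]; // unknown variables evaluate to undefined
  }

  const result = parseOr();
  if (pos < tokens.length) throw new SyntaxError("Unexpected trailing input");
  return result;
}
```

Both sides of `&&` and `||` are always parsed, even when the left side already decides the result, so a syntax error anywhere in the condition is reported instead of being silently skipped.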
Test-Driven Development
Reaching 100% test coverage with 114+ tests was daunting. I used Copilot to scaffold the Red Team attack vectors (the "stress-test.yaml"), asking it to "Generate 8 common bash obfuscation techniques for security testing." It saved me days of research.

This project represents the future of AI development: Humans acting as the "Architect" and "Safety Gate," while AI handles the execution.
