# Request for Comments: Meridian
I'm building an AI-powered code review agent for the GitLab AI Hackathon and would love feedback from practicing engineers.
## The Hypothesis
**Problem 1:** MRs get merged without fully implementing their acceptance criteria, causing requirement drift and rework.

**Problem 2:** Developers change code without understanding historical design constraints, causing regressions.

**Cost:** An estimated 20-30% of merged code needs follow-up work (based on anecdotal observation, not hard data).
## The Proposed Solution
An autonomous agent that performs two checks:

### Acceptance Criteria Validation
```yaml
Issue #123:
  criteria:
    - Export to CSV ✓
    - Export to JSON ✗
    - Include all fields ✗

MR Analysis:
  implemented: 1/3 criteria
  action: Block merge
  recommendation: Complete remaining criteria or update issue scope
```
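The criteria-extraction step above could be sketched as follows. This is a minimal illustration (not the actual Meridian implementation), assuming acceptance criteria live as GitLab task-list checkboxes in the issue description; the LLM-backed matching of criteria against the MR diff is out of scope here.

```python
import re

# Matches GitLab task-list items: "- [ ] ..." (open) or "- [x] ..." (done).
TASK_RE = re.compile(r"^\s*[-*]\s*\[([ xX])\]\s*(.+)$", re.MULTILINE)

def parse_criteria(issue_description: str) -> list[dict]:
    """Extract each acceptance criterion and whether it is checked off."""
    return [
        {"text": text.strip(), "checked": mark.lower() == "x"}
        for mark, text in TASK_RE.findall(issue_description)
    ]

def coverage(criteria: list[dict]) -> str:
    """Summarize implemented vs. total criteria, e.g. '1/3 criteria'."""
    done = sum(c["checked"] for c in criteria)
    return f"{done}/{len(criteria)} criteria"
```

In the real agent, "checked" would come from semantic analysis of the MR diff rather than the checkbox state, but the same coverage summary drives the block/allow decision.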
### Historical Context Surfacing
```yaml
File: auth_flow.py
Lines changed: 45-67

Historical Context:
  original_mr: "#89 (8 months ago)"
  design_decision: "SSO requires token refresh every 30s"
  edge_case: "Enterprise customers need persistent sessions"
  warning: "Your changes remove refresh logic. SSO may break."
```
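The lookup behind this could be sketched as an overlap query against a prebuilt index. The sketch below is a hypothetical illustration: it assumes an offline job (e.g. walking `git log` and the GitLab MR API) has already produced a history index of which line ranges each past MR touched, with a note summarizing the design decision.

```python
# Hypothetical history index: each entry records a past MR, the line range
# it touched in this file, and a one-line summary of its design decision.
history = [
    {"mr": "#89", "lines": (40, 70), "note": "SSO requires token refresh every 30s"},
    {"mr": "#54", "lines": (100, 120), "note": "Legacy LDAP fallback path"},
]

def overlapping_mrs(changed: tuple[int, int], history: list[dict]) -> list[dict]:
    """Return past MRs whose touched line range overlaps the changed range."""
    lo, hi = changed
    return [h for h in history if h["lines"][0] <= hi and lo <= h["lines"][1]]

# A diff touching lines 45-67 should surface MR #89's token-refresh decision.
hits = overlapping_mrs((45, 67), history)
```

Line numbers shift over time, so a production version would track ranges through renames and refactors (e.g. via `git log -L` or blame), not a static index like this.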
## Technical Approach
- LLM: Anthropic Claude 3.5 Sonnet (semantic understanding)
- Platform: GitLab Duo Agent Platform
- Architecture: Event-driven (webhooks → async analysis → automated comments)
- Stack: Python, FastAPI, PostgreSQL, Redis
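The event-driven flow (webhook in, async analysis, comment out) could be sketched with a queue and a worker. This is a stdlib-only toy, not the FastAPI service itself: the `analyze` stub stands in for the LLM call, and appending to a list stands in for posting an MR comment via the GitLab API.

```python
import asyncio

async def analyze(payload: dict) -> str:
    """Stub for the LLM-backed analysis step (hypothetical output format)."""
    return f"Analyzed MR !{payload['mr_iid']}: 1/3 criteria implemented"

async def worker(queue: asyncio.Queue, comments: list[str]) -> None:
    """Drain webhook payloads and turn each analysis into an MR comment."""
    while True:
        payload = await queue.get()
        comments.append(await analyze(payload))
        queue.task_done()

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    comments: list[str] = []
    task = asyncio.create_task(worker(queue, comments))
    await queue.put({"mr_iid": 42})   # simulate an incoming merge-request webhook
    await queue.join()                # wait for the async analysis to finish
    task.cancel()
    return comments

comments = asyncio.run(main())
```

In the real service, the FastAPI webhook endpoint would enqueue and return 200 immediately, so slow LLM calls never block GitLab's webhook delivery.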
## Questions for You
### 1. Problem Validation
Does this problem exist in your team?
- [ ] Yes, constantly
- [ ] Yes, occasionally
- [ ] Rarely
- [ ] No, not a problem
### 2. Solution Validation
Would automated blocking help or create friction?
Scenarios:
- MR implements 3/5 criteria → Agent blocks merge
- Dev changes old code → Agent warns about design constraint
- Both scenarios happen
Your reaction:
- [ ] This would save us hours
- [ ] This would be annoying
- [ ] Depends on accuracy
### 3. Workflow Fit
Do you document acceptance criteria in a parseable format?
- [ ] Yes (checkboxes, bullet points in issues)
- [ ] Partially (sometimes)
- [ ] No (verbal/Slack/tribal knowledge)
### 4. Alternative Solutions
What have you tried?
- PR templates with checklists?
- Manual gating process?
- Code ownership + tribal knowledge?
- Nothing?
### 5. False Positive Tolerance
How accurate would this need to be?
- 50% accurate → Would you use it?
- 70% accurate → Would you use it?
- 90% accurate → Would you use it?
- 100% accurate or nothing?
## Why This Matters
I'm building this for the GitLab AI Hackathon (45-day timeline). I'm targeting the $10K prize, but more importantly:
- Learning distributed systems
- Leveling up engineering practices
- Building something people actually want
I'd rather pivot now than build something useless.
## How to Give Feedback
Comment below with:
- Your role (engineer/lead/manager)
- Team size
- Answers to questions above
- Any other thoughts
Thanks for your time! 🙏
---

## Top comments
This is a really interesting approach to the "acceptance criteria drift" problem. I've seen this exact issue in multiple teams — the PR looks good, tests pass, code is clean, but it only implements 60% of what was actually requested.
A few thoughts from the trenches:
1. The blocking approach needs nuance
I'd suggest a "confidence score" rather than a binary block. 90%+ confidence = block; 70-89% = warning with required acknowledgment; below 70% = just a comment. This gives teams a dial to tune based on their false positive tolerance.
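The tiered policy described above could be sketched as a tiny function with per-team thresholds. This is a hypothetical illustration of the commenter's suggestion, not part of the proposed design; the threshold values are the ones from the comment and would be tunable.

```python
def policy(confidence: float, block_at: float = 0.90, warn_at: float = 0.70) -> str:
    """Map the agent's confidence that criteria are unmet to an action.

    >= block_at -> hard block; >= warn_at -> warning requiring acknowledgment;
    otherwise just an informational comment.
    """
    if confidence >= block_at:
        return "block"
    if confidence >= warn_at:
        return "warn"     # requires explicit acknowledgment before merge
    return "comment"      # informational only
```

Exposing `block_at` and `warn_at` as project-level settings is the "dial" the commenter describes: teams with low false-positive tolerance raise the block threshold instead of disabling the agent entirely.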
2. Historical context is the real killer feature
The acceptance criteria validation is useful, but the historical design constraint surfacing is where the 10x value lives. "You just broke the enterprise SSO flow that was carefully architected 8 months ago" — that's the kind of thing that saves days of debugging and customer escalations.
3. Consider the "why" not just the "what"
If you can capture the rationale behind design decisions (not just the decisions themselves), you'd have something truly powerful. "Don't just tell me we need token refresh every 30s — tell me why (enterprise security policy #47) so I know if my workaround is acceptable."
Team context: Former tech lead, 12-person team, GitLab self-hosted. We absolutely have this problem — probably 15-20% of merged MRs need follow-up work for missed edge cases.
Would definitely try this at 70%+ accuracy. Great RFC, good luck with the hackathon!
This feedback is incredible. You just validated the entire hypothesis AND gave me a roadmap for making this production-worthy. Thank you!