AI Agents Write Production Code in CI — Full Test Pyramid, Multi-Perspective Review, and Bug Investigation
Your team gets a refined Jira ticket with a full Execution Plan. Who implements it?
Not you. A CI agent does.
We built a GitHub Actions workflow that takes a Jira ticket and delivers a ready-to-review pull request. No human writes the code. The agent does. Then it reviews its own work from seven perspectives before pushing.
This is Part 3 of our Jira-agent automation series. Parts 1–2 covered ticket creation and refinement. Part 3 explains how we delegate implementation to Cursor CLI agents in CI.
Why CI-Based Agents?
Most agent demos run on a developer's laptop. That breaks when you need:
- Secrets — Jira, Confluence, API tokens
- Integration tests — Postgres, Redis, Kafka, ClickHouse
- E2E tests — Playwright against a running stack
- Environment parity — same as production
We run on a self-hosted runner with the full ecosystem. The agent doesn't just write unit tests — it validates against production-like dependencies.
Our CI environment includes:
- Services: Postgres 15, Redis, ClickHouse 23.3, Kafka (Apache Kafka in KRaft mode)
- Tooling: Typesense for search, NGINX proxy for internal APIs, Foundry Anvil for blockchain minting E2E tests
- Test pyramid: Unit tests (Jest/Go), integration tests (real databases), E2E tests (Playwright)
When the agent writes code, it can run the full test suite immediately. No "works on my machine" excuses.
The Agent Chain
The integration job in our GitHub Actions workflow orchestrates multiple agents:
1. Implement Agent
Model: Cursor CLI with `composer-2-fast`
Inputs:
- `.agent-context.md` — Jira ticket key, summary, description, issue type
- Execution Plan from Confluence (fetched in CI, appended to context)
- PR review comments (if the branch/PR already exists from a re-run)
Deliverables (from `implement-SKILL.md`):
- Implementation — Analyze the ticket and Execution Plan, implement changes, run lint and tests
- Tests — Unit, integration, or E2E (decided based on feasibility)
- Documentation — Create or update `CLAUDE.md` in each affected directory
- Self-review gate — Run `git diff` and check against the code review checklist. Do NOT commit until the review passes.
- PR description — Write `.pr-description.md` with an H1 title (becomes the GitHub PR title) and a body
- Commit — Conventional format: `feat(DPA2-1234): description`, or `fix(DPA2-1234): description` for bugs
The agent does NOT push. The workflow handles that after internal review.
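The commit convention above is simple enough to sketch. Here's a hypothetical helper illustrating it (the actual decision is made by the agent following the skill file, not by code like this):

```typescript
// Hypothetical helper illustrating the commit-prefix convention:
// Bug tickets get fix(), everything else gets feat().
function commitMessage(ticketKey: string, issueType: string, summary: string): string {
  const prefix = issueType.toLowerCase() === "bug" ? "fix" : "feat";
  return `${prefix}(${ticketKey}): ${summary}`;
}

// commitMessage("DPA2-1234", "Bug", "handle empty result set")
// → "fix(DPA2-1234): handle empty result set"
```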
2. Internal Review Agents
After the implement agent commits, the workflow builds `pr-context/` (`diff.patch`, `files.json`, `description.md`).
Perspective classification: Based on changed files, the workflow determines which review perspectives are active:
- Frontend — if `apps/*-webapp/*` or `libs/ui-*/*` changed
- Backend — if `apps/*` (non-webapp), `libs/shared-nestjs/*`, or `libs/go-*/*` changed
- Architecture — if backend code changed
- Security — if app code changed AND (backend code OR security-relevant files)
- Observability — if app code changed AND observability-relevant files
- QA — if any app code changed
- PO — if any Jira ticket is linked
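A minimal sketch of that classification in TypeScript, covering three of the rules (the actual workflow evaluates glob patterns over `files.json` and also handles security, observability, and PO):

```typescript
// Simplified sketch of perspective classification. Covers the
// frontend/backend/architecture/QA rules only; path patterns are
// approximated with regexes instead of real glob matching.
type Perspective = "frontend" | "backend" | "architecture" | "qa";

function classifyPerspectives(changedFiles: string[]): Set<Perspective> {
  const isFrontend = (f: string) =>
    /^apps\/[^/]+-webapp\//.test(f) || /^libs\/ui-[^/]+\//.test(f);
  const isBackend = (f: string) =>
    (/^apps\//.test(f) && !isFrontend(f)) ||
    /^libs\/shared-nestjs\//.test(f) ||
    /^libs\/go-[^/]+\//.test(f);

  const active = new Set<Perspective>();
  if (changedFiles.some(isFrontend)) active.add("frontend");
  if (changedFiles.some(isBackend)) {
    active.add("backend");
    active.add("architecture"); // architecture reviews whenever backend code changed
  }
  if (changedFiles.some((f) => /^apps\//.test(f))) active.add("qa"); // any app code
  return active;
}
```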
Model: `claude-4.6-opus-high`
Each perspective agent writes `result-<perspective>.json`:

```json
{
  "perspective": "Senior Backend Engineer",
  "action": "changes_required" | "clean",
  "summary": "One paragraph.",
  "comments": [
    {
      "path": "apps/portal-backend/src/service.ts",
      "line": 42,
      "side": "RIGHT",
      "body": "[Backend] [Critical] Missing error handling for Prisma query"
    }
  ]
}
```
Rules:
- Max 3 inline comments per perspective
- Only `[Critical]` and `[Major]` get inline comments
- Findings only — no praise
3. Synthesize Review Results
The workflow merges inline comments from all perspectives:
- Deduplicate by `path + line + side`
- Sort by severity (`[Critical]` first, then `[Major]`)
- Merge tags when multiple perspectives flag the same line
- Cap at 10 inline comments
Output: `synthesized-comments.json` and `.review-feedback.md`.
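A sketch of that merge logic (the types are illustrative; the real synthesis runs as a workflow script over the `result-<perspective>.json` files):

```typescript
// Sketch of review synthesis: dedupe by location, merge bodies when
// several perspectives flag the same line, sort Critical before
// Major, and cap the total number of inline comments.
interface InlineComment {
  path: string;
  line: number;
  side: "LEFT" | "RIGHT";
  body: string; // e.g. "[Backend] [Critical] Missing error handling"
}

function synthesize(comments: InlineComment[], cap = 10): InlineComment[] {
  const byLocation = new Map<string, InlineComment>();
  for (const c of comments) {
    const key = `${c.path}:${c.line}:${c.side}`;
    const existing = byLocation.get(key);
    if (existing) {
      existing.body += `\n\n${c.body}`; // merge duplicate findings
    } else {
      byLocation.set(key, { ...c });
    }
  }
  const severity = (c: InlineComment) =>
    c.body.includes("[Critical]") ? 0 : 1; // Critical sorts first
  return [...byLocation.values()]
    .sort((a, b) => severity(a) - severity(b))
    .slice(0, cap);
}
```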
If any perspective returned `"action": "changes_required"`, the workflow runs the fix agent.
4. Fix Agent
Model: Cursor CLI with `composer-2-fast`
Inputs:
- `.review-feedback.md` — perspective summaries + synthesized inline comments
Deliverables:
- Address each finding
- Run `pnpm nx format:write`, lint, and tests
- Commit with `fix(DPA2-XXXX): address code review findings`
5. Push PR and Transition Jira
The workflow:
- Applies `pnpm nx format`
- Pushes the branch
- Creates a PR with the title and body from `.pr-description.md`
- Dispatches the `build-pr.yaml` workflow (full CI with E2E tests)
- Transitions Jira to "In Review"
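One small but load-bearing detail is turning `.pr-description.md` into a PR title and body. A hypothetical sketch of that split (the helper name and exact behavior are assumptions, not the workflow's actual code):

```typescript
// Hypothetical sketch: the first H1 line of .pr-description.md
// becomes the GitHub PR title; everything after it becomes the body.
function splitPrDescription(markdown: string): { title: string; body: string } {
  const lines = markdown.split("\n");
  const h1Index = lines.findIndex((l) => l.startsWith("# "));
  if (h1Index === -1) {
    throw new Error(".pr-description.md must contain an H1 title");
  }
  return {
    title: lines[h1Index].slice(2).trim(),
    body: lines.slice(h1Index + 1).join("\n").trim(),
  };
}
```

Failing loudly when the H1 is missing is deliberate: a malformed PR description should stop the push step rather than produce an untitled PR.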
Bug Investigation Flow
For Bug tickets, the refine agent (runs before implement) gets special treatment.
If the issue type is Bug AND the Grafana MCP server is configured, the refine agent:
- Reads `investigate-SKILL.md`
- Calls Grafana MCP tools:
  - `list_datasources` — get Loki, Prometheus, and Tempo UIDs
  - `query_loki_logs` — fetch error logs
  - `query_prometheus` — error rates and latency
  - `tempo_traceql-search` — find error traces
- Creates a Confluence Investigation Report with:
  - Error log analysis (categorized patterns with sample log lines)
  - Metric analysis (error rate trends, latency changes, Grafana Explore deeplinks)
  - Trace analysis (representative error traces showing the failure path)
  - Root cause hypothesis (ranked, with confidence levels)
  - Reproduction steps (derived from traces and logs)
  - Recommended fix strategy
- Links the Investigation Report to the Jira ticket
- Writes `investigation-report-url.txt` for the refine skill to reference in the Execution Plan
Evidence-based fixes, not guesses.
What We Learned
Agent-written PRs need structure
Skills (`implement-SKILL.md`, `fix-review-SKILL.md`) guide agents through explicit deliverables. Freeform prompts produce junk.
Review loops need synthesis
Seven separate reviews → one merged feedback doc. Deduplicated, tagged by perspective, capped at 10 inline comments. Users don't read seven JSON files. They read one markdown summary.
Environment setup is heavyweight
Postgres, Redis, Kafka, ClickHouse, Typesense, an NGINX proxy, and Foundry Anvil. The self-hosted runner pays off.
Full ecosystem in CI enables meaningful test coverage
Most agent demos stop at unit tests. Our agent writes unit, integration, AND E2E tests against production-like dependencies.
Bug investigation flow is a forcing function
Without the Investigation Report, developers guess. With it, they have error patterns, traces, metrics, and a ranked root cause hypothesis.
From Ticket to PR in One Workflow Run
The workflow runs from Jira ticket to GitHub PR without human intervention. The agent:
- ✅ Reads ticket and Execution Plan
- ✅ Implements changes
- ✅ Writes unit, integration, and E2E tests
- ✅ Self-reviews
- ✅ Runs internal review (7 perspectives)
- ✅ Addresses review findings
- ✅ Pushes branch and creates PR
- ✅ Transitions Jira to "In Review"
Your team reviews the PR, not the ticket.
What Breaks When You Try This?
- Environment setup — You need a self-hosted runner with services. GitHub-hosted runners won't cut it.
- Agent commit quality — Skills and deliverables are mandatory. Freeform prompts produce junk.
- Reviewer trust — Do your reviewers trust the agent less than junior devs? After 20 PRs, ours stopped caring who wrote the code.
Part 4 (coming soon) will cover PR review automation and merge policies.
Mitko Tschimev — Technical lead at 1inch. I write about engineering leadership, architecture, and automation.