AI Agents Write Production Code in CI — Full Test Pyramid, Multi-Perspective Review, and Bug Investigation
Your team gets a refined Jira ticket with a full Execution Plan. Who implements it?
Not you. A CI agent does.
We built a GitHub Actions workflow that takes a Jira ticket and delivers a ready-to-review pull request. No human writes the code. The agent does. Then it reviews its own work from seven perspectives before pushing.
This is Part 3 of our Jira-agent automation series. Parts 1–2 covered ticket creation and refinement. Part 3 explains how we delegate implementation to Cursor CLI agents in CI.
Why CI-Based Agents?
Most agent demos run on a developer's laptop. That breaks when you need:
- Secrets — Jira, Confluence, API tokens
- Integration tests — Postgres, Redis, Kafka, ClickHouse
- E2E tests — Playwright against a running stack
- Environment parity — same as production
We run on a self-hosted runner with the full ecosystem. The agent doesn't just write unit tests — it validates against production-like dependencies.
Our CI environment includes:
- Services: Postgres 15, Redis, ClickHouse 23.3, Kafka (Apache Kafka in KRaft mode)
- Tooling: Typesense for search, NGINX proxy for internal APIs, Foundry Anvil for blockchain minting E2E tests
- Test pyramid: Unit tests (Jest/Go), integration tests (real databases), E2E tests (Playwright)
When the agent writes code, it can run the full test suite immediately. No "works on my machine" excuses.
The Agent Chain
The integration job in our GitHub Actions workflow orchestrates multiple agents:
1. Implement Agent
Model: Cursor CLI with `composer-2-fast`
Inputs:
- `.agent-context.md` — Jira ticket key, summary, description, issue type
- Execution Plan from Confluence (fetched in CI, appended to context)
- PR review comments (if the branch/PR already exists from a re-run)
Deliverables (from `implement-SKILL.md`):
- Implementation — Analyze the ticket and Execution Plan, implement changes, run lint and tests
- Tests — Unit, integration, or E2E (decided based on feasibility)
- Documentation — Create or update `CLAUDE.md` in each affected directory
- Self-review gate — Run `git diff` and check against the code review checklist. Do NOT commit until the review passes.
- PR description — Write `.pr-description.md` with an H1 title (becomes the GitHub PR title) and a body
- Commit — Conventional format: `feat(DPA2-1234): description`, or `fix(DPA2-1234): description` for bugs
The agent does NOT push. The workflow handles that after internal review.
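The commit convention above is simple enough to sketch. Here's a hypothetical helper illustrating it (the actual decision is made by the agent following the skill file, not by code like this):

```typescript
// Hypothetical helper illustrating the commit-prefix convention:
// Bug tickets get fix(), everything else gets feat().
function commitMessage(ticketKey: string, issueType: string, summary: string): string {
  const prefix = issueType.toLowerCase() === "bug" ? "fix" : "feat";
  return `${prefix}(${ticketKey}): ${summary}`;
}

// commitMessage("DPA2-1234", "Bug", "handle empty result set")
// → "fix(DPA2-1234): handle empty result set"
```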
2. Internal Review Agents
After the implement agent commits, the workflow builds `pr-context/` (`diff.patch`, `files.json`, `description.md`).
Perspective classification: Based on changed files, the workflow determines which review perspectives are active:
- Frontend — if `apps/*-webapp/*` or `libs/ui-*/*` changed
- Backend — if `apps/*` (non-webapp), `libs/shared-nestjs/*`, or `libs/go-*/*` changed
- Architecture — if backend code changed
- Security — if app code changed AND (backend code OR security-relevant files)
- Observability — if app code changed AND observability-relevant files
- QA — if any app code changed
- PO — if any Jira ticket is linked
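A minimal sketch of that classification in TypeScript, covering three of the rules (the actual workflow evaluates glob patterns over `files.json` and also handles security, observability, and PO):

```typescript
// Simplified sketch of perspective classification. Covers the
// frontend/backend/architecture/QA rules only; path patterns are
// approximated with regexes instead of real glob matching.
type Perspective = "frontend" | "backend" | "architecture" | "qa";

function classifyPerspectives(changedFiles: string[]): Set<Perspective> {
  const isFrontend = (f: string) =>
    /^apps\/[^/]+-webapp\//.test(f) || /^libs\/ui-[^/]+\//.test(f);
  const isBackend = (f: string) =>
    (/^apps\//.test(f) && !isFrontend(f)) ||
    /^libs\/shared-nestjs\//.test(f) ||
    /^libs\/go-[^/]+\//.test(f);

  const active = new Set<Perspective>();
  if (changedFiles.some(isFrontend)) active.add("frontend");
  if (changedFiles.some(isBackend)) {
    active.add("backend");
    active.add("architecture"); // architecture reviews whenever backend code changed
  }
  if (changedFiles.some((f) => /^apps\//.test(f))) active.add("qa"); // any app code
  return active;
}
```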
Model: `claude-4.6-opus-high`
Each perspective agent writes `result-<perspective>.json`:

```json
{
  "perspective": "Senior Backend Engineer",
  "action": "changes_required" | "clean",
  "summary": "One paragraph.",
  "comments": [
    {
      "path": "apps/portal-backend/src/service.ts",
      "line": 42,
      "side": "RIGHT",
      "body": "[Backend] [Critical] Missing error handling for Prisma query"
    }
  ]
}
```
Rules:
- Max 3 inline comments per perspective
- Only `[Critical]` and `[Major]` get inline comments
- Findings only — no praise
3. Synthesize Review Results
The workflow merges inline comments from all perspectives:
- Deduplicate by `path + line + side`
- Sort by severity (`[Critical]` first, then `[Major]`)
- Merge tags when multiple perspectives flag the same line
- Cap at 10 inline comments
Output: `synthesized-comments.json` and `.review-feedback.md`.
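A sketch of that merge logic (the types are illustrative; the real synthesis runs as a workflow script over the `result-<perspective>.json` files):

```typescript
// Sketch of review synthesis: dedupe by location, merge bodies when
// several perspectives flag the same line, sort Critical before
// Major, and cap the total number of inline comments.
interface InlineComment {
  path: string;
  line: number;
  side: "LEFT" | "RIGHT";
  body: string; // e.g. "[Backend] [Critical] Missing error handling"
}

function synthesize(comments: InlineComment[], cap = 10): InlineComment[] {
  const byLocation = new Map<string, InlineComment>();
  for (const c of comments) {
    const key = `${c.path}:${c.line}:${c.side}`;
    const existing = byLocation.get(key);
    if (existing) {
      existing.body += `\n\n${c.body}`; // merge duplicate findings
    } else {
      byLocation.set(key, { ...c });
    }
  }
  const severity = (c: InlineComment) =>
    c.body.includes("[Critical]") ? 0 : 1; // Critical sorts first
  return [...byLocation.values()]
    .sort((a, b) => severity(a) - severity(b))
    .slice(0, cap);
}
```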
If any perspective returned `"action": "changes_required"`, the workflow runs the fix agent.
4. Fix Agent
Model: Cursor CLI with `composer-2-fast`
Inputs:
- `.review-feedback.md` — perspective summaries + synthesized inline comments
Deliverables:
- Address each finding
- Run `pnpm nx format:write`, lint, and tests
- Commit with `fix(DPA2-XXXX): address code review findings`
5. Push PR and Transition Jira
The workflow:
- Applies `pnpm nx format`
- Pushes the branch
- Creates a PR with the title and body from `.pr-description.md`
- Dispatches the `build-pr.yaml` workflow (full CI with E2E tests)
- Transitions Jira to "In Review"
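One small but load-bearing detail is turning `.pr-description.md` into a PR title and body. A hypothetical sketch of that split (the helper name and exact behavior are assumptions, not the workflow's actual code):

```typescript
// Hypothetical sketch: the first H1 line of .pr-description.md
// becomes the GitHub PR title; everything after it becomes the body.
function splitPrDescription(markdown: string): { title: string; body: string } {
  const lines = markdown.split("\n");
  const h1Index = lines.findIndex((l) => l.startsWith("# "));
  if (h1Index === -1) {
    throw new Error(".pr-description.md must contain an H1 title");
  }
  return {
    title: lines[h1Index].slice(2).trim(),
    body: lines.slice(h1Index + 1).join("\n").trim(),
  };
}
```

Failing loudly when the H1 is missing is deliberate: a malformed PR description should stop the push step rather than produce an untitled PR.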
Bug Investigation Flow
For Bug tickets, the refine agent (runs before implement) gets special treatment.
If the issue type is Bug AND the Grafana MCP server is configured, the refine agent:
- Reads `investigate-SKILL.md`
- Calls Grafana MCP tools:
  - `list_datasources` — get Loki, Prometheus, and Tempo UIDs
  - `query_loki_logs` — fetch error logs
  - `query_prometheus` — error rates and latency
  - `tempo_traceql-search` — find error traces
- Creates a Confluence Investigation Report with:
  - Error log analysis (categorized patterns with sample log lines)
  - Metric analysis (error rate trends, latency changes, Grafana Explore deeplinks)
  - Trace analysis (representative error traces showing the failure path)
  - Root cause hypothesis (ranked, with confidence levels)
  - Reproduction steps (derived from traces and logs)
  - Recommended fix strategy
- Links the Investigation Report to the Jira ticket
- Writes `investigation-report-url.txt` for the refine skill to reference in the Execution Plan
Evidence-based fixes, not guesses.
What We Learned
Agent-written PRs need structure
Skills (`implement-SKILL.md`, `fix-review-SKILL.md`) guide agents through explicit deliverables. Freeform prompts produce junk.
Review loops need synthesis
Seven separate reviews → one merged feedback doc. Deduplicated, tagged by perspective, capped at 10 inline comments. Users don't read seven JSON files. They read one markdown summary.
Environment setup is heavyweight
Postgres, Redis, Kafka, ClickHouse, Typesense, an NGINX proxy, and Foundry Anvil. The self-hosted runner pays off.
Full ecosystem in CI enables meaningful test coverage
Most agent demos stop at unit tests. Our agent writes unit, integration, AND E2E tests against production-like dependencies.
Bug investigation flow is a forcing function
Without the Investigation Report, developers guess. With it, they have error patterns, traces, metrics, and a ranked root cause hypothesis.
From Ticket to PR in One Workflow Run
The workflow runs from Jira ticket to GitHub PR without human intervention. The agent:
- ✅ Reads ticket and Execution Plan
- ✅ Implements changes
- ✅ Writes unit, integration, and E2E tests
- ✅ Self-reviews
- ✅ Runs internal review (7 perspectives)
- ✅ Addresses review findings
- ✅ Pushes branch and creates PR
- ✅ Transitions Jira to "In Review"
Your team reviews the PR, not the ticket.
What Breaks When You Try This?
- Environment setup — You need a self-hosted runner with services. GitHub-hosted runners won't cut it.
- Agent commit quality — Skills and deliverables are mandatory. Freeform prompts produce junk.
- Reviewer trust — Do your reviewers trust the agent less than junior devs? After 20 PRs, ours stopped caring who wrote the code.
Part 4 (coming soon) will cover PR review automation and merge policies.
Mitko Tschimev — Technical lead at 1inch. I write about engineering leadership, architecture, and automation.