<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mitko Tschimev</title>
    <description>The latest articles on DEV Community by Mitko Tschimev (@mitkotschimev).</description>
    <link>https://dev.to/mitkotschimev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F374069%2Fc49d74ec-c219-475b-aaf0-c028ae2e86a6.jpeg</url>
      <title>DEV Community: Mitko Tschimev</title>
      <link>https://dev.to/mitkotschimev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mitkotschimev"/>
    <language>en</language>
    <item>
      <title>AI Agents Write Production Code in CI — Full Test Pyramid, Multi-Perspective Review, and Bug Investigation</title>
      <dc:creator>Mitko Tschimev</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:10:34 +0000</pubDate>
      <link>https://dev.to/mitkotschimev/ai-agents-write-production-code-in-ci-full-test-pyramid-multi-perspective-review-and-bug-44bd</link>
      <guid>https://dev.to/mitkotschimev/ai-agents-write-production-code-in-ci-full-test-pyramid-multi-perspective-review-and-bug-44bd</guid>
      <description>&lt;h1&gt;
  
  
  AI Agents Write Production Code in CI — Full Test Pyramid, Multi-Perspective Review, and Bug Investigation
&lt;/h1&gt;

&lt;p&gt;Your team gets a refined Jira ticket with a full Execution Plan. Who implements it?&lt;/p&gt;

&lt;p&gt;Not you. A CI agent does.&lt;/p&gt;

&lt;p&gt;We built a GitHub Actions workflow that takes a Jira ticket and delivers a ready-to-review pull request. No human writes the code. The agent does. Then it reviews its own work from seven perspectives before pushing.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;Part 3&lt;/strong&gt; of our Jira-agent automation series. Parts 1–2 covered ticket creation and refinement. Part 3 explains how we delegate implementation to Cursor CLI agents in CI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CI-Based Agents?
&lt;/h2&gt;

&lt;p&gt;Most agent demos run on a developer's laptop. That breaks when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Secrets&lt;/strong&gt; — Jira, Confluence, API tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration tests&lt;/strong&gt; — Postgres, Redis, Kafka, ClickHouse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E2E tests&lt;/strong&gt; — Playwright against a running stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment parity&lt;/strong&gt; — same as production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We run on a &lt;strong&gt;self-hosted runner&lt;/strong&gt; with the full ecosystem. The agent doesn't just write unit tests — it validates against production-like dependencies.&lt;/p&gt;

&lt;p&gt;Our CI environment includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Services&lt;/strong&gt;: Postgres 15, Redis, ClickHouse 23.3, Apache Kafka (KRaft mode)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling&lt;/strong&gt;: Typesense for search, NGINX proxy for internal APIs, Foundry Anvil for blockchain minting E2E tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test pyramid&lt;/strong&gt;: Unit tests (Jest/Go), integration tests (real databases), E2E tests (Playwright)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the agent writes code, it can run the full test suite immediately. No "works on my machine" excuses.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Chain
&lt;/h2&gt;

&lt;p&gt;The integration job in our GitHub Actions workflow orchestrates multiple agents:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Implement Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt;: Cursor CLI with &lt;code&gt;composer-2-fast&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inputs&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.agent-context.md&lt;/code&gt; — Jira ticket key, summary, description, issue type&lt;/li&gt;
&lt;li&gt;Execution Plan from Confluence (fetched in CI, appended to context)&lt;/li&gt;
&lt;li&gt;PR review comments (if the branch/PR already exists from a re-run)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deliverables&lt;/strong&gt; (from &lt;code&gt;implement-SKILL.md&lt;/code&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt; — Analyze ticket and Execution Plan, implement changes, run lint and tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests&lt;/strong&gt; — Unit, integration, or E2E (decide based on feasibility)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; — Create or update &lt;code&gt;CLAUDE.md&lt;/code&gt; in each affected directory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-review gate&lt;/strong&gt; — Run &lt;code&gt;git diff&lt;/code&gt;, check against code review checklist. Do NOT commit until review passes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR description&lt;/strong&gt; — Write &lt;code&gt;.pr-description.md&lt;/code&gt; with H1 title (becomes GitHub PR title) and body&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit&lt;/strong&gt; — Conventional format: &lt;code&gt;feat(DPA2-1234): description&lt;/code&gt; or &lt;code&gt;fix(DPA2-1234): description&lt;/code&gt; for bugs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent does NOT push. The workflow handles that after internal review.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Internal Review Agents
&lt;/h3&gt;

&lt;p&gt;After the implement agent commits, the workflow builds &lt;code&gt;pr-context/&lt;/code&gt; (diff.patch, files.json, description.md).&lt;/p&gt;
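
&lt;p&gt;A minimal sketch of assembling that bundle (the exact commands in our workflow differ; &lt;code&gt;origin/main&lt;/code&gt; as the base branch is an assumption):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// build-pr-context.ts: hedged sketch, assemble pr-context/ for the review agents.
import { execSync } from "node:child_process";
import { mkdirSync, writeFileSync, copyFileSync } from "node:fs";

const base = "origin/main"; // assumption: agent branches target main

mkdirSync("pr-context", { recursive: true });

// Full diff of the agent's commits against the base branch.
writeFileSync("pr-context/diff.patch", execSync(`git diff ${base}...HEAD`, { encoding: "utf8" }));

// Changed file list as JSON, one entry per path.
const files = execSync(`git diff --name-only ${base}...HEAD`, { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);
writeFileSync("pr-context/files.json", JSON.stringify(files, null, 2));

// PR description written by the implement agent.
copyFileSync(".pr-description.md", "pr-context/description.md");
&lt;/code&gt;&lt;/pre&gt;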

&lt;p&gt;&lt;strong&gt;Perspective classification&lt;/strong&gt;: Based on changed files, the workflow determines which review perspectives are active:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt; — if &lt;code&gt;apps/*-webapp/*&lt;/code&gt; or &lt;code&gt;libs/ui-*/*&lt;/code&gt; changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt; — if &lt;code&gt;apps/*&lt;/code&gt; (non-webapp) or &lt;code&gt;libs/shared-nestjs/*&lt;/code&gt; or &lt;code&gt;libs/go-*/*&lt;/code&gt; changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt; — if backend code changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; — if app code changed AND (backend code OR security-relevant files)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — if app code changed AND observability-relevant files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA&lt;/strong&gt; — if any app code changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PO&lt;/strong&gt; — if any Jira ticket is linked&lt;/li&gt;
&lt;/ul&gt;
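
&lt;p&gt;A sketch of that gating logic, with simplified patterns standing in for the real path rules:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// classify-perspectives.ts: hedged sketch; the workflow's actual globs are richer.
import { readFileSync } from "node:fs";

const files: string[] = JSON.parse(readFileSync("pr-context/files.json", "utf8"));

function touched(re: RegExp): boolean {
  return files.some(function (f) { return re.test(f); });
}

const frontend = touched(/^apps\/.+-webapp\//) || touched(/^libs\/ui-/);
const backend = touched(/^apps\/(?!.*-webapp\/)/) || touched(/^libs\/shared-nestjs\//) || touched(/^libs\/go-/);
const appCode = frontend || backend;

const active: string[] = [];
if (frontend) active.push("frontend");
if (backend) active.push("backend", "architecture");
if (appCode) {
  // Security-relevant and observability-relevant patterns below are placeholders.
  if (backend || touched(/auth|token|crypto|\.env/i)) active.push("security");
  if (touched(/logg(er|ing)|metrics|tracing|otel/i)) active.push("observability");
  active.push("qa");
}
if (process.env.JIRA_KEY) active.push("po"); // assumption: workflow exports the linked ticket key

console.log(active.join(","));
&lt;/code&gt;&lt;/pre&gt;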

&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt;: &lt;code&gt;claude-4.6-opus-high&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Each perspective agent writes &lt;code&gt;result-&amp;lt;perspective&amp;gt;.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"perspective"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Senior Backend Engineer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"changes_required"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"clean"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"One paragraph."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"apps/portal-backend/src/service.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"side"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RIGHT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[Backend] [Critical] Missing error handling for Prisma query"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Max 3 inline comments per perspective&lt;/li&gt;
&lt;li&gt;Only &lt;code&gt;[Critical]&lt;/code&gt; and &lt;code&gt;[Major]&lt;/code&gt; get inline comments&lt;/li&gt;
&lt;li&gt;Findings only — no praise&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Synthesize Review Results
&lt;/h3&gt;

&lt;p&gt;The workflow merges inline comments from all perspectives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deduplicate by &lt;code&gt;path + line + side&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sort by severity (&lt;code&gt;[Critical]&lt;/code&gt; first, then &lt;code&gt;[Major]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Merge tags when multiple perspectives flag the same line&lt;/li&gt;
&lt;li&gt;Cap at 10 inline comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output: &lt;code&gt;synthesized-comments.json&lt;/code&gt; and &lt;code&gt;.review-feedback.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If any perspective returned &lt;code&gt;action: "changes_required"&lt;/code&gt;, the workflow runs the fix agent.&lt;/p&gt;
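
&lt;p&gt;A sketch of the merge step, assuming the result file shape shown above (severity is read from the &lt;code&gt;[Critical]&lt;/code&gt; / &lt;code&gt;[Major]&lt;/code&gt; tag in the comment body; rendering &lt;code&gt;.review-feedback.md&lt;/code&gt; is omitted):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// synthesize-review.ts: hedged sketch of the comment merge described above.
import { readFileSync, writeFileSync } from "node:fs";

interface ReviewComment { path: string; line: number; side: string; body: string; }
interface ReviewResult { perspective: string; action: string; summary: string; comments: ReviewComment[]; }

const names = ["frontend", "backend", "architecture", "security", "observability", "qa", "po"];
const results: ReviewResult[] = [];
for (const n of names) {
  try { results.push(JSON.parse(readFileSync(`result-${n}.json`, "utf8"))); }
  catch { /* perspective was not active for this diff */ }
}

function severity(body: string): number {
  return body.includes("[Critical]") ? 0 : 1; // [Critical] sorts before [Major]
}

// Deduplicate by path + line + side, merging bodies when perspectives overlap.
const byKey = new Map();
for (const r of results) {
  for (const c of r.comments) {
    const key = `${c.path}:${c.line}:${c.side}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.body = `${existing.body}\n\n${c.body}`; // merge tags from multiple perspectives
    } else {
      byKey.set(key, { ...c });
    }
  }
}

const merged = Array.from(byKey.values())
  .sort(function (a, b) { return severity(a.body) - severity(b.body); })
  .slice(0, 10); // cap at 10 inline comments

writeFileSync("synthesized-comments.json", JSON.stringify(merged, null, 2));

// Gate: the workflow runs the fix agent only if some perspective demanded changes.
const needsFix = results.some(function (r) { return r.action === "changes_required"; });
console.log(needsFix ? "changes_required" : "clean");
&lt;/code&gt;&lt;/pre&gt;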

&lt;h3&gt;
  
  
  4. Fix Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt;: Cursor CLI with &lt;code&gt;composer-2-fast&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inputs&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.review-feedback.md&lt;/code&gt; — perspective summaries + synthesized inline comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deliverables&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Address each finding&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;pnpm nx format:write&lt;/code&gt;, lint, tests&lt;/li&gt;
&lt;li&gt;Commit with &lt;code&gt;fix(DPA2-XXXX): address code review findings&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Push PR and Transition Jira
&lt;/h3&gt;

&lt;p&gt;The workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Applies &lt;code&gt;pnpm nx format&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pushes the branch&lt;/li&gt;
&lt;li&gt;Creates a PR with title/body from &lt;code&gt;.pr-description.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Dispatches &lt;code&gt;build-pr.yaml&lt;/code&gt; workflow (full CI with E2E tests)&lt;/li&gt;
&lt;li&gt;Transitions Jira to "In Review"&lt;/li&gt;
&lt;/ol&gt;
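
&lt;p&gt;A hedged sketch of steps 2–5 with Octokit and the Jira REST API (repo name, branch, and transition ID are placeholders, not our config):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// push-and-handoff.ts: hedged sketch; all names and IDs below are placeholders.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const owner = "acme";                                        // placeholder
const repo = "platform";                                     // placeholder
const branch = process.env.BRANCH_NAME ?? "agent/DPA2-1234"; // placeholder branch name
const issueKey = process.env.JIRA_KEY ?? "DPA2-1234";        // placeholder ticket key

async function main() {
  // 2. Push the branch.
  execSync(`git push origin ${branch}`, { stdio: "inherit" });

  // 3. PR title = H1 of .pr-description.md; the rest becomes the body.
  const lines = readFileSync(".pr-description.md", "utf8").split("\n");
  const title = lines[0].replace(/^#\s*/, "");
  const body = lines.slice(1).join("\n").trim();
  const pr = await octokit.rest.pulls.create({ owner, repo, title, body, head: branch, base: "main" });

  // 4. Kick off full CI (including E2E) for the new PR.
  await octokit.rest.actions.createWorkflowDispatch({ owner, repo, workflow_id: "build-pr.yaml", ref: branch });

  // 5. Transition the Jira ticket to "In Review" (transition IDs are instance-specific).
  await fetch(`${process.env.JIRA_BASE_URL}/rest/api/3/issue/${issueKey}/transitions`, {
    method: "POST",
    headers: { Authorization: `Basic ${process.env.JIRA_AUTH}`, "Content-Type": "application/json" },
    body: JSON.stringify({ transition: { id: "31" } }), // placeholder ID
  });

  console.log(`Opened PR #${pr.data.number} for ${issueKey}`);
}

main();
&lt;/code&gt;&lt;/pre&gt;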

&lt;h2&gt;
  
  
  Bug Investigation Flow
&lt;/h2&gt;

&lt;p&gt;For Bug tickets, the refine agent (which runs before the implement agent) gets special treatment.&lt;/p&gt;

&lt;p&gt;If the issue type is Bug AND the Grafana MCP server is configured, the refine agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads &lt;code&gt;investigate-SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Calls Grafana MCP tools:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;list_datasources&lt;/code&gt; — get Loki, Prometheus, Tempo UIDs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;query_loki_logs&lt;/code&gt; — fetch error logs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;query_prometheus&lt;/code&gt; — error rates, latency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tempo_traceql-search&lt;/code&gt; — find error traces&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Creates a Confluence Investigation Report with:

&lt;ul&gt;
&lt;li&gt;Error log analysis (categorized patterns with sample log lines)&lt;/li&gt;
&lt;li&gt;Metric analysis (error rate trends, latency changes, Grafana Explore deeplinks)&lt;/li&gt;
&lt;li&gt;Trace analysis (representative error traces showing failure path)&lt;/li&gt;
&lt;li&gt;Root cause hypothesis (ranked with confidence levels)&lt;/li&gt;
&lt;li&gt;Reproduction steps (derived from traces and logs)&lt;/li&gt;
&lt;li&gt;Recommended fix strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Links the Investigation Report to the Jira ticket&lt;/li&gt;
&lt;li&gt;Writes &lt;code&gt;investigation-report-url.txt&lt;/code&gt; for the refine skill to reference in the Execution Plan&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Evidence-based fixes, not guesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agent-written PRs need structure
&lt;/h3&gt;

&lt;p&gt;Skills (&lt;code&gt;implement-SKILL.md&lt;/code&gt;, &lt;code&gt;fix-review-SKILL.md&lt;/code&gt;) guide agents through deliverables. Freeform prompts produce junk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review loops need synthesis
&lt;/h3&gt;

&lt;p&gt;Seven separate reviews → one merged feedback doc. Deduplicated, tagged by perspective, capped at 10 inline comments. Reviewers don't read seven JSON files. They read one markdown summary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment setup is heavyweight
&lt;/h3&gt;

&lt;p&gt;Postgres, Redis, Kafka, ClickHouse, Typesense, an NGINX proxy, Foundry Anvil. A self-hosted runner pays off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full ecosystem in CI enables meaningful test coverage
&lt;/h3&gt;

&lt;p&gt;Most agent demos stop at unit tests. Our agent writes unit, integration, AND E2E tests against production-like dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug investigation flow is a forcing function
&lt;/h3&gt;

&lt;p&gt;Without the Investigation Report, developers guess. With it, they have error patterns, traces, metrics, and a ranked root cause hypothesis.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Ticket to PR in One Workflow Run
&lt;/h2&gt;

&lt;p&gt;The workflow runs from Jira ticket to GitHub PR without human intervention. The agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Reads ticket and Execution Plan&lt;/li&gt;
&lt;li&gt;✅ Implements changes&lt;/li&gt;
&lt;li&gt;✅ Writes unit, integration, and E2E tests&lt;/li&gt;
&lt;li&gt;✅ Self-reviews&lt;/li&gt;
&lt;li&gt;✅ Runs internal review (7 perspectives)&lt;/li&gt;
&lt;li&gt;✅ Addresses review findings&lt;/li&gt;
&lt;li&gt;✅ Pushes branch and creates PR&lt;/li&gt;
&lt;li&gt;✅ Transitions Jira to "In Review"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your team reviews the PR, not the ticket.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks When You Try This?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Environment setup&lt;/strong&gt; — You need a self-hosted runner with services. GitHub-hosted runners won't cut it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent commit quality&lt;/strong&gt; — Skills and deliverables are mandatory. Freeform prompts produce junk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewer trust&lt;/strong&gt; — Do your reviewers trust the agent less than junior devs? After 20 PRs, ours stopped caring who wrote the code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is Part 3 of our Jira-agent automation series. Part 4 (coming soon) will cover PR review automation and merge policies.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Mitko Tschimev&lt;/strong&gt; — Technical lead at 1inch. I write about engineering leadership, architecture, and automation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;X: &lt;a href="https://x.com/MTschimev" rel="noopener noreferrer"&gt;https://x.com/MTschimev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://linkedin.com/in/mitko-tschimev" rel="noopener noreferrer"&gt;https://linkedin.com/in/mitko-tschimev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cicd</category>
      <category>ai</category>
      <category>githubactions</category>
      <category>testing</category>
    </item>
    <item>
      <title>The Five Stages That Make AI Task Automation Work (Part 2)</title>
      <dc:creator>Mitko Tschimev</dc:creator>
      <pubDate>Mon, 06 Apr 2026 15:35:07 +0000</pubDate>
      <link>https://dev.to/mitkotschimev/the-five-stages-that-make-ai-task-automation-work-part-2-1l2n</link>
      <guid>https://dev.to/mitkotschimev/the-five-stages-that-make-ai-task-automation-work-part-2-1l2n</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/mitkotschimev/how-we-built-ai-task-automation-that-actually-works-1laf"&gt;Part 1&lt;/a&gt;, I walked through the architecture: JIRA webhooks → GitHub → Cursor agent. Today I'm covering the &lt;strong&gt;process&lt;/strong&gt; — the five stages that turn a rough ticket into a merged PR without losing human oversight.&lt;/p&gt;

&lt;p&gt;Most teams try "ticket → AI → code" and it breaks. The agent misunderstands the requirement, or devs lose trust after one bad PR. The fix isn't better prompts — it's &lt;strong&gt;structured handoffs&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Stages
&lt;/h2&gt;

&lt;p&gt;We built this with &lt;strong&gt;two agent stages&lt;/strong&gt; and &lt;strong&gt;three human review gates&lt;/strong&gt;. Agents never jump from a vague ticket straight to implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Refinement (human)
&lt;/h3&gt;

&lt;p&gt;A team member (BA, PO, or dev) writes the initial ticket. It can be rough:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add error handling for the payment webhook timeout case."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Status: &lt;strong&gt;Refinement&lt;/strong&gt;. Next step: agent formatting.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Agent: Refine
&lt;/h3&gt;

&lt;p&gt;A Cursor agent reads the ticket and generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance criteria&lt;/strong&gt; (pass/fail conditions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Definition of Done&lt;/strong&gt; (checklist: tests, docs, deploy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to test&lt;/strong&gt; (manual or automated outline)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation plan&lt;/strong&gt; (file-by-file breakdown, like Cursor's plan mode)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything saves to a &lt;strong&gt;Confluence page&lt;/strong&gt; linked from JIRA. The agent clarifies &lt;em&gt;what&lt;/em&gt; we're building, but doesn't write code yet.&lt;/p&gt;

&lt;p&gt;Ticket moves to: &lt;strong&gt;Plan: Review&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Plan: Review (human)
&lt;/h3&gt;

&lt;p&gt;Team reviews the Confluence plan. If something's off — wrong approach, missing edge case, unclear criteria — we comment in JIRA or Confluence.&lt;/p&gt;

&lt;p&gt;Then we move the ticket &lt;strong&gt;back&lt;/strong&gt; to &lt;strong&gt;Agent: Refine&lt;/strong&gt;. The agent reads feedback, updates the plan. This loop can run multiple times.&lt;/p&gt;

&lt;p&gt;Once approved, ticket moves to: &lt;strong&gt;Agent: Implement&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Agent: Implement
&lt;/h3&gt;

&lt;p&gt;Agent writes code using the Confluence plan. Runs tests (if configured), opens a &lt;strong&gt;pull request&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;PR links to the JIRA ticket and Confluence plan. Reviewers see the requirement, approach, and code changes — all connected.&lt;/p&gt;

&lt;p&gt;If the PR needs changes, devs comment in GitHub and move the ticket &lt;strong&gt;back&lt;/strong&gt; to &lt;strong&gt;Agent: Implement&lt;/strong&gt;. Agent reads PR feedback, updates code, pushes a new commit.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Review (human)
&lt;/h3&gt;

&lt;p&gt;Standard code review. If it's good, merge. If not, send back to step 4 with feedback.&lt;/p&gt;

&lt;p&gt;After merge, ticket moves to &lt;strong&gt;Test&lt;/strong&gt; (outside this workflow), then &lt;strong&gt;Done&lt;/strong&gt;.&lt;/p&gt;
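
&lt;p&gt;Seen as a whole, the stages form a small state machine. A sketch (status names as above; the actual enforcement lives in our Jira workflow config):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// transitions.ts: hedged sketch of the five-stage loop as allowed status transitions.
const transitions: { [status: string]: string[] } = {
  "Refinement": ["Agent: Refine"],
  "Agent: Refine": ["Plan: Review"],
  "Plan: Review": ["Agent: Refine", "Agent: Implement"], // back on feedback, forward on approval
  "Agent: Implement": ["Review"],
  "Review": ["Agent: Implement", "Test"],                // back with PR comments, forward on merge
  "Test": ["Done"],
};

function canMove(from: string, to: string): boolean {
  return (transitions[from] ?? []).includes(to);
}

console.log(canMove("Plan: Review", "Agent: Refine")); // true: feedback loops are cheap
&lt;/code&gt;&lt;/pre&gt;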

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agents work from approved plans.&lt;/strong&gt; They don't guess. When they get it wrong, they iterate based on structured feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Humans review before code is written.&lt;/strong&gt; Step 3 (plan review) catches bad approaches early. A 10-minute review saves hours of rework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feedback loops are cheap.&lt;/strong&gt; Sending a ticket back to "Agent: Refine" or "Agent: Implement" takes minutes. Agent re-runs with context. No senior dev escalation needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust builds gradually.&lt;/strong&gt; Start with small tickets. Expand to complex work as the team gains confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rovo didn't cut it.&lt;/strong&gt; Atlassian's AI tooling was unusable for our workflow. Cursor agents + GitHub gave us the control we needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan review is not optional.&lt;/strong&gt; Skipping step 3 always backfires. It's the cheapest gate and the highest ROI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR comments &amp;gt; ticket comments for implementation feedback.&lt;/strong&gt; Devs already write PR comments. The agent reads them natively. No translation layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confluence as the plan artifact is key.&lt;/strong&gt; JIRA description fields are too limited. Confluence gives us version history, inline comments, and space for a real implementation roadmap.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In Part 3, I'll dive into the &lt;strong&gt;Agent: Implement&lt;/strong&gt; stage — how we configure the Cursor agent, the repo structure (rules, skills, agents), and how it generates PRs that don't need heavy rewrites.&lt;/p&gt;

&lt;p&gt;For now, if you're automating dev work with AI: &lt;strong&gt;Don't let agents write code until you've reviewed their plan.&lt;/strong&gt; That one gate will save you more debugging time than any other optimization.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Find me on &lt;a href="https://x.com/MTschimev" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://linkedin.com/in/mitko-tschimev" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>ai</category>
      <category>workflow</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How We Built AI Task Automation That Actually Works</title>
      <dc:creator>Mitko Tschimev</dc:creator>
      <pubDate>Sat, 04 Apr 2026 01:11:15 +0000</pubDate>
      <link>https://dev.to/mitkotschimev/how-we-built-ai-task-automation-that-actually-works-1laf</link>
      <guid>https://dev.to/mitkotschimev/how-we-built-ai-task-automation-that-actually-works-1laf</guid>
      <description>&lt;h2&gt;
  
  
  The Problem with AI Task Automation
&lt;/h2&gt;

&lt;p&gt;AI-powered task automation tools promise seamless integration: understand JIRA tickets, connect to your codebase, ship features faster. For engineering teams, this sounds like the answer to constant context-switching and manual ticket translation.&lt;/p&gt;

&lt;p&gt;In practice? They struggle with nuanced tickets, miss team conventions, and need constant supervision. Tools optimize for demos, not production complexity.&lt;/p&gt;

&lt;p&gt;At 1inch, we built this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JIRA automation&lt;/strong&gt; → &lt;strong&gt;GitHub webhook&lt;/strong&gt; → &lt;strong&gt;GitHub runner&lt;/strong&gt; → &lt;strong&gt;Cursor agent&lt;/strong&gt; (with full repo context)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Flow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;JIRA fires a webhook&lt;/strong&gt; on specific ticket events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub receives it&lt;/strong&gt; and triggers a custom runner&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor agent&lt;/strong&gt; (pre-configured with rules, skills, and context) connects to the repo&lt;/li&gt;
&lt;li&gt;Agent reads the ticket, understands the codebase, and &lt;strong&gt;ships a PR&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
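
&lt;p&gt;A sketch of the handoff in steps 1–2: a JIRA Automation "Send web request" rule can call GitHub's &lt;code&gt;repository_dispatch&lt;/code&gt; endpoint directly. Shown as TypeScript for clarity; the repo and event name are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// dispatch.ts: hedged sketch of the JIRA-to-GitHub handoff; repo/event names are placeholders.
async function dispatch(issueKey: string, summary: string) {
  await fetch("https://api.github.com/repos/acme/platform/dispatches", {
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
    },
    body: JSON.stringify({
      event_type: "jira-ticket-ready",       // must match the workflow's repository_dispatch trigger
      client_payload: { issueKey, summary }, // the ticket fields the agent needs
    }),
  });
}

dispatch("PROJ-123", "Add error handling for the payment webhook timeout case");
&lt;/code&gt;&lt;/pre&gt;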

&lt;p&gt;No manual handoff. No "AI tried but got confused." Just working automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Real Magic Happens
&lt;/h2&gt;

&lt;p&gt;The webhook setup is straightforward. The breakthrough is the &lt;strong&gt;repo design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Cursor and Claude are only as good as the context you provide. Our Cursor agent succeeds because the repo is &lt;strong&gt;designed for AI collaboration&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Cursor Rules (&lt;code&gt;.cursorrules&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;We define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coding standards&lt;/li&gt;
&lt;li&gt;Naming conventions&lt;/li&gt;
&lt;li&gt;Testing requirements&lt;/li&gt;
&lt;li&gt;Architectural patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the agent writes code, it already knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What our API responses look like&lt;/li&gt;
&lt;li&gt;How we structure components&lt;/li&gt;
&lt;li&gt;Commit message format&lt;/li&gt;
&lt;li&gt;Which libraries to use (and avoid)&lt;/li&gt;
&lt;/ul&gt;
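
&lt;p&gt;An illustrative excerpt of what such a rules file can look like (contents invented for illustration, not our actual rules):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# .cursorrules (illustrative excerpt, not our actual rules)
- API responses always use the envelope { data, error, meta }.
- UI components live in shared component libraries; apps never define their own primitives.
- Every new module ships with unit tests; database access also gets an integration test.
- Commit messages: conventional commits with the ticket key, e.g. feat(PROJ-123): short summary.
- Prefer the in-house HTTP client wrapper over calling fetch/axios directly.
&lt;/code&gt;&lt;/pre&gt;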

&lt;h3&gt;
  
  
  2. Skills Directory (&lt;code&gt;skills/&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Domain knowledge documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Common patterns (auth flows, error handling)&lt;/li&gt;
&lt;li&gt;Edge cases we've solved&lt;/li&gt;
&lt;li&gt;Integration quirks (third-party APIs, legacy systems)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent references this before touching code—it's not guessing, it's using institutional knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Agent Context (ADRs + Architecture)
&lt;/h3&gt;

&lt;p&gt;We include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture decision records&lt;/li&gt;
&lt;li&gt;Service boundaries&lt;/li&gt;
&lt;li&gt;Deployment constraints&lt;/li&gt;
&lt;li&gt;Performance considerations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When evaluating a JIRA ticket, the agent understands &lt;strong&gt;why&lt;/strong&gt; our system is shaped the way it is—not just what the code does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;AI tools try to be everything to everyone. They promise "AI that understands your business" but deliver:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shallow codebase context&lt;/li&gt;
&lt;li&gt;Generic responses that miss team conventions&lt;/li&gt;
&lt;li&gt;Product demo polish, not production depth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our approach inverts the problem: &lt;strong&gt;we shaped our repo to work with AI&lt;/strong&gt; instead of waiting for vendors to catch up.&lt;/p&gt;

&lt;p&gt;The result? Cursor agents that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Understand our architecture from day one&lt;/li&gt;
&lt;li&gt;✅ Write code that passes review without major rewrites&lt;/li&gt;
&lt;li&gt;✅ Learn from documented patterns instead of re-inventing solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real Results
&lt;/h2&gt;

&lt;p&gt;Since deploying this system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faster ticket-to-PR cycles:&lt;/strong&gt; Initial PRs ship within minutes, not hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fewer review cycles:&lt;/strong&gt; PRs match our conventions—reviewers focus on logic, not style&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better knowledge capture:&lt;/strong&gt; Writing skills and rules forced us to document tribal knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system isn't perfect. The agent still needs human review. But it shifts work from "write the code" to "review and refine"—a massive productivity gain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;AI tools optimize for demos, not production complexity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The breakthrough is repo design, not webhook plumbing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context (rules + skills + architecture) makes AI useful&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build the glue yourself—don't wait for vendors&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This is Part 1 of a series. Coming up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; The JIRA → GitHub webhook architecture (setup, failures, wins)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; GitHub runner + Cursor agent config (rules, skills, agent setup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4:&lt;/strong&gt; Results, trade-offs, and iterations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Lesson
&lt;/h2&gt;

&lt;p&gt;If you're building AI automation, the lesson is simple: &lt;strong&gt;design your systems to work with AI, not against it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tools optimize for breadth. You need depth. The pieces exist (GitHub, JIRA, Cursor, Claude). The missing part is &lt;strong&gt;context design&lt;/strong&gt;—and that's something only you can build.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you built AI automation for your team? What worked (or didn't)?&lt;/strong&gt; Drop a comment—we'd love to hear what other teams are doing.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why Atlassian Rovo Failed Us (and What We Built Instead)</title>
      <dc:creator>Mitko Tschimev</dc:creator>
      <pubDate>Fri, 03 Apr 2026 15:11:27 +0000</pubDate>
      <link>https://dev.to/mitkotschimev/why-atlassian-rovo-failed-us-and-what-we-built-instead-4anh</link>
      <guid>https://dev.to/mitkotschimev/why-atlassian-rovo-failed-us-and-what-we-built-instead-4anh</guid>
      <description>&lt;h1&gt;
  
  
  Why Atlassian Rovo Failed Us (and What We Built Instead)
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The Problem with AI Task Automation
&lt;/h2&gt;

&lt;p&gt;Atlassian Rovo promised seamless AI-driven task automation: understand JIRA tickets, connect to your codebase, ship features faster. For engineering teams, this sounded like the answer to constant context-switching and manual ticket translation.&lt;/p&gt;

&lt;p&gt;I'm a technical lead at 1inch. We tried it. It didn't work.&lt;/p&gt;

&lt;p&gt;Rovo is polished in demos but struggles in production. It can't handle nuanced tickets, doesn't understand team conventions, and needs constant supervision. For a tool marketed as "AI automation," it felt like another integration to babysit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built Instead
&lt;/h2&gt;

&lt;p&gt;After weeks of frustration, I stopped waiting for enterprise tools and built this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JIRA automation&lt;/strong&gt; → &lt;strong&gt;GitHub webhook&lt;/strong&gt; → &lt;strong&gt;GitHub runner&lt;/strong&gt; → &lt;strong&gt;Cursor agent&lt;/strong&gt; (with full repo context)&lt;/p&gt;

&lt;h3&gt;
  
  
  The Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;JIRA fires a webhook&lt;/strong&gt; when a ticket hits "Ready for Dev" or gets updated with specs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub receives it&lt;/strong&gt; and triggers a custom runner&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor agent&lt;/strong&gt; (pre-configured with rules, skills, and context) connects to the repo&lt;/li&gt;
&lt;li&gt;Agent reads the ticket, understands the codebase, and &lt;strong&gt;ships a PR&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No manual handoff. No "AI tried but got confused." Just working automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Real Magic Happens
&lt;/h2&gt;

&lt;p&gt;The webhook setup is straightforward. The breakthrough is the &lt;strong&gt;repo design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Cursor and Claude are only as good as the context you provide. Rovo fails because it tries to be everything. Our Cursor agent succeeds because the repo is &lt;strong&gt;designed for AI collaboration&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Cursor Rules (&lt;code&gt;.cursorrules&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;We define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coding standards&lt;/li&gt;
&lt;li&gt;Naming conventions&lt;/li&gt;
&lt;li&gt;Testing requirements&lt;/li&gt;
&lt;li&gt;Architectural patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the agent writes code, it already knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What our API responses look like&lt;/li&gt;
&lt;li&gt;How we structure components&lt;/li&gt;
&lt;li&gt;Commit message format&lt;/li&gt;
&lt;li&gt;Which libraries to use (and avoid)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Skills Directory (&lt;code&gt;skills/&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Domain knowledge documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Common patterns (auth flows, error handling)&lt;/li&gt;
&lt;li&gt;Edge cases we've solved&lt;/li&gt;
&lt;li&gt;Integration quirks (third-party APIs, legacy systems)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent references this before touching code—it's not guessing, it's using institutional knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Agent Context (ADRs + Architecture)
&lt;/h3&gt;

&lt;p&gt;We include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture decision records&lt;/li&gt;
&lt;li&gt;Service boundaries&lt;/li&gt;
&lt;li&gt;Deployment constraints&lt;/li&gt;
&lt;li&gt;Performance considerations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When evaluating a JIRA ticket, the agent understands &lt;strong&gt;why&lt;/strong&gt; our system is shaped the way it is—not just what the code does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Beats Enterprise Tools
&lt;/h2&gt;

&lt;p&gt;Rovo and similar tools try to be everything to everyone. They promise "AI that understands your business" but deliver:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shallow codebase context&lt;/li&gt;
&lt;li&gt;Generic responses that miss team conventions&lt;/li&gt;
&lt;li&gt;Product demo polish, not production depth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our approach inverts the problem: &lt;strong&gt;we shaped our repo to work with AI&lt;/strong&gt; instead of waiting for vendors to catch up.&lt;/p&gt;

&lt;p&gt;The result? Cursor agents that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Understand our architecture from day one&lt;/li&gt;
&lt;li&gt;✅ Write code that passes review without major rewrites&lt;/li&gt;
&lt;li&gt;✅ Learn from documented patterns instead of re-inventing solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Enterprise AI tools optimize for demos, not production complexity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The breakthrough is repo design, not webhook plumbing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context (rules + skills + architecture) makes AI useful&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build the glue yourself—don't wait for vendors&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This is Part 1 of a series. Coming up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; The JIRA → GitHub webhook architecture (setup, failures, wins)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; GitHub runner + Cursor agent config (rules, skills, agent setup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4:&lt;/strong&gt; Results, trade-offs, and iterations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building AI automation, the lesson is simple: &lt;strong&gt;design your systems to work with AI, not against it.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you tried Rovo or built your own automation? What worked (or didn't)?&lt;/strong&gt; Drop a comment—I'd love to hear what other teams are doing.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
