ibrohim syarif

Posted on Jun 15

Building an Autonomous Agent Team That Replicates My Engineering Workflow

#ai #claude #agents #agentskills

I've been working closely with agentic AI, and after a lot of iteration, I built a small agent team that can replicate the way I actually work — from reading a task to pushing a reviewable branch.

In this post, I'll walk through the four specialized agents and the one skill that orchestrates them end to end.

The Mental Model

When I pick up a task, I check that it's clear enough, then explore, plan, implement, review, and test.

The agent team mirrors this exactly:

/ship <task or Jira key>
      └─ clarifier     — is the task specific enough?
      └─ planner       — explore codebase, write implementation plan
      └─ implementer   — execute plan task-by-task, commit each chunk
      └─ reviewer      — diff the branch, find blockers and nits
      └─ tester        — go vet, go test -race, golangci-lint

The key insight: each agent has one job and a fixed output contract. There is no free-form chat. Agents emit structured tokens (PLAN_WRITTEN, REVIEW_RESULT, TEST_RESULT) that the orchestrator parses to route the next step. That is cheaper and faster than interpreting prose, and it leaves no room for the model to hallucinate a routing decision.
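Here's a minimal sketch of the orchestrator's side of that contract. It assumes each agent reply starts with a `KEY: value` token line. `ParseToken` and `Route` are hypothetical names used for illustration; they are not part of Claude Code or any agent SDK. The point is that routing reduces to string comparisons, with no second model call.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenRe accepts only all-caps keys such as PLAN_WRITTEN or REVIEW_RESULT.
var tokenRe = regexp.MustCompile(`^[A-Z_]+$`)

// ParseToken splits the first non-empty line of an agent reply into its
// token and value. ok is false when the reply does not start with KEY: value.
func ParseToken(reply string) (token, value string, ok bool) {
	for _, line := range strings.Split(reply, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		key, val, found := strings.Cut(line, ":")
		if !found || !tokenRe.MatchString(key) {
			return "", "", false
		}
		return key, strings.TrimSpace(val), true
	}
	return "", "", false
}

// Route maps an agent reply to the next pipeline step using plain string
// comparisons. Anything unexpected fails closed instead of guessing.
func Route(reply string) string {
	token, value, ok := ParseToken(reply)
	if !ok {
		return "halt: malformed agent output"
	}
	switch {
	case token == "PLAN_WRITTEN":
		return "implement " + value
	case token == "AMBIGUOUS":
		return "ask user: " + value
	case token == "REVIEW_RESULT" && value == "PASS":
		return "test"
	case token == "REVIEW_RESULT" && value == "BLOCKED":
		return "fix blockers"
	case token == "TEST_RESULT" && value == "PASS":
		return "summarize"
	case token == "TEST_RESULT" && value == "FAIL":
		return "halt: tests failed"
	default:
		return "halt: unexpected " + token + ": " + value
	}
}

func main() {
	fmt.Println(Route("PLAN_WRITTEN: /tmp/plans/task-1234.md"))
	fmt.Println(Route("REVIEW_RESULT: BLOCKED\nBLOCKERS:\n- repo.go:15 unchecked error"))
	fmt.Println(Route("Sure! I have written the plan."))
}
```

The third call shows the failure mode that matters: a chatty reply does not get "interpreted". It stops the pipeline.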

Planner

Before writing code, I explore the codebase: I find what already exists, check the dependencies, and spot the blockers. The planner agent does the same.

The planner reads a task description or Jira key, explores the codebase, and outputs a detailed implementation plan: file paths to create or modify, checkbox steps, and exact code changes. A detailed plan leaves nothing for the downstream subagents to guess.

---
name: planner
description: "Planner agent receives a task description and codebase directory, explores relevant files, and writes a detailed implementation plan in writing-plans format."
tools: Read, Glob, Grep, Bash
model: opus
---

# Planner

Read a task, explore the codebase, write a bite-sized implementation plan.

## Input

TASK: <task description or Jira ticket body>
PLAN_PATH: <absolute path to save the plan>
WORKDIR: <repo root>
JIRA_KEY: <optional, e.g. TASK-1234>

## Process

Use absolute paths throughout. Grep for key terms, read the relevant files, and find reuse candidates.

## Plan Format

# <Feature Name> Implementation Plan

**Goal:** <one sentence>
**Architecture:** <2-3 sentences>
**Tech Stack:** <key technologies>

---

Followed by numbered tasks. Each task must have:
- `**Files:**` — exact paths to create/modify/test
- Checkbox steps (`- [ ]`)
- Real code in every code step (no placeholders)
- Exact shell commands with expected output
- TDD order: write failing test → run → implement → run again → commit

Rules:
- No TBD, no TODO, no "similar to above"
- Stage specific files: `git add <file>` (never `git add .`)
- Commit format: `<type>(<scope>): <subject>`

## Output
PLAN_WRITTEN: <PLAN_PATH>

If task is too vague:
AMBIGUOUS: <single question that unblocks planning>
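The plan rules are strict enough to check mechanically before the implementer ever sees the plan. Below is a minimal sketch of such a check. It assumes the plan is plain markdown in the format above and that commits appear as `git commit -m "..."` lines. `LintPlan` is a hypothetical helper, not part of the planner definition.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Phrases the planner rules forbid anywhere in a plan.
var bannedPhrases = []string{"TBD", "TODO", "similar to above"}

var (
	// Matches `git add .` but not `git add .golangci.yml`.
	addAllRe = regexp.MustCompile(`git add \.(\s|$)`)
	// Captures the message of `git commit -m "<msg>"` (assumed commit style).
	commitMsgRe = regexp.MustCompile(`git commit -m "([^"]*)"`)
	// Required commit format: <type>(<scope>): <subject>.
	conventionalRe = regexp.MustCompile(`^[a-z]+\([^)]+\): \S`)
)

// LintPlan returns one message per rule violation found in plan.
func LintPlan(plan string) []string {
	var problems []string
	if !strings.Contains(plan, "**Files:**") {
		problems = append(problems, "no task declares **Files:**")
	}
	for i, line := range strings.Split(plan, "\n") {
		n := i + 1
		for _, p := range bannedPhrases {
			if strings.Contains(line, p) {
				problems = append(problems, fmt.Sprintf("line %d: contains %q", n, p))
			}
		}
		if addAllRe.MatchString(line) {
			problems = append(problems, fmt.Sprintf("line %d: stages everything with git add .", n))
		}
		for _, m := range commitMsgRe.FindAllStringSubmatch(line, -1) {
			if !conventionalRe.MatchString(m[1]) {
				problems = append(problems, fmt.Sprintf("line %d: commit %q is not <type>(<scope>): <subject>", n, m[1]))
			}
		}
	}
	return problems
}

func main() {
	plan := "### Task 1\n**Files:** internal/user/repo.go\n- [ ] Step 1: TODO\n" +
		"git add .\ngit commit -m \"added repo\"\n"
	for _, p := range LintPlan(plan) {
		fmt.Println(p)
	}
}
```

A check like this is cheap to run between the planner and the implementer. A non-empty result can go back to the planner instead of letting a vague plan through.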

Implementer

The implementer reads the plan, executes every task in order on the branch the orchestrator created, and commits each chunk before moving to the next. It has two modes:

Normal mode — follows the plan step by step. Stops immediately on test failure or build error. Never guesses.

Blocker-fix mode — activated when REVIEW_BLOCKERS is passed. Ignores the original plan. Fixes only the listed issues, re-runs tests, commits with fix(review): resolve review blockers.

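This dual mode is what makes the review-retry loop work. The orchestrator can send the same agent back with only the reviewer's blockers, without replaying the whole plan.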

---
name: implementer
description: Implementer agent reads an implementation plan and executes it task-by-task, committing each chunk to the current branch.
tools: Read, Write, Edit, Bash, Glob, Grep
model: sonnet
---

# Implementer

Your job: read an implementation plan and execute every task, committing each chunk.

## Input

You receive a message in this format:

PLAN_PATH: <absolute path to the plan markdown file>
BRANCH: <current branch name>
REVIEW_BLOCKERS: (optional)

## Process

**If `REVIEW_BLOCKERS` is present in the input:**
1. Ignore the plan at PLAN_PATH entirely
2. Fix only the issues listed under REVIEW_BLOCKERS
3. Run tests after fixing: `go vet ./... && go test -race -short -count=1 ./...`
4. If tests fail: stop immediately and report
5. Stage and commit only the fixed files:
   - Commit message: `fix(review): resolve review blockers`

**If `REVIEW_BLOCKERS` is absent (normal mode):**
1. Read the plan at PLAN_PATH
2. Execute tasks in order. For each task:
   - Follow the checkbox steps exactly
   - Run tests after each implementation step
   - If a test fails: stop immediately
   - If a build error occurs: stop immediately
   - Stage and commit specific files after completing the task
3. Count commits made

## Rules

- Commit format: `<type>(<scope>): <subject>`
- If a step says "run test to verify it fails" and it passes — stop and report the discrepancy
- If blocked or confused — stop and report, do not guess

## Output

On success:
DONE: <N> commits on <BRANCH>

On failure:
FAIL: Task <N> "<task name>" — <what went wrong>

Reviewer

The reviewer diffs the branch against the default base branch and classifies every finding:

Blocker — correctness bugs, security issues, data loss risk, nil dereference, breaking API contract.

Nit — naming inconsistency, redundant code, observability gaps, pattern deviation.

Signal bar: findings below ~80% confidence are dropped, which keeps noise out of the review.

Blockers are sent back to the implementer, which fixes them in blocker-fix mode.

---
name: reviewer
description: Reviewer agent diffs a branch against the default base branch and emits structured Blocker/Nit findings. Blockers stop the pipeline.
tools: Read, Bash, Glob, Grep
model: sonnet
---

# Reviewer

Your job: review the diff of a branch against the repo's default base branch. Emit findings. Blockers stop the ship pipeline.

## Input

You receive a message in this format:
BRANCH: <branch name to review>

## Process

1. Detect base branch:
   BASE=$(git remote show origin 2>/dev/null | grep 'HEAD branch' | awk '{print $NF}')
   BASE=${BASE:-main}
2. Get the diff:
   git diff ${BASE}...HEAD
3. List changed files:
   git diff --name-only ${BASE}...HEAD
4. For each changed file, read it in full if needed for context
5. For each changed file, read surrounding code and direct callers for context — one level up only, at most 3 additional files total. Do not recurse further.
6. Identify findings:
   - **Blocker**: correctness bug, security issue (SQL injection, secrets in code, auth bypass), data loss risk, nil/null dereference, off-by-one in critical path, missing error check on I/O, missing timeout/deadline on I/O call, missing idempotency key on mutation/payment op, inconsistent state risk (e.g. DB write succeeds but queue emit can fail with no rollback), breaking API contract (removed/renamed exported symbol, changed Kafka schema, removed HTTP route)
   - **Nit**: naming inconsistency, redundant code, minor style deviation, missing doc comment on exported symbol, observability gap on critical path (missing metric, log correlation ID, or tracing span), pattern deviation (similar integrations in the codebase all have X — this one doesn't)
   - **Signal bar**: only flag when confident. Drop findings below ~80% confidence — a wrong flag costs more than a missed nit

## Output format

Return exactly this structure when no blockers:
REVIEW_RESULT: PASS
BLOCKERS: none
NITS:
- path/to/file.go:42 — unused variable `err` shadowed by inner scope

Or when blockers exist:

REVIEW_RESULT: BLOCKED
BLOCKERS:
- path/to/file.go:15 — error from `rows.Scan` not checked, data silently ignored
NITS:
- path/to/file.go:99 — naming: `getUser` should be `GetUser` (exported)

Rules:
- Only flag real issues. Do not flag style preferences as blockers.
- If diff is empty, return `REVIEW_RESULT: PASS` with `BLOCKERS: none` and `NITS: none`.
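On the orchestrator side, this output format parses into a small struct. Here's a minimal sketch. `Review` and `ParseReview` are hypothetical names. It also assumes one extra safety check that the post doesn't mention: a verdict that disagrees with the blocker list is rejected rather than trusted.

```go
package main

import (
	"fmt"
	"strings"
)

// Review holds the parsed reviewer reply.
type Review struct {
	Blocked  bool
	Blockers []string
	Nits     []string
}

// ParseReview reads the REVIEW_RESULT / BLOCKERS / NITS structure. Items are
// the "- path:line ..." bullets under each header; "none" yields no items.
func ParseReview(reply string) (Review, error) {
	var r Review
	var section *[]string // list that bullets are currently appended to
	seenResult := false
	for _, raw := range strings.Split(reply, "\n") {
		line := strings.TrimSpace(raw)
		key, val, hasColon := strings.Cut(line, ":")
		val = strings.TrimSpace(val)
		switch {
		case hasColon && key == "REVIEW_RESULT":
			if val != "PASS" && val != "BLOCKED" {
				return r, fmt.Errorf("unknown REVIEW_RESULT %q", val)
			}
			r.Blocked = val == "BLOCKED"
			seenResult = true
		case hasColon && key == "BLOCKERS":
			section = &r.Blockers
		case hasColon && key == "NITS":
			section = &r.Nits
		case strings.HasPrefix(line, "- ") && section != nil:
			*section = append(*section, strings.TrimPrefix(line, "- "))
		}
	}
	if !seenResult {
		return r, fmt.Errorf("missing REVIEW_RESULT line")
	}
	// Fail closed if the verdict and the blocker list disagree.
	if r.Blocked != (len(r.Blockers) > 0) {
		return r, fmt.Errorf("REVIEW_RESULT disagrees with BLOCKERS list")
	}
	return r, nil
}

func main() {
	r, err := ParseReview("REVIEW_RESULT: BLOCKED\nBLOCKERS:\n" +
		"- repo.go:15 error from rows.Scan not checked\nNITS:\n" +
		"- repo.go:99 naming: getUser should be GetUser")
	fmt.Println(r.Blocked, len(r.Blockers), len(r.Nits), err)
}
```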

Tester

The last agent is the tester, the final check that the changes don't break the build. It runs go vet, the short test suite with the race detector, and golangci-lint when the repo has a config for it. Since I work mostly in Go, the tester only handles Go.

---
name: tester
description: Tester agent runs go vet, go test -race -short, and golangci-lint (if .golangci.yml present). Returns PASS or FAIL with compact summary.
tools: Bash, Read
model: haiku
---

# Tester

Your job: run the test suite and report a one-line verdict.

## Input

You receive a message in this format:
WORKDIR: <absolute path to repo root>

## Process

First, detect repo type:
find <WORKDIR> -name "*.go" | head -1

If no `.go` files found → return `TEST_RESULT: PASS` with note `No Go files found — skipping Go checks.` and stop.

If `.go` files exist, run these commands in order, stopping on first failure:

1. Go vet:
   go vet ./...
   (run from WORKDIR)

2. Go test:
   go test -race -short -count=1 -timeout 120s ./...
   (run from WORKDIR)
   Note: `-short` makes `testing.Short()` return true, so tests that skip themselves in short mode (typically integration tests) will not run.

3. Lint (only if `.golangci.yml` exists in WORKDIR):
   golangci-lint run
   (run from WORKDIR)

## Output

On full pass:
TEST_RESULT: PASS
All checks passed.

On failure:
TEST_RESULT: FAIL
<step that failed>: <error output>

Error output rules:
- `go vet`: include all output (usually short)
- `go test`: include all lines containing `FAIL`, `panic`, or `Error`, plus the last 40 lines of output
- `golangci-lint`: include the first 30 lines of lint errors

Ship Skill

None of these agents runs on its own. A skill still has to orchestrate them into a workflow:

/ship <task or Jira key>

Pipeline:

0. Clarifier → CLEAR or ask user one question
1. Create branch
2. Planner → PLAN_WRITTEN
3. Implementer (initial)
4. Review-retry loop (max 2 attempts)
   └─ BLOCKED → implementer fixes blockers → reviewer retries
5. Tester
6. Success summary

Design Decisions

- Structured tokens, not prose. Agents emit tokens such as PLAN_WRITTEN:, REVIEW_RESULT:, and TEST_RESULT: instead of free text. Prose forces the orchestrator to run a second LLM call just to extract intent, which adds latency, cost, and a new failure surface for hallucinated routing. Structured tokens let the orchestrator branch on a simple string match: deterministic, zero inference, no misroutes.
- 80% confidence threshold: the most critical quality lever. False positives teach engineers to ignore the reviewer, and high-noise output gets skipped, not fixed.
- One model per job: different agents have different cost/capability tradeoffs. The planner needs deep reasoning (Opus), the reviewer needs precision (Sonnet), and the tester just runs commands (Haiku). The wrong model assignment either burns budget or misses findings.
- Review-retry capped at 2: uncapped loops are a denial-of-wallet attack on your own API credits.

If you've caught yourself running the same "explore → plan → implement → review → test" loop for the tenth time, you don't have to keep doing it by hand. The loop is automatable; you just have to write it down.

DEV Community