DEV Community

mgd43b for AgentEnsemble

Posted on • Originally published at agentensemble.net

Quality Gates on Agent Pipelines: Phase Review and Feedback Injection in Java

Most agent pipelines treat quality as a post-run concern. The pipeline runs, you look at the output, you decide if it's acceptable. If not, you re-run the whole thing or manually patch the result. That approach gets harder to sustain as pipelines grow in complexity and the cost of a bad output increases.

The question worth asking is: where in the pipeline should quality enforcement sit? And when a phase produces inadequate output, how should the feedback get back to the tasks responsible?

PhaseReview answers both questions. It attaches a quality gate to any phase, fires after the phase completes, and based on the reviewer's decision either approves the output, triggers a retry with injected feedback, pushes the work back to a predecessor phase, or rejects the pipeline entirely. The review task is itself a first-class AgentEnsemble task -- AI-backed, handler-based (deterministic), or human-in-the-loop -- and the framework handles feedback injection and retry orchestration automatically.


Attaching a Review to a Phase

The minimal setup attaches a reviewer task to a phase via PhaseReview.of(reviewTask):

Task draftReport = Task.builder()
    .description("draft a summary report of the quarterly results")
    .chatModel(model)
    .build();

Task reviewReport = Task.builder()
    .description("""
        Review the report for completeness and accuracy.
        Output exactly one of:
        APPROVE
        RETRY:<specific feedback for the author>
        REJECT:<reason>
        """)
    .chatModel(model)
    .context(List.of(draftReport))
    .build();

Phase reporting = Phase.builder()
    .name("reporting")
    .tasks(List.of(draftReport))
    .review(PhaseReview.of(reviewReport))
    .build();

Ensemble.builder()
    .phases(List.of(reporting))
    .chatModel(model)
    .build()
    .run();

The review task reads the draft output via .context(List.of(draftReport)). After the phase tasks complete, the framework runs the reviewer, parses its decision, and acts on it: on a RETRY decision, draftReport is re-run with the reviewer's feedback injected. The reviewer never appears in the phase task list -- it runs as a gate after the phase completes.


The Four Decisions

A review task produces one of four outcomes:

Decision                       Behavior
APPROVE                        Phase output is accepted; the pipeline continues
RETRY:<feedback>               The phase tasks re-run with the feedback injected into their prompts
RETRY_PREDECESSOR:<feedback>   The predecessor phase re-runs first, then this phase re-runs
REJECT:<reason>                The pipeline fails with a PhaseReviewRejectionException

The APPROVE/RETRY/REJECT parsing is handled by the framework. RETRY_PREDECESSOR is useful when the inadequate output in the current phase is caused by incomplete or incorrect work in an earlier phase.
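The parser itself is internal to the framework, but its behavior can be sketched in a few lines of plain Java. The ReviewDecision class below is illustrative only, not part of the AgentEnsemble API:

```java
// Illustrative sketch of review-decision parsing -- the framework's
// actual parser is internal; this class is not part of its API.
final class ReviewDecision {
    final String type;      // APPROVE, RETRY, RETRY_PREDECESSOR, or REJECT
    final String feedback;  // empty for APPROVE

    ReviewDecision(String type, String feedback) {
        this.type = type;
        this.feedback = feedback;
    }

    static ReviewDecision parse(String raw) {
        String s = raw.trim();
        // Check RETRY_PREDECESSOR before RETRY: the latter is a prefix of the former.
        for (String t : new String[] {"RETRY_PREDECESSOR", "RETRY", "REJECT"}) {
            if (s.startsWith(t + ":")) {
                return new ReviewDecision(t, s.substring(t.length() + 1).trim());
            }
        }
        if (s.equals("APPROVE")) {
            return new ReviewDecision("APPROVE", "");
        }
        throw new IllegalArgumentException("Unrecognized review decision: " + s);
    }
}
```

The prefix ordering matters: a naive check for RETRY: first would never match RETRY_PREDECESSOR:.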


Feedback Injection

When a retry is triggered, the framework injects reviewer feedback directly into the prompt of the tasks being retried. The injected section looks like this:

[original task instructions here]

## Revision Instructions (Attempt 2)

The report is missing Q3 comparisons and does not address margin compression. Expand the analysis section with specific numbers.

Previous output:
[output from attempt 1]

The task sees the original instructions, the feedback, and its prior output. No changes to the task definition are required. The injection happens entirely in the prompt construction layer.

For attempt 3 and beyond, the section header updates to Attempt 3, and both the latest feedback and the most recent prior output are included.
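The prompt assembly happens inside the framework, but the layout above can be sketched as a simple string builder. RetryPromptBuilder is a hypothetical name used only for illustration:

```java
// Illustrative sketch of the injected-prompt layout -- not the
// framework's actual prompt-construction code.
final class RetryPromptBuilder {
    static String build(String instructions, String feedback,
                        String previousOutput, int attempt) {
        return instructions
                + "\n\n## Revision Instructions (Attempt " + attempt + ")\n\n"
                + feedback
                + "\n\nPrevious output:\n"
                + previousOutput;
    }
}
```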


Reviewer Types

AI Reviewer

The reviewer task is an AI-backed agent that reads the phase output via context() and generates a decision:

Task aiReviewer = Task.builder()
    .description("""
        You are a quality reviewer for financial reports.
        Read the draft report provided in context.
        Check for: completeness, accurate numbers, professional tone.
        Output exactly: APPROVE, RETRY:<specific instructions>, or REJECT:<reason>.
        """)
    .chatModel(model)
    .context(List.of(draftReport))
    .build();

The reviewer's LLM call is separate from the main phase tasks. You can use a different model for the reviewer if appropriate.

Deterministic Reviewer

For rule-based quality checks, a deterministic handler reads the phase output and returns a decision string:

Task qualityCheck = Task.builder()
    .description("programmatic-quality-check")
    .context(List.of(draftReport))
    .handler(ctx -> {
        String output = ctx.contextOutputs().get(0).getRaw();
        if (output.length() < 500) {
            return ToolResult.success("RETRY: output is too short -- expand each section to at least two paragraphs");
        }
        if (!output.contains("Q3") || !output.contains("Q4")) {
            return ToolResult.success("RETRY: report must include both Q3 and Q4 data");
        }
        return ToolResult.success("APPROVE");
    })
    .build();

Deterministic reviewers are useful when the quality criteria are precise and don't require LLM judgment.

Human Reviewer

For steps that need a human sign-off, the human review API blocks until the reviewer responds:

Task humanGate = Task.builder()
    .description("human-review-gate")
    .context(List.of(draftReport))
    .review(Review.required())
    .build();

The human reviewer sees the task output and enters APPROVE, RETRY with feedback, or REJECT in the console (or a custom reviewer UI). This integrates with the same retry loop.


Controlling Retry Limits

By default, PhaseReview.of(reviewTask) allows up to 2 self-retries and 2 predecessor retries. Both are configurable; the two-argument overload sets the self-retry limit:

Phase reporting = Phase.builder()
    .name("reporting")
    .tasks(List.of(draftReport))
    .review(PhaseReview.of(reviewReport, 3))
    .build();

Or with the builder for full control:

Phase reporting = Phase.builder()
    .name("reporting")
    .tasks(List.of(draftReport))
    .review(PhaseReview.builder()
        .task(reviewReport)
        .maxRetries(3)
        .maxPredecessorRetries(2)
        .build())
    .build();

When the retry limit is exhausted, the framework treats the phase as failed and throws. The pipeline does not silently accept a low-quality output when retries are exhausted.
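The framework drives this loop internally; as a plain-Java sketch of the control flow (ReviewLoop and its parameters are hypothetical, not the AgentEnsemble API):

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative sketch of the review/retry control flow -- the real loop
// is internal to the framework; names here are hypothetical.
final class ReviewLoop {
    static String run(Supplier<String> runPhase,
                      Function<String, String> review,
                      int maxRetries) {
        String output = runPhase.get();
        int retries = 0;
        while (true) {
            String decision = review.apply(output);
            if (decision.equals("APPROVE")) {
                return output; // gate passed; the pipeline would continue
            }
            if (decision.startsWith("REJECT:") || retries >= maxRetries) {
                // Exhausted retries fail loudly rather than accepting low quality.
                throw new IllegalStateException("Phase failed: " + decision);
            }
            retries++;
            output = runPhase.get(); // feedback injection omitted in this sketch
        }
    }
}
```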


Predecessor Retry

When the reviewer determines that the current phase's inadequate output is caused by problems in an earlier phase, it can request a predecessor retry:

Task analysisReviewer = Task.builder()
    .description("""
        Review the analysis. If the data from the ingestion phase appears incomplete or incorrect,
        output: RETRY_PREDECESSOR:<what needs to be fixed in data ingestion>
        If the analysis itself is the problem, output: RETRY:<what needs to be revised>
        If acceptable, output: APPROVE
        """)
    .chatModel(model)
    .context(List.of(analysisTask))
    .build();

Phase ingestion = Phase.builder()
    .name("ingestion")
    .tasks(List.of(ingestTask))
    .build();

Phase analysis = Phase.builder()
    .name("analysis")
    .tasks(List.of(analysisTask))
    .after(ingestion)
    .review(PhaseReview.of(analysisReviewer))
    .build();

If the reviewer outputs RETRY_PREDECESSOR:, the framework re-runs the ingestion phase with the feedback injected into ingestion tasks, then re-runs the analysis phase. The review fires again after the second analysis completes.

The predecessor is the phase declared in .after(). Predecessor retry does not cascade further back automatically.
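The sequencing can be sketched in plain Java by recording the order of phase runs. PredecessorRetry is an illustrative stand-in for the framework's orchestrator, not its API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch of predecessor-retry sequencing -- the framework
// orchestrates this internally; this class is not part of its API.
final class PredecessorRetry {
    static List<String> run(Runnable predecessor, Runnable current,
                            Supplier<String> review, int maxPredecessorRetries) {
        List<String> trace = new ArrayList<>();
        predecessor.run(); trace.add("predecessor");
        current.run();     trace.add("current");
        int retries = 0;
        // On RETRY_PREDECESSOR: re-run the predecessor (with feedback
        // injected), then the current phase, then review again.
        while (review.get().startsWith("RETRY_PREDECESSOR:")
                && retries < maxPredecessorRetries) {
            retries++;
            predecessor.run(); trace.add("predecessor");
            current.run();     trace.add("current");
        }
        return trace;
    }
}
```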


Accessing Review Results

After the ensemble completes, review results are available in the phase output:

EnsembleOutput result = ensemble.run();

PhaseOutput reportingOut = result.getPhaseOutputs().get("reporting");
ReviewRecord reviewRecord = reportingOut.reviewRecord();

System.out.println("Attempts: " + reviewRecord.attemptCount());
System.out.println("Decision: " + reviewRecord.finalDecision());

This is useful for logging, auditing, or downstream branching based on whether the output was approved on the first attempt or required iterations.


Tradeoffs

Review tasks add LLM calls to the pipeline. For AI reviewers, each retry cycle adds a reviewer call plus the retried task calls. In cost-sensitive pipelines, deterministic reviewers can enforce the most common criteria without adding model calls.

Feedback injection is prompt-based. Tasks see the feedback as text in their prompt, not as a structured signal. The quality of the retry depends on how well the reviewer communicates what needs to change.

Predecessor retry re-runs the entire predecessor phase, not individual tasks. If the predecessor phase is expensive, predecessor retries can be costly. Design predecessor phases with this in mind if predecessor retry is expected.

The review task reads phase outputs via context() declarations -- the same mechanism any task uses. Context resolution works across retries; the framework rebuilds task identity consistently so the reviewer always reads the most recent attempt's output.


The guide at agentensemble.net/guides/phase-review/ covers the full API including custom reviewer implementations and review event callbacks. The example source is runnable from the repository.

AgentEnsemble is open-source under the MIT license.
