3-Agent Cross-Review: Independent Eyes on Every Deliverable

How independent reviewers catch failures that self-review structurally cannot.


Running in production since early 2025 — before "adversarial review" became standard terminology in the AI engineering community.


The Problem: Single-Reviewer Blindness

An engineer reviewing their own work is structurally incapable of catching certain classes of error. This is not a quality issue with any individual reviewer; it is a property of the review context being contaminated by the authoring context.

When the same person who built the analysis turns around to review it:

  • The reasoning that justified each decision is still in their head, so the decisions look obviously correct
  • Assumptions made during authoring feel like established facts
  • Missing requirements are invisible because they were never considered in the first place
  • Edge cases the author did not think of remain unconsidered

This is the engineering equivalent of a developer reviewing their own code. They might catch typos. They almost never catch architectural misalignment.

The Evidence

Before we enforced cross-review on our own work, the failure pattern was consistent: plans shipped with unstated assumptions; implementations diverged from plans because the same person rationalized the drift; the few self-reviews that did happen caught surface issues but missed structural problems. Independent review compliance was running at 4 percent. With cross-review enforced, that number is now consistently above 80 percent — and the issues caught are qualitatively different.


Three Independent Reviewers

For every plan and every deliverable, we engage three independent reviewers. Each operates in a fresh context, without access to the authoring session's reasoning or rationalizations.

Reviewer Role           | Primary Focus                   | Review Strengths
------------------------|---------------------------------|-------------------------------------------------------------------------
Orchestrator-reviewer   | Task framing and plan coherence | Scope verification, cross-file consistency, convention alignment
Implementation-reviewer | Code and analysis correctness   | Off-by-one errors, test coverage, file-path verification, diff analysis
Adversarial reviewer    | Architecture and completeness   | Missing edge cases, requirement gaps, large-context structural review

The key word is independent. A reviewer operating in the same session as the author is not independent — they share the same context contamination. Cross-review works because each reviewer starts with only the artifact and the specification, not the authoring session's history of dead ends, retries, and rationalizations.
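
To make the independence constraint concrete, here is a minimal Python sketch. Every name in it (ReviewRequest, fresh_context_review, the role strings) is illustrative rather than our internal tooling; the point is what the review request does and does not contain:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ReviewRequest:
        """What a reviewer receives: the artifact and the spec, nothing else."""
        artifact: str       # the plan or deliverable under review
        specification: str  # the requirements it must satisfy

    def fresh_context_review(role: str, request: ReviewRequest) -> str:
        # Stand-in for invoking a reviewer in a brand-new session.
        return f"{role}: reviewed artifact against specification"

    def run_cross_review(artifact: str, specification: str) -> dict:
        # The request carries no authoring-session history: no rationale,
        # no dead ends, no retries. Each reviewer sees only what a client
        # would see.
        request = ReviewRequest(artifact, specification)
        roles = ("orchestrator-reviewer", "implementation-reviewer",
                 "adversarial reviewer")
        return {role: fresh_context_review(role, request) for role in roles}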


The Two-Stage Review Model

Cross-review happens at two distinct stages, not just one. Skipping either stage produces predictable failure modes.

Stage 1: Plan Review

Author creates implementation plan
    |
    v
Implementation-reviewer reviews plan --> finds execution concerns
    |   "This approach won't handle concurrent access"
    v
Adversarial reviewer reviews plan --> finds architectural concerns
    |   "This duplicates logic already in the existing toolchain"
    v
Author updates plan based on both reviews
    |
    v
Client / project lead approves the updated plan

Stage 2: Artifact Review

Approved plan is executed
    |
    v
Orchestrator-reviewer checks against plan
    |   "The plan called for 7 checkpoints; only 5 are present"
    v
Adversarial reviewer checks architecture
    |   "This component should follow the same protocol as its peer"
    v
Issues resolved --> deliverable ships

Both stages are mandatory. Plans receive adversarial review before approval. Code, analysis output, and deliverable artifacts receive adversarial review before completion. There is no path to ship that bypasses either stage.
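
A compact sketch of that control flow, in Python. The helper functions are placeholders for the real steps; what matters is the structure: neither stage can be skipped, and MAJOR findings always loop back through re-review.

    from dataclasses import dataclass, field

    @dataclass
    class Findings:
        major: list = field(default_factory=list)   # blocking issues
        minor: list = field(default_factory=list)   # non-blocking issues

    # Placeholders for the real steps; illustrative only.
    def review_plan(plan, spec):         return Findings()
    def review_artifact(artifact, plan): return Findings()
    def resolve(item, findings):         return item
    def execute(plan):                   return f"artifact built from: {plan}"

    def ship(plan: str, spec: str) -> str:
        # Stage 1: plan review. MAJOR findings force an update and re-review.
        findings = review_plan(plan, spec)
        while findings.major:
            plan = resolve(plan, findings)
            findings = review_plan(plan, spec)
        # (Client / project lead approval happens here.)

        # Stage 2: artifact review against the approved plan.
        artifact = execute(plan)
        findings = review_artifact(artifact, plan)
        while findings.major:
            artifact = resolve(artifact, findings)
            findings = review_artifact(artifact, plan)
        return artifact   # no path to ship bypasses either stage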


Why Different Reviewers Catch Different Things

Different review perspectives have characteristic blind spots and strengths. These are not random — they are consistent enough to be exploited deliberately.

Failure Mode                          | Best Caught By           | Why
--------------------------------------|--------------------------|---------------------------------------------------------
Plan-implementation drift             | Orchestrator-reviewer    | Maintains the plan context, can cross-reference
Off-by-one errors in code or formulas | Implementation-reviewer  | Strong at line-level analysis
Missing edge cases in plan            | Adversarial reviewer     | Large context window, good at "what about..." questions
Scope creep                           | Any independent reviewer | Fresh context = no investment in existing approach
Incorrect file or reference paths     | Implementation-reviewer  | Good at verifying paths exist where claimed
Architectural inconsistency           | Adversarial reviewer     | Holds the entire repo structure in context
Unstated assumptions                  | Any independent reviewer | Author's assumptions are not in the reviewer's context

The probability that all three reviewers miss the same issue is far lower than the probability that any single one misses it. That is the structural reason cross-review works.
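
For intuition, a back-of-the-envelope calculation. The miss rate below is an assumed, illustrative figure, not a measured one:

    # Assumed for illustration: a single reviewer misses a given issue
    # 30 percent of the time. With three independent reviewers, the issue
    # ships only if all three miss it.
    p_single_miss = 0.30
    p_all_miss = p_single_miss ** 3
    print(round(p_all_miss, 3))   # 0.027: roughly an 11x reduction

Real reviewers are never perfectly independent, so the true gain is smaller, but the direction of the effect is what the table above exploits.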


How the Workflow Is Enforced

Cross-review is not voluntary. It is enforced at the strongest level we operate — technical gates that block operations rather than warnings that suggest review.

Gate                 | Trigger                 | What It Enforces
---------------------|-------------------------|------------------------------------------
Plan-execution gate  | Implementation start    | Blocks execution without plan review
PR creation gate     | Pull request opened     | Blocks if no review evidence is present
Pre-push review gate | Push to shared branches | Blocks feature commits lacking review
Ship gate            | Final delivery          | Blocks without test pairing and review

The gates check four evidence sources: review results files, recorded reviews in the planning archive, formal review reports, and review keywords in commit messages. If any source has evidence for the change in question, the gate passes. If none do, the gate blocks — regardless of who initiated the operation or what deadline pressure exists.
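
As a sketch, the gate logic reduces to a single any() over the four evidence sources. The paths, filenames, and commit keyword here are hypothetical stand-ins for the real locations:

    from pathlib import Path

    def has_review_evidence(change_id: str, commit_message: str) -> bool:
        """True if ANY of the four evidence sources confirms a review."""
        sources = (
            Path(f"reviews/{change_id}.results").exists(),      # review results file
            Path(f"planning/{change_id}/review.md").exists(),   # planning-archive record
            Path(f"reports/{change_id}-review.md").exists(),    # formal review report
            "Reviewed-by:" in commit_message,                   # commit-message keyword
        )
        return any(sources)

    def gate(change_id: str, commit_message: str) -> None:
        if not has_review_evidence(change_id, commit_message):
            # Blocks unconditionally: no seniority or deadline override.
            raise SystemExit(f"BLOCKED: no review evidence for {change_id}")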


Review Verdicts

Reviews resolve to one of three explicit verdicts:

Verdict | Meaning                    | Action
--------|----------------------------|-------------------------------
APPROVE | No issues found            | Proceed to ship
MINOR   | Small issues, non-blocking | Fix recommended; not required
MAJOR   | Significant issues found   | Must fix before shipping

MAJOR findings require resolution and re-review. A MAJOR verdict never counts as passing evidence on its own; only resolution followed by a clean re-review closes the gate.
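
In sketch form (illustrative logic, not the production implementation), the verdict rule is that only the latest verdict counts, and it must not be MAJOR:

    from enum import Enum

    class Verdict(Enum):
        APPROVE = "approve"   # no issues: proceed to ship
        MINOR = "minor"       # non-blocking: fix recommended, not required
        MAJOR = "major"       # blocking: must fix, then re-review

    def gate_closes(review_history: list) -> bool:
        # A MAJOR verdict never counts as passing evidence on its own;
        # only a later re-review resolving to APPROVE or MINOR closes the gate.
        if not review_history:
            return False      # no review at all
        return review_history[-1] is not Verdict.MAJOR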


What Cross-Review Actually Catches

Real examples from production engagements:

  1. Configuration mismatches between policy and implementation. A review caught a threshold configured at one value in policy but a different value in the enforcing hook — a silent drift that would have made our own configuration misleading.
  2. Default-mode misalignment. A reviewer noticed that the "advisory" mode was the default for a strict-by-stated-intent enforcement script. The contradiction was caught and corrected before it shipped to clients.
  3. Concurrency failure modes that only appear at scale. Reviewers identified file-locking races, file-handle issues, and policy hooks blocking final writes — failure modes that would only have surfaced under client load, where they would have been far costlier.

None of these were caught by self-review. All were caught by an independent reviewer with no investment in the original approach.


Key Takeaway

Cross-review works because it exploits a structural property: independent reviewers have non-overlapping blind spots. A deliverable reviewed through three different lenses, with three different sets of assumptions, in three fresh contexts, has been examined more thoroughly than any single review can achieve.

The enforcement is what makes it real. Without technical gates, review compliance was 4 percent. With gates, every shippable artifact is examined. The combination of independent reviewers and automatic enforcement creates a quality floor that no single reviewer can produce alone.

Want engineering deliverables with three layers of independent review built into the workflow? Contact ACE Engineering to learn how this pattern applies to your project governance.

Three reviewers. Every deliverable. No exceptions.

Cross-review is not a checklist. It is enforced at every shipping gate so that the work you receive has already been adversarially examined.

Talk to ACE Engineering • Download Capability Summary (PDF, 1 page)