

How you call the second AI, and when to stop it — a headless adversarial review loop

Ascendy Engineering



Source note. This post distills a backend-team intake (docs/intake/from-backend/2026-05-31-headless-adversarial-review-loop.md). Internal identifiers (the in-house harness, agent product names, the CLI name, module paths) are generalized. If “Pairing two AIs made things slower” is the evolution narrative, this post is its “and here’s what actually happened when we ran it” data.

The real question wasn’t “should we have it review”

We run a harness pairing an implementation agent (the one writing code) with a review agent (a different model). When people imagine automating AI code review, they picture “have the second model review it.” But what actually got in the way wasn’t whether to review — it was how you call it, and when you stop.

The old setup ran the reviewer in a separate screen session. A human pasted the prompt, the reviewer spat an answer onto the screen, and we read it back to collect results via a sentinel file. That manual relay was the biggest friction, and reading from a screen is inherently fragile — ghost text, timing, focus incidents.

Drop screen control, go subprocess

Once we confirmed the reviewer CLI supports a headless (non-interactive) mode, we switched review from screen control to a subprocess call.

That single switch made screen-parsing, the separate session, and the sentinel relay vanish entirely. And as a bonus, a boundary clarified — a collaborator (a live agent in another repo, accumulating context) and a review function (a stateless call that examines the same work) are different things. The former should be a live session; the latter you call like a function and discard. Only the latter belongs in a headless call.
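The "call it like a function and discard" idea can be sketched as a thin wrapper around a subprocess call. This is a minimal sketch under assumptions, not the harness's actual code: the command is passed in by the caller, and something like `["reviewer-cli", "--headless"]` is a placeholder name and flag, not a real CLI. It also assumes the reviewer reads its prompt from stdin and prints its review to stdout.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ReviewResult:
    ok: bool      # True if the process exited with status 0
    output: str   # whatever the reviewer printed to stdout
    error: str    # stderr, kept for debugging failed calls


def run_review(command: list[str], prompt: str, timeout_s: float = 600.0) -> ReviewResult:
    """Call the reviewer once and keep no session state between calls.

    `command` is an assumption: pass whatever non-interactive invocation
    your reviewer CLI actually supports.
    """
    try:
        proc = subprocess.run(
            command,
            input=prompt,          # prompt goes in on stdin
            capture_output=True,   # no screen to parse: stdout/stderr are captured
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return ReviewResult(ok=False, output="", error="timed out")
    return ReviewResult(ok=proc.returncode == 0, output=proc.stdout, error=proc.stderr)
```

Because the result comes back as a return value, there is nothing to read off a screen and no sentinel file to poll. The timeout value is arbitrary and only there so a hung reviewer cannot block the harness.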

Adversarial review converges — but only with a stop line

We reviewed this round’s changes with that loop. The result of running one design doc through four rounds was striking.

Round   CHANGES_REQUESTED items
1       9
2       7
3       3
4       0 (APPROVED)

Both item count and token usage decreased monotonically. The key was round 3. The reviewer explicitly accepted the human-drawn boundary — “this is a design doc; field-level spec is decided in the implementation PR” — and stopped re-demanding deeper spec.

That’s the key to convergence. Left alone, adversarial review demands ever-deeper detail — it converges asymptotically but never actually ends. Let a human draw the stop line (“ideation is direction, the implementation PR is spec”) and have the reviewer judge that line’s legitimacy, and it ended cleanly in four rounds. You don’t command the reviewer to “stop” — you ask “this is where this document’s responsibility ends; do you agree?”
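One way to picture the loop with a human-drawn stop line is the sketch below. It is illustrative only: `Verdict`, `build_prompt`, `review_until_converged`, and the `max_rounds` cap are hypothetical names introduced here, and the reviewer and the revision step are passed in as plain callables so that any backend can be plugged in. The point it tries to show is that the stop line travels inside the prompt as a proposal the reviewer is asked to judge, not as an order to stop.

```python
from dataclasses import dataclass
from typing import Callable

APPROVED = "APPROVED"
CHANGES_REQUESTED = "CHANGES_REQUESTED"


@dataclass
class Verdict:
    status: str        # APPROVED or CHANGES_REQUESTED
    items: list[str]   # open findings; empty when approved


def build_prompt(document: str, stop_line: str) -> str:
    # The stop line is posed as a question the reviewer may reject,
    # not as an instruction to stop reviewing.
    return (
        "Review the document below.\n"
        f"Proposed scope boundary: {stop_line}\n"
        "First judge whether this boundary is legitimate. If it is, do not "
        "request detail that falls outside it.\n\n"
        f"{document}"
    )


def review_until_converged(
    document: str,
    stop_line: str,
    review: Callable[[str], Verdict],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 6,
) -> tuple[str, list[int]]:
    """Return the final status and the number of open items per round."""
    counts: list[int] = []
    for _ in range(max_rounds):
        verdict = review(build_prompt(document, stop_line))
        counts.append(len(verdict.items))
        if verdict.status == APPROVED:
            return APPROVED, counts
        document = revise(document, verdict.items)
    # Hard cap as a backstop (an assumption of this sketch) in case
    # the reviewer never approves.
    return CHANGES_REQUESTED, counts
```

Tracking the per-round item counts makes it easy to see whether a run is actually converging, as in the four-round run described above, or merely hitting the cap.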

And it caught real defects

One PR (a new API endpoint) went up without a review gate, “because it’s simple.” The headless review caught its problems step by step, including a layering violation and secrets in logs.

The other PR (a refactor) surfaced something more interesting. A “no hardcoding” guard test I’d added had a hole — it only checked double quotes (single quotes passed) — and the reviewer reproduced it directly, in code. So we rewrote the guard on an AST basis. That’s the value of proving a finding rather than just stating it.
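The difference between a quote-sensitive text check and an AST-based one can be shown in a few lines. This is a reconstruction for illustration, not the original guard test: `FORBIDDEN` is a made-up value, and `quote_only_guard` only mimics the shape of the hole described above. The AST version works because the parser produces the same string constant whichever quote style the source used.

```python
import ast
import re

# Hypothetical forbidden value, used only for illustration.
FORBIDDEN = "https://api.internal.example"


def quote_only_guard(source: str) -> bool:
    """Flawed shape: returns True (passes) unless a double-quoted literal matches."""
    return re.search('"' + re.escape(FORBIDDEN) + '"', source) is None


def ast_guard(source: str) -> list[int]:
    """Return line numbers of string literals equal to FORBIDDEN, whatever the quoting."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value == FORBIDDEN:
                hits.append(node.lineno)
    return sorted(hits)


single_quoted = "URL = 'https://api.internal.example'\n"
double_quoted = 'URL = "https://api.internal.example"\n'

# The text check misses the single-quoted literal; the AST check flags both.
print(quote_only_guard(single_quoted), ast_guard(single_quoted))
print(quote_only_guard(double_quoted), ast_guard(double_quoted))
```

A reviewer that hands back a failing input like `single_quoted` proves the hole exists, which is a much stronger finding than a remark that the check "might miss some cases."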

What these defects share: the more “simple-looking” a change, the bigger the cost of skipping the review gate — and layering violations, secrets in logs, and quote-only static checks are all the kind human review easily misses.



Authorship & citation: Written by Ascendy Engineering; quotable with attribution. Found something wrong? Let us know via a GitHub issue.


Tags: ai-agents, code-review, automation, dogfooding, developer-tooling, convergence