We paired two AI models and got slower — so we routed work by tier

Ascendy Engineering


TL;DR

Why pair two models in the first place

It started from a simple hypothesis: two models cross-checking each other beat one model alone. The reasoning is just as simple. Each vendor keeps its training direction and methodology private, so models end up with different strength/weakness distributions — what one misses, the other catches. I’d already felt this running Gemini and GPT side by side.

I was skeptical at first. I started as a believer in a single coding agent (Claude Code), and the project itself had moved across vendors (GPT → Gemini → Claude). People kept suggesting I pair in another agent (Codex), but I doubted it. Wasn’t one model enough?

It was, until I actually tried.

The harness, in three stages

v1 — tdd-batch: top quality, worst speed

The first shape split the two models along tests. One model wrote the plan and the test code, the other wrote the implementation that made those tests pass. With tests as both the spec and the point of agreement, code quality was very high.

The problem was speed. Even a few-line fix dragged into a long ping-pong between the two models. The write-test → implement → fail → rewrite cycle ran in full for trivial changes too. Pushing quality to the maximum drove speed to the floor.
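To make the cost concrete, here is a minimal sketch of that cycle. It is not the actual tdd-batch harness: the three callables (`write_tests`, `implement`, `run_tests`) are hypothetical stand-ins for the two models and the test runner, and `max_attempts` is an assumed cap.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: in a real harness these would call the two models
# and a test runner. Here they are plain callables.
TestWriter = Callable[[str], str]                     # task -> test code
Implementer = Callable[[str, str, str], str]          # (task, tests, feedback) -> implementation
TestRunner = Callable[[str, str], tuple[bool, str]]   # (tests, impl) -> (passed, failure output)


@dataclass
class BatchResult:
    implementation: str
    passed: bool
    round_trips: int  # model hand-offs spent on this task


def tdd_batch(task: str, write_tests: TestWriter, implement: Implementer,
              run_tests: TestRunner, max_attempts: int = 5) -> BatchResult:
    """Model A writes the tests; model B retries until they pass."""
    tests = write_tests(task)
    round_trips = 1  # the test-writing hand-off
    feedback = ""
    impl = ""
    for _ in range(max_attempts):
        impl = implement(task, tests, feedback)
        round_trips += 1
        passed, feedback = run_tests(tests, impl)
        if passed:
            return BatchResult(impl, True, round_trips)
    return BatchResult(impl, False, round_trips)
```

Even the best case costs two hand-offs (tests, then one implementation), and every failed run adds another, which is why a few-line fix pays the same fixed overhead as a large change.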

v2 — pair-agent: fast, but a thin margin

So I dropped the test requirement. One model writes code, the other reviews. The ping-pong shortens, so it’s fast. But the safety margin v1 gave — the regression defense that tests enforced — got thinner.
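A rough sketch of the v2 shape, under the same caveat: `write_code` and `review` are hypothetical callables standing in for the two models, and the `max_rounds` cap is an assumption, not a documented setting.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    approved: bool
    comments: list[str] = field(default_factory=list)


# Hypothetical stand-ins for the two models.
Coder = Callable[[str, list[str]], str]   # (task, review comments) -> code
Reviewer = Callable[[str, str], Review]   # (task, code) -> verdict


def pair_agent(task: str, write_code: Coder, review: Reviewer,
               max_rounds: int = 2) -> tuple[str, bool, int]:
    """One model writes, the other reviews. Returns (code, approved, rounds)."""
    comments: list[str] = []
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = write_code(task, comments)
        verdict = review(task, code)
        if verdict.approved:
            return code, True, round_no
        comments = verdict.comments
    return code, False, max_rounds
```

Note what is missing compared with v1: nothing here executes a test. The only gate is the reviewer's verdict, which is where the thinner safety margin comes from.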

Putting the two side by side made it obvious: v1 leans on quality, v2 on speed. Both are right, but neither is right for every task.

v3 — tier routing: the task carries its own tier

The premise of the third shape: speed and quality can’t both be maximized in a single process. So rather than forcing every task through one process, you split tasks by their nature.

When a task arrives, it’s classified into a tier first, then routed to the matching process. Message passing between sessions is handled by an inter-session agent communication tool — without it, a multi-agent workflow eventually breaks down into the operator copy-pasting between windows by hand.
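The classify-then-route step could look roughly like the sketch below. The four signals come from the classification criteria used in this post (security, business logic, difficulty, architecture impact); the field names, the 1 to 5 difficulty scale, and the thresholds are illustrative assumptions, not the production rules.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"


@dataclass(frozen=True)
class TaskSignals:
    touches_security: bool
    touches_business_logic: bool
    difficulty: int            # assumed scale: 1 (trivial) to 5 (hard)
    architecture_impact: bool


def classify(signals: TaskSignals) -> Tier:
    """Assumed rules: any high-risk signal escalates straight to HIGH."""
    if (signals.touches_security or signals.architecture_impact
            or signals.difficulty >= 4):
        return Tier.HIGH
    if signals.touches_business_logic or signals.difficulty >= 2:
        return Tier.MID
    return Tier.LOW


def route(signals: TaskSignals) -> str:
    """Low and Mid go to the pair process; High goes to the committee."""
    return "committee" if classify(signals) is Tier.HIGH else "pair"
```

For example, `route(TaskSignals(False, True, 2, False))` returns `"pair"`, while setting `touches_security=True` on the same task sends it to `"committee"`.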

Figure: tier-routing flow. An incoming task or issue goes through tier classification (security · business logic · difficulty · architecture impact); Low · Mid tasks go to the pair process (code → review, speed first, single PR cycle), and High tasks go to the Agent Committee (pre-review of the plan by multiple models, implementation along the agreed direction, decision log and review trail visible to everyone).

Mermaid source
flowchart TD
  T["Incoming task / issue"] --> C{"Tier classification<br/>(security · business logic · difficulty · architecture impact)"}
  C -->|"Low · Mid"| P["pair: code → review<br/>speed first, single PR cycle"]
  C -->|"High"| K["Agent Committee"]
  K --> K1["Pre: multiple models adversarially review the plan"]
  K --> K2["Implement along the agreed direction"]
  K --> K3["Everyone observes: decision log + review trail"]
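The High branch can be sketched as a small loop: reviewers attack the plan, the plan is revised until no objections remain, and every round is written to a decision log. All names here (`run_committee`, `revise`, `implement`, the reviewer callables) and the `max_rounds` cap are hypothetical; the real committee runs across model sessions rather than in-process functions.

```python
from typing import Callable, Optional

# Hypothetical reviewer: takes a plan, returns objections (empty list = none).
Reviewer = Callable[[str], list[str]]


def run_committee(plan: str,
                  reviewers: dict[str, Reviewer],
                  revise: Callable[[str, dict[str, list[str]]], str],
                  implement: Callable[[str], str],
                  max_rounds: int = 3) -> tuple[Optional[str], list[str]]:
    """Adversarial plan review, then implementation. Returns (result, decision log)."""
    log: list[str] = []
    for round_no in range(1, max_rounds + 1):
        # Pre: every model reviews the current plan.
        objections = {name: check(plan) for name, check in reviewers.items()}
        open_items = {name: items for name, items in objections.items() if items}
        total = sum(len(items) for items in open_items.values())
        log.append(f"round {round_no}: {total} objection(s)")
        for name, items in open_items.items():
            for item in items:
                log.append(f"  {name}: {item}")
        if not open_items:
            # Implement along the agreed direction.
            log.append("plan agreed")
            return implement(plan), log
        plan = revise(plan, open_items)
    log.append("no agreement; escalate to the operator")
    return None, log
```

The returned log is the "everyone observes" part: each objection is recorded against the model that raised it, so the review trail survives after the sessions end.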

The operating-room analogy

The best single picture for this structure is an operating room.

An appendectomy is handled by one skilled surgeon. It’s standardized and low-risk — low tier, a pair is enough. A complex multidisciplinary surgery is different. Specialists from each domain hold an intense planning meeting beforehand, and the operation runs with everyone able to observe — high tier, a committee.

Both are legitimate. Convening the whole team for an appendectomy is waste; handing a multidisciplinary surgery to one person is risk. That’s why the triage — deciding what goes where — is the heart of the system.

The three harnesses, side by side

Harness | Quality | Speed | Overhead | Best for
tdd-batch | ★★★★★★★★★★ | Core libraries / security-critical
pair-agent | ★★★★★★★★★ | General features / bug fixes
Tier routing | per-tier | per-tier | ★★★ | All cases (auto-routed)

(Stars are a qualitative read.)

Decisions / tradeoffs

Part of the inspiration came from outside. Andrej Karpathy’s publicly shared CLAUDE.md was one early starting point, and a harness concept shared by a small YouTube creator nudged the direction too — cases where a public post and video fed straight into a real workflow.

What’s next


Authorship & citation: This post was written by Ascendy Engineering and may be re-cited with attribution. If you find an error, please let us know via a GitHub issue.


Tags: llm-agents, pair-programming, claude-code, codex, agent-os, dogfooding, developer-workflow