Coding Agents in Practice (4/5) — Multi-Agent Patterns: Orchestrator and Specialist Separation
When a single LLM session does everything, the context breaks. Role separation fixes it.
Key takeaways
- Multi-agent is not "let's use more models"; it's "let's separate context"
- Primary sources: Anthropic Claude Code subagents docs; OpenClaw / Hermes operational experience
- Three core patterns: orchestrator + specialists / pipeline / collaboration
- Cost is not a simple multiplication — context savings and reduced retries often make multi-agent cheaper overall
- Common pitfalls: context contamination, permission leakage, debugging difficulty, over-engineering
1. Why one agent doing everything fails
When a single LLM session handles design + implementation + testing + review + docs, three failure modes appear:
- Context contamination: details from one task pollute another. Debug-time scratch code becomes a PR-review baseline.
- Context limit overflow: even 200K tokens fill quickly under heavy load. Auto-compaction then drops the reasoning behind earlier decisions.
- Permission asymmetry: the same session has publish + credential access + code change authority. One mistake has a wide blast radius.
Solution: separate agents by task type and put one orchestrator on top.
2. Orchestrator + specialists pattern
The most common topology.
       [Orchestrator = main]
                 │
    ┌────────────┼────────────┬────────────┐
    ▼            ▼            ▼            ▼
  [Edit]       [Test]     [Review]    [Document]
 executor      tester     reviewer       writer
Orchestrator's role:
- Receive a request and decide which specialist to call.
- Absorb specialist results in summary form (not full output).
- Decide the next step.
Specialist's role:
- Work deeply in their specialty.
- Return a condensed report, protecting the orchestrator's context.
Core principle: orchestrator sees the whole picture, specialists see depth. Two kinds of context, neither overloaded.
3. Three topologies
3.1 Supervisor (hub-and-spoke)
- One orchestrator + N specialists, as above.
- Simple, predictable, easy to debug.
- Limit: when specialists need to talk directly, every hop goes through the orchestrator.
3.2 Pipeline (serial)
- Specialist A → B → C, output of one becomes input of the next.
- Strong fit for clear-stage work — data transform, validation, publish.
- Limit: a failure at one stage halts everything.
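A serial pipeline and its failure mode can be sketched as follows. The stage functions (`transform`, `validate`, `publish`) and the `StageError` type are hypothetical placeholders for specialist calls; the sketch only shows that each output feeds the next stage and that one failure stops everything after it.

```python
from typing import Callable

Stage = Callable[[str], str]


class StageError(RuntimeError):
    """Raised when a stage fails; records which stage halted the pipeline."""

    def __init__(self, stage_name: str, cause: Exception):
        super().__init__(f"stage '{stage_name}' failed: {cause}")
        self.stage_name = stage_name


def run_pipeline(stages: list[tuple[str, Stage]], payload: str) -> str:
    """Feed each stage's output into the next; stop at the first failure."""
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            raise StageError(name, exc) from exc
    return payload


# Hypothetical stages standing in for specialist agents.
def transform(text: str) -> str:
    return text.strip().lower()


def validate(text: str) -> str:
    if not text:
        raise ValueError("empty payload")
    return text


def publish(text: str) -> str:
    return f"published:{text}"


stages = [("transform", transform), ("validate", validate), ("publish", publish)]
result = run_pipeline(stages, "  Hello  ")
```

Naming the failed stage in the exception is a small design choice that keeps the pipeline easy to trace when it does halt.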
3.3 Collaboration (mesh / consensus)
- Multiple specialists see the same input in parallel and produce answers; results are merged or voted.
- Good for code review where different lenses (security / performance / readability) matter.
- Most expensive: same tokens processed N times.
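As a rough sketch of the collaboration pattern, the snippet below runs three reviewers over the same diff in parallel and merges their verdicts by majority vote. The reviewers are hypothetical string checks standing in for model calls, and majority voting is just one possible merge rule (a veto rule for security findings would be another). It assumes at least one reviewer is registered.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[str], str]  # returns "approve" or "reject"


def collaborate(reviewers: dict[str, Reviewer], diff: str) -> tuple[str, dict[str, str]]:
    """Show the same diff to every reviewer in parallel, then take a majority vote."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in reviewers.items()}
        verdicts = {name: fut.result() for name, fut in futures.items()}
    tally = Counter(verdicts.values())
    decision = "approve" if tally["approve"] > len(verdicts) / 2 else "reject"
    return decision, verdicts


# Hypothetical lenses; each one reads the full diff, which is why cost scales with N.
reviewers = {
    "security": lambda d: "reject" if "eval(" in d else "approve",
    "performance": lambda d: "reject" if "time.sleep(" in d else "approve",
    "readability": lambda d: "reject" if any(len(l) > 100 for l in d.splitlines()) else "approve",
}

risky_decision, risky_verdicts = collaborate(reviewers, "value = eval(user_input)\ntime.sleep(5)\n")
clean_decision, clean_verdicts = collaborate(reviewers, "value = int(user_input)\n")
```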
Selection rule: clear stages → pipeline; clear division of labor → supervisor; different lenses → collaboration.
4. Claude Code's subagents — Real implementation
Claude Code invokes subagents through the Agent tool: subagent_type selects the specialist, and the prompt must be self-contained.
Prompt-writing principles (where most teams stumble):
- The subagent does not know the main agent's context. Put what / why / what's been tried all into the prompt.
- Specify the output shape: "report in 200 words," "JSON result," "return only the URL," etc.
- Don't delegate understanding: instead of "analyze and decide," say "in file X line Y, change Z to W." The main agent must understand first, then delegate.
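One way to make these principles hard to skip is to build the delegation prompt from required fields. The `Delegation` class below is a hypothetical helper, not part of Claude Code; the file name, line number, and task in the usage are invented for illustration. It only assembles the text that would be passed as the subagent's prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Delegation:
    """Everything a context-free subagent needs, gathered before delegating."""
    what: str           # the concrete change or question
    why: str            # the reason, so the subagent can judge edge cases
    output_shape: str   # the exact form of the report expected back
    tried: list[str] = field(default_factory=list)  # approaches already ruled out

    def to_prompt(self) -> str:
        """Render the fields as one self-contained prompt string."""
        tried = "\n".join(f"- {item}" for item in self.tried) or "- nothing yet"
        return (
            f"Task: {self.what}\n"
            f"Why it matters: {self.why}\n"
            f"Already tried:\n{tried}\n"
            f"Output format: {self.output_shape}"
        )


# Hypothetical example; the path and values are made up.
prompt = Delegation(
    what="In retry.py line 42, change the backoff base from 2 to 1.5.",
    why="Retries currently exceed the 30 second request budget.",
    output_shape="Report in under 200 words: the diff applied and the test result.",
    tried=["Lowering max_retries to 3 (still over budget)"],
).to_prompt()
```

Because `what`, `why`, and `output_shape` have no defaults, forgetting one fails at construction time instead of producing a vague prompt.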
Parallel vs serial:
- Independent investigations can fan out in parallel.
- Dependent work runs serially.
- Parallel is faster but the total token cost is roughly the same.
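The difference can be sketched with `asyncio`. Here `run_subagent` is a hypothetical stand-in that sleeps instead of calling a model, and the token counts are invented. The sketch shows that fanning out changes wall-clock time, not the sum of tokens spent.

```python
import asyncio


async def run_subagent(name: str, prompt: str, tokens: int) -> dict:
    """Hypothetical stand-in for a subagent call; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return {"agent": name, "summary": f"{name} finished: {prompt}", "tokens": tokens}


async def fan_out(jobs: list[tuple[str, str, int]]) -> list[dict]:
    """Independent jobs: start them all at once and wait for every result."""
    return await asyncio.gather(*(run_subagent(*job) for job in jobs))


async def run_serial(jobs: list[tuple[str, str, int]]) -> list[dict]:
    """Dependent jobs: each prompt is extended with a note about the previous step."""
    results: list[dict] = []
    previous = ""
    for name, prompt, tokens in jobs:
        result = await run_subagent(name, f"{prompt} {previous}".strip(), tokens)
        previous = f"(previous step: {result['agent']})"
        results.append(result)
    return results


# Invented jobs and token counts, for illustration only.
jobs = [
    ("search-usages", "find callers of parse()", 1200),
    ("search-tests", "find tests covering parse()", 900),
    ("search-docs", "find docs mentioning parse()", 700),
]
parallel_results = asyncio.run(fan_out(jobs))
serial_results = asyncio.run(run_serial(jobs))
```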
5. Cost — Not a simple multiplication
"Multi-agent = N× cost" is half-true.
What costs more:
- Each subagent re-loads its own context (CLAUDE.md, rules, tool defs).
- Orchestrator-specialist message round-trips themselves cost tokens.
What costs less:
- The main context stays light, so less compaction, and compaction triggers re-asks.
- Specialists can run on smaller models (e.g., Haiku) when the work allows.
- Fewer retries: when the main is contaminated, decision quality drops and whole tasks restart.
Heuristic: simple 1–2 step tasks → single agent, almost always cheaper. 5+ step or mixed-specialty work → multi-agent often wins on total cost.
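To see why the crossover exists, here is a deliberately crude toy model. Every number in it (tokens per step, bootstrap cost, summary size, retry rates) is an invented assumption, not a measurement. It assumes a single agent re-reads its whole growing context at each step, while in the multi-agent case each specialist pays a fixed bootstrap cost and the orchestrator accumulates only summaries.

```python
def single_agent_tokens(steps: int, tokens_per_step: int, retry_rate: float) -> int:
    """One shared context: every step re-reads everything accumulated so far."""
    total = 0
    context = 0
    for _ in range(steps):
        context += tokens_per_step
        total += context
    return round(total * (1 + retry_rate))


def multi_agent_tokens(
    steps: int,
    tokens_per_step: int,
    bootstrap: int,
    summary_tokens: int,
    retry_rate: float,
) -> int:
    """Fresh specialist per step, plus an orchestrator that keeps only summaries."""
    total = 0
    main_context = 0
    for _ in range(steps):
        total += bootstrap + tokens_per_step  # specialist reloads rules, then works
        main_context += summary_tokens        # orchestrator stores the summary only
        total += main_context                 # orchestrator re-reads its light context
    return round(total * (1 + retry_rate))


# All parameters below are assumptions chosen only to illustrate the shape of the curve.
ASSUMED = dict(tokens_per_step=4000, bootstrap=6000, summary_tokens=300)

short_single = single_agent_tokens(2, ASSUMED["tokens_per_step"], retry_rate=0.3)
short_multi = multi_agent_tokens(2, **ASSUMED, retry_rate=0.1)
long_single = single_agent_tokens(6, ASSUMED["tokens_per_step"], retry_rate=0.3)
long_multi = multi_agent_tokens(6, **ASSUMED, retry_rate=0.1)
```

Under these assumptions the single agent is cheaper for the 2-step task and the multi-agent setup is cheaper for the 6-step task, because the bootstrap cost grows linearly with steps while the shared context grows quadratically. Real crossover points depend on caching, model choice, and actual retry behavior.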
6. Four common pitfalls
6.1 Cross-contamination
- The main agent ingests a specialist's full output and stores it in memory → contamination.
- Fix: specialists return summaries only; the main agent stores and reasons over only the summary.
6.2 Permission leakage
- Every agent gets every tool → wide blast radius from one mistake.
- Fix: per-agent tool whitelists. Only a designated agent can publish.
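A per-agent whitelist can be as simple as a lookup that runs before every tool call. The agent names, tool names, and `call_tool` gate below are hypothetical and only sketch the idea; a real harness would enforce this in its own configuration layer.

```python
from typing import Callable

# Hypothetical agents and tools; only "publisher" may publish.
TOOL_WHITELIST: dict[str, frozenset[str]] = {
    "executor": frozenset({"read_file", "edit_file"}),
    "tester": frozenset({"read_file", "run_tests"}),
    "reviewer": frozenset({"read_file"}),
    "publisher": frozenset({"read_file", "publish"}),
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent asks for a tool outside its whitelist."""


def call_tool(agent: str, tool: str, registry: dict[str, Callable], *args):
    """Check the whitelist first; unknown agents get no tools at all."""
    allowed = TOOL_WHITELIST.get(agent, frozenset())
    if tool not in allowed:
        raise ToolNotAllowed(f"{agent} may not call {tool}")
    return registry[tool](*args)


# Stand-in tool implementations for the example.
registry = {
    "read_file": lambda path: f"contents of {path}",
    "publish": lambda version: f"published {version}",
}

published = call_tool("publisher", "publish", registry, "1.2.0")
```

Defaulting unknown agents to an empty set means a misconfigured agent fails closed rather than inheriting every tool.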
6.3 Debugging difficulty
- "Which agent went wrong, and where?" is hard to trace.
- Fix: every agent call logs a correlation ID + result summary. At session end, archive logs separately.
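A minimal sketch of that logging fix, using only the standard library: each call gets a fresh correlation ID, the result summary (or the error) is recorded, and the log is written out as JSON Lines at session end. `traced_call`, `archive`, and the record fields are hypothetical names chosen for this example.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Callable


def traced_call(
    log: list[dict], session_id: str, agent: str, task: str, run: Callable[[str], str]
) -> str:
    """Run one agent call and append a record with a correlation ID either way."""
    record = {
        "correlation_id": uuid.uuid4().hex,
        "session_id": session_id,
        "agent": agent,
        "task": task,
        "started_at": time.time(),
    }
    try:
        summary = run(task)
        record.update(status="ok", summary=summary)
        return summary
    except Exception as exc:
        record.update(status="error", summary=repr(exc))
        raise
    finally:
        log.append(record)  # logged on success and on failure


def archive(log: list[dict], path: Path) -> Path:
    """Write the session log as JSON Lines, one record per agent call."""
    with path.open("w", encoding="utf-8") as handle:
        for record in log:
            handle.write(json.dumps(record) + "\n")
    return path


# Example session with a stand-in tester; the archive goes to the temp directory.
log: list[dict] = []
session_id = uuid.uuid4().hex
traced_call(log, session_id, "tester", "run unit tests", lambda task: "42 passed, 0 failed")
archive_path = archive(log, Path(tempfile.gettempdir()) / f"agent-log-{session_id}.jsonl")
```

Passing the correlation ID into the subagent's prompt as well would let its own logs be joined with the orchestrator's, but that is left out here.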
6.4 Over-engineering
- Inserting multi-agent into trivial work.
- Fix: a hard rule like "3+ step tasks enter multi-agent flow." Below that, single agent + self-check.
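Such a hard rule is easy to encode so that nobody has to argue about it per task. The function name and return labels below are hypothetical; the threshold of 3 comes from the example rule above and would be tuned per team.

```python
MULTI_AGENT_MIN_STEPS = 3  # threshold taken from the example rule; adjust per team


def choose_flow(step_count: int) -> str:
    """Route a task by its estimated number of steps."""
    if step_count < 1:
        raise ValueError("step_count must be at least 1")
    if step_count >= MULTI_AGENT_MIN_STEPS:
        return "multi-agent"
    return "single-agent + self-check"


flows = {steps: choose_flow(steps) for steps in (1, 2, 3, 5)}
```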
7. At a glance
| Pattern | Best for | Strength | Weakness |
|---|---|---|---|
| Orchestrator + specialists (supervisor) | Clear division of labor | Simple, easy to debug | Specialists can't talk directly |
| Pipeline | Clear stages | Easy to trace | One failed stage halts the whole run |
| Collaboration (mesh / consensus) | Different lenses | Diversity | Most expensive |
| Single agent | ≤2 step tasks | Cheapest | Context contamination risk |
Design rule: task types diverge → agents diverge. Otherwise, single agent.
Next up
Part 5/5: Coding Agent Cost Management — Tokens, Caching, Routing. If multi-agent splits structure, the next post splits cost.
References
- Anthropic, Subagents in Claude Code — code.claude.com/docs/agents (verified 2026-05-05).
- Anthropic, How We Built Our Multi-Agent Research System — anthropic.com/research/built-multi-agent-research-system (verified 2026-05-05).
- The "What Is Harness Engineering?" series — theoretical background.
This is part 4/5 of the Coding Agents in Practice series.