Coding Agents in Practice (4/5) — Multi-Agent Patterns: Orchestrator and Specialist Separation


When a single LLM session does everything, the context breaks. Role separation fixes it.


Key takeaways

  • Multi-agent is not "let's use more models"; it is "let's separate context"
  • Primary sources: Anthropic Claude Code subagents docs; OpenClaw / Hermes operational experience
  • Three core patterns: orchestrator + specialists / pipeline / collaboration
  • Cost is not a simple multiplication — context savings and reduced retries often make multi-agent cheaper overall
  • Common pitfalls: permission leakage, context contamination, debugging difficulty, over-engineering

1. Why one agent doing everything fails

When a single LLM session handles design + implementation + testing + review + docs, three failure modes appear:

  • Context contamination: details from one task pollute another. Debug-time scratch code becomes a PR-review baseline.
  • Context limit overflow: even a 200K-token window fills quickly under heavy load, and auto-compaction then strips out the reasoning behind earlier decisions.
  • Permission asymmetry: the same session has publish + credential access + code change authority. One mistake has wide blast radius.

Solution: separate agents by task type and have one orchestrator on top.


2. Orchestrator + specialists pattern

The most common topology.

        [Orchestrator = main]
            │
   ┌────────┼────────┬────────┐
   ▼        ▼        ▼        ▼
 [Edit]  [Test]  [Review]  [Document]
 executor  tester  reviewer   writer

Orchestrator's role:

  • Receive a request and decide which specialist to call.
  • Absorb specialist results in summary form (not full output).
  • Decide the next step.

Specialist's role:

  • Work deeply in their specialty.
  • Return a condensed report that protects the orchestrator's context.

Core principle: orchestrator sees the whole picture, specialists see depth. Two kinds of context, neither overloaded.


3. Three topologies

3.1 Supervisor (hub-and-spoke)

  • One orchestrator + N specialists, as above.
  • Simple, predictable, easy to debug.
  • Limit: when specialists need to talk directly, every hop goes through the orchestrator.

3.2 Pipeline (serial)

  • Specialist A → B → C, output of one becomes input of the next.
  • Strong fit for clear-stage work — data transform, validation, publish.
  • Limit: a failure at one stage halts everything.

3.3 Collaboration (mesh / consensus)

  • Multiple specialists see the same input in parallel and produce answers; results are merged or voted.
  • Good for code review where different lenses (security / performance / readability) matter.
  • Most expensive: same tokens processed N times.

Selection rule: clear stages → pipeline; clear division of labor → supervisor; different lenses → collaboration.
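
As a rough sketch, the selection rule and the two non-supervisor topologies might look like this in Python. All names here (`choose_topology`, `run_pipeline`, `run_collaboration`, and the example stages and reviewers) are hypothetical, the priority order inside `choose_topology` is an assumption, and the reviewers run one after another for simplicity even though they could run in parallel.

```python
from typing import Callable, List

Stage = Callable[[str], str]
Reviewer = Callable[[str], List[str]]


def choose_topology(clear_stages: bool, different_lenses: bool) -> str:
    """Encode the selection rule; supervisor is the assumed default."""
    if clear_stages:
        return "pipeline"
    if different_lenses:
        return "collaboration"
    return "supervisor"


def run_pipeline(stages: List[Stage], payload: str) -> str:
    """Serial topology: each stage's output is the next stage's input."""
    for stage in stages:
        payload = stage(payload)  # an exception here halts the whole run
    return payload


def run_collaboration(reviewers: List[Reviewer], payload: str) -> List[str]:
    """Mesh topology: every reviewer sees the same input; findings are merged."""
    findings: List[str] = []
    for reviewer in reviewers:
        findings.extend(reviewer(payload))
    return findings


# Hypothetical stages and reviewers standing in for LLM-backed agents.
def validate(text: str) -> str:
    if not text.strip():
        raise ValueError("empty input")
    return text.strip()


def transform(text: str) -> str:
    return text.lower().replace(" ", "_")


def security_review(code: str) -> List[str]:
    return ["security: avoid eval()"] if "eval(" in code else []


def readability_review(code: str) -> List[str]:
    return ["readability: name 'x' is unclear"] if "x =" in code else []


slug = run_pipeline([validate, transform], "  Release Notes ")
findings = run_collaboration([security_review, readability_review], "x = eval(user_input)")
print(slug)
print(findings)
```

Note how the cost difference shows up structurally: the pipeline touches each payload once per stage, while the collaboration feeds the same payload to every reviewer.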


4. Claude Code's subagents — Real implementation

Claude Code invokes subagents through the Agent tool. The subagent_type parameter selects the specialist, and the prompt must be self-contained.

Prompt-writing principles (where most teams stumble):

  • The subagent does not know the main agent's context. Put what / why / what's been tried all into the prompt.
  • Specify the output shape: "report in 200 words," "JSON result," "return only the URL," etc.
  • Don't delegate understanding: instead of "analyze and decide," say "in file X line Y, change Z to W." The main agent must understand first, then delegate.
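
One way to keep delegations self-contained is to build every prompt from the same fixed fields. The sketch below is an assumption about how a team might structure this, not part of Claude Code: `Delegation` and its fields are hypothetical, as are the file path and line number in the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Delegation:
    goal: str                 # what to do, stated concretely
    reason: str               # why it matters
    already_tried: List[str]  # what the main agent has ruled out
    output_shape: str         # exact form of the reply

    def to_prompt(self) -> str:
        tried = "\n".join(f"- {item}" for item in self.already_tried) or "- nothing yet"
        return (
            f"Task: {self.goal}\n"
            f"Why: {self.reason}\n"
            f"Already tried:\n{tried}\n"
            f"Output: {self.output_shape}\n"
        )


# Hypothetical example: path, line number, and values are placeholders.
delegation = Delegation(
    goal="In src/config.py line 42, change the default timeout from 30 to 60.",
    reason="Slow CI runners hit the 30-second limit during integration tests.",
    already_tried=["Raising the limit via environment variable (ignored by the loader)"],
    output_shape="Reply with the unified diff only, no commentary.",
)
prompt = delegation.to_prompt()
print(prompt)
```

Because every field is required, a delegation that skips the "why" or the output shape fails at construction time instead of producing a vague prompt.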

Parallel vs serial:

  • Independent investigations can fan out in parallel.
  • Dependent work runs serial.
  • Parallel is faster but the total token cost is roughly the same.
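
The parallel/serial split maps naturally onto `asyncio`. In this minimal sketch, `investigate` is a hypothetical stand-in for one subagent call (it only sleeps and returns a string); a real call would go through whatever agent runtime is in use.

```python
import asyncio
from typing import List


async def investigate(topic: str) -> str:
    """Hypothetical stand-in for one subagent call."""
    await asyncio.sleep(0.01)  # simulates model latency
    return f"summary of {topic}"


async def main() -> List[str]:
    # Independent investigations fan out in parallel.
    findings = await asyncio.gather(
        investigate("auth module"),
        investigate("db layer"),
        investigate("CI config"),
    )
    # Dependent work runs serially: the plan needs all findings first.
    plan = await investigate("fix plan based on: " + "; ".join(findings))
    return [*findings, plan]


results = asyncio.run(main())
print(results[-1])
```

The three investigations overlap in wall-clock time, but each still does the same work, which is why the token bill stays roughly flat.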


5. Cost — Not a simple multiplication

"Multi-agent = N× cost" is half-true.

What costs more:

  • Each subagent re-loads its own context (CLAUDE.md, rules, tool defs).
  • Orchestrator-specialist message round-trips themselves cost tokens.

What costs less:

  • The main context stays light, so compaction happens less often, and each compaction tends to trigger re-asks.
  • Specialists can run on smaller models (e.g., Haiku) when the work allows.
  • Fewer retries: when the main context is contaminated, decision quality drops and whole tasks restart.

Heuristic: simple 1–2 step tasks → single agent, almost always cheaper. 5+ step or mixed-specialty work → multi-agent often wins on total cost.
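
A toy token model makes the crossover visible. Everything here is an illustrative assumption rather than a measurement: the constants `BASE`, `PER_STEP`, and `SUMMARY` are invented, and the model ignores retries and cheaper-model pricing, both of which would tilt the result further toward multi-agent.

```python
def single_agent_tokens(steps: int, base: int, per_step: int) -> int:
    """One session: every step re-reads the base context plus all step output so far."""
    return steps * base + per_step * steps * (steps + 1) // 2


def multi_agent_tokens(steps: int, base: int, per_step: int, summary: int) -> int:
    """Per step: one specialist call (base + step work) plus one orchestrator
    turn (base + all summaries so far)."""
    specialists = steps * (base + per_step)
    orchestrator = steps * base + summary * steps * (steps + 1) // 2
    return specialists + orchestrator


# Illustrative assumptions, not measured values.
BASE, PER_STEP, SUMMARY = 5_000, 4_000, 300

for steps in (1, 2, 5, 8):
    single = single_agent_tokens(steps, BASE, PER_STEP)
    multi = multi_agent_tokens(steps, BASE, PER_STEP, SUMMARY)
    cheaper = "single" if single <= multi else "multi"
    print(f"{steps} steps: single={single:,} multi={multi:,} -> {cheaper}")
```

Under these assumptions the single agent wins at 1 and 2 steps and loses at 5 and 8, because its accumulated context grows with every step while the orchestrator only accumulates summaries.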


6. Four common pitfalls

6.1 Cross-contamination

  • The main agent ingests a specialist's full output and stores it in memory → contamination.
  • Fix: specialists return summaries only; the main agent trusts only the summary.

6.2 Permission leakage

  • Every agent gets every tool → wide blast radius from one mistake.
  • Fix: per-agent tool whitelists. Only a designated agent can publish.
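
A per-agent whitelist can be as small as a lookup table checked before every tool call. The agent and tool names below are hypothetical, and the sketch assumes the runtime routes all tool calls through `check_tool`.

```python
from typing import Dict, FrozenSet

# Hypothetical agent and tool names; only "publisher" may publish.
ALLOWED_TOOLS: Dict[str, FrozenSet[str]] = {
    "executor": frozenset({"read_file", "write_file"}),
    "tester": frozenset({"read_file", "run_tests"}),
    "reviewer": frozenset({"read_file"}),
    "publisher": frozenset({"read_file", "publish"}),
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent requests a tool outside its whitelist."""


def check_tool(agent: str, tool: str) -> None:
    allowed = ALLOWED_TOOLS.get(agent, frozenset())  # unknown agents get nothing
    if tool not in allowed:
        raise ToolNotAllowed(f"{agent!r} may not call {tool!r}")


check_tool("publisher", "publish")  # passes silently
try:
    check_tool("reviewer", "publish")
except ToolNotAllowed as err:
    print(err)
```

Defaulting unknown agents to an empty set means a newly added agent has no tools until someone grants them explicitly.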

6.3 Debugging difficulty

  • "Which agent went wrong, and where?" is hard to trace.
  • Fix: every agent call logs a correlation ID + result summary. At session end, archive logs separately.
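
The logging fix might be sketched as follows. The names `call_agent`, `archive`, and `trace` are hypothetical, the agents are lambdas standing in for real calls, and JSON Lines is an assumed archive format chosen only because it is easy to filter by ID.

```python
import json
import uuid
from typing import Callable, Dict, List

trace: List[Dict[str, str]] = []  # in-memory log for one session


def call_agent(name: str, task: str, run: Callable[[str], str], correlation_id: str) -> str:
    """Run one agent call and record its correlation ID and result summary."""
    summary = run(task)
    trace.append({
        "correlation_id": correlation_id,
        "agent": name,
        "task": task,
        "summary": summary,
    })
    return summary


def archive(records: List[Dict[str, str]]) -> str:
    """Serialize the session trace as JSON Lines for separate storage."""
    return "\n".join(json.dumps(record) for record in records)


request_id = uuid.uuid4().hex  # one ID shared by every call in this request
call_agent("executor", "apply patch", lambda t: f"done: {t}", request_id)
call_agent("tester", "run unit tests", lambda t: f"done: {t}", request_id)
print(archive(trace))
```

With one ID per user request, "which agent went wrong" becomes a filter on `correlation_id` followed by a read of the summaries in order.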

6.4 Over-engineering

  • Inserting multi-agent into trivial work.
  • Fix: a hard rule like "3+ step tasks enter multi-agent flow." Below that, single agent + self-check.

7. At a glance

Pattern                                  Best for                 Strength               Weakness
Orchestrator + specialists (supervisor)  Clear division of labor  Simple, easy to debug  Specialists can't talk directly
Pipeline                                 Clear stages             Easy to trace          One failed stage halts everything
Collaboration (mesh / consensus)         Different lenses         Diversity              Most expensive
Single agent                             ≤2 step tasks            Cheapest               Context contamination risk

Design rule: task types diverge → agents diverge. Otherwise, single agent.


Next up

Part 5/5: Coding Agent Cost Management — Tokens, Caching, Routing. If multi-agent splits structure, the next post splits cost.


References

  • Anthropic, Subagents in Claude Code — code.claude.com/docs/agents (verified 2026-05-05).
  • Anthropic, How We Built Our Multi-Agent Research System — anthropic.com/research/built-multi-agent-research-system (verified 2026-05-05).
  • The "What Is Harness Engineering?" series — theoretical background.

This is part 4/5 of the Coding Agents in Practice series.
