"7 Harness Design Decisions (2/4) — Single vs Multi-Agent, Thin vs Thick Harness"
There is no single correct harness architecture. But there are recurring decisions that shape whether a system stays understandable in production. Should one agent handle the whole job, or should roles split? Should the harness stay thin, or absorb more policy and verification logic? What belongs to the model, and what must be owned by the system? This article frames those choices as operational tradeoffs rather than as ideology.
Key Takeaways
- Harness design is closer to a decision table than to a feature catalog.
- The most important axes are agent count, harness thickness, tool exposure, verification placement, memory ownership, permission boundaries, and orchestration responsibility.
- Single vs multi-agent and thin vs thick harness are not stylistic preferences. They are choices about failure cost and operating complexity.
- The best architecture is rarely the most advanced-looking one. It is usually the smallest structure that removes the current failure mode.
- The right design questions are therefore not about novelty first, but about where things break, who owns the state, and where rollback is possible.
1. Why design should be discussed as decisions
The Chapter 12 notes in sources/260518_ํ๋ค์ค์์ง๋์ด๋ง_15์ฅ_๋ธ๋ก๊ทธํ์ฉ๋ธํธ.md push toward a useful mindset: harness design becomes clearer when you surface the major architectural choices before implementation details bury them.
Examples:
- should one agent do everything or should responsibilities split
- should tools be broadly available or exposed minimally by stage
- should verification happen mostly at the end or throughout the flow
- should memory and policy live in provider surfaces or in assets you own
When these choices stay implicit, architecture becomes harder to reason about later.
2. Decision 1: single agent vs multi-agent
This is the most visible tradeoff, and one teams often overcomplicate too early.
| Choice | Advantages | Costs |
|---|---|---|
| single agent | simpler debugging, clearer context continuity, lower coordination overhead | role conflicts can bloat prompts |
| multi-agent | role separation, possible parallelism, evaluator separation | handoffs, observability, and failure propagation become harder |
A practical rule of thumb:
- stay single-agent while one workspace is enough
- split only when repeated role conflict appears
- do not split just because multi-agent systems look more advanced
Multi-agent structures create real value when separation has a concrete purpose, such as independent evaluation.
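As an illustration of what "independent evaluation" can mean, here is a minimal sketch that assumes the generator and evaluator are plain callables; the names `generate`, `evaluate`, and `run_with_independent_evaluation` are hypothetical, and in a real system each role would wrap a separate model call. The separation only has value because the evaluator receives the finished artifact and the criteria, not the generator's working context.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Artifact:
    content: str


# Hypothetical role signatures; in practice each would wrap a separate model call.
Generate = Callable[[str], Artifact]
Evaluate = Callable[[Artifact, List[str]], List[str]]  # returns failed criteria


def run_with_independent_evaluation(
    task: str,
    generate: Generate,
    evaluate: Evaluate,
    criteria: List[str],
    max_retries: int = 2,
) -> Tuple[Artifact, List[str]]:
    """Generate an artifact, then have a separate evaluator judge it.

    The evaluator sees only the artifact and the criteria, never the
    generator's prompt or reasoning, which keeps its judgment independent.
    """
    artifact = generate(task)
    failures = evaluate(artifact, criteria)
    for _ in range(max_retries):
        if not failures:
            break
        # Only the failed criteria flow back to the generator.
        artifact = generate(f"{task}\nFix: {', '.join(failures)}")
        failures = evaluate(artifact, criteria)
    return artifact, failures
```

If the retries run out, the remaining failures are returned rather than hidden, so the caller can escalate instead of shipping an artifact that never passed.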
3. Decision 2: thin harness vs thick harness
A thin harness connects model and tools with minimal extra structure. A thick harness absorbs more approvals, policy, verification, observability, and handoff logic.
| Choice | Best fit | Risk |
|---|---|---|
| thin harness | prototypes, exploration, low-risk personal workflows | limited control when failure appears |
| thick harness | team operations, higher-risk automation, long-running tasks | slower implementation and more upkeep |
The key is not to celebrate thickness for its own sake.
Start as thin as possible, then make the harness thicker only where repeated failures justify the added control.
For example, wrapping every action in approval logic is often excessive. Wrapping publishing or broad edits is not.
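To make "thicken only where it matters" concrete, here is a minimal sketch that routes only high-risk actions through an approval gate while everything else runs directly. The action names in `HIGH_RISK_ACTIONS` and the `approve` callback are hypothetical; in a real harness the callback might prompt a human or check for a recorded sign-off.

```python
from typing import Callable

# Hypothetical: only these actions are risky enough to need explicit approval.
HIGH_RISK_ACTIONS = frozenset({"publish", "broad_edit"})


class ApprovalDenied(Exception):
    """Raised when a high-risk action is not approved."""


def execute(action: str, run: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run an action, routing it through approval only if it is high-risk."""
    if action in HIGH_RISK_ACTIONS and not approve(action):
        raise ApprovalDenied(f"{action!r} was not approved")
    return run()
```

Low-risk actions such as reads never touch the approval path, so the added control stays proportional to the risk of each action.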
4. Decision 3: broad tool exposure vs stage-specific minimal exposure
Should the model always see every available tool, or only the minimum set needed at a given stage?
Taken together, drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ04_๋๊ตฌ์์๋๋ฐ์ฑ_๋ธ๋ก๊ทธ.md and drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ05_๋ผ์ฐํ_๋ธ๋ก๊ทธ.md point to the same lesson: tools impose selection cost.
| Choice | Advantages | Risk |
|---|---|---|
| broad exposure | simpler wiring, more flexibility | wrong-tool selection, excessive permissions |
| minimal staged exposure | lower confusion, clearer policy boundaries | requires orchestration design |
In practice, narrow tool surfaces are usually safer early on, especially when different actions carry very different risk.
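A minimal sketch of stage-specific exposure, assuming the harness knows the current stage and builds the tool list it hands to the model from a per-stage allowlist. The stage names and tool names are hypothetical. The same allowlist is also checked at call time, so a tool that was not exposed cannot be invoked even if the model names it.

```python
from typing import Any, Callable, Dict, FrozenSet

Tool = Callable[..., Any]

# Hypothetical per-stage allowlists: each stage sees only the tools it needs.
STAGE_TOOLS: Dict[str, FrozenSet[str]] = {
    "research": frozenset({"search", "read_file"}),
    "draft": frozenset({"read_file", "write_draft"}),
    "publish": frozenset({"publish_post"}),
}


def tools_for_stage(stage: str, registry: Dict[str, Tool]) -> Dict[str, Tool]:
    """Return the subset of registered tools exposed in this stage (none if unknown)."""
    allowed = STAGE_TOOLS.get(stage, frozenset())
    return {name: tool for name, tool in registry.items() if name in allowed}


def call_tool(stage: str, name: str, registry: Dict[str, Tool], *args: Any) -> Any:
    """Enforce the same boundary at call time, not just in what the model is shown."""
    exposed = tools_for_stage(stage, registry)
    if name not in exposed:
        raise PermissionError(f"tool {name!r} is not available in stage {stage!r}")
    return exposed[name](*args)
```

Unknown stages get no tools at all, which fails closed rather than falling back to broad exposure.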
5. Decision 4: act immediately vs plan first
Should the system reason and act in one continuous loop, or pause to make a plan before execution?
| Choice | Strength | Weakness |
|---|---|---|
| immediate action | fast and natural for short tasks | drifts more easily in long or constrained tasks |
| plan first | clearer scope control, easier review | adds upfront overhead |
Low-risk short tasks often benefit from immediate action. Multi-file, long-form, or heavily constrained work often benefits from explicit planning first. In a repository with ownership boundaries, a plan can also serve as a scope confirmation artifact.
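A minimal sketch of a plan serving as a scope confirmation artifact, assuming the plan declares the files it intends to touch and the harness compares every proposed edit against that declaration before execution. The `Plan` structure and the `out_of_scope` helper are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Set


@dataclass
class Plan:
    goal: str
    steps: List[str]
    files: Set[str] = field(default_factory=set)  # files the plan declares it will touch


def out_of_scope(plan: Plan, proposed_edits: List[str]) -> List[str]:
    """Return proposed edits that touch files the confirmed plan did not declare.

    PurePosixPath collapses redundant slashes and "." segments, so
    "./drafts/a.md" and "drafts/a.md" compare equal.
    """
    declared = {PurePosixPath(p) for p in plan.files}
    return [p for p in proposed_edits if PurePosixPath(p) not in declared]
```

An empty result means the edits match the confirmed scope; anything returned goes back for review instead of being applied silently.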
6. Decision 5: end-only verification vs verification throughout
Many teams think of validation as something that happens at the end. In operations, intermediate checks are often cheaper.
| Choice | Advantages | Risk |
|---|---|---|
| end-only validation | simpler flow | expensive rework when failure is discovered late |
| verification throughout | early failure detection, lower waste | slightly more design effort |
The fast sensor principle from drafts/blog/260519_ํ๋ค์ค์๋ฆฌ์ฆC01_์์ด์ ํธํ๊ฐํ๋ค์ค_๋ธ๋ก๊ทธ.md applies directly here. Deterministic failures such as ownership, schema, or format violations should be caught early.
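A minimal sketch of the fast-sensor idea, assuming each sensor is a cheap, deterministic function that returns an error message or `None`, and that all sensors run before any slower, model-based review. The two rules shown (no edits under `config/`, a required first-line title) are hypothetical examples of ownership and format checks.

```python
from typing import Callable, List, Optional

# A sensor inspects (path, text) and returns an error message, or None if it passes.
Sensor = Callable[[str, str], Optional[str]]


def ownership_sensor(path: str, text: str) -> Optional[str]:
    # Hypothetical ownership rule: config/ belongs to another owner.
    return "edits to config/ are not allowed" if path.startswith("config/") else None


def title_sensor(path: str, text: str) -> Optional[str]:
    # Hypothetical format rule: the first line must be a non-empty title.
    lines = text.splitlines()
    return None if lines and lines[0].strip() else "missing title line"


def run_fast_sensors(path: str, text: str, sensors: List[Sensor]) -> List[str]:
    """Run cheap deterministic checks and collect every failure message."""
    failures = []
    for sensor in sensors:
        error = sensor(path, text)
        if error is not None:
            failures.append(error)
    return failures
```

Collecting every failure, rather than stopping at the first, gives the agent complete feedback in one round; only an empty list lets the work move on to expensive evaluation.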
7. Decision 6: provider-dependent memory vs owned memory
Where should memory and policy history live?
| Choice | Advantages | Risk |
|---|---|---|
| provider-dependent memory | fast startup, less infrastructure | lock-in, weaker portability, limited auditability |
| owned memory | portability, policy control, stronger auditability | more operational responsibility |
This is not only a storage choice. It is a decision about who owns the durable user and organizational experience. drafts/blog/260519_ํ๋ค์ค์๋ฆฌ์ฆC04_๋ฉ๋ชจ๋ฆฌ์์ ๊ถ_๋ธ๋ก๊ทธ.md explores that argument in depth.
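A minimal sketch of owned memory, assuming an append-only JSON Lines file that the team controls. The class name, file location, and record fields are hypothetical; the point is that entries stay in portable plain text that can be audited, diffed, or migrated without depending on any provider surface.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List, Optional


class OwnedMemory:
    """Append-only memory log kept as JSON Lines in a file the team controls."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def remember(self, kind: str, content: str) -> None:
        # Appending (never rewriting) keeps a full history for later audits.
        record = {"ts": time.time(), "kind": kind, "content": content}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    def recall(self, kind: Optional[str] = None) -> List[Dict[str, Any]]:
        # Return all records, or only those of one kind, in insertion order.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        return [r for r in records if kind is None or r["kind"] == kind]
```

The tradeoff in the table shows up directly here: the harness now owns file location, retention, and access control, which is the added operational responsibility.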
8. Decision 7: model responsibility vs harness responsibility
This is the deepest design question: what should remain model judgment, and what should be externalized into enforceable system structure?
| Responsibility area | Model-centered | Harness-centered |
|---|---|---|
| tool selection | prompt guidance | control by exposure and policy |
| format compliance | ask nicely | enforce with schema, templates, post-checks |
| safety boundaries | instruction warnings | enforce with permissions, sandboxing, approvals |
| task resumption | rely on conversation state | externalize with handoff artifacts |
As systems mature, rules that keep recurring usually move down into the harness: model behavior varies from run to run, while the harness can enforce the same rule every time.
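As one example of moving a responsibility from the model into the harness, here is a minimal sketch of a handoff artifact for task resumption. Instead of relying on conversation state, the harness writes progress to a structured record that a fresh session can load. The `Handoff` fields are hypothetical; the shape check on load is what turns "remember where you were" into something the harness can enforce.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class Handoff:
    """State a fresh session needs in order to resume, kept outside the conversation."""

    task: str
    done: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "Handoff":
        data = json.loads(raw)
        # Enforce the shape instead of trusting it: missing or extra keys fail loudly.
        expected = {f.name for f in fields(cls)}
        if set(data) != expected:
            raise ValueError(f"handoff keys {sorted(data)} do not match {sorted(expected)}")
        return cls(**data)
```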
9. Reading single vs multi-agent together with thin vs thick
These two axes are easy to misunderstand in isolation. They are better read together.
| Combination | Best fit | Watch out for |
|---|---|---|
| single + thin | personal drafting, exploratory work | weak operational safety |
| single + thick | constrained real workflows | prompt bloat and rigidity |
| multi + thin | experimental delegation | weak traceability and unclear responsibility |
| multi + thick | high-risk long-running automation | maximum cost and complexity |
Most systems should begin at single + thin, then move selectively toward single + thick. Multi-agent design is usually a later concern.
10. Applying this to our repository
This workspace already encodes several architectural choices.
- publishing is gated by explicit approval rather than silent automation
- config/ is protected because its ownership and risk profile are different
- work scope is kept narrow through explicit file boundaries
- user-facing and internal language responsibilities are separated
These are not just workflow preferences. They are statements about what should not be delegated to the model.
11. Common failure modes
Defaulting to multi-agent too early
Coordination overhead often exceeds the actual benefit.
Treating thin harnesses as inherently better
They are fast to build early on, but repeated operational failures can make them more expensive over time.
Leaving verification to the end
This delays the discovery of deterministic failures that could have been blocked cheaply.
Postponing ownership decisions
If memory, logs, and policy assets scatter across surfaces, later migration and auditing become much harder.
12. A practical decision checklist
Before expanding architecture, answer these questions first.
- What is the most expensive current failure?
- Can a single structure solve it, or is real role separation needed?
- How far forward should approvals and verification move?
- Who owns the tool, memory, and policy boundaries?
- How will the impact of this architectural choice be verified?
Good harness design is not about choosing fashionable abstractions. It is about deciding where to buy complexity and where to protect simplicity.
References
- docs/blog_series_ํ๋ค์ค์์ง๋์ด๋ง_์ด๊ด_design.md
- sources/260518_ํ๋ค์ค์์ง๋์ด๋ง_15์ฅ_๋ธ๋ก๊ทธํ์ฉ๋ธํธ.md
- drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ04_๋๊ตฌ์์๋๋ฐ์ฑ_๋ธ๋ก๊ทธ.md
- drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ05_๋ผ์ฐํ_๋ธ๋ก๊ทธ.md
- drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ06_ํ๊ฐ์ด์_๋ธ๋ก๊ทธ.md
- drafts/blog/260519_ํ๋ค์ค์๋ฆฌ์ฆC01_์์ด์ ํธํ๊ฐํ๋ค์ค_๋ธ๋ก๊ทธ.md
- drafts/blog/260519_ํ๋ค์ค์๋ฆฌ์ฆC04_๋ฉ๋ชจ๋ฆฌ์์ ๊ถ_๋ธ๋ก๊ทธ.md
This is Part 2/4 of the Patterns, Strategy, and Cases series. Previous: 12 harness patterns. Next: why the harness is everything, from ACI to agent-first engineering.
Series overview: Harness Engineering Series Guide