"7 Harness Design Decisions (2/4) — Single vs Multi-Agent, Thin vs Thick Harness"

May 18, 2026

There is no single correct harness architecture. But there are recurring decisions that shape whether a system stays understandable in production. Should one agent handle the whole job, or should roles split? Should the harness stay thin, or absorb more policy and verification logic? What belongs to the model, and what must be owned by the system? This article frames those choices as operational tradeoffs rather than as ideology.

Key Takeaways

Harness design is closer to a decision table than to a feature catalog.
The most important axes are agent count, harness thickness, tool exposure, verification placement, memory ownership, permission boundaries, and orchestration responsibility.
Single vs multi-agent and thin vs thick harness are not stylistic preferences. They are choices about failure cost and operating complexity.
The best architecture is rarely the most advanced-looking one. It is usually the smallest structure that removes the current failure mode.
So the right design questions start with where things break, who owns the state, and where rollback is possible, not with novelty.

1. Why design should be discussed as decisions

The Chapter 12 notes in sources/260518_하네스엔지니어링_15장_블로그활용노트.md push toward a useful mindset: harness design becomes clearer when you surface the major architectural choices before implementation details bury them.

Examples:

should one agent do everything or should responsibilities split
should tools be broadly available or exposed minimally by stage
should verification happen mostly at the end or throughout the flow
should memory and policy live in provider surfaces or in assets you own

When these choices stay implicit, architecture becomes harder to reason about later.

2. Decision 1: single agent vs multi-agent

This is the most visible tradeoff, and one teams often overcomplicate too early.

Choice	Advantages	Costs
single agent	simpler debugging, clearer context continuity, lower coordination overhead	role conflicts can bloat prompts
multi-agent	role separation, possible parallelism, evaluator separation	handoffs, observability, and failure propagation become harder

A practical rule of thumb:

stay single-agent while one workspace is enough
split only when repeated role conflict appears
do not split just because multi-agent systems look more advanced

Multi-agent structures create real value when separation has a concrete purpose, such as independent evaluation.
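The rule of thumb above can be written as an explicit decision function, which keeps the single-agent default visible instead of implicit. This is a minimal sketch. The signal names and the threshold of three are hypothetical and do not come from the source notes.

```python
from dataclasses import dataclass

# Assumption: "repeated" means the same role conflict was observed at least this often.
ROLE_CONFLICT_THRESHOLD = 3


@dataclass
class TopologySignals:
    repeated_role_conflicts: int        # runs where roles fought inside one prompt (hypothetical metric)
    needs_independent_evaluation: bool  # must the checker be separate from the maker?


def choose_topology(signals: TopologySignals) -> str:
    """Return "single" or "multi", defaulting to a single agent."""
    # A concrete purpose such as independent evaluation justifies a split.
    if signals.needs_independent_evaluation:
        return "multi"
    # Otherwise split only after role conflict has actually recurred.
    if signals.repeated_role_conflicts >= ROLE_CONFLICT_THRESHOLD:
        return "multi"
    # Looking more advanced is not a reason to split.
    return "single"
```

The function has no "sophistication" input. That omission is deliberate: the only ways to reach "multi" are observed conflict or a concrete separation need.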

3. Decision 2: thin harness vs thick harness

A thin harness connects model and tools with minimal extra structure. A thick harness absorbs more approvals, policy, verification, observability, and handoff logic.

Choice	Best fit	Risk
thin harness	prototypes, exploration, low-risk personal workflows	limited control when failure appears
thick harness	team operations, higher-risk automation, long-running tasks	slower implementation and more upkeep

The key is not to celebrate thickness for its own sake.

Start as thin as possible, then make the harness thicker only where repeated failures justify the added control.

For example, wrapping every action in approval logic is often excessive. Wrapping publishing or broad edits is not.
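Selective thickening can look like this in code. Only actions in a high-risk set pass through an approval gate, and everything else runs directly. This is a minimal sketch. The action names in `HIGH_RISK_ACTIONS` and the `approve` callback are hypothetical.

```python
from typing import Callable

# Hypothetical action names; only these receive the "thick" treatment.
HIGH_RISK_ACTIONS = {"publish", "bulk_edit"}


def run_action(
    name: str,
    execute: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Execute an action, requiring approval only when it is high-risk."""
    if name in HIGH_RISK_ACTIONS:
        # Thick path: the harness, not the model, decides whether this may run.
        if not approve(name):
            return f"blocked: {name} was not approved"
    # Thin path: low-risk actions run without extra ceremony.
    return execute()
```

When a new failure pattern appears, the harness can grow by adding that one action to the set. There is no need to wrap every call.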

4. Decision 3: broad tool exposure vs stage-specific minimal exposure

Should the model always see every available tool, or only the minimum set needed at a given stage?

Taken together, drafts/blog/260429_하네스시리즈04_도구와샌드박싱_블로그.md and drafts/blog/260429_하네스시리즈05_라우팅_블로그.md point to the same lesson: tools impose a selection cost, because every exposed tool is one more option the model can pick wrongly.

Choice	Advantages	Risk
broad exposure	simpler wiring, more flexibility	wrong-tool selection, excessive permissions
minimal staged exposure	lower confusion, clearer policy boundaries	requires orchestration design

In practice, narrow tool surfaces are usually safer early on, especially when different actions carry very different risk.
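Stage-specific exposure can be expressed as a plain mapping from stage to allowed tools. The harness then checks every call against that mapping instead of trusting the prompt. This is a minimal sketch. The stage names and tool names are hypothetical.

```python
# Hypothetical stages and tools for a drafting workflow.
STAGE_TOOLS: dict[str, frozenset[str]] = {
    "research": frozenset({"search", "read_file"}),
    "draft": frozenset({"read_file", "write_draft"}),
    "publish": frozenset({"publish"}),
}

# What broad exposure would show the model at every stage.
ALL_TOOLS = frozenset().union(*STAGE_TOOLS.values())


def tools_for_stage(stage: str) -> frozenset[str]:
    """Return only the tools the model may see at this stage."""
    # Unknown stages expose nothing rather than everything (fail closed).
    return STAGE_TOOLS.get(stage, frozenset())


def check_tool_call(stage: str, tool: str) -> None:
    """Reject calls to tools that were not exposed at this stage."""
    if tool not in tools_for_stage(stage):
        raise PermissionError(f"{tool!r} is not available in stage {stage!r}")
```

This is the orchestration cost listed in the table: someone has to define and maintain the stages. In return, each stage has a policy boundary that can be reviewed.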

5. Decision 4: act immediately vs plan first

Should the system reason and act in one continuous loop, or pause to make a plan before execution?

Choice	Strength	Weakness
immediate action	fast and natural for short tasks	drifts more easily in long or constrained tasks
plan first	clearer scope control, easier review	adds upfront overhead

Low-risk short tasks often benefit from immediate action. Multi-file, long-form, or heavily constrained work often benefits from explicit planning first. In a repository with ownership boundaries, a plan can also serve as a scope confirmation artifact.
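Treating the plan as a scope confirmation artifact gives the harness two cheap checks. Before execution, it confirms that every planned file lies inside the allowed boundary. After execution, it flags edits the plan never declared. This is a minimal sketch. The `Plan` shape and the path prefixes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    goal: str
    files: list[str] = field(default_factory=list)  # files the plan declares it will touch


def out_of_scope(plan: Plan, allowed_prefixes: tuple[str, ...]) -> list[str]:
    """Before execution: planned files that fall outside the ownership boundary."""
    return [f for f in plan.files if not f.startswith(allowed_prefixes)]


def scope_drift(plan: Plan, touched: list[str]) -> list[str]:
    """After execution: files that were edited but never declared in the plan."""
    declared = set(plan.files)
    return sorted(f for f in touched if f not in declared)
```

A reviewer approves the short file list rather than the whole diff. Any drift then becomes a concrete, checkable signal.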

6. Decision 5: end-only verification vs verification throughout

Many teams think of validation as something that happens at the end. In operations, intermediate checks are often cheaper than discovering a failure only after the whole run has finished.

Choice	Advantages	Risk
end-only validation	simpler flow	expensive rework when failure is discovered late
verification throughout	early failure detection, lower waste	slightly more design effort

The fast sensor principle from drafts/blog/260519_하네스시리즈C01_에이전트평가하네스_블로그.md applies directly here. Deterministic failures such as ownership, schema, or format violations should be caught early.
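The fast sensor idea can be sketched as an ordered pipeline. Cheap deterministic checks run first, and the expensive evaluator runs only if all of them pass. This is a minimal sketch under stated assumptions: the ownership prefix, the `"title"` key rule, the whitespace rule, and the 0.5 pass threshold are all hypothetical, and `slow_eval` stands in for a model-based grader.

```python
import json
from typing import Callable, Optional

# Each check returns an error message, or None if the content passes.
Check = Callable[[str, str], Optional[str]]


def check_ownership(path: str, content: str) -> Optional[str]:
    # Assumption: this agent owns only files under drafts/.
    return None if path.startswith("drafts/") else f"not owned: {path}"


def check_json_schema(path: str, content: str) -> Optional[str]:
    # Hypothetical rule: .json outputs must parse into an object with a "title" key.
    if not path.endswith(".json"):
        return None
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    return None if isinstance(data, dict) and "title" in data else "missing 'title'"


def check_format(path: str, content: str) -> Optional[str]:
    # Hypothetical rule: no trailing whitespace on any line.
    bad = [i for i, line in enumerate(content.splitlines(), 1) if line != line.rstrip()]
    return f"trailing whitespace on lines {bad}" if bad else None


FAST_SENSORS: list[Check] = [check_ownership, check_json_schema, check_format]


def verify_step(path: str, content: str,
                slow_eval: Callable[[str], float]) -> tuple[bool, str]:
    """Run cheap deterministic sensors first; call the slow evaluator only if they pass."""
    for check in FAST_SENSORS:
        error = check(path, content)
        if error is not None:
            return False, error  # fail fast: no expensive evaluation is spent
    score = slow_eval(content)
    return score >= 0.5, f"score={score:.2f}"
```

Calling `verify_step` after each intermediate step, rather than once at the end, is what turns end-only validation into verification throughout.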

7. Decision 6: provider-dependent memory vs owned memory

Where should memory and policy history live?

Choice	Advantages	Risk
provider-dependent memory	fast startup, less infrastructure	lock-in, weaker portability, limited auditability
owned memory	portability, policy control, stronger auditability	more operational responsibility

This is not only a storage choice. It is a decision about who owns the durable user and organizational experience. drafts/blog/260519_하네스시리즈C04_메모리소유권_블로그.md explores that argument in depth.
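Owned memory does not require heavy infrastructure to start. A minimal sketch is an append-only JSON Lines file that you control, where each record carries a timestamp and a source so the history stays auditable. The class name, record fields, and file layout are hypothetical. This is not a description of any provider's memory feature.

```python
import json
import time
from pathlib import Path
from typing import Any, Iterator


class OwnedMemory:
    """Append-only memory log kept in a JSON Lines file that you control."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def remember(self, key: str, value: Any, source: str) -> None:
        # Each record notes who wrote it and when, so the log doubles as an audit trail.
        record = {"ts": time.time(), "key": key, "value": value, "source": source}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    def _records(self) -> Iterator[dict[str, Any]]:
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    def history(self, key: str) -> list[dict[str, Any]]:
        """Every write for key, oldest first; nothing is overwritten."""
        return [r for r in self._records() if r["key"] == key]

    def recall(self, key: str) -> Any:
        """Latest value for key, or None if it was never stored."""
        entries = self.history(key)
        return entries[-1]["value"] if entries else None
```

Because the log is a plain file, switching model providers means pointing a new harness at the same file. Nothing has to be exported from a vendor surface.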

8. Decision 7: model responsibility vs harness responsibility

This is the deepest design question: what should remain model judgment, and what should be externalized into enforceable system structure?

Responsibility area	Model-centered	Harness-centered
tool selection	prompt guidance	control by exposure and policy
format compliance	ask nicely	enforce with schema, templates, post-checks
safety boundaries	instruction warnings	enforce with permissions, sandboxing, approvals
task resumption	rely on conversation state	externalize with handoff artifacts

As systems mature, repeated rules usually move down into the harness: model behavior varies from run to run, while the harness can enforce the same rule every time.
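The "format compliance" row shows the shift most clearly. Instead of asking nicely, the harness validates the output against a contract, feeds the concrete violation back, and gives up after a bounded number of attempts. This is a minimal sketch. `REQUIRED_KEYS`, the retry limit, and the `generate` callback (standing in for a model call) are hypothetical.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"title", "summary"}  # hypothetical output contract


def validate(raw: str) -> dict:
    """Raise ValueError unless raw is a JSON object containing the required keys."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def call_with_contract(generate: Callable[[str], str], prompt: str,
                       max_attempts: int = 3) -> dict:
    """Call the model, then let the harness enforce the format instead of trusting it."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            return validate(raw)
        except ValueError as exc:
            # Feed back the specific violation rather than a vaguer instruction.
            feedback = f"\n\nPrevious output was rejected: {exc}"
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

The model still produces the content. The harness owns the decision about whether that content is acceptable.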

9. Reading single vs multi-agent together with thin vs thick

These two axes are easy to misunderstand in isolation. They are better read together.

Combination	Best fit	Watch out for
single + thin	personal drafting, exploratory work	weak operational safety
single + thick	constrained real workflows	prompt bloat and rigidity
multi + thin	experimental delegation	weak traceability and unclear responsibility
multi + thick	high-risk long-running automation	maximum cost and complexity

Most systems should begin at single + thin, then move selectively toward single + thick. Multi-agent design is usually a later concern.

10. Applying this to our repository

This workspace already encodes several architectural choices.

publishing is gated by explicit approval rather than silent automation
config/ is protected because its ownership and risk profile are different
work scope is kept narrow through explicit file boundaries
user-facing and internal language responsibilities are separated

These are not just workflow preferences. They are statements about what should not be delegated to the model.
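Protecting `config/` is an example of a rule that lives in the harness rather than in an instruction. Below is a minimal sketch of such a guard. It assumes repository-relative, already-normalized POSIX paths (no `..` segments). The function names and the `owner_approved` flag are hypothetical, not this repository's actual mechanism.

```python
from pathlib import PurePosixPath

# Assumption: paths are repository-relative, normalized, and use forward slashes.
PROTECTED_DIRS = (PurePosixPath("config"),)


def is_protected(path: str) -> bool:
    """True if path is a protected directory or lies anywhere inside one."""
    p = PurePosixPath(path)
    return any(p == d or d in p.parents for d in PROTECTED_DIRS)


def guard_write(path: str, owner_approved: bool = False) -> None:
    """Refuse model-initiated writes to protected paths unless the owner approved."""
    if is_protected(path) and not owner_approved:
        raise PermissionError(f"{path} is protected; ask the owner first")
```

Comparing path components instead of raw string prefixes keeps a sibling such as `configs/` from being caught by accident.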

11. Common failure modes

Defaulting to multi-agent too early

Coordination overhead often exceeds the actual benefit.

Treating thin harnesses as inherently better

They are fast early, but repeated operational failures can make them more expensive over time.

Leaving verification to the end

This delays the discovery of deterministic failures that could have been blocked cheaply.

Postponing ownership decisions

If memory, logs, and policy assets scatter across surfaces, later migration and auditing become much harder.

12. A practical decision checklist

Before expanding architecture, answer these questions first.

What is the most expensive current failure?
Can a single structure solve it, or is real role separation needed?
How early in the flow should approvals and verification move?
Who owns the tool, memory, and policy boundaries?
How will the impact of this architectural choice be verified?

Good harness design is not about choosing fashionable abstractions. It is about deciding where to buy complexity and where to protect simplicity.

References

docs/blog_series_하네스엔지니어링_총괄_design.md
sources/260518_하네스엔지니어링_15장_블로그활용노트.md
drafts/blog/260429_하네스시리즈04_도구와샌드박싱_블로그.md
drafts/blog/260429_하네스시리즈05_라우팅_블로그.md
drafts/blog/260429_하네스시리즈06_평가운영_블로그.md
drafts/blog/260519_하네스시리즈C01_에이전트평가하네스_블로그.md
drafts/blog/260519_하네스시리즈C04_메모리소유권_블로그.md

This is Part 2/4 of the Patterns, Strategy, and Cases series. Previous: 12 harness patterns. Next: why the harness is everything, from ACI to agent-first engineering.

Series overview: Harness Engineering Series Guide


MaJu Tech Notes