"7 Harness Design Decisions (2/4) — Single vs Multi-Agent, Thin vs Thick Harness"

There is no single correct harness architecture. But there are recurring decisions that shape whether a system stays understandable in production. Should one agent handle the whole job, or should roles split? Should the harness stay thin, or absorb more policy and verification logic? What belongs to the model, and what must be owned by the system? This article frames those choices as operational tradeoffs rather than as ideology.


Key Takeaways

  • Harness design is closer to a decision table than to a feature catalog.
  • The most important axes are agent count, harness thickness, tool exposure, verification placement, memory ownership, permission boundaries, and orchestration responsibility.
  • Single vs multi-agent and thin vs thick harness are not stylistic preferences. They are choices about failure cost and operating complexity.
  • The best architecture is rarely the most advanced-looking one. It is usually the smallest structure that removes the current failure mode.
  • The right design questions therefore start with where things break, who owns the state, and where rollback is possible. Novelty comes later.

1. Why design should be discussed as decisions

The Chapter 12 notes in sources/260518_하네스엔지니어링_15장_블로그활용노트.md push toward a useful mindset: harness design becomes clearer when you surface the major architectural choices before implementation details bury them.

Examples:

  • should one agent do everything or should responsibilities split
  • should tools be broadly available or exposed minimally by stage
  • should verification happen mostly at the end or throughout the flow
  • should memory and policy live in provider surfaces or in assets you own

When these choices stay implicit, architecture becomes harder to reason about later.

2. Decision 1: single agent vs multi-agent

This is the most visible tradeoff, and one teams often overcomplicate too early.

| Choice | Advantages | Costs |
| --- | --- | --- |
| single agent | simpler debugging, clearer context continuity, lower coordination overhead | role conflicts can bloat prompts |
| multi-agent | role separation, possible parallelism, evaluator separation | handoffs, observability, and failure propagation become harder |

A practical rule of thumb:

  • stay single-agent while one workspace is enough
  • split only when repeated role conflict appears
  • do not split just because multi-agent systems look more advanced

Multi-agent structures create real value when separation has a concrete purpose, such as independent evaluation.
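
Independent evaluation can be sketched as follows. This is a minimal illustration, not a prescribed design: `produce`, `evaluate`, and `Verdict` are hypothetical names, and both roles are plain functions standing in for model calls. The point it shows is that the evaluator receives only the artifact and the acceptance criteria, never the producer's context.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    passed: bool
    reasons: tuple[str, ...]  # names of the criteria that failed


def produce(task: str) -> str:
    # Stand-in for the producing agent (a model call in a real system).
    return f"Draft for: {task}"


def evaluate(artifact: str, criteria: dict[str, Callable[[str], bool]]) -> Verdict:
    # The evaluator sees only the artifact and the criteria,
    # not the producer's prompt or intermediate reasoning.
    failures = tuple(name for name, check in criteria.items() if not check(artifact))
    return Verdict(passed=not failures, reasons=failures)


# Hypothetical acceptance criteria for illustration.
criteria = {
    "non_empty": lambda text: bool(text.strip()),
    "mentions_task": lambda text: "release notes" in text,
}

verdict = evaluate(produce("release notes"), criteria)
```

If the two roles would share the same context and the same criteria anyway, this separation adds a handoff without adding independence, which is the case where staying single-agent is cheaper.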

3. Decision 2: thin harness vs thick harness

A thin harness connects model and tools with minimal extra structure. A thick harness absorbs more approvals, policy, verification, observability, and handoff logic.

| Choice | Best fit | Risk |
| --- | --- | --- |
| thin harness | prototypes, exploration, low-risk personal workflows | limited control when failure appears |
| thick harness | team operations, higher-risk automation, long-running tasks | slower implementation and more upkeep |

The key is not to celebrate thickness for its own sake.

Start as thin as possible, then make the harness thicker only where repeated failures justify the added control.

For example, wrapping every action in approval logic is often excessive. Wrapping publishing or broad edits is not.
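
A selective approval gate might look like the sketch below. The action kinds, the `BROAD_EDIT_THRESHOLD` value, and the function names are assumptions made for illustration; the idea is only that ordinary edits pass through while publishing and broad edits stop for approval.

```python
from dataclasses import dataclass

# Hypothetical policy values; tune them to the failures you actually observe.
APPROVAL_REQUIRED_KINDS = {"publish"}
BROAD_EDIT_THRESHOLD = 5  # number of files touched by one edit action


@dataclass(frozen=True)
class Action:
    kind: str                      # e.g. "edit" or "publish"
    files: tuple[str, ...] = ()    # files the action would touch


def needs_approval(action: Action) -> bool:
    """Return True only for actions whose failure cost justifies a gate."""
    if action.kind in APPROVAL_REQUIRED_KINDS:
        return True
    return action.kind == "edit" and len(action.files) >= BROAD_EDIT_THRESHOLD


def run(action: Action, approved: bool = False) -> str:
    # Gated actions are blocked until someone explicitly approves them.
    if needs_approval(action) and not approved:
        return "blocked: approval required"
    return f"executed: {action.kind}"
```

A single-file edit runs immediately, while `run(Action("publish"))` returns the blocked message until it is called again with `approved=True`.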

4. Decision 3: broad tool exposure vs stage-specific minimal exposure

Should the model always see every available tool, or only the minimum set needed at a given stage?

Taken together, drafts/blog/260429_하네스시리즈04_도구와샌드박싱_블로그.md and drafts/blog/260429_하네스시리즈05_라우팅_블로그.md point to the same lesson: tools impose selection cost.

| Choice | Advantages | Risk |
| --- | --- | --- |
| broad exposure | simpler wiring, more flexibility | wrong-tool selection, excessive permissions |
| minimal staged exposure | lower confusion, clearer policy boundaries | requires orchestration design |

In practice, narrow tool surfaces are usually safer early on, especially when different actions carry very different risk.
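
Stage-specific exposure can be sketched as a lookup from stage to allowed tool names. The stage names and tool functions here are hypothetical stand-ins, not part of any real framework; the sketch assumes the harness, not the model, decides which tools exist at each stage.

```python
from typing import Callable


# Hypothetical tools; each is a plain function standing in for a real integration.
def read_file(path: str) -> str:
    return f"<contents of {path}>"


def write_draft(path: str, text: str) -> str:
    return f"wrote {len(text)} chars to {path}"


def publish(path: str) -> str:
    return f"published {path}"


ALL_TOOLS: dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_draft": write_draft,
    "publish": publish,
}

# Each stage lists only the tools it needs.
STAGE_TOOLS: dict[str, frozenset[str]] = {
    "research": frozenset({"read_file"}),
    "drafting": frozenset({"read_file", "write_draft"}),
    "release": frozenset({"publish"}),
}


def tools_for(stage: str) -> dict[str, Callable[..., str]]:
    """Return the tool subset to expose to the model at this stage."""
    allowed = STAGE_TOOLS.get(stage, frozenset())  # unknown stage: expose nothing
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}


def call_tool(stage: str, name: str, *args: str) -> str:
    tools = tools_for(stage)
    if name not in tools:
        raise PermissionError(f"tool {name!r} is not exposed in stage {stage!r}")
    return tools[name](*args)
```

The orchestration cost named in the table shows up here as the `STAGE_TOOLS` mapping: someone has to own it and keep it current.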

5. Decision 4: act immediately vs plan first

Should the system reason and act in one continuous loop, or pause to make a plan before execution?

| Choice | Strength | Weakness |
| --- | --- | --- |
| immediate action | fast and natural for short tasks | drifts more easily in long or constrained tasks |
| plan first | clearer scope control, easier review | adds upfront overhead |

Low-risk short tasks often benefit from immediate action. Multi-file, long-form, or heavily constrained work often benefits from explicit planning first. In a repository with ownership boundaries, a plan can also serve as a scope confirmation artifact.
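
A plan used as a scope confirmation artifact could be as small as the sketch below. The `Plan` fields and function names are assumptions for illustration: execution waits until the plan is confirmed, then stops as soon as a step touches a file the plan did not declare.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    goal: str
    files_in_scope: set[str]
    steps: list[str] = field(default_factory=list)
    confirmed: bool = False  # set once a reviewer accepts the declared scope


def out_of_scope(plan: Plan, touched_files: list[str]) -> list[str]:
    """List files a step touched that the plan did not declare."""
    return sorted(set(touched_files) - plan.files_in_scope)


def execute_step(plan: Plan, touched_files: list[str]) -> str:
    if not plan.confirmed:
        return "waiting: plan not confirmed"
    extra = out_of_scope(plan, touched_files)
    if extra:
        return f"stopped: out of scope {extra}"
    return "ok"
```

For a short, low-risk task this artifact is pure overhead, which is why the table lists upfront cost as the weakness of planning first.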

6. Decision 5: end-only verification vs verification throughout

Many teams think of validation as something that happens at the end. In operations, intermediate checks are often cheaper.

| Choice | Advantages | Risk |
| --- | --- | --- |
| end-only validation | simpler flow | expensive rework when failure is discovered late |
| verification throughout | early failure detection, lower waste | slightly more design effort |

The fast sensor principle from drafts/blog/260519_하네스시리즈C01_에이전트평가하네스_블로그.md applies directly here. Deterministic failures such as ownership, schema, or format violations should be caught early.
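
One way to picture this is a list of cheap deterministic checks that runs after every step and stops at the first failure. The required keys (`title`, `body`) and the check names are hypothetical; a real harness would substitute its own schema and ownership rules.

```python
import json
from typing import Callable, Optional

# A check returns an error message, or None when the output is acceptable.
Check = Callable[[str], Optional[str]]


def check_is_json_object(output: str) -> Optional[str]:
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        return f"format: not valid JSON ({exc.msg})"
    return None if isinstance(parsed, dict) else "format: expected a JSON object"


def check_required_keys(output: str) -> Optional[str]:
    # Hypothetical schema: every step output needs these two keys.
    # Runs after check_is_json_object, so the output is known to be a JSON object.
    missing = {"title", "body"} - set(json.loads(output))
    return f"schema: missing keys {sorted(missing)}" if missing else None


FAST_CHECKS: list[Check] = [check_is_json_object, check_required_keys]


def first_failure(output: str, checks: list[Check] = FAST_CHECKS) -> Optional[str]:
    """Run cheap checks in order and stop at the first failure."""
    for check in checks:
        error = check(output)
        if error is not None:
            return error
    return None
```

Because these checks need no model call, running them after each step costs little compared with discovering the same violation at the end of a long task.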

7. Decision 6: provider-dependent memory vs owned memory

Where should memory and policy history live?

| Choice | Advantages | Risk |
| --- | --- | --- |
| provider-dependent memory | fast startup, less infrastructure | lock-in, weaker portability, limited auditability |
| owned memory | portability, policy control, stronger auditability | more operational responsibility |

This is not only a storage choice. It is a decision about who owns the durable user and organizational experience. drafts/blog/260519_하네스시리즈C04_메모리소유권_블로그.md explores that argument in depth.
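
As a rough illustration of owned memory, the sketch below keeps records in an append-only JSON Lines file that you control. The class name, record fields, and file layout are assumptions for this example, not a description of any provider's or the series' actual memory format.

```python
import json
import time
from pathlib import Path
from typing import Optional


class OwnedMemory:
    """Append-only JSON Lines store kept in a file you control."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def remember(self, key: str, value: str, source: str) -> None:
        # Records are only appended, never rewritten, so history stays auditable.
        record = {"ts": time.time(), "key": key, "value": value, "source": source}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")

    def history(self, key: str) -> list[dict]:
        """Every record written for a key, oldest first."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            records = [json.loads(line) for line in fh if line.strip()]
        return [r for r in records if r["key"] == key]

    def recall(self, key: str) -> Optional[str]:
        """Latest value for a key, or None if nothing was stored."""
        records = self.history(key)
        return records[-1]["value"] if records else None
```

The file can be copied, diffed, and inspected with ordinary tools, which is the portability and auditability the table refers to. The operational responsibility is also visible: backup, retention, and access control are now yours.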

8. Decision 7: model responsibility vs harness responsibility

This is the deepest design question: what should remain model judgment, and what should be externalized into enforceable system structure?

| Responsibility area | Model-centered | Harness-centered |
| --- | --- | --- |
| tool selection | prompt guidance | control by exposure and policy |
| format compliance | ask nicely | enforce with schema, templates, post-checks |
| safety boundaries | instruction warnings | enforce with permissions, sandboxing, approvals |
| task resumption | rely on conversation state | externalize with handoff artifacts |

As systems mature, repeated rules usually move downward into the harness, because model behavior varies from run to run while a harness can enforce the same rule every time.
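
The task resumption row can be made concrete with a small handoff artifact. This is a sketch under assumptions: the `Handoff` fields and the JSON file layout are hypothetical, chosen only to show state living outside the conversation.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Handoff:
    task_id: str
    done: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    notes: str = ""


def save_handoff(handoff: Handoff, path: Path) -> None:
    # Persist progress so a later session does not depend on chat history.
    path.write_text(json.dumps(asdict(handoff), indent=2), encoding="utf-8")


def load_handoff(path: Path) -> Handoff:
    return Handoff(**json.loads(path.read_text(encoding="utf-8")))


def resume(path: Path) -> Optional[str]:
    """Return the next remaining step from the artifact, or None when finished."""
    handoff = load_handoff(path)
    return handoff.remaining[0] if handoff.remaining else None
```

A new session, or a different agent, can call `resume` on the file and continue from the recorded state instead of reconstructing it from conversation history.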

9. Reading single vs multi-agent together with thin vs thick

These two axes are easy to misunderstand in isolation. They are better read together.

| Combination | Best fit | Watch out for |
| --- | --- | --- |
| single + thin | personal drafting, exploratory work | weak operational safety |
| single + thick | constrained real workflows | prompt bloat and rigidity |
| multi + thin | experimental delegation | weak traceability and unclear responsibility |
| multi + thick | high-risk long-running automation | maximum cost and complexity |

Most systems should begin at single + thin, then move selectively toward single + thick. Multi-agent design is usually a later concern.

10. Applying this to our repository

This workspace already encodes several architectural choices.

  • publishing is gated by explicit approval rather than silent automation
  • config/ is protected because its ownership and risk profile are different
  • work scope is kept narrow through explicit file boundaries
  • user-facing and internal language responsibilities are separated

These are not just workflow preferences. They are statements about what should not be delegated to the model.
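
The protection of config/ could be enforced with a path guard along these lines. This is an illustrative sketch, not the repository's actual mechanism: `PROTECTED_DIRS`, `is_protected`, and `guard_write` are hypothetical names, and the guard assumes repository-relative POSIX paths.

```python
from pathlib import PurePosixPath

# Hypothetical policy mirroring the protected config/ directory.
PROTECTED_DIRS = (PurePosixPath("config"),)


def is_protected(target: str) -> bool:
    """Return True when a write to this repository-relative path must be refused."""
    path = PurePosixPath(target)
    # Treat absolute paths and parent traversal as protected instead of resolving them.
    if path.is_absolute() or ".." in path.parts:
        return True
    return any(d == path or d in path.parents for d in PROTECTED_DIRS)


def guard_write(target: str) -> str:
    if is_protected(target):
        return f"refused: {target} is protected"
    return f"allowed: {target}"
```

The check lives in the harness, so it holds regardless of what the prompt says or how the model interprets it.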

11. Common failure modes

Defaulting to multi-agent too early

Coordination overhead often exceeds the actual benefit.

Treating thin harnesses as inherently better

They are fast early, but repeated operational failures can make them more expensive over time.

Leaving verification to the end

This delays the discovery of deterministic failures that could have been blocked cheaply.

Postponing ownership decisions

If memory, logs, and policy assets scatter across surfaces, later migration and auditing become much harder.

12. A practical decision checklist

Before expanding architecture, answer these questions first.

  1. What is the most expensive current failure?
  2. Can a single structure solve it, or is real role separation needed?
  3. How early in the flow should approvals and verification run?
  4. Who owns the tool, memory, and policy boundaries?
  5. How will the impact of this architectural choice be verified?

Good harness design is not about choosing fashionable abstractions. It is about deciding where to buy complexity and where to protect simplicity.

References

  • docs/blog_series_하네스엔지니어링_총괄_design.md
  • sources/260518_하네스엔지니어링_15장_블로그활용노트.md
  • drafts/blog/260429_하네스시리즈04_도구와샌드박싱_블로그.md
  • drafts/blog/260429_하네스시리즈05_라우팅_블로그.md
  • drafts/blog/260429_하네스시리즈06_평가운영_블로그.md
  • drafts/blog/260519_하네스시리즈C01_에이전트평가하네스_블로그.md
  • drafts/blog/260519_하네스시리즈C04_메모리소유권_블로그.md

This is Part 2/4 of the Patterns, Strategy, and Cases series. Previous: 12 harness patterns. Next: why the harness is everything, from ACI to agent-first engineering.

Series overview: Harness Engineering Series Guide
