"12 Harness Patterns (1/4) — Repeatable Structures for Building Reliable AI Agents"
When teams try to improve an agent, they often start by changing the model or tweaking prompts. But repeatability usually comes from structure, not from one clever setting. Harness patterns are a way to name those structures so they can be reused, reviewed, and grown intentionally.
Key Takeaways
- Harness patterns are not a feature catalog. They are repeatable design structures for building stable agent work environments.
- You do not need all patterns at once. In practice, they tend to grow in the order of instruction -> tools -> verification -> operations.
- The twelve patterns in this article can be grouped into three maturity stages: foundation, growth, and operations.
- The real goal of maturity is not adding more moving parts. It is reducing failure radius while increasing reproducibility.
- So the right question is not "what else can we add?" but "which pattern removes the biggest current failure mode?"
1. Why a pattern language helps
The Chapter 11 notes in sources/260518_ํ๋ค์ค์์ง๋์ด๋ง_15์ฅ_๋ธ๋ก๊ทธํ์ฉ๋ ธํธ.md point in an important direction: harness design is easier to scale when it is discussed as a pattern language rather than as a bag of product features.
That matters because:
- the same model behaves very differently under different work environments
- successful setups need named structures if they are to be reused
- teams need a vocabulary richer than "should we add this tool?"
Patterns are therefore not cosmetic taxonomy. They are a compact way to discuss repeatable architecture.
2. The 12-pattern map
This D1 entry groups the patterns into four buckets.
| Bucket | Count | Main question |
|---|---|---|
| instructions and context | 3 | does the model know where and how it is working |
| tools and actions | 3 | is the action surface narrow and understandable |
| verification and recovery | 3 | can failure be detected early and resumed cleanly |
| operations and ownership | 3 | does the system remain reliable over time |
This is really a strategic reframing of ideas already introduced across Series A and C.
3. Four foundation patterns
The goal at this stage is not "make it powerful." It is "make it understandable and repeatable."
Pattern 1. Explicit instruction surface
Keep the first-layer rules the agent reads, such as AGENTS.md, CLAUDE.md, or system instructions, separate from task-specific content. The surface should state role, constraints, and output criteria clearly.
Pattern 2. Context layering
Do not mix user requests, durable rules, reference material, and tool output into one undifferentiated blob. Layering improves both reliability and debugging.
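One way to picture layering is a renderer that keeps each kind of input under its own labeled delimiter and always emits them in the same order. The sketch below is illustrative only: the layer names, the `ContextLayer` type, and the tag-style delimiters are assumptions, not something this series prescribes.

```python
from dataclasses import dataclass

# Hypothetical layer names, based on the four kinds of input named above.
LAYER_ORDER = ("durable_rules", "reference_material", "tool_output", "user_request")


@dataclass(frozen=True)
class ContextLayer:
    name: str     # expected to be one of LAYER_ORDER
    content: str  # raw text for that layer


def render_context(layers: list[ContextLayer]) -> str:
    """Render layers in a fixed order, each inside its own labeled block."""
    by_name = {layer.name: layer for layer in layers}
    unknown = set(by_name) - set(LAYER_ORDER)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    parts = []
    for name in LAYER_ORDER:
        if name in by_name:
            parts.append(f"<{name}>\n{by_name[name].content}\n</{name}>")
    return "\n\n".join(parts)
```

Because every layer is labeled, a bad answer can be traced back to the layer that supplied the misleading text, which is where the debugging benefit comes from.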
Pattern 3. Narrow tool surface
Prefer a small set of clearly named tools over a large ambiguous tool catalog. As drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ04_๋๊ตฌ์์๋๋ฐ์ฑ_๋ธ๋ก๊ทธ.md argues, tool quality is often about selection clarity, not count.
Pattern 4. Fixed output contract
Stabilize the shape of the result: structured answer, checklist, JSON, frontmatter, or another predictable format. A stable contract makes downstream verification possible.
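As a rough illustration, a fixed contract can be as small as a table of required fields plus a checker that reports violations. The field names in `CONTRACT` and the `check_contract` helper are hypothetical; a real harness might use a JSON Schema validator instead.

```python
import json

# Hypothetical contract: field name -> required Python type.
CONTRACT = {"status": str, "summary": str, "changed_files": list}


def check_contract(raw: str) -> list[str]:
    """Return contract violations; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for field_name, expected in CONTRACT.items():
        if field_name not in data:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(data[field_name], expected):
            errors.append(f"wrong type for {field_name}: expected {expected.__name__}")
    extra = sorted(set(data) - set(CONTRACT))
    if extra:
        errors.append(f"unexpected fields: {extra}")
    return errors
```

Returning a list of violations rather than raising on the first one gives later verification steps a complete picture of what went wrong.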
Without these four, later patterns often become unstable.
4. Four growth patterns
At this stage, the goal is to turn a fragile loop into a controllable execution flow.
Pattern 5. Stage-based workflow
Externalize the major phases when needed: observe, plan, act, verify. Long tasks especially benefit from clearer stage boundaries.
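A minimal sketch of externalized stages is an explicit transition table, so that skipping a phase becomes an error instead of a silent shortcut. The `DONE` stage and the rule that a failed verification returns to planning are assumptions added for illustration.

```python
from enum import Enum


class Stage(Enum):
    OBSERVE = "observe"
    PLAN = "plan"
    ACT = "act"
    VERIFY = "verify"
    DONE = "done"  # assumed terminal stage, not named in the text


# Assumed transitions: a failed verification loops back to planning.
TRANSITIONS = {
    Stage.OBSERVE: {Stage.PLAN},
    Stage.PLAN: {Stage.ACT},
    Stage.ACT: {Stage.VERIFY},
    Stage.VERIFY: {Stage.DONE, Stage.PLAN},
    Stage.DONE: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, rejecting transitions the table does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```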
Pattern 6. Failure-first verification
Move cheap deterministic checks to the front. Schema violations, ownership violations, missing sections, and invalid paths should be caught before expensive judgment layers.
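The ordering can be sketched as a verifier that runs cheap checks first and only calls the expensive judge when all of them pass. Everything named here is hypothetical: the required sections, the protected `config/` prefix, and the `expensive_judge` callable standing in for a model-based or human review.

```python
from typing import Callable, Optional

# A check returns an error message, or None when it passes.
Check = Callable[[dict], Optional[str]]

REQUIRED_SECTIONS = ("summary", "changes")  # hypothetical required sections
PROTECTED_PREFIXES = ("config/",)           # hypothetical ownership rule


def check_sections(result: dict) -> Optional[str]:
    missing = [s for s in REQUIRED_SECTIONS if s not in result]
    return f"missing sections: {missing}" if missing else None


def check_ownership(result: dict) -> Optional[str]:
    touched = [p for p in result.get("changed_files", []) if p.startswith(PROTECTED_PREFIXES)]
    return f"edits in protected paths: {touched}" if touched else None


def verify(result: dict, cheap_checks: list[Check], expensive_judge: Check) -> Optional[str]:
    """Run deterministic checks first; call the expensive judge only if all pass."""
    for check in cheap_checks:
        error = check(result)
        if error is not None:
            return error  # fail fast, before any expensive judgment
    return expensive_judge(result)
```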
Pattern 7. Handoff artifacts
For long-running work, write resumable external state: current stage, next action, blockers, verification status. Memory helps later, but handoff often matters first.
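The four fields listed above map naturally onto a small record written to disk. This is a sketch under assumptions: the `Handoff` class, the JSON format, and the default `"not_run"` status are illustrative choices, not a format defined by this series.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Handoff:
    current_stage: str
    next_action: str
    blockers: list[str] = field(default_factory=list)
    verification_status: str = "not_run"  # assumed default value


def write_handoff(path: Path, handoff: Handoff) -> None:
    """Persist resumable state as JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(handoff), indent=2), encoding="utf-8")


def read_handoff(path: Path) -> Handoff:
    """Load a previously written handoff so a new session can resume from it."""
    return Handoff(**json.loads(path.read_text(encoding="utf-8")))
```

A new session can call `read_handoff` first and start from `next_action` instead of reconstructing state from conversation history.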
Pattern 8. Approval boundaries
Do not treat reading, writing, execution, and external publishing as the same risk category. High-impact actions need separate approval or denial boundaries.
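One possible shape for this is a policy table keyed by action type, with unknown actions denied by default. The specific risk assignments below, and the denied `config/` prefix, are assumptions for illustration; each team would set its own.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"
    PUBLISH = "publish"


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical policy table; the risk levels are illustrative assumptions.
POLICY = {
    Action.READ: Decision.ALLOW,
    Action.WRITE: Decision.ALLOW,
    Action.EXECUTE: Decision.REQUIRE_APPROVAL,
    Action.PUBLISH: Decision.REQUIRE_APPROVAL,
}
DENIED_WRITE_PREFIXES = ("config/",)  # hypothetical no-edit zone


def decide(action: Action, path: str = "") -> Decision:
    """Map an action (and optional target path) to an approval decision."""
    if action is Action.WRITE and path.startswith(DENIED_WRITE_PREFIXES):
        return Decision.DENY
    return POLICY.get(action, Decision.DENY)  # anything unlisted is denied
```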
This is where a harness starts becoming operational rather than merely interactive.
5. Four operational patterns
The goal at this stage is not a smarter demo. It is a system that keeps working over time.
Pattern 9. Regression evaluation set
Keep real failure cases, not just success demos. This makes model changes, policy changes, and tool changes easier to evaluate safely.
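A regression set can start as a list of recorded failures replayed against the current agent. In this sketch the `RegressionCase` fields, the substring pass criterion, and the `agent` callable are all simplifying assumptions; real cases usually need richer assertions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegressionCase:
    case_id: str
    prompt: str        # input that once caused a failure
    must_contain: str  # simplistic pass criterion, assumed for illustration


def run_regressions(agent: Callable[[str], str], cases: list[RegressionCase]) -> list[str]:
    """Return the ids of cases whose output lacks the expected text."""
    return [c.case_id for c in cases if c.must_contain not in agent(c.prompt)]
```

Running the same cases before and after a model, policy, or tool change turns "did anything break?" into a list of case ids.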
Pattern 10. Observability
Track cost, latency, tool usage, and failure reasons. A system that occasionally answers well but cannot explain slowdown or failure is hard to operate.
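The four signals can be collected with a thin wrapper around each tool call. This is a sketch, not a monitoring design: `RunMetrics`, `call_tool`, and the per-call `cost_usd` argument are hypothetical names, and failure reasons are approximated by exception type.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    tool_calls: Counter = field(default_factory=Counter)  # calls per tool name
    failures: Counter = field(default_factory=Counter)    # failures per exception type
    latency_s: float = 0.0                                # total time spent in tools
    cost_usd: float = 0.0                                 # caller-supplied cost estimate


def call_tool(metrics: RunMetrics, name: str, fn, *args, cost_usd: float = 0.0):
    """Invoke a tool while recording usage, latency, cost, and failure reason."""
    start = time.perf_counter()
    metrics.tool_calls[name] += 1
    try:
        return fn(*args)
    except Exception as exc:
        metrics.failures[type(exc).__name__] += 1
        raise  # record the failure, then let the caller handle it
    finally:
        metrics.latency_s += time.perf_counter() - start
        metrics.cost_usd += cost_usd
```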
Pattern 11. Memory ownership
Keep long-term memory in organization-controlled storage and schemas whenever possible. This improves portability, auditability, and vendor independence.
Pattern 12. Role separation
Separate generator and evaluator, executor and approver, primary agent and delegated agent when the value is real. The point is not fashionable multi-agent design. It is separating bias and failure domains.
6. Reading the patterns as a maturity roadmap
The important question is not whether all twelve exist. It is which ones matter most now.
| Stage | Highest-value patterns | Typical failure | Main goal |
|---|---|---|---|
| early | 1, 2, 3, 4 | instruction conflict, wrong tool choice, unstable format | reproducible minimum loop |
| growing | 5, 6, 7, 8 | long-task collapse, policy mistakes, missing verification | controllable execution |
| operating | 9, 10, 11, 12 | undetected regressions, cost spikes, lock-in, unclear responsibility | sustained quality |
This is why maturity should not be confused with complexity. If the output contract is still unstable, adding multi-agent delegation usually spreads the confusion instead of solving it.
7. Mapping this to our repository
This workspace already distributes several patterns across explicit artifacts and boundaries.
| Pattern | Repository example |
|---|---|
| explicit instruction surface | AGENTS.md, CLAUDE.md |
| context layering | tasks/plan.md, docs/memory-map.md, user request separation |
| handoff artifacts | tasks/handoffs/, tasks/sessions/ |
| approval boundaries | explicit confirmation before publishing, no edits in config/ |
| regression evaluation set | verification flows around scripts/** |
Patterns become useful only when they map to real files, hooks, and operating boundaries.
8. Common failure modes
Treating patterns like collectibles
If patterns are added without a failure model behind them, complexity rises faster than reliability.
Pulling in operational patterns too early
Heavy orchestration on top of unstable basics usually slows the system without making it safer.
Equating maturity with multi-agent design
Role separation is a late-stage pattern, not a badge of seriousness.
Leaving ownership implicit
A pattern with no document, boundary, or storage owner is not yet an operational asset.
9. Practical starting point
Use the pattern language against your current failures.
- Identify the failure that happens most often.
- Find the smallest pattern that would reduce it.
- Decide whether that pattern belongs in docs, tools, verification, or approvals.
- Define how you will verify that the pattern helped.
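The first two steps above can be sketched as a tally over a failure log. The failure tags and the tag-to-pattern mapping are assumptions made for illustration; the roadmap table ties these failures to stages, not to individual patterns.

```python
from collections import Counter
from typing import Optional

# Hypothetical mapping from failure tags to pattern numbers in this article.
FAILURE_TO_PATTERN = {
    "instruction_conflict": 1,  # explicit instruction surface
    "wrong_tool_choice": 3,     # narrow tool surface
    "unstable_format": 4,       # fixed output contract
    "missing_verification": 6,  # failure-first verification
}


def next_pattern(failure_log: list[str]) -> Optional[tuple[str, Optional[int]]]:
    """Return the most frequent failure tag and the pattern assumed to address it."""
    if not failure_log:
        return None
    tag, _count = Counter(failure_log).most_common(1)[0]
    return tag, FAILURE_TO_PATTERN.get(tag)  # None if no pattern is mapped yet
```

Rerunning the same tally after the pattern is introduced is one simple way to check whether it helped.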
Good harnesses rarely begin by doing more. They usually begin by failing less, and failing more predictably.
References
- docs/blog_series_ํ๋ค์ค์์ง๋์ด๋ง_์ด๊ด_design.md
- sources/260518_ํ๋ค์ค์์ง๋์ด๋ง_15์ฅ_๋ธ๋ก๊ทธํ์ฉ๋ ธํธ.md
- drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ04_๋๊ตฌ์์๋๋ฐ์ฑ_๋ธ๋ก๊ทธ.md
- drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ05_๋ผ์ฐํ _๋ธ๋ก๊ทธ.md
- drafts/blog/260429_ํ๋ค์ค์๋ฆฌ์ฆ06_ํ๊ฐ์ด์_๋ธ๋ก๊ทธ.md
- drafts/blog/260519_ํ๋ค์ค์๋ฆฌ์ฆC01_์์ด์ ํธํ๊ฐํ๋ค์ค_๋ธ๋ก๊ทธ.md
This is Part 1/4 of the Patterns, Strategy, and Cases series. Next: seven harness design decisions, including single vs multi-agent, thin vs thick harnesses, and where ownership boundaries should live.
Series overview: Harness Engineering Series Guide