"12 Harness Patterns (1/4) — Repeatable Structures for Building Reliable AI Agents"

When teams try to improve an agent, they often start by changing the model or tweaking prompts. But repeatability usually comes from structure, not from one clever setting. Harness patterns are a way to name those structures so they can be reused, reviewed, and grown intentionally.


Key Takeaways

  • Harness patterns are not a feature catalog. They are repeatable design structures for building stable agent work environments.
  • You do not need all patterns at once. In practice, they tend to grow in the order of instruction -> tools -> verification -> operations.
  • The twelve patterns in this article can be grouped into three maturity stages: foundation, growth, and operations.
  • The real goal of maturity is not adding more moving parts. It is reducing failure radius while increasing reproducibility.
  • So the right question is not "what else can we add?" but "which pattern removes the biggest current failure mode?"

1. Why a pattern language helps

The Chapter 11 notes in sources/260518_하네스엔지니어링_15장_블로그활용노트.md point in an important direction: harness design is easier to scale when it is discussed as a pattern language rather than as a bag of product features.

That matters because:

  • the same model behaves very differently under different work environments
  • successful setups need named structures if they are to be reused
  • teams need a vocabulary richer than "should we add this tool?"

Patterns are therefore not cosmetic taxonomy. They are a compact way to discuss repeatable architecture.

2. The 12-pattern map

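This entry, D1, groups the twelve patterns by concern into four buckets of three. Sections 3 to 5 then regroup the same twelve patterns by maturity stage.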

| Bucket | Count | Main question |
|---|---|---|
| instructions and context | 3 | Does the model know where and how it is working? |
| tools and actions | 3 | Is the action surface narrow and understandable? |
| verification and recovery | 3 | Can failure be detected early and resumed cleanly? |
| operations and ownership | 3 | Does the system remain reliable over time? |

This is really a strategic reframing of ideas already introduced across Series A and C.

3. Four foundation patterns

The goal at this stage is not "make it powerful." It is "make it understandable and repeatable."

Pattern 1. Explicit instruction surface

Keep the first-layer rules the agent reads, such as AGENTS.md, CLAUDE.md, or system instructions, on their own surface, separate from task content. That surface should state role, constraints, and output criteria clearly.
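
One way to keep that surface honest is a small lint that checks whether an instruction file actually names all three elements. This is a minimal sketch: the heading names in REQUIRED_SECTIONS and the function missing_sections are hypothetical conventions, not something AGENTS.md or CLAUDE.md require.

```python
# Hypothetical convention: each required element appears as its own heading line.
REQUIRED_SECTIONS = ("Role", "Constraints", "Output criteria")


def missing_sections(instruction_text: str) -> list[str]:
    """Return the required headings that never appear as a line of the text."""
    present = {
        line.strip().lstrip("#").strip().lower()
        for line in instruction_text.splitlines()
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in present]
```

A check like this is cheap enough to run whenever the instruction file changes.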

Pattern 2. Context layering

Do not mix user requests, durable rules, reference material, and tool output into one undifferentiated blob. Layering improves both reliability and debugging.
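
As a rough illustration, layering can be as simple as rendering each kind of content under its own labeled delimiter in a fixed order. The layer names, the tag-style delimiters, and build_prompt are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextLayer:
    name: str
    content: str


# Hypothetical layer names and order; durable rules first, user request last.
LAYER_ORDER = ("durable_rules", "reference_material", "tool_output", "user_request")


def build_prompt(layers: list[ContextLayer]) -> str:
    """Render known layers in a fixed order, each inside its own delimiter."""
    by_name = {layer.name: layer for layer in layers}
    unknown = set(by_name) - set(LAYER_ORDER)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    parts = [
        f"<{name}>\n{by_name[name].content}\n</{name}>"
        for name in LAYER_ORDER
        if name in by_name
    ]
    return "\n\n".join(parts)
```

Because every layer is labeled, a bad answer can be traced to the layer that fed it, which is where the debugging benefit comes from.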

Pattern 3. Narrow tool surface

Prefer a small set of clearly named tools over a large ambiguous tool catalog. As drafts/blog/260429_하네스시리즈04_도구와샌드박싱_블로그.md argues, tool quality is often about selection clarity, not count.
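
A narrow surface can be enforced rather than merely recommended. The sketch below assumes a hypothetical ToolRegistry with an arbitrary cap (MAX_TOOLS) and a rule that every tool carries a description; neither comes from a real framework.

```python
from typing import Callable

MAX_TOOLS = 5  # hypothetical cap; pick whatever keeps selection unambiguous


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, tuple[str, Callable[..., str]]] = {}

    def register(self, name: str, description: str, fn: Callable[..., str]) -> None:
        """Add a tool, rejecting duplicates, blank descriptions, and overflow."""
        if name in self._tools:
            raise ValueError(f"duplicate tool name: {name}")
        if not description.strip():
            raise ValueError(f"tool {name!r} needs a description")
        if len(self._tools) >= MAX_TOOLS:
            raise ValueError("tool surface is full; remove a tool before adding one")
        self._tools[name] = (description, fn)

    def call(self, name: str, **kwargs) -> str:
        """Dispatch to a registered tool; unknown names fail loudly."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name][1](**kwargs)

    def describe(self) -> str:
        """One line per tool, suitable for showing to the model."""
        return "\n".join(
            f"{name}: {desc}" for name, (desc, _) in sorted(self._tools.items())
        )
```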

Pattern 4. Fixed output contract

Stabilize the shape of the result: structured answer, checklist, JSON, frontmatter, or another predictable format. A stable contract makes downstream verification possible.
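
For a JSON-shaped result, the contract can be checked mechanically. The field names in CONTRACT below are placeholders invented for this sketch; the point is only that missing, extra, or mistyped fields are rejected before anything downstream runs.

```python
import json

# Hypothetical contract: field name -> required JSON type.
CONTRACT = {"summary": str, "changed_files": list, "verified": bool}


def parse_result(raw: str) -> dict:
    """Parse agent output and raise ValueError on any contract violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("result must be a JSON object")
    missing = [key for key in CONTRACT if key not in data]
    extra = [key for key in data if key not in CONTRACT]
    wrong = [
        key for key, expected in CONTRACT.items()
        if key in data and not isinstance(data[key], expected)
    ]
    if missing or extra or wrong:
        raise ValueError(
            f"contract violation: missing={missing} extra={extra} wrong_type={wrong}"
        )
    return data
```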

Without these four, later patterns often become unstable.

4. Four growth patterns

At this stage, the goal is to turn a fragile loop into a controllable execution flow.

Pattern 5. Stage-based workflow

Externalize the major phases when needed: observe, plan, act, verify. Long tasks especially benefit from clearer stage boundaries.
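
Externalizing the phases can start as an explicit transition table. In this sketch the DONE stage and the rule that a failed verification returns to planning are assumptions added for illustration; the four named phases come from the text above.

```python
from enum import Enum


class Stage(Enum):
    OBSERVE = "observe"
    PLAN = "plan"
    ACT = "act"
    VERIFY = "verify"
    DONE = "done"  # assumed terminal stage


# Assumed transitions: verification either finishes or sends work back to planning.
ALLOWED = {
    Stage.OBSERVE: {Stage.PLAN},
    Stage.PLAN: {Stage.ACT},
    Stage.ACT: {Stage.VERIFY},
    Stage.VERIFY: {Stage.DONE, Stage.PLAN},
    Stage.DONE: set(),
}


def advance(current: Stage, nxt: Stage) -> Stage:
    """Move to the next stage, refusing any transition not in the table."""
    if nxt not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {nxt.value}")
    return nxt
```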

Pattern 6. Failure-first verification

Move cheap deterministic checks to the front. Schema violations, ownership violations, missing sections, and invalid paths should be caught before expensive judgment layers.
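
A sketch of that ordering: deterministic checks run first, and the expensive judge is only called if all of them pass. The result shape, ALLOWED_ROOTS, and the check functions are hypothetical; expensive_judge stands in for whatever slow evaluator a team uses.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

Check = Callable[[dict], Optional[str]]  # returns a failure reason, or None

ALLOWED_ROOTS = ("drafts/", "tasks/")  # hypothetical owned directories


def check_sections(result: dict) -> Optional[str]:
    missing = [key for key in ("summary", "changes") if key not in result]
    return f"missing sections: {missing}" if missing else None


def check_ownership(result: dict) -> Optional[str]:
    for path in result.get("changes", []):
        escapes = ".." in PurePosixPath(path).parts
        if escapes or not path.startswith(ALLOWED_ROOTS):
            return f"path outside owned roots: {path}"
    return None


def verify(
    result: dict,
    cheap_checks: list[Check],
    expensive_judge: Callable[[dict], bool],
) -> tuple[bool, str]:
    """Stop at the first cheap failure; call the judge only when all pass."""
    for check in cheap_checks:
        reason = check(result)
        if reason is not None:
            return False, reason
    if expensive_judge(result):
        return True, "ok"
    return False, "judge rejected result"
```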

Pattern 7. Handoff artifacts

For long-running work, write resumable external state: current stage, next action, blockers, verification status. Memory helps later, but handoff often matters first.
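
A handoff artifact can be a small JSON file with exactly those four fields. The Handoff class, the default status value, and the write-then-rename step are choices made for this sketch rather than a prescribed format.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Handoff:
    current_stage: str
    next_action: str
    blockers: list[str] = field(default_factory=list)
    verification_status: str = "not_run"  # hypothetical default label


def write_handoff(path: Path, handoff: Handoff) -> None:
    """Write to a temporary file, then rename it over the target path."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(asdict(handoff), indent=2), encoding="utf-8")
    tmp.replace(path)


def read_handoff(path: Path) -> Handoff:
    """Load the state a new session needs in order to resume."""
    return Handoff(**json.loads(path.read_text(encoding="utf-8")))
```

Writing to a temporary file first means a session that dies mid-write is less likely to leave a half-written handoff behind.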

Pattern 8. Approval boundaries

Do not treat reading, writing, execution, and external publishing as the same risk category. High-impact actions need separate approval or denial boundaries.
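
One minimal way to encode such boundaries is a policy table keyed by risk category. The specific decisions below (confirmation before executing or publishing, no changes under config/) are illustrative defaults, and Risk, Decision, and decide are hypothetical names.

```python
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"
    PUBLISH = "publish"


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # needs explicit human confirmation
    DENY = "deny"


# Illustrative policy; each team would set its own.
POLICY = {
    Risk.READ: Decision.ALLOW,
    Risk.WRITE: Decision.ALLOW,
    Risk.EXECUTE: Decision.ASK,
    Risk.PUBLISH: Decision.ASK,
}
DENIED_PREFIXES = ("config/",)  # hypothetical protected paths


def decide(risk: Risk, target: str = "") -> Decision:
    """Deny non-read actions on protected paths, else follow the policy table."""
    if risk is not Risk.READ and target.startswith(DENIED_PREFIXES):
        return Decision.DENY
    return POLICY.get(risk, Decision.DENY)
```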

This is where a harness starts becoming operational rather than merely interactive.

5. Four operational patterns

The goal at this stage is not a smarter demo. It is a system that keeps working over time.

Pattern 9. Regression evaluation set

Keep real failure cases, not just success demos. This makes model changes, policy changes, and tool changes easier to evaluate safely.
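
A regression set can begin as a list of past failures with a simple pass criterion each. In this sketch the Case fields and the substring check are deliberate simplifications, and agent is any callable that maps a prompt to text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    must_contain: str  # simplistic pass criterion, assumed for this sketch


def run_regression(agent: Callable[[str], str], cases: list[Case]) -> dict[str, bool]:
    """Run every stored case and record pass or fail per case id."""
    return {case.case_id: case.must_contain in agent(case.prompt) for case in cases}


def regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Case ids that passed before a change and fail (or vanish) after it."""
    return sorted(
        case_id for case_id, passed in before.items()
        if passed and not after.get(case_id, False)
    )
```

Comparing a run before and after a model, policy, or tool change turns "did anything break?" into a list of case ids.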

Pattern 10. Observability

Track cost, latency, tool usage, and failure reasons. A system that occasionally answers well but cannot explain slowdown or failure is hard to operate.
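
The four signals can be collected with very little machinery. The sketch below assumes a hypothetical RunMetrics record and a track_tool context manager wrapped around each tool call; the cost figure is supplied by the caller, not measured.

```python
import time
from collections import Counter
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    tool_calls: Counter = field(default_factory=Counter)
    failures: Counter = field(default_factory=Counter)
    latency_s: float = 0.0
    cost_usd: float = 0.0


@contextmanager
def track_tool(metrics: RunMetrics, tool: str, cost_usd: float = 0.0):
    """Record usage, latency, cost, and the failure reason of one tool call."""
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        metrics.failures[type(exc).__name__] += 1  # failure reason by exception type
        raise
    finally:
        metrics.tool_calls[tool] += 1
        metrics.latency_s += time.perf_counter() - start
        metrics.cost_usd += cost_usd
```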

Pattern 11. Memory ownership

Keep long-term memory in organization-controlled storage and schemas whenever possible. This improves portability, auditability, and vendor independence.
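
As one possible shape for organization-controlled memory, the sketch below keeps entries in a local SQLite table with an explicit schema and a plain JSON export. SQLite, the table layout, and the function names are assumptions chosen for illustration; any storage the organization owns would serve the same purpose.

```python
import json
import sqlite3

# Hypothetical schema; "source" records where a memory came from, for audits.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY,
    scope TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def remember(conn: sqlite3.Connection, scope: str, content: str, source: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO memories (scope, content, source) VALUES (?, ?, ?)",
            (scope, content, source),
        )


def export_json(conn: sqlite3.Connection) -> str:
    """Dump all memories in a vendor-neutral format."""
    keys = ("scope", "content", "source", "created_at")
    rows = conn.execute(
        "SELECT scope, content, source, created_at FROM memories ORDER BY id"
    ).fetchall()
    return json.dumps([dict(zip(keys, row)) for row in rows], ensure_ascii=False)
```

The export function is the portability test: if memory cannot be dumped in a neutral format, it is not really owned.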

Pattern 12. Role separation

Separate generator and evaluator, executor and approver, primary agent and delegated agent when the value is real. The point is not fashionable multi-agent design. It is separating bias and failure domains.
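
A generator and evaluator split does not require a framework. In this sketch both roles are plain callables with assumed signatures, and the retry limit is arbitrary; what matters is that the component producing a draft never decides whether it passes.

```python
from typing import Callable

Generator = Callable[[str, str], str]               # (task, feedback) -> draft
Evaluator = Callable[[str, str], tuple[bool, str]]  # (task, draft) -> (ok, feedback)


def generate_then_evaluate(
    generate: Generator,
    evaluate: Evaluator,
    task: str,
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Return the first accepted draft and the attempt number that produced it."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(task, feedback)
        ok, feedback = evaluate(task, draft)
        if ok:
            return draft, attempt
    raise RuntimeError(f"no draft accepted after {max_attempts} attempts: {feedback}")
```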

6. Reading the patterns as a maturity roadmap

The important question is not whether all twelve exist. It is which ones matter most now.

| Stage | Highest-value patterns | Typical failure | Main goal |
|---|---|---|---|
| early | 1, 2, 3, 4 | instruction conflict, wrong tool choice, unstable format | reproducible minimum loop |
| growing | 5, 6, 7, 8 | long-task collapse, policy mistakes, missing verification | controllable execution |
| operating | 9, 10, 11, 12 | undetected regressions, cost spikes, lock-in, unclear responsibility | sustained quality |

This is why maturity should not be confused with complexity. If the output contract is still unstable, adding multi-agent delegation usually spreads the confusion instead of solving it.

7. Mapping this to our repository

This workspace already distributes several patterns across explicit artifacts and boundaries.

| Pattern | Repository example |
|---|---|
| explicit instruction surface | AGENTS.md, CLAUDE.md |
| context layering | tasks/plan.md, docs/memory-map.md, user request separation |
| handoff artifacts | tasks/handoffs/, tasks/sessions/ |
| approval boundaries | explicit confirmation before publishing, no edits in config/ |
| regression discipline | verification flows around scripts/** |

Patterns become useful only when they map to real files, hooks, and operating boundaries.

8. Common failure modes

Treating patterns like collectibles

If patterns are added without a failure model behind them, complexity rises faster than reliability.

Pulling in operational patterns too early

Heavy orchestration on top of unstable basics usually slows the system without making it safer.

Equating maturity with multi-agent design

Role separation is a late-stage pattern, not a badge of seriousness.

Leaving ownership implicit

A pattern with no document, boundary, or storage owner is not yet an operational asset.

9. Practical starting point

Use the pattern language against your current failures.

  1. Identify the failure that happens most often.
  2. Find the smallest pattern that would reduce it.
  3. Decide whether that pattern belongs in docs, tools, verification, or approvals.
  4. Define how you will verify that the pattern helped.
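
The first two steps can be sketched against the roadmap table in section 6: count logged failures, take the most frequent one, and look up its stage and candidate patterns. The failure labels mirror that table; the log format and next_focus are assumptions for this example.

```python
from collections import Counter

# Failure labels and stages taken from the maturity roadmap table.
FAILURE_TO_STAGE = {
    "instruction conflict": "early",
    "wrong tool choice": "early",
    "unstable format": "early",
    "long-task collapse": "growing",
    "policy mistakes": "growing",
    "missing verification": "growing",
    "undetected regressions": "operating",
    "cost spikes": "operating",
    "lock-in": "operating",
    "unclear responsibility": "operating",
}
STAGE_TO_PATTERNS = {
    "early": (1, 2, 3, 4),
    "growing": (5, 6, 7, 8),
    "operating": (9, 10, 11, 12),
}


def next_focus(failure_log: list[str]) -> tuple[str, str, tuple[int, ...]]:
    """Return (most frequent known failure, its stage, candidate pattern numbers)."""
    known = [entry for entry in failure_log if entry in FAILURE_TO_STAGE]
    if not known:
        raise ValueError("no recognized failures in log")
    failure, _count = Counter(known).most_common(1)[0]
    stage = FAILURE_TO_STAGE[failure]
    return failure, stage, STAGE_TO_PATTERNS[stage]
```

Choosing the smallest pattern within the returned group, and deciding how to verify it helped, remain human judgment calls.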

Good harnesses rarely begin by doing more. They usually begin by failing less often, and more predictably when they do fail.

References

  • docs/blog_series_하네스엔지니어링_총괄_design.md
  • sources/260518_하네스엔지니어링_15장_블로그활용노트.md
  • drafts/blog/260429_하네스시리즈04_도구와샌드박싱_블로그.md
  • drafts/blog/260429_하네스시리즈05_라우팅_블로그.md
  • drafts/blog/260429_하네스시리즈06_평가운영_블로그.md
  • drafts/blog/260519_하네스시리즈C01_에이전트평가하네스_블로그.md

This is Part 1/4 of the Patterns, Strategy, and Cases series. Next: seven harness design decisions, including single vs multi-agent, thin vs thick harnesses, and where ownership boundaries should live.

Series overview: Harness Engineering Series Guide
