Harness Appendix E3 - Harness Workbook for Real Work: Fill-In Design Cards for Repetitive Tasks

May 18, 2026

Teams often start with the question, "Can AI automate this task?" From a harness perspective, that is too early. A better question is: if we split this work into instructions, context, tools, verification, and handoff, what can safely be delegated to an agent and what should still stay with a person? This appendix is a card-style workbook for that translation step.

Key Takeaways

Before handing recurring work to an agent, first decompose the work into harness surfaces.
Good early candidates are tasks that repeat often, have relatively stable inputs and outputs, and can fail visibly.
The workbook is easiest to fill in this order: task definition -> input/output -> instruction surface -> tool surface -> verification surface -> handoff surface -> approval boundaries.
The first goal is not broader automation. It is a narrower failure radius.
This appendix is not about writing clever prompts. It is about turning work into an operable harness.

1. Why a workbook helps

docs/blog_series_하네스엔지니어링_총괄_design.md describes E3 as a practical workbook made of application cards. That framing matters because reading about harness engineering and actually designing one are very different activities.

In practice, teams often fail in a familiar sequence:

the task is scoped too broadly
instructions and tools are mixed together
execution starts before verification is defined
long-running work has no resumption path
approval boundaries are added too late

This workbook exists to reduce those failures.

2. First question: is this task even a good harness candidate

Not every task should be an early agent candidate.

Question	Example if the answer is yes
does it repeat in a recognizable shape	weekly research summaries, structured drafting, classification work
are inputs relatively stable	a known folder, document class, link set, or form
can output criteria be made explicit	checklist, table, frontmatter, summary format
can failure be detected cheaply	missing-field checks, path checks, format checks
if long-running, can it resume from handoff	the next session can continue cleanly
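
The screening questions above can be recorded as plain yes/no answers. The following is a minimal sketch, not part of the series itself: the class name `CandidateScreen`, its field names, and the choice to ask the resumption question only for long-running work are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateScreen:
    """Yes/no answers to the screening questions (field names are illustrative)."""
    repeats_in_recognizable_shape: bool
    inputs_relatively_stable: bool
    output_criteria_explicit: bool
    failure_cheap_to_detect: bool
    long_running: bool = False
    resumable_from_handoff: bool = False

    def open_questions(self) -> List[str]:
        """Return the questions answered 'no', in table order."""
        answers = {
            "repeats in a recognizable shape": self.repeats_in_recognizable_shape,
            "inputs relatively stable": self.inputs_relatively_stable,
            "output criteria explicit": self.output_criteria_explicit,
            "failure cheap to detect": self.failure_cheap_to_detect,
        }
        # The resumption question only applies to long-running work.
        if self.long_running:
            answers["can resume from handoff"] = self.resumable_from_handoff
        return [question for question, ok in answers.items() if not ok]


weekly_summary = CandidateScreen(True, True, True, True)
blog_operations = CandidateScreen(True, False, True, True, long_running=True)

print(weekly_summary.open_questions())   # []
print(blog_operations.open_questions())  # ['inputs relatively stable', 'can resume from handoff']
```

An empty list suggests a reasonable early candidate. Anything left in the list is a prompt for narrowing the task, not a verdict.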

3. Worksheet 1: task-definition card

The first card clarifies what the job actually is.

Field	Prompt to fill in
task name	what is the one-line name for this job
recurrence	daily, weekly, or event-driven
start condition	what input or trigger starts it
completion condition	what output means the work is done
failure cost	what breaks if this goes wrong

4. Worksheet 2: input-output card

This stage stabilizes the materials and the finish line.

Field	Prompt to fill in
required inputs	what must exist before work starts
optional inputs	what helps but is not required
forbidden inputs	what should not be trusted
output format	paragraph, table, JSON, frontmatter, or other contract
output check	how do you know the output is complete

The goal is not bigger context. It is clearer input trust and output shape.
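
One way to make input trust checkable is a small preflight step that runs before any work starts. This is a sketch under stated assumptions: `InputCard` and `preflight` are hypothetical names, inputs are identified by simple labels, and "unverified web snippet" is an invented example of a forbidden input.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class InputCard:
    required: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()
    forbidden: FrozenSet[str] = frozenset()


def preflight(card: InputCard, provided: FrozenSet[str]) -> Tuple[List[str], List[str]]:
    """Return (missing required inputs, forbidden inputs that were supplied)."""
    missing = sorted(card.required - provided)
    untrusted = sorted(card.forbidden & provided)
    return missing, untrusted


card = InputCard(
    required=frozenset({"design doc", "source note"}),
    optional=frozenset({"prior appendix voice"}),
    forbidden=frozenset({"unverified web snippet"}),  # hypothetical example
)
missing, untrusted = preflight(card, frozenset({"design doc", "unverified web snippet"}))
print(missing)    # ['source note']
print(untrusted)  # ['unverified web snippet']
```

If either list is non-empty, the task should not start. Optional inputs never block the run.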

5. Worksheet 3: instruction-surface card

Now translate the task into the agent's rule surface.

Field	Prompt to fill in
role	editor, researcher, classifier, planner, or something else
required actions	what must be done every time
forbidden actions	what must not happen
prerequisite reading	what files or docs must be read first
output rules	language, structure, length, or style constraints

The point is not longer prompting. It is a sharper boundary.

6. Worksheet 4: tool-surface card

Tools should be narrower than the task, not broader.

Field	Prompt to fill in
read tools needed	what files, searches, or web checks are necessary
write tools needed	which files may be edited
risky tools	what actions create meaningful operational risk
blocked tools	what should not be used at all for this task
approval triggers	which actions require a person before continuing

This card often reveals that the hardest problem is not capability. It is boundary discipline.
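
The "which files may be edited" field can be enforced with a path allowlist. The sketch below assumes workspace-relative POSIX-style paths and a hypothetical `WRITE_ALLOWLIST`; `may_write` is an illustrative helper, not an API from any agent framework.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Iterable


def may_write(path: str, allowed_patterns: Iterable[str]) -> bool:
    """Return True only if the normalized path matches an allowed pattern."""
    # Collapse "a/../b" segments so traversal cannot escape the allowed area.
    normalized = posixpath.normpath(path)
    if normalized.startswith(("/", "..")):
        return False  # absolute paths and paths above the workspace are refused
    return any(fnmatchcase(normalized, pattern) for pattern in allowed_patterns)


# Hypothetical write surface for a draft-writing task.
WRITE_ALLOWLIST = ["drafts/blog/*.md"]

print(may_write("drafts/blog/e3_workbook.md", WRITE_ALLOWLIST))   # True
print(may_write("AGENTS.md", WRITE_ALLOWLIST))                    # False
print(may_write("drafts/blog/../../AGENTS.md", WRITE_ALLOWLIST))  # False
```

Note that `*` in `fnmatch` patterns also matches `/`, so this pattern admits nested subfolders under `drafts/blog/`. A stricter surface would list exact file names.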

7. Worksheet 5: verification-surface card

Good harnesses are often defined more clearly by validation than by generation.

Field	Prompt to fill in
cheap checks	what can be checked deterministically first
meaning checks	what still needs human or secondary review
regression points	what failure should be remembered next time
stop conditions	when should the task halt and escalate
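
A cheap check is one that runs deterministically before any human review. As an illustration, here is a missing-field check for frontmatter. It assumes a flat block of single-line `key: value` pairs between `---` delimiters and does not parse full YAML; the required field names are hypothetical.

```python
from typing import Iterable, List


def missing_frontmatter_fields(text: str, required: Iterable[str]) -> List[str]:
    """Return required frontmatter keys that are absent or empty, sorted."""
    required_keys = set(required)
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return sorted(required_keys)  # no frontmatter block at all
    present = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep and value.strip():
            present.add(key.strip())
    else:
        return sorted(required_keys)  # block never closed: treat as invalid
    return sorted(required_keys - present)


draft = "---\ntitle: Harness Appendix E3\nlabels:\n---\nBody text\n"
problems = missing_frontmatter_fields(draft, ["title", "labels", "lang"])
print(problems)  # ['labels', 'lang']
```

A non-empty result is a natural stop condition: halt and escalate rather than continue to meaning checks.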

8. Worksheet 6: handoff and memory card

This card matters as soon as work is long-running or multi-session.

Field	Prompt to fill in
handoff artifact	what should the next session read first
long-term memory candidates	what rule or pattern should survive this task
storage location	`tasks/`, `docs/`, or another controlled layer
promotion rule	what qualifies to become reusable knowledge

This is where the E1 distinction becomes operational:

handoff is for resumption
memory is for reuse
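
A handoff artifact can be as small as one structured note that the next session loads first. The sketch below serializes such a note to JSON. The `Handoff` fields are illustrative assumptions, and where the text is stored (for example under `tasks/`) is left to the caller.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Handoff:
    """What the next session reads first (all field names are illustrative)."""
    task: str
    completed: List[str] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)
    read_first: List[str] = field(default_factory=list)


def to_json(note: Handoff) -> str:
    # ensure_ascii=False keeps non-ASCII file names readable in the artifact.
    return json.dumps(asdict(note), ensure_ascii=False, indent=2)


def from_json(text: str) -> Handoff:
    return Handoff(**json.loads(text))


note = Handoff(
    task="create appendix-series drafts",
    completed=["E3 draft body"],
    remaining=["fact-check remaining claims", "link next entry"],
    read_first=["AGENTS.md"],
)
restored = from_json(to_json(note))
print(restored == note)  # True
```

The note deliberately holds only resumption state. Anything that should outlive the task belongs in the memory layer and goes through the promotion rule instead.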

9. Worksheet 7: approval-boundary card

An appendix workbook is incomplete without action boundaries.

Action	Auto	Confirm	Block
local draft creation	✓
broad rewrite of existing content		✓
external publishing		✓	default block
credential editing			✓
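
The table above can be read as a lookup from action to decision. The following sketch encodes it under two assumptions that the table does not state outright: actions not listed are blocked, and "default block" for external publishing means a person can lift it to a confirmation step for a given run. `Decision`, `POLICY`, `UNLOCKABLE`, and `decide` are hypothetical names.

```python
from enum import Enum
from typing import AbstractSet


class Decision(Enum):
    AUTO = "auto"
    CONFIRM = "confirm"
    BLOCK = "block"


POLICY = {
    "local draft creation": Decision.AUTO,
    "broad rewrite of existing content": Decision.CONFIRM,
    "external publishing": Decision.BLOCK,  # blocked unless explicitly unlocked
    "credential editing": Decision.BLOCK,
}

# Actions a person may lift from BLOCK to CONFIRM for a specific run.
UNLOCKABLE = frozenset({"external publishing"})


def decide(action: str, unlocked: AbstractSet[str] = frozenset()) -> Decision:
    decision = POLICY.get(action, Decision.BLOCK)  # unknown actions are blocked
    if decision is Decision.BLOCK and action in UNLOCKABLE and action in unlocked:
        return Decision.CONFIRM
    return decision


print(decide("local draft creation"))                          # Decision.AUTO
print(decide("external publishing"))                           # Decision.BLOCK
print(decide("external publishing", {"external publishing"}))  # Decision.CONFIRM
print(decide("credential editing", {"credential editing"}))    # Decision.BLOCK
```

Unlocking never produces `AUTO`: even an unlocked publish still passes through a person.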

10. Minimal one-page template

If time is short, fill only these seven lines.

What is the task?
What inputs are required?
What does finished output look like?
What tools are actually needed?
What is the cheapest useful validation?
What should the next session read?
Which action still requires human approval?
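
If the seven lines are kept as a structured record, an unfinished card is easy to spot before any work is delegated. This is a minimal sketch; `OnePageCard` and its field names are illustrative stand-ins for the seven questions above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OnePageCard:
    task: str = ""
    required_inputs: str = ""
    finished_output: str = ""
    tools_needed: str = ""
    cheapest_validation: str = ""
    next_session_reads: str = ""
    needs_human_approval: str = ""

    def unfilled(self) -> List[str]:
        """Return the names of lines that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


card = OnePageCard(
    task="create appendix-series drafts",
    required_inputs="design doc, source note",
)
print(len(card.unfilled()))  # 5
```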

11. Example: turning draft creation into a harness

Card	Example answer
task definition	create appendix-series drafts
inputs	design doc, source note, prior appendix voice
output	two drafts (one KR, one EN) with frontmatter
tools	file reads and edits on target drafts only
verification	changed-file scope, title/label/nav checks
handoff	next-entry linkage and remaining fact-check points
approval boundary	no external publish, no config edits

12. Common failure modes

Scoping the task too broadly

"Automate blog operations" is too large for an early harness.

Mixing inputs with evidence

If reference inputs and factual evidence are blended, the E2 source boundary collapses.

Skipping handoff

Long-running work usually needs resumability before it needs richer memory.

Delaying approval boundaries

If risky actions are not bounded early, the harness becomes too wide from the start.

13. What this appendix adds

E1 organized terms. E2 organized evidence. E3 turns both into a working design board.

E1: what does this term mean
E2: what kind of source supports this claim
E3: how should this actual task be shaped as a harness

References

AGENTS.md
docs/blog_series_하네스엔지니어링_총괄_design.md
docs/memory-map.md
sources/260518_하네스엔지니어링_15장_블로그활용노트.md
drafts/blog/260519_하네스부록E01_용어집과치트시트_블로그.md
drafts/blog/260519_하네스부록E02_출처지도와검증법_블로그.md
drafts/blog/260519_하네스시리즈C02_장시간에이전트운영_블로그.md

This is Appendix Companion E3. Next: when delegation is enough, when a multi-agent team is justified, and how to separate subagents from agent teams without overbuilding.

Series overview: Harness Engineering Series Guide

MaJu Tech Notes