"Harness Appendix E3 — Harness Workbook for Real Work: Fill-In Design Cards for Repetitive Tasks"
Teams often start with the question, "Can AI automate this task?" From a harness perspective, that is too early. A better question is: if we split this work into instructions, context, tools, verification, and handoff, what can safely be delegated to an agent and what should still stay with a person? This appendix is a card-style workbook for that translation step.
Key Takeaways
- Before handing recurring work to an agent, first decompose the work into harness surfaces.
- Good early candidates are tasks that repeat often, have relatively stable inputs and outputs, and can fail visibly.
- The workbook is easiest to fill in this order: task definition -> input/output -> instruction surface -> tool surface -> verification surface -> handoff surface -> approval boundaries.
- The first goal is not broader automation. It is a smaller failure radius.
- This appendix is not about writing clever prompts. It is about turning work into an operable harness.
1. Why a workbook helps
docs/blog_series_ํ๋ค์ค์์ง๋์ด๋ง_์ด๊ด_design.md describes E3 as a practical workbook made of application cards. That framing matters because reading about harness engineering and actually designing one are very different activities.
In practice, teams often fail in a familiar sequence:
- the task is scoped too broadly
- instructions and tools are mixed together
- execution starts before verification is defined
- long-running work has no resumption path
- approval boundaries are added too late
This workbook exists to reduce those failures.
2. First question: is this task even a good harness candidate?
Not every task should be an early agent candidate.
| Question (each yes makes a better candidate) | What a yes looks like |
|---|---|
| does it repeat in a recognizable shape | weekly research summaries, structured drafting, classification work |
| are inputs relatively stable | a known folder, document class, link set, or form |
| can output criteria be made explicit | checklist, table, frontmatter, summary format |
| can failure be detected cheaply | missing-field checks, path checks, format checks |
| if long-running, can it resume from handoff | the next session can continue cleanly |
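If you want to run this screen over a backlog of tasks, the table can be encoded as a small checklist. This is a minimal sketch under stated assumptions: the field names are hypothetical, every question carries equal weight, and the handoff question is only asked for long-running work. It counts yes answers rather than setting a pass mark, because the table does not define one.

```python
from dataclasses import dataclass, fields


@dataclass
class CandidateScreen:
    # Hypothetical field names: one yes/no answer per screening question.
    repeats_in_recognizable_shape: bool
    inputs_relatively_stable: bool
    output_criteria_explicit: bool
    failure_detectable_cheaply: bool
    resumable_from_handoff: bool  # only asked when the work is long-running


def screen(candidate: CandidateScreen, long_running: bool) -> tuple[int, int]:
    """Return (yes_answers, questions_asked), with equal weight per question."""
    answers = {f.name: getattr(candidate, f.name) for f in fields(candidate)}
    if not long_running:
        # Resumption only matters for multi-session work, so skip the question.
        del answers["resumable_from_handoff"]
    return sum(answers.values()), len(answers)


weekly_summary = CandidateScreen(True, True, True, True, False)
print(screen(weekly_summary, long_running=False))  # (4, 4)
print(screen(weekly_summary, long_running=True))   # (4, 5)
```

The same task looks like a clean candidate as a short job, but picks up a "no" once it becomes long-running without a resumption path, which is the gap the handoff card addresses later.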
3. Worksheet 1: task-definition card
The first card clarifies what the job actually is.
| Field | Prompt to fill in |
|---|---|
| task name | what is the one-line name for this job |
| recurrence | daily, weekly, or event-driven |
| start condition | what input or trigger starts it |
| completion condition | what output means the work is done |
| failure cost | what breaks if this goes wrong |
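Filling the card as structured data rather than free text makes blank answers visible early. A minimal sketch, assuming Python dataclasses and the three recurrence values named on the card; the class name and the example answers are hypothetical.

```python
from dataclasses import dataclass

# Assumed spellings for the three recurrence options named on the card.
RECURRENCE_VALUES = {"daily", "weekly", "event-driven"}


@dataclass(frozen=True)
class TaskDefinitionCard:
    # Hypothetical class; one attribute per card field.
    task_name: str
    recurrence: str
    start_condition: str
    completion_condition: str
    failure_cost: str

    def __post_init__(self) -> None:
        # Reject blank answers so an unfinished card cannot be used.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"card field '{name}' is empty")
        if self.recurrence not in RECURRENCE_VALUES:
            raise ValueError(f"recurrence must be one of {sorted(RECURRENCE_VALUES)}")


card = TaskDefinitionCard(
    task_name="weekly research summary",
    recurrence="weekly",
    start_condition="new links appear in the research folder",
    completion_condition="summary table saved with every new link covered",
    failure_cost="the team plans from missing or stale sources",
)
```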
4. Worksheet 2: input-output card
This stage stabilizes the materials and the finish line.
| Field | Prompt to fill in |
|---|---|
| required inputs | what must exist before work starts |
| optional inputs | what helps but is not required |
| forbidden inputs | what should not be trusted |
| output format | paragraph, table, JSON, frontmatter, or other contract |
| output check | how do you know the output is complete |
The goal is not a bigger context. It is a clear answer to which inputs can be trusted and what shape the output must take.
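The required/optional/forbidden split can be checked mechanically before the agent reads anything. This sketch assumes inputs are identified by plain path strings; the function name and the file names in the example are hypothetical. Optional inputs need no rule of their own: when present and not forbidden, they simply end up as usable.

```python
def check_inputs(
    available: set[str], required: set[str], forbidden: set[str]
) -> dict[str, list[str]]:
    """Sort offered inputs by trust before work starts (hypothetical helper)."""
    return {
        # Work should not start while this list is non-empty.
        "missing_required": sorted(required - available),
        # Offered but untrusted; keep these out of the agent's context.
        "forbidden_present": sorted(available & forbidden),
        "usable": sorted(available - forbidden),
    }


report = check_inputs(
    available={"design.md", "source.md", "scraped_forum.txt"},
    required={"design.md", "source.md", "prior_appendix.md"},
    forbidden={"scraped_forum.txt"},
)
print(report)
```

Here the report lists one missing required input and one forbidden input that was offered anyway, so the run should not start yet.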
5. Worksheet 3: instruction-surface card
Now translate the task into the agent's rule surface.
| Field | Prompt to fill in |
|---|---|
| role | editor, researcher, classifier, planner, or something else |
| required actions | what must be done every time |
| forbidden actions | what must not happen |
| prerequisite reading | what files or docs must be read first |
| output rules | language, structure, length, or style constraints |
The point is not longer prompting. It is a sharper boundary.
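One way to keep this boundary sharp is to generate the agent's rule text from the card, so nothing outside the five fields leaks in. The sketch below assumes a Markdown rule file of the kind AGENTS.md represents; the class name, section titles, and layout are illustrative choices, not a format any particular agent requires.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionCard:
    # Hypothetical class mirroring the five fields on the card.
    role: str
    required_actions: list[str] = field(default_factory=list)
    forbidden_actions: list[str] = field(default_factory=list)
    prerequisite_reading: list[str] = field(default_factory=list)
    output_rules: list[str] = field(default_factory=list)


def render_rules(card: InstructionCard) -> str:
    """Render the card as a short Markdown rule block for the agent to read first."""
    sections = [
        ("Read first", card.prerequisite_reading),
        ("Always", card.required_actions),
        ("Never", card.forbidden_actions),
        ("Output", card.output_rules),
    ]
    lines = [f"Role: {card.role}"]
    for title, items in sections:
        if items:  # leave out empty sections rather than printing bare headings
            lines.append(f"\n## {title}")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(render_rules(InstructionCard(
    role="editor",
    required_actions=["keep existing frontmatter intact"],
    forbidden_actions=["edit files outside drafts/"],
    prerequisite_reading=["AGENTS.md"],
)))
```

Generating the text from the card, rather than editing the prompt by hand, means a rule change shows up as a change to one field.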
6. Worksheet 4: tool-surface card
Tools should be narrower than the task, not broader.
| Field | Prompt to fill in |
|---|---|
| read tools needed | what files, searches, or web checks are necessary |
| write tools needed | which files may be edited |
| risky tools | what actions create meaningful operational risk |
| blocked tools | what should not be used at all for this task |
| approval triggers | which actions require a person before continuing |
This card often reveals that the hardest problem is not capability. It is boundary discipline.
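Because this card is about boundary discipline, it is worth checking the card itself for contradictions before any tool runs. A minimal sketch with hypothetical tool names. The second rule (every risky tool that is not blocked needs an approval trigger) is an assumption about how the risky-tools and approval-triggers fields relate, not something the card mandates.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSurfaceCard:
    # Hypothetical class; each field holds tool names from the card.
    read_tools: set[str] = field(default_factory=set)
    write_tools: set[str] = field(default_factory=set)
    risky_tools: set[str] = field(default_factory=set)
    blocked_tools: set[str] = field(default_factory=set)
    approval_triggers: set[str] = field(default_factory=set)


def boundary_problems(card: ToolSurfaceCard) -> list[str]:
    """List contradictions on the card; an empty list means it is consistent."""
    problems = []
    needed = card.read_tools | card.write_tools
    # A tool cannot be both required for the task and blocked outright.
    for tool in sorted(needed & card.blocked_tools):
        problems.append(f"{tool} is both needed and blocked")
    # Assumed rule: a risky tool that is not blocked must pause for a person.
    unguarded = card.risky_tools - card.blocked_tools - card.approval_triggers
    for tool in sorted(unguarded):
        problems.append(f"{tool} is risky but has no approval trigger")
    return problems
```

A card that lists `web_search` as needed while also blocking it, or that marks `edit_file` as risky without an approval trigger, comes back with one problem each.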
7. Worksheet 5: verification-surface card
Good harnesses are often defined more clearly by validation than by generation.
| Field | Prompt to fill in |
|---|---|
| cheap checks | what can be checked deterministically first |
| meaning checks | what still needs human or secondary review |
| regression points | what failure should be remembered next time |
| stop conditions | when should the task halt and escalate |
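Cheap checks come first because they are deterministic and fast. Here is a sketch of one such check for drafts that carry frontmatter, combined with a stop condition. Assumptions: frontmatter is a leading block fenced by `---` lines with simple `key: value` entries, the key names and route names are hypothetical, and any cheap failure escalates to a person instead of moving on to meaning checks. A real YAML parser would be more robust; this line scan is deliberately minimal.

```python
from typing import Optional


def frontmatter_keys(text: str) -> Optional[set[str]]:
    """Return top-level keys of a leading '---' block, or None if there is none."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only unindented "key: value" lines count; nested YAML is ignored.
        if ":" in line and not line.startswith((" ", "\t", "-")):
            keys.add(line.split(":", 1)[0].strip())
    return None  # an opening '---' with no closing line


def verify(text: str, required_keys: set[str]) -> tuple[str, list[str]]:
    """Run cheap checks; any failure is treated as a stop condition (assumed policy)."""
    keys = frontmatter_keys(text)
    if keys is None:
        failures = ["missing or unterminated frontmatter"]
    else:
        failures = [f"missing key: {k}" for k in sorted(required_keys - keys)]
    # Hypothetical route names: halt for a person, or hand on to meaning checks.
    return ("escalate", failures) if failures else ("meaning_review", [])


draft = "---\ntitle: E3\nlabels: harness\n---\nBody text\n"
print(verify(draft, {"title", "labels", "date"}))  # ('escalate', ['missing key: date'])
```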
8. Worksheet 6: handoff and memory card
This card matters as soon as work is long-running or multi-session.
| Field | Prompt to fill in |
|---|---|
| handoff artifact | what should the next session read first |
| long-term memory candidates | what rule or pattern should survive this task |
| storage location | tasks/, docs/, or another controlled layer |
| promotion rule | what qualifies to become reusable knowledge |
This is where the E1 distinction becomes operational:
- handoff is for resumption
- memory is for reuse
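To keep the two apart in practice, the handoff note can hold resumption state while a separate rule decides which lessons graduate to memory. A sketch under assumptions: the field names are hypothetical, the note is stored as JSON (for example under tasks/), and the promotion rule ("a lesson that shows up in at least two separate sessions") is an illustrative threshold, not a recommendation from this series.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffNote:
    # Hypothetical fields: resumption state for the next session.
    task: str
    done: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)
    lessons: list[str] = field(default_factory=list)  # memory candidates only


def to_json(note: HandoffNote) -> str:
    """Serialize a note for a controlled location such as tasks/ (assumed)."""
    return json.dumps(asdict(note), ensure_ascii=False, indent=2)


def promote_to_memory(notes: list[HandoffNote], min_sessions: int = 2) -> list[str]:
    """Illustrative promotion rule: keep lessons seen in at least min_sessions notes."""
    # set() so a lesson repeated inside one note still counts as one session.
    counts = Counter(lesson for note in notes for lesson in set(note.lessons))
    return sorted(lesson for lesson, n in counts.items() if n >= min_sessions)


e3 = HandoffNote("E3 draft", done=["KR draft"], next_steps=["EN draft"],
                 lessons=["check nav links"])
e4 = HandoffNote("E4 draft", lessons=["check nav links", "verify source dates"])
print(promote_to_memory([e3, e4]))  # ['check nav links']
```

The next session reads `next_steps` to resume; only the lesson that recurred across both sessions is proposed for long-term memory.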
9. Worksheet 7: approval-boundary card
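No workbook is complete without explicit action boundaries. This card decides, per action, whether the agent proceeds automatically, asks for confirmation, or is blocked.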
| Action | Auto | Confirm | Block |
|---|---|---|---|
| local draft creation | ✓ | | |
| broad rewrite of existing content | | ✓ | |
| external publishing | | ✓ | default block |
| credential editing | | | ✓ |
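The same boundaries can be enforced in code so that the default answer is no. A minimal sketch with hypothetical action names mirroring the rows: Auto proceeds, Confirm proceeds only with a human approval, Block never proceeds, and any action not listed is treated as blocked. External publishing is modeled as blocked, following its "default block" entry; a team that allows reviewed publishing could move it to Confirm.

```python
from enum import Enum


class Boundary(Enum):
    AUTO = "auto"
    CONFIRM = "confirm"
    BLOCK = "block"


# Hypothetical action names, one per table row.
APPROVAL_POLICY = {
    "local_draft_creation": Boundary.AUTO,
    "broad_rewrite_existing_content": Boundary.CONFIRM,
    "external_publishing": Boundary.BLOCK,  # "default block"
    "credential_editing": Boundary.BLOCK,
}


def may_proceed(action: str, human_approved: bool = False) -> bool:
    """Fail closed: actions missing from the policy are treated as blocked."""
    boundary = APPROVAL_POLICY.get(action, Boundary.BLOCK)
    if boundary is Boundary.AUTO:
        return True
    if boundary is Boundary.CONFIRM:
        return human_approved
    return False  # BLOCK: even an approval does not unlock it here
```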
10. Minimal one-page template
If time is short, fill only these seven lines.
- What is the task?
- What inputs are required?
- What does finished output look like?
- What tools are actually needed?
- What is the cheapest useful validation?
- What should the next session read?
- Which action still requires human approval?
11. Example: turning draft creation into a harness
| Card | Example answer |
|---|---|
| task definition | create appendix-series drafts |
| inputs | design doc, source note, prior appendix voice |
| output | two KR/EN drafts with frontmatter |
| tools | file reads and edits on target drafts only |
| verification | changed-file scope, title/label/nav checks |
| handoff | next-entry linkage and remaining fact-check points |
| approval boundary | no external publish, no config edits |
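The changed-file scope check in this example is one of the cheapest useful validations. A sketch, assuming the changed paths come from `git diff --name-only` (one path per line) and that the target drafts can be described by glob patterns; the patterns and file names below are hypothetical stand-ins for the real draft paths. Note that `fnmatch`'s `*` also matches `/`, so keep patterns anchored to the target directory.

```python
from fnmatch import fnmatch


def parse_name_only(diff_output: str) -> list[str]:
    """Split `git diff --name-only` output (one path per line) into paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]


def out_of_scope(changed: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed paths that match none of the allowed glob patterns."""
    return [p for p in changed if not any(fnmatch(p, pat) for pat in allowed_patterns)]


# Hypothetical patterns standing in for the two target drafts (KR and EN).
ALLOWED = ["drafts/blog/*E03*_kr.md", "drafts/blog/*E03*_en.md"]

changed = parse_name_only(
    "drafts/blog/260520_E03_kr.md\ndrafts/blog/260520_E03_en.md\nconfig/site.yml\n"
)
print(out_of_scope(changed, ALLOWED))  # ['config/site.yml']
```

A non-empty result is a stop condition here: the stray `config/site.yml` edit also crosses the "no config edits" boundary, so the run should halt and escalate.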
12. Common failure modes
Scoping the task too broadly
"Automate blog operations" is too large for an early harness.
Mixing inputs with evidence
If reference inputs and factual evidence are blended, the E2 source boundary collapses.
Skipping handoff
Long-running work usually needs resumability before it needs richer memory.
Delaying approval boundaries
If risky actions are not bounded early, the harness becomes too wide from the start.
13. What this appendix adds
E1 organized terms. E2 organized evidence. E3 turns both into a working design board.
- E1: what does this term mean
- E2: what kind of source supports this claim
- E3: how should this actual task be shaped as a harness
References
- AGENTS.md
- docs/blog_series_ํ๋ค์ค์์ง๋์ด๋ง_์ด๊ด_design.md
- docs/memory-map.md
- sources/260518_ํ๋ค์ค์์ง๋์ด๋ง_15์ฅ_๋ธ๋ก๊ทธํ์ฉ๋ ธํธ.md
- drafts/blog/260519_ํ๋ค์ค๋ถ๋กE01_์ฉ์ด์ง๊ณผ์นํธ์ํธ_๋ธ๋ก๊ทธ.md
- drafts/blog/260519_ํ๋ค์ค๋ถ๋กE02_์ถ์ฒ์ง๋์๊ฒ์ฆ๋ฒ_๋ธ๋ก๊ทธ.md
- drafts/blog/260519_ํ๋ค์ค์๋ฆฌ์ฆC02_์ฅ์๊ฐ์์ด์ ํธ์ด์_๋ธ๋ก๊ทธ.md
This is Appendix Companion E3. Next: when delegation is enough, when a multi-agent team is justified, and how to separate subagents from agent teams without overbuilding.
Series overview: Harness Engineering Series Guide