"Harness Appendix E3 — Harness Workbook for Real Work: Fill-In Design Cards for Repetitive Tasks"

Teams often start with the question, "Can AI automate this task?" From a harness perspective, that is too early. A better question is: if we split this work into instructions, context, tools, verification, and handoff, what can safely be delegated to an agent and what should still stay with a person? This appendix is a card-style workbook for that translation step.


Key Takeaways

  • Before handing recurring work to an agent, first decompose the work into harness surfaces.
  • Good early candidates are tasks that repeat often, have relatively stable inputs and outputs, and can fail visibly.
  • The workbook is easiest to fill in this order: task definition -> input/output -> instruction surface -> tool surface -> verification surface -> handoff surface -> approval boundaries.
  • The first goal is not broader automation. It is a narrower failure radius.
  • This appendix is not about writing clever prompts. It is about turning work into an operable harness.

1. Why a workbook helps

docs/blog_series_하네스엔지니어링_총괄_design.md describes E3 as a practical workbook made of application cards. That framing matters because reading about harness engineering and actually designing a harness are very different activities.

In practice, teams often fail in a familiar sequence:

  • the task is scoped too broadly
  • instructions and tools are mixed together
  • execution starts before verification is defined
  • long-running work has no resumption path
  • approval boundaries are added too late

This workbook exists to reduce those failures.

2. First question: is this task even a good harness candidate?

Not every task should be an early agent candidate.

Question | If yes, candidate quality improves
--- | ---
does it repeat in a recognizable shape | weekly research summaries, structured drafting, classification work
are inputs relatively stable | a known folder, document class, link set, or form
can output criteria be made explicit | checklist, table, frontmatter, summary format
can failure be detected cheaply | missing-field checks, path checks, format checks
if long-running, can it resume from handoff | the next session can continue cleanly
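
The five screening questions can be kept as a small yes/no record so that the "no" answers stay visible. This is a minimal sketch, not part of the workbook itself; the class and field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class CandidateScreen:
    """Yes/no answers to the five screening questions (hypothetical names)."""
    repeats_in_recognizable_shape: bool
    inputs_relatively_stable: bool
    output_criteria_explicit: bool
    failure_detectable_cheaply: bool
    resumable_from_handoff: bool  # answer True if the task is not long-running


def open_questions(screen: CandidateScreen) -> list[str]:
    """Return the names of the questions that were answered 'no'."""
    return [f.name for f in fields(screen) if not getattr(screen, f.name)]


weekly_summary = CandidateScreen(True, True, True, True, False)
print(open_questions(weekly_summary))  # ['resumable_from_handoff']
```

An empty list does not prove the task is safe to delegate. It only means none of the screening questions is blocking.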

3. Worksheet 1: task-definition card

The first card clarifies what the job actually is.

Field | Prompt to fill in
--- | ---
task name | what is the one-line name for this job
recurrence | daily, weekly, or event-driven
start condition | what input or trigger starts it
completion condition | what output means the work is done
failure cost | what breaks if this goes wrong

4. Worksheet 2: input-output card

This stage stabilizes the materials and the finish line.

Field | Prompt to fill in
--- | ---
required inputs | what must exist before work starts
optional inputs | what helps but is not required
forbidden inputs | what should not be trusted
output format | paragraph, table, JSON, frontmatter, or other contract
output check | how do you know the output is complete

The goal is not bigger context. It is clearer input trust and output shape.
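
One way to make input trust concrete is to classify whatever was actually provided against the card before any work starts. The sketch below assumes inputs are identified by simple string labels; the labels and the `check_inputs` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InputOutputCard:
    required_inputs: set[str]
    optional_inputs: set[str] = field(default_factory=set)
    forbidden_inputs: set[str] = field(default_factory=set)
    output_format: str = "table"


def check_inputs(card: InputOutputCard, provided: set[str]) -> dict[str, set[str]]:
    """Classify the provided inputs against the card before work starts."""
    known = card.required_inputs | card.optional_inputs
    return {
        "missing": card.required_inputs - provided,
        "forbidden": provided & card.forbidden_inputs,
        "unlisted": provided - known - card.forbidden_inputs,
    }


card = InputOutputCard(
    required_inputs={"design_doc", "source_note"},
    optional_inputs={"prior_draft"},
    forbidden_inputs={"unverified_web_snippet"},
    output_format="frontmatter",
)
report = check_inputs(card, {"design_doc", "unverified_web_snippet", "chat_log"})
print(report)
```

Here the report flags a missing required input, one forbidden input, and one input the card never mentioned. Any non-empty bucket is a reason to stop before generation.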

5. Worksheet 3: instruction-surface card

Now translate the task into the agent's rule surface.

Field | Prompt to fill in
--- | ---
role | editor, researcher, classifier, planner, or something else
required actions | what must be done every time
forbidden actions | what must not happen
prerequisite reading | what files or docs must be read first
output rules | language, structure, length, or style constraints

The point is not longer prompting. It is a sharper boundary.

6. Worksheet 4: tool-surface card

Tools should be narrower than the task, not broader.

Field | Prompt to fill in
--- | ---
read tools needed | what files, searches, or web checks are necessary
write tools needed | which files may be edited
risky tools | what actions create meaningful operational risk
blocked tools | what should not be used at all for this task
approval triggers | which actions require a person before continuing

This card often reveals that the hardest problem is not capability. It is boundary discipline.
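
Boundary discipline on the write side can be as small as a path check. The sketch below assumes relative POSIX-style paths and a single hypothetical write root, `drafts/blog`; a real harness would likely enforce this inside the tool layer rather than in a helper function.

```python
from pathlib import PurePosixPath

# Hypothetical write scope: only files under these directories may be edited.
WRITE_ROOTS = (PurePosixPath("drafts/blog"),)


def may_write(path: str) -> bool:
    """True only if the path sits inside an allowed write root."""
    candidate = PurePosixPath(path)
    if candidate.is_absolute() or ".." in candidate.parts:
        return False  # reject absolute paths and parent-directory escapes
    return any(root in candidate.parents for root in WRITE_ROOTS)


print(may_write("drafts/blog/e3_draft.md"))        # True
print(may_write("AGENTS.md"))                      # False
print(may_write("drafts/blog/../../AGENTS.md"))    # False
```

The check fails closed: anything it cannot place inside a write root is rejected, which keeps the tool surface narrower than the task.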

7. Worksheet 5: verification-surface card

Good harnesses are often defined more clearly by validation than by generation.

Field | Prompt to fill in
--- | ---
cheap checks | what can be checked deterministically first
meaning checks | what still needs human or secondary review
regression points | what failure should be remembered next time
stop conditions | when should the task halt and escalate
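
A cheap check is one that needs no model and no judgment. As an illustration, the sketch below looks for required keys in a leading `---` delimited frontmatter block. It is deliberately naive (no YAML parsing, no nested keys), and the key names are assumptions.

```python
def missing_frontmatter_keys(text: str, required: set[str]) -> set[str]:
    """Return required keys absent from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set(required)  # no frontmatter at all
    found = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, _ = line.partition(":")
        if sep:
            found.add(key.strip())
    return required - found


draft = "---\ntitle: E3 workbook\nlabels: harness\n---\nBody text\n"
print(missing_frontmatter_keys(draft, {"title", "labels", "nav"}))  # {'nav'}
```

A non-empty result is a natural stop condition: halt and escalate rather than let the draft move on to meaning checks.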

8. Worksheet 6: handoff and memory card

This card matters as soon as work is long-running or multi-session.

Field | Prompt to fill in
--- | ---
handoff artifact | what should the next session read first
long-term memory candidates | what rule or pattern should survive this task
storage location | tasks/, docs/, or another controlled layer
promotion rule | what qualifies to become reusable knowledge

This is where the E1 distinction becomes operational:

  • handoff is for resumption
  • memory is for reuse
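
A handoff artifact can be very small. The sketch below writes and reads a JSON note; the file name pattern, the JSON fields, and the task name are assumptions, and a temporary directory stands in for a real tasks/ folder.

```python
import json
import tempfile
from pathlib import Path


def write_handoff(directory: Path, task: str, done: list[str], remaining: list[str]) -> Path:
    """Write a small JSON note that the next session reads first."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{task}.handoff.json"
    payload = {"task": task, "done": done, "remaining": remaining}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_handoff(path: Path) -> dict:
    """Load the note at the start of the next session."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    note = write_handoff(Path(tmp) / "tasks", "appendix-e3", ["draft written"], ["fact-check"])
    restored = read_handoff(note)

print(restored["remaining"])  # ['fact-check']
```

The note holds only what is needed to resume. Rules or patterns worth keeping across tasks belong in the memory layer, under whatever promotion rule the card defines.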

9. Worksheet 7: approval-boundary card

An appendix workbook is incomplete without action boundaries.

Action | Auto | Confirm | Block
--- | --- | --- | ---
local draft creation | | |
broad rewrite of existing content | | |
external publishing | | | default block
credential editing | | |
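
Once the card is filled in, it can be read as a lookup that fails closed. In the sketch below only the external-publishing row comes from the card; the other three assignments are illustrative assumptions, and every name is hypothetical.

```python
from enum import Enum


class Boundary(Enum):
    AUTO = "auto"
    CONFIRM = "confirm"
    BLOCK = "block"


# Only "external publishing" carries a value in the card itself.
# The other three assignments are illustrative assumptions.
POLICY = {
    "local draft creation": Boundary.AUTO,
    "broad rewrite of existing content": Boundary.CONFIRM,
    "external publishing": Boundary.BLOCK,
    "credential editing": Boundary.BLOCK,
}


def boundary_for(action: str) -> Boundary:
    """Unlisted actions fall back to BLOCK so the card fails closed."""
    return POLICY.get(action, Boundary.BLOCK)


print(boundary_for("external publishing").value)  # block
print(boundary_for("delete repository").value)    # block
```

Defaulting unknown actions to BLOCK means a forgotten row narrows the harness instead of widening it.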

10. Minimal one-page template

If time is short, fill only these seven lines.

  1. What is the task?
  2. What inputs are required?
  3. What does finished output look like?
  4. What tools are actually needed?
  5. What is the cheapest useful validation?
  6. What should the next session read?
  7. Which action still requires human approval?

11. Example: turning draft creation into a harness

Card | Example answer
--- | ---
task definition | create appendix-series drafts
inputs | design doc, source note, prior appendix voice
output | two KR/EN drafts with frontmatter
tools | file reads and edits on target drafts only
verification | changed-file scope, title/label/nav checks
handoff | next-entry linkage and remaining fact-check points
approval boundary | no external publish, no config edits
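
The same example can be held as a record, which makes an unfinished card easy to spot. This is a sketch with hypothetical field names; the values are copied from the example answers above.

```python
from dataclasses import asdict, dataclass


@dataclass
class OnePageCard:
    task: str = ""
    inputs: str = ""
    output: str = ""
    tools: str = ""
    verification: str = ""
    handoff: str = ""
    approval_boundary: str = ""


def unfilled(card: OnePageCard) -> list[str]:
    """Return the names of fields that are still blank."""
    return [name for name, value in asdict(card).items() if not value.strip()]


draft_card = OnePageCard(
    task="create appendix-series drafts",
    inputs="design doc, source note, prior appendix voice",
    output="two KR/EN drafts with frontmatter",
    tools="file reads and edits on target drafts only",
    verification="changed-file scope, title/label/nav checks",
    handoff="next-entry linkage and remaining fact-check points",
    approval_boundary="no external publish, no config edits",
)
print(unfilled(draft_card))  # []
```

A card with blank fields is a signal to keep designing, not to start execution.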

12. Common failure modes

Scoping the task too broadly

"Automate blog operations" is too large for an early harness.

Mixing inputs with evidence

If reference inputs and factual evidence are blended, the E2 source boundary collapses.

Skipping handoff

Long-running work usually needs resumability before it needs richer memory.

Delaying approval boundaries

If risky actions are not bounded early, the harness becomes too wide from the start.

13. What this appendix adds

E1 organized terms. E2 organized evidence. E3 turns both into a working design board.

  • E1: what does this term mean
  • E2: what kind of source supports this claim
  • E3: how should this actual task be shaped as a harness

References

  • AGENTS.md
  • docs/blog_series_하네스엔지니어링_총괄_design.md
  • docs/memory-map.md
  • sources/260518_하네스엔지니어링_15장_블로그활용노트.md
  • drafts/blog/260519_하네스부록E01_용어집과치트시트_블로그.md
  • drafts/blog/260519_하네스부록E02_출처지도와검증법_블로그.md
  • drafts/blog/260519_하네스시리즈C02_장시간에이전트운영_블로그.md

This is Appendix Companion E3. Next: when delegation is enough, when a multi-agent team is justified, and how to separate subagents from agent teams without overbuilding.

Series overview: Harness Engineering Series Guide
