"Harness Appendix E1 — Glossary and Cheat Sheet: From AGENTS.md to Handoff"

May 18, 2026

Harness engineering becomes harder to follow when familiar-sounding terms blur together. What is the practical difference between AGENTS.md and CLAUDE.md? When does handoff differ from memory? How is MCP related to tools without being the same thing? This appendix is a compact companion reference for reading the rest of the series in operational language.

Key Takeaways

The point of a glossary is not memorization. It is boundary clarity.
Similar terms become easier to manage when you ask three questions: who reads it, when does it apply, and what does it control.
A good harness is easier to reason about when it is broken into instruction, context, tool, verification, persistence, and operating surfaces.
One distinction matters especially often: handoff is mainly for resuming work, while memory is mainly for reusing knowledge.
The cheat sheets below are intended as a practical companion for the A through E series entries.

1. The big map first

Most recurring terms in harness engineering fit into a small number of layers.

Layer	Main question	Typical examples
instruction surface	what rules does the agent start with	`AGENTS.md`, `CLAUDE.md`, system instructions
context surface	what is loaded now and what is omitted	user request, reference docs, tool results
tool surface	what can the agent actually do	shell, file read/write, web, MCP-connected tools
verification surface	what catches mistakes first	tests, hooks, evals, reviews
persistence surface	what survives the session	handoff notes, session notes, docs, long-term memory
operating surface	how risk and cost are bounded	permissions, sandboxing, approvals, audit

2. Fifteen core terms

1. Harness

Definition: the full work environment and operating structure around the model
Distinction: it includes instructions, tools, validation, memory, and permissions, not just prompting

2. Agent

Definition: an execution actor that uses a model, instructions, and tools in a loop
Distinction: broader than chat because it includes tool calls and state transitions

3. AGENTS.md

Definition: a repository-level instruction file that defines shared rules and boundaries
Distinction: best suited to workspace-wide constraints and ownership rules

4. CLAUDE.md

Definition: a runtime instruction file for Claude-oriented working behavior
Distinction: best suited to task execution style and operational guidance inside the session

5. Skill

Definition: a reusable procedure separated into its own instruction unit
Distinction: used to keep the main instruction surface short while preserving repeatable workflows

6. Tool

Definition: a concrete action surface the agent can call
Distinction: file access, shell execution, search, and external API actions all live here

7. MCP

Definition: a protocol layer for connecting external tools and data systems
Distinction: MCP is the connection standard; a tool is the actual callable action
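To make the tool/MCP split concrete, here is a minimal sketch in which the tool is an ordinary callable and MCP only supplies the envelope used to invoke it. The JSON-RPC 2.0 framing, the `tools/call` method, and the `name`/`arguments` params follow the MCP specification as we understand it; the `search_docs` tool, the in-process dispatcher, and the simplified result are hypothetical stand-ins for a real server.

```python
import json
from typing import Any, Callable

# The tool: a concrete, callable action (hypothetical example).
def search_docs(query: str) -> list[str]:
    corpus = ["AGENTS.md: shared boundaries", "CLAUDE.md: runtime behavior"]
    return [line for line in corpus if query.lower() in line.lower()]

# Tools a toy server exposes, keyed by the name clients will call.
TOOLS: dict[str, Callable[..., Any]] = {"search_docs": search_docs}

def make_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Client side: wrap a tool invocation in a JSON-RPC 2.0 `tools/call` message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

def handle_request(raw: str) -> dict[str, Any]:
    """Toy server side: unwrap the envelope, run the named tool, wrap the result."""
    msg = json.loads(raw)
    params = msg["params"]
    hits = TOOLS[params["name"]](**params["arguments"])
    # Simplified result: a single text content item holding the tool's output.
    return {
        "jsonrpc": "2.0",
        "id": msg["id"],
        "result": {"content": [{"type": "text", "text": "\n".join(hits)}]},
    }

if __name__ == "__main__":
    response = handle_request(make_call_request(1, "search_docs", {"query": "claude"}))
    print(response["result"]["content"][0]["text"])  # CLAUDE.md: runtime behavior
```

Replacing the server or the transport would leave `search_docs` untouched, which is the practical meaning of "connection standard vs callable action".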

8. Context window

Definition: the actual input budget the model sees in one run
Distinction: different from what exists in storage; this is about what is loaded now
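A minimal sketch of the storage-versus-loaded distinction: several documents exist, but only those that fit a budget reach the model. Token counts are approximated by whitespace-separated words, and the `Doc` type, priority convention, and greedy policy are all assumptions; real tokenizers and selection strategies differ.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    name: str
    text: str
    priority: int  # lower value = load first (hypothetical convention)

def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def pack_context(docs: list[Doc], budget: int) -> tuple[list[str], list[str]]:
    """Greedily load docs in priority order while they fit the budget.
    Returns (names loaded into the context, names that stay in storage)."""
    loaded: list[str] = []
    omitted: list[str] = []
    used = 0
    for doc in sorted(docs, key=lambda d: d.priority):
        cost = approx_tokens(doc.text)
        if used + cost <= budget:
            loaded.append(doc.name)
            used += cost
        else:
            omitted.append(doc.name)
    return loaded, omitted

if __name__ == "__main__":
    docs = [
        Doc("user request", "fix the failing login test", 0),
        Doc("AGENTS.md", "never push to main " * 5, 1),
        Doc("old session log", "lots of earlier history " * 50, 2),
    ]
    print(pack_context(docs, budget=40))  # (['user request', 'AGENTS.md'], ['old session log'])
```

Greedy packing is only one possible policy; the glossary point is that the loaded set is a deliberate subset of what is stored.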

9. Handoff

Definition: a structured transfer artifact for the next session or next worker
Distinction: optimized for work resumption rather than broad knowledge reuse
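Because a handoff exists so the next session can resume, it helps to give it a fixed shape. The sketch below assumes a hypothetical field set (goal, status, next step, open questions, touched files) and renders it as a short markdown note of the kind that might live under `tasks/handoffs/`; the field names are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    goal: str
    status: str
    next_step: str
    open_questions: list[str] = field(default_factory=list)
    touched_files: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the handoff as a compact note the next session reads first."""
        lines = [
            "# Handoff",
            f"- Goal: {self.goal}",
            f"- Status: {self.status}",
            f"- Next step: {self.next_step}",
        ]
        if self.open_questions:
            lines.append("- Open questions:")
            lines += [f"  - {q}" for q in self.open_questions]
        if self.touched_files:
            lines.append("- Touched files: " + ", ".join(self.touched_files))
        return "\n".join(lines)

    def is_resumable(self) -> bool:
        # Without a concrete next step, the next session cannot resume quickly.
        return bool(self.next_step.strip())

if __name__ == "__main__":
    note = Handoff(
        goal="migrate auth to token refresh",
        status="3 of 5 tests passing",
        next_step="fix the token refresh test",
        open_questions=["keep the legacy endpoint?"],
        touched_files=["auth/refresh.py"],
    )
    print(note.to_markdown())
```

Note what is absent: durable preferences and general lessons belong in memory, not in this artifact.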

10. Memory

Definition: reusable state or knowledge preserved outside the immediate session
Distinction: broader than handoff; may include durable preferences, patterns, and rules

11. Eval

Definition: a repeatable quality-checking structure for outputs or behaviors
Distinction: may be exact-match, rubric-based, or regression-based
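The three eval styles can be sketched side by side. The rubric function approximates each criterion with a required phrase as a stand-in for a human or model grader, and the regression function compares against previously recorded outputs; all function names and data are hypothetical.

```python
from typing import Callable

def exact_match(output: str, expected: str) -> bool:
    """Exact-match eval: pass only if the trimmed output equals the reference."""
    return output.strip() == expected.strip()

def rubric_score(output: str, criteria: dict[str, str]) -> float:
    """Rubric eval: fraction of criteria met. Each criterion is approximated by a
    required phrase; a real rubric usually needs a human or model grader."""
    if not criteria:
        return 0.0
    text = output.lower()
    met = sum(1 for phrase in criteria.values() if phrase.lower() in text)
    return met / len(criteria)

def regression_failures(run: Callable[[str], str], recorded: dict[str, str]) -> list[str]:
    """Regression eval: re-run recorded inputs and list those whose output changed."""
    return [inp for inp, expected in recorded.items() if run(inp) != expected]

if __name__ == "__main__":
    print(exact_match(" 42\n", "42"))
    print(rubric_score("Summary: tests pass. Risk: low",
                       {"has_summary": "summary", "has_risk": "risk", "has_next": "next step"}))
    print(regression_failures(str.upper, {"a": "A", "b": "B"}))
```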

12. Hook

Definition: policy or validation logic enforced at specific lifecycle points, such as before a tool call or after a file edit
Distinction: not advice, but enforcement
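A hook differs from a guideline because it runs whether or not the agent remembers the rule. The following is a minimal sketch of a pre-tool-call hook that blocks writes to protected paths; the `ToolCall` shape, the hook signature, and the protected patterns are assumptions for illustration, not any particular product's hook API.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable

@dataclass
class ToolCall:
    tool: str
    path: str

# Hypothetical protected patterns; a real project would load these from config.
PROTECTED = [".env", "secrets/*", ".git/*"]

class BlockedByHook(Exception):
    pass

def pre_tool_hook(call: ToolCall) -> None:
    """Raise before the tool runs if a write targets a protected path."""
    if call.tool == "write_file" and any(fnmatch(call.path, p) for p in PROTECTED):
        raise BlockedByHook(f"write to {call.path} is blocked by policy")

def run_tool(call: ToolCall, execute: Callable[[ToolCall], str]) -> str:
    pre_tool_hook(call)  # enforcement happens here, not in the prompt
    return execute(call)

if __name__ == "__main__":
    print(run_tool(ToolCall("write_file", "src/app.py"), lambda c: f"wrote {c.path}"))
    try:
        run_tool(ToolCall("write_file", "secrets/api.key"), lambda c: f"wrote {c.path}")
    except BlockedByHook as err:
        print(err)
```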

13. Sandbox

Definition: an isolated execution environment that limits action radius
Distinction: both a safety control and an operational containment device

14. Approval policy

Definition: the rule set that decides which actions are auto-allowed, which require confirmation, and which are denied
Distinction: reading, writing, execution, and external transmission should not share the same risk level
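A minimal sketch of an approval policy that gives reading, writing, execution, and external transmission different defaults. The action names and the mapping are hypothetical; the design choice being illustrated is that the policy is data checked in code, and anything it does not recognize falls through to deny.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    DENY = "deny"

# Hypothetical defaults: each action class gets its own risk level.
POLICY: dict[str, Decision] = {
    "read": Decision.ALLOW,
    "write": Decision.ALLOW,            # only inside the workspace, see decide()
    "execute": Decision.CONFIRM,
    "external_send": Decision.CONFIRM,  # e.g. posting outside the repository
}

def decide(action: str, inside_workspace: bool = True) -> Decision:
    """Map an action to allow/confirm/deny; unknown actions fall through to deny."""
    if action == "write" and not inside_workspace:
        return Decision.DENY
    return POLICY.get(action, Decision.DENY)

if __name__ == "__main__":
    for action in ["read", "write", "execute", "external_send", "format_disk"]:
        print(action, decide(action).value)
```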

15. Memory ownership

Definition: control over where long-term memory lives, how it moves, who can inspect it, and how it can be deleted
Distinction: not a convenience feature but an operational control question
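Memory ownership becomes concrete when the store treats location, inspection, export, and deletion as first-class operations. A minimal sketch, assuming a local JSON file as the storage location; the `MemoryStore` class, file format, and method names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

class MemoryStore:
    """File-backed memory whose location, contents, movement, and deletion are explicit."""

    def __init__(self, path: Path):
        self.path = path  # where it lives: a file you can locate and back up

    def _load(self) -> dict[str, str]:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self, data: dict[str, str]) -> None:
        self.path.write_text(json.dumps(data, indent=2, sort_keys=True))

    def remember(self, key: str, value: str) -> None:
        data = self._load()
        data[key] = value
        self._save(data)

    def inspect(self) -> dict[str, str]:
        # Who can inspect it: anyone with read access to the file sees everything.
        return self._load()

    def export(self, dest: Path) -> None:
        # How it moves: a plain, readable copy at another location.
        dest.write_text(json.dumps(self._load(), indent=2, sort_keys=True))

    def forget(self, key: str) -> bool:
        # How it is deleted: remove the entry and rewrite the file.
        data = self._load()
        removed = data.pop(key, None) is not None
        self._save(data)
        return removed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = MemoryStore(Path(tmp) / "memory.json")
        store.remember("answer_style", "short, with file paths")
        print(store.inspect())
        store.forget("answer_style")
        print(store.inspect())
```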

3. Pairs that get confused most often

Pair	Useful distinction
`AGENTS.md` vs `CLAUDE.md`	shared repository boundary vs runtime working behavior
prompt vs context	instruction text vs the full material actually shown to the model
tool vs MCP	action unit vs connection standard
handoff vs memory	resumption artifact vs reusable knowledge layer
eval vs review	repeatable evaluation structure vs interpretive human or secondary-agent judgment
hook vs guideline	enforcement vs recommendation
sandbox vs permission	execution isolation vs allow/deny policy boundary

4. Where should this rule live?

What you need to store	Default location	Why
workspace-wide hard boundary	`AGENTS.md`	everyone should inherit it
a repeatable tool procedure	skill or reference doc	keeps the top-level instruction surface compact
the next step for this task	handoff or session note	the next session must resume quickly
durable reusable rule	`docs/`, `tasks/lessons.md`, or a memory layer	it should be read again later
a must-not-break check	hook, validator, or test	advice is weaker than enforcement
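The table can also be read as a small routing decision. The sketch below encodes it as ordered yes/no questions; the `RuleNeed` fields and the ordering (enforcement first, because advice is weaker than enforcement) are our simplification, not a rule stated by the series.

```python
from dataclasses import dataclass

@dataclass
class RuleNeed:
    must_not_break: bool = False  # would a single violation be unacceptable?
    workspace_wide: bool = False  # should every agent and session inherit it?
    procedure: bool = False       # is it a repeatable multi-step tool procedure?
    task_specific: bool = False   # does it only matter for resuming this task?

def where_should_it_live(need: RuleNeed) -> str:
    # Checked first: a must-not-break rule needs enforcement, not just text.
    if need.must_not_break:
        return "hook, validator, or test"
    if need.workspace_wide:
        return "AGENTS.md"
    if need.procedure:
        return "skill or reference doc"
    if need.task_specific:
        return "handoff or session note"
    return "docs/, tasks/lessons.md, or a memory layer"

if __name__ == "__main__":
    print(where_should_it_live(RuleNeed(workspace_wide=True)))
    print(where_should_it_live(RuleNeed(must_not_break=True, workspace_wide=True)))
```

Under this ordering, a workspace-wide boundary that must never break is routed to enforcement; it can still be stated in `AGENTS.md` so the agent knows why it exists.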

5. Failure diagnosis cheat sheet

Symptom	Suspect this layer first	First question
output shape keeps drifting	instruction surface	is the output contract fixed clearly
the wrong tool keeps being chosen	tool surface	are tool names and descriptions clearly distinguishable
long tasks lose direction	persistence surface	is there a resumable handoff artifact
risky files or external actions get touched	operating surface	are approvals and permission boundaries separated
the answer sounds good but is often wrong	verification surface	is there a cheap deterministic check before judgment
the same mistake repeats	memory or evaluation surface	were failures promoted into lessons or regression checks
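The "cheap deterministic check before judgment" question can be sketched as a gate that validates the output contract before any expensive review runs. The contract (a JSON object with `summary`, `files_changed`, and `tests_passed`) and the `expensive_review` callback are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical output contract: required keys and their expected types.
REQUIRED: dict[str, type] = {"summary": str, "files_changed": list, "tests_passed": bool}

def contract_errors(raw: str) -> list[str]:
    """Deterministic checks: parseable JSON, an object, required keys, expected types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not an object"]
    errors = []
    for key, typ in REQUIRED.items():
        if key not in data:
            errors.append(f"missing {key}")
        elif not isinstance(data[key], typ):
            errors.append(f"{key} should be {typ.__name__}")
    return errors

def gated_review(raw: str, expensive_review: Callable[[dict[str, Any]], str]) -> str:
    errors = contract_errors(raw)
    if errors:
        # Fail cheaply; no human or model judgment is spent on malformed output.
        return "rejected: " + "; ".join(errors)
    return expensive_review(json.loads(raw))

if __name__ == "__main__":
    print(gated_review('{"summary": "ok"}', lambda d: "reviewed"))
```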

6. One-sentence summaries for the whole series

Series	One line to remember
A Basics	the model matters less than the workbench around it
B Implementation	OpenAI and Claude differ more in operating philosophy than in surface features
C Operations	evaluation, handoff, guardrails, and memory determine reliability
D Strategy	good harnesses grow through reusable patterns and explicit design decisions
E Appendix	companion assets like glossaries and source maps accelerate real application

7. Repository mapping

Concept	Repository example
instruction surface	`AGENTS.md`
memory index	`docs/memory-map.md`
long-running resumption	`tasks/handoffs/`, `tasks/sessions/`
reusable operating rules	`tasks/lessons.md`
bounded publishing rule	explicit user confirmation before external posting

8. Common mistakes

Treating vocabulary as understanding

The real question is which failure each term is trying to reduce.

Assuming memory replaces handoff

The two overlap, but long-running work usually needs handoff first.

Assuming MCP automatically improves capability

A larger connection surface can also make tool choice worse if the surface gets ambiguous.
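One cheap way to notice an ambiguous tool surface is to flag tool pairs whose descriptions overlap heavily. The sketch below uses word-level Jaccard similarity with an arbitrary 0.5 threshold; both the metric and the threshold are assumptions, and the tool names are hypothetical.

```python
import re
from itertools import combinations

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def ambiguous_pairs(tools: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return tool pairs whose descriptions share at least `threshold` of their words."""
    flagged = []
    for (n1, d1), (n2, d2) in combinations(sorted(tools.items()), 2):
        score = jaccard(words(d1), words(d2))
        if score >= threshold:
            flagged.append((n1, n2, round(score, 2)))
    return flagged

if __name__ == "__main__":
    tools = {
        "search_docs": "search project documents for a query",
        "search_files": "search project files for a query",
        "send_email": "send an email to a recipient",
    }
    print(ambiguous_pairs(tools))  # [('search_docs', 'search_files', 0.71)]
```

A flagged pair is a prompt to rename, merge, or sharpen descriptions before adding more connections.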

Assuming a written rule is automatically enforced

Anything that must not break should move into hooks, validators, or permissions.

9. If you remember only five lines

AGENTS.md is for shared boundaries; CLAUDE.md is for runtime behavior.
A tool is an action unit; MCP is a connection protocol.
Handoff is for resumption; memory is for reuse.
A hook enforces; a guideline recommends.
Memory ownership is an operating-control question, not just a feature question.

References

AGENTS.md
docs/blog_series_하네스엔지니어링_총괄_design.md
docs/memory-map.md
sources/260518_하네스엔지니어링_15장_블로그활용노트.md
drafts/blog/260519_하네스시리즈A03_컨텍스트설계와지시파일_블로그.md
drafts/blog/260519_Claude하네스B02_CLAUDEmd_Skills_Hooks_Permissions_블로그.md
drafts/blog/260519_하네스시리즈C02_장시간에이전트운영_블로그.md
drafts/blog/260519_하네스시리즈C04_메모리소유권_블로그.md

This is Appendix Companion E1. Next: a source map and fact-checking method for deciding what counts as a primary source, what belongs in working notes, and how to verify claims before publication.

Series overview: Harness Engineering Series Guide