"Harness Appendix E1 — Glossary and Cheat Sheet: From AGENTS.md to Handoff"
Harness engineering becomes harder to follow when familiar-sounding terms blur together. What is the practical difference between AGENTS.md and CLAUDE.md? When does handoff differ from memory? How is MCP related to tools without being the same thing? This appendix is a compact companion reference for reading the rest of the series in operational language.
Key Takeaways
- The point of a glossary is not memorization. It is boundary clarity.
- Similar terms become easier to manage when you ask three questions: who reads it, when does it apply, and what does it control.
- A good harness is easier to reason about when it is broken into instruction surface, context surface, tool surface, verification surface, persistence surface, and operating surface.
- One distinction matters especially often: handoff is mainly for resuming work, while memory is mainly for reusing knowledge.
- The cheat sheets below are intended as a practical companion for the A through E series entries.
1. The big map first
Most recurring terms in harness engineering fit into a small number of layers.
| Layer | Main question | Typical examples |
|---|---|---|
| instruction surface | what rules does the agent start with | AGENTS.md, CLAUDE.md, system instructions |
| context surface | what is loaded now and what is omitted | user request, reference docs, tool results |
| tool surface | what can the agent actually do | shell, file read/write, web, MCP-connected tools |
| verification surface | what catches mistakes first | tests, hooks, evals, reviews |
| persistence surface | what survives the session | handoff notes, session notes, docs, long-term memory |
| operating surface | how risk and cost are bounded | permissions, sandboxing, approvals, audit |
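
The context surface row ("what is loaded now and what is omitted") can be made concrete with a small sketch. This is an illustration under stated assumptions, not a prescribed implementation: the `ContextItem` fields, the token estimates, and the priority numbers are all hypothetical, and a real harness would measure tokens with its own tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    name: str      # hypothetical label, e.g. "user request"
    tokens: int    # estimated token cost (assumed to be known up front)
    priority: int  # lower number = more important to load


def select_context(items: list[ContextItem], budget: int) -> tuple[list[str], list[str]]:
    """Return (loaded, omitted) item names under a fixed token budget."""
    loaded: list[str] = []
    omitted: list[str] = []
    remaining = budget
    # Visit the most important items first; sorted() is stable, so ties keep input order.
    for item in sorted(items, key=lambda i: i.priority):
        if item.tokens <= remaining:
            loaded.append(item.name)
            remaining -= item.tokens
        else:
            omitted.append(item.name)
    return loaded, omitted


items = [
    ContextItem("user request", tokens=200, priority=0),
    ContextItem("reference docs", tokens=6000, priority=2),
    ContextItem("tool results", tokens=1500, priority=1),
]
loaded, omitted = select_context(items, budget=2000)
```

With a budget of 2000 tokens, the request and the tool results are loaded and the reference docs are omitted. The omitted list is as much a part of the context surface as the loaded one.
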
2. Fifteen core terms
1. Harness
- Definition: the full work environment and operating structure around the model
- Distinction: it includes instructions, tools, validation, memory, and permissions, not just prompting
2. Agent
- Definition: an execution actor that uses a model, instructions, and tools in a loop
- Distinction: broader than chat because it includes tool calls and state transitions
3. AGENTS.md
- Definition: a repository-level instruction file that defines shared rules and boundaries
- Distinction: strongest for workspace-wide constraints and ownership rules
4. CLAUDE.md
- Definition: a runtime instruction file for Claude-oriented working behavior
- Distinction: strongest for task execution style and operational guidance inside the session
5. Skill
- Definition: a reusable procedure separated into its own instruction unit
- Distinction: used to keep the main instruction surface short while preserving repeatable workflows
6. Tool
- Definition: a concrete action surface the agent can call
- Distinction: file access, shell execution, search, and external API actions all live here
7. MCP
- Definition: a protocol layer for connecting external tools and data systems
- Distinction: MCP is the connection standard; a tool is the actual callable action
8. Context window
- Definition: the actual input budget the model sees in one run
- Distinction: different from what exists in storage; this is about what is loaded now
9. Handoff
- Definition: a structured transfer artifact for the next session or next worker
- Distinction: optimized for work resumption rather than broad knowledge reuse
10. Memory
- Definition: reusable state or knowledge preserved outside the immediate session
- Distinction: broader than handoff; may include durable preferences, patterns, and rules
11. Eval
- Definition: a repeatable quality-checking structure for outputs or behaviors
- Distinction: may be exact-match, rubric-based, or regression-based
12. Hook
- Definition: policy or validation logic forced at specific lifecycle moments
- Distinction: not advice, but enforcement
13. Sandbox
- Definition: an isolated execution environment that limits action radius
- Distinction: both a safety control and an operational containment device
14. Approval policy
- Definition: the rule set that separates what is auto-allowed, confirmed, or denied
- Distinction: reading, writing, execution, and external transmission should not share the same risk level
15. Memory ownership
- Definition: control over where long-term memory lives, how it moves, who can inspect it, and how it can be deleted
- Distinction: not a convenience feature but an operational control question
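
The approval policy entry (term 14) can be sketched as a lookup from action kind to decision. The action names and the specific decisions below are illustrative assumptions; the only point carried over from the definition is that reading, writing, execution, and external transmission do not all share one risk level, and that unknown actions fail closed.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # runs without asking
    CONFIRM = "confirm"  # needs an explicit human yes
    DENY = "deny"        # never runs


# Hypothetical policy table; a real team would choose its own levels.
POLICY = {
    "read": Decision.ALLOW,
    "write": Decision.CONFIRM,
    "execute": Decision.CONFIRM,
    "external_send": Decision.DENY,
}


def decide(action_kind: str) -> Decision:
    """Look up the decision for an action kind; unknown kinds are denied."""
    return POLICY.get(action_kind, Decision.DENY)
```

Defaulting unknown kinds to `DENY` is a design choice in this sketch: a new tool has to be classified before it can run.
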
3. Pairs that get confused most often
| Pair | Useful distinction |
|---|---|
| AGENTS.md vs CLAUDE.md | shared repository boundary vs runtime working behavior |
| prompt vs context | instruction text vs the full material actually shown to the model |
| tool vs MCP | action unit vs connection standard |
| handoff vs memory | resumption artifact vs reusable knowledge layer |
| eval vs review | repeatable evaluation structure vs interpretive human or secondary-agent judgment |
| hook vs guideline | enforcement vs recommendation |
| sandbox vs permission | execution isolation vs allow/deny policy boundary |
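
The hook vs guideline row can be illustrated with a minimal sketch. Everything named here is hypothetical (`PROTECTED_DIRS`, `pre_write_hook`, the in-memory `store`); it does not reflect any particular product's hook API. The guideline is just text, while the hook is code that runs before every write and can block it.

```python
from pathlib import PurePosixPath

# Guideline: advice the agent may or may not follow.
GUIDELINE = "Please avoid editing files under secrets/."

# Hypothetical protected directory names for this sketch.
PROTECTED_DIRS = ("secrets", ".git")


class HookViolation(Exception):
    """Raised when a hook blocks an action."""


def pre_write_hook(path: str) -> None:
    """Hook: block any path that contains a protected directory name."""
    parts = PurePosixPath(path).parts
    if any(part in PROTECTED_DIRS for part in parts):
        raise HookViolation(f"write to {path!r} is blocked")


def write_file(path: str, content: str, store: dict[str, str]) -> None:
    """Write into an in-memory store; the hook always runs first."""
    pre_write_hook(path)
    store[path] = content
```

A call such as `write_file("secrets/key", "k", store)` raises `HookViolation` and leaves the store unchanged, regardless of whether the agent read the guideline.
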
4. Where should this rule live
| What you need to store | Default location | Why |
|---|---|---|
| workspace-wide hard boundary | AGENTS.md | everyone should inherit it |
| a repeatable tool procedure | skill or reference doc | keeps the top-level instruction surface compact |
| the next step for this task | handoff or session note | the next session must resume quickly |
| durable reusable rule | docs/, tasks/lessons.md, or a memory layer | it should be read again later |
| a must-not-break check | hook, validator, or test | advice is weaker than enforcement |
5. Failure diagnosis cheat sheet
| Symptom | Suspect this layer first | First question |
|---|---|---|
| output shape keeps drifting | instruction surface | is the output contract fixed clearly |
| the wrong tool keeps being chosen | tool surface | are tool names and descriptions actually separable |
| long tasks lose direction | persistence surface | is there a resumable handoff artifact |
| risky files or external actions get touched | operating surface | are approvals and permission boundaries separated |
| the answer sounds good but is often wrong | verification surface | is there a cheap deterministic check before judgment |
| the same mistake repeats | persistence or verification surface | were failures promoted into lessons or regression checks |
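
The verification row asks for "a cheap deterministic check before judgment". One possible shape, sketched under assumptions: the output contract (`REQUIRED_KEYS`) is hypothetical, and `judge` stands in for whatever slower reviewer is used, whether a human or a second model.

```python
import json

REQUIRED_KEYS = {"summary", "files_changed"}  # hypothetical output contract


def cheap_check(raw_output: str) -> list[str]:
    """Return contract violations; an empty list means the shape is valid."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    missing = sorted(REQUIRED_KEYS - set(data))
    return [f"missing key: {key}" for key in missing]


def review(raw_output: str, judge) -> str:
    """Run the deterministic check first; call the costly judge only if it passes."""
    problems = cheap_check(raw_output)
    if problems:
        return "rejected: " + "; ".join(problems)
    return judge(raw_output)
```

The ordering is the point: malformed output is rejected without spending any judgment on it, so the interpretive reviewer only sees candidates that already satisfy the contract.
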
6. One-sentence summaries for the whole series
| Series | One line to remember |
|---|---|
| A Basics | the model matters less than the workbench around it |
| B Implementation | OpenAI and Claude differ more in operating philosophy than in surface features |
| C Operations | evaluation, handoff, guardrails, and memory determine reliability |
| D Strategy | good harnesses grow through reusable patterns and explicit design decisions |
| E Appendix | companion assets like glossaries and source maps accelerate real application |
7. Repository mapping
| Concept | Repository example |
|---|---|
| instruction surface | AGENTS.md |
| memory index | docs/memory-map.md |
| long-running resumption | tasks/handoffs/, tasks/sessions/ |
| reusable operating rules | tasks/lessons.md |
| bounded publishing rule | explicit user confirmation before external posting |
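
As an illustration of what a resumable artifact under tasks/handoffs/ might contain, here is a minimal sketch that renders a handoff note as markdown text. The field names (`done`, `next_steps`, `open_questions`) and the heading layout are assumptions for this example, not a format defined by the series; writing the result to a file is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    task: str
    done: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def render_handoff(h: Handoff) -> str:
    """Render the handoff as a short markdown note for the next session."""
    sections = (
        ("Done", h.done),
        ("Next steps", h.next_steps),
        ("Open questions", h.open_questions),
    )
    lines = [f"# Handoff: {h.task}"]
    for title, entries in sections:
        lines.append(f"## {title}")
        # An empty section is shown explicitly so the reader knows it was considered.
        lines.extend(f"- {entry}" for entry in (entries or ["(none)"]))
    return "\n".join(lines) + "\n"


note = render_handoff(Handoff(task="migrate config loader", next_steps=["run tests"]))
```

The note is deliberately narrow: it records where the work stands and what to do next, which is the resumption role, and leaves durable rules to tasks/lessons.md or a memory layer.
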
8. Common mistakes
Treating vocabulary as understanding
The real question is which failure each term is trying to reduce.
Assuming memory replaces handoff
The two overlap, but long-running work usually needs handoff first.
Assuming MCP automatically improves capability
A larger connection surface can also make tool choice worse if the surface gets ambiguous.
Assuming a written rule is automatically enforced
Anything that must not break should move into hooks, validators, or permissions.
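
The MCP mistake above, where a larger connection surface makes tool choice ambiguous, can be checked roughly before tools are exposed. The sketch below flags pairs of tool descriptions with high word overlap. Jaccard similarity over words is a crude proxy chosen for this example, the 0.6 threshold is arbitrary, and the tool names are hypothetical.

```python
import re
from itertools import combinations


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric words in a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity; 0.0 when both texts are empty."""
    wa, wb = _words(a), _words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def ambiguous_pairs(tools: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Return tool-name pairs whose descriptions overlap at or above the threshold."""
    return [
        (name1, name2)
        for (name1, desc1), (name2, desc2) in combinations(tools.items(), 2)
        if jaccard(desc1, desc2) >= threshold
    ]


tools = {
    "search_docs": "search the project docs for a query",
    "find_docs": "search the project docs for a term",
    "run_shell": "execute a shell command",
}
flagged = ambiguous_pairs(tools)
```

Here `search_docs` and `find_docs` are flagged because their descriptions differ by a single word, which is the kind of pair an agent is likely to confuse.
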
9. If you remember only five lines
- AGENTS.md is for shared boundaries; CLAUDE.md is for runtime behavior.
- A tool is an action unit; MCP is a connection protocol.
- Handoff is for resumption; memory is for reuse.
- A hook enforces; a guideline recommends.
- Memory ownership is an operating-control question, not just a feature question.
This is Appendix Companion E1. Next: a source map and fact-checking method for deciding what counts as a primary source, what belongs in working notes, and how to verify claims before publication.
Series overview: Harness Engineering Series Guide