"Harness Appendix E1 — Glossary and Cheat Sheet: From AGENTS.md to Handoff"

Harness engineering becomes harder to follow when familiar-sounding terms blur together. What is the practical difference between AGENTS.md and CLAUDE.md? When does handoff differ from memory? How is MCP related to tools without being the same thing? This appendix is a compact reference that defines those terms operationally, for use alongside the rest of the series.


Key Takeaways

  • The point of a glossary is not memorization. It is boundary clarity.
  • Similar terms become easier to manage when you ask three questions: who reads it, when it applies, and what it controls.
  • A good harness is easier to reason about when it is broken into six surfaces: instruction, context, tool, verification, persistence, and operating.
  • One distinction matters especially often: handoff is mainly for resuming work, while memory is mainly for reusing knowledge.
  • The cheat sheets below are intended as a practical companion for the A through E series entries.

1. The big map first

Most recurring terms in harness engineering fit into a small number of layers.

| Layer | Main question | Typical examples |
| --- | --- | --- |
| instruction surface | what rules does the agent start with | AGENTS.md, CLAUDE.md, system instructions |
| context surface | what is loaded now and what is omitted | user request, reference docs, tool results |
| tool surface | what can the agent actually do | shell, file read/write, web, MCP-connected tools |
| verification surface | what catches mistakes first | tests, hooks, evals, reviews |
| persistence surface | what survives the session | handoff notes, session notes, docs, long-term memory |
| operating surface | how risk and cost are bounded | permissions, sandboxing, approvals, audit |

2. Fifteen core terms

1. Harness

  • Definition: the full work environment and operating structure around the model
  • Distinction: it includes instructions, tools, validation, memory, and permissions, not just prompting

2. Agent

  • Definition: an execution actor that uses a model, instructions, and tools in a loop
  • Distinction: broader than chat because it includes tool calls and state transitions

3. AGENTS.md

  • Definition: a repository-level instruction file that defines shared rules and boundaries
  • Distinction: strongest for workspace-wide constraints and ownership rules

4. CLAUDE.md

  • Definition: an instruction file that Claude Code loads into the session to shape Claude-specific working behavior
  • Distinction: strongest for task execution style and operational guidance inside the session

5. Skill

  • Definition: a reusable procedure separated into its own instruction unit
  • Distinction: used to keep the main instruction surface short while preserving repeatable workflows

6. Tool

  • Definition: a concrete action surface the agent can call
  • Distinction: file access, shell execution, search, and external API actions all live here

7. MCP

  • Definition: the Model Context Protocol, a protocol layer for connecting external tools and data systems
  • Distinction: MCP is the connection standard; a tool is the actual callable action

8. Context window

  • Definition: the actual input budget the model sees in one run
  • Distinction: different from what exists in storage; this is about what is loaded now
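
The gap between what is stored and what is loaded can be sketched as a budgeted selection step. This is a minimal illustration under assumptions, not any vendor's loader: `estimate_tokens` is a crude whitespace word count standing in for a real tokenizer, and the item names, priorities, and budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # lower number = load first (hypothetical convention)


def estimate_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def select_for_context(items: list[ContextItem], budget: int) -> tuple[list[str], list[str]]:
    """Return (loaded, omitted) item names under a token budget."""
    loaded, omitted, used = [], [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            loaded.append(item.name)
            used += cost
        else:
            omitted.append(item.name)
    return loaded, omitted


# Everything here exists "in storage"; only part of it fits the budget.
stored = [
    ContextItem("user request", "fix the failing login test", 0),
    ContextItem("AGENTS.md", "never push to main without review", 1),
    ContextItem("old tool log", "line " * 500, 2),
]
loaded, omitted = select_for_context(stored, budget=50)
print("loaded:", loaded)
print("omitted:", omitted)
```

The omitted list is the useful part in practice: it makes visible what the model did not see in this run.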

9. Handoff

  • Definition: a structured transfer artifact for the next session or next worker
  • Distinction: optimized for work resumption rather than broad knowledge reuse

10. Memory

  • Definition: reusable state or knowledge preserved outside the immediate session
  • Distinction: broader than handoff; may include durable preferences, patterns, and rules
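
The handoff and memory definitions above can be contrasted as two record shapes. This is a sketch under assumptions: the class and field names are hypothetical and are not taken from any tool's file format.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Handoff:
    # Task-scoped: answers "where do I pick up?"
    task: str
    done: list[str]
    next_step: str
    open_questions: list[str] = field(default_factory=list)


@dataclass
class MemoryEntry:
    # Task-independent: answers "what should I apply again later?"
    rule: str
    learned_from: str


handoff = Handoff(
    task="migrate auth tests",
    done=["ported login tests"],
    next_step="port token refresh tests",
)
lesson = MemoryEntry(
    rule="run the linter before committing",
    learned_from="migrate auth tests",
)

print(json.dumps(asdict(handoff), indent=2))
print(json.dumps(asdict(lesson), indent=2))
```

The handoff record becomes stale once the task is finished, while the memory entry is meant to outlive it.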

11. Eval

  • Definition: a repeatable quality-checking structure for outputs or behaviors
  • Distinction: may be exact-match, rubric-based, or regression-based
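
Two of the three eval styles named above can be sketched without a judge model. The function names and case names below are hypothetical, and rubric-based evaluation is left out because it needs a scoring model or a human rater.

```python
def exact_match(output: str, expected: str) -> bool:
    # Exact-match eval: passes only if the whitespace-trimmed strings are identical.
    return output.strip() == expected.strip()


def regression_check(current: dict[str, bool], baseline: dict[str, bool]) -> list[str]:
    # Regression eval: report cases that passed in the baseline but fail now.
    return sorted(
        case for case, passed in baseline.items()
        if passed and not current.get(case, False)
    )


baseline = {"parse_date": True, "format_name": True, "empty_input": False}
current = {"parse_date": True, "format_name": False, "empty_input": False}

print(exact_match("42\n", "42"))
print(regression_check(current, baseline))
```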

12. Hook

  • Definition: policy or validation logic forced at specific lifecycle moments
  • Distinction: not advice, but enforcement
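
The "enforcement, not advice" distinction can be sketched as a check that runs before every tool call and stops execution on a violation. This is an illustration under assumptions, not the hook API of any particular agent runtime: the action shape, `HookViolation`, and `run_tool` are hypothetical.

```python
from typing import Callable

# Hypothetical action shape: {"tool": str, "args": dict}
Hook = Callable[[dict], None]


class HookViolation(Exception):
    """Raised when a hook blocks an action."""


def block_env_writes(action: dict) -> None:
    # Example policy: refuse writes to .env files.
    if action["tool"] == "write_file" and action["args"]["path"].endswith(".env"):
        raise HookViolation("writes to .env files are blocked")


PRE_TOOL_HOOKS: list[Hook] = [block_env_writes]


def run_tool(action: dict) -> str:
    # Every hook runs before the action; any violation stops execution.
    for hook in PRE_TOOL_HOOKS:
        hook(action)
    return f"ran {action['tool']}"


print(run_tool({"tool": "write_file", "args": {"path": "notes.md"}}))
try:
    run_tool({"tool": "write_file", "args": {"path": "config/.env"}})
except HookViolation as err:
    print("blocked:", err)
```

A guideline would state the same rule in an instruction file; the hook differs in that the blocked call never executes.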

13. Sandbox

  • Definition: an isolated execution environment that limits action radius
  • Distinction: both a safety control and an operational containment device

14. Approval policy

  • Definition: the rule set that separates what is auto-allowed, confirmed, or denied
  • Distinction: reading, writing, execution, and external transmission should not share the same risk level
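
The three-way split above can be sketched as a lookup with a conservative default. The action kinds and the decisions assigned to them are illustrative assumptions, not a recommended policy.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    DENY = "deny"


# Hypothetical policy: the action kinds and decisions are illustrative only.
POLICY = {
    "read": Decision.ALLOW,
    "write": Decision.CONFIRM,
    "execute": Decision.CONFIRM,
    "external_send": Decision.CONFIRM,
    "delete_outside_workspace": Decision.DENY,
}


def decide(action_kind: str) -> Decision:
    # Unknown action kinds fall back to DENY rather than ALLOW.
    return POLICY.get(action_kind, Decision.DENY)


for kind in ["read", "write", "external_send", "unlisted_action"]:
    print(kind, "->", decide(kind).value)
```

Defaulting unknown kinds to deny is one design choice among several; the point is that each risk class gets its own explicit entry.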

15. Memory ownership

  • Definition: control over where long-term memory lives, how it moves, who can inspect it, and how it can be deleted
  • Distinction: not a convenience feature but an operational control question

3. Pairs that get confused most often

| Pair | Useful distinction |
| --- | --- |
| AGENTS.md vs CLAUDE.md | shared repository boundary vs runtime working behavior |
| prompt vs context | instruction text vs the full material actually shown to the model |
| tool vs MCP | action unit vs connection standard |
| handoff vs memory | resumption artifact vs reusable knowledge layer |
| eval vs review | repeatable evaluation structure vs interpretive human or secondary-agent judgment |
| hook vs guideline | enforcement vs recommendation |
| sandbox vs permission | execution isolation vs allow/deny policy boundary |

4. Where should this rule live

| What you need to store | Default location | Why |
| --- | --- | --- |
| workspace-wide hard boundary | AGENTS.md | everyone should inherit it |
| a repeatable tool procedure | skill or reference doc | keeps the top-level instruction surface compact |
| the next step for this task | handoff or session note | the next session must resume quickly |
| durable reusable rule | docs/, tasks/lessons.md, or a memory layer | it should be read again later |
| a must-not-break check | hook, validator, or test | advice is weaker than enforcement |

5. Failure diagnosis cheat sheet

| Symptom | Suspect this layer first | First question |
| --- | --- | --- |
| output shape keeps drifting | instruction surface | is the output contract fixed clearly |
| the wrong tool keeps being chosen | tool surface | are tool names and descriptions actually separable |
| long tasks lose direction | persistence surface | is there a resumable handoff artifact |
| risky files or external actions get touched | operating surface | are approvals and permission boundaries separated |
| the answer sounds good but is often wrong | verification surface | is there a cheap deterministic check before judgment |
| the same mistake repeats | memory or evaluation surface | were failures promoted into lessons or regression checks |
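
The cheat sheet above can also be kept as data, so a triage script or checklist can look up the first question for a symptom. This is a small sketch: the short symptom keys and the `first_check` helper are hypothetical labels added for illustration.

```python
# Rows of the cheat sheet as data. The short symptom keys are hypothetical labels.
DIAGNOSIS = {
    "output_drift": ("instruction surface", "Is the output contract fixed clearly?"),
    "wrong_tool": ("tool surface", "Are tool names and descriptions actually separable?"),
    "lost_direction": ("persistence surface", "Is there a resumable handoff artifact?"),
    "risky_action": ("operating surface", "Are approvals and permission boundaries separated?"),
    "plausible_but_wrong": ("verification surface", "Is there a cheap deterministic check before judgment?"),
    "repeated_mistake": ("memory or evaluation surface", "Were failures promoted into lessons or regression checks?"),
}


def first_check(symptom: str) -> str:
    # Unknown symptoms return a prompt to classify them rather than a guess.
    if symptom not in DIAGNOSIS:
        return f"unknown symptom '{symptom}': classify it against the table first"
    layer, question = DIAGNOSIS[symptom]
    return f"Suspect the {layer} first. {question}"


print(first_check("wrong_tool"))
```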

6. One-sentence summaries for the whole series

| Series | One line to remember |
| --- | --- |
| A Basics | the model matters less than the workbench around it |
| B Implementation | OpenAI and Claude differ more in operating philosophy than in surface features |
| C Operations | evaluation, handoff, guardrails, and memory determine reliability |
| D Strategy | good harnesses grow through reusable patterns and explicit design decisions |
| E Appendix | companion assets like glossaries and source maps accelerate real application |

7. Repository mapping

| Concept | Repository example |
| --- | --- |
| instruction surface | AGENTS.md |
| memory index | docs/memory-map.md |
| long-running resumption | tasks/handoffs/, tasks/sessions/ |
| reusable operating rules | tasks/lessons.md |
| bounded publishing rule | explicit user confirmation before external posting |
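
A quick way to check a repository against this mapping is to test whether the listed paths exist. The sketch below uses the paths from the mapping as-is; whether they are present depends on the repository, and the `missing_paths` helper is a hypothetical name.

```python
from pathlib import Path

# Paths taken from the mapping above; their presence depends on the repository.
EXPECTED = {
    "instruction surface": ["AGENTS.md"],
    "memory index": ["docs/memory-map.md"],
    "long-running resumption": ["tasks/handoffs", "tasks/sessions"],
    "reusable operating rules": ["tasks/lessons.md"],
}


def missing_paths(repo_root: Path) -> dict[str, list[str]]:
    """Return, per concept, the expected paths that are absent under repo_root."""
    missing = {}
    for concept, paths in EXPECTED.items():
        absent = [p for p in paths if not (repo_root / p).exists()]
        if absent:
            missing[concept] = absent
    return missing


if __name__ == "__main__":
    for concept, paths in missing_paths(Path(".")).items():
        print(f"{concept}: missing {', '.join(paths)}")
```

The publishing rule is left out on purpose: it is a behavior to enforce, not a file to look for.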

8. Common mistakes

Treating vocabulary as understanding

The real question is which failure each term is trying to reduce.

Assuming memory replaces handoff

The two overlap, but long-running work usually needs a handoff first: resuming a task requires its current state, not just reusable knowledge.

Assuming MCP automatically improves capability

A larger connection surface can also make tool choice worse if the surface gets ambiguous.
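
One cheap way to spot an ambiguous tool surface is to compare tool descriptions for word overlap. The sketch below uses Jaccard similarity over lowercase word sets; the tool catalog and the 0.6 threshold are hypothetical, and word overlap is only a rough proxy for how a model might confuse two tools.

```python
from itertools import combinations


def word_set(text: str) -> set[str]:
    return set(text.lower().split())


def overlap(a: str, b: str) -> float:
    # Jaccard similarity of the two descriptions' word sets.
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def ambiguous_pairs(tools: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Return tool-name pairs whose descriptions overlap at or above threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(tools), 2)
        if overlap(tools[a], tools[b]) >= threshold
    ]


# Hypothetical tool catalog.
tools = {
    "search_docs": "search the project documents for a query",
    "find_docs": "search the project documents for a keyword",
    "run_tests": "execute the unit test suite",
}
print(ambiguous_pairs(tools))
```

Flagged pairs are candidates for merging or for rewriting descriptions so that each tool states when it should be chosen over the other.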

Assuming a written rule is automatically enforced

Anything that must not break should move into hooks, validators, or permissions.

9. If you remember only five lines

  1. AGENTS.md is for shared boundaries; CLAUDE.md is for runtime behavior.
  2. A tool is an action unit; MCP is a connection protocol.
  3. Handoff is for resumption; memory is for reuse.
  4. A hook enforces; a guideline recommends.
  5. Memory ownership is an operating-control question, not just a feature question.

References

  • AGENTS.md
  • docs/blog_series_ํ•˜๋„ค์Šค์—”์ง€๋‹ˆ์–ด๋ง_์ด๊ด„_design.md
  • docs/memory-map.md
  • sources/260518_ํ•˜๋„ค์Šค์—”์ง€๋‹ˆ์–ด๋ง_15์žฅ_๋ธ”๋กœ๊ทธํ™œ์šฉ๋…ธํŠธ.md
  • drafts/blog/260519_ํ•˜๋„ค์Šค์‹œ๋ฆฌ์ฆˆA03_์ปจํ…์ŠคํŠธ์„ค๊ณ„์™€์ง€์‹œํŒŒ์ผ_๋ธ”๋กœ๊ทธ.md
  • drafts/blog/260519_Claudeํ•˜๋„ค์ŠคB02_CLAUDEmd_Skills_Hooks_Permissions_๋ธ”๋กœ๊ทธ.md
  • drafts/blog/260519_ํ•˜๋„ค์Šค์‹œ๋ฆฌ์ฆˆC02_์žฅ์‹œ๊ฐ„์—์ด์ „ํŠธ์šด์˜_๋ธ”๋กœ๊ทธ.md
  • drafts/blog/260519_ํ•˜๋„ค์Šค์‹œ๋ฆฌ์ฆˆC04_๋ฉ”๋ชจ๋ฆฌ์†Œ์œ ๊ถŒ_๋ธ”๋กœ๊ทธ.md

This is Appendix Companion E1. Next: a source map and fact-checking method for deciding what counts as a primary source, what belongs in working notes, and how to verify claims before publication.

Series overview: Harness Engineering Series Guide
